Okay, so let's get started with today's lesson and let's see what Yann likes to do research on. All right, so today we are gonna be talking about model-predictive policy learning with uncertainty regularization for driving in dense traffic. Oh, what a mouthful. The nice part is that in roughly 50 minutes you're gonna be able to understand every word in this title, and you should actually even be ready to implement this, because we have basically covered all the basic components so far. So this is just putting things together, perhaps not in a trivial way, but I don't think it's too crazy. This is work done by my friend and colleague Mikael Henaff, myself, and Yann, here at Courant, a few years back. I think it was 2019, or maybe the year before, I don't know. All right, so let's see how you would learn how to drive the model-free reinforcement learning way. So pay attention to the car here in the back, this black one. Let's say I'd like to train this one with model-free reinforcement learning. How would you go about that? You would have to try things that are maybe not good, and then you figure: oh, I shouldn't do those things, because they're not good. So you have to die a few times before actually learning not to die, but that's arguably not the way you learn how to drive, right? Especially if you're driving your parents' car and you don't really want to crash your parents' car before learning how to not crash it. So let's figure out a more principled way to learn how to drive a car. I would argue here, and this is just my intuition, that if you're driving at 100 kilometers per hour, which is roughly 30 meters per second, and you look 30 meters in front of you, that means you're looking one second into the future, right?
And therefore you can see that the road center turns slightly to the left one second in the future. Therefore, I would like to change the steering wheel right now, such that in one second I will be following the trajectory, wherever the street takes me, okay? And so here I'm just trying to push forward the idea that we need to look into the future in order to be able to make some sort of nice plan of action, right? You'd like to figure out that if something is not good, you may not want to do it, such that you don't get into trouble. Cool. So what's the main problem here? Other people are the problem, right? The vehicles around you are not quite deterministic, and therefore it might be slightly hard to take into account every possible thing that might happen, right? So let me give you an introduction here, or a small recap, of the main components in this system. We have an agent, represented here by this pink brain, which gets as input a state, ST, and produces an action as an output, AT, which is, for example, my steering control and my acceleration or braking signal. Moreover, I observe a cost, which is a consequence of taking a specific action AT given that I find myself in state ST, okay? So far, so good. Okay, let me see, there are chat messages here. Let me see what's going on. Oh, okay, people are laughing, cool. All right, on the other side, you have the real world. The real world, given its internal state, gets a new action, and then produces the new state, okay, and also produces the consequence CT to provide to your agent. And so this is how you can have a network interacting with the real world: you take actions given a specific state, and the world gives you the next state and the next consequence.
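By the way, the interaction loop I just described fits in a few lines of code. This is only my own toy sketch, with made-up names like `policy` and `env_step` and a silly one-dimensional world; the real states, actions, and costs are the ones we define later:

```python
def run_episode(policy, env_step, s0, horizon=5):
    """Roll the agent against the world for a few steps.

    policy:   s_t -> a_t                      (the pink-brain agent)
    env_step: (s_t, a_t) -> (s_{t+1}, c_t)    (the real world)
    Returns the visited states and the total observed cost.
    """
    s, total_cost, states = s0, 0.0, [s0]
    for _ in range(horizon):
        a = policy(s)             # agent picks an action given the state
        s, c = env_step(s, a)     # world returns next state and cost c_t
        total_cost += c
        states.append(s)
    return states, total_cost

# Toy 1-D world: the state is a position, the action nudges it,
# and the cost is the distance from 0 after the move.
policy = lambda s: -0.5 * s                  # steer back toward 0
env_step = lambda s, a: (s + a, abs(s + a))  # deterministic dynamics
states, cost = run_episode(policy, env_step, s0=8.0)
```

The point is only the shape of the loop: state in, action out, next state and cost back.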
So this is model-free, because you interact with the real world. But the nice part is that you can instead interact with a model of the world, okay? So on the left-hand side, instead of trying things in the real world, where you try to cook and you burn something, and that's not good, you burned your hand too. I burned my arm the other day making cookies. Maybe you would like to try in your mind first: how can I make cookies without getting burned? Maybe don't touch the oven with your hands, right? That would be a very smart option. All right, so how can we think, right? How can we ponder? How can we make this kind of interaction between my actions and the expected consequences of taking a specific action, without actually taking the action? How can I avoid burning myself again tonight while baking, without actually getting burned? You want to think ahead: don't touch what's hot, okay? All right, so how do we train this world model? The same way we trained it last week, right? So we start with an initial state. We provide some action, which was random last week, and this week instead is not. For example, the action here might be the action taken by some expert that we observed, like mom cooking for you, or dad, I don't know. And you can see what the current state is, and then you can have the next state. Dinner's ready, I'm hungry. I'm actually hungry, damn. All right, moreover, you also have a consequence, CT. Your mouth is watering. Anyhow, you're now going to compute a distance between the states that are observed in the real world and the states that are provided to you by the model of the world. And then you have an MSE loss, right? So that's just regression. You try to regress what the next state is, given that you start from the same initial state and you provide a specific action. Okay, so far, this is something we've already seen last week.
I'll just give you an overview again, right? All right, I keep going if you don't complain. So let's figure out what we can do here. For example, in this case, I don't have the world model outputting a cost. More specifically, I will have my brain here, my agent, that again takes a state and provides an action, which is fed into this model. And then my cost is going to be a differentiable function of the state, okay? Which is exactly what we had seen last week, right? With the final destination being that you run back-propagation through time, and back-propagation through time means unrolling this loop, right? So you go forward, you go like that, and then you do a backprop: you go boom, boom, boom, boom. Okay, all right, cool. So let me introduce to you now the dataset. So far, that was just setting up the problem. Here is my actual real-case scenario. I have seven cameras mounted on top of a 30-story building, facing the I-80 interstate segment of the highway, right? And so I have these cameras recording these cars. The first part is fixing the perspective, such that I can have a hovering view, a top-down view. Moreover, here we extract some bounding boxes for each vehicle, right? So there is detection, there is regression of the size of these bounding boxes, and then there is tracking, because these bounding boxes are following my cars. And so you can see here the red track on the left side, the red pickup truck, from both views, from the top camera down to one single view. Here I can feed these cars into my Python game engine, and I can draw this representation.
In this case, you can still see the red truck there, the pickup truck, and then the bus on the right-hand side. So here, for each and every vehicle, the cyan one, I have two vectors: the vector PT, which represents the position of the vehicle, the rear part, and VT, which is the velocity, the vector with the VX and VY components. And then, moreover, given that I know the kinematics of driving a car, or a bicycle, I can invert the kinematics of these cars that have been driven by experts, and I can figure out what the actions of the driver are, in the sense that if the car moves with rectilinear uniform motion, you have no action, right? If you don't apply any acceleration, longitudinal or transverse, the car keeps going. So with rectilinear uniform motion, there is no action. Every time you diverge from this motion, for example, you accelerate, you brake, or you steer, then you have some action involved, right? And you can invert the kinematic model in order to figure out what the action is. Cool. So here, again, in this representation, which is the thing that I call the machine representation, I have the same vehicles: the pickup truck there and the one on the right-hand side. Each vehicle here has a bounding box, which represents its viewable area. So each car can only view that kind of box around itself. And so, for example, here I can extract the first box, where I place myself in the center, and I move myself to the blue channel, okay? Such that I make myself different from the others. And the red channel represents the lanes, and the green channel represents the other vehicles. You have the other guy here, this one here, that one, and the last one, okay?
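To make the action-extraction idea concrete, here is a small sketch. Note this is a simplification of what we actually do: the paper inverts a kinematic model of the car, while here I just take finite differences of the tracked positions, with a made-up time step:

```python
import numpy as np

def actions_from_track(positions, dt=0.1):
    """Recover approximate driver actions from a tracked trajectory.

    positions: (T, 2) array of x, y positions from the bounding-box tracker.
    Velocities are finite differences of positions; the 'actions'
    (accelerations) are finite differences of velocities. dt is an assumed
    time step (10 frames per second), not the dataset's exact value.
    """
    positions = np.asarray(positions, dtype=float)
    v = np.diff(positions, axis=0) / dt   # (T-1, 2) velocities
    a = np.diff(v, axis=0) / dt           # (T-2, 2) accelerations
    return v, a

# Rectilinear uniform motion: constant velocity, hence zero action.
track = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
v, a = actions_from_track(track)
```

As the lecture says: uniform motion gives zero acceleration, so the recovered action is zero; any braking, accelerating, or steering shows up as a non-zero difference.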
And here you can see, again, the pickup truck there, the red one, and then here you have a piece of the bus. So these are my images, IT, or observations, right? These represent two things: first, the state of the lanes, of the street, basically, and second, the traffic situation surrounding me. So overall, the set of PT, the position, VT, the velocity, and IT, this observation, represents my state, ST. So ST represents the current state at a specific time T for my given vehicle. So far, any questions? Or is it clear? I mean, we've already seen basically everything so far last week, and I just gave you an overview of this specific dataset. So there are no new concepts so far. It should be clear, right? Yeah, no? Okay, you're very quiet today. Maybe it's my audio. Okay, you're not texting, so I expect you to be okay. Yes, it's clear. Okay, thank you. All right, cool. So, the cost. We said before that my cost is gonna be a function of my state. So let's see how I compute this cost. There are two different costs. There is a lane cost, which basically tells me whether I am within the lane, inside the lane, or I am going off lane, off road. And the other one is a cost that tells me how close I am to other vehicles. The first one looks like this. The x-axis is the direction of motion, and the y-axis is the one that is 90 degrees to the left. And you can think about having a potential shaped like a house roof on top of you. If you overlay this with the red channel, there is some intersection on the left-hand side. The height of that intersection goes to zero if you are exactly in the center between the two lane lines. If you are shifted towards one side, you're gonna get some non-zero intersection. And if you're exactly on top of a lane line, you get exactly the value at the top of the triangle.
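If you want to see that lane cost as code, here is a toy version of the triangular potential. The lane width and the exact shape here are my assumptions for illustration, not numbers from the paper:

```python
def lane_cost(y, lane_width=3.7):
    """Triangular ('roof') lane potential over the lateral offset y, in metres.

    Zero at the lane centre (y = 0), rising linearly to 1.0 exactly on top of
    a lane line (y = +-lane_width / 2), then saturating. The lane width is an
    assumed figure for illustration.
    """
    half = lane_width / 2.0
    return min(abs(y) / half, 1.0)

assert lane_cost(0.0) == 0.0    # dead centre of the lane: zero cost
assert lane_cost(1.85) == 1.0   # right on top of the line: peak of the triangle
```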
On the other side, you have the proximity cost. I have exactly the same thing, but for the other vehicles. In this case, I have one transverse potential and one longitudinal potential, where the longitudinal one changes its length with the speed. The faster I go, the further I'd like to look ahead and behind, in this case; and the slower I go, the less I care about things that are too far, so I just look close to myself. So we can plug these two things into my environment. You can see now that there is an intersection. For example, here it is pretty high for the purple car, because it's exactly in front of us, but the orange one is quite low, because it's further away in front. So you can simply do the multiplication of the two, and you get my current proximity cost. Okay. So how does this look? I can show you, for example, a situation where we go at 20 kilometers per hour: all these vehicles are very close to each other. And then if you go at 50 kilometers per hour, on average, everyone is a bit further away. So if you multiply the potential that is in the y and the other one that is in the x, you get something that looks like this, okay? And in the case where we go at a higher speed, you're gonna get something like this, because, again, the extension in the x direction depends on the speed. You can tell that the right one reaches further in front and behind. Cool. So how do we get the final cost? Well, my final cost, at least as we published this paper, is just this potential mask multiplied with the green channel, and then we pick the max. Basically, we figure out which is the closest car, the part closest to us that belongs to some other object. So you do an element-wise multiplication of these two guys, then you take the max, you get a number, and the cool part is that it's differentiable, right?
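The proximity cost, mask times green channel then max, can be sketched like this. The grid size, the potential widths, and the speed scaling are all made-up illustrative numbers; in the real model this is done with differentiable tensor operations so gradients can flow through it:

```python
import numpy as np

def proximity_cost(green, speed):
    """Build a separable potential mask, multiply it element-wise with the
    green (other-vehicles) channel, and take the max: the closest bit of
    any other car dominates. Shapes and scalings here are assumptions,
    not the paper's exact formulation.

    green: (H, W) occupancy of other vehicles in [0, 1], ego car at centre.
    """
    H, W = green.shape
    cy, cx = H // 2, W // 2
    # Transverse (y) potential: fixed-width tent around the ego car.
    y = 1.0 - np.clip(np.abs(np.arange(H) - cy) / (H / 4.0), 0.0, 1.0)
    # Longitudinal (x) potential: its reach grows with speed, so a faster
    # car "cares" about vehicles further ahead and behind.
    reach = max(1.0, float(speed))
    x = 1.0 - np.clip(np.abs(np.arange(W) - cx) / reach, 0.0, 1.0)
    mask = np.outer(y, x)                 # separable 2-D potential
    return float((mask * green).max())

g = np.zeros((33, 65))
g[16, 40] = 1.0   # one car in my lane, 8 cells ahead of the ego (centre x=32)
```

Same gap to the car ahead, but the faster you go, the higher the cost, exactly the speed-dependent reach described above.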
So now you can run gradients through the network, such that you can compute some actions such that that value overall is reduced and goes to zero, such that you avoid collisions. All right, so let me give you now the outline of the lesson for today. So first, we said we had to learn how to mimic the world, right? That was pretty abstract so far; now we can start getting concrete tools and information. So first, we want to learn and mimic the real world. Second part, we'd like to use the learned model of the environment in order to train this agent by thinking about how to drive. So first, you learn how other vehicles behave in the real world. Second, given that you have an understanding of how other people interact, you try to think: oh, what would happen if I performed such and such action in this condition? Finally, we can figure out a possible way to evaluate this policy back in the real world, right? So once you've thought about how to drive, let's figure out whether you can drive, right? Cool. So let's get started with the first part, the world model: predicting what's next, given history and action. This is gonna be something that you have already seen a few lessons back, but let me give you, again, the full picture here. So we have a world model, which is fed with my S1 to T, which is a sequence of states. And each state is represented by a position vector, PT, a velocity vector, VT, and these context images, IT, right? These observations. So these are a set of things. Of course, as you can tell, you will have different input branches in your network, right? Because there are different kinds of data: one is a four-dimensional vector, the other is an image, so you'd like to use a convolutional net for it. Moreover, this world model gets an action, and the world model will produce a prediction for the next state.
On the other side, you have the real world, which is telling you: well, this happened instead. Okay, so that's the target. So how do we train this stuff? As we said, it's just a regression problem, so we just train with MSE, right? So we have my state, a sequence of states, and an action. We provide these to this predictor module. The predictor gives me some kind of hidden representation of the future. Then I have a decoder, which decodes this hidden representation of the future, and this one should give me a prediction, right? So this is pretty straightforward: you have a predictor that maps the past into the hidden representation of the future, then you have a decoder, which decodes the hidden representation of the future into the actual future. And we have a target on the other side. So all you need to do is feed these two into the MSE, and then you minimize the MSE by training these two modules. So does it work? What's the action here? So the action here is, for example, the acceleration and the steering command that we have observed in our dataset, right? My states S1 to T are a sequence of observations of positions, velocities, and context images, and the actions are the actions taken by the driver, which I obtain by inverting the kinematic model of the car. PT is the position of the car and VT is the velocity of the car. So you have position, velocity, and AT is the acceleration, right? PT: x and y position. VT: x and y velocity. AT: x and y acceleration, basically. Is it preferred to add a decoder instead of simply using a predictor, to improve the accuracy? So the predictor predicts what's gonna be the hidden state of the future, okay? The predictor gets the past, compresses it, and tries to give you the future hidden representation. Then you have this code, and you'd like to decode it, right?
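Since the training really is plain regression on the next state, here is a tiny sketch where a linear model plays the role of the neural predictor plus decoder, with toy dynamics I invented; least squares is just the closed-form way of minimizing the same MSE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear world: s_{t+1} = A s_t + B a_t. In the lecture a neural
# predictor + decoder is trained with MSE; here least squares, the
# closed-form minimizer of that MSE, plays the same role.
A_true = np.array([[1.0, 0.1],
                   [0.0, 1.0]])     # position += velocity * dt
B_true = np.array([[0.0],
                   [0.1]])          # the action changes the velocity

S  = rng.normal(size=(500, 2))      # observed states s_t
Ac = rng.normal(size=(500, 1))      # observed expert actions a_t
S_next = S @ A_true.T + Ac @ B_true.T   # observed next states (targets)

# Regress s_{t+1} on (s_t, a_t): pure regression, as in the lecture.
X = np.hstack([S, Ac])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T     # recovered world-model parameters
```

With a noiseless, deterministic toy world the regression recovers the dynamics exactly; the trouble we hit in a moment only appears once the future is genuinely uncertain.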
This is just one neural net, but we'd like to separate it semantically into two blocks. Okay. So the action AT is calculated from ST plus one? Yeah, so these actions are the ground-truth actions coming from, yeah, from the ground truth. Thanks. How do we calculate ST? We don't; how do we get the actual ST? Oh, the actual ST, I showed you before, right? The cameras are watching the cars going down the highway. I get the bounding boxes, I track the bounding boxes, and therefore I have all those positions, x's and y's, over time. Those are the PT positions. You can also compute the velocity, right? Position at time T plus one, minus position at time T, divided by the time step: you have the velocity. Wait, I'm still a little confused about the role of the decoder. Is it just converting the vector representation of the future from the predictor into actual real-world predictions? Yeah, that's correct. So the decoder is just a semantic subdivision right now. It's just one neural net, and you can think about a neural net as having an encoder and a decoder all the time, right? You can decide where you want to have the hidden representation. How do we, oh, hold on. How do we determine the dimensionality of the f-pred output? Well, it depends on your network, right? Whatever your network is outputting; in my case, I think it was a 128-dimensional vector. For ST, since there is a variable number of cars surrounding our agent vehicle, then... okay, that's a good question: is the size of ST variable? So ST represents the position and velocity of myself, and then I have an image which shows me the occupancy grid. It basically shows me the surroundings, right? So I show you here: this is my image, IT, which represents the configuration of the lanes of the street and the configuration of the vehicles. You have a different number of other vehicles in the left image and the right image.
Nevertheless, an image can just show you any number of vehicles, right? So that's a very cute way of using images, just for the fact that I don't need to have a variable-length set of things. Otherwise, you would have had to use some kind of attention, or a set network, or some other crafty things. This way, you can use images as a means of representing information. These are not natural images, right? These are completely synthetic images, but I can use them in order to cope with the fact that I have a different number of items nearby me, okay? I think I answered everyone. Like a Boolean grid? Yeah, that's a Boolean grid. Yeah, it is a Boolean grid, right? So my image has RGB, and each channel is either zero or one. In this case, you can see the pixels, and they look a little bit blurry. Let me think: okay, there is down-sampling. These images are actually larger, and they are binary, but they are too large, so we scale them down by a factor of four, and by doing that scaling down, they start looking a little blurry. Otherwise, if you have a car that turns, you're gonna have all that kind of staircasing, right? Instead, if you have this kind of blurred version, it's not that staircase-y. Okay, too many questions. With a differentiable cost function, do we back-propagate from the end of the trajectory all the way back to... yeah, sure, that's correct. We are gonna see this in the second part, right? This is the first part, where I train a regression network, right? So here we had the first model, right? You have something like an encoder-decoder, except it's not: it's a predictor-decoder, just because of the fact that the stuff on the left-hand side is from the past, right?
That's why you need a predictor that gives you the next thing in line. Otherwise, you can think of it as an encoder, but that's not a correct way of thinking, just a similar one. Does it work? On the left-hand side, you have the actual future, the thing that really happened; on the right-hand side, you get the deterministic predictor-decoder, the network that I just showed you, trained with MSE in order to replicate the thing on the left. This is from the testing set, of course, and it's been trained on the training set. On the top right, you're gonna see the frames. You have 10 frames per second, and the direction of motion is forward. The blue guy is ourselves, the green guys are the others, and the red are the lanes. So you can see here that after three, four, five seconds, everything gets quite fucked up. Yep, nothing works. Okay, how nice. I just taught you something that doesn't work. How happy are you? Are you happy? No, I can't hear you, but okay, I imagine. Yeah, thank you for the "no, not happy," okay. All right, so what's happening? Who can tell? You should know, right? What's happening? Because Yann has been talking about this stuff all along these past three weeks. So what's the problem here? Latent variables, okay, that's the solution. What's the problem? Okay, the MSE loss, yeah, but what's the actual problem? Oh, okay, someone answer, someone answer. LM? Who's LM? I don't know. "It's averaging future outcomes," yeah. Yes, yes, yes. Okay, this is basically it: from that initial point in time, everything can happen, and you average over everything else. Therefore it looks like that: every image looks like a blurred image, right? So again, you have this example from Yann: you have a pencil standing on a plane, and since I can't really draw in 3D, I'm gonna give you the top-down view on the right-hand side. Let's say you make it fall, right? One, two, three, four times, five, six, whatever.
And if you compute the average falling location, well, since x and y are just coordinates, the average final location says: oh, the pen never fell. And that's really wrong, right? Whereas if you actually used the pixel space, you would have the overlay. Anyhow, the problem is this one: you have multiple possible futures, and then you only try to regress the average future, right? Ah, how do we fix it? You already told me: with latent variables, yes, yes. Latent variables, latent variables. Okay, cool. This is the energy-based stuff Yann likes a lot, and we like it too, right? Because we like Yann. So, okay, and actually you already know everything here. So this is the original network I just showed you before; it doesn't work, so let's fix it. In the center, we are gonna be summing something. And we sum this low-dimensional latent variable, in green, ZT, which goes through an expansion module, such that it matches the dimensionality. So where is this ZT coming from? Well, I guess you can tell. ZT is gonna be chosen such that the prediction error is minimized, such that the MSE is minimized for this specific prediction, right? So you can do inference, right? So you train everything; it's already trained, actually, because we trained the deterministic one. Now you can do inference on the latent variable, such that you can still get the MSE to zero by doing gradient descent in latent space, right? So you can change that ZT: change, change, change, change until the MSE dies. This is very expensive, because you have to do gradient descent into the damn thing there. Otherwise, you can actually just predict that latent. How do you do that? With the encoder, and there you go, that's where the encoder comes in. The encoder gets the future, the state, and gives you the mean and variance from which you sample. What is it? A variational predictive network.
Actually, it's a variational conditional predictive network, because you actually start from an actual action, AT, right? So AT is the condition. Is it advisable to have the output fed into an encoder for the latent variable during training? Oh, never mind, okay. All right, yeah, that was the answer, yeah. So you don't have to, right? This is just a very convenient way to get that ZT. I guess in the next, next lab, when we do the energy-based model, we are gonna do inference first, right? But inference is gonna take you forever, because every time you have to try, try, try, try a new Z, and where do you even start? Wait, why is there an arrow from ST plus one to the encoder? Because, unless you know what will happen, how would you find that ZT, right? That ZT is the missing information that you can't get from the past, because something happened, you know? Now my roommate is gonna come in naked. Okay, no, they don't usually do those things, but that's the unpredictable part, and it never happened before, yeah? Hopefully, luckily. Okay, the point was that, given that I have no idea about what's gonna happen next in the future, say a meteorite crashes here, you can't really predict the unpredictable part, right? You don't know what's going on. And therefore, during training, I look into the future: what's happening? Ha, I see something, I get information from the future, and from that information, I can predict the latent variable, okay? So someone's gonna say: oh, you're cheating, right? Because you look into the future, and you don't have the future at testing time, right? When you drive, you don't have access to the future. But since you're training here, you can kind of cheat and look at what happened there. But we can fix this. How do we fix that? This is the posterior part of a variational autoencoder, and you fix it this way, right?
You enforce the posterior, the encoder there, to give you a distribution that is as close as possible to the prior, through that KL term, right? And so in this case, you learn how to predict meaningful latent variables ZT, while also disconnecting from that future. Okay, I'm not sure, hold on, where is it? I'm still not really understanding the FX part. FX means expansion. So ZT is the latent; it might be a 16-dimensional vector, a very tiny thing, and the f-pred output can easily be some kind of spatial information, right? Given that the state is an image, through a convolutional neural net I'm gonna be making it a bit smaller, but it's still gonna be spatial. It won't be collapsed into one vector, such that my FX, the expander, will expand my 16-dimensional vector, maybe into 16 planes of the same size as this f-pred hidden representation, okay? Such that I can sum them together. Could you repeat how you're putting a restriction on the model not being able to look at actual future states? So right now, only by looking into the future can you figure out what's happening, okay? This encoder is trained to produce the latent variable which minimizes that MSE. So that's not quite the variational autoencoder: the encoder there on the top part is gonna be trying to predict the latent that really gives you zero prediction error. But then, on the other side, you also enforce that encoder to give you something that is close to the prior P over there. The fact that there is a KL term between the posterior Q and the prior P allows you, later on, to sample from the prior when you're actually using this network. So you build a prior distribution, and then at test time, instead of looking at the actual future state, you sample from the distribution that you've built? Yeah, you just sample from the prior distribution. So this is gonna be a fixed prior.
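That KL term between the posterior and the fixed Gaussian prior has a closed form for diagonal Gaussians; here's a sketch of it (the standard formula, toy inputs):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ): the term that pulls the
    posterior encoder's output towards the fixed Gaussian prior, so that
    at driving time we can sample ZT from N(0, I) instead of peeking at
    the future. Standard closed form for diagonal Gaussians.
    """
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return 0.5 * float(np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar))

# A posterior already equal to the prior pays no penalty at all.
assert kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]) == 0.0
```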
You fix the prior, which is a normal distribution, and you enforce the encoder to stick with this prior, or you can even learn what the distribution of those latent variables is. You can do many things. This is what we actually managed to get the best results with, okay? You enforce the encoder to give you something that looks like the prior, which is an independent Gaussian; independent and also, what's it called, isotropic, that's the word: the identity matrix for the covariance. And so later on, we are gonna just sample from that prior distribution to get latents that look reasonable. Okay, thanks. Sure. Okay, there are many more questions. How do the latent variables prevent the averaging? Oh, well, okay. So let's say we train the branch on the bottom, the deterministic part, right? At the end, you're gonna get a prediction for the future state which looks like some kind of possible, but, you know, average future. Now, for a specific future, by doing gradient descent, you can minimize that MSE by changing that additional latent variable. This is how latent variable models work, right? So for every training sample, there is one latent variable which is gonna give you exactly zero MSE loss, right? You can train first with all of them, such that you get a very initial starting point, which is the average prediction. And then you can refine that average prediction by adding this additional latent variable. And the value for the latent variable can be found, for example, by doing gradient descent, meaning you minimize this MSE loss by getting gradients coming down here: the gradient comes here, and then you get gradients here. So you can do: Z gets Z minus eta times the gradient of the loss with respect to Z, right?
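That update rule is easy to demo on a toy problem. Here a linear map with orthonormal columns stands in for the expander plus decoder, and we recover the latent of one specific future purely by gradient descent on the MSE; everything in this sketch is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy latent inference: prediction = base + W z, where base is the
# deterministic ('average') prediction and W stands in for the expander
# plus decoder. QR gives W orthonormal columns so the toy problem is
# well conditioned; the target is one specific observed future.
W, _ = np.linalg.qr(rng.normal(size=(8, 3)))
base = rng.normal(size=8)
z_true = np.array([0.5, -1.0, 2.0])
target = base + W @ z_true          # the future that actually happened

z = np.zeros(3)                     # start from the prior mean
eta = 0.25
for _ in range(100):
    residual = base + W @ z - target
    z -= eta * 2.0 * W.T @ residual   # z <- z - eta * d(squared error)/dz

mse = float(np.mean((base + W @ z - target) ** 2))
```

This is the expensive option the lecture mentions: many gradient steps per sample, which is exactly why predicting the latent with an encoder is so convenient.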
Got it, okay, awesome. No, wait, that was someone else saying "got it." Okay, someone had a question before. Okay, I don't know; that was the answer I gave. So basically we are adding f-pred, our prediction of what will happen, with FX, the representation of what actually happens? No, FX is not what actually happens. FX is what I couldn't figure out would have happened. Okay, so for the f-pred output: I would guess my best prediction for what happens tomorrow is gonna be that the sun will rise. Might not be the case. So the Z that comes down through the FX expander will add the component that manages to fix my broken prediction, right? So the lower branch will try to do the best job it can without having knowledge of the future, and the other one allows me to refine my prediction to be actually correct, okay? But in this case, we have access to the actual future, so it's kind of cheating. Nevertheless, you enforce that this generation of the latent variable stays as close as possible to my prior distribution. I hope it makes more sense, but maybe we can go further and then you may get a little bit clearer a picture. Well, last question: is it guaranteed... Okay, it's starting to make sense, okay, sure. Is it guaranteed that there exists a Z which gives us MSE zero? I'm not sure about guarantees, but if your network is reasonably well-behaved, I guess, yeah, you're guaranteed. If your network has enough capacity, it can overfit; but normally you can't overfit because there is noise in the data, and here you can zero out that noise by adding this additional term, okay? So it's guaranteed if the network is properly sized, yeah. Okay, cool. So inference: how do we drive? Okay, we already spoiled it. Okay, more questions. Training time: we try to learn a Q of Z that's close to the prior P of Z. Test time... okay, yeah, I'm getting there. Hold on. All right, test time; not test, driving time, right? How the hell do you use this stuff?
Variational conditional predictive network — inference, cool. So we had this lower branch and this low-dimensional latent variable, a 16-dimensional vector — and where does it come from? We sample it from the prior, right? Because we enforced that the encoder was shooting for this distribution. Cool. Then what do you do next? You get the prediction, you put it back in — you do an autoregressive step, right? You get the next prediction; then what next? Yeah, correct. Okay, no one wrote anything, but I know you understood. And so you keep feeding this stuff back. Does it work? Yes, it works. Here you can see again a comparison between the actual future on the left-hand side, the deterministic branch — just the lower branch, trained as before — and then four different draws from a normal distribution. So you have 200 times four: 800 samples from a normal distribution of size 16. I feed the first 200 values to the model; then I start again from the same past and feed 200 new values to the latent; then a third time, same initial condition, a third sequence of 200 latents; and so on. Pay attention here to the car on my right-hand side, the one in the white circle, and to the car behind it, in the square, okay? You can see that each of these different runs predicts a different location for the car in the circle, and the car in the square also gets completely different predictions. So this is super cool, right? Each of these possible futures is completely hallucinated by my network. This stuff doesn't exist — we have a network that generates futures. How cool is this, right?
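The autoregressive rollout just described can be sketched like this. The linear `world_model` below is a made-up stand-in — the real one is a deep network taking images — but the loop structure is the point: fresh latent from the prior at every step, prediction fed back in, same initial condition giving different hallucinated futures.

```python
import numpy as np

# Sketch of autoregressive sampling: z ~ N(0, I) at each step, feed (state, z)
# to the predictive model, feed the predicted state back in. Shapes are toy.
rng = np.random.default_rng(0)
state_dim, z_dim, T = 4, 16, 200

A = 0.9 * np.eye(state_dim)                    # deterministic part of the dynamics
B = 0.1 * rng.normal(size=(state_dim, z_dim))  # how the latent perturbs the future

def world_model(state, z):
    return A @ state + B @ z                   # one-step prediction

def rollout(state0, rng):
    states = [state0]
    for _ in range(T):
        z = rng.normal(size=z_dim)             # sample the latent from the prior
        states.append(world_model(states[-1], z))
    return np.stack(states)

s0 = np.ones(state_dim)                        # same initial condition every time
futures = [rollout(s0, np.random.default_rng(seed)) for seed in range(4)]
for f in futures:                              # four different hallucinated futures
    print(np.round(f[-1], 2))
```

Each call starts from the identical past but draws its own latent sequence, so the four trajectories diverge — the toy analogue of the four video panels.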
So before, maybe we had a limited amount of training data. Now we have a network that just generates futures out of a hat, like a magician, right? And so this is super, super cool: you have an infinite amount of data, completely different from what actually happened in the actual future. This data comes from observational data only — things we have seen in actual reality — but applied to this specific initial condition. So what next? Now you can use this huge amount of data to train a policy, right? A network that controls our agent such that it minimizes those costs: the cost for going over the lanes and the cost for colliding with other vehicles, okay? The cool part — okay, I can tell you this one — is that these multiple futures come from the specific sequence of latent variables you feed to this network, right? What if you perform gradient ascent in the latent space? You take your initial sample from the normal distribution, then you tweak those values so that you crank up the hardness — you increase the hardness (I like to swear, you know) — for example, you try to increase the proximity cost, right? So you get a sequence of latents where the other cars behave like kamikazes: they drive right into you, like crazy, right? So this is super cool: you have a network that gives you the futures you want. Okay, I'm already too excited, I will not be able to sleep tonight. Okay, question: "If we sample z from p(z) at test time, is it possible to look for a specific future? For example, I would like to know the outcome of turning left rather than moving forward." So here — let me think — right: this predictive network, you feed it the action, right?
So this is a conditional predictive network: you have the action here on the bottom. That's the whole point — you can take different actions, and the whole future changes based on the action you take, right? As I was mentioning, we have the initial state, the initial condition, and then given different latents you get different behaviors of the other vehicles. And you can decide to tweak this latent, for example by doing gradient ascent in the latent space on that collision term, so that you get crazy cars coming at you. But nevertheless, as you pointed out, you can also figure out the outcomes of changing the action, right? This is actually how we're going to train our policy — that is the whole thing, right? Okay, issues. First issue. Given that the model has access to the future: if you turn slightly to the left, everything in the scene turns to the right, right? Because when you drive and steer left, the world rotates the other way. And everything turning to the right contributes in a huge way to the MSE, right? So the MSE loss can be minimized if the encoder, through the latent variable, tells my bottom branch that everything turns to the right. But this is absolutely not what we want! Because that part is deterministic: given the past, given that I turn the steering wheel to the left, everything turns to the right. I don't need to peek at the future to learn that everything turns to the right — it should follow from what I do with my steering wheel, right? So this was a huge information-leak problem: the network was basically learning to cheat, figuring out that we had turned without us telling the system we turned, okay?
So that was terrible, because this big arrow here is a leak of information, and therefore the model was no longer sensitive to the action I was providing to my predictor, right? This was a big nightmare. So how do you kill that big arrow — how do I get my predictive network to actually care about the action I take, right? Let me show you the problem. Here we have the real sequence of latents, the ones actually computed by minimizing the MSE — yeah, that's the answer, that's correct — together with the real sequence of actions taken by the agent. And here you can see — it's sped up four times — that there is some amount of turning. Then you have random latents but the real sequence of actions, and you can see things are still kind of turning, right? And in the last one you have the real sequence of latents but sampled actions — sampled from other episodes — and you can clearly see that the turning came mostly from the latents, right? So the real latents encode the rotation, while the actions, written with a tilde, are sampled at random. So the problem is: we would like the fact that we were turning to be completely explainable by the action, right? Anyhow, let me show you how we fixed it. — "Sorry, can you explain again? It was unclear, when you said first and last, what you were referring to." — Okay, I'll try again. Here we have four different rectangles, right? In the one on the right-hand side, you have the real sequence of latent variables — the latents that allow me to get the precise, correct future, right?
Those are the latent variables coming from the encoder of the variational autoencoder, together with the real sequence of actions taken by the expert, okay? The two marked in blue have sampled latent variables — the tilde means they are sampled at random — but the real sequence of actions, so I would expect to see the steering there. And the last one, on the left-hand side, has the real sequence of latents but arbitrary actions. If I show you the animation again: there is some amount of steering here, okay, and the steering should come from the actions. But in the other two examples, the same actions no longer produce the same amount of steering now that I've sampled different latents — whereas if I use the exact same latents, all the steering happens anyway, because of the latents. So my decoder is telling me that things were turning because the turning was encoded in the latent variable rather than in the action. I think that was clearer. — "Yeah, that was much better." — See? Repetita iuvant, as they say in Latin: repetition helps. That's why you have to practice, practice, practice. Okay, how do we fix the problem? The problem is that we have — not a memory leak, but an information leak, right? Information leaks from the future. Wow, it's getting late. So how do we fix it? We fix it by simply dropping out this latent: sampling it from the prior distribution at random, okay? We don't always rely on the output of the posterior network, the encoder; sometimes we pick from the prior. This way, the model can't encode the rotation in the latent variable, because sometimes the latent will be missing.
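This latent "dropout" — switching between the encoder's posterior sample and a prior sample during training — can be sketched like this. Everything here is a toy stand-in (the probability, the encoder outputs, the dimensions are all illustrative, not the paper's values):

```python
import numpy as np

# Sketch of the latent dropout trick: with probability p_drop, throw away the
# posterior sample from the encoder and draw from the prior N(0, I) instead,
# so the model cannot always rely on information leaked from the future.
rng = np.random.default_rng(0)
z_dim, p_drop = 16, 0.5

def sample_latent(mu, log_var, rng):
    """Posterior sample via the reparameterization trick, sometimes replaced
    by a prior sample — the switch that blocks the information leak."""
    if rng.random() < p_drop:
        return rng.normal(size=z_dim)            # prior sample: leak blocked
    eps = rng.normal(size=z_dim)
    return mu + np.exp(0.5 * log_var) * eps      # posterior sample from encoder

mu = np.full(z_dim, 2.0)       # pretend encoder output for one training sample
log_var = np.zeros(z_dim)
zs = [sample_latent(mu, log_var, rng) for _ in range(1000)]
means = np.array([z.mean() for z in zs])
frac = float(np.mean(means > 1.0))  # fraction of draws that came from the posterior
print(frac)
```

Roughly half the draws hover near the posterior mean and half near zero, so on average the decoder sees a latent that may or may not carry future information — forcing it to also exploit the action.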
And therefore the network will have to exploit the action you provide. So in this case — the purple ones on the right-hand side — I show you two different sets of latent variables but the real actions, okay? And this network has been trained with the dropout trick. Now you can see that the rotation is actually encoded by the action and no longer by the latent variables, as was the case before. So we fixed this problem. I should speed up a little, because we're running way too late — sorry, I just didn't notice. Okay, how do we train the agent? Well, this is pretty much what I said before. On the right-hand side, we learned the model of the world from the real world. On the left-hand side, we're going to train this agent using the predictive model. So we have the agent, which gets an initial state — the initial condition: position, velocity, and those context images — and gives you a control action, which is the acceleration in the longitudinal direction and the acceleration in the transverse direction. All right, so how does training work? We have a state. We feed the state to the policy, we get an action. Then we feed both of them to the world model, right? What does the model tell you? Well, you need to provide some latent variable — some possible way the future can evolve. Then you get the prediction, cool. You feed the prediction to the loss, where the loss is the cost of the task I'm trying to accomplish: in my case, the sum of the proximity cost — the one telling me how close I am to other vehicles — plus some cost associated with staying in the center of the lane. Cool. Then the next state goes to the policy, I get the next action, I feed both to the world model, you sample the latent variable, you get a new prediction, and you send this to the loss.
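The unrolling just described — state to policy to action, action and latent to world model, prediction to cost, repeat — can be written as a short loop. This is a toy sketch under made-up assumptions: linear policy and world model, quadratic stand-ins for the proximity and lane costs; in the real system the summed cost is backpropagated through the whole unroll into the policy's parameters.

```python
import numpy as np

# Toy sketch of one unrolled training pass through a frozen world model.
rng = np.random.default_rng(0)
state_dim, action_dim, z_dim, T = 4, 2, 16, 30

policy_W = 0.1 * rng.normal(size=(action_dim, state_dim))  # tiny linear "policy"
A = 0.9 * np.eye(state_dim)                                # toy world-model dynamics
B = rng.normal(size=(state_dim, action_dim))               # effect of the action
C = 0.05 * rng.normal(size=(state_dim, z_dim))             # effect of the latent

def policy(state):
    return policy_W @ state

def world_model(state, action, z):
    return A @ state + B @ action + C @ z

def task_cost(state):
    proximity = state[0] ** 2     # stand-in for the proximity cost
    lane = state[1] ** 2          # stand-in for the lane-keeping cost
    return proximity + lane

state = np.ones(state_dim)        # initial condition
total_cost = 0.0
for _ in range(T):
    action = policy(state)                    # state -> policy -> action
    z = rng.normal(size=z_dim)                # latent sampled from the prior
    state = world_model(state, action, z)     # (state, action, z) -> next state
    total_cost += task_cost(state)            # accumulate the task cost

print(float(total_cost))  # one scalar; the real system backpropagates it into the policy
```

Note the world model's parameters stay fixed here — only the policy would receive gradients, matching the "is the world model frozen?" answer later in the session.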
And one more step: policy, world model, latent variable, next state, loss — and finish. You backpropagate to train the policy model, and… it doesn't work. Damn, what happened? We're basically falling outside the manifold: the policy manages to crank up the actions and get predictions that are all black — and all black is good, because zero cost, right? So that's bad. Here we actually went off the road, and here we actually collided with other vehicles. So let's maybe try to imitate other vehicles. How do you do that? Well, you can say the loss is the cost of the task we're trying to accomplish plus some regularizer — an expert regularizer. What is this stuff? Here you also try to get the prediction you would obtain by taking a specific action to be as close as possible to the actual future, and you do that at every step. But in this case you actually have to remove the latent variable: the latent gives you one specific prediction, and it works better if you just work with the average prediction, so here we drop it. Does it work? Actually, yes: imitating the expert works. So this is a kind of imitation learning, but model-based — model-based imitation learning. You use your model of the world to try to imitate others. All right, can we do better? Yes, we can do better, and that's going to be the end of the class. We can add a different kind of manifold attractor. What am I talking about? Forward-model uncertainty. My predictive model outputs a prediction given a state and an action, and then I have my cost here. The point is that there are multiple ways this cost can behave whenever you're outside the training region — the one with the red points.
If you're within the red points — as I showed you in lab number three, when we were doing regression — if you're within the training region, you have near-zero variance across those models. As you move away from the training region, the variance increases, right? Guess what: the variance is differentiable. Let's run gradient descent! So we can run gradient descent on the variance — you minimize the variance using SGD. This is my uncertainty regularizer. So I have my policy feeding its action to the world model, you get the latent variable here, and you get the prediction. You also get the task cost here, which is the lane cost plus the proximity cost. Then, in there, you can either train several models, or you can use the dropout trick — we haven't talked about it, but you can leave dropout on during inference, so that you get multiple predictions — and then you compute the variance of those predictions, right? You multiply it by some scalar, a lambda, and that's my model-uncertainty regularizer. You sum the two and you get the final loss to optimize. And that's it: the whole model, where the final loss is my task cost plus this uncertainty term. The links are going to be on the next slide. Here, in pink, I show you that we actually managed to learn how to drive by minimizing the uncertainty of the predictive model: the actions taken by the policy minimize the uncertainty with which the predictive network makes its predictions, right? What a mouthful, but it works very well. And evaluation — I'll show you very quickly. Here the yellow is the original car, and the blue is the car controlled by our policy, which lost its twin because the original driver took a different path. So our blue guy has to survive this jungle of other cars — and they cannot see us, right?
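Coming back to that uncertainty regularizer for a moment, here is a toy sketch of the dropout version: run the model several times with dropout left on, take the variance across the predictions as an extra differentiable cost, and scale it by a lambda. The model, mask, and numbers below are all illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

# Sketch of the model-uncertainty regularizer via Monte Carlo dropout:
# multiple stochastic forward passes -> variance across predictions -> extra cost.
rng = np.random.default_rng(0)
state_dim, n_samples, lam = 4, 10, 0.5

W = rng.normal(size=(state_dim, state_dim))   # toy world-model weights
state = np.ones(state_dim)

def world_model_with_dropout(state, rng, p=0.5):
    mask = (rng.random(W.shape) > p) / (1 - p)   # dropout kept ON at inference
    return (W * mask) @ state

preds = np.stack([world_model_with_dropout(state, rng) for _ in range(n_samples)])
uncertainty = float(preds.var(axis=0).sum())     # spread across the "ensemble"

task_cost = float((preds.mean(axis=0) ** 2).sum())  # stand-in task cost
loss = task_cost + lam * uncertainty                # final loss to optimize
print(loss > task_cost)
```

Inside the training region the passes would roughly agree and the penalty is small; far outside, they disagree, the variance grows, and gradient descent on this loss pulls the policy's actions back toward the region where the model is trustworthy.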
So in this case we're slightly ahead, slightly behind, sometimes accelerating a lot — and we survive. In the other case as well, we are the blue guy, the yellow one is the original car in the data, and again we manage to diverge from the original trajectory, but we still survive until the end. And the slides, as you were mentioning — okay, this is a summary of the whole thing: model-predictive policy learning with uncertainty regularization for driving in dense traffic. You should understand every word of it now. You have the four main points: the uncertainty regularizer, the latent dropout, the large-scale dataset, and the additional cost for mimicking the expert. And these are the links for everything. That's the title, that's me. Collaborators — well, main authors: we are both first authors, Mikael and myself, and then there's Yann. Slides are here, the article is here, the code is available on GitHub — on my GitHub — and this is the website. We also have a poster. Sorry for running so, so late, but I hope you really enjoyed this small project of ours — there were so many questions, I didn't plan for that. There is another explanation of this project on my YouTube as well; it's 20 minutes long. Maybe I did a better job there than today — or maybe I did a better job today, since I was answering your questions. So, if there are no other questions, we'll see each other next week. And again, if someone could help with the review of the last scribes from the lab, that would be very helpful, because I have no idea how to deal with it otherwise. — "Is the world model frozen when training the agent?" — Yes, as we have seen last week. — "Do we still have time for questions?" — Of course, any time. — "Since the network is kind of a recurrent architecture, are we worried about vanishing gradients?"
"And with such a complex model, how do you address that?" — All right, so it's not that complex: the policy is a two-layer neural net, a very tiny, very simple policy. The world model is a bit larger, but we have batch norm, which keeps helping the gradients flow through, and I think we also use some residual connections. So the gradients didn't give us big issues for training this model. Also, we only unroll 30 time steps into the future — about three seconds — starting from, I think, two seconds in the past, so overall it's about a five-second temporal window for training this system. — "I didn't quite understand how the latent dropout works, and how it helps with — I guess you'd call it — disentangling the action from the latent." — Right, so the problem was this: because we have access to the future, if I make a tiny change to my steering-wheel control, everything in the scene changes — which is a very large change, right? And to minimize that MSE, a lot of information will flow through the latent path, because it's a big, brutal change, you know? The latent variable will happily account for the fact that everything changed, while a_t only changed slightly — it's just a tiny change of the steering wheel. The problem is that if your forward model — the predictive model; it's called a forward model — is no longer looking at the steering wheel, then you can steer around all you want and the network will not care about your control, right? Instead you want a network that, if you steer to the left, tells you everything turns to the right because of the fact that you steered to the left. So you had to disentangle this.
And to fix that, the thing that really worked very well was this: basically, half of the time — or maybe we had some scheduling, I forget — instead of sampling the latent variable from the encoder while training this variational autoencoder, you sample it from the prior distribution. This way, rotation in the future cannot always be explained by the latent variable; therefore there must be a path that connects the action to the future state, right? By switching between the posterior and the prior, you break that arrow for a fraction of the iterations, and the arrow effectively stops existing, because the network cannot count on it to make a prediction — sometimes the information is there, sometimes it isn't. But the network always sees the action, right? Turning left or turning right deterministically determines that the scene rotates one way or the other, hmm? So this is a way to make the network actually look at the action during training. Yeah, this was a big issue — without this trick it doesn't work, because you'd get a network that doesn't care about the action at all. Something else you could do is something adversarial: a network that tells you whether the future changed because of the action or not. — "And you said the prior distribution for the latent variable was just a standard Gaussian?" — Okay, all right, cool, thank you so much. Yann keeps saying that we sampled z equal to zero — that was the previous version, where we didn't even have a variational autoencoder; we had just an encoder that output the mean. We didn't have the sampling module, we didn't have the variance; we just had an encoder giving me this latent. That was the previous version.
No sampling, no variance: just f_enc — you encode this guy. And so sometimes you took the latent from the encoder and sometimes you set it to zero, because we first trained this branch without the latent, right? We first trained this part deterministically — which means: set z to zero. If you set z to zero, there you go, you get back the deterministic model. So basically, whenever you do this dropout trick, you switch back and forth between the deterministic and the stochastic version of the predictive model. Cool, huh? Sounds crazy, I think, but you should be able to digest this stuff, because you're good students — you're my students. — "That's really interesting, thank you." — More questions. Yeah, go ahead. — "I just wanted to check my understanding of latent variable models. It seems that different models use latent variables for different objectives. The autoencoder uses them to learn a low-dimensional representation. The VAE, the variational autoencoder, builds on that to regularize the latent space so that you can randomly sample from it. And then GANs, and the model we just studied, use them to introduce randomness, so that when we have the same input and different outcomes, the network doesn't have to output an average of the predictions. So basically, here we are using the latent to add dimensionality to the input — to introduce randomness into the input." — So in this case, we have this latent variable to account for what cannot be accounted for from the past, right? The point is that not everything is predictable, and if you cannot predict everything, you're going to make some errors. This latent variable allows you to tune your algorithm to zero out that error, and later on you can sample this latent variable in order to get proper predictions.
So without the latent variable you get that mashed, blurry version of the prediction. In order to get crisp predictions, you want to refine your average prediction to the specific case at hand, right? — "Got it. So it is basically adding dimensionality to the input to make things more specific." — I wouldn't say dimensionality to the input. I would say you add latent variables in order to provide the missing information that would be required for you to make a proper prediction. — "Got it. Cool, thank you." — Of course. More questions? I know it's so late — this was an extremely lengthy class, I feel sorry. More questions? I can't see you; hold on — stop, share screen. Okay… oh, 50 people still here. Okay, what do you want to know? — "I also want to check my understanding of the latent variable. So the latent variable tells us something about the future that we cannot predict. But you just said, if we turn left, everything will be to our right — and you don't want the latent to carry this information; you want this information to come only from our action." — Yeah, exactly. Because if you steer to the left, everything turns to the right, right? If you're riding a bicycle and you lean like this, you know exactly how you will evolve. What you don't know is whether you're going to crash into someone — and that's taken care of by the latent variable, which patches your wrong prediction due to unforeseen events. Everything that is predictable should be handled by the deterministic branch, using the past information to determine what happens next; the rest you can't really know. It's like me trying to touch the webcam here, but there is some wind because I opened the window: I aim here and my hand goes there, I aim here and it goes here, right?
So if there is an additional component moving my hand, I have to deal with it for the specific case. Given that I know my finger will move forward, from the past I know exactly where it will go; but if there are additional inputs that are not under my control, I cannot make accurate predictions unless I have some extra knowledge. It's like — going back to the example Yann makes all the time with a pen — let me get the pen. Yeah: if you let it go, it can fall in one direction, or another direction, or a third direction, right? So you can first train a massive network that learns the dynamics of falling — you learn how a pen falls down. The only little piece of information you need, that you can't know in advance, is in which direction it will fall. So you have the big-ass network learning the falling dynamics, and then the minimal amount of missing information — the direction in which this falling trajectory starts — right? So this is how it works: a big network to learn the bulk of the prediction, and a tiny latent variable — again, something like 16 dimensions, a very short vector — providing just the missing information needed to make a sharp prediction. — "Okay, so to prevent the latent variable from carrying information that should be predictable, we impose the regularization term, the KL term, with the help of the prior?" — Ah, okay, yes. The KL term is necessary — otherwise things explode, as we have seen in class, right? Without the KL, these embedding bubbles drift as far apart as possible, because they don't want to overlap; the reconstruction term pushes them apart, while the KL term tries to collapse them back together, right?
So when you introduce this KL, you enforce each bubble to have variance one and all the means to be close to zero, so that you fill that sphere with little bubbles. And later on, given that all these bubbles sit inside the sphere, you can sample from a Gaussian — a normal distribution — of the same size. That's why we use the KL. Also, if you put a lot of strength on the KL, you reduce the amount of information the latent can carry, so you can use the KL as a knob to limit the information coming from the future. That's definitely one way of regularizing the latent variable; the other trick we use is the latent dropout — sometimes sampling entirely from the prior — so that the backbone, the big deterministic network, cannot rely on information that is not always present, right? — "Okay, and all this happens at training time?" — Of the predictive model, yeah. — "Okay, thank you so much." — "Hey." — Yep. — "Regarding that forward-model uncertainty: is that the uncertainty that comes from trying to predict further into the future, or…?" — So the uncertainty there — okay, I went a little fast. The uncertainty comes from the fact that, whenever your policy is controlling the car, it can push the acceleration a lot, or steer completely like crazy; and if you provide actions that are far from the training domain, different models will respond in different ways, right? Because if you train several models on the same training data, and then you evaluate these models outside the training interval — so let's say… I cannot speak with the pen in my mouth.
This is the training interval, right? And then you test way outside, over here, right? Within the training interval, all the models agree, and therefore the variance across models is very tiny; as you go away from the training interval, the variance increases, right? The nice part is that, given that this variance is a scalar value, you can minimize it by moving the action from here to here — because here the variance is minimal and there it is very large. So your actions migrate through action space such that the variance is minimized: the variance becomes your loss, and you do gradient descent in action space for variance minimization. And this action space is actually two-dimensional, right? So you can plot the variance over it, and if the variance goes up here, gradient descent brings you down into the region where the variance is low — where all the different models agree on the future predictions. Therefore those actions come from a safe realm. — "Right, so the actions now come from near the training points rather than from outside them." — From the training region, let's say, yeah. — "Right. Okay, thank you." — Of course. More questions. Ask me anything — AMA, how do you say it? All right, more questions — or are you done, and am I going to cook dinner? — "I have one question. Suppose we change the context of this problem from regression to a classification problem. For example, if you want to predict: are we going to crash or not? Or: is there high traffic or not? Will the effectiveness of latent variables still be present?"
"Would latent variables improve the model even in a classification setting? Because from what I've observed, latent variables always seem to improve performance in generation — when we want to generate something, like images or music synthesis, or in regression-style prediction. But in a classification setting, how effective would they be?" — So, we are actually working on exactly this right now — classification — and maybe you'll hear more about it next week, when we talk about language. This is what another student of Yann's is working on: using latent variable models to perform classification while allowing the model to pick between multiple options. So I guess you'll hear about this soon, not today — or you can ask Yann next time. All right, that's it. Have a good night, enjoy your meal, stay safe, wash your hands — and the pen was not sanitized, but I haven't used it, so good point: don't touch your face, and stay safe. Bye-bye. — How can you get more out of this course overall? Let me give you a few suggestions. First, comprehension: if something is still not clear, just ask in the question section below the video. I will answer every question, so you will get it eventually. If you'd like more news about the field, the educational content I make, and things I find interesting, you can follow me on Twitter — there you have my handle, @alfcnz. If you'd like updates about new videos, don't forget to subscribe to the channel and activate the notification bell. If you liked this video, put a thumbs up — it helps, as does recommending this video to other people. And if you'd like to search the content of this lesson, we have an English transcription connected directly to this video, where every title is clickable.
If you click on a title, you are taken directly to the corresponding location in the video; in the same way, each section of the video has the same title as in the transcription, so you can go back and forth. Maybe English is not your first language — I speak Italian, I speak some Spanish, and I attempted a bit of Korean there, though I have no idea how to actually speak Korean. Well, we have several translations of this material available on the website, and we are also looking for more translations, if you can help. It's really important that you actually try some of the exercises and play around with the notebooks and the source code we provide, in order to internalize and better understand the concepts explained in the lessons. And contribute — this is a real opportunity to make your own contribution. For example, if you find some typos in the write-up, or some bugs in the notebooks, you can fix those and be part of this whole project by sending me a pull request on GitHub, or letting me know otherwise. And that was it. So, see you next time. Bye-bye.