Have you been to Isolde in New York? Love their lasagna. No, I haven't been to Isolde. I usually go to two places. One is called Norma; it's here on 33rd and 3rd Avenue. It's just Sicilian, very authentic, I love it. The other one, which is absolutely the best pizza in the world, is called Sorbillo. It's very close to NYU. The pizza is just like in Naples, and you can't find anything better on this planet. So, you know, you should go to Sorbillo. And if you mention my name, you get a discount, right? So, yeah, I have a special power. Yeah. Chicago pizza is not pizza, it's pie. So if you call it Chicago pie, it's fantastic. It's not pizza. Napoli pizza is the only pizza. Okay. All right.

So, what do we talk about today? Today, we talk about something called the truck backer-upper. What are we trying to do in this lecture? This is the first lecture where you're going to have me basically reading along with you in a paper, okay? Such that we can figure out whether things make sense. This is how we educate ourselves in the research world. You don't necessarily have Yann on your side all the time, so you have to read papers and you have to make sense of what's written there. Usually, I transcribe them back with Typora just to fix the main topics in my mind, right? And to commit them to memory, such that I can check later what's going on.

So let's get started. What is this setup? What are we trying to do here? We are trying to design, by self-learning, a nonlinear controller to control the steering of a trailer truck while backing up to a loading dock from an arbitrary initial position. Only backing up is allowed. So what does this mean? What we are going to be doing is learning how to drive a truck, which is maybe not that complicated if you go forward, but we are trying to park a truck, so we just go backward and only backward.
So if you try to park your car in parallel parking, it's a little bit complicated sometimes, and you can't always get it done in one maneuver. If you have a trailer attached to your back, it's a mess. We'll figure this out very soon. So how does this trailer truck work? We have two items. We have a cab, which is the top right part there. So we have an angle of the cab with respect to the x-axis, and then we have the coordinates x and y of the joint between the cab and the trailer. Then in the part below, we have the trailer, and therefore we have the x and y location of the back of the trailer, and then we have the theta trailer, which tells us what the angle of this trailer is with respect to the x-axis. Our objective is to drive this guy backwards until the back of the trailer basically hits the dock location, which is represented there with the x dock and y dock coordinates. Moreover, we would like the theta trailer, which is the angle of the trailer, to be zero, such that the trailer is orthogonal to the docking station; so the rear part of the trailer should be parallel to the docking station and as close as possible to the location of the dock. So far, makes sense, right? Yeah? Okay.

All right, so let's read a little bit further in this paper. So we have some state variables; there are six, which I think is perhaps not quite correct, but we'll figure this out soon. The state variables are the theta cab, which is the angle of the truck, then x cab and y cab, which is the Cartesian position of the yoke, which is the part behind the cab, and then you have x and y trailer, which is the position of the rear of the trailer, plus the theta trailer, which is the angle of the trailer, okay? So basically it goes as follows.
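To make the bookkeeping concrete, here is a minimal sketch of how those six values could be packed into one vector. The ordering and the example numbers are my own choice for illustration, not the paper's exact layout.

```python
import torch

# Hypothetical packing of the six state variables just listed; ordering
# and example values are my own, not the paper's:
state = torch.tensor([0.0,   # θ_cab: angle of the cab w.r.t. the x-axis
                      10.0,  # x_cab: x of the joint between cab and trailer
                      0.0,   # y_cab: y of the same joint
                      14.0,  # x_trailer: x of the rear of the trailer
                      0.0,   # y_trailer: y of the rear of the trailer
                      0.0])  # θ_trailer: angle of the trailer w.r.t. the x-axis
state_size = state.numel()   # six values per configuration
```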
So the whole procedure is the following. The truck backs up until it hits the dock, and then it stops. The goal is to back the trailer up to be parallel to the loading dock, right? So the goal is to have the backside of the trailer parallel to the docking station, and then to have the x trailer and y trailer, the location of the back of the trailer, as close as possible to the x dock and y dock, okay? Are you with me so far? So the initial position is going to be randomly set. And then the objective is to back up until we hit the dock with the back of the trailer, and we are exactly orthogonal, okay? There are a few difficulties with this, and we're going to figure out right now what these are, all right?

So we go cd work, GitHub, pDL, then conda activate pdl, right? Now we have a Jupyter notebook. So in this case, we are going to be going over this truck backer-upper. Something I really like to do in Jupyter notebooks is to actually use Unicode. Since we are using Python 3, we can use Unicode, right? So to write π there, I just type backslash pi, then I press Tab, and you get π, okay? Or you can do alpha, beta, gamma, okay? This is not coding, this is notebooks, okay? You can do ugly things like this. All right. So I just initialize some libraries. This is our setup. I initialize some object here; we don't care right now. And we start by visualizing this guy here, okay? And I think I have to unzoom a little bit because you can't see. There we go. All right. Cool. Okay, so now we start the interactive part. We try to draw this guy. So let me go, let's say, with a zero angle of the steering wheel. And I keep, you know, executing this one, right? So this guy here keeps going. And then what's happening here after the next step is that my computer is complaining. And there you go: the truck has jackknifed. What does it mean that the truck is jackknifed?
You simply broke your truck because you drove into yourself. That's very interesting: you can drive inside yourself with these trucks, okay? So let me reset the state to something a bit decent. Maybe like, oh, okay. Just someone, volunteers? What is the angle going to be now? You'd like me to steer the wheel; you can give me anything from zero to 45 degrees, I think. Minus 10. So I go minus 10, which is going to be turning to the right 10 degrees. And I go one, two, three, four, five, six, seven, eight, nine, 10 steps, okay? Oof. Yeah. If I keep going, we're going to be jackknifing soon, okay? So what should I do right now? Someone else? Oh, maybe still you. Plus 15. Okay, let's try plus 15. You go one, two, three, four, five, six, seven, eight, nine, 10. Okay. What next? Should I keep going? Yeah. One, two, three, four, five, six, seven, eight, nine, 10. We are going to be going off screen. Maybe alternate plus and minus, then you go straight, right? If I keep going like this, you can see that this truck is going to be going outside, and it's going to be basically game over. So you don't have a driving license. So minus 15, okay. Do you all agree? Should I go with minus 15? I go with minus 15. One, two, three, four, five, six, seven, eight, nine, 10. Okay. No, we cannot drive forward. Plus 45. We messed up. No, no, I don't think you messed up, I think it's still good. Zero. Someone said zero. Let's try with zero. Not yet. Okay. What do you want me to do? Plus 45. Okay, plus 45. Uh-oh. I can keep going, but then we are going to be just, uh-oh, yeah, one more. Minus 45. Okay, let's go minus 45 twice. Then twice more. Then I'm going to go, oh, eight, no, six more, so we go 10. One, two, three, four, five, six. Okay. So if I keep going like this, we are going to be going off the screen, right? So I think you got the picture, right? Is it twice? Twice more? No, that's an older message. Okay. The point here is that this stuff is a bit, you know, tricky.
You can get the hang of it, and maybe the final solution is going to be something where we keep going like this, and then we go in the other direction to rotate, something like plus 10; then these are going to be killing each other here. So plus maybe 25, maybe, let's see, maybe plus 20, see? So we can turn around. And now we should basically undo these things. So if I go like, um, I think plus 40, yeah, there you go. And we can go basically straight, uh-oh, steer something, 20 maybe, okay. And then we can go zero. See, I've been teaching this forever, so you know, I can actually drive this thing, okay. So there you go. Yeah. There you go. See? Oh my God, I'm actually managing to do this. We have to fix it a little bit. Honk. Yeah, honk the horn. I actually had my horn there. So we had to actually, oh, fuck, fuck, fuck, fuck, no, no, oh, okay. Sorry. Okay. Fine. So this is actually considered a success. Okay. Now my computer is complaining because there is some lag; I should have turned a little bit earlier anyhow. The point is that it's not quite straightforward to, you know, figure out how to drive this guy, because it's highly non-linear. Okay. So we have to figure out how to invert a highly non-linear kinematic model, basically, such that, if we have a kinematic model, we know how the vehicle behaves, right, given an action. If you invert the kinematic model, you figure out what action you would like to take in order to, you know, drive to a final destination. But since we don't know how to do this analytically, or maybe we didn't know in the 90s, we can train a neural net, right? And then we can simply figure out whether we can learn how to, you know, drive backwards. Okay. Are you excited?
Are you interested to know how we can do this? Yeah. No? Yes. Okay. Cool. All right. So the overall objective here is going to be that, given an initial position in this map here, I'd like to know what sequence of actions, which is going to be basically what sequence of steering angles for the wheels, I should apply in this six-dimensional space, right? So it's not just 2D: this screen here is the 2D screen, but then you have six values, right? You have the x, y of the cab, the x, y of the trailer, and the two angles. So you have a six-dimensional space. And then in this six-dimensional space, you'd like to infer, sorry, you get a network which is telling you, for this specific configuration of the trailer truck, you should be outputting this specific value, right? So your regression network somehow is going to be outputting a scalar, which goes from minus 45 to plus 45, I think. Yeah, I think that's the range, for every position, right? So our goal is going to be training a neural net that goes from points in this six-dimensional space to one scalar, one scalar at a time, such that the resulting sequence of scalars drives this truck backwards to the final destination. Did I make sense? Does it make sense? Maybe I was confused. Is it okay? All right. All right. So let's figure out how to get all this stuff going; back to the slides. Training. Boom. Okay. Cool. Let's go to the two stages. The first one: we have to train a network, a neural net, to be an emulator of the truck and trailer kinematics. This is how these people have done it in the paper, right?
So that's not the only option, but that's what we are going for. Then a second stage involves training a neural network controller to control the emulator, which is going to be basically doing the task that we were doing right now, when we were, you know, coming up with a value for the steering angle such that we can reach the destination. And, you know, success is determined by two factors: closeness to the docking station and the orientation of the trailer. So this is the overall diagram. You're going to have a neural net, a controller, which was me right now, following your suggestions through the chat, which provides the steering signal at time k, the discrete time interval k. Then we have the trailer truck kinematics, which are the equations that allow me to, you know, tell you what the next configuration of the truck is; again, giving you the next state, given that you provide the previous state and the steering signal. And this delta here is simply a time delay, right? It tells you that this is the next state and this is the previous state. These are like diagrams from electronics, like electrical engineering diagrams. That's what you're going to see, because, you know, there was no computer science back in the day. Right. All right. So the state at time k is fed to the controller, which provides a steering signal k between minus one, in this case, and plus one, for the truck. So they kind of use a tanh at the output. At time index k, each time cycle, the truck backs up by a fixed small distance. So these are the equations for the trailer truck with several trailers. You have the location, the x and y, of the cab.
And then, given that you know the distances, the lengths, both of the cab and of the trailers (d1, d2, and so on), you can estimate what the variation of the angle is, given the input φ. The φ is this negative angle, right? Because it's on the right-hand side. So we have that s is the signed speed, so you can go forward and backward. And then φ is this negative steering angle: the positive angle is being on the right-hand side, right? So it's kind of swapped. And then x and y is the location here of the back of the cab. Notice now that the x and y position of the trailer is actually determined by the x and y position of the cab, the distance d1, and the theta one, okay? So before, I was telling you that we had six values for the state, but as you can tell now, we just need four values, right? You have the x and y, and you have the theta zero, which is the angle of the cab, and then theta one, which is the angle of the first trailer, okay? So you have only four independent values right here. The x and y of the trailer is completely determined given these other values. Nevertheless, a question: "Aren't these velocities?" Yeah, so this is how the position changes, right? So these are velocities. So whenever you have a state, the state in this case is x, y, theta zero, theta one, and you'd like to write the equations of motion, you're going to write that the variation in time, so, yeah, the velocity of x, the velocity of y, the angular velocity of this guy, and the angular velocity of the other guy, are expressed by these equations in continuous time. Then we're going to be discretizing them, since we are using a computer.
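To make this concrete, the kind of continuous-time model being described can be sketched and discretized with a plain Euler step. This is the standard cab-and-trailer kinematic model with the four independent values (x, y, θ0, θ1); the exact form, the lengths, and the time step are my assumptions, not necessarily the paper's equations, which handle several trailers.

```python
import torch

def kinematics_step(state, phi, s=-0.1, L_cab=1.5, L_trailer=4.0, dt=1.0):
    """One Euler step of a standard cab-and-trailer kinematic model.

    state = (x, y, theta0, theta1): the four independent values noted above.
    phi is the steering angle; s < 0 means backing up. The lengths and the
    exact form are assumptions for illustration, not the paper's equations.
    """
    x, y, theta0, theta1 = state
    x      = x + dt * s * torch.cos(theta0)                          # cab position
    y      = y + dt * s * torch.sin(theta0)
    theta0 = theta0 + dt * (s / L_cab) * torch.tan(phi)              # cab heading
    theta1 = theta1 + dt * (s / L_trailer) * torch.sin(theta0 - theta1)  # trailer
    return torch.stack((x, y, theta0, theta1))

state = torch.tensor([10.0, 0.0, 0.0, 0.0])
next_state = kinematics_step(state, phi=torch.tensor(0.0))
```

With zero steering and negative speed, the truck just backs up along the x-axis, as you'd expect.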
But this is, yeah, how you express a dynamic model, right? Like a model that evolves in time. All right. So we have the equations. Let's figure out now how we train these networks. So this is how we train the network for emulating the trailer truck kinematics. On the left-hand side, we provide a random input, which is going to be a random value for the steering angle, and then we provide this random steering angle to the truck dynamics, which is going to tell me, given the previous state, what the next state is going to be. And then, on the bottom part, I'm going to have my first network, which is going to be trained as a regressor, and it's going to be trying to minimize the difference between, you know, the actual next state and the state that is predicted. Then the difference between the prediction and the actual ground truth is used in order to, and they call this "adapt", the term they use in the article is "adapt the weights" of this network, which is basically, you know, training the network. So how do you update the weights of the network? The weights of the network are updated by using this error. So now there's a question for you: what kind of loss function are they using here, given that the signal used to update these weights is actually the difference between the target and the actual prediction? You should be able to tell me now; it's actually in the midterm right now, it's one of the questions. MSE. Why MSE? Because if you compute the derivative of the MSE, you get the difference, right? Well, you get the difference between your prediction and the target, right? And so this difference, which is here swapped, right, is actually used here to change the values. In our course, we perform gradient descent, right?
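You can check this claim about the MSE derivative numerically in PyTorch; with the conventional one-half factor, the gradient with respect to the prediction is exactly the prediction minus the target.

```python
import torch

# Gradient of (half) the squared error w.r.t. the prediction:
pred   = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
target = torch.tensor([0.5, 2.5, 2.0])

loss = 0.5 * ((pred - target) ** 2).sum()  # the 1/2 makes the gradient exact
loss.backward()

# pred.grad now equals pred - target, i.e. [0.5, -0.5, 1.0]
```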
So you have the prediction minus the ground truth, and you use that to compute the partial derivatives, and then you subtract that, right, from the parameters, such that you go in the opposite direction. Here, they actually use the inverted sign, right? This prediction minus the ground truth as, you know, a factor for, you know, adapting, which is, you know, changing these parameters. But basically it's the same thing, right? There's a question: "Why are we training the emulator if we already have a model for the truck movement?" Okay, that's a very good question. Because in this case we have the equations, and therefore we could actually get the partial derivatives through these equations, but you might not have the equations, right? Let's say you have a much more complex system. You may just observe, right, several experts driving this, you know, complex system, and then, through these sequences of, basically, trajectories, you know, sequences of locations, sequences of state variables, you can learn a network that is able to emulate the dynamics, which is the kinematics, sorry, of this system. Okay, that was the answer for Joseph. All right. Cool. So right now, let's see how this keeps going. Let me actually switch to this guy here, right? So let's actually start training this stuff, right? So we get, like, torch, right? And then we're going to be running this for loop at the beginning. And so what happens here is going to be, you know, we provide some random initial steering angle. And then, you know, we use the kinematics model to learn what the connection is between the previous state and the next state. Whenever the truck goes outside, we just kill it. And this keeps going, right?
Until we have enough points such that we can train this network to emulate the trailer truck kinematics. Again, in this case it wouldn't be necessary, because we have the equations. But let's say we don't have the equations, and we just have, you know, observations from the real world. Then you actually have to learn those functions, such that we can differentiate through them, right? We can run gradients through them, okay. So if we keep going like that, my computer is going to crash. So right now you can see that this is taking forever. Why is it taking forever? Because there is this visualization. So in this case, I turn off the visualization such that it doesn't draw things there, and then I can run it 10,000 times, right? Oh, oh, okay. And this happened because I interrupted the thing, right? Okay, how lovely. Okay, boom. And this is quite quick, right? Because it doesn't have the graphical rendering, okay? So the emulator, you're emulating what players, what people playing the game, do. So the emulator, let me show you, the emulator here is going to be trying to learn what the next state is, given the previous state, given that I provide a specific random steering signal here, right? So given a state and given a steering angle, I try to learn what the next state is. And this is coming from the ground truth, which is this box here, okay? This is not a recurrent net, right? This is just a normal net. You have, you know, an input, which is state plus action, and then the output is going to be just the next state. There is no recurrence right now, okay? The signals are always random at every step. The steering signal is random. The state: I collect the whole trajectory while I input a random steering angle. Okay, very long question.
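The collection loop being run here looks roughly like this. The `step` function below is a stand-in (the real kinematics would go there), and the 4-dimensional state and the bounds are my simplifications; the point is the shape of the data: (steering angle + state) in, next state out, with random steering at every step.

```python
import math
import torch

# Stand-in transition function: any state-transition (or the real truck
# kinematics) would do; state here is a simplified (x, y, θ0, θ1).
def step(state, phi):
    return state + 0.1 * torch.cat((torch.cos(state[2:3]),
                                    torch.sin(state[2:3]),
                                    phi.view(1), 0.5 * phi.view(1)))

inputs, outputs = [], []
state = torch.zeros(4)
for _ in range(1000):
    phi = (torch.rand(1) - 0.5) * math.pi / 2   # random steering, roughly ±45°
    next_state = step(state, phi)
    inputs.append(torch.cat((phi, state)))      # network input: angle + state
    outputs.append(next_state)                  # regression target: next state
    state = next_state
    if state[:2].abs().max() > 50:              # "truck went outside": reset
        state = torch.zeros(4)

X = torch.stack(inputs)    # shape: (1000, 5)
Y = torch.stack(outputs)   # shape: (1000, 4)
```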
"Does it matter that, okay, it's too long. Does it matter that the data you're collecting is a bit different from what a human would do? The data you're collecting changes direction rapidly, frequently." So there is no time here, right? We haven't talked about training the controller yet. Here, this is going to be just a normal neural net that, given an input state, maps it to the output state, given a specific angle for the wheels. There is no time, right? So you have one state, you have one angle, that's the output; no time right now. Okay. Okay, so we go here. These are my inputs and the outputs that I collected over these trajectories. So if you check here, I have the initial state, I get the state from the truck, then I have a φ, which is my angle, which is going to be something random between minus 0.5 and plus 0.5, scaled, so from minus 45 to plus 45 degrees. Then I have my input, which is going to be the state plus the angle. And then the output is going to be the output of the trailer truck, which has stepped with angle φ. All right, so we have this thing here. I can split, I can create my network, and then I can start training my network, right? And this one would keep going forever. But basically here we are trying to get a neural net, which is simply a couple of layers: linear, ReLU, linear. We start from the steering size, which is one, plus the state size, which was six, right? The x and y and angle for both the cab and the trailer. Then you have a number of hidden units, which is going to be described now in the paper. Then you have a ReLU, and then you have the final linear output, okay? MSE and SGD. All right, let's go back to the presentation. So figure three, which is this one, shows how to train the emulator.
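In PyTorch, the emulator just described is roughly the following sketch. The 45 hidden units come from the paper; the ReLU is what the notebook uses, where the original paper squashes with a tanh instead.

```python
import torch
from torch import nn

state_size, steering_size, hidden = 6, 1, 45  # 45 hidden units, as in the paper

# Emulator: (steering angle + current state) -> predicted next state.
emulator = nn.Sequential(
    nn.Linear(steering_size + state_size, hidden),
    nn.ReLU(),                       # the paper uses a tanh squashing here
    nn.Linear(hidden, state_size),   # linear output: this is a regression
)

criterion = nn.MSELoss()
optimiser = torch.optim.SGD(emulator.parameters(), lr=0.005)
```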
So the truck backs up randomly, and the emulator learns to generate the next position state vector, given the present state vector and the steering signal. And you can hear now my computer fan spinning, because it's training, okay? Sorry for the bad audio. All right, so this is how we train this system; now, how we train the controller. So this is the state transition flow diagram. C represents the controller, so it's another neural net, whereas T represents the truck and trailer emulator: so either the actual truck kinematics or the trailer emulator, either one. So how is this thing working? How is time now added to the system? So we start with this controller, C. This controller C, we said, provides what? The steering angle to the truck or the truck emulator. So the controller provides an angle, given that it is fed with the initial state, right? Which we can call h of the k minus one time step. And if we provide this previous state to the controller and the truck, we basically get the next state. So you can think about these two guys as just one element of a recurrent network, a recurrent network which has no input. Or, well, it could have an input, but the input is not connected, right? So this is like one of the multiple cells we usually see in a recurrent net, but there is a different diagram inside. So how do we train this item? Well, you already know the answer, right? So you get this one down here. We don't have any input on the bottom, right? And then you stack another one, you stack another one, and you keep going until you actually start from the initial location of the truck, which is the initial hidden state, or which is basically the initial truck location and configuration, right? And then, on the right-hand side, you keep going, right? Until the end, where you reach the final point, and the final condition can be one of the following.
Running out of steps, jackknifing, or, the third one, you hit one of the edges; and then I check what the distance is between your back and the docking station, and I check what your angle is, right? The orientation of the trailer, which should have been horizontal, right? Okay, so those are the answers to the question: what are the terminal conditions? There are three terminal conditions: jackknifing, running out of steps, or hitting an edge. And whenever you stop, you want to actually minimize, right, the distance between you and the docking station, and you want to try to minimize whatever angle you get, right, for the trailer. Cool. So figure five: training the controller with backpropagation. As you saw before in the previous diagram, where we were training the emulator, we have something similar with this feedback loop. The final state, so after you complete this whole trajectory, the final state will be ending in some specific configuration. I enforce this final state to be as close as possible to my target state, which means you want to have the x trailer as close as possible to the x dock, the y trailer as close as possible to the y dock, and the final angle of the trailer horizontal, so equal to zero. Then you take the difference and you send the difference back, basically to adapt these weights. But of course it's not just one line; the whole thing goes through, so, the usual chain rule, right? So this actually travels through all the previous T's and C's, right? It goes inside the modules. The visualization shows that only the C blocks, the controller blocks, are updated, also proportionally to the final error, which implies an MSE loss. So again, the thing that we came up with before. Cool. So the initial position is set at random. The truck backs up until it stops. The final error is used for backpropagation. And this is backpropagation through time with a variable unrolling period K, right?
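A minimal sketch of that controller-training loop could look like this, with the paper's layer sizes but otherwise my own simplifications: a fixed unrolling length instead of the variable one, a dock at the origin, and an assumed state layout with the trailer coordinates in the last three slots.

```python
import torch
from torch import nn

# Controller (6-25-1, tanh output in (-1, 1)) and emulator (7-45-6), with
# sizes from the paper; everything else here is a simplification.
controller = nn.Sequential(nn.Linear(6, 25), nn.Tanh(),
                           nn.Linear(25, 1), nn.Tanh())
emulator   = nn.Sequential(nn.Linear(7, 45), nn.Tanh(), nn.Linear(45, 6))
for p in emulator.parameters():   # the emulator was trained beforehand
    p.requires_grad_(False)       # and stays fixed from now on

optimiser = torch.optim.SGD(controller.parameters(), lr=0.01)
target = torch.zeros(1, 6)        # dock at the origin, trailer horizontal

state = torch.randn(1, 6)         # random initial configuration
optimiser.zero_grad()
for k in range(20):               # stand-in for the variable unrolling period K
    phi = controller(state)       # steering signal for this time step
    state = emulator(torch.cat((phi, state), dim=1))
# Final-state error only, on the trailer coordinates (assumed last three):
loss = nn.functional.mse_loss(state[:, 3:], target[:, 3:])
loss.backward()                   # BPTT: gradients flow through all 20 steps
optimiser.step()
```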
So the weight changes in C, the controller, are taken as the sum of the tentative changes. What is the meaning of this sentence? Well, now you can maybe appreciate why PyTorch accumulates the gradients every time. Do you see the point? So in this case, you get this gradient coming back from the future to the past. And then the meaningful thing is going to be accumulating all these gradients, right? Because we have reused the same module multiple times; when you go and perform backpropagation, this backpropagation will sum things up as many times as we went through this module, right? That's why PyTorch accumulates gradients whenever you compute backprop. If it were just computing them and replacing whatever was there before, then you would just have the gradients of the oldest iteration, right? Or the one furthest back in the past, right? Instead, in this case, what is written here, which says that the old tentative changes are summed together, means you accumulate the gradient over time. Okay, cool. Finally: repeat with another initial position, back up until it stops, right, done. So, the network details. This is the architecture; diagrams of network architectures back in the 90s. You start with six values for the state: x and y and theta, x and y and theta, for the two items. Then you go through these items here, which are called potentiometers, which are basically variable resistors, which are again representing these variable weights. Variable weights as in weights that can be tuned, okay? So these are tunable weights, so weights in a neural net, okay? But again, this is the symbol for a tunable weight, a tunable resistor. Then you have 25 of these guys. So from six, we go to 25. And then from 25, we go to one, which is the steering signal. So: first affine transformation, squashing; second affine transformation, squashing. And then this guy is going to be the emulator. So you go from seven, which is the six of the state plus the steering angle, to 45.
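Here is a tiny demonstration of the point about accumulation: applying the same layer twice, as in an unrolled recurrent net, and calling backward() sums the contributions of both uses, whereas replacing gradients each time would keep only one of them.

```python
import torch
from torch import nn

layer = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    layer.weight.fill_(3.0)   # fix the weight so the numbers are exact
x = torch.ones(1, 1)

y = layer(layer(x))           # the same module applied twice: y = w·(w·x)
y.sum().backward()
accumulated = layer.weight.grad.clone()   # d(w²x)/dw = 2wx = 6: both uses summed

layer.weight.grad = None      # reset, then use the layer only once
layer(x).sum().backward()
single = layer.weight.grad.clone()        # d(wx)/dw = x = 1: one use only
```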
So you have an affine transformation and squashing, and then from these final 45 you have an affine transformation to six, right? Which is the final, you know, next state. Cool. So that's it, right? So this is analogous to a neural net having a number of affine transformations equal to four times the number of backing-up steps. So for every trajectory, we train a net which has a length that depends on how many time steps this specific truck took in order to reach one stopping, how do you call it? What was the word? Condition, right, stopping condition. All right, so the number of steps varies with the initial position of the truck and the trailer, right? So you train the core item of this network, which has several steps, so several layers. The length of this network changes based on each specific episode. And then we train this stuff with backpropagation, and each module is the same module replicated multiple times. Therefore, the gradients have to be summed as you go through, right? That's what PyTorch does. Makes sense? Cool. So these are a few examples. You start with this initial position, and then, after a few steps, the network manages to reach the destination. Another one is here. They started with the trailer pointing away from the dock, and you can see here how it managed to circle around and get back to the correct location. And the final state, again, is horizontal, right? It looks almost horizontal. Or this one: you start orthogonal, in a jackknifed position, right? And that's a really nasty one. Nevertheless, the network can figure out how to solve this fucked-up configuration. And then this one is also really annoying, but the network doesn't complain. Be like a neural network. No, just kidding. Okay, additional resources: a full working demo is offered here at this link. I'm going to be showing it to you because in the notebook the code stops there.
So here I just show you how we train the emulator, and we got the MSE loss down to a very, very tiny value, like 0.000-whatever. Okay, you can read this number, right? And then this is my value for the testing set, right? So this is going to be 38 micro-MSE, if you want to call it that. Here, if someone wants an extra grade, maybe, you can add the code for training the controller. And it's not that trivial, because, I mean, we covered how to make it work in the class, but it's a bit finicky. So I just show you a final version. You can even copy from here if you want. And I'd really like it if you could submit it; that would be nice. I mean, you can get, you know, a nice reward. So here we can start with a random position, and then we can drive using the controller. Let's increase the speed. Okay, and boom, another random position. Okay, it's too easy. Oh, let's make something a bit, you know, annoying. So, trailer: let's go with this one, let's say 180 degrees, and let's have this one at zero. Change angle. Okay, so this is really annoying. Go. See, back, and there we go, boom, okay. So here you have the code; it's training in your browser. You can also go to the bottom of this page, and you have the source on GitHub. So you can, or are even, you know, encouraged to, port this project here, which is written in JavaScript, to PyTorch. It's going to be a very good exercise for you to learn. We would then actually have a running notebook, which is, you know, left as an exercise for you, and you would get an additional grade, I guess. All right, so that was it for today's class. I hope you enjoyed it. Let me see if there are questions. "What if we train a policy gradient?" Yeah, we don't know about reinforcement learning. I don't know anything about reinforcement learning, sorry. "I'm still a bit confused about the architecture. Can we look at the diagram of that again?" Yeah, of course. Here: so basically the controller goes from six to 25, and then from 25 to one.
And they use some kind of tanh, right, as your activation. So six to 25, 25 to one, and this is the controller. Instead, for the trailer truck emulator, we go from seven, which is six, right, the x and y and the angle, times two, plus one: seven. So from seven to 45, and then from 45 to six. So this is my predictive model, the model that predicts the future given the past and the current input action.

And that's how I implemented this one here. So I have my state size, which is six, x and y and theta, times two, and the steering size is one, just the angle. And we say we have 45 units, right? So we have 45 hidden units here. So we go from steering size plus state size, which is seven, to this 45. Then I have a ReLU; again, it looks like they use a tanh here, so okay, you can write tanh here. And then you have a linear layer that goes from this 45 to the state size, which is six, which is going to be the next state output here, right? So there is a linear output. And again, this is also something that we just put in the midterm, right? I mean, I'm giving you the answer of the midterm right now, basically.

So, where are you enforcing that the emulator actually emulates the way the simulation works? The way the simulation works, yeah, here. So, emulator training. Yeah, I didn't show you this one, you're right. So here I just get the length of the training inputs and I get a random permutation, so that I pick at random. So my phi, my angle, and the state, which together are the seven coordinates, are going to be picked from the i-th location of these training inputs. And these training inputs are coming from down here, where I extracted them, right? So you have training inputs. Inputs are going to be appended to this list, right? The input list. You append the phi, which is the steering angle, and then the initial state, which comes from the truck.
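To pin the architecture down, here's a sketch of the two networks in PyTorch, following the sizes just discussed (six state variables, one steering signal, 25 and 45 hidden units, tanh activations, linear output for the emulator); the exact module layout is my reading of the slide, not code copied from the notebook.

```python
import torch
import torch.nn as nn

state_size, steering_size = 6, 1

# Controller: state (6) -> 25 hidden -> steering angle (1), tanh throughout
controller = nn.Sequential(
    nn.Linear(state_size, 25), nn.Tanh(),
    nn.Linear(25, steering_size), nn.Tanh(),   # squashes the angle into (-1, 1)
)

# Emulator: state + steering (7) -> 45 hidden (tanh) -> next state (6), linear output
emulator = nn.Sequential(
    nn.Linear(state_size + steering_size, 45), nn.Tanh(),
    nn.Linear(45, state_size),
)

s = torch.randn(4, state_size)                  # a batch of states
phi = controller(s)                             # predicted steering angles
s_next = emulator(torch.cat([s, phi], dim=1))   # predicted next states
print(phi.shape, s_next.shape)
```

Note the asymmetry: the controller ends in a tanh because a steering angle is bounded, while the emulator ends in a plain linear layer because it's doing regression on unbounded coordinates.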
So this is my input, and then the output is going to be the next state of the truck, the output state. So if you go back here, the phi, the angle, and the state are going to be the i-th item of these training inputs. And then the next-state prediction is going to be the output of this emulator, which has a linear layer as output, right? Cool. Then you have the true next state, which is going to be the output at the i-th location. And then you have the loss, which is the criterion, and the criterion was an MSE. So you have the MSE between the next-state prediction, which is this one, and the true next state. Then I do basically stochastic gradient descent, right? I clean up the gradients, and so on.

I see, so you're basically training the emulator before you're training the rest of the net? Oh, I train the emulator only beforehand. So we train two models. First, you train the emulator. Then, whenever the emulator is trained, you train the controller, which is driving the emulator, but the emulator is no longer being trained. The emulator is trained only once, before. Then we use that network in order to train the controller.

So you're first training the emulator, and then you're using the network you trained as the emulator to train the controller? Correct, yeah. So whenever I'm here, I've already trained the trailer emulator, right? So, to train the emulator, as I showed you in the code just now, and I can show you also the slide: this was the training of the emulator. The kinematics give you the next state, given the previous state and the steering angle. And then you enforce this emulator to basically do a regression, and you try to copy that output. After you're done, when you're done, you actually go and train the controller.
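The loop I'm describing looks roughly like this in PyTorch. This is a sketch, not the notebook's exact code: the random tensors stand in for the recorded (phi, state) → next-state pairs, and the learning rate is a made-up value.

```python
import torch
import torch.nn as nn

# Stand-ins for the recorded data: each input is (phi, state) = 7 numbers,
# each output is the true next state = 6 numbers. The real pairs come from
# driving the kinematic truck model with random steering signals.
train_inputs  = torch.randn(100, 7)
train_outputs = torch.randn(100, 6)

emulator  = nn.Sequential(nn.Linear(7, 45), nn.Tanh(), nn.Linear(45, 6))
criterion = nn.MSELoss()
optimiser = torch.optim.SGD(emulator.parameters(), lr=0.005)  # lr is a guess

for i in torch.randperm(len(train_inputs)):   # visit the samples in random order
    pred = emulator(train_inputs[i])          # next-state prediction
    loss = criterion(pred, train_outputs[i])  # MSE against the true next state
    optimiser.zero_grad()                     # clean up the gradients
    loss.backward()
    optimiser.step()                          # one stochastic gradient step
```

This is plain supervised regression: one sample per step, random visiting order, MSE loss, nothing fancy.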
And to train the controller, you basically have this chain of blocks, controller and trailer truck network, controller and truck network, and so on. And then you get basically a trajectory. And then you enforce that the final location and the final orientation of this trajectory must be the docking station and zero for the angle. And then you run backpropagation through this chain of things. So it's backprop through time, in order to adapt (they use the word "adapt" in this paper) the weights of the controller, such that it manages to map the random initial position to this final target position and orientation. Let's call it a configuration. So we start with a random initial configuration, corresponding to a position and angles, and then you enforce that the final output of this sequence of modules gives you the docking station and zero for the angle at the back of the truck.

So the training is in two parts. First you train the emulator; when you finish that, you train the controller in order to actually reach a specific goal. And the number of red boxes here is variable, and it depends on every episode, because you don't know how many steps are going to be required to reach the destination from any given initial condition. So many words I said, oh my God. Does it make sense? Yes. Okay, very good.

Next question. So someone did ask why we needed to train the emulator instead of just using the model physics. Yeah, the answer is that you don't always actually have the equations of the kinematics, right? So let's say, and this is going to come up next lesson, I think, whenever I'm going to be talking about my own research, I'd like to figure out what the behavior of other cars is when you drive on a highway. But I can't control other cars. I have no idea what the behavior of other cars is, right? So the only way I can learn how other cars react to my actions is actually through observations.
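In PyTorch, that second phase might look like this sketch. The target, the index layout of the state vector, the learning rate, and the fixed K are all my assumptions; a real version would unroll until a stopping condition is hit rather than for a fixed number of steps.

```python
import torch
import torch.nn as nn

controller = nn.Sequential(nn.Linear(6, 25), nn.Tanh(), nn.Linear(25, 1), nn.Tanh())
emulator   = nn.Sequential(nn.Linear(7, 45), nn.Tanh(), nn.Linear(45, 6))
for p in emulator.parameters():       # phase two: the emulator is frozen
    p.requires_grad_(False)

optimiser = torch.optim.SGD(controller.parameters(), lr=0.01)

state = torch.randn(6)                # random initial configuration
K = 20                                # stands in for "unroll until stopping"
for _ in range(K):
    phi = controller(state)                       # steering from current state
    state = emulator(torch.cat([state, phi]))     # predicted next state

# Penalise only the final trailer position and angle. The indices are an
# assumption: say state[3:5] is x, y of the trailer rear and state[5] its angle.
target = torch.zeros(3)               # dock at the origin, trailer angle zero
loss = ((state[3:6] - target) ** 2).sum()
optimiser.zero_grad()
loss.backward()                       # backprop through time, through all K blocks
optimiser.step()                      # only the controller's weights move
```

The single loss at the very end is enough: backpropagation carries it through all K replicated blocks back to the controller's shared weights.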
And therefore we have some cameras mounted on a 30-story building, you know, looking at the highway section. And then you basically figure out what the interaction between the vehicles is. And therefore we had to learn this forward model. It's called a predictive model, and it figures out what the reaction of other vehicles is going to be, given your own actions. So in this case here, yeah, you don't strictly need the emulator, but they train these emulators such that the method is more generic: you don't need to have differentiable equations, okay? I see, okay.

One more question: when you said the length of this recurrent network will be variable, there was something about it being four times the number of steps or something like that, could you go into that a bit more? So here we assume that the length of each episode is capital K, okay? Lowercase k is the actual index, where you go zero, one, two, three, four, five, whatever. So each episode has, you know, whatever, capital K number of steps, right? And capital K is going to be different for every episode. Okay, so each of these two items here: this guy has two affine transformations, and this guy here has two affine transformations. So overall, the network that you are seeing here has four affine transformations times capital K. Oh, I see, okay.

So that's basically one neural net, right? It's not even a recurrent neural net; it's just a feed-forward neural net with four times capital K affine transformations, and then you train it with backprop in order to always enforce the same target, right? So that's actually funny to see this way. You have a neural network whose label, or target, is always the same, but the network can change length and the input can be different, right?
So you have a network, you input different things, you can change the length, the depth of the network, but you always have the same target here, right? So usually, when you do regression or classification, you have a fixed network, you input different inputs, like here, and then you have different targets here, right? Targets, or classes if you're doing classification, or, yeah, labels, right? In this case, you have different inputs, the target is always fixed, and you have different lengths. So you train a variable-depth network with variable inputs and a fixed target. How cool is that, huh? I love it.

Do we enforce, like, a maximum length? Yeah, yeah, of course, of course. Otherwise it could go on forever.

Do we need an emulator to train the neural net emulator? Is the point of having a neural net emulator to have a differentiable emulator? Yeah, that's the point, right? The whole point is to have a differentiable emulator, which is not always available otherwise, okay? In the case where you just had observations, you don't have gradients; you just have observations, and you want to learn a network that can tell you what the Jacobian is going to be as you go back. More questions?

I have a question more related to implementation. It seems like, for training the controller, we freeze the weights of the truck emulator, but we still need to pass the gradients through in order to update each of the cells, is that right? So I guess, when we define our optimizer, we're essentially just feeding it the controller parameters, but when we call backward, it's all still connected to the same network, or I guess graph? Yeah, so this is going to be exactly like the PyTorch examples we have seen last week, when we were going through the DCGAN. So you're going to have an optimizer for the one network, and an optimizer for the other network. That's it.
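Concretely, in PyTorch this is just two optimizers over two parameter sets. A backward call still sends gradients through the frozen network's computation graph, but step() only moves the parameters that particular optimizer owns. A minimal sketch of the idea, with the same sizes as before and random values:

```python
import torch
import torch.nn as nn

emulator   = nn.Sequential(nn.Linear(7, 45), nn.Tanh(), nn.Linear(45, 6))
controller = nn.Sequential(nn.Linear(6, 25), nn.Tanh(), nn.Linear(25, 1), nn.Tanh())

# One optimizer per network; they never step in the same phase
opt_emulator   = torch.optim.SGD(emulator.parameters(), lr=0.01)
opt_controller = torch.optim.SGD(controller.parameters(), lr=0.01)

state = torch.randn(6)
phi = controller(state)
loss = emulator(torch.cat([state, phi])).pow(2).sum()  # a toy loss

before = emulator[0].weight.clone()
opt_controller.zero_grad()
loss.backward()            # gradients flow *through* the emulator's graph...
opt_controller.step()      # ...but only the controller's weights are updated
print(torch.equal(emulator[0].weight, before))  # True: emulator untouched
```

So "freezing" here doesn't mean the emulator is disconnected from the graph; it only means no optimizer ever steps its parameters during the controller phase.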
Whenever you do the training, you actually have two distinct trainings. When you train a generative adversarial network, you have both networks stepping in the same loop; here, instead, you have one training first: you step with the first network, then later, below, you have the other network stepping. So you have two optimizers, which are adapting (to use the word from this paper) the weights at different times, right? It's actually two different networks, which don't even train at the same time. So you first train the predictive model, the forward model; finish. Whenever you finish that, you use it as a means to send gradients back from the future.

How do we choose capital K? Capital K is the number of iterations that are necessary for you to reach an edge. You just go backward until you hit something. Whenever you hit something, that's capital K. So actually, when we run the forward pass, we use the kinematic model, such that you have a truthful final location. But then, in order to run the gradients backwards, you're going to be using the emulator network. So you use the kinematic model to do the forward pass, and then you do the backward pass using the emulator network.

I think this can be done better with policy gradient and a deep RL model? Okay, we don't know about RL; RL for us doesn't make sense. I'm just repeating what the boss says. You can try, right? You can definitely try it out and see whether it works. Again, this stuff was done in the 90s, right? So, yeah, it's old stuff, but it's still, you know, I would call it pertinent. There you go. More questions?

Hi, Alfredo. Sorry, I'm still confused about the training for the controller and the emulator. So the emulator is trained at each time step, because we are comparing the predicted state versus the truth, this one, right? Yeah, at each time step. So we are updating the parameters of the network at every time step?
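One way to realize "kinematics forward, emulator backward" in PyTorch is a straight-through-style construction. This is my own sketch of the idea, not code from the paper or the notebook, and the kinematics function here is a made-up stand-in for the real truck equations:

```python
import torch
import torch.nn as nn

def kinematics(state, phi):
    # Stand-in for the true truck dynamics, treated as a black box
    with torch.no_grad():
        return state + 0.1 * phi

emulator = nn.Linear(7, 6)   # the trained emulator (untrained here, for brevity)

def step(state, phi):
    pred = emulator(torch.cat([state, phi]))
    true = kinematics(state, phi)
    # Forward value equals the truthful kinematics; the backward pass
    # sees only `pred`, so gradients go through the emulator.
    return pred + (true - pred).detach()

state = torch.zeros(6)
phi = torch.zeros(1, requires_grad=True)
out = step(state, phi)
out.sum().backward()
print(torch.allclose(out, kinematics(state, phi)))  # True: value matches kinematics
```

The `pred + (true - pred).detach()` trick gives you the exact simulated state in the forward direction while autograd differentiates through the emulator, which is the differentiable stand-in.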
Oh, well, you can also do a batch if you want, but stochastic gradient is okay. So you can provide a new sample and update, new sample, update, and so on, right? Or you can actually, you know, step after several time steps. Okay, okay. And when we hit the big K, we just do that? This is not about the emulator, right? There is no capital K here. Oh, I'm confused, because it's like we do the training for the emulator first, and then, when we hit the terminal state, we do backprop for the controller?

So: first you train this network; finish, done. Second step, you train the controller, the next network, right? So you train the first network, this one, with random numbers here, with a random steering signal. There is no controller here, right? Yeah. Okay, cool. So you first train this emulator such that it can replicate the dynamics. Okay. Then the second part is training the controller, and the controller is trained in this manner: you provide an initial condition, you feed forward this initial condition, and you basically end up with a trajectory of several steps, capital K of them, until you reach one of the three final conditions, which are, you know, jackknifing, running out of steps, or actually hitting the wall. Once you stop, you run backpropagation through this network, and then you do gradient descent, right? And now you're doing gradient descent with respect to the controller's parameters only. Yes, yes. Okay, okay, yeah, I got it, thank you.

Okay, that's awesome. That's it, I think. And so you made it until the end of the lesson. Congratulations. All right, so how can you get a better understanding of what we covered today? There are a few things you can do to keep going, right? Comprehension: you have some questions, ask them in the comment section below; we'll answer everyone. News: if you follow me on Twitter, under AlfCNZ, you can get the latest information as I post it online.
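Those three stopping conditions can be sketched as one simple check. Everything here is a hypothetical choice of mine (the state layout, the jackknife threshold, the wall sitting at x = 0), just to make the episode-termination logic concrete:

```python
import math

def stopped(state, step, max_steps=200):
    """End the episode on jackknifing, running out of steps, or hitting the wall.
    State layout and thresholds are illustrative assumptions, not the paper's."""
    theta_cab, x_cab, y_cab, x_trailer, y_trailer, theta_trailer = state
    jackknifed   = abs(theta_cab - theta_trailer) > math.pi / 2  # cab folded in
    out_of_steps = step >= max_steps
    hit_wall     = x_trailer <= 0.0    # dock assumed to sit on the x = 0 wall
    return jackknifed or out_of_steps or hit_wall

print(stopped((0, 5, 5, -0.5, 2, 0), step=10))  # True: trailer rear past the wall
print(stopped((0, 5, 5, 3.0, 2, 0), step=10))   # False: keep backing up
```

Whatever step index this function first returns True at is that episode's capital K.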
Update: if you subscribe to the channel and turn on the notification bell, you will not miss any of the videos I post here. Appreciation: again, if you like this video and the content and everything I do, I would really appreciate it if you put a like on the video. Searching: every video has an English transcription, which is searchable, and you can find it in the video description below. Language: you speak Italian, you speak Spanish, you speak Korean; Turkish we have now. All these translations are now available for you. I'm planning to add more languages as we get volunteers. PyTorch: if you'd like to get a concrete understanding and digest this topic better, I highly recommend that you try to train the controller yourself. And if it works, you can submit a pull request, and that's actually the next point: then we're going to have a notebook with a running controller. So this, again, is left to you as an exercise, such that you can actually master this topic, okay? All right, thank you again for listening. I'll see you next time. Bye-bye.