Good morning, and welcome to the second-to-last practicum of the course. I hope you've been enjoying it so far; it's been quite an interesting semester: all remote, all digital, and new material introduced halfway through. I don't know why — because I'm crazy. So, what will we talk about today? Let's start with a small recap of last week. Last time we started talking about planning and control as a three-part story, and last week was part one: model predictive control. There, we were backpropagating through the kinematic equations of the system. We had the state-update equations, and then a minimization with respect to the latent variables — minimization of what? Of a cost we defined last time, which was initially the squared Euclidean distance to the target. So we effectively had a recurrent neural network with a latent input, which is our control in this case, and an initial state, which was the initial configuration of the system. We evolved the trajectory, and then we required the final state of that trajectory to be close to the target destination — we attached a quadratic "spring" term to the final location. That was last time. Today, instead, we're going to talk about the truck backer-upper. What's that? It's an article from the 90s, as we'll see very soon — a fundamental article that's relevant for understanding what everyone is doing these days. So today we're going to learn an emulator of the kinematics from observations. Why is this necessary? Because oftentimes you might not have access to the law of motion of the system at hand. You'd need to be a physicist or a mechanic, or have pretty good prior knowledge, to come up with the equations we wrote down last time.
Remember the cosine of θ, the sine of θ, multiplied by the speed, and so on. Here, instead, we can just observe whatever vehicle moving: given observations, we can infer the function that governs the state update of the system. Still no fancy machine learning; we're going to play with the easier version today, and next week even more complicated things. Finally, today we're also going to train a policy — which, so far, no one has made work in the notebook. So if you make it work, you're going to get a big extra bonus, for the class or for whatever, and we're going to be very happy, because it means you spent some time, you got to understand it, and you developed some skills. Anyway: we're going to train a policy, or controller, which is nothing but a network that comes up with the optimal actions — the minimizers of the cost we defined before. This is called amortized inference. We've seen it before: when training an autoencoder, or even earlier with target prop, we were first doing a minimization in the latent space; then we had the target, and we tried to predict that target through the encoder. That was target prop. Later on we removed that two-step procedure entirely and just trained the autoencoder end to end, automatically learning the encoder that eventually gives you the optimal code. Then there is a third part, next week: planning under uncertainty. That adds stochasticity to the environment, and the minimization done by the controller has to account for the uncertainty of the forward model. The final part also introduces some latent decoupling. Again, this was just a small recap. Oh, one more thing.
Let's also look at one extra slide from last time, so that everyone is on the same page: the state transition equations. Those are the equations that let us define and see how the state evolves over time. We said we had this pink bold x, which represents the state, and then an orange bold u. Since it's orange, you already know it's a latent variable; in this case we call it control, but it's the same thing — instead of z we have u, instead of "latent" we say "control". Two names for the same thing. Fortunately, we kept the same letter x for the state. In the RL literature, unfortunately, they change letters: x becomes s for state, or o for observation, and u is called a for action. But we don't like RL, so we use proper notation. Jokes aside: we also defined ẋ, which means the temporal derivative of the state x, and both the pink bold x and the orange bold u are functions of time — continuous functions of time. So we had a differential equation, and then we introduced a discretization, which gives us the corresponding difference equation. Anyway, moving on, we start today's lesson: the truck backer-upper, Nguyen and Widrow, 1990. What is the setup? We're going to try to design, by self-learning, a nonlinear controller — meaning a neural net — to control the steering of a trailer truck while backing up to a loading dock from an arbitrary initial position. Only backing up is allowed.
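As a reminder of how the differential equation becomes a difference equation, here is a minimal Euler-discretization sketch; the toy dynamics function `f` and the step size are illustrative choices of mine, not the truck's actual equations:

```python
import math

def f(x, u):
    """Toy continuous dynamics x-dot = f(x, u): a point moving at unit
    speed in the direction given by the control angle u (illustrative only)."""
    return (math.cos(u), math.sin(u))

def euler_step(x, u, dt):
    """Discretization: x[k+1] = x[k] + f(x[k], u[k]) * dt."""
    dx, dy = f(x, u)
    return (x[0] + dx * dt, x[1] + dy * dt)

x = (0.0, 0.0)
for _ in range(10):          # unroll the difference equation for 10 steps
    x = euler_step(x, 0.0, dt=0.1)
print(x)  # moved ~1.0 along the horizontal axis
```

The same pattern — replace ẋ with (x[k+1] − x[k]) / Δt and solve for x[k+1] — is what turns every continuous equation on the slide into the discrete updates we use in the notebook.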
So what does all this mean? "Self-learning" means we have all the ingredients on our side: we don't need to collect data, we can generate it ourselves, since we actually have access to the kinematic model. We generate observations that could, in principle, have been collected in the field, and then we perform the learning — self-learning, because we never have to get labelled data. "Nonlinear controller" means a neural net. "To control" means we learn what the control should be — finding the u. Again, this is the amortized-inference part: learning to control, learning to generate the optimal actions to steer this trailer truck — but going backwards. Why backwards? Because we want to learn how to park. How many of you drive? Can you parallel park? Answer me. And have you ever tried to parallel park when you have a trailer behind your car — say, when you're going camping? It's crazy, trust me. We're going to try it together now; it's not that straightforward. Anyway, moving on: the configuration of our vehicle, our agent, is the following. It has a cab in the front, with a given angle θ like last time, and then we have the position of the yoke — y-o-k-e, yoke — this location over here. So we have x and y; then we have θ; three items so far, like last week. Moreover, we also have the angle of the trailer, and the x and y location of this point here, the rear of the trailer. So you can count one, two, three, four, five, six variables — although only five would strictly be necessary, because it's a rigid body. Finally, over here, we have the x and y location of the loading dock.
OK, so this is a truck, a trailer, and a loading dock. What are the state variables? We have θ_cab, the angle of the truck; x_cab and y_cab, the Cartesian position of the yoke; x_trailer and y_trailer, the Cartesian position of the centre of the rear of the trailer; and θ_trailer, the angle of the trailer. Six variables in total. Cool. So let's see what we're going to do, then. The protocol is the following. The truck backs up until it hits the dock; then it stops. The goal is to have the back of the trailer parallel to the docking station: if this is the dock, you want the truck orthogonal to it, i.e. the back of the trailer parallel to it. Moreover, you want x_trailer and y_trailer to be as close as possible to x_dock and y_dock. OK. Is it going to be easy? Is it going to be hard? Let's try it out. All right, cool. So: `conda activate pDL`, there we go, `jupyter notebook`, and then we open the truck backer-upper notebook. We haven't yet seen these equations — we'll see them in a second — so I'm just going to run this. This first part is basically for drawing, so we don't really care; and then here are the update equations. These are discrete update equations, similar to what we saw last week, but with an additional term here, which we'll look at in a second. It's not that different: as we saw last time, the next x position is itself plus the speed times the cosine of the angle times Δt.
And so on. Similarly, the next θ is itself plus the speed divided by the length of the cab, times the tangent of φ — the steering angle — times dt. The additional term we'll see in a second. All right, here we go: this is our truck, and our objective is to get its back over here. Let me restart, because it's in an unfortunate position. Let's go with a very easy starting position. OK, a little closer. So now I'm asking on the chat: what angle should I give the steering wheel, such that we can go from this location back to parking here? Tell me, someone out of the 40–50 people in class: what angle would you like? 30 degrees? So, plus 30 degrees; I'm going to do 10 steps: one, two, three, four, five, six, seven, eight, nine, ten. OK. If I keep going like that, you end up in an issue called jackknifing. When you jackknife, it basically means you fold into yourself — you run into yourself. It would break the truck. So how do we exit this situation? Minus 60? OK, so I'm going minus 60. Tuck more? Would you like me to go further? Now, if I go straight — just zero here — I can back up. What angle would you like me to try now, so we can keep going? We should have some positive angle, right? Let's try 15, maybe? No, no — wrong direction. Maybe minus 15? Minus 15 means my wheels point towards the right. OK, I just broke the thing — I jackknifed it again. Let's try once more. Let's start from an easy position here. What angle would you like me to try? Quick, quick.
We don't want to waste the whole lesson on this. Camila, you suggested something — I didn't catch it. What angle would you like to try? Minus 70? You can't steer minus 70: the wheel only goes from minus 45 degrees to plus 45 degrees. 10 degrees? OK, plus 10: one, two, three, four, five, six, seven, eight, nine, ten. So this one basically goes like this — doesn't make sense. So we try minus 30. Oh, there we go — this seems OK. Now I can go with zero, maybe. Now we actually have to do some positive steering — say 10, maybe a bit more, 20. See, we're getting there. And maybe zero now. OK, you get the idea — I'm going nuts here. It doesn't work; you have to do it the other way around, I think. Anyway, you understand; you can play with this later, it's fun. All right, so the objective is to drive this thing here, and from manual inspection it looks like not quite an easy task. OK, anyway, going back to the slides. Oops — sorry, I just messed things up. OK, there we go. All right, cool, moving on: training. So we perform training — what training, exactly? There is a two-stage learning process. First, we train a neural network to be an emulator of the truck and trailer kinematics. Second, we train a neural-net controller to control the emulator. So we're going to learn the thing I was doing just now, click, click, click — but we're also going to learn a kinematic model. Again, why is this necessary? Strictly, it's not: you can backprop through the equations of motion, as we saw last week. On the other hand, we may want to be more general, and be able to learn any kind of motion — any kind of behaviour.
If you have a different type of vehicle, you want to be able to learn how its state evolves from observations; you don't necessarily have access to these equations. So this is the diagram — the electrical-engineering-style diagram. Here we have the steering signal at discrete temporal index k — this is my φ — then the trailer truck kinematics, and then the state at index k + 1. And this block — it should be the other way around: it should be a temporal delay, so this should be k, this should be k, and this one, in theory, k − 1. I think this is not our correct notation; it should be k here and k − 1 there, because there is a delay. Cool. Then we have the controller, which gives us the steering control, and, given the previous state and the steering control, the trailer truck kinematics tells us what the next state is going to be. So: the state at discrete temporal index k is fed to the controller, which provides a steering signal at index k — between −1, a hard right, and +1, a hard left — to the truck. Each time cycle, the truck backs up by a fixed small distance: the speed s is negative, so you go backwards, as we've seen so far. These are the equations we're using for this specific example.
We already saw ẋ, ẏ, and θ̇₀ last week; today we have an additional θ̇₁, which involves s divided by d₁, where d₁ is the distance from the centre of the trailer's wheels to the yoke. (You could actually have multiple θ's for multi-trailer trucks.) Here s is the signed speed, which in our case is equal to −1, a negative number, and φ is the steering angle, positive to the left — so the notation on the slide is wrong: this should be a negative φ, not a positive one; it should be this angle over here. Now, the x and y of the trailer are determined — determined — by the x and y of the cab, d₁, and θ₁. So we shouldn't strictly need those two items as state variables. We said before we were going to use six — let me show you here again: one, two, three, four, five, six variables — but it's not necessary, because if you have the x and y location, the angle, and the distance, you don't need the x and y of the back. Maybe it's convenient for learning, though — sometimes different reparameterizations help. Cool. Any questions so far? No questions? We haven't done any major stuff yet. So the first part is going to be the training of the neural-net truck emulator. How do we do this? Well, this is the self-learning bit: we provide a random steering signal φ to both the truck dynamics, which gives us the next state, and to the neural-net emulator.
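As a sketch, the discrete updates just described might look like the following. Be warned: the exact form of the θ₁ term, the sign conventions, and the constants `s`, `L`, `d1`, `dt` here are my reading of the trailer kinematics, not copied verbatim from the paper or notebook:

```python
import math

def truck_step(x, y, th0, th1, phi, s=-1.0, L=1.0, d1=2.0, dt=0.1):
    """One Euler step of the (assumed) truck-and-trailer kinematics.
    s: signed speed (negative = backing up), L: cab wheelbase,
    d1: trailer wheel centre to yoke distance, phi: steering angle."""
    x   += s * math.cos(th0) * dt              # cab x position
    y   += s * math.sin(th0) * dt              # cab y position
    th0 += s / L * math.tan(phi) * dt          # cab heading
    th1 += s / d1 * math.sin(th0 - th1) * dt   # trailer heading (assumed form)
    # the trailer rear position is determined by the cab position, d1 and th1:
    xt = x - d1 * math.cos(th1)
    yt = y - d1 * math.sin(th1)
    return x, y, th0, th1, xt, yt

# straight backing up: heading unchanged, x decreases by |s|*dt per step
print(truck_step(0.0, 0.0, 0.0, 0.0, phi=0.0))
```

Note how the last two outputs illustrate the point above: the trailer's x and y are redundant given the cab position, d₁ and θ₁.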
So both of them are fed the same steering signal — φ here and φ here — and both get the previous state, here called the state at time index k. Same state in both. The truck dynamics produces an output, the next state, and that output is used as the target: the prediction from the network emulator is subtracted from the actual target, so the error is proportional to the difference of these two items, and the error is used for updating the parameters. This symbol over here, the slanted arrow, means the module is adaptive: the signal you use for tuning the parameters of the module is proportional to the distance between the target and the prediction. OK, a question for the people at home. This paper was written way before deep learning, so there is no standard notation — well, this was the standard at the time, in electrical engineering. I'm telling you that the error, which again is the difference between the target and my prediction, is used to update — "adapt", as they say — the parameters. Which loss do you recognize has been used here, if the signal used to update the parameters is proportional to the difference between the target and the prediction? You understand the question. Jeffrey, you say the L1 loss. Someone else? Nitish is suggesting MSE, and Camila is suggesting a hinge. Nitish, why do you say MSE? The MSE involves a square, but here we don't have a square — so why MSE? Exactly, right. I mentioned that it's proportional to the difference, and what we use to change the values is the gradient.
And the gradient coming from an MSE is just the difference: if you compute the derivative of a mean squared error, you get exactly the difference. So that's the correct answer. Cool. Again, sometimes when you read these old papers they're really informative, but they use different notation or different jargon, and you have to translate back to current conventions. OK. So figure 3 shows how to train the emulator: the truck backs up randomly, and the emulator learns to generate the next position state vector given the present state vector and the steering signal. Simply by observing what the next position is — and given that you also have access to the control φ — you can learn the equations governing the kinematics of your agent. And so the state transition flow diagram is the following: C represents the controller, and T represents the truck and trailer emulator. We have an initial position s₀ — see, I was telling you before that in control and machine learning we use x, while in reinforcement learning, and in this old paper, they use s. But s was also the speed, which is not too good, because now s means two different things (although they never actually write the speed s again — that's the inconsistency I pointed out). Anyway: we have an initial position — sorry, initial configuration, initial state. The state goes into the controller, which gives me φ, the angle; the initial state also goes into the truck and trailer emulator, and I get the next state. Then the next state goes into the controller, you get the next φ, the next control, which goes into the trailer truck emulator, and you get the next state. And so on.
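To see why "proportional to the difference" points at the MSE, here is a small numeric check: the derivative of ½(t − y)² with respect to the prediction y is exactly y − t, the (signed) error. The ½ factor is a common convention I'm assuming to make the gradient come out clean:

```python
def half_mse(y, t):
    """Half squared error for scalars: 0.5 * (t - y)**2."""
    return 0.5 * (t - y) ** 2

def grad_half_mse(y, t):
    """Analytic gradient w.r.t. the prediction y: d/dy 0.5*(t-y)^2 = y - t."""
    return y - t

# finite-difference check of the analytic gradient
y, t, eps = 3.0, 1.0, 1e-6
numeric = (half_mse(y + eps, t) - half_mse(y - eps, t)) / (2 * eps)
print(numeric, grad_half_mse(y, t))  # both ≈ 2.0 = y - t
```

So an update rule "proportional to the difference" is just gradient descent on a squared-error loss — which is how you decode the paper's wording.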
So here's the point: before, the input to our T — which back then was the actual kinematic equations — was a free input. Before, we didn't have this C; we had a latent here, a latent here, a latent here. Now we no longer have latents; we have a neural network producing them. It's no longer a latent, it's a hidden representation — in this case called the control — which is a nonlinear function of the state. Our policy has to provide the action, or control, given the state it finds itself in; it's simply a regressor network. Let's see if you can tell me: what is the size of this s₀? You should know, because I told you already. Answer me. Hello? Who was following? The position of the cab and the trailer — yes, so the positions of the two things give four, x and y twice, but there are two more variables: we also have the angles. So how many variables do you count? Six — fantastic. So s is six-dimensional: the network takes a six-dimensional input vector. And what is the output size — the size of this arrow here, the output of the controller? Let's see if you follow. How many dimensions does that signal have? Two? One? One — it's just the angle. We don't have the acceleration; it would have been two in the last lesson, when we had both the angle and the acceleration. In this case we don't change the acceleration: we move at constant speed and only change the angle. So: one. Cool. The next part is not going to be big news, because we already saw it last week: we have a controller connected to this trailer truck emulator, which is fed with the previous state.
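As a sketch in PyTorch, the controller is just that regressor; the hidden size of 25 and the tanh squashing to [−1, 1] are my own assumptions here, not values prescribed in the lecture:

```python
import torch
import torch.nn as nn

state_size = 6     # (x_cab, y_cab, theta_cab, x_trailer, y_trailer, theta_trailer)
steering_size = 1  # just the steering angle; the speed is constant

# Regressor network: state -> steering signal in [-1, 1]
controller = nn.Sequential(
    nn.Linear(state_size, 25),  # hidden size 25 is an arbitrary choice here
    nn.ReLU(),
    nn.Linear(25, steering_size),
    nn.Tanh(),                  # squash to [-1, 1]: hard right to hard left
)

phi = controller(torch.zeros(1, state_size))
print(phi.shape)  # torch.Size([1, 1])
```

Six numbers in, one number out — that's the whole policy.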
The state x carries a subscript temporal index — I can't say "time", because these temporal indices are discrete numbers, belonging to ℕ, the natural numbers. Anyway: bold h, the hidden layer, is a function of the discrete temporal index. h at k − 1 is fed to the controller and the truck, and this gives me h at temporal index k. Then this gets packaged: if you package everything like that, you can think of it as a module which has no input — note that the input is not connected anywhere — so it's one of those classical modules we've seen with RNNs. We have an initial hidden state — the initial truck location and configuration — and then we chain multiple of these blocks until we reach a final capital K number of steps, which gives my final hidden state, my final configuration. So when do we reach this capital K? There are a few conditions that can end our unrolling in time. The conditions are: you jackknife — if you fold into yourself, end of episode; you hit an edge, one of the boundaries — if you hit something, that's the end, and whenever you stop you want to have minimized the distance to the docking station, which is what you're actually trying to hit; and there's one more, which is reaching a maximum number of steps. Those, I believe, are the three different conditions that tell you when you've reached capital K. Cool. So this is how the training works, again with these diagrams representing backprop and gradient descent.
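The unrolling with its stopping conditions can be sketched as a simple loop. Everything below is illustrative: the toy stand-in dynamics, the jackknife test as |θ₀ − θ₁| > π/2, and the boundary check are my assumptions standing in for the real emulator and environment:

```python
import math

def rollout(state, policy, max_steps=100, x_min=0.0):
    """Unroll until jackknife, leaving the area, or running out of steps.
    state = (x, th0, th1); returns the trajectory and the stop reason."""
    trajectory = [state]
    for k in range(max_steps):
        x, th0, th1 = state
        phi = policy(state)
        # toy stand-in dynamics for one emulator step:
        state = (x - 0.1, th0 + 0.1 * phi, th1 + 0.05 * (th0 - th1))
        trajectory.append(state)
        if abs(state[1] - state[2]) > math.pi / 2:
            return trajectory, "jackknife"      # cab folded onto trailer
        if state[0] < x_min:
            return trajectory, "hit edge"       # left the driving area
    return trajectory, "out of steps"           # reached max unroll length K

traj, reason = rollout((1.0, 0.0, 0.0), policy=lambda s: 0.0)
print(reason, len(traj))
```

Capital K is simply `len(traj) - 1` — it's different for every episode, depending on which condition fired first.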
You take the difference between the final state and the desired target — the trailer should be horizontal (look at the orientation), and the trailer location should match the docking station. So you get the difference, you get an MSE, and you use that to update the controller. But it's not exactly like that: the error actually travels through all the previous T–C modules. The visualization only shows the last one, but all the C blocks are updated, proportionally to the final error — which again implies the MSE loss. So backprop goes through the whole thing; it's not that the controllers are only updated because of this final difference. OK, cool. So we've basically put the two parts together: this is the learning of the controller, whereas before we saw how we perform the learning of the network emulator. And the emulator is not strictly necessary for this part, right? If these T's are differentiable, you can just use the original equations. All right, finally, the last two parts. The initial position is set at random; the truck backs up until it stops; and the final error is used for backpropagation — backpropagation through time, basically, with an unrolling period of capital K. "The weight changes in C are taken as the sum of the tentative changes." What does this mean? Well, it simply means you're going to use PyTorch, which accumulates the gradients. This is why we've finally found a case where you don't want to zero the grad during the process. Remember, every time, the five steps: feed forward, compute the loss, zero the grad, compute backward, step.
So these are the five different steps for training a network for doing classification or whatever in this case instead It's important To actually have that the weight changes in CR taken as the sum, right? Of the tentative changes. What does this mean? This means that when you perform back prop, right? You go back prop here So here you compute your first, uh, let's say you first zero out the gradient, right? So we start with clean up version and here you generate, uh, your first gradient of, uh, you know Partial derivative of the loss with respect to the parameters of the controller Then as we keep going We still see more modules see and so in this case The partial derivatives that we get with respect to the weights needs to be summed To what we have done already so far, right? You cannot overwrite that because otherwise if you keep overwriting You're gonna just compute the partial derivative with respect to the only The first module over here, right instead you want to have The sum of the gradients throughout the chain, no again This was not straightforward. Maybe in uh with previous framework or there were no even framework, right? So someone actually Thought that is necessary to specify in our case using pytorch and just do a for loop. Everything is done so nice Finally repeat this, uh Another with another initial position until it backs up until Uh and allowing it to back up until it stops, right? So eventually like again, this is simply We seen things that we have already seen before We have a Something it is analogous to having a neural net having a number of layers equal to Four times the number of backing up steps Why is that because you have a controller? so the number of uh step variables With initial position of the track. 
Hold on — "the number of steps", of course, with the initial position of the truck and trailer. So what happens here is that, as you've seen, you unroll as many times as capital K — only capital K is given to you by one of the stopping conditions: you jackknife, you hit one of the edges, or you run out of time steps. So you effectively train networks of different lengths — these networks have different temporal extents — and you train the parameters given that there is no actual input: the only input you have is the initial condition. Then you have this evolution of the state, and whenever the evolution reaches an end, you enforce the final state to be what you want — this location and this orientation. Does it work? Yeah, it works. This is the initial state, for example, and after training, the controller manages to back up until it reaches the docking station, until it gets here. Note that we didn't set the speed: it's constant, we don't control that. Or take this other case, which is rather adversarial. What does this controller have to do? It has to find, basically, a kind of field — for every location in this six-dimensional configuration space of the truck, it has to give you a scalar, so in this case it's a scalar field — telling you which direction you should be steering. So you basically end up with this function of the state, which is giving you the correct
orientation of the wheels you need in order to be able to park. And here's one more, a crazy one: you start in a jackknifed position, and the truck learns how to steer all the way around to get back here. Now, someone who already tried to implement this had the issue that the truck was always jackknifing. So you may want to add an additional term to the loss — to the cost — which is basically the agreement of the two angles, a distance between the two orientations: if the two are aligned, that's good; if the two θ's are at 90 degrees, you end up in this inconvenient situation. There are more resources as well, if you're interested in looking this up. There's a full working demo — so this stuff actually works — and you can check it out here. We already tried the manual drive; now we can try using the controller. I should have increased the simulation speed... OK, there we go. There you go — see? That was just nice. So, as I said at the beginning, no one has gotten this to work in our PyTorch notebook yet. If you get it to run, it's going to be very big extra credit, let's say, for you. But we can still see something — we can still see how to get there. So let's look at the notebook — unless there are other questions before I move to the notebook? Questions? OK. So we saw this slide with the representation; then we start here with `import torch`. In this case, what do I do? I just show you 10 episodes. How do we train this emulator of the kinematics? We said we start at random: φ is going to be a random value between −π/4 and π/4.
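An anti-jackknife penalty of the kind described could be added to the cost as an angle-alignment term. The particular form 1 − cos(θ₀ − θ₁) and the weight λ below are my choices — just one way to penalize the two orientations drifting apart:

```python
import math

def parking_cost(x_t, y_t, th0, th1, x_dock, y_dock, lam=1.0):
    """Squared distance of the trailer rear to the dock, plus an (assumed)
    anti-jackknife term: 0 when cab and trailer are aligned, maximal (2)
    when they are folded back on each other."""
    distance = (x_t - x_dock) ** 2 + (y_t - y_dock) ** 2
    alignment = 1.0 - math.cos(th0 - th1)  # 0 if aligned, 2 if opposite
    return distance + lam * alignment

print(parking_cost(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))          # aligned, at dock
print(parking_cost(0.0, 0.0, 0.0, math.pi / 2, 0.0, 0.0))  # 90 degrees apart
```

The cosine form is smooth and differentiable, so it backprops through the unrolled chain just like the distance term.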
So what happens is the following: you start with random positions, you provide random input, and you keep simulating until the thing breaks — a jackknife position — or it reaches an edge. And you do this many times. Right now it's very slow because I also render things, but we don't have to render. So when we actually do this for real — this was 10 episodes — I can uncomment this line so the rendering is off, and I run it for 10,000 episodes. And here is what I'm doing: we have an initial state, and I query the truck with the initial state. Then I have this phi, which is a random number from zero to one; I subtract 0.5, so it's minus 0.5 to plus 0.5, times pi half, so it's going to be from minus pi fourth to pi fourth. Then I append this phi and the expanded initial state, and the output is going to be the final location — the final configuration. Potentially I draw. Then we can check the length; these are all the combinations. Okay, there's a question: could you explain why the weight sum is necessary in this case, if the truck has already moved to the new position? So what I was saying before is that when you train the system to reach a final destination — this is one trajectory — when you backprop through this chain of controller and truck emulator, you actually need to accumulate the gradients when you perform backprop, right?
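To see concretely what "accumulate" means, here is a hand-rolled toy of backprop through an unrolled loop with one shared weight. The scalar model and the numbers are invented for illustration; the point is that each reuse of the shared weight contributes a partial derivative, and the contributions are summed into one gradient rather than overwriting each other:

```python
# Toy unrolled model: s_t = w * s_{t-1}, loss = (s_T - target)**2.
# This mimics what loss.backward() does over an unrolled loop: every
# timestep where the shared weight w was used adds its own partial
# derivative into the single accumulated gradient.
def bptt_grad(w, s0, target, T):
    states = [s0]
    for _ in range(T):
        states.append(w * states[-1])
    g = 2.0 * (states[-1] - target)     # dL/ds_T
    grad = 0.0
    for t in range(T, 0, -1):
        grad += g * states[t - 1]       # accumulate this timestep's partial
        g *= w                          # push dL/ds_t back one step
    return grad

# Analytic check: s_T = w**T * s0, so dL/dw = 2*(s_T - target)*T*w**(T-1)*s0.
w, s0, target, T = 0.9, 1.0, 0.5, 5
accumulated = bptt_grad(w, s0, target, T)
analytic = 2.0 * (w**T * s0 - target) * T * w**(T - 1) * s0
```

If you replaced `grad +=` with `grad =`, you would keep only the last timestep's contribution — exactly the "overriding" failure mode described here.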
So this controller is the same as this controller is the same as this controller — the weights are shared. If PyTorch didn't have an accumulation behavior, then when you compute the partial derivative of the loss with respect to this weight at one step, and then compute the partial at the next one, you would be overriding the partials from the earlier step. Instead, the default behavior of PyTorch is to accumulate the partial derivatives with respect to the weights. So whenever you unroll this for loop, when you get again to the controller C, you add the new partial derivative with respect to the weights to what you already had from the previous time step. If you think about it, it's like a for loop: while not break, you have this body repeating multiple times, and when backprop goes in the other direction you want to keep summing the partials with respect to the weights for every time the controller appeared inside the loop, for the one given trajectory. Then, when you start a new trajectory, you zero the gradients out and don't have this anymore. Next question: in the notebook, are we just learning a controller, since our model is known and differentiable, or did you include a network to learn a model? So here — I'm going through it right now — we have a state which has six variables, a steering size equal to one, and a hidden size equal to 45, the same as in the paper. And this is my emulator network: it's a Sequential, a fully connected network going from one plus six, so seven, to 45, then a positive part — a ReLU — and then again from the hidden layer size to the state size. So this is my trailer truck emulator. SGD, MSE loss, whatever; then I transform these into tensors, right?
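For reference, that same 7 → 45 → ReLU → 6 shape can be written as a plain-Python forward pass, mirroring what `nn.Sequential(nn.Linear(7, 45), nn.ReLU(), nn.Linear(45, 6))` computes. The weights here are random placeholders standing in for trained parameters:

```python
import random

STATE_SIZE, STEER_SIZE, HIDDEN_SIZE = 6, 1, 45  # sizes from the lecture/paper
rng = random.Random(0)

def make_weights(n_in, n_out):
    # Random placeholder weight matrix (n_out rows of n_in entries).
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

W1 = make_weights(STATE_SIZE + STEER_SIZE, HIDDEN_SIZE)
W2 = make_weights(HIDDEN_SIZE, STATE_SIZE)

def emulator(phi_and_state):
    """Forward pass: (steering + state, 7 values) -> hidden 45 -> ReLU -> next state, 6 values."""
    h = [max(0.0, sum(w * v for w, v in zip(row, phi_and_state))) for row in W1]
    return [sum(w * v for w, v in zip(row, h)) for row in W2]

next_state = emulator([0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Biases are omitted here to keep the sketch short; the notebook's `nn.Linear` layers include them.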
I compute the mean and standard deviation, I split into training and test, I check the size of the training set, and then here we perform the training of the system. And it looks like it's doing quite a good job — the loss is decaying. Oh, okay, it went up a little bit; anyway. Here I'm just training the trailer truck kinematics emulator. Given this, the last part is left to you as an exercise: now that you actually have the trailer truck kinematics emulator, train the controller. So you just have to train this controller as an element of the recurrent network. Note that this part is not yet your recurrent network — this is supervised learning, or self-supervised learning, or semi-supervised learning, whatever you want to call it. Self-supervised, I'd say, because we have a given function, we generated the data ourselves, and we just want to learn it — so basically supervised learning again. And then, given that you have this model, you will later train the controller within this setup. Anyway, moving on to the second part of the class in the next five minutes. In this quick second part we're going to talk about Bayesian neural networks: estimating a predictive distribution. What is this stuff? Why care about uncertainty? All right, so, question: if I have a cat–dog classifier and I feed it an image of a hippopotamus, what's going to happen? Okay, Jeffrey is suggesting 99% dog or cat. Do we trust Jeffrey? Maybe we can actually try, right? So here I just opened this cat–dog detector. I choose a file — I'm going to be choosing a dog picture — and yes, we have a doggy. Okay, it's great. Let me actually zoom so you can see how cute this dog is. Very cute. All right, then we can choose another one. This is going to be a cat, of course, right? And there you go, a very nice cat.
No, it doesn't tell you the confidence, but it looks like a very happy cat. Then, of course, I'm going to choose a hippo. And then, of course, this is a dog, right? Well... not. I made a NOT joke — these are advanced English jokes, look up what a NOT joke is. I'm crazy. Okay, it's fine. All right, cool. So, as we knew, this was not going to end up very well. Moreover, we may want some sort of reliability in steering control. What does that mean? You want to know how certain you are before making a specific change to your steering, right? Maybe you don't want to take the steer-completely-to-the-right option when you're not really certain about it. So this is important: whenever you have a regression output, how can you tell how reliable that outcome is? If we had an estimate of the uncertainty with which a network tells you things, that would be nice, right? Physics simulator predictions are another case: uncertainties in physics are omnipresent, and if you can also tell how uncertain a prediction is, that makes physicists much happier. Moreover, there is an application in minimizing action randomness when connected to a reward — one can decide to minimize this uncertainty as well, and that is something we're going to see next week. Okay. So what is dropout? We didn't talk about dropout yet — maybe Yann did. Dropout tells you how many neurons you want to drop: you randomly set some neurons to zero inside your network. So every time, the network architecture changes slightly, and usually this is used at training time in order to have no single path from beginning to end that is memorizing — overfitting on — specific data. In this way we enforce a more distributed representation in the network, right?
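A minimal plain-Python sketch of this masking, in its usual "inverted dropout" form where the surviving units are rescaled so the expected activation is unchanged; the vector size and dropping rate here are arbitrary examples:

```python
import random

def dropout(x, p_drop, rng, train=True):
    """Inverted dropout: zero each unit with probability p_drop, and scale
    survivors by 1/(1 - p_drop) so the expected activation is unchanged.
    With train=False (evaluation mode), the input passes through untouched."""
    if not train or p_drop == 0.0:
        return list(x)
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in x]

rng = random.Random(0)
x = [1.0] * 10_000
y = dropout(x, 0.2, rng)                 # drop 20%, boost survivors by 1/0.8
mean_activation = sum(y) / len(y)        # stays close to 1.0 in expectation
```

This is the "divide by 0.8" from the lecture: with 80% of the units kept, each survivor is multiplied by 1/0.8 = 1.25 so the overall intensity doesn't change.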
And this animation — it really took a lot of time, anyway. So how does it work? These are all diagrams, I'm aware. The hidden layer of this network is now going to be a rotation of x element-wise multiplied by this delta_x. What is delta_x? Well, delta_x is something of the same size as x. Similarly for the prediction: it's going to be my h element-wise multiplied by a delta_h, which has the same size as h. And these deltas come from a Bernoulli distribution with probability equal to one minus the dropping rate, okay? So, given that I zero out some of the items of the vectors x and h, all the other items need to be boosted up. By how much? Well, of course, if you have 80% of the units set to one, then you want to divide these deltas by 0.8, so that the intensity — the overall norm — doesn't change. Cool. All right, so we're now going to learn how to do this with a notebook. Here we are just trying to regress the function that goes through these blue dots, and that function would be this red one over here. But something we usually don't have is the level of uncertainty around these points. Okay. We sort of saw this, I think, in the second lab, when we talked about classification and regression: we saw that we could have trained multiple models and then computed the variance between their predictions, in order to tell the level of agreement between networks. Today we're going to do something very similar, but training only one network — well, it's sort of like training multiple networks, with this dropout used at inference. So we go here, and this is the last bit.
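That one-network trick — keep dropout on at inference and average many stochastic forward passes — can be sketched like this. The "network" here is a stand-in function where Bernoulli noise plays the role of the dropout masks:

```python
import math
import random

rng = random.Random(42)

def noisy_net(x, train=True):
    """Stand-in for a dropout network: a fixed underlying function plus
    mask-induced randomness when dropout is active (train=True)."""
    base = math.sin(x)
    if train:
        # Bernoulli(0.5) mask scaled by 1/0.5, emulating inverted dropout:
        # the output is zeroed half the time and doubled otherwise.
        base *= (rng.random() < 0.5) * 2.0
    return base

def mc_dropout_predict(x, n_passes=100):
    """Send the point through the stochastic network many times; the mean
    approximates the function, the standard deviation the uncertainty."""
    preds = [noisy_net(x) for _ in range(n_passes)]
    mean = sum(preds) / n_passes
    var = sum((p - mean) ** 2 for p in preds) / n_passes
    return mean, var ** 0.5

mean, std = mc_dropout_predict(1.0)
```

The key detail, which comes up in the notebook too: the model must be kept in training mode during these passes, otherwise the dropout masks are disabled and every pass returns the same value, giving a standard deviation of zero.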
So we go to the Bayesian neural networks notebook — we choose our pytorch-Deep-Learning repo — all right, and we can select Run All. What we do here is train a network that goes through these points right over here. So, if I use this one — in this case, what kind of activation function did I use? Can you tell me? Well, it's a ReLU: you can tell because this final function is basically a piecewise combination of linear segments, okay? Let me show you the network. Here I have a network which uses a ReLU — a positive part — and then a dropout with a dropping rate of 0.5, both at the input and in the hidden layer. So this was just one prediction, and here instead I compute the standard deviation when I send the network forward multiple times: in this case I send my data points a hundred times through the network and then compute the standard deviation across these multiple predictions. It is important to tell the network that we are in training mode. Usually we don't want to have dropout on during inference, so you set the network to evaluation mode, and that turns off the dropout. In this case, though, we want to set the network to training mode such that the dropout is active even during inference, so that I can compute a mean and a standard deviation: the mean is going to be a better approximation of the final function, and the standard deviation allows me to estimate the level of uncertainty with which a prediction — in this case a regression — is made. You can tell that the uncertainty near the location of the points is rather tiny, but as you move away it actually increases. It's interesting to notice that if I change the activation function to the hyperbolic tangent, things will look much different.
So, first of all, this is going to be the approximation of the function — it's no longer piecewise linear — and then the estimation of the uncertainty goes like that. We get different amounts of uncertainty before and after, and these tails now go horizontal because the hyperbolic tangent saturates in its outer parts, so you end up with this kind of constant amount of uncertainty. And that was pretty much everything I wanted to tell you about today. Any questions about these topics? Otherwise, next week we're going to put together today's lesson — both of these two parts, the uncertainty estimation and the learning of the controller and the emulator — plus last week's, which was foundational for these topics. So next week we pack it all together, we sprinkle in some variational predictive conditional network, and we deliver the final practicum of this semester's deep learning course at NYU. Thank you for being with me today. Enjoy your day, and I'll see you next time. Okay, bye-bye.