I realized that I skipped ahead last lecture, so I have to go back now. Some of the stuff I was talking to you about last lecture you had no idea what I was talking about. I apologize for that. I looked at the calendar and didn't take into account that we missed a class because of the snow. So I want to start with a classic experiment from the 1960s called Kamin blocking. Kamin blocking is a good example of how learning theory is used to explain behavior in animals. Now the way we're going to do it is we're going to use two kinds of algorithms. We're going to use the Kalman filter algorithm to describe what's going on, and then we're going to use the simple LMS algorithm to explain what's going on. The basic difference between these two is that in the Kalman filter approach to state estimation, what we end up with is a learning rate, the Kalman gain, that depends on the history of the past inputs through the covariance matrix. Whereas in LMS, we just have a fixed learning rate, and the update depends on the prediction error and the inputs that were experienced on that particular trial; it does not depend on the history of past errors. It turns out that for this particular experiment, Kamin blocking, both of them work, and they capture the idea that the way the animal is learning is based on prediction error. The first thing we're going to see is that the way the animal learns this task is by making a prediction; that prediction is different from what he observes, and therefore he learns from it. And if the prediction error isn't there, he's not going to learn anything. Next we're going to talk about a different experiment that demonstrates how LMS no longer works. This is a version of the task where the order in which the experiment is done is simply reversed. By doing that, we'll see that LMS fails, whereas the state estimation framework still continues to explain things well. So here's the basic experiment. In the 60s, Kamin took some rats and trained them to press a lever to get food. Basically, you know, press, press, press, press, and at some point the animal gets food. So the animal is more or less constantly pressing the lever for food. Now what happens is that they turn on a light for three minutes. These animals, these rodents, don't like light, so they naturally get a little bit scared of it. And at the end of the three-minute period, so first they get a light, then at the end of the three-minute period they pass a little bit of electricity through the grid that the animal is standing on. So they shock the animal. Then what happens is that the animal gets scared and learns to associate the light with the shock. Therefore he doesn't press the lever, because he's scared. He freezes. So the way we assay learning in the animal is by how much he reduces his pressing of the lever when he sees the light. Okay? All right. So simple experiment. Light comes on. A few minutes later, the animal gets shocked. As a consequence, the next time the light comes on, the animal doesn't press the lever because he expects to get shocked. Now what Kamin blocking is about is the following. First they train the animal with the light-shock pairing. Light comes, shock is going to come. Next, what they do is take the same animals, and during the time when the light is being displayed, they also play a tone.
So there are two stimuli now. So let's write it down. Before I forget, what am I talking about? I'm talking about chapter 6.1 to 6.4 today. So the first experiment is Kamin blocking and it goes as follows. The animal sees a light and then he's going to get shocked. He learns to pair, to associate these things. Light isn't good. Then what he's going to see is light, and he's going to hear a tone, and now he's still going to get shocked. Okay? So the interesting observation is that after the animal is trained with these two blocks of training, now if you just give the animal the tone, oh god, let's see if we can do better. So if you just give the tone, what happens? What do you think? Press the lever but then they realize that there's, oh so there's no shock. Yeah, so we want to know what the animal is going to do when he hears a tone. Is he going to press for food, or is he going to be afraid, or is he going to not care about the tone? Not care. Who says not care? One, two, three, four. Who says be afraid? So a few more people think be afraid, and you know, I think when the psychologists did this experiment, they were kind of surprised that the animal was not afraid of the tone, because, you know, he heard the tone while he saw the light and he was shocked. So why is he not afraid of the tone? It turns out that the animal was not afraid of the tone. Basically, he didn't care. And the interesting result is that the reason why he didn't care is because the light predicted the shock in both cases. So there was nothing new with the tone: even though it occurred, the fact that the animal knew he was going to get shocked, and he got shocked, made it so that he didn't seem to learn this extra thing associated with the tone. So to explain this, the theorists in the 60s used the LMS algorithm. Let me begin with LMS, because it's a nice reminder of what LMS was. I'm going to do the geometric interpretation of LMS. So let's see what this experiment does. Let's suppose that x is, David, do you think you might be able to find me another pen? Thank you, man. Say that x is light and tone, meaning that if only the light is on, then x is equal to (1, 0). OK, simple way of coding it. Now the animal is going to have to learn whether there's going to be a shock or not. So y is shock, and it's equal to 1 if shock was given. And what the animal is going to learn is a very simple mapping: y hat is equal to w transpose x. So he's going to learn to associate the stimuli, light and tone, with shock, and he's going to give them a weight w. So how might he do this? In LMS, what we do is imagine this two-dimensional space: w is going to be a two-dimensional vector, x is a two-dimensional vector. So suppose that in this space, here's w, this vector w on trial n. And x is another two-dimensional vector. Here's x here on trial n. So in this case, maybe the light and the tone are both on, so x is the (1, 1) vector, and w is another vector. What do we mean when we say project w onto x? So what does this mean here, w transpose x? Well, w transpose x means: take the magnitude of w, times the magnitude of x, times the cosine of alpha, where alpha is the angle between x and w. So let's see what that means in this graphical representation. Here's w, here's x, here's the angle between them, alpha.
And if I want to project w onto x, I would say, well, this length here is going to be the magnitude of w times cosine of alpha, right? Do you agree? Cosine of alpha is this length divided by this length. So this quantity, this projection here, this length from here to here, I'm going to call p. So then p is equal to the magnitude of w times cosine of alpha. And that's how this projection is related to w transpose x: the projection p is just w transpose x divided by the magnitude of x. All right. So that's our guess. Now how is this related to y hat? It's related as follows: this projection is y hat, my guess on that trial, divided by the magnitude of x. You see that? That's what that projection is. Well, what is it that I want? The true y, my observation. So I want some other length, the length that I should have had. Say that the length that I wanted is this length out here. Let me draw it here. This is the y that I wanted; this is the y that I predicted. So there's this difference here between what I wanted and what I predicted. So what I want to do is change my w by another vector here; I'll call that delta. This is the projection that I predicted, and this is what I wanted. So I want to change my w by this amount delta, so that when I project it onto x, I get my observation y. This is what I predict, this is what I observe. So the prediction is y hat? Yeah, yeah, yeah. y hat is my prediction based on my model; y is what I observe. And now what I want to do is change my w, right? Yeah, yeah, it is. We're going to make sense of that. So let me call this one p prime. Maybe p prime is better: p is what I predicted, p prime is what I wanted. And p prime is going to be y divided by the magnitude of x, right? Thank you, David. OK, we can throw these two away too. Excellent. So this vector delta, the thing that I want to add to my w, is going to be equal to y divided by the magnitude of x, minus y hat divided by the magnitude of x, this subtracted out. That's a scalar quantity, multiplied by a vector that's in the same direction as x, which is x divided by its magnitude. So this is a vector, x is a vector, this is the magnitude of x, and this is a normalized vector of unit length. This is the amount that I should change my w by. So this is equal to 1 over the magnitude of x squared, times y minus y hat, times the vector x. So my learning rule is: w of n plus 1 is w of n plus some proportion of this amount that I should change my w by, which is 1 over the magnitude of x of n squared, times y of n minus y hat of n, times x of n. So that's just our usual LMS with this normalization that says: if you wanted to learn away the whole error, here's how much you would change w by, delta. But generally, we want to learn only a fraction of the error, which is what the learning rate eta does. Well, that's your simple LMS. OK? All right.
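As a minimal sketch of the normalized LMS rule just derived: the learning-rate value eta, the variable names, and the use of numpy here are my own choices, not anything specified in the lecture.

```python
import numpy as np

def lms_update(w, x, y, eta=0.5):
    """One normalized-LMS step: move w a fraction eta of the way
    toward cancelling this trial's prediction error."""
    y_hat = w @ x                      # prediction: y_hat = w^T x
    delta = (y - y_hat) * x / (x @ x)  # the full correction vector delta, along x
    return w + eta * delta             # learn only a fraction eta of it

# hypothetical single trial: light on, tone off, shock observed
w = lms_update(np.zeros(2), x=np.array([1.0, 0.0]), y=1.0)
```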
So if you want to do the same problem using state estimation, we would do something like this. We would say: there's some w that I'm trying to estimate, there's this x that I can see, and it gives me a y that I can see, and I want to estimate this w. And I have this generative model that says, well, my w of n plus 1 is my w of n plus some state noise epsilon w, where this is a vector. And then I have an observation: y of n is equal to w of n transpose x of n plus epsilon y, where this is a scalar. So here's my state equation, here's my measurement equation. Now, what I want to estimate is w. So what I do is I say: all right, my posterior estimate is my prior estimate plus my Kalman gain times the difference between what I observe and what I predicted, y of n minus y hat of n, where y hat of n is w of n given n minus 1, transpose, times x of n. This is my prediction error. And this gain here, what is this gain? It's going to be my prior uncertainty P times the vector x, divided by a scalar: x transpose P x plus the observation uncertainty. Yeah, so that works out. I think the units work out: this is a 2 by 2 times a 2 by 1, so I get a 2 by 1, divided by a scalar, which I use to update a 2 by 1. And I think the denominator works as well. And then my posterior uncertainty is I minus k of n times x of n transpose, times the prior uncertainty. And then w hat of n plus 1 given n is equal to w hat of n given n, and P of n plus 1 given n is equal to P of n given n plus Q.
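Here is a quick sketch of that weight-update recursion in code; the observation-noise variance r, the state-noise covariance Q, and the variable names are placeholder assumptions for the two-dimensional light/tone problem, not values given in the lecture.

```python
import numpy as np

def kalman_step(w, P, x, y, r=1.0, Q=0.01 * np.eye(2)):
    """One Kalman-filter update of the weight estimate w and its uncertainty P
    for the scalar observation y = w^T x + noise, followed by the time update."""
    y_hat = w @ x                              # prediction from the prior estimate
    k = P @ x / (x @ P @ x + r)                # Kalman gain: depends on the history through P
    w_post = w + k * (y - y_hat)               # posterior estimate of the weights
    P_post = (np.eye(len(w)) - np.outer(k, x)) @ P
    return w_post, P_post + Q                  # carry the posterior forward and add state noise Q
```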
So what I want to do now is go back to our problem of this animal learning this task: light, and then light plus tone. Let's compare what this does to what LMS does. Do you have any questions before I do that? OK, so the only difference between these two algorithms is in k, right? This is what matters. And if you look at k here, you notice that it depends on P. And what does P depend on? P depends on the x's that you've been getting, the history of x's. So what was the prior history? What we're going to see is that when light and tone appear together, as opposed to when light appears alone or tone appears alone, the uncertainty matrix is going to be different. And when the uncertainty matrix is different, then when the animal has an error associated with a prediction, it will distribute that error to its estimates differently. Basically, the more uncertain you are about one variable, the more you're going to assign credit to it, right? That's what the uncertainty matrix does. That's our story so far. OK, so first we're going to do the Kamin blocking, and after we're done with that, we're going to do the reverse blocking. So this is the normal way of doing it, and we're going to see that both of these algorithms predict the same thing, so everything's happy. And next we're going to look at it in a reversed way: what if the animal was first trained with the light and the tone together, and then trained with the light alone? What does he say about the tone now? Those are the two conditions on which we're going to compare these two algorithms. All right, so let's begin with the experiment where the light comes first. Say we're going to have some number of trials, maybe 40 trials. And our stimulus pattern is as follows. First we're going to have the light on, but no tone. So light is x1; the light is going to be up here, at plus 1, and it's going to be on throughout the whole thing. So this is x1, which is light. And then the tone is going to come on in the second half of the experiment, so the tone does this. This is x2, which is tone. And down here is trial. And then what is our observation? It's y. y is related to y star, which is the uncorrupted way of thinking about y. So y is what we observe, but it's always a noisy measure of the actual shock. The shock is always the same, but we're going to assume that the animal has some noise associated with the way he evaluates the shock. So there's y, which is the observation, and then there's y star, which is the actual shock, and we assume that y is equal to y star plus some noise, plus epsilon y. So there's some small amount of random noise added to what the animal observes. And y star is going to be 1; the shock is always on. Say that's y star, which is shock. OK, so our objective is to estimate w1 and w2. All right, so let's do LMS first. What does LMS do? Well, what is y hat going to be? We're going to assume that we don't have any reason to dislike light and tone to begin with. Then, with training, we're going to learn to expect the shock. So this is going to be y hat after some amount of training. And how does that happen? Well, in this experiment the light is on, so w1 does this. Actually, I should put some noise on this; it actually looks like this, there's some wiggle on the y hat that we make. So w1 is going to look like this, and w2 is going to look like this. So we're going to assign all the weight associated with the error that we see to the light. Now, what happens next is that both the light and the tone are on, but the shock is still there. And our prediction error is zero, because we predicted that as long as the light is on, there's going to be a shock. So now everything stays where it is: y hat stays where it is, w1 stays where it is, w2 stays where it is. So we don't learn to associate the tone with the shock. Why? Because we have no prediction error. You know, when the light comes on, it doesn't matter that the tone came on too: well, I still got shocked, and I predicted that I was going to get shocked, so there's nothing to learn. Okay, does that make sense? So prediction error is central to the learning. Without it, there could be many stimuli that are on, but because I have no prediction error, I don't learn. All right, now let's consider what the Kalman filter does. Very similar, very similar story, except now we have a gain K and we have the uncertainty P. P is a two-by-two matrix, so I'm going to try to draw its components for you. So, you know, we again have these two variables. We're going to have y hat, where y hat goes up and then stays there, basically. So this is now the Kalman filter's estimate; this is y hat. What about w1 and w2? It's going to look very similar: w1 is going to rise up and stay there, w2 is going to stay down here. Next I'm going to plot for you the uncertainty matrix. So this is 1. Now, in our example we begin with only the light on; we only have x1. So in the first 20 trials, I'm going to plot P11. Remember, P is the uncertainty: the variance of w1, the variance of w2, the covariance of w1 and w2, and the covariance of w2 and w1. That's what P is. So my estimate of w1 and w2 is here; that's the mean of my posterior. As for the variance of my posterior, what it's going to look like is this: for P11, I'm going to see a whole lot of trials when the light is on but there is no tone, so my uncertainty for w1 is definitely going to decline. It's going to go down here. What about my uncertainty for w2? Well, I don't know anything about w2. It's going to stay up here.
P22 is going to stay up there. P11 is going to fall. P22 stays up there because I never see the tone; I don't know anything about the tone. The covariance, P12, is going to be zero too. And of course, P21 is equal to P12. Then what happens is that both of them come on. So now this covariance goes to negative numbers. So this is negative, this is positive, and both of these now begin to fall, something like this. All right, okay, so as far as we can tell, the state estimator and LMS behave very similarly. They both say that when I get to the stage where you ask me how much I know about the tone, I'm going to say the tone is not a dangerous thing. If I hear a tone, I don't care; the weight associated with it is zero. Even though I heard the tone and I got shocked, because I heard it in the context of the light, it made no difference. All right, so now let's compare this to the reverse scenario. So what happens in the reverse scenario? Our next experiment is called backwards blocking. Now the light and the tone are on at the same time. So again, let's do 40 trials. We begin with light and tone both up here. So this is light, or x1; it is on. And for the first 20 trials, the tone is on too, and then the tone goes away. And I'm going to get shocked the whole time. That's my y star, my shock. Okay, so the interesting question is this. I saw light and tone together when I got shocked, okay? So that means both are bad. I'm going to assign some credit to the light, some credit to the tone. Maybe half of the shock is coming because I saw the light, half of it is coming because I heard the tone. And now what happens is that I don't hear the tone anymore. All I'm seeing is the light, and I'm getting shocked just as much. So what should I believe about the tone? How important is the tone? Should I be afraid of it or not? What do you think? How many think that at the end of this, I should be afraid of the tone? Nobody. How many think I should not be afraid of the tone? Okay, so most people think that we should now realize the tone wasn't important, right? Okay. And that's what the animal does. The animal says: oh, I used to be afraid of the tone, but now I don't hear the tone, I still get the shock, and therefore it assigns basically nothing to the tone. Now what's interesting here is that the animal did not hear the tone, but changed its belief about the tone. So how did he do that? How did he know that the lack of the stimulus was itself informative? So look at LMS here. LMS says, what's the x? Did you see the light? Did you hear the tone? So if the value of x associated with the tone is zero, I didn't hear the tone, and I'm not going to do anything about it, because I'm multiplying everything by x. So what LMS is going to do is as follows. It's going to learn to predict y hat. So here's y hat; it's going to go to one. And now, how is it going to divide that up between w1 and w2? It's going to say, well, basically each is going to go up to 0.5. This is going to be my w1 and this is going to be my w2. Half the shock is coming because of the tone, half of it is coming because of the light. That's a rational way of dividing up the problem. Now, what happens is that I hear nothing, but I see the light. And when I see the light, I still get all of the shock. So by the end of here, I say: if I just see the light, if the light comes on, I should get half of the shock, but I don't get half the shock, I get all of the shock. So now what happens is that w1 goes up here.
So I have a prediction error, right? Because all of a sudden here, I say I should get this much, but I get all of it. But I never heard the tone in the second half. So w2 stays where it is, because x2 is zero. I can't change the weights for a stimulus that's not there. So LMS says that if you were afraid of the tone at the end of the first half, you're still going to be afraid of it at the end of the experiment. Okay, now what does the Kalman filter say? The Kalman filter says the gain K depends on the uncertainty matrix P. And if you heard the tone and saw the light together, what that means is that your uncertainty matrix has negative off-diagonal elements. So if one went away, the other one must change. Basically, if this goes up, this has to go down, because that's what the uncertainty matrix says happens; the uncertainty matrix is keeping a record of this history. It is true that x2 is zero, but I'm going to change the weight associated with it anyway, because the P matrix has negative off-diagonal elements. So if I do the same thing with the Kalman filter, my y hat is going to look exactly the same, but my w's, in the state estimation framework, here's what happens: w1 and w2 change together, and then w1 goes up and w2 comes down. And the reason is the P matrix in the first half. So here's the P matrix, here's positive, here's negative, and I'm going to plot for you P12. P12 is going to go negative because both stimuli are on together. And now, because of this, the gain K is going to behave in such a way as to make w2 move in the opposite direction, because it remembers that these two things that I'm trying to estimate, w1 and w2, have a negative covariance: if one goes up, the other one has to go down. So w1 and w2 both change. Now, they did an experiment to see how this actually works in real animals. And here's the experiment that they did; I'll draw it down here. They had three tones, three different tones; let's call them A, B, and X. So three different sounds. They first played A and X together, and if A and X were played together, that predicted the tone B. Okay, so A and X come together, that's going to give me tone B. And if A comes by itself, it's also going to predict tone B. And now, if you hear B, you're going to get shocked. So now they ask: what do you know about X, and what do you know about A? Let's think about this for a second. A and X together give you B, B gives you shock, and A by itself also gives you B. Do you see the analogy to the reverse blocking? So if the animal has learned that A and X together give you B, and A by itself gives you B, then X must give you nothing. So I'm not afraid of X, but I'm very afraid of A. And this is compared to the control group, which does the following: A and X give me B, A gives me C, and B gives me shock. So now I ask: what does A give you? What does X give you? How afraid of X and A are you? And you're afraid of both A and X, because I learned that B is bad, and A and X together predicted B, so each of them gets some of the credit for B, and I'm afraid of both A and X. I think it's pretty cool. Any questions, guys? Yeah? Why is that negative? Right, because in our model, we have w1 x1 plus w2 x2 is equal to y, right? So if x1 and x2 are both one, then in this model, if w1 increases, w2 has to decrease. Otherwise, how can the sum stay equal to one? Yeah, it's implicit in our model, that covariance.
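To make the comparison concrete, here is a small simulation sketch that runs both training schedules (20 trials of one stimulus pattern, then 20 of the other, as on the board) through the two update rules from the earlier sketches. The learning rate, noise level, r, and Q are made-up values; with settings like these it should reproduce the qualitative pattern described above: the tone weight stays near zero in forward blocking for both rules, but falls back toward zero in backwards blocking only for the Kalman filter.

```python
import numpy as np

rng = np.random.default_rng(0)

def lms_update(w, x, y, eta=0.5):
    # same normalized-LMS step as the earlier sketch
    return w + eta * (y - w @ x) * x / (x @ x)

def kalman_step(w, P, x, y, r=1.0, Q=0.01 * np.eye(2)):
    # same Kalman update as the earlier sketch
    k = P @ x / (x @ P @ x + r)
    w = w + k * (y - w @ x)
    P = (np.eye(2) - np.outer(k, x)) @ P
    return w, P + Q

def final_weights(schedule, learner):
    """Weights [light, tone] after a schedule of stimulus vectors; the shock (y*) is always on."""
    w, P = np.zeros(2), np.eye(2)
    for x in schedule:
        y = 1.0 + 0.05 * rng.standard_normal()   # y = y* + small observation noise
        if learner == "lms":
            w = lms_update(w, x, y)
        else:
            w, P = kalman_step(w, P, x, y)
    return np.round(w, 2)

light, light_tone = np.array([1.0, 0.0]), np.array([1.0, 1.0])
forward  = [light] * 20 + [light_tone] * 20      # Kamin blocking: light first, then light + tone
backward = [light_tone] * 20 + [light] * 20      # backwards blocking: light + tone first, then light

for name, schedule in [("forward", forward), ("backward", backward)]:
    print(name, "LMS:", final_weights(schedule, "lms"),
          "Kalman:", final_weights(schedule, "kalman"))
```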
Why don't we use one and minus one, instead of one and zero? Yeah, that's fine. So you mean if the stimulus exists, it's one, and if it doesn't exist, it's minus one? Yeah, then we can modify it that way. Yeah, yeah. So let's think about that. Would it change things? Of course it would change LMS. Yeah, if it's not there, if it's minus one, I'd have to think about it. In many situations, I've seen that used to avoid the problem of not modifying the weights. Yeah. With the minus one. Yeah, yeah. Sure, sure. If somebody ran the same experiment on me, I would not necessarily behave the same way the animals did. So why do you think that I, as an animal, would behave in a different way than this animal? Because for me, I can say, okay, maybe it doesn't matter; maybe both of them do the same thing, or maybe they must both be present together. There are many different combinations that can give you the shock, and maybe it's too early to make a conclusion; it's just a guess whether there will be a shock at the end. Yeah, yeah. Right, right. But the animals behave consistently. That's what the experimental data show. And at first it is a little surprising that even though the animal heard the tone in the context of getting shocked, he's not afraid of it. So the first experiment illustrates the concept of prediction error as being what's important. The fact that the stimulus was there by itself wasn't important. What seemed to be important is that the animal made a prediction, and that prediction came to fruition: shock was predicted and it occurred, so there was nothing to learn. The fact that he was in a room with the tone and all these other things didn't seem to register nearly as much as the fact that the light was associated with the shock. And I guess you asked a number of questions, so let me think about them. I would be interested in asking: if we were to change the mathematics so that the x's are represented not in terms of zero and one, but in terms of minus one and plus one, would it fundamentally alter the results? I don't know; we would have to simulate it. But if we represent a stimulus as minus one, what we're saying is that we already know that that stimulus is not present, right? And there are many things that are not present in any particular situation. If you assume that you know it's not present, you're giving it importance, right? When you're in this room, there are many stimuli around you. If I say the light comes on and you get shocked, well, there are 50 other stimuli that were not on. If you assign minus one to those, you're saying I know these are relevant and they were not on, as opposed to just saying this one was on and it was important. So I guess it's a philosophical point, but I think it's reasonably important. You asked many questions, so I'm not sure if I addressed them all. To expand on what you said: maybe if a stimulus that's usually off goes on, it attracts your attention and therefore... Like, if you have light and sound, then you get shocked, but if you have light and sound and a door opens, then you don't get shocked. So the door opening inhibits the others... Yes, yes, yes, that's right, that's right. You would have to, right, right, yeah; that would also have to be repeated.
So I want to do an experiment next, because last week I mentioned to you this idea of looking at state estimation from the perspective of learning. These are goggles that have been slightly modified, and we're going to do an experiment, so I want a volunteer. So put on the goggles and come on up here. Yeah, this should be able to go around your glasses. It's perfect. All right, yeah, yeah, excellent. This is very cool. Don't move around too much. Stand still. Stand still, okay. So I'm going to take some data from you. Yeah, just go ahead. Touch the board so you get a sense of it. Are you left-handed or right-handed? Right. Okay, use your right hand, touch the board. Okay, now bring your arm down. It's all distorted with the goggles? We'll see in a moment. So I'm going to put up a dot here. Hold on, hold on to this. Yeah, so what you're going to do is try to hit the dot. Okay. Okay? Quickly? Yeah, just go ahead, something like this. Keep your finger down. I'm just going to go ahead and touch it, like this. Okay. Just go ahead and do it once. Okay. Okay. Put the dot there, okay. That's okay, that's all right. Go ahead and just make a rapid movement, just like you did. Let's do the dot again. Yeah, just keep looking at the dot, do it again. Do it again. Yep. You went past it, oh there you go. Nope. Just focus on the dot. Yeah, focus on the dot. Focus on the dot, yeah. Keep focusing on the dot. There you go. No, no, no, this is data. Nothing stupid about data. Keep doing it, excellent. Yep. Keep focusing on the dot. Okay, that's great. That's great, all right. Stand where you are. All right, go ahead and take it off. Yeah, hold on to it. Great. Okay, let's focus on the dot. Hit it again. Do it. All right, so you cheated, you cheated, right? So where did your arm go when you first tried? I think it went over here. No, put your hand up. All right, let's do this again. Put that on. Didn't it feel odd, whatever happened? No, it was in my head. So you were just doing fine, all right. So go ahead and hit it. Yep, no, I want you to hit it. Don't stop your hand, yeah. I have to go back to my resting position? Yeah, yeah, put it down. Give it a second before you come back up. All right, keep your finger down. I'm going to go ahead and remove it. All right, look at the dot. Make a nice quick move when I put it up. All right, good. So most people, what they do is that they miss and then they learn. Now I don't know how it is with you. What you are doing is that you're sort of going over here, but then you correct for it. Most people have what we call after-effects. In the 1960s, and I'm going to pass around some of these so you guys can try them on your own, the scientists became interested, want to come up here, want to try one?, in these things because the United States was sending astronauts to space, and space has, of course, no gravity or reduced gravity, and they were interested in understanding whether astronauts can adapt to the lack of gravity and compensate for it. So they began doing experiments like this. And, yeah, you guys, if you want to try it, basically you need to have a target; you've got to point to it and then see what that feels like. So one of the classic experiments in this field was as follows, and it had to do with the concept of prediction error. So Dick Held, who was nice enough to give me these originals, these are prism glasses; basically what's happening is that light is bent as it goes through them.
So if you look at somebody wearing these glasses, their eyes are pointing to the side, because for them, that's straight ahead. So what Dick Held did is he considered this experiment. He had people put on prism goggles, and then he put the subject's arm on a board, basically, and then he moved the board in front of them like this, so that they could see their arm moving, okay? And he compared that to when they themselves moved their arm. So if I myself move my arm, what happens is that I send the command to my arm and I see it move, just like what we saw here: he sends a command to his arm to go here, but he sees it out here, right? He misses it completely. So there's a prediction error, and trial after trial, he's learning from it. Now imagine that instead I had taken his arm and moved it for him, rather than him moving it himself. What Dick Held saw was that when people just saw their arm being moved, even though they were wearing the prism glasses, they didn't learn from that. But they did learn if they themselves moved their arm. So we think of it this way: you send the command to your body, you expect a certain sensory consequence, you expect to see your hand go someplace, but you don't, you see it go someplace else. So you have a prediction error, and that prediction error is the thing that you need in order to adapt, to learn. So it turns out that the ability to compensate for these kinds of perturbations depends on a part of our brain called the cerebellum. Patients with cerebellar damage, so here's the cerebellum, have significant difficulty learning to compensate for these kinds of perturbations. Okay, if there are no questions, any questions? Yeah. Yeah, it's probably involved in much more than motor learning, the kinds of learning that we've been talking about, because we know that it connects to many areas in the brain other than the motor areas, like the prefrontal cortex and the parietal cortex, but the field has been dominated by studies of motor learning. Although recently people have begun looking at learning in other kinds of domains, cognitive types of domains, to ask if there's a notion of an internal model associated with them, even though it's not an internal model of my own actions, but a model of how I believe the environment should behave. Now, this is a very different kind of memory system than what I told you about with H.M. So H.M. had an issue with his temporal lobe; here's the temporal lobe, and in the medial part of the temporal lobe is an area called the hippocampus. And this hippocampus seems to be important for our ability to remember our autobiography, basically what's happened to us in terms of events, faces, places that we've been in our recent past. And one can have no ability to remember that I've seen you before, that I've had you in a class, and yet be able to do something that suggests I remember something about the particular thing I learned here. So I learned in this room to perform an action, like compensating for prisms, even though I don't remember ever putting the prisms on my head or you who gave me the prisms. So the memory of being in a room and being given this thing that I put over my eyes is a hippocampal memory. The action that I just did, if I were to repeat it and get good at it, would be something such that the next time I put the prisms on, I know from the very first try how to do it.
Well, not the next time, but by the 20th time that I do it, I would know that these prism glasses alter the world around me, and I would know how to compensate for it on the first try. That would be a memory that does not depend on the hippocampus, but probably on the cerebellum. Yeah, a hippocampal lesion. Yeah, he had a disease that caused seizures in his brain, and because he couldn't live with his seizures, the surgeons removed his hippocampus, his hippocampi on both sides. It made the seizures go away, but it had this devastating side effect that he couldn't remember having been in a context that he had been in before. So psychologists use this word, declarative memory, to refer to the ability to remember that you know something, like when you get tested next, on Wednesday, on what you know; that's declarative memory. You can declare what you know. Why is it called declarative memory? Yeah, because it's a part of programming, right? When you think about programming, at the beginning of a program you declare a variable as an integer, or a real. That's the declarative part of the program. It comes from computer science. And then you have the procedural part, where you write down the algorithms, what to do with those variables, and that's the procedural part. So psychologists in the 70s and 80s, when they talked about memory, divided it up into two parts: declarative, things that you can declare, I know what x squared means, I know the map of Johns Hopkins; and then procedural, I know how to ski, I know how to play tennis. I can't tell you how to ski; I just have to show you. That's called procedural memory. These two terms come from computer science, from the way you program. Why do they come from computer science? Because the person who came up with these two terms was a psychologist, his name was Larry Squire, is Larry Squire, who at the time was around the AI lab, where people were talking about programming, and was introduced to these concepts of declarative and procedural programming. And so he thought, oh, those are the perfect terms to use to describe the brain and its forms of memory. Okay, thank you for your time. Quiz, or rather midterm, on Wednesday.