algorithm so new that it hasn't even been published; in fact it's in review as we speak. It's David's work, and it's pretty cool stuff, but to get to it I need to describe where we are with regard to the problem of learning and the state estimation framework I've been describing. We described the problem of learning from a Kalman filter perspective. We said that the objective of the learner is to estimate the state of the environment. In his mind he has a generative model that describes how the data he is collecting are generated. Based on that generative model, he or she estimates the parameters that make up the hidden state of that environment. And then we talked about how, using this hidden state approach, we can take observations, compare them to predictions, and then compute a change in our belief about the hidden state of the system. Now let me begin with a brief review of what state estimation predicts the learner should do in a situation where the noises are altered: when he should learn more or learn less from a particular prediction error. By the way, today's material is chapter 7. The material I'm going to describe from David's new work is not in anything: it's not in the book, it's not in any published work. You're just going to have to listen to me, pay attention, and write it down. You're going to have a homework based on it that David just put up on the website tonight, so you'll get an opportunity to think about it. Alright. So suppose that we have the state estimation problem.
We have a state x(n) = a x(n-1) + epsilon_x, where the state update noise epsilon_x has mean 0 and variance sigma^2_x, and we have an observation y(n) = x(n) + epsilon_y, where epsilon_y is normal with mean 0 and variance sigma^2_y. Now, when we describe the problem of learning here, we say that we're going to have some belief about the state at trial n given n. This is our posterior belief, and it depends on our prior belief plus this thing we call the Kalman gain, the relationship between the uncertainty in our belief and the uncertainty in our observation, times the difference between what we observed and what we predicted: x_hat(n|n) = x_hat(n|n-1) + k(n) (y(n) - y_hat(n)). And we wrote that for a scalar system like this, the Kalman gain is the ratio between my prior uncertainty and my total observation uncertainty, which in this case is just k(n) = P(n|n-1) / (P(n|n-1) + sigma^2_y). So if you look at this equation, k(n) is really what we might call my sensitivity to error. It tells me how much I'm going to learn from error.
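As a concrete sketch of the scalar filter just described, here is a minimal simulation of the variance and gain recursion (my own illustration, assuming a = 1, a starting uncertainty of 1, and the noise values used later in the lecture; the function and variable names are mine, not from any published code):

```python
def kalman_gain_sequence(sigma2_x, sigma2_y, n_trials, P0=1.0, a=1.0):
    """Iterate the scalar Kalman variance/gain recursion.

    Prior variance:      P(n|n-1)
    Gain:                k(n) = P(n|n-1) / (P(n|n-1) + sigma2_y)
    Posterior variance:  P(n|n) = (1 - k(n)) * P(n|n-1)
    Next prior variance: P(n+1|n) = a**2 * P(n|n) + sigma2_x
    """
    P_prior = P0
    gains = []
    for _ in range(n_trials):
        k = P_prior / (P_prior + sigma2_y)
        gains.append(k)
        P_post = (1.0 - k) * P_prior
        P_prior = a**2 * P_post + sigma2_x
    return gains

# Low state noise, high observation noise: the gain settles at a small value.
k_small = kalman_gain_sequence(sigma2_x=1.0, sigma2_y=2.0, n_trials=50)[-1]
# High state noise, low observation noise: the gain settles at a larger value.
k_large = kalman_gain_sequence(sigma2_x=2.0, sigma2_y=1.0, n_trials=50)[-1]
print(k_small, k_large)   # roughly 0.5 versus 0.73
```

Iterating the recursion shows exactly the convergence discussed below: with sigma^2_y = 2 and sigma^2_x = 1 the gain converges near 0.5, while with the noises swapped it converges near 0.73, so the noisier state equation makes the learner more sensitive to error.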
So if I make a prediction and I don't observe what I thought I should observe, I'm going to learn something about my prediction and improve my belief; but how much I learn from that error is this gain, which we've been calling the Kalman gain, the sensitivity to error. Today's lecture is about this gain k, and I'm going to show you that our theories based on state estimation only take us so far. When we begin to look at how biological systems, how humans, learn, we're going to see that the theory begins to fail, and we're going to need something else, a different way of thinking about things. But the state of the art is more or less here: there is an objective way to modulate your sensitivity to error, and it has to do with the Kalman gain, the relationship between your prior belief, which has uncertainty P in it, and your observation. So in 2008 people began to test this theory, to see whether in biological systems sensitivity to error was indeed modulated by the relationship between the uncertainty about what they believed and the uncertainty about what they observed. Just to give you a sense of what that means, stepping back: if I'm uncertain about my prediction, say I predict tomorrow it's going to rain, but it turns out I'm a pretty bad predictor and I don't have a whole lot of confidence in my prediction, then if tomorrow comes and it's sunny I'm going to change my predictions. I'm going to learn from my error, and I'm going to learn a lot, because I was very uncertain about my prediction. On the other hand, if I'm pretty darn certain about it, say, like George Bush, I predict that there are going to be weapons of mass destruction in Iraq, then by God there are going to be those things, and I'm going to be pretty darn hard pressed to believe that there aren't, because I'm pretty certain about my belief. I don't mean to say anything wrong about
George Bush, he's a fine president, only that he was pretty certain about his predictions, and of course you have to be in these scenarios. Okay, so in 2008 people began using these kinds of formulations to ask whether the way individuals learn from error could be thought about in the framework of uncertainty: does the uncertainty of the learner have something to do with the way they learn? The easy thing to manipulate is sigma^2_y. If I make it so that your observations carry a lot of noise, then what you should do is not learn, because the denominator in the gain becomes large and so k becomes small. So the easy manipulation is to make it so that when the learner observes the consequences of their actions, the observations are very noisy; their sensors are bad, and therefore they shouldn't learn much from their errors. To be concrete, look at k(n) as a function of n. We know k(n) converges to some value as you iterate, and this convergence depends on the noises: the noise in my uncertainty about the state and the noise in my observation. If sigma^2_y is 2 while sigma^2_x is 1, so the noise in the state equation is small and my observation noise is high, then k is going to converge to a small value. In the opposite scenario, where sigma^2_y is 1 and sigma^2_x is 2, my k is going to be larger, because I'm uncertain about my predictions: the state equation has more noise in it than the observation equation, so I'm going to be more receptive to my errors; I'm going to
learn more from my errors. So if the state equation noise is large, that makes me uncertain about my predictions; if the observation equation noise is large, that makes me uncertain about my observations. Either of these two can change the Kalman gain. Any questions? Okay, that's the theory. Rudolf Kalman, who's still alive, would say that's obvious, so what the heck is new? Well, what was new is that people in the world of learning thought: that's really cool, should we try it on people to see how they learn, and see if they modulate their learning based on things like this? So they began doing experiments where they tried to manipulate things like sigma^2_y and sigma^2_x. The first experiment along these lines was done in 2008, and the way they manipulated sigma^2_y, the observation noise, was by having people make movements but then giving them feedback that was like a blur. Suppose you're asked to move and hit a target, but I'm not going to show you your hand; what I'm going to show you is a blur that's supposed to represent your hand. Sometimes the blur is tight, sometimes it's wide. When it's tight, you have a small variance about what you're seeing; when it's wide, you're very uncertain about what you're seeing. And what they saw was that indeed, in scenarios where the feedback was a wide blur, individuals didn't learn much from their error, whereas when the feedback was a sharp representation of the consequences of their action, they learned more from it. Seems pretty obvious that that should happen, right? Now, is that because of the Kalman gain? Well, we don't know. Maybe if you just give cruddy feedback to people, they say: I don't know what's going on here, why should I learn anything? But nevertheless, that was the first example of a scenario where, by making the feedback more
noisy, individuals became less willing to learn from it. So does that make sense? Making the feedback blurry is effectively making your sensory system worse, like putting on glasses that fuzzify the environment around you: you're just not going to learn as much from your errors as if you could see things clearly. Going back to the weapons of mass destruction example: maybe he thought their sensors weren't very good, they just couldn't find the weapons, and that's why he wasn't learning from the error. Maybe it wasn't his priors; maybe his observations were noisy, or he just didn't believe the people that reported the results to him. One could make these judgments too. All right, so now suppose we want to change the state noise. Let me tell you about the experiment that was done along those lines. Again these are movements that individuals are making, and we have some state x that depends on the previous state plus noise, and then a perturbation r = r0 + x. So there's a state x that alters my perturbation. In some cases the state noise sigma_x is large; in other cases it's small; and on any given trial the perturbation changes from trial to trial. What the subject sees is where their hand went, plus this perturbation, plus some sensory noise. The objective is to predict the perturbation: they form r_hat, which depends on what they believe x to be. In this experiment they manipulated sigma_x and considered two scenarios: one where sigma_y was equal to four and sigma_x was equal to one, and another condition
where sigma_y was equal to four and sigma_x was equal to, I think, two and a half or something like that. So they made the perturbation change value from trial to trial according to the noise sigma_x: in one condition it changed a lot, whereas in the other it was fairly persistent. And what they found was only weak evidence for the effect. Plot error against trial number: on the first trial the two groups have the same error. In the condition where sigma_x is one, the prior uncertainty tends to be smaller than in the condition where sigma_x is two and a half. My prior uncertainty is higher when sigma_x is two and a half, so I should learn more from error there; my prior uncertainty is lower when sigma_x is one, so I should learn less from error, and my learning rate is going to be slower when my prior uncertainty P is smaller. If I'm more certain about the dynamics of the stimulus, I'm going to learn less from my error than if I'm less certain. So just to be clear: there was a small bit of evidence suggesting that if the dynamics of the perturbation are described by a state update equation with large noise, the large noise implies more uncertainty about what's going to happen, my belief uncertainty P is higher, that translates into a larger k, and that translates into a larger sensitivity to error. In that 2008 paper there wasn't a significant difference between the two groups, but it was in the right direction. Questions? Yes, it's more complicated, right, because let's remember what the prior uncertainty is: P(n+1|n) is equal to P(n|n)
times a^2 plus sigma^2_x. So you see a is also important here in our equation, but so is sigma^2_x; and then sigma^2_y comes in through the gain. So the evolution of the prior uncertainty depends on sigma_x, and it is always being divided by a term that adds sigma_y to it. Good question. Why doesn't the error itself matter? Because the error is unrelated to my uncertainty: it depends on my prediction, on the mean, not the variance. If on average these two groups have the same mean in their prediction, so I move like this, I expect the cursor to go here, but it actually goes there, then the error has the same mean. It starts out with the error basically equal to the perturbation: I generate this movement, I get this error. Now the question is how much I learn from that error, and the amount that I learn depends on how much I believe this error that I see, compared with my own belief about what should happen. Does that make sense? Error on the first trial is controlled by the environment, not by my uncertainty about the environment. I make a prediction about tomorrow's weather; what actually happens tomorrow is independent of my uncertainty. I will just have some error that depends on the mean of my prediction and the actual observation. All right. The next thing that was done along these lines was to try to see how one could manipulate these errors and uncertainties, and this k, which is basically our error sensitivity. I want to mention what worked and what really didn't work so well. What didn't work very well were attempts to make people better learners. They could make people learn worse by adding noise to their
observations, but they couldn't really make them any better in their learning. It was very hard to make learners learn more from error than they would normally under reasonable circumstances. A couple of manipulations that did succeed, around 2009, were scenarios where, presumably, they made the person more uncertain, and the way they did it was as follows. One of them was to have people sit in darkness before testing them. The idea was that maybe my uncertainty about the sensory consequences of my motor commands becomes large if I haven't been able to see those consequences: if you've made predictions but never found out what happened, maybe you become more uncertain about your predictions. So they measured k, learning from error: how much you change your belief about the world as a function of the error you saw. There were three conditions. Move with normal feedback: you get some baseline value of error sensitivity. Move without feedback, so maybe you make a hundred movements and are never shown anything, and then all of a sudden you're shown the result of one of them: the person seems to learn a little bit more from that error. And if they made it so that you just sat silently in a room for a few minutes before being tested, making no movements at all, error sensitivity increased as well. It wasn't clear why these things were working. Why does it matter that the individual sits there for a while before being allowed to make a movement? Does this process of waiting somehow make
people uncertain? But it was interpreted within the framework I've been telling you about: if they're learning more from the error, it must be because they were made more uncertain about their prediction, and that resulted in the particular behavior you saw. A more systematic way to approach this problem puts aside the Kalman filter approach and asks: in principle, what should a good learner do in a given circumstance? It has to do with the following. If your environment is changing rapidly from trial to trial, versus changing slowly from trial to trial, maybe that rate matters. Maybe I can do better if my world is consistent and stable when I make predictions about it, whereas in a world that seems to be flipping and flopping from trial to trial, maybe it makes no sense for me to try to learn anything from error at all. So in 2004 there was a student in my lab, Maurice Smith, who did a kind of nifty experiment. He imagined a scenario like this: you have an environment and you make observations as before, but he manipulated the parameter a, and the logic was as follows. If my posterior is x_hat(n|n) = x_hat(n|n-1) + k(n) (y(n) - y_hat(n)), then my prior on the next trial is x_hat(n+1|n) = a x_hat(n|n) = a x_hat(n|n-1) + a k(n) (y(n) - y_hat(n)). So we see that a is a modulator of how much I learn from error. If you compare my guess on the next trial to my guess on the previous trial, you see that if I'm coming from a world where a is large, close to 1, I should learn more than if a is 0. If a is 0, then x_hat(n+1|n) has no relationship to anything in the
past; its expected value is going to be 0, so I'm not going to learn anything from my error. If on the other hand a is equal to 1, then my prior on trial n+1 is x_hat(n|n-1) + k(n) (y - y_hat), so in this case the amount I learn from error is k(n), whereas if a is 0 the amount I learn from error is nothing. So the idea was: if the perturbation is controlled by this parameter a, we can compare a perturbation generated by a random walk in which a is close to 1, versus one where it's close to 0, versus one where it's close to -1. In these three scenarios the learner should behave differently. Where a is close to 1, we should see a lot of learning from error. Where a is 0, we should see almost no learning from error. And where a is -1, if this theory is right, when they see an error to the right, instead of learning from that error by moving to the left they should actually move further to the right, because they expect the perturbation to flip sign. That was the basic idea. Let me show you what this means in terms of generating perturbations. Suppose you want to generate a perturbation in which a is 0.9. That means the perturbation is a random walk that moves slowly, because the state on the next trial is highly correlated with the state on the previous trial; you can't jump very far. When a is 0, you have basically an uncorrelated sequence. When a is -0.9, almost -1, you have an even higher frequency signal: it's flipping back and forth. And what Maurice did is that he had people sit in environments in which the perturbation was drawn from these kinds of random walks: one where a was almost 1,
highly correlated, so the perturbation on one trial was highly correlated with the previous trial; one where it was 0, so there was no correlation between the current perturbation and the last; and one where it was negative, so if you got perturbed to the right you were very likely to get perturbed to the left on the next trial. What he saw was that k, how much people learned from error, varied across the conditions a = -0.9, a = 0, and a = +0.9. First of all, k never became negative: people could not make their learning go opposite to the error. But they learned least in the negative condition, a little bit more at zero, and a little bit more still at +0.9. This was learning from error: how much they changed their belief from one trial to the next as a function of the error they had experienced. This was the first indication that one could alter one's learning rate based on the dynamics of the perturbation. Question: yes, I'm going to give you examples from hidden Markov models, where there's a state in the environment and a probability of staying in that state versus changing to another state. It would be a flip, basically: a system that has two states, where in one case the current state is very likely to change, versus one where it's likely to stay. Okay. Where we're going with today's lecture is a way to understand how, in principle, a system could change its learning rate so that, if the world really is like that, the learning rate would become negative. So this is David's work, work he's done in the last year. We're going to come up with an algorithm, and what's cool about this algorithm is that it's going to help us understand a number of problems in learning theory, from the perspective of how we remember how to do things, and why the second time we do a task we're better at it, and so forth.
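To see what these three perturbation schedules look like, you can simulate the random walk x(n) = a x(n-1) + noise directly and check the trial-to-trial correlation. This is a minimal sketch of my own; the sequence length, noise scale, and seed are arbitrary choices for illustration:

```python
import random

def ar1_perturbation(a, n_trials, sigma=1.0, seed=0):
    """Generate x(n) = a * x(n-1) + gaussian noise: the perturbation
    schedule that defines the learner's environment."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n_trials):
        x = a * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def lag1_corr(xs):
    """Sample correlation between consecutive trials."""
    mx = sum(xs) / len(xs)
    num = sum((xs[i] - mx) * (xs[i + 1] - mx) for i in range(len(xs) - 1))
    den = sum((v - mx) ** 2 for v in xs)
    return num / den

for a in (0.9, 0.0, -0.9):
    print(a, round(lag1_corr(ar1_perturbation(a, 5000)), 2))
# a = 0.9  -> strong positive lag-1 correlation (slowly drifting walk)
# a = 0.0  -> near zero (no correlation from one trial to the next)
# a = -0.9 -> strong negative correlation (flips sign trial to trial)
```

The three printed correlations correspond to the three environments in the experiment: a world that persists, a world with no structure, and a world that reverses, which is exactly what should drive the learner's sensitivity to error up, to near zero, or, ideally, negative.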
I'm going to tell you more about those things when we talk about problems like that in the next lecture; today it's just the basic algorithm and where it comes from. Recently we discovered that this basic algorithm I'm going to tell you about is similar to rules described in the early 90s for systems that learn from error in neural networks. We're going to see that the basic rule is very similar, but it has this interesting property: it is associated with the errors themselves. The basic idea in David's algorithm is this. If you look at the way we've been doing things, the gain k is independent of error. How much I'm going to learn from error is unrelated to the error itself; k is a modulator of error. What this means is that in the Kalman filter approach, if your gain has been increased, it's increased for all errors. In the world of the Kalman gain we can make you a better learner independent of any error that you might see. We're going to see evidence that that is not true: when you become a better learner, you become a better learner for specific kinds of errors. You aren't going to become a better learner for all errors, only certain kinds, and the question is why. What's so special about those errors? So let me go over this again. In the mathematics we've described so far, k is a modulator: it says I'm going to learn more or less from error depending on my uncertainty, but it does not depend on the error itself. It doesn't say for these errors I'll learn more and for those errors I'll learn less; it does it for everything. Question: doesn't your current state uncertainty depend on your previous history of errors? No, it doesn't depend on the history of errors; it depends on the history of trials, on the x's that you saw, not the errors. P doesn't depend on y;
P depends only on the last P. And k depends on the noises; in these linear models it makes no difference what errors actually happened. Okay, let me show you why the thinking began to change. David began doing experiments as follows. Suppose we are living in a world in which there can be two kinds of perturbation, a -1 perturbation and a +1 perturbation, and there's a probability z of staying in the current state versus switching. So if I'm in perturbation state +1, there's some probability the perturbation is going to continue, but there's also some probability it's going to change. For example, suppose z is 0.9. What does that mean? If I draw for you the state of the perturbation, r = +1 or -1, and I start at -1, I'm likely to stay, stay, occasionally switch, stay, and so forth. On the other hand, if z is a small number like 0.1, then if I'm at +1 I'm very likely to change; the perturbation flips back and forth and I'm not going to stay very long. Does that make sense? Okay. Now, what should the learner do? The learner makes a prediction on a particular trial; that prediction reflects the actions he takes as well as this perturbation. If the world is stable, meaning z is close to 0.9, then what I learn is going to help me on the next trial; it's good, because the world is the same. But if the world is going to change, what I learn is actually going to hurt me. So when z is 0.1 I should stop learning; when z
is 0.9 I should increase my learning; and z equal to 0.5 is my neutral state. So David put people in these kinds of experiments and measured learning from error. Of course, you first have to relate this to error, because these are perturbations, not errors. Perturbations are things added to actions to produce error; error is the difference between y and y_hat. Learning from error looked like this: over, say, 500 trials, people start out learning by some amount; people in the z = 0.9 condition come to learn more from error; people in the z = 0.1 condition come to learn less from error; and people in the z = 0.5 condition, an environment in which the probability of staying is about the same as the probability of changing, don't change how much they learn from error. So learning from error could be modulated by this probability associated with the state of the environment. Question: if we predict something and we're wrong, then we know the world has changed; we predict again and it's correct; so shouldn't the maximum uncertainty be when the world might or might not change? What you're saying is: if we know the world will remain the same, we learn a lot. Yes, because it's going to help me next time. And if we know for sure the world will change, then I should learn in the opposite direction. Yes, exactly. So in the world where z is 0.1, this curve really should continue on down, not go to zero but become negative. We're going to get there. Nobody has ever shown that, but it would be reasonable. Okay. The second experiment that David did was this: all
right, those were different people: some people are in the group that can increase their learning from error, and some people are in the group that can decrease it. So can the same person do both? He did an experiment that began with z = 0.9, then z changed to 0.5, and then z changed to 0.1; and there was another group that started at z = 0.1, went to 0.5, and then went to 0.9. Again he measured learning from error against trial number, and what he saw is that the first group starts out high and comes down, while the second group starts low and goes up. Okay, so people seem to change how much they learn from error depending on how the world is changing. If the world changes according to a Markov model that says the perturbation is likely to stay, they up-regulate their error sensitivity; and when the world changes such that one time you get one perturbation and the next time you get the opposite perturbation, learning from error seems to go down. What was interesting about this result is that he then looked at the amount of learning as a function of the error itself: learning from error as a function of error, with zero error here, big positive errors on one side, big negative errors on the other. He could test between two important hypotheses: in the stable world, am I a better learner for all errors, or only for some particular errors? And in the unstable world, am I a poorer learner for all errors, or again only for some? What he saw was a function that looked like this: over some range of errors there was up-regulation or down-regulation of the sensitivity to error, whereas for
the larger errors there didn't seem to be much change at all. So it wasn't the case that people were changing their sensitivity for all errors; they seemed to be changing it for specific errors, and it certainly wasn't clear why those errors. It turns out those errors were the most likely errors: if you plot the probability of error, it has a peak right over that range. These just happened to be the errors that they actually experienced the most, and they modulated error sensitivity for those errors in particular. Okay, so then he came up with an algorithm. He asked: is there a principled way by which we should change how much we're willing to learn from error? Suppose we have a very simple learning rule with a state estimate x_hat, where x_hat(n+1) = x_hat(n) + eta(e(n)) (y(n) - y_hat(n)), and e(n) = y(n) - y_hat(n) is the error. Note that eta here is a function of the error, a learning rate that depends on e, not a constant multiplying e. What we want to know is whether there is a principled way to change this error sensitivity function eta, and how to do it. The thought was as follows. Suppose that on trial n-1 I generate a command; it produces some consequence here, and my target is there; that's my error. On trial n, the next trial, I make a better command and get a little bit closer. So now the sign of e(n-1) times e(n) is
a positive number. If that's the case, then what happened is: I generated an action, then generated another action, and I got a little bit closer. If I look at what happened with my errors, these two errors have the same sign. This world is stable, so I should up-regulate my sensitivity to error; I should increase my eta here. On the other hand, suppose I generated a command and got an error (here's what I should have done), and then I generated another command and got a different kind of error. Here the sign switched. That wasn't good; my world changed on this trial, so I should down-regulate my error sensitivity.

So the algorithm goes as follows. Suppose that you represent eta as a function of your error with a set of weights on a basis set that encodes error. That is, suppose there is a mechanism by which your nervous system encodes errors: one basis function prefers small errors, another prefers large errors. Basically it's an encoding of the error space: am I getting a small error or a large one? Each basis function g_i is a Gaussian centered on some location e_i, the preferred error size of that basis:

g_i(e(n)) = exp( -(e(n) - e_i)^2 / (2 sigma^2) ).

If I get an error here, it activates one basis function; if I get a larger error, it activates a different basis function. Now the algorithm for changing eta goes as follows. The set of weights w that describes my sensitivity over this error space changes as

w <- w + sign( e(n) e(n-1) ) * g(e(n-1)) / ( g(e(n-1))' g(e(n-1)) ),

where e(n) is my error now. This is the rule that I'm going to use to change my sensitivity to error. Let me give you an intuition about what this means. Suppose that I'm going to plot for you now
learning from error as a function of error, and suppose I begin with a sensitivity that is the same for large errors and small errors. What this says is: if I have an error that's positive, I'm going to learn from it and move in one direction; if I have an error that's negative, I'm going to learn from it and move in the other direction. In this case learning from error is uniform; I learn the same amount from every error. Now suppose I get an error here, e(n-1). If the next trial's error has the same sign, then sign(e(n) e(n-1)) is positive, and the rule increases the weight, which means I'm going to learn a little bit more on the next trial: you see, it increased my learning at this particular location. If e(n-1) and e(n) have the same sign, the same direction, this algorithm increases the sensitivity at that particular error location; if, on the other hand, the two errors have opposite signs, it reduces the sensitivity there. [In answer to a question:] it's how much I change my estimate on the next trial, given that I saw this error on the previous trial.

So this is a local rule. All you need to know is the last error and the current error: were they of the same sign? If they were, you should increase your error sensitivity at the particular error that you saw; if they were of opposite sign, you should reduce your sensitivity. This rule reproduces these kinds of behaviors, but it also makes a nice prediction, and the prediction is as follows: if I change my sensitivity to error, I will have done so for a particular error size. I will not be a better learner everywhere; I will be better only for those errors that I saw. Here's the test of that prediction. The experiment that David did was as follows. He said,
suppose that I give you perturbations that look like this. First I give you a perturbation, say it goes to plus 8, and then I measure your response to a minus-4 perturbation; the axis here is trial number. You move, I give you a plus-8 perturbation, and I notice how much you learn from it; some time later I give you a minus-4 perturbation and I notice how much you learn from it. That's your baseline. Then what I'm going to do is make you reduce your sensitivity to plus-8 errors and simultaneously increase your sensitivity to minus-4 errors. The idea is: can I make it so that for one kind of error you learn a lot, while for a different kind of error you learn nothing? Can I make you so sensitive to certain errors that you learn very much from them, while for other errors you refuse to learn at all? Because that's the prediction of this model: I represent my error space using a basis set, so I am sensitive to specific errors.

So the experiment goes as follows. Measure the sensitivity to a plus-8 perturbation and also to a minus-4 perturbation. Now, how am I going to make you reduce your sensitivity to plus-8 errors? I give you that environment: plus 8, minus 8, switching back and forth. If I make z, the retention of the perturbation, equal to 0.1 in that world, you should say: when I see a plus 8, on the next trial I'm very likely to see a minus 8, so why the hell should I learn from that? So that's what he does, and this makes sense: a plus 8, minus 8, plus 8, minus 8 schedule gives a scenario where the error is positive on one trial and negative on the next. I should reduce my sensitivity, because sign(e(n) e(n-1)) is going to be negative, but it's
going to be negative for that particular error size, plus and minus 8. Next, he gives you these perturbations; look what happens here. Now when you get a plus-4 perturbation you're going to have an error, and it's going to be sustained, so you'll learn these things. This schedule gives you errors that are sustained; it's coming from an environment in which z is almost one (it is one: the perturbation stays), but those errors are half the size of the earlier ones. So for plus-and-minus-8 errors I should reduce my sensitivity, and for plus-4 and minus-4 errors I should increase it: simultaneously, I should increase my sensitivity to plus-or-minus-4 errors and reduce my sensitivity to plus-or-minus-8 errors.

How do we test for that? We again give you these two probe perturbations, and what you should see is: here I'm going to learn a lot, because I've seen these errors before and they were stable; here I should ignore the error, because it's going to go away. And that's what happens. People simultaneously increase their sensitivity for plus-or-minus-4 errors and decrease their sensitivity for plus-or-minus-8 errors, which we think comes from something like this rule. All it needs is the history of the errors: when that history suggests the environment is stable, I'm willing to up-regulate my sensitivity for those errors, and when it suggests the environment is transient, I down-regulate my sensitivity for those errors.

Okay. What's their homework, David? Okay. One trial or many trials? Many trials. Okay. So you're going to get a chance to try the algorithm and see if it works for you. If it'll help you guys, we can send you the paper so you can see it in detail, since you don't have any text to read this in. I don't know, it's up to you, David, if you want to send it to them. Yeah. Don't submit it! All right guys, see you Wednesday.
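The simple learning rule discussed in this lecture, x hat(n+1) = x hat(n) + eta(e(n)) * e(n), can be sketched in a few lines of Python. Note that eta is a function of the error rather than a fixed scalar; the constant sensitivity value used here is an assumed placeholder, just to show where the error dependence enters.

```python
# Minimal sketch of the error-dependent learning rule.
# The constant sensitivity in eta() is an assumption for illustration.

def eta(e):
    """Error sensitivity as a function of error size (constant for now)."""
    return 0.2

def update(x_hat, y):
    """One trial: predict y_hat = x_hat, observe y, learn from the error."""
    e = y - x_hat              # e(n) = y(n) - y_hat(n)
    return x_hat + eta(e) * e  # x_hat(n+1) = x_hat(n) + eta(e(n)) * e(n)

x_hat = 0.0
for _ in range(50):            # a constant perturbation of size 1
    x_hat = update(x_hat, y=1.0)
print(round(x_hat, 3))         # converges toward 1.0
```

With a richer eta(), the amount learned per trial can differ for different error sizes, which is exactly the degree of freedom the algorithm below exploits.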
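The perturbation protocol can be simulated with the same sign rule: an alternating plus-8/minus-8 block produces opposite-sign consecutive errors and should drive sensitivity down near 8, while a sustained plus-4 block produces same-sign consecutive errors and should drive sensitivity up near 4. All constants here (centers, sigma, beta, block lengths) are again assumptions for illustration.

```python
import numpy as np

centers = np.linspace(-10, 10, 41)   # preferred error sizes (assumed)
sigma = 1.0                          # basis width (assumed)
w = np.full(len(centers), 0.2)       # uniform initial sensitivity (assumed)

def g(e):
    """Gaussian basis encoding of the error."""
    return np.exp(-(e - centers) ** 2 / (2 * sigma ** 2))

def eta(e):
    """Error sensitivity: weighted sum of basis functions."""
    return float(w @ g(e))

def adapt(e_prev, e_curr, beta=0.05):
    """w <- w + beta * sign(e(n) e(n-1)) * g(e(n-1)) / (g' g)."""
    global w
    gp = g(e_prev)
    w = w + beta * np.sign(e_curr * e_prev) * gp / (gp @ gp)

eta8_before, eta4_before = eta(8.0), eta(4.0)

# Alternating +/-8 block: consecutive errors flip sign every trial.
for n in range(20):
    e_prev, e_curr = (8.0, -8.0) if n % 2 == 0 else (-8.0, 8.0)
    adapt(e_prev, e_curr)

# Sustained +4 block: the error keeps the same sign trial after trial.
for n in range(20):
    adapt(4.0, 4.0)

print(eta(8.0) < eta8_before, eta(4.0) > eta4_before)
```

The two comparisons mirror the behavioral result: sensitivity to errors near 8 falls while sensitivity to errors near 4 rises, simultaneously, from a single local rule driven only by the history of errors.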