Alright, as a reminder: so far in the class we discussed the maximally informative nonlinearity for a single neuron. If the neuron is not noisy, the optimal nonlinearity is the cumulative distribution function of the input signal; if it is a noisy binary neuron, the optimal solution is to cut the input distribution in half. Then we discussed maximally informative solutions for two neurons and how they coordinate their thresholds. Last lecture we discussed optimal filtering in the linear multidimensional case, and today I will give some additional examples of optimal filtering in the multivariate case, because my sense is that I wasn't sufficiently clear. The plan for today is also to prepare for analyzing the multidimensional case with multiple neurons. For that we need to understand how to read out information from large neuronal arrays, so I will present some background material from the information theory textbook, and you can let me know if it is too familiar. The necessary material includes the data processing inequality, Markov chains, the chain rule for information, and sufficient statistics. Then I will derive the sufficient statistic for a large neural population and show that it actually exists: there is a simple readout. Then we will talk about computing this formula for large neural arrays and apply it to retinal data. Depending on questions, that material will fill some of today's lecture and some of Monday's.

To go back to optimal filtering in the multivariate case, I want to begin with a demonstration, a little psychophysical experiment, so you can see the effects I was describing on yourself. The experiment goes like this. You see these two photographs? They are identical, and there is a blue dot we are supposed to look at. Right now the images are identical. Next I will show you versions where one image is blurred and the other sharpened, so the frequency content is changed. Keep looking at the blue dot for maybe 30 seconds; then I will switch back to the original two images and you will tell me what you see.

I did this over Zoom, so I am hoping it worked. What you were supposed to see is that once the two eyes, or different parts of the visual field, are adapted to different frequency content, then when you switch back to the original images there is a transient: the image whose adapting statistics were sharper now looks blurry, the one that was blurry looks sharper, and then everything normalizes back to baseline. Did everybody see that, or do I need to make it bigger? I think so. Okay. I like this illustration because what I was talking about last lecture pertains to single neurons, but because every single neuron does it, you can see that the effect is real at the level of the whole brain.

Here is a little more background on the experiment I described last time. In this case the neurons, in the cat, are presented either with white noise or with natural scenes. White noise has a flat spectrum in the Fourier domain, whereas natural scenes have this 1/f structure (what is plotted is actually the logarithm of the power spectrum), so they are much more correlated.
Then, if we have natural scenes passed through the filter and a nonlinearity: this is the data I described last time. The stimuli are shown in cross-section; natural scenes have more power than white noise, and the neural filters change within a certain range of spatial frequencies, at low frequencies, and after that they converge to the same filtering profile. If you look in the temporal domain, in the same range of spatial frequencies the change goes in the opposite direction: because natural scenes are smoother at high temporal frequencies, there is an extra enhancement at low spatial but high temporal frequencies, and then again the curves match the same filtering profile. So that's the review from last lecture. Any questions on the filtering part? Is it clear what the relation is between these plots and the experiment we did with the two faces? There is also adaptation to other, higher-order parameters, such as facial characteristics. Any questions?

Okay, so I have a question for you. Is the silence because everything is completely clear, or because you are completely lost? Which of the two? "Completely clear." Okay. Can you express that? Give me a starting point, where can I find you? "I'd say I'm a little bit lost, but I'm not sure at what point." I see. Let me try again, maybe in a different way, with a different slide.

The review is this: we have a linear system. One of the ideas of linear filtering in the multivariate case is that the stimuli, as a function of, say, spatial frequency, are not equally used; for example, one frequency can be over-represented in the input compared to others. In the natural world the low frequencies dominate, so there is more input power at low frequencies than at high frequencies. At the output, to maximize information, I would like to use all channels at equal power, because they all have the same limitations in terms of energy. Of course at some point the noise will dominate, but suppose we stay in a range where the neural noise is roughly constant. In that range, if the input power decreases with frequency, then because the low-frequency stimuli are overused, I reduce the selectivity to them in such a way that, after taking the product, the output is equally used across channels.

So that is the statement for the linear system. Now, as with that image of the girl that was blurred or sharpened: when an image is blurred, the low spatial frequencies are emphasized; when it is sharpened, the low frequencies are downgraded and the high frequencies enhanced. Is that okay? Any questions about sharpening?
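To make the equal-power argument concrete, here is a minimal numerical sketch, assuming a noiseless linear channel and a 1/f² stimulus spectrum; the frequency grid and variable names are illustrative, not from the lecture's data:

```python
import numpy as np

# Frequency channels and a 1/f^2 natural-scene power spectrum (illustrative).
freqs = np.linspace(0.1, 10.0, 100)
stim_power = 1.0 / freqs**2

# Whitening choice: filter amplitude |K(f)| proportional to S(f)^(-1/2),
# so that the output power S(f) |K(f)|^2 is the same in every channel.
filter_gain = 1.0 / np.sqrt(stim_power)
output_power = stim_power * filter_gain**2
assert np.allclose(output_power, output_power[0])  # all channels equally used
```

In a real system the 1/√S rule only holds while the signal stays well above the neural noise; at high frequencies, where noise dominates, the gain must roll off instead.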
One test of these ideas (this example is from my own work, but there are many other papers) is to compare how neurons respond when you change the input distribution. When the stimulus changes, how should neurons change their selectivity? The theory predicts, for example, that they should change so as to keep the product of the stimulus power spectrum and the filter power spectrum constant. But we have to see how exactly that happens, because a real neural system may not be able to fully adapt, due to various constraints. It could be tuned to the natural scene statistics that are common and be unable to change, or there could be some capacity for adaptation to local stimulus statistics. Is that okay? I think I need some questions, because the probability that everything is clear is approaching zero. "I think this was very explicative." Okay, but there are probably other questions. Carlos, are you okay so far? You should switch on the microphone. "Yes, I hear well."

So the next slide: there is a series of papers probing different stages of neural processing with stimuli that gradually approximate the natural world. You can have photographs or movies, and you can have white noise. In one case you match the mean, for example the luminance, between white noise and natural scenes; you can also match the overall variance, but the structure of the power spectrum is still different. Then there are other experiments which match the power spectrum but not the burstiness present in natural scenes, and you ask whether neurons can adapt to those higher-order changes. In the case shown here, you probe neurons with stimuli with different power spectra, which is second-order statistics, and ask how the nonlinearities change. Did I recapture part of it? Okay, we have a question; we need the microphone.

"What is a in that image?" You mean here? a is a normalization scale; think of it as the pixel size. I need the spatial frequency times a length scale to make the argument dimensionless. "Okay, thanks."

There are many questions one could ask here. For example, notice that in the natural world the spectra are all roughly 1/f, decreasing with frequency, but the horizontal and vertical components decay more slowly than signals at other orientations. So the natural scene power spectrum is not isotropic. Any guesses why? What is the physical force that creates the dominance of horizontal and vertical components in the natural world? "Gravity." Right, thank you. Things are either standing up, or they have fallen down and are lying down. You would think it would be easiest to see this in offices or other man-made environments, where vertical and horizontal edges dominate, but this measurement is from a forest, and you still see the dominance of horizontal and vertical components. In fact, some people turn the argument around: given that this is the statistics of signals in the natural world, and given that in the brain there are more neurons coding vertical and horizontal directions than oblique ones, so that human perception is better at horizontal and vertical, that explains why we go to such great lengths to create man-made environments where these orientations dominate.
Okay. In this particular case, the white noise and the natural movies were each presented for about 10 minutes, so there is a limit to how much the neurons can adapt within those 10 minutes. These are the results I showed: in the linear model, the prediction is that stimulus power times filter power should give you a constant output. But now we have a nonlinear model, stimulus times filter followed by a nonlinearity. Here the thought was that the filtered stimulus, the quantity that passes through the nonlinearity, is no longer constant across frequency, but its distribution should be the same across different stimulus ensembles. So when the stimulus distribution changes, the filters should change, but the distribution of the filtered stimulus, whatever it is (we don't have a theory for what the optimal filtered-stimulus distribution should be), should stay constant if we are working in the more general case of a linear-nonlinear system. Any questions about that?

We have a question; she cannot be heard without the microphone. "My question is why the filter changes in this case. If we just change the stimulus, why does the filter change as well?" Well, one possibility is that the filter doesn't change. Imagine we know that natural scenes have certain statistics; we build a filter that is optimal for those statistics, the animal has those filtering properties, and they never change. Then, going back to the linear case, if the stimulus changes but the filter doesn't, the output will no longer be uniform and the information will not be optimal for the new stimulus ensemble; it will be optimal for the first ensemble but not the second. So the reason to expect changes in the filter is that, when the stimulus statistics change, if you want to maintain efficiency across different environments, the filters have to change. They may not be able to change fully; there can be limits on how fast they change and at which frequencies. But if they are completely rigid, they cannot be optimal for different stimulus distributions. Another way of restating this is that optimality is always tied to the statistics of signals in the natural world.

A related point: if you measure the power spectrum of natural scenes and then go to the beach or to the city, this 1/f² behavior is surprisingly robust to changes of environment that are salient to us. There are only small deviations in the exponent, roughly 1/f^(2 ± 0.15) depending on the environment. So you can imagine that if neural systems are designed to process natural scenes with this 1/f² structure, they will be reasonably optimal under all likely natural scene distributions, because the variation in the exponent is small. What we are more sensitive to is changes in the higher-order statistics, but that was not tested in this particular experiment; this experiment tests changes in the second-order statistics, and we observed that there is some capacity for change at certain frequencies.
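Here is a one-dimensional caricature of the "constant filtered-stimulus" prediction, assuming adaptation amounts to a pure gain change and the two environments differ only in stimulus variance; everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two environments that differ only in stimulus scale (std 1 vs std 4).
s_low = rng.normal(0.0, 1.0, 100_000)
s_high = rng.normal(0.0, 4.0, 100_000)

# If adaptation rescales the filter gain by 1 / std(stimulus), the filtered
# stimulus handed to the fixed nonlinearity has the same distribution in
# both environments -- the "constant filtered-stimulus" prediction.
k_low, k_high = 1.0 / s_low.std(), 1.0 / s_high.std()
print((k_low * s_low).std(), (k_high * s_high).std())  # both ~1.0
```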
Here I am showing examples of how the filters change: the same neuron under noise and under natural scenes. The filter has different temporal frames, shown here at 30-millisecond resolution, and you can see that the filter is similar: it has the same orientation, the neuron doesn't change its orientation, but there are changes in the spatial surround structure. (There is some blurring going on that is not in the data but comes from the presentation software.) What you are supposed to see is that the filter is broader in the case of noise than in the case of natural scenes, and that is quantified in this graph: for natural scenes, in blue, there is less sensitivity to low spatial frequencies, and for white noise there is more sensitivity to low spatial frequencies, or the tuning is broader. Any questions? I'd like to recapture the audience.

"Thank you. Sorry, just trying to recap a little bit. My question right now is: there is an input, and then there is a filtering before the neuron receives the input. Is that correct, or am I wrong?" It's almost correct. The way we model the neuron here is that there is a stimulus, and this part is the neuron: the neuron includes the filtering and the nonlinearity. Now, it's important to realize the limitations of this model. If we're talking about a neuron in the visual cortex, filtering and nonlinearity happen at many, many stages, so this is an effective filter and an effective nonlinearity summarizing all of those stages. In other words, imagine the nonlinearity weren't there: the initial neurons in the retina would filter signals with small pixels; then, still in the retina, they would average over bigger regions; and then the averages would be aligned so as to obtain selectivity for an edge. The final filter summarizes all of the processing from the photoreceptors to that part of the brain, say primary visual cortex. In reality there is filtering and nonlinearity at many stages, and we summarize it as one effective filter and one effective nonlinearity. In principle, generalizations of this model include multiple filters, with a nonlinearity that depends on all of them, which you would also be able to estimate. Please continue with your question.

"Thank you. And the spike, is it represented in us seeing the picture? What is the spike?" The spike represents the response of one neuron, and many neurons collectively result in you seeing the picture. We will talk more about the multi-neuronal case later, but for now it's a single neuron. Any more questions?

"My question is: in computer science, when they process an image, they use many different kinds of filters, like a Gaussian filter or a median filter. For us human beings, for the neurons, what kind of filter do we have?" Let me see how to answer this; I will find a slide that I think answers it well. The key point is that we estimate a filter for each neuron independently, and for each neuron the filter is different.
The filters are estimated by presenting many thousands of stimuli; maybe I do have a slide, yes, like this one. You present many thousands of stimuli and observe which stimuli elicited a spike and which did not, and from this correlation between stimuli and responses you find the effective filter for that neuron; different neurons will have different filters. In the linear model, where we use a linear stimulus-response function, you can simply average the stimuli that elicited a spike, and that produces an estimate of the filter. I think that's the best slide I have right now.

"Tanya, are those what are called Gabor filters?" Yes. Here is another slide that can help with understanding; this is from Hubel and Wiesel, who won a Nobel Prize for finding orientation selectivity in the visual cortex. As a little review of the visual system: this is the macaque brain, flattened, and this is a schematic view of the primate brain. In the back of the head, under the skull, you have the primary visual cortex, and here neurons are selective for edges. That is the discovery by Hubel and Wiesel: they showed that if you present an edge that is almost horizontal, there is no response, and as you increase the angle toward vertical, for that neuron, there are more spikes. For that neuron, the optimal filter is similar to the one I showed: a Gabor oriented at 45 degrees. As you may have heard, they discovered this accidentally. Prior to this, people had been studying the eye, showing dots of light, and they continued with dots of light in the visual cortex and were not getting strong responses, until the slide itself happened to have an edge on it: it was sliding back and forth, and one neuron responded very strongly. So they found that neurons in the visual cortex are driven by bars and edges, which can be modeled as Gabors.

The correlation method I have been describing is useful because it does not rely on serendipitous discovery. I put natural scenes in front of the animal and record responses. If I record in the retina, then as a result of this statistical averaging I get filters that look like little dots. If I apply the same algorithm to neural responses in the visual cortex, I get these Gabor filters. Beyond that it gets more complicated, but my group and others are working on estimating the relevant features across different stages of visual processing. After V1 there is the secondary visual area V2, then V4 is the main one, and then processing continues to the part of the brain near the ears, the inferior temporal cortex. Here you can find neurons that respond to hands: present an image of a hand and the neuron responds; present something else and the response goes down. There are face-selective neurons too, but that particular neuron is selective for a hand, not a face. And the miracle is that you can rotate the hand and the selectivity remains. So a big question, both in machine learning research and in neuroscience, is how to create selectivity for complex objects that is invariant to position and orientation. To answer the question that was posed, which filters we are using: in principle, one should be able to correlate natural scenes with responses in any brain area and recover the relevant filters.
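Here is a small self-contained simulation of the reverse-correlation recipe just described: a spike-triggered average computed on a model linear-nonlinear neuron. The filter shape, trial counts, and nonlinearity are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Reverse correlation on a simulated LN neuron: white-noise stimuli drive a
# hidden linear filter followed by a logistic nonlinearity, and the
# spike-triggered average (STA) recovers the filter.
dim, n_trials = 40, 50_000
true_filter = np.sin(np.linspace(0, 2 * np.pi, dim))      # hidden preferred feature
stimuli = rng.normal(size=(n_trials, dim))                # Gaussian white noise
p_spike = 1.0 / (1.0 + np.exp(-(stimuli @ true_filter)))  # logistic tuning
spikes = rng.random(n_trials) < p_spike                   # Bernoulli spike draws

# STA: average the stimuli that elicited a spike (the "correlation" step).
sta = stimuli[spikes].mean(axis=0)
cos_sim = sta @ true_filter / (np.linalg.norm(sta) * np.linalg.norm(true_filter))
print(f"cosine similarity between STA and true filter: {cos_sim:.3f}")  # ~1
```

For Gaussian white-noise stimuli the STA is proportional to the true filter even through a saturating nonlinearity, which is why this simple average works as a filter estimate.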
In practice, though, this is only possible for the first few stages of processing, because of the nonlinearities: we are attempting to summarize many, many nonlinear stages as one filter and one nonlinearity.

"I had a question. Has this result ever been implemented as a convolutional neural network?" Yes, it's true: one can now fit convolutional neural networks to neural data, and with such a network one can recapitulate the selectivity you see here. But in my opinion two open problems remain. One is that we have a brain, and we don't fully understand how the signals are wired and what computations take you from edges to hands; and we have a convolutional neural network that reproduces the result, but we still do not understand what the computations are. We have recreated the machine, and we have just as much difficulty looking inside the simulated machine as inside the real one. Technically speaking, if I am going to encode a hand, I need at least, say, ten edges for the five fingers, and then across different positions and different orientations, so there is a massive convergence of signals from V1 to area V4, and the rules that guide that convergence are not clear, even though we can reproduce the outcome in a convolutional neural network. That's one problem. The second problem is that the result we get with a convolutional neural network is less robust to perturbations than what we have in the brain. Some of you may have heard of adversarial attacks and adversarial networks; how many have? A few, two or three people. Adversarial attacks worry people a lot because you can take an image, perturb it a little, and the machine will think it is something else. That raises a lot of concerns about relying on artificial vision: think of self-driving cars, or a camera assisting a driver; you have a stop sign, you interfere with the image a little, and now instead of a stop sign it reads "drive 40 miles an hour". Or, for security purposes, I could impersonate somebody if I knew how to manipulate the input. This happens much less in the brain, through a variety of nonlinear error-correcting checks that we don't fully understand. Maybe there is another question? No? Okay, thank you for your questions.

So, to summarize: the Gabor selectivity shown here is an example for the visual cortex and would approximate the selectivity of neurons in the primary visual cortex. Do we think we have covered this part, so that we can move forward to the data processing inequality and to looking at multiple neurons? You agree? Okay.

What we discussed so far was optimal filtering for one neuron, as a member of a population: this is how each neuron should change. Now we would like to think about how to put these spikes together toward perception, that is, how to read out information from a large neuronal array.
For this we need some mathematical background: the data processing inequality and a few other facts. Ideally, we have stimuli eliciting responses across a number of neurons, possibly thousands or millions, and one of the open questions is how to read out information from that many neurons. I will show you, hopefully next lecture if not today, that there is actually a simple formula by which you can read out information from arbitrarily large neural populations. And there is a reason it should be simple: ultimately the algorithm can't be too complicated, otherwise how would it be implemented in neural hardware?

But first, the data processing inequality. How many people know about it? Two, three, four, five, six, seven. Most of you should know it, but maybe it's useful to recall. It's a very useful theorem, and a very useful check on various results: when you review papers, read papers, or do your own research, you want to avoid claiming that you did some processing on the data and, as a result of that processing, the information increased. The theorem is that if a random variable X causes Y, and Y causes Z, then the information between X and Y must be at least the information between X and Z: I(X;Y) ≥ I(X;Z). Because I did additional processing to get from Y to Z, the information can only decrease or stay constant. For the applications we will need, X will be the stimulus, Y will be the neural response, and Z will be our measurement of the neural response. Ideally we would like a measurement that captures all the information provided by the neural response, because then we know it is the best we can do: we cannot create more information than is contained in the neuronal response.

To prove this theorem we need one other fact, the chain rule for information: the mutual information between X and the joint variables (Y, Z) is I(X; Y, Z) = I(X; Z) + I(X; Y | Z), where the last term is the information between X and Y conditional on Z. Any questions about that? It's okay. Since Y and Z enter symmetrically, we can also write I(X; Y, Z) = I(X; Y) + I(X; Z | Y). What I should add is that X, Y, and Z form a so-called Markov chain, meaning that Z depends on X only through Y; in that case X and Z are conditionally independent given Y, so the last term, I(X; Z | Y), equals zero. Equating the two expansions gives I(X; Y) = I(X; Z) + I(X; Y | Z), and because mutual information is a non-negative quantity, I(X; Y) ≥ I(X; Z). That is the data processing inequality. Any questions? Is it okay? They saw this in my course; it was a while back, but they were traumatized enough to remember, I guess. So if Z is a function of Y, the information can only decrease or stay the same.
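As a sanity check, here is a small numerical verification of the inequality on a randomly generated Markov chain X → Y → Z; the distributions are arbitrary, chosen only to exercise the formula:

```python
import numpy as np

def mutual_info_bits(p_joint):
    """Mutual information I(A;B) in bits from a joint probability table."""
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log2(p_joint[nz] / (pa * pb)[nz])))

rng = np.random.default_rng(2)

# A random Markov chain X -> Y -> Z: p(z | x, y) depends on y alone.
p_x = np.array([0.5, 0.5])
p_y_given_x = rng.dirichlet(np.ones(3), size=2)   # shape (2, 3), rows sum to 1
p_z_given_y = rng.dirichlet(np.ones(2), size=3)   # shape (3, 2)

p_xy = p_x[:, None] * p_y_given_x                 # joint of (X, Y)
p_xz = p_xy @ p_z_given_y                         # joint of (X, Z)
print(mutual_info_bits(p_xy), ">=", mutual_info_bits(p_xz))  # DPI holds
```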
Maybe we can skip the second law of thermodynamics; it's not absolutely necessary for us, but briefly: the entropy of an isolated system does not decrease, and we can model the system as a Markov chain over states, where knowledge of the present state is sufficient to determine the future state, independent of the past. Was that also covered in the past course? One can make various statements this way: the relative entropy between two distributions over states has to decrease with the time step n; the relative entropy between the current state and the stationary state to which the system is converging also decreases with n; and the overall entropy increases if the stationary distribution is uniform. Any questions about this? No hands; okay. Oh, we have a question in the chat. Can you clarify your question? μ is the stationary distribution toward which the system is evolving, and ν is the current distribution over states.

So the next topic is a very important quantity, the sufficient statistic. Think about the neural response: I have a set of stimuli and neural responses, and I would like to take some function of the neural responses. I know that the information I can capture from the neural responses using this function will be no greater than what is provided by the full neural response. But there is a limiting, ideal case, the so-called sufficient statistic, where I take not the full neural response but some reduced function of it and still capture all the information contained in the responses. Our goal is to find sufficient statistics for neural responses.

A few examples. Suppose we have a set of observations x1, x2, ..., xn drawn from a probability distribution characterized by a parameter θ; θ could be a particular stimulus. A statistic is some function of this set of observations, for example the mean. A sufficient statistic is one that forms a Markov chain with the parameter and the data: it fully captures the information available between the sequence and the parameter of the distribution. Was this also covered? Can we skip sufficient statistics, or should we review them more fully? How many people know about sufficient statistics? Some of you don't, okay. So: the information between the parameter θ and X, the set of observations, is always at least the information between θ and any statistic T(X), that is, I(θ; X) ≥ I(θ; T(X)). The statistic is sufficient when the two are equal. Our goal is to find such a summary, one that captures all the information about the parameter of the distribution.

Some examples. If we have an independent and identically distributed sequence of coin tosses with unknown bias, what would be a sufficient statistic? "For example, the probability of zero... a statistic as a function of... the mean of the zeros?" Yes: the probability is the parameter θ, and we would like to estimate it by taking a function of the sequence of observations; you can do that. So the number of ones, or equivalently the number of zeros, is a sufficient statistic. Another way to see it: all sequences with the same sum are equally likely, so θ, the sum of the values (which is the number of ones), and the sequence itself form a Markov chain. That is essentially the proof, but we all know it.
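To make the coin-toss example concrete, here is the likelihood written out; this is the standard textbook computation (in the spirit of the Cover and Thomas treatment referenced below), with θ the probability of a one:

```latex
p_\theta(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
\;=\; \theta^{T}\,(1-\theta)^{\,n-T},
\qquad T(x) = \sum_{i=1}^{n} x_i .
```

The likelihood depends on the data only through T, so taking g(θ, T) = θ^T (1−θ)^{n−T} and h(x) = 1 satisfies the factorization criterion discussed next. Equivalently, P(x | T, θ) = 1/C(n, T) does not depend on θ, which is exactly the Markov-chain statement θ → T(x) → x.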
Then, a Gaussian distribution with mean θ and variance one: the variance is given, and you need to find θ. What will the sufficient statistic be? Here the variables are not between zero and one; it's a Gaussian distribution. What is the sufficient statistic for the mean of a Gaussian? Come on, hello? "The mean." Okay, very good: the sample mean.

We have a question: why do we consider variance one? Thank you for the question. Because otherwise I would not be able to get away with one variable: if I don't tell you what the variance is, you have to estimate it too, and your sufficient statistic will have two parts, one for the mean and one for the variance. For a uniform distribution, the sufficient statistic would be the maximum and the minimum of the sample. There is also the notion of a minimal sufficient statistic, one that is a function of every other sufficient statistic, but we don't need that. What we do need to pay attention to are the exponential families: the Gaussian is important because it is an example of an exponential family of distributions, and that will be important for the neural case.

Before we get there, let us discuss the so-called factorization theorem; this may also have been covered in class, and it is out of Cover and Thomas. As before, you have a random sample from a probability distribution f_θ; θ is unknown, and we need to form a function of x1, ..., xn to determine it. The theorem says that a statistic T is sufficient if and only if the probability distribution factorizes as f_θ(x) = g(θ, T(x)) · h(x): a coupling term g between the parameter of the distribution and the sufficient statistic, and a separate factor h that depends only on the particular instantiation of the data. If that happens, the function T is a sufficient statistic. We will use this to derive a sufficient statistic for neural responses. Do we need to go through the proof, or can we skip it? The audience calls it. "I think we also did this in class." All right, we can skip it and go back to neurons.

The key assumption is that we are going to model the responses of an individual neuron as a logistic function of the stimulus. By this I mean that the probability that neuron n produces a spike is P(spike | s) = 1 / (1 + e^{-2 f_n(s)}), where f_n is some function of the stimulus (the factor of 2 will make sense in a moment). What's important is that once we have this sigmoidal tuning, we can put the response distribution in exponential-family form and derive a sufficient statistic for the neural responses of an arbitrarily large population. That's our goal. Our stimuli are vectors, say d-dimensional, and the neurons, to begin with, are conditionally independent, meaning that the probabilities of the responses of different neurons multiply, given the stimulus.
So there are no noise correlations; we will add those later, but for now the joint probability is a product. It is still a nontrivial problem, because the neural responses are not independent unconditionally: they are coupled through the filters. Any questions about this equation?

"In the first equation, I think this is the probability that r_n equals 1, right? Because the right-hand side does not depend on the other neurons." Yes, that's correct; I lost that somehow. It should read: the probability that r_n = 1 is this function of s, and, similar to what we discussed in the past, the probability of no spike is 1 / (1 + e^{+2 f_n(s)}).

Now our goal is to show that P(r | s) can be rewritten in the following form: an exponential of a sum over n of the spiking/non-spiking variables r_n, each multiplied by a function specific to neuron n, minus some function of the stimulus alone. Once we rewrite it in this form, the sufficient statistic will be that first term.

Maybe it's useful to write some things on the board. We will consider the case where f_n(s) is a linear function; I will do the derivation in the linear case, and you can try the general case yourself; it has the same form. Can you see the blackboard? We write P(r | s) as the product over n of e^{r_n f_n(s)} divided by the normalization, that is, e^{Σ_n r_n f_n(s)} divided by a product over n of per-neuron normalizing factors. This denominator is independent of r; one can call it a partition function, e^{A(s)}, where A(s) is the sum over n of the logarithms of the per-neuron normalizations. The critical part is that our sufficient statistic will be the sum over n of r_n times f_n.

"Isn't there a factor of two missing in P(r_n | s)?" Let me see... if you multiply numerator and denominator by e^{f_n(s)}, there is a factor-of-two difference in the exponent... "In the second equation on the left board, shouldn't the denominator be e^{A_n(s)}?" Yes, the denominator is e^{A_n(s)}... "Ah, maybe your r_n is not 0 or 1 but plus or minus 1?" That is the resolution. If r_n = ±1, then the normalization is e^{-f_n(s)} + e^{+f_n(s)} = 2 cosh f_n(s), and everything is consistent: P(r_n | s) = e^{r_n f_n(s)} / (2 cosh f_n(s)), which reproduces the logistic form with its factor of 2, and then you call A_n(s) = log(2 cosh f_n(s)). "Which is your equation." Yes. Okay, thank you.
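For reference, here is a cleaned-up version of the board derivation, with the spike/no-spike responses coded as r_n ∈ {−1, +1} as we just established:

```latex
P(r_n \mid s) \;=\; \frac{e^{\,r_n f_n(s)}}{2\cosh f_n(s)}, \qquad r_n \in \{-1,+1\},
```
```latex
P(\mathbf r \mid s) \;=\; \prod_n P(r_n \mid s)
\;=\; \exp\!\Big(\sum_n r_n f_n(s) \;-\; A(s)\Big),
\qquad A(s) \;=\; \sum_n \log\!\big(2\cosh f_n(s)\big).
```

Setting r_n = +1 in the first equation recovers the logistic model, P(r_n = +1 | s) = 1/(1 + e^{-2 f_n(s)}).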
Now, if f_n(s) is a linear function, f_n(s) = ω_n · s, possibly with a threshold, f_n(s) = ω_n · s − α_n, then the exponent becomes Σ_n r_n (ω_n · s − α_n) − A(s) = s · Σ_n ω_n r_n − Σ_n α_n r_n − A(s). The threshold term depends only on r, so it folds into the r-dependent prefactor, and we can write P(r | s) ∝ e^{s · T(r) − A(s)} with T(r) = Σ_n ω_n r_n. "Right?" Yes, very good.

Now let's take this and put it in a very nice box, because I think it is a very important result: we just derived an equation for how to read out neural responses without information loss, by taking a linear combination of neural responses, for arbitrarily large populations. I'll show you the miracles of this equation. Let me switch to PowerPoint and talk about applications of this equation and its history, how it was guessed.

Some background on how this equation was discovered, and not discovered. Before this equation was derived, people talked about population vectors. For those who work in neuroscience, many of you might have heard of them; how many people have heard of population vectors? No one, okay, good, then we are not redundant. There was an experiment by Georgopoulos and colleagues in 1986. They were studying movement in a monkey: the monkey made movements in different directions, up, down, and so on, and here is an example of the neural activity; ten trials, ten movements in this direction, shown with spikes from one neuron. You can see that for this neuron, if the movement is down and to the right, it produces more spikes. This goes back to the earlier question of what the filter for a neuron is: they would say that for this neuron the preferred feature is movement down and to the right, and we assign to that neuron the vector w_n.

Then they said: what we notice is that we can take these vectors w_n, in their case normalized to unit length, just the direction, and multiply each by whether the neuron produced a spike or not. Imagine this neuron has a preferred movement in one direction; if it produces a spike, the movement was likely in that direction. If another neuron has a preferred direction diagonally up and it did not spike, its preferred direction is downgraded. So the construction is: you have many neurons; for each neuron you find its preferred direction of motion, the preferred stimulus; and then you democratically weigh their responses (0 or 1, or +1 and −1) by their preferred features. Because each neuron contributes a vector, the sum across the population is also a vector; hence "population vector". And note it lives in the stimulus space: if you have 1,000 neurons but the motion of the hand is described by a three-dimensional vector, the population vector is still a three-dimensional vector. And this population vector captures the actual movement that the animal made quite accurately.
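Here is a toy version of that construction, assuming cosine-tuned neurons with Poisson spike counts (my modeling choices, not details of the 1986 experiment); note the readout has the same linear form as the sufficient statistic T(r) = Σ_n ω_n r_n on the board, except that the preferred vectors are normalized to unit length:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy population-vector readout: each neuron votes with its unit-length
# preferred direction, weighted by its spike count.
n_neurons = 200
pref_angle = rng.uniform(0, 2 * np.pi, n_neurons)
prefs = np.column_stack([np.cos(pref_angle), np.sin(pref_angle)])  # w_n, |w_n| = 1

true_angle = 0.7
movement = np.array([np.cos(true_angle), np.sin(true_angle)])
rates = np.clip(prefs @ movement, 0, None)      # rectified cosine tuning
spikes = rng.poisson(10 * rates)                # noisy spike counts

pop_vector = spikes @ prefs                     # sum_n r_n * w_n
decoded = np.arctan2(pop_vector[1], pop_vector[0])
print(f"true direction: {true_angle:.2f} rad, decoded: {decoded:.2f} rad")
```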
So in the Georgopoulos example, this is a democratic readout from a population: each neuron has its preferred direction and votes up or down on a given stimulus, saying "the stimulus matches my preferred direction" or "it doesn't", and we average across neurons to get the final movement. That is the widely used construction, and you can see it is quite similar to what we have on the board. The difference from the equation on the board is that Georgopoulos normalized the vectors to unit length, and that necessitated a series of papers that we will discuss, maybe today, or more likely next time. It is a population readout, and we will discuss how it differs from the information-preserving sufficient statistic we derived on the board; but you can see they are structurally very similar. Any questions?

"I have a question. Are these neurons linked to the action of movement or to the perception of movement? I mean, if the monkey moves, or if the monkey is still and the cage is moved, do they fire in both cases?" In this particular brain area, it is the monkey's own movement; there are other brain areas that would be triggered when the cage moves or when the visual stimulus moves. The advantage of the information-theoretic derivation is that we have a general description for various kinds of responses: as I write here, it is widely used to decode place cells in the hippocampus, motor neurons as here, eye movements, visual neurons, sensory neurons. It is a general equation; in a specific context you then ask what the space is in which these receptive fields or preferred directions live, that is, in which space the population vector moves and how to interpret the readout. "Thank you."

Okay, I think maybe that's a good place to stop, and then we will continue with the interpretation of this equation, which I think is very important, and with how it can be used for reading out neural activity. Okay, thank you.

"I have a question. It looks like there is a key assumption here that the neurons are independent, that the responses are independent, which, I think, relates to another important principle in neuroscience, efficient coding. Is this true?" Yes, the equation we derived assumes conditional independence. But, to foreshadow some of the discussion on Monday: you can generalize it. You can add correlations between neurons, and as long as these correlations are not stimulus dependent, they will not affect the form of the sufficient statistic. The slide is here; here is the equation: if I can write the total probability distribution of responses given a stimulus in the form we had before, plus some pairwise coupling terms J_ij between neurons, then as long as the J_ij do not depend on the stimulus, they do not affect the form of the optimal readout. So we can use this equation in some cases even when the neurons are not independent.
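As a sketch of that generalization (my notation, consistent with the derivation above): add an Ising-like pairwise term to the response distribution,

```latex
P(\mathbf r \mid s) \;\propto\;
\exp\!\Big(\sum_n r_n f_n(s) \;+\; \sum_{i<j} J_{ij}\, r_i r_j\Big).
```

If the couplings J_ij do not depend on s, then in the linear case f_n(s) = ω_n · s the stimulus-dependent part of the exponent is still s · Σ_n ω_n r_n; the coupling term folds into the stimulus-independent factor h(r) of the factorization theorem, so T(r) = Σ_n ω_n r_n remains a sufficient statistic.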
"Okay. Any other questions? If not, should we stop here, Tanya?" Yes, and we continue on Monday. "Very good. Thank you very much, Tanya, and have a nice weekend." A nice weekend to you all; we reconvene on Monday at 9 a.m. Thank you.