Good morning, everyone. Good morning, can we start again? Thank you. OK, very good. I think some students are still joining, but we can begin. So, Tania, maybe you can share the screen, and we can start the lecture. OK, we can begin. Everything is fine, no? Yes, yes. Let's go. Can you start the recording? OK. I want to recap briefly what we have done so far and where we are going. We discussed maximally informative codes, and we discussed the transmission of information by many neurons in the presence of noise, and we observed that there is a series of basic transmission regimes: there are different models of noise and different applications. But as the noise grows, or as the number of underlying elements and of transmitting elements grows, the number of states follows a branching process and increases exponentially with the number of neurons. As I said briefly, this branching process has a connection with hyperbolic geometry, and we will return to that link in later lectures. So far, however, we have discussed only a one-dimensional variable. The plan for today and the next lectures is to talk about maximally informative coding when the signal is multidimensional, and then the concluding set of lectures will bring it all together: the branching process by which different cell types arise to encode a one-dimensional input, and the analysis we start today of how to encode several variables at the same time. So the plan for today is to set up the maximal-information problem and then to discuss two famous solutions that one should be familiar with: one is known in engineering as the water-filling solution, and I will describe the origin of that name, and the other is the decorrelation phenomenon, and we will compare both with neuronal measurements in the visual system. As a reminder, we considered information transmission through a Gaussian channel, and after this reminder we will move on to the multidimensional Gaussian case.
So, what we discussed is that, for a signal x, the detector generates a neural response with some corrupting noise. We discussed that a Gaussian distribution in general has the form of an exponential of a quadratic function, P(x) = exp(−x²/2σ²)/Z, where the normalization Z is determined by the variance, that is the prefactor. The entropy of this distribution is minus the average of log P, and, ignoring some additive constants, it is set by the variance of the signal: S = (1/2) log2(2πeσ²). The mutual information is the formula that depends on both P(x) and P(y), and for a linear Gaussian channel there is an important result: it is one half of the log base 2 of the variance of the output divided by the variance of the noise. You can write it explicitly, because the output has noise in it, as I = (1/2) log2(1 + g² σ_x² / σ_η²), one plus the variance of the input signal times the gain squared, divided by the variance of the added noise. Sometimes it is also written, and we will use this form too, as the variance of the signal divided by an effective noise variance: the noise is added after the input, but in a linear system you can think of it as noise added at the input, the effective input noise. In other words, the information for a one-dimensional variable is one half log base 2 of one plus the variance of the input signal divided by the effective input noise variance, or, equivalently, of the variance of the output divided by the variance of the noise. So this is the one-dimensional case, and now we will talk about multiple variables. Imagine that we have multiple output variables and multiple input variables, and we would like to know the best way of filtering them to maximize information. In the linear case, just as in the one-dimensional case but now in many dimensions, y is a vector whose components are linear combinations of the input variables x_j plus noise: y_i = Σ_j G_ij x_j + ξ_i, with a gain matrix G_ij. We would like to know what this optimal matrix should be, that is, how to filter signals optimally. To draw the analogy, remember the equation that we had. Matteo, would you mind writing the equation for the one-dimensional linear Gaussian channel on the board? So, I = (1/2) log2(1 + g² σ_x² / η_eff²); this is one way of writing it, and the other way is the fluctuations of the output divided by the noise variance. Is that okay?
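As a quick numerical illustration of this one-dimensional formula, here is a minimal Python sketch; the function name and the example numbers are mine, not from the lecture:

```python
import numpy as np

def gaussian_channel_info(signal_var, gain, noise_var):
    """Mutual information (in bits) of a linear Gaussian channel y = g*x + eta.

    I = (1/2) log2(1 + g^2 Var[x] / Var[eta]) = (1/2) log2(Var[y] / Var[eta]).
    """
    return 0.5 * np.log2(1.0 + gain**2 * signal_var / noise_var)

# Example: unit gain, signal variance 9, noise variance 1 -> (1/2) log2(10) bits
print(gaussian_channel_info(signal_var=9.0, gain=1.0, noise_var=1.0))
```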
Yeah, I think so. All right. So the expression we will get with multiple variables will be similar to this, but instead of the variance of the signal along one component, we will look at the variance along multiple components: the covariance matrix of x_i and x_j, and the same for the noise. So in principle one could guess the form of the result. The relevant probability distribution, because it is Gaussian, is in the multidimensional case what we had before, an exponential of a quadratic function, but now instead of dividing by the variance we multiply by the inverse of the covariance matrix and sum across components, and instead of dividing the prefactor by the variance we divide by the determinant of the covariance matrix: P(x) ∝ exp(−(1/2) Σ_ij x_i (S⁻¹)_ij x_j) / sqrt(det S). So this is a recap, and this is the expression for the information as we had before, and we can list the functions that we will need. Instead of x and y we have a set of x's and a set of y's, and we are computing the information between them, so we will need the joint probability distribution, which may be difficult to write down directly; instead we will write it as a product, so we will need the conditional distribution P(y_i | x_j) and the distribution of responses P(y). Is it easy to write the information down? It is the average, over the joint distribution of the x's and the y's, of log base 2 of P(x | y) divided by P(x); something like that, yes. OK? You see that in the denominator there is P(x), so we need that distribution. So here is how the conditional distribution looks, and actually it has a very simple form; it is complicated in terms of the number of indices, but remember how in the one-dimensional case, to write the conditional distribution of y given x, we just write the equivalent Gaussian probability distribution of the noise. The same here: P(y | x) is the Gaussian of the noise, with the inverse noise covariance matrix in the exponent acting on the noise components y_j − Σ_k G_jk x_k, and the determinant of the noise covariance matrix in the denominator. You will find this expression in Bialek's textbook, among other places; this part of the lecture follows his textbook. Any questions so far? These are the two probability distributions that we need, and we will also need P(y). Even though it looks scary, as Anthony Zee puts it in his field theory in a nutshell book, we are 'drowning in a sea of indices'; there are a lot of indices here, but as long as you keep track conceptually of what is written, this is one noise component, this is another noise component, and this is the noise covariance matrix, we will have guidance and need not follow every index. Then we need P(y), but we know it is going to be Gaussian too, with the y's in the exponent and the determinant of the output covariance matrix in the normalization. One way of deriving this distribution is to take P(y | x), multiply by P(x), and integrate, but because we know it will have this form, we just need to compute the covariance matrix of the output variables: S_y = G S_x Gᵀ + N, the gain matrix times the covariance matrix of the signal times G transpose, plus the covariance matrix of the noise. Using this we have the expression for P(y). Maybe it is useful to derive this covariance matrix of the output signal. Is it clear, or shall we do it?
So, if we go back: we know that y_i = Σ_l G_il x_l + ξ_i, and we want the covariance of the output, the average of y_i y_j. So y_i is Σ_l G_il x_l + ξ_i, and y_j is Σ_l' G_jl' x_l' + ξ_j, and these are all zero-mean variables. If you take the signal term times the signal term, you get Σ_{l,l'} G_il ⟨x_l x_l'⟩ G_jl'. If you take the signal term times the noise term, they are independent, so the expected value is zero; the same for the other cross term, because x and ξ are independent. And if you take the noise term times the noise term, you get ⟨ξ_i ξ_j⟩. In matrix form, this is the matrix G written there, times the covariance matrix of the input, which is S, times G transpose, you see the two indices are transposed, plus the covariance matrix of the noise: S_y = G S Gᵀ + N. Very simple linear algebra. I cannot hear; are there any questions from the audience? I can't quite see the chat. I think it's OK. No? OK. So now we have this average of the logarithm of a ratio of probability distributions, which we have on the previous slide, with P(y) being the same kind of Gaussian but with the covariance written on the board, S_y. So what do we do with this? The logarithm of the exponential part of P(y | x) is the quadratic noise term, and from P(y) in the denominator the exponent comes in with a plus sign, (1/2) yᵀ S_y⁻¹ y, and then there are also the prefactors. When we take the average over the probability distribution, the average of y yᵀ gives you the output covariance matrix, and the average of the noise term gives the noise covariance matrix, so these two quadratic terms cancel out, similarly to what we had in the one-dimensional case, and what is left is one half of the logarithm of the ratio of the prefactors: the determinant of S_y over the determinant of the noise covariance matrix. In other words, all of these complicated integrals mostly disappear, and the only thing left is I = (1/2) log2 [ det S_y / det N ], the logarithm of the determinant of the covariance matrix of the output divided by the determinant of the covariance matrix of the noise. Right. So here you have the determinant of S_y, and the normalization of the conditional distribution gives the determinant of the noise covariance. The ratio of two determinants is the determinant of the ratio, so this is (1/2) log2 det(S_y N⁻¹). It's a little bit difficult to hear; can you hear better now?
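This determinant formula lends itself to a short numerical sketch; the gain and covariance matrices below are invented for illustration:

```python
import numpy as np

def multivariate_gaussian_info(G, S_x, N):
    """I = (1/2) log2[ det(S_y) / det(N) ] in bits, with S_y = G S_x G^T + N.

    G: gain matrix, S_x: input covariance, N: noise covariance.
    """
    S_y = G @ S_x @ G.T + N
    # slogdet is numerically safer than taking log(det(...)) directly
    _, logdet_y = np.linalg.slogdet(S_y)
    _, logdet_n = np.linalg.slogdet(N)
    return 0.5 * (logdet_y - logdet_n) / np.log(2.0)

# Two correlated inputs, identity gain, independent unit-variance noise:
S_x = np.array([[4.0, 1.0], [1.0, 4.0]])
print(multivariate_gaussian_info(np.eye(2), S_x, np.eye(2)))
```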
Now it's better. So I'm just saying that the ratio of the determinants is the determinant of the ratio, and then the log of the determinant is the trace of the log; that is what we are using there. So this is I = (1/2) Tr log2 (S_y N⁻¹). Is it clear to everyone that the log of the determinant is the trace of the log? The way to think about it is that the log of the determinant is the sum of the logs of the eigenvalues, which is exactly the trace of the log of the matrix. OK. So now we can compare these expressions and relate them to the one-dimensional case. In one dimension, the information was one half log2 of one plus the signal power times the gain squared divided by the noise, the signal over the effective noise; what we have here is a sum over independent components. I will wait, I am just rewriting your equation; essentially you are comparing this equation here to (1/2) log2(1 + g² σ_x² / η²), right? Is it OK, Tania? Yes, I think so. So why is the Gaussian case interesting? Because in the Gaussian case we can diagonalize the matrices and transform to a basis where the system acts along independent components, and we know that information from independent channels adds. So where we had one variable, now we have a sum across multiple variables. The technical point is that we have to work with the product of the two matrices, and if we can diagonalize them in the same basis, the information becomes a sum of the information along the different channels. One of the useful examples, and this is just a rewrite, is in terms of filtering and Fourier transforms. There are multiple applications of this equation. The situation that we would like to study is the coding of multiple variables, and it could mean that the multiple variables are values in time: I still have a one-dimensional signal, but I consider its values at different moments in time as different components, and then the neural response is also a function of time, the same analog signal across different times. In that case we will derive predictions for how to filter signals optimally in order to transmit the most information. A parallel analysis, where this has also been applied and which we will also compare with data, is that the x_i represent different spatial signals: for an individual scene, different pixels are different x_i components, and the question is how I should filter these components to maximize information. The y could even be one-dimensional, one neuron, and we ask what the optimal spatial filtering is to maximize information per neuron, or it could be multiple neurons, and then it is information maximization across the array. Any questions so far? OK. In the case of optimal temporal filtering, which we will do first, instead of the summation across the indices i we can think of an integration in time: I have a signal x(t), and I convolve it with a kernel g(τ) that weights the signal at previous times, y(t) = ∫ dτ g(τ) x(t − τ) plus noise, and as a result I get the response of the neuron at time t. I am asking what the optimal filter g(τ) is that I should apply to this signal in order to get the maximally informative response.
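A discrete-time sketch of this filtering setup, y(t) = Σ_τ g(τ) x(t − τ) + noise; the kernel shape and all numbers here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0.0, 10.0, dt)
x = rng.normal(size=t.size)              # input signal (white here, for simplicity)

tau = np.arange(0.0, 0.5, dt)
g = np.exp(-tau / 0.05) * dt             # a causal exponential kernel (assumed shape)

# y(t) = sum over tau of g(tau) * x(t - tau), plus additive output noise
y = np.convolve(x, g)[: t.size] + 0.1 * rng.normal(size=t.size)
```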
The point is that the signal is corrupted by noise, and at some frequencies the channel contains more noise than signal, so intuitively it may be useful to eliminate those frequencies where the signal is overwhelmed by the noise. I will be throwing away some signal, that is the price of the filtering, but I will be throwing away more noise than signal if I take into account the spectra of the signal and of the noise. That is the goal of this derivation. So we rewrite the information again; one can rewrite it in terms of the effective noise covariance matrix. The interesting part, and the advantage of the Fourier transform, is that we are considering stationary signals, so the correlation function, both of the noise and of the signal, depends only on the time difference: if I have a time series and average x(t) x(t'), the result depends only on t − t', not on t and t' separately. So it is a full matrix, but of a special kind; it is a Toeplitz matrix, and with periodic boundary conditions a circulant one. The same holds for the noise, and such matrices are diagonalized in the Fourier domain. So when we go to the Fourier domain, instead of thinking about the autocorrelation as a function of time, we think about the power spectrum as a function of frequency, and different frequencies are independent if the system is linear and the signal is Gaussian. Once we have this, information is obtained by adding contributions across frequencies, and since the covariance matrices of the noise and of the signal are diagonalized in the same Fourier basis, the information between the sequence of inputs and the sequence of outputs is I = (1/2) Σ_ω log2(1 + SNR(ω)). Our goal is to find the filter g(ω) that maximizes this expression. Any questions? Carlos asks what the SNR is. It is the signal-to-noise ratio: this is the signal, this is the noise, and their ratio is SNR(ω) = |g(ω)|² S(ω) / N_eff(ω), the power spectrum of the filtered signal over the power spectrum of the effective noise. The point is that if the signal and the noise are diagonal in the same basis, then you can take g to be diagonal in that basis as well, and the expression becomes a sum over the eigenmodes, which are the Fourier modes. The sum over frequencies can also be written as an integral over frequency times the duration, and so people define the information rate, which is just this integral across frequencies, R = ∫ (dω/2π) log2(1 + SNR(ω)), and our goal will be to maximize this integral. Now we can talk about two specific solutions, which are famous. The first one is the water-filling solution, first obtained in engineering; none of this so far is neuroscience-specific, it is a result for linear Gaussian channels, but we will discuss its application to neuroscience.
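A sketch of the information rate computed as a sum over a discretized frequency grid; the spectra below are toy choices, not measured ones:

```python
import numpy as np

def info_rate(S, N_eff, gain, dw):
    """Sum of (1/2) log2(1 + SNR(w)) over a uniform frequency grid of spacing dw,
    with SNR(w) = |g(w)|^2 S(w) / N_eff(w)."""
    snr = np.abs(gain) ** 2 * S / N_eff
    return np.sum(0.5 * np.log2(1.0 + snr)) * dw

w = np.linspace(0.1, 10.0, 200)
S = 1.0 / w**2                           # 1/f^2-like signal spectrum
N_eff = 0.01 * np.ones_like(w)           # flat effective noise (assumed)
print(info_rate(S, N_eff, np.ones_like(w), dw=w[1] - w[0]))
```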
So we are interested in maximizing the information rate, but it is important to keep in mind that we maximize it under a constraint, for example on the variance of the output signal. Otherwise it is not a fair comparison: if I could have a filter with arbitrarily large values, I could multiply the signal by arbitrarily large factors and in effect scale the noise down, but that is not realistic; the noise cannot be beaten down without investment, without metabolic cost. So we maximize the information rate at a given output variance. This is our rate, the integral over dω, subject to the constraint that the integral of the output signal power is a constant, and we add this constraint to the integral with a Lagrange multiplier λ. When we optimize, we differentiate: you get 1/log 2 times the derivative of the logarithm of the filtered signal power plus effective noise, and only the filtered signal power |g(ω)|² S(ω) depends on the filter g, minus λ. Setting this to zero for the optimal solution, you get that the filtered signal power plus the noise should be a constant: |g(ω)|² S(ω) + N_eff(ω) = const. That is the idea behind the name water filling: you think of the noise spectrum as the shape of a basin that holds water, the signal is the water I pour into the basin, and the sum of signal plus noise has to come up to a constant level. Here is an example analysis for neural data: this is the estimated noise for photoreceptors and the estimated noise for large monopolar cells, and the shaded area shows how the photoreceptors should filter the incoming signal, and so on. In other words, with this water-filling solution you measure the reliability of the channel at different frequencies, and wherever the noise floor rises above the level of output power that the cell can support, the filter should be set to zero. Any questions so far about the water-filling solution? So the optimal filter goes to zero at those frequencies; in this region the allocated signal power |g(ω)|² S(ω) is the difference between the water level and the depth, meaning the noise floor N_eff(ω), and outside this region it is zero. A related formulation, which goes under a different name, decorrelation, is in effect pretty much the same, but it talks about the gain g: what is the gain g(ω) that gives you this water-filling solution? One way to state it was the output signal variance divided by the effective noise, or, writing the gain explicitly, the signal variance times the gain squared divided by the neuronal noise; going through the same exercise as before, our water-filling solution says that the signal power times the gain squared has to be a constant minus the noise, so the gain is |g(ω)| = sqrt[(const − N_eff(ω)) / S(ω)]. This is interesting: it tells you that if some signal components are common, the filter should have lower sensitivity to those components. For example, in natural scenes, as we will discuss, the signals change relatively slowly; the null hypothesis is that nothing is changing. But you know from experience and perception that we are very sensitive to sudden motion, sudden changes in light intensity, edges, and so on. We are sensitive to these signals because they are rare: most of the signal power is at low frequencies, but the gain is set so as to suppress the dominant frequencies.
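A sketch of the water-filling construction described above: bisect on the water level μ until the allocated signal power Σ max(0, μ − N_eff) matches the output power budget; the noise values and the budget are invented:

```python
import numpy as np

def water_filling(noise, power_budget, tol=1e-9):
    """Return the signal power max(0, mu - noise) allocated at each frequency,
    where the water level mu satisfies sum(max(0, mu - noise)) = power_budget."""
    lo, hi = noise.min(), noise.max() + power_budget
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.clip(mu - noise, 0.0, None).sum() > power_budget:
            hi = mu          # too much water: lower the level
        else:
            lo = mu
    return np.clip(0.5 * (lo + hi) - noise, 0.0, None)

noise = np.array([0.1, 0.3, 1.0, 3.0])         # noise floor rising with frequency
print(water_filling(noise, power_budget=2.0))  # the noisiest channel gets zero
```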
Now we can look at this quantitatively, in comparison with data. Any questions about this decorrelation solution? A bit of an aside: as I mentioned, most things in the natural world change very slowly. If one pixel is white, like on the screen, then the nearby pixel will also be white; same thing here. But we are sensitive to edges because they are relatively rare, even though to us they are the most telling features. In order to make predictions we need to know the signal power spectrum, and therefore people have studied extensively the statistics of signals in the natural world. This involves the statistics of video signals and natural images, which was important for the development of television, the statistics of natural sounds for telephone and radio communication, and also, as we will discuss later, olfactory signals. It turns out that in all those cases the common observation is that the signal has a power-law power spectrum. What is shown on this graph is the power spectrum as a function of spatial frequency; this is from Ruderman and Bialek's paper on scaling in the woods, in Physical Review Letters, but you will find this power-law behavior also in natural sounds and, because of turbulence, in olfactory signals. Any questions so far? Then I have a little question about this graph. You see there are two curves, and they overlap, but you can tell they are slightly different. Can anybody guess the origin of these two lines, and what sets the limits over which we can measure the power spectrum of natural images? Come on, it's Fourier analysis: what does the largest frequency correspond to, and the smallest frequency, for an image? No? Tania: I can't quite hear the answer, but could you repeat it? No, I mean, my question was what the largest frequency corresponds to; but it was not a question to you: it is about the camera with which these images were taken. You know how big an image you can take and how finely you can discretize it: the largest frequency corresponds to the smallest pixel that can be resolved, and the smallest frequency to how large the image is. In this case the images were taken with two different cameras, and one can see that the spectra overlap despite being taken with two different focal lengths, so you can merge them into a joint power-law distribution. So we know that natural signals have this 1/f² power spectrum, and Atick and colleagues readily used this: go back to our equation, the gain squared is a constant minus the noise, divided by the signal spectrum, so one factor is one over the square root of the signal power; the signal power is 1/f², so its square root is 1/f, and when we invert it we get a factor of f. Combined with the noise term, this yields a band-pass filter. That was the first prediction that neurons in the retina should integrate within one region and subtract from a broader region, one of the first explanations for the center-surround structure of receptive fields in the retina. The intuition behind this observation is that it would be redundant for each neuron to relay the values of pixels individually; instead, each neuron relays how its pixel differs from the surrounding pixels. So this is the theoretical prediction in the spatial layout, but one can also state it in the Fourier domain, as we were discussing.
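A sketch of the resulting gain: with an assumed 1/f² signal spectrum and noise rising at high frequency, the condition |g(ω)|² S(ω) + N_eff(ω) = C gives a band-pass filter, as in the center-surround prediction. All spectra and constants below are invented:

```python
import numpy as np

def decorrelation_gain(S, N_eff, C):
    """Gain from the water-filling condition |g|^2 S + N_eff = C (in the pass band):
    |g(w)| = sqrt(max(0, C - N_eff(w)) / S(w))."""
    return np.sqrt(np.clip(C - N_eff, 0.0, None) / S)

f = np.linspace(0.1, 10.0, 1000)
S = 1.0 / f**2                      # natural-scene-like spectrum
N = 0.005 * f**2                    # noise growing at high frequency (assumed)
g = decorrelation_gain(S, N, C=0.05)
print(f[np.argmax(g)])              # the gain peaks at an intermediate frequency
```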
So this is from Atick's work, sorry, the reference shown here is wrong. This curve B is the noise, and the product in our equation, the constant minus the noise over the signal, results in this joint kernel A, which is the predicted filter. They actually went further than that and asked how the solution changes when we change the noise. Can anybody guess how we can change the amount of noise in the visual system? Our equation, I will write it here, and maybe Matteo you could also write it, is |g(ω)| = sqrt[(const − N)/S], the square root of the constant minus the noise, over the signal. The prediction should be that if the noise is smaller, I can decorrelate more: if the noise is zero, the optimal filter is 1 over the square root of S, so if the signal is 1/f², the optimal filter is f. That is the limit of zero noise. If the noise starts to dominate, the filter shuts off and has to go down to zero, by the constant. In the case of the retina, one way to change the noise is to reduce the magnitude of the signal: reduce the overall brightness, or present stimuli with lower contrast. What happens is that when the signal-to-noise ratio is high there is strong decorrelation, and as the signal-to-noise ratio decreases, neurons remove this decorrelating filtering and start to integrate, averaging over broader areas so as to pick the signal out of the noise. That is the theoretical prediction; I'll show you the comparison with data. This is for human psychophysics: these are the predicted filters at different signal-to-noise ratios, and these are the measurements. This model, I think, has only one adjustable parameter and is able to account for changes in the filtering across a wide range of light intensities. Now I will skip a few slides, which are a little out of order, because I want to show you this one: a famous curve that relates to the filtering shapes we just saw, and that you can verify with a simple construction that is a test of your own visual system. Does everybody see the curve here that goes up and then down? I can't quite hear back, but yes, people see it. What is shown is a modulated sinusoid: the frequency increases from left to right, from low to high, and the contrast steadily decreases from the bottom of the image to the top. The change in contrast along the y axis is the same at every frequency, but because we are more sensitive to the middle range of frequencies, you continue to see the modulation up to a smaller contrast there, while at lower frequencies and at higher frequencies the contrast must be greater for you to resolve the grating. Is that okay? You see how this picture goes together with the measured sensitivity curves that I showed.
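This demonstration is a Campbell-Robson-style contrast sensitivity chart, and one can draw such a chart with a few lines; all parameters here are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

# Frequency grows exponentially along x; contrast shrinks exponentially along y.
# The envelope where the grating fades from view traces your own sensitivity curve.
x = np.linspace(0.0, 1.0, 800)
y = np.linspace(0.0, 1.0, 400)
X, Y = np.meshgrid(x, y)
f0, k = 2.0, 5.0
phase = 2.0 * np.pi * f0 * (np.exp(k * X) - 1.0) / k   # instantaneous freq = f0*exp(k*x)
contrast = np.exp(-6.0 * Y)                            # contrast decreases upward
img = 0.5 + 0.5 * contrast * np.sin(phase)
plt.imshow(img, cmap="gray", origin="lower", aspect="auto")
plt.axis("off")
plt.show()
```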
Then there are two other things we can discuss. Another example is in the retina. So far this was information maximization in general, but we can also ask, more specifically, how well we can reconstruct the signal, not only maximize information but recover the actual signal. Here is a comparison with the filtering that is done between the photoreceptors and the next stage. In this case y(t) again represents the filtered version of the input signal plus noise, and in the Fourier domain that is our equation where the output is g(ω) times x(ω) plus noise. The question is what filter to apply to the outputs y in order to best estimate the input signal. So we estimate the signal x by applying a kernel f to the output y, and we seek to minimize the average reconstruction error: the actual signal minus the estimated signal, squared, averaged over noise realizations and integrated over time, χ² = ∫ dt ⟨(x(t) − x_est(t))²⟩. Then we use a property of the Fourier transform: the power in real time equals the integral of the power over frequencies in the Fourier domain, Parseval's theorem, and a convolution of two signals becomes a product of their Fourier transforms. So the reconstruction error becomes the integral over frequencies of ⟨|x(ω) − f(ω) y(ω)|²⟩. In this linear case, if the frequencies are independent, we can find the optimal f frequency by frequency: we minimize the reconstruction error with respect to the unknown function f, differentiating with respect to the complex conjugate f*(ω), which gives ⟨(x(ω) − f(ω) y(ω)) y*(ω)⟩ = 0. Rewriting this, f(ω) = ⟨y*(ω) x(ω)⟩ / ⟨|y(ω)|²⟩, the cross-spectrum between x and y divided by the output power of y. It is an equation similar to the decorrelation one, because I am dividing x by the way the signals were filtered into y. Now, in our case y(ω) is related to x(ω) through g, so one can rewrite the numerator ⟨y*(ω) x(ω)⟩, where on averaging the noise disappears, as g*(ω) times the power spectrum of the signal; and the output power of y is what we already had, the gain squared times the input power plus the noise. So the optimal filter is f(ω) = g*(ω) S(ω) / (|g(ω)|² S(ω) + N(ω)). When the noise is small, you find that the optimal way to filter the outputs in order to recover the inputs is to undo the encoding: the filter should be one over the gain that was applied to the signal. One more time: if the signal x was passed through a gain g, then to reconstruct it we just divide by g; the optimal filter inverts the filtering that was applied. Taking noise into account, we get the full expression above for the optimal reconstruction filter. And if we multiply both numerator and denominator by g, you get the power |g(ω)|² S in the numerator, and the filter factorizes into 1/g(ω), the piece you would have without noise, times the ratio of the filtered signal power to the filtered signal power plus the noise; dividing through by the noise, that second factor is SNR(ω) / (1 + SNR(ω)). So the optimal filter is the inverse gain, which is what you would have if the signal-to-noise ratio were infinite, times a factor SNR/(1 + SNR) that rolls the filter off in the high-noise regime.
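A sketch of this reconstruction filter, reusing the toy spectra from the sketches above (again, invented numbers):

```python
import numpy as np

def reconstruction_filter(g, S, N):
    """f(w) = g*(w) S(w) / (|g(w)|^2 S(w) + N(w)).

    For small noise this tends to 1/g(w), inverting the encoder; when noise
    dominates it rolls off as SNR/(1 + SNR) instead of blowing up."""
    return np.conj(g) * S / (np.abs(g) ** 2 * S + N)

w = np.linspace(0.1, 10.0, 100)
S = 1.0 / w**2
N = 0.005 * w**2
g = np.sqrt(np.clip(0.05 - N, 0.0, None) / S)   # the encoding gain from before
f = reconstruction_filter(g, S, N)               # zero wherever the encoder is zero
```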
The filter thus has a component that depends on the signal and a component that is purely a property of the filtering in the retina and of the noise: one part is stimulus-specific, and the neural part, which gain was applied and at what noise level, is stimulus-independent. And here is another comparison, from Bialek's work. We can compare this part of the filter. The argument is that the rods, the first stage of the system, apply some gain g(ω) in order to maximize information, but they also have noise; if the next stage wants to invert this optimal filter, taking into account the noise, then the filtering done by the bipolar cell should correspond to this prediction. And indeed, taking into account the filtering of the rod cells and their noise, you get the measured response of the bipolar cell. To summarize this part of the derivation: in order to maximize information I have to filter the signal in a certain way, but then to actually use the signal I may need to invert that filtering, and in doing so I have to take the noise into account. At the first synapse in the retina one can see evidence that the filtering is exactly what you would expect in order to maximize information and then reconstruct the original signals. Any questions? One can also apply this to the spike-train domain: knowing the input that was fed into the system and the spike trains that were obtained, one can use this equation to estimate the underlying signal values, and those become predictions as well. Then another point I would like to bring in is that natural signals are not only correlated, they are also non-Gaussian, and that affects the next level of predictions. So far we talked about predictions for Gaussian signals, but we can also ask how the predictions are modified for non-Gaussian signals. It turns out these are harder to derive, but you can make approximations that render the input signals Gaussian. If you look at the probability distribution of intensities as a function of the deviation from the mean, here for an auditory signal, for a Gaussian you would expect a parabola on the logarithmic scale, but what you observe instead are more Laplacian-like tails. It turns out one can model this non-Gaussian signal as a modulated Gaussian: at each moment in time the signal can be Gaussian, but its variance changes as a function of time. In other words, we have a variance-modulated Gaussian, and a mixture of Gaussians with different variances is no longer Gaussian, so it can give rise to this sparse distribution. I feel like I should ask for some questions here. To provide more context: for example, when a person is speaking, you have bouts of high volume and then silence, so the variance is modulated in time. Same thing with light intensity in images: locally, if you look at the natural world, the variance when you look at the sky is smaller than when you look at the ground, so the overall light-intensity distribution has this non-Gaussian shape, which can be obtained as a combination of Gaussians with different variances. That is the procedure here: you take the underlying signal, extract the local variance, and model the signal as this time-dependent variance times Gaussian noise, and as a result you get a distribution of signals that is non-Gaussian, a modulation of underlying white noise by a time-dependent variance.
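A sketch of such a variance-modulated Gaussian; the log-normal envelope here is one convenient assumed choice, not the one used in the lecture's figures:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
envelope = np.exp(rng.normal(0.0, 1.0, size=n))  # slowly varying std (log-normal, assumed)
x = envelope * rng.normal(0.0, 1.0, size=n)      # Gaussian at each instant, modulated

# Excess kurtosis: 0 for a Gaussian, positive for sparse, heavy-tailed signals
print(np.mean(x**4) / np.mean(x**2) ** 2 - 3.0)  # clearly > 0 here
```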
Here is an example from natural images, with the same type of distribution. So, we talked about this, and now I wanted to show an example from my own work, with predictions that are similar to optimal filtering but that also show signatures of these non-Gaussian effects. In this case natural images were presented to an animal, and the filtering is analyzed both in the spatial domain, in the top part, and at higher temporal frequencies, in the bottom part. If we focus on the top part, what is shown is how neurons filter both natural scenes and white noise; the power spectra of the two stimuli are shown here, 1/f² for natural scenes, as we discussed, and constant for white noise. In this analysis you would like to check whether neurons can adapt to changes in the input distribution according to the prescriptions of information theory, under these two conditions, white noise and natural scenes. According to the decorrelation prescription, the product of the power spectrum of the neural filters times the signal spectrum should be a constant, and within the range over which the filters change, one can see that the products overlap, collapsing onto the same function, whether we are talking about natural scenes, in blue, or white noise, in red. But what is interesting, and somewhat unexplained, I would say it is a consequence of having a nonlinear model and also non-Gaussian signals, is that decorrelation would predict that after filtering we should have a constant output power spectrum, whereas what we see is that there are remaining correlations; and when the input is white noise, in red, the neurons actually correlate the white-noise signals so as to bring the output to the same level as in the case of natural scenes. Which part is not clear? OK, any questions about the previous parts of the derivation? Tania: sorry, can you hear me? I can hear you, yes. Maybe you can explain the relation between L(k) and P(k) here and S(ω) and N(ω) on the previous slides. So P(k) is the signal spectrum, which plays the role of S(ω), and L(k) is the filter, which plays the role of g(ω). I have it in the PowerPoint, but I cannot easily annotate the PowerPoint, so I apologize. Essentially the prediction is that |L(k)|² times P(k) should be a constant.
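A sketch of this check, using the toy spectra from earlier: in the water-filling form the output power |L(k)|² P(k) + N(k) is constant in the pass band, which reduces to |L(k)|² P(k) ≈ const where the noise is negligible (everything here is invented for illustration):

```python
import numpy as np

k = np.linspace(0.1, 3.0, 50)
P = 1.0 / k**2                                   # natural-scene spatial spectrum
N = 0.005 * k**2                                 # assumed noise spectrum
L = np.sqrt(np.clip(0.05 - N, 0.0, None) / P)    # decorrelating filter

# Decorrelation prediction: output power |L|^2 P + N is flat in the pass band
print(np.allclose(np.abs(L) ** 2 * P + N, 0.05))  # True over this range
```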
That is the prediction of the linear model, and there are two deviations. First, about the neural responses: there are many things to say about these data. What is shown here is an average over a set of neurons, and if you focus just on the blue curve, it is similar to what we have seen in the retina: basically, they do band-pass filtering. Now the question is to what extent the filters can be modified when you drastically change the statistics of the input signals; in this case we change from natural scenes, which are strongly correlated, to white noise, which does not have that structure. It appears that the system is optimized for natural scenes: when you present white noise, the filter can change, but only at low spatial frequencies; at high spatial frequencies it cannot change in real time. The explanation for this is that the system is set up to process natural scenes, but some of the inputs are malleable, and those mostly come from coordination with other neurons, on a broader scale, perhaps contextual modulation signals from other parts of the brain. This part can modify the effective low-frequency behavior of the neuron, but the rest of the system is fixed and is not adaptable on the relatively short time scale of these experiments, which is about ten minutes. So in the part that is adaptable, when we switch between natural scenes and white noise, the power spectrum of the input changes and the filter converges toward its target: if you assume that the filter is the optimal solution under natural scenes, then in the region where it can change, the change under white noise brings the product to the same target function as with natural scenes. Another question is why this curve is not a straight line. The standard prediction that we discussed so far for decorrelation is that, at least up until the signal drops off, the product should be a constant, whereas in the experiments it is some function that is not a constant. The first explanation is that, as we discussed in the previous lectures, neurons are nonlinear: a nonlinearity is applied to the linear filter output, so instead of saying that the product must be a constant, I will say that there is some other function, which we do not know; but to the extent that the code is optimal, when the input distribution changes, the system should maintain that function. So the explanation is that in a nonlinear system you no longer expect a constant but some other function, which I don't think has been derived so far, and the observation is that in the region where the nervous system can adapt, it maintains the same function under changes in the input distribution. Another way of interpreting this: I was planning to talk in this course about error correction, and I am not sure we will have time for an extensive discussion, but this is an illustration of it. The system decorrelates, but it does not decorrelate fully: if you compare the power spectrum of natural scenes before and after filtering, yes, the dominant frequencies are suppressed, but some residual correlations are left in the signal. It is partial decorrelation: you decorrelate, but you also leave some correlations that can be used to correct errors. That is another explanation. Then the bottom part is the same analysis on the same data, but analyzing signals not at zero temporal frequency, which is what is shown in the top part, but at higher temporal frequencies. For natural scenes the spatial power spectrum is 1/f-like at higher temporal frequencies just as at lower ones, but because the spectrum also falls off as 1/ω in time, the power of natural scenes at high temporal frequencies actually lies below the power of the white noise there. So in this case the natural scenes have less power: in the top part the natural scenes had more power than the white noise, so the sensitivity to them was lower, while here the sensitivity is greater for natural scenes. In other words, the system integrates signals across space in order to gain sensitivity at higher temporal frequencies, and the product stays the same within the range of frequencies over which the receptive fields are modifiable. OK, I think that is most of what I was going to say about this slide. Any questions about the lecture today?
So today we discussed information maximization and the water-filling solution, and here are some examples of decorrelation; this one is in the primary visual cortex, but you can see that decorrelation happens at all stages of visual processing. Any questions? I will begin the next lecture with psychophysical adaptation; there are examples of adaptation to various parameters, not just to spatial frequencies but also to color. It is known that people adapt to the changes in color between summer and winter, and also as a function of age: as the lens changes, you compensate for the change in lens quality to keep perceiving the same color, and so on. OK, I don't see questions, so shall we resume on Friday? Friday, same time.