The fourth lecture, by Tanya Sharpee. Thank you. Hello, is the connection working? Recording in progress. OK. So in the first week we talked about information: we talked about entropy, and about information in a linear system with Gaussian inputs and Gaussian outputs. Then we started to talk about nonlinearities in neural responses. Today's topic is maximally informative nonlinearities for multiple neurons and how they are coordinated with each other. We will continue with this topic and also talk about how it leads to a theory of diversification in biology. We will compare this with properties of neurons in the retina, and also talk about how this is related to the theory of phase transitions: how one can think of this diversification as a second-order phase transition, and how this theory can be used to predict large-scale properties of retinal arrays.

To sum up what we started last week: if we have an analog signal and one neuron with one threshold, then I will not have access to the full underlying analog signal; I will only know whether the value was below or above the threshold. If it was above the threshold for that neuron, we denote the response as one, and if below, as zero. But my information will actually be somewhat less than that, because the effective threshold fluctuates: even if we repeat the same signal, on some trials there will be a spike, and on the next trial the neuron will not produce one. So that is the setup for one neuron. Ideally, in order to maximize information, and this is similar to those guessing games we discussed with submarines and so on, if I have a range of possibilities I would like to place the threshold in the middle of the distribution, so that I get one bit of information. But that may require producing spikes more frequently than the system can sustain.
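As a minimal numerical sketch of this point (assuming a standard Gaussian signal and an ideal hard threshold; the particular threshold values are illustrative, not from the lecture), a binary response carries the most entropy, one bit, when the threshold sits at the median of the signal distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)           # the underlying analog Gaussian signal

def entropy_bits(p):
    """Entropy of a binary (spike / no-spike) response with spike probability p."""
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# a hard threshold at the median splits the signal 50/50: close to one bit
p_median = np.mean(x > 0.0)
# a high threshold (fewer spikes, metabolically cheaper) conveys less entropy
p_high = np.mean(x > 1.5)
```

With a metabolic cap on the firing rate, the best one can do is move the threshold as close to the median as the allowed spike rate permits.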
In that case I will be as close as possible to this central point given the metabolic constraints on the neural firing rate. So for one binary neuron, the maximally informative solution is not particularly interesting: it is fixed by the metabolic constraint. It becomes more interesting once we have two neurons. This is how we quantify things for one neuron: we compute the response entropy, meaning the overall variability of the response, and subtract the variability across trials for a given value of the input x.

Now, with two neurons, I will have more gradations, more information about the underlying analog signal. I have two thresholds, and we know that if the value was very high, both neurons will spike, while if the value is intermediate, the blue neuron will produce a spike and the red neuron will not. Let's see if we have any cases like this; for example, here. In that case I know the signal was somewhere in the intermediate range. So with two neurons we have more access to the underlying analog signal, but the question remains: what is the optimal separation between the thresholds of these two neurons? There are only four parameters in the problem: the two thresholds, mu1 and mu2, and the noise levels, or reliabilities, of the two neurons, nu1 and nu2. We will consider the solution separately for different values of noise, because reducing noise also requires investing metabolic resources. And for the thresholds of the two neurons, one can approach the constraint in different ways; in this problem we decided to fix the average spike rate across the two neurons.
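This one-neuron information measure can be sketched numerically (a Monte Carlo estimate; the standard Gaussian signal and logistic soft threshold match the lecture's setup, while the sample size and parameter values below are my choices): mutual information is the response entropy minus the average trial-to-trial (noise) entropy.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def mutual_info_bits(threshold, noise, n=400_000, seed=1):
    """I = H(response) - <H(response | x)> for one binary neuron with a soft
    (logistic) threshold, driven by a standard Gaussian signal."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    p = logistic((x - threshold) / noise)       # P(spike | x) on each trial
    h = lambda q: -q * np.log2(np.clip(q, 1e-300, None))
    pbar = p.mean()                             # overall spike probability
    total = h(pbar) + h(1 - pbar)               # response entropy
    noise_ent = np.mean(h(p) + h(1 - p))        # variability across repeats
    return total - noise_ent
```

A reliable neuron with its threshold at the median gets close to one bit; raising the noise, or pushing the threshold into the tail, lowers the information.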
The average spike rate is related to the total number of spikes that the system as a whole can produce, thinking that this may be limited by the blood supply to the group of neurons; but one neuron can produce more spikes than the other. So the difference between the thresholds of the two neurons is much more weakly coupled to the metabolic cost than the average threshold is. That will be our key parameter: what is the optimal difference in thresholds for a given average threshold? Of the four available parameters, there will be one free parameter to optimize. This parameter is interesting because it determines whether the two neurons play equivalent roles, meaning they have the same threshold and I average the result, or whether they play different roles.

This is the figure I ended the last lecture with, and we will go over it one more time, because I think it is important, and then go inside the calculation and disassemble its parts to find out what the driving force behind the bifurcation we see here is. What is shown in this three-dimensional graph is the information that these two neurons convey about the underlying Gaussian signal, as a function of the threshold difference between the neurons. The average threshold is fixed for all of these curves, the noise is the same for the two neurons, and it changes from large values, in red, to small values, in blue. One can see that as the noise decreases, the overall information that the neurons convey increases. That makes sense: less noisy neurons give more information. But what is interesting is that there is a bifurcation: if the noise is large, the peak of the information is at zero threshold difference, meaning I would like to have identical thresholds and average the result.
Or, if the noise is small, it is better to have slightly different thresholds, and we can look inside this information function to see what causes the bifurcation. At the critical point the optimal threshold difference becomes non-zero, and the further you go from the critical point, the larger the optimal difference between the thresholds: the less noisy the neurons, the further apart the optimal separation. And if we change parameters, so, as I said, for this set of curves the average threshold is fixed, but if we change the average threshold, the picture remains the same and only the position of the critical point shifts.

So I thought it might be interesting to look inside this computation, and we will examine together what the driving force for this bifurcation is. I have a small file in Mathematica, and the code is minimal; I just defined the logistic nonlinearity here. Maybe the type is a little too small. Can you see it? Let's see, there should be a way to change the font. There is advice for me in the chat: in the right bottom corner you can change the font size, it's usually a slider. Oops, that's not what I meant, though I guess that's one way of solving the problem. It says Style, Input, Format, Size, 16 points, like that. Is that better? Yeah, it's a bit better. All right. I will remove the parts that we don't need; we actually don't need the noise entropy.
The phenomenon is entirely based on the response entropy. So we define the logistic nonlinearity; you could define a different nonlinearity, it doesn't matter so much. Now we have two neurons, so we have four response patterns: (1,1), (0,0), (0,1), and (1,0). The response entropy is built from their probabilities, which are functions of four parameters: the threshold and noise of the first neuron, and the threshold and noise of the second. We have a Gaussian distribution of signals, a logistic nonlinearity for the first neuron, a logistic nonlinearity for the second, and we integrate. So P11 is the probability that both neurons spike. P00 is the same integral, but with each logistic nonlinearity replaced by one minus itself; and one minus the logistic function of x turns out to be the logistic function of minus x. For P01 you take the opposite sign for the first neuron and the positive sign for the second, and for P10 you switch them around. The response entropy is then minus the sum over patterns: P00 log P00, P11 log P11, P01 log P01, and so on; just four terms. Mutual information we don't need for now, because the main effect is already present in the response entropy.

Now we can disassemble this and look. If the noise is large, and NN is the noise in my notation, we will get... it takes a little time because these are infinite integrals, but you can see that with noise 1 it looks like this. And when the noise is small, the derivative has the opposite sign. So the critical point is somewhere between 1 and 0.2. What noise value would you like to try? That's a question for the students. Maybe 0.7? OK, 0.7.
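A sketch of the same computation in Python rather than Mathematica (assuming, as in the demo, a standard Gaussian signal, logistic nonlinearities, and conditionally independent noise; the integration grid, the average threshold of 1, and the noise values are my choices):

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# integration grid for the standard Gaussian signal
X = np.linspace(-8.0, 8.0, 4001)
DX = X[1] - X[0]
PX = np.exp(-X**2 / 2) / np.sqrt(2 * np.pi)

def response_entropy(mu1, nu1, mu2, nu2):
    """Response entropy (nats) of two binary neurons: neuron i spikes with
    probability logistic((x - mu_i)/nu_i), noise conditionally independent."""
    f1 = logistic((X - mu1) / nu1)
    f2 = logistic((X - mu2) / nu2)
    p11 = np.sum(f1 * f2 * PX) * DX                 # both spike
    p00 = np.sum((1 - f1) * (1 - f2) * PX) * DX     # both silent
    p10 = np.sum(f1 * (1 - f2) * PX) * DX           # only neuron 1 spikes
    p01 = np.sum((1 - f1) * f2 * PX) * DX           # only neuron 2 spikes
    return -sum(p * np.log(p) for p in (p11, p00, p10, p01) if p > 0)

def scan(noise, mu_avg=1.0, dmax=2.0):
    """Entropy as a function of the threshold difference, at fixed average."""
    ds = np.linspace(0.0, dmax, 21)
    return ds, [response_entropy(mu_avg - d / 2, noise, mu_avg + d / 2, noise)
                for d in ds]
```

At low noise the entropy grows when the thresholds are pulled apart; at high noise, identical thresholds win, which is the bifurcation the surface plot shows.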
But we know that our critical point is somewhere between these two regimes, so 0.7 is too big. Let's try 0.5. You know the algorithm for finding the zero of a function: we're looking for the change in sign. And notice that at 0.5 the scale, the range, is getting a little smaller, so we can set the plot range to all. We tried 0.5, and we know 0.2 is on the other side, so we will do something in between. You know this method of finding a zero: you take the interval and divide it in half. So between 0.2 and 0.5, around 0.35 will be our next value, though it might be that 0.3 is already on the other side.

What would be interesting is to see how the various terms behave. It could be a homework exercise: if the threshold is the same, and as you notice I'm plotting against the threshold difference, one can expand this expression. For example, P01 and P10 will be approximately the same term, and one can ask how these four probabilities change their behavior. The symmetry breaking, the noise, enters only by affecting the response probabilities; the noise entropy is not the driving force. I even think that if we expand, we find the product P00 P11 over P01 P10, this ratio, as a critical variable. Any questions so far?

Say it again? Just a moment, we need the mic. Sorry, could you just say one more time what the four states were? The four states, the four probabilities, yes. The first one is P00; we can plot them. This is the probability that no spike has occurred, P00. Then there is the one where both neurons are spiking, and then the ones where one of them is spiking and the other is not.
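The interval-halving search described here can be written out explicitly. This sketch repeats the entropy computation so that it runs on its own, and brackets the critical noise crudely by asking when a fixed threshold split stops beating identical thresholds; the bracket [0.1, 1.0], the average threshold of 1, and the probe split d = 1 are my assumptions, so the value it returns is only indicative of where the transition lies, not the exact bifurcation point:

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

X = np.linspace(-8.0, 8.0, 4001)
DX = X[1] - X[0]
PX = np.exp(-X**2 / 2) / np.sqrt(2 * np.pi)

def response_entropy(mu1, nu1, mu2, nu2):
    f1 = logistic((X - mu1) / nu1)
    f2 = logistic((X - mu2) / nu2)
    ps = [np.sum(w * PX) * DX for w in
          (f1 * f2, (1 - f1) * (1 - f2), f1 * (1 - f2), (1 - f1) * f2)]
    return -sum(p * np.log(p) for p in ps if p > 0)

def split_gain(noise, mu_avg=1.0, d=1.0):
    """Entropy gain from splitting the thresholds by d at this noise level."""
    return (response_entropy(mu_avg - d / 2, noise, mu_avg + d / 2, noise)
            - response_entropy(mu_avg, noise, mu_avg, noise))

# interval halving: the gain is positive at low noise, negative at high noise
lo, hi = 0.1, 1.0
while hi - lo > 1e-3:
    mid = 0.5 * (lo + hi)
    if split_gain(mid) > 0:
        lo = mid          # splitting still favored: critical point is higher
    else:
        hi = mid
nu_c = 0.5 * (lo + hi)    # rough location of the sign change
```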
We can maybe even plot them together to save screen space. Or maybe I can write it on the blackboard. So essentially you have the signal, then the threshold, and this x goes into a nonlinear function, which in this case is the logistic function: say, 1 divided by 1 plus e to the minus x, as a function of x. And this is the probability of firing. Tanya, correct me if I'm wrong. Yes, that's right. So x is the signal, which in this case is Gaussian with a certain mean, this is the threshold tau, this is the mean mu, and it has a certain variance; and the noise parameter is b, right? So, let's set the mean of the signal to zero, because we count everything relative to the mean; the threshold tau is what I call mu; and the variance of the signal is 1, because everything is counted in units of the signal variance, so the noise nu is in those units as well.

OK, so let's call this phi of x. The probability that the neuron fires is phi of x(t) minus mu, divided by b, which is the noise of the neuron. Yes. OK, so this b is essentially related to the slope of this curve, right? Yes. So this is the probability that neuron 1 fires, with mu1 and b1, and likewise for neuron 2 with mu2 and b2. And this is P11, the probability that both fire; it depends on mu1, b1, mu2, and b2. Is this clear? Then you have four possibilities: (0,0), (0,1), (1,0), (1,1), and you compute the entropy. Yes. I think it might be useful to write down the probability of no spike as one minus this, and to show that it is the same function with the opposite sign: 1 minus phi of x equals 1 minus 1 over (1 plus e to the minus x), which equals e to the minus x over (1 plus e to the minus x); now multiply numerator and denominator by e to the x.
This gives 1 divided by (e to the x plus 1), which is phi of minus x. Thank you. So this is the probability of not firing. Yes. Now we can plot these probabilities as a function of the threshold difference. We set the average threshold to, say, 1 in units of the standard deviation of the signal, and the noise, let's see, the noise is set to 0.35. The symmetric probabilities, no spikes and both spiking, behave smoothly: if we plot them separately there are little peaks, but plotted jointly they are essentially constant. The cross terms, where one neuron fires and the other does not, are more affected by the threshold difference, so there is something interesting to look at.

Now we can go from these probabilities to the entropy terms, minus p log p: we take the log of each quantity and multiply by the probability itself. These already look more interesting, and you see that the balance between them, they already change in opposite directions, because they sit on opposite sides of the peak of this function. In general it is useful to know how minus p log p behaves as a function of p. This is the contribution of one state to the entropy of a binary process, defined for p between 0 and 1, and because we use the natural log its peak value is not 1. In our case, P00 and P11 were on opposite sides of the peak of this function, so their entropy terms respond in opposite ways. And then we can take a look at the cross terms; these behave as they are. And if we take the noise to be very large, now let's look at the sum of these two terms. OK.
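The shape of this entropy-contribution function is easy to check (a small self-contained sketch, using the natural log as in the demo):

```python
import math

def h(p):
    """Contribution -p ln p (nats) of one response pattern to the entropy."""
    return -p * math.log(p) if p > 0 else 0.0

# h rises for p < 1/e and falls for p > 1/e, with maximum value 1/e at p = 1/e,
# so a rare pattern (like P11 here) and a common one (like P00) sit on
# opposite slopes of the curve and respond oppositely to the same change
p_star = math.exp(-1)
```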
So the sum of the cross terms, where one neuron fires and the other does not, drives the system towards the same threshold, while the 00 and 11 terms can drive the system towards different thresholds. This analysis says that the driving force behind the transition is the interplay between the pattern where neither neuron fires and the pattern where both fire; whichever term wins determines the transition. In other words, if we plot these two terms separately, the most important one is the P00 term: we need that term to be large in order to get a positive derivative, and this reflects what happens at the tails of the distribution, where the 00 pattern lives. If we go above the critical point, and we already know that 0.7 was above the critical point, then these two terms get closer together. So the conclusion is that the interplay between the 00 and 11 terms is decisive, and the key term is the absence of spikes in both neurons: information from silence is what drives the transition towards diversification. Any questions so far?

OK, then we can go back to the main PowerPoint, and I will upload this Mathematica file to Slack so you can examine in more detail what is going on in the calculation. So this is the excitement that comes from diversification, and a few more questions for the audience: what do you think will happen if, instead of a Gaussian distribution, we change to a sparser distribution, or if we change the average threshold of the neurons? Any insight?

I have a general question. Essentially, in this exercise, you are keeping the response function of the neurons fixed, right? Yes.
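Going back to the claim that the ratio P00 P11 over P01 P10 appears as the critical variable: one way to see where it enters is a small-splitting expansion. This is my sketch, not the full calculation from the demo; it keeps only the first-order change of each nonlinearity in the threshold shift, and the neglected second-order change of the nonlinearities contributes additional terms of the same order in $\varepsilon$.

```latex
% Split the thresholds by \pm\varepsilon around the common value \mu, so that
% f_{1,2}(x) \approx f(x) \pm \varepsilon\, g(x), with g = -\partial f / \partial \mu.
% With P(x) the Gaussian signal density, define
%   A = \int g^2(x)\, P(x)\, dx, \qquad G = \int g(x)\, P(x)\, dx.
% The pattern probabilities then shift as
%   P_{11} \to P_{11} - \varepsilon^2 A, \qquad
%   P_{00} \to P_{00} - \varepsilon^2 A, \qquad
%   P_{10,\,01} \to P_c \pm \varepsilon G + \varepsilon^2 A,
% where P_c = P_{10} = P_{01} at the symmetric point. Expanding the response
% entropy H = -\sum_r P_r \ln P_r to second order in \varepsilon gives
\delta H \approx \varepsilon^{2}\left[
    A \,\ln\!\frac{P_{00}\,P_{11}}{P_{01}\,P_{10}}
    \;-\; \frac{G^{2}}{P_c}
\right] + \text{(terms from the second-order shift of } f_{1,2}\text{)} .
```

The first term, carrying the ratio $P_{00}P_{11}/(P_{01}P_{10})$, rewards splitting and is dominated by the silent pattern $P_{00}$ at the tail; the second term penalizes it, which matches the tug-of-war described above.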
Okay, so it seems to me that this specialization occurs because the noise becomes smaller than the dynamic range of the neurons, and then it is convenient to have neurons with two different thresholds, essentially to cover a larger dynamic range with the two neurons together. I don't know if that makes sense. Yes: the nonlinearities are fixed in shape, but the parameters vary along this axis and along that axis. When the neurons have smaller noise, you can tile the underlying analog signal and assign different parts of the dynamic range to the two neurons. Yes, exactly.

Okay. And remember, last lecture we talked about how the neuronal nonlinearity has to be the cumulative distribution of the input distribution. At the end of today's lecture we will discuss what happens if you have more than two neurons: this splitting of the thresholds continues, and ultimately, maybe I will show this figure now, you end up with a distribution of thresholds that matches the input probability distribution. So, what happens if you have more than two neurons? This is the slide here. These authors study a different problem, and they have a different definition of noise, but the same kind of result holds. This axis is the threshold, and in their case they are increasing the number of neurons, but you can think of it as a transition from high noise to small noise in a population of, say, 1,000 neurons. They all start with the same threshold, and then they split into two subgroups, maybe not equally, let's say equally, but in reality it could be in different proportions. So 500 neurons take this threshold and 500 neurons take that threshold, and then you can take each part of the dynamic range and consider it separately.
Within each part of the dynamic range the input variance is smaller, so the effective noise is relatively larger, and you have to go a little further before you reach the critical point for that part of the dynamic range; then they split again. The splitting continues, and ultimately, if we are allowed to make the noise in the individual neurons very small, each of these thresholds will be a separate neuron. The distribution of these thresholds will match the input probability distribution, and when we sum the activity of this population, it will be the cumulative distribution of the input signals. So we arrive back at Laughlin's result in the weak-noise limit, through the idea that what Laughlin's result represented as one neuron can be thought of as the summed activity of many little units packed inside it, with individual channels.

I just wanted to check whether this is clear. Yes, there is a lot of information here; this was one of the concluding slides, but I'm showing you where we are going, and then we will trace the path. The idea is that you start with one neuron, and what people say is that the neuron has a spike rate which represents the summed activity of various compartments, and these compartments have different thresholds. So a way, not yet fully developed, is to coarse-grain a neuronal population and represent it as one effective neuron that now has multiple response levels. In this way we can unite the picture of a thousand binary neurons, and the question of what their thresholds should be, with the results from last lecture. If I can draw, maybe...
I cannot quite annotate here, so, Matteo, can you reproduce this result: the neuronal nonlinearity is a sigmoidal function, and it is the cumulative distribution of the input distribution. So if this is the distribution of inputs, p(x), then this is the integral from minus infinity to x of p(x') dx', and the optimal response function should be exactly equal to this, right? Yes. And now I can think of it this way: if I place the threshold density according to the input distribution, with more thresholds in the middle and fewer thresholds at the edges, then the joint activity of these neurons will be the activity of that low-noise neuron.

So essentially the low-noise limit is where the response function of each unit is still sigmoidal, but much more peaked, and this is the threshold mu. You want to place different neurons with different thresholds at, say, mu1, mu2, mu3, and so on, in such a way that the density of thresholds in any interval matches the derivative of this response function. Is this clear? So we said the optimal transfer function, the optimal g(x), should be equal to the cumulative distribution times a constant. So now we are in a situation where the transfer function of each unit is fixed and very sharp, the low-noise limit, so one unit by itself cannot satisfy this relation, but you have many neurons, OK? So what you want to do is this:
you want the population of neurons as a whole to respond in a way that mimics this relation. So the number of neurons with a threshold in a given interval should be proportional to the derivative of g: if you take the derivative, g'(x), it should be essentially a constant times the probability of having stimulus x. In other words, the fraction of neurons with a threshold in an interval should be proportional to the derivative of the response function. Is that okay, Tanya? Yes, yes, thank you.

Yes, and if we go back to the case of two neurons: if the nonlinearity is very stretched out, I can't really cut the distribution in two; only when the response function is sufficiently sharp does it make sense to cut the input distribution in two. So that is another kind of intuition behind the result.

OK, so now we can compare this with data, so we will talk about neuroscience. There was a question in the chat: this looks like a supercritical pitchfork bifurcation. Those are ideas from dynamics; what we are discussing is similar, but the connection I'm making is more closely with a second-order phase transition, not in terms of dynamics, and we will analyze the parameters of this bifurcation in a few moments.

Now, the retinal case. To give you some background on the retina: this is a famous recording from retinal cells done by Kuffler in 1953. He shines light at different positions, A, B, C, or D, and records the spikes with an electrode. One can see that in the central position the neuron produces lots of spikes, and in the surrounding region fewer spikes. This led to the concept known as the receptive field of a neuron. In this case the receptive field would be somewhat bigger than region B, while A, C, and D are already at the edges of the receptive field for that neuron. So this is one neuron,
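The threshold-density argument above can be sketched numerically (assuming a standard Gaussian input and ideal hard thresholds; the population size is arbitrary): if the thresholds themselves are drawn from the input distribution, the summed population activity reproduces the cumulative distribution of inputs, which is Laughlin's matching rule.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
# draw the hard thresholds from the (standard Gaussian) input distribution itself
thresholds = rng.standard_normal(20_000)

def population_rate(x):
    """Fraction of binary units active at signal value x: with thresholds
    distributed like the input, this approximates the input CDF."""
    return np.mean(thresholds < x)

def gaussian_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))
```

More thresholds land in the middle of the distribution and fewer in the tails, exactly the density g'(x) proportional to p(x) described on the blackboard.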
and there are many neurons in the retina, because it is a detector array. There are different kinds of cells, but within each cell type they tend to tile the space. If there is interest, in a separate lecture we can talk about variability in this array and how the shapes of these receptive fields are coordinated, they are not perfect circles and so on, but for the purposes of today's presentation you can think of them as approximately circular regions. The retina has been a very good testing ground for theory because it is fairly close to the sensory input, so we can test these ideas about information maximization; and there are also different cell types, so we can think about coordination across different channels and about why the cell types are the way they are.

In the canonical picture, a lot of the cell types were defined by whether they respond to light increments, in this case these are ON parasol cells in the primate, or are OFF cells, which respond to light decrements; and whether they are the smaller cells, called midget cells, which have higher acuity and carry the red-green color channel, alongside the blue-yellow cells. That is one kind of cell-type specialization. But today we are talking in addition about cells discovered in the retina whose experimental observation, by Kastner and Baccus, led to this theory of cell-type specialization. You have the stimulus, and these cells filter it with a very similar temporal kernel; once you filter it, you get a new effective stimulus, which you plot on the x-axis, and, in agreement with our theoretical framework, the cells turn out to have different thresholds. So here is an example of cells, the blue and red neurons, that encode the same aspect of light intensity in the visual world but do so with different thresholds. That was surprising at the time, because usually the cell types in the
retina were defined by differences in how they filter the incoming signal. In the retina you can tell that these are really two different cell types, because they form overlapping mosaics. The idea is that the theoretical analysis we did should be applicable everywhere in the brain, but far from the retina, where we do not have arrays of cells, it is harder to detect that cells form separate classes.

Another observation: both of these cell types respond to light decrements. On the previous slide you can see they are called OFF cells, because they primarily integrate whether the light intensity went down; when the light intensity goes down, that is when we get the large projection onto the x-axis. An ON cell would be one whose kernel goes up and then comes down. For the ON cell, here is an example of the nonlinearity, and you notice that it is less steep than for either of the OFF cell types. That is a hint that these ON cells, in the language of our theoretical model, have larger noise. By fitting these nonlinearities we can estimate the average amount of noise for the two OFF cell types and for the ON cell type: around 0.3 for the OFF cells and around 0.45 for the ON cells. And if you recall, in our simulation the critical point was around 0.3-something, so that is where the critical value would lie. The critical value depends on the parameters: each recorded pair of neurons has a slightly different average threshold, and the position of the critical point differs accordingly. This range here shows how much the critical point changes when we change the average threshold of the two neurons. So, in other words, in the experiment they measure
the filters and the nonlinearity; we fit them and obtain our four parameters, mu1, mu2, nu1, and nu2, and after that we can ask whether the theory predicts an optimal mu1 minus mu2 consistent with the measurement of the three other parameters. OK, any questions about this? This comparison has essentially no free parameters, and it attempts to explain why we have two OFF cell types in the OFF channel and only one cell type in the ON channel. The explanation is that for ON neurons the noise is larger, so these cells are in this regime, while in the OFF channel, for light decrements, the noise is smaller, so they are in the regime where it is optimal to have slightly different thresholds. Is that clear? Any questions?

Yes, Carlos. Out of curiosity: the threshold levels for neurons, experimentally, have you found whether they are time dependent or signal dependent, or does a given neuron keep a certain threshold all its life? No, they are all context dependent and they all adapt. In fact, remember that in the theory we measure all these quantities in units of the standard deviation of the input signal, so as the contrast changes, the nonlinearities shift. There will be a period of time, after the input statistics change, when the nonlinearity is not optimal, and then it converges to the optimal value as the neuron adapts. I actually have additional figures on this. On this slide you have different contrasts in the input signal, and the thresholds differ somewhat as a function of contrast; the noise does not scale perfectly with contrast, but the thresholds remain separate across the range of contrasts. So that is the answer: the nonlinearities change, and in this model we assume they have adapted fully to the input probability distribution, but in reality they will be only partly adapted.
And another thing, another not fully understood phenomenon, or one that I would like to understand better, is that these neurons, in addition to having different thresholds, also have different dynamics upon changes in contrast. They are named adapting and sensitizing cells, based on the different dynamics of how they behave when the input probability distribution changes. One neuron, I think, approaches its final threshold value from the top, from large values downward, and the other jumps to very low values and then increases; so they approach the optimal values from opposite sides. That theory is partly developed in Baccus and Kastner's papers, but I think more can be done on the theoretical side. Is that okay, Carlos? Thank you. I was asking mostly to see whether there is a separation of timescales between the shift of the nonlinearities and the shift of the thresholds; maybe one of them is slower, so that the brain uses this other strategy to compensate.

Yes. I will prepare some slides for the next lecture: we had a recent paper where we followed up on this result, and roughly speaking, it turns out that the difference between these two thresholds is what you can think of as a theoretical approximation. The two neurons are actually the same, but one neuron gets an extra inhibitory input from another cell, in the case of the retina, an amacrine cell. So it is possible that the two neurons have identical properties, but one gets extra inhibitory input, and as a result its threshold is higher; the whole circuit then adapts to the extent that this intermediate neuron adapts, so the intermediate neuron coordinates the difference between the thresholds. And there are other interesting ideas; it turns out there is a connection with a form of stochastic resonance in this system with two neurons, where a third
neuron is coming in so you might think that I'm deviating a little bit but basically it turns out that it's every time I make a connection in neural circuits I fact I introduce noise so the best way of having zero noise is not to have a connection so it turns out in this case I could have send one signal to one neuron and another signal to another neuron but then I would have noise in both neurons so it turns out that these neurons that have larger noise and they have larger threshold is because they get extra input and that input the difference in the noise between these cells is because they get this extra inhibitory input so the the extra input can explain both the increase in the noise and increase in the threshold well I hope yes so and now we are back to our we can talk about various phase transitions so actually that was my second slide but so across various contrast so these are the thresholds are maintained and the difference between the cell types what we think in the retina can be disassembled is that one set of cell gets an extra input from another inhibitory neuron so then here is an example of prediction so we say this is two neurons and we know their thresholds and we know their noise and then you vary the threshold difference between the and you know the average threshold but you vary the threshold difference and compare it with experimental points so very often the theoretical peak of the curve corresponds with the actual measurement of the two neurons so now how can we think of this analysis with two neurons how can we make predictions for larger arrays okay so now we can talk about bifurcations and approximate this mutual information and I didn't tell you that somehow in the slide but basically the figure that I showed here was for the case where neurons have the same noise but they also can have different noise levels so in this case so this is the slide somehow skipped over so as you as you noticed in the experimental case the neurons both of 
neurons had smaller noise than the ON neurons, but the noise levels also differed between them. If you plot information versus threshold difference for two neurons that now have slightly different noise levels, you get the same picture as before, except that the surface is shifted: it becomes optimal to put the neuron with smaller noise closer to the center of the probability distribution.

All right, we have a question in the chat: what are the assumptions behind this nonlinear information model? Some assumptions, as we will discuss in a moment, matter in a quantitative way but not qualitatively, and some are more critical. One assumption was that the input distribution is Gaussian; that is not critical, and you can change the probability distribution. The nonlinearity does not have to be a logistic function, only approximately of that shape. The critical assumption is that each neuron is binary, so that we have four response states. Another assumption is that the responses are noisy but independent given the input, although that one also will not affect the results. Colin, is that okay? I am not sure whether the chat question is from a student online or in the class. Online, we do not see the question. All right, so Colin asked what the assumptions behind the nonlinear information model are. The assumptions are that the neurons are binary, and then we have four response patterns; that is the main assumption.

So this is the picture you get if you ignore the difference in noise between these two neuron types. If we take that small difference in noise into account, the picture shifts so that the two maxima are no longer equivalent, and one of the predictions is that positive threshold differences have to go with positive noise differences, just as in the experimental situation. In other words, if the neuron has smaller noise, it should have a smaller threshold, and so it will encode signals closer to the mean of the distribution. The symmetry is broken. The first picture is what we can call spontaneous symmetry breaking: I have two identical neurons, and spontaneously one of them takes over the smaller signal range while the other takes over signals of larger magnitude. But when the noise in the two neurons is not equal, there is a bias, and the smaller signals should be coded by the neuron with the smaller noise. That is this picture right here, where in the experimental case the neuron with the higher threshold also has the higher noise value; and we now think that both phenomena come from one cell getting the extra inhibitory input.

Now we can model the neighborhood of the critical point in the spirit of Landau mean-field theory. Our information function has a quadratic term in the threshold difference, with a coefficient that depends on the noise and changes sign at a critical value. Information cannot increase indefinitely, so there is a fourth-order term, which has to be negative because information goes down for large threshold differences. And there is a linear term, which is absent if there is no difference between the neurons, but a difference in the noise produces this linear term and breaks the symmetry. Schematically, I(Δθ) ≈ I₀ + a(σc − σ)Δθ² − bΔθ⁴ + hΔθ, with h proportional to the noise difference. You might notice that this is an identical description to magnetization; I have even chosen the variables so that they correspond. What was magnetization is the threshold difference, the magnetic field plays the role of the noise difference between neurons, noise plays the role of temperature, and then we are
maximizing information instead of minimizing free energy. So there is a mapping from this expansion to the theory of phase transitions, and we can use it to make predictions for large arrays. Do you have any questions about this expansion?

I have a question about the sign of the first term: when the noise is large, the mutual information should be maximal at zero, so the coefficient of the quadratic term should be negative.

Yes, that's right; that corresponds to the coefficient a(σc − σ) being negative above the critical value. What is important is that there is a critical value where the coefficient changes sign. This is one of the simplest things that can happen to a maximum: a single maximum splits into two maxima. Once you have this expansion, you have full predictions for how the threshold difference between neurons should behave as a function of the noise, just like magnetization as a function of temperature. When the noise is less than the critical value, we will have a nonzero threshold difference, and because a coefficient linear in (σc − σ) multiplies the quadratic term while the quartic term caps it, the optimal threshold difference will grow as a square root. You can now check this numerically, and it is no longer a surprise: take the mutual information, find the position of the optimal threshold difference as a function of noise, and near the critical point it behaves as |σ − σc| to the one-half power; in the simulation you can go very close to the critical point to extract the exponent. The next prediction is how the threshold difference should depend on the difference in the slopes, that is, the noise levels, of the two neurons; according to Landau mean-field theory it should go as the one-third power of that difference, and that is what you get here. In addition, this is a proper second-order phase transition, so we can look at the second derivative of information with respect to the equivalent of the magnetic field, the slope difference, and at the critical point this derivative diverges; that is the analog of the susceptibility. If you fit the simulation points, you indeed get the scaling exponent of −1 that is predicted for mean-field theory. You can also take a second derivative with respect to the noise, and it jumps discontinuously between two constant levels, just like the specific heat in the mean-field theory of magnetization.

So now we can compare this with data, and this partly relates to Colin's question about which assumptions of the model are critical. The picture on the previous slide was for a given average spike rate, that is, a given average threshold for the two neurons. If that parameter changes, the threshold difference as a function of noise will always behave in this square-root-like way, but the coefficient in front of the square root depends on the average firing rate. Once you know this constant numerically and how it changes with spike rate, you can rescale, and therefore put the data from different neurons, even though they have different spike rates, into unified coordinates. So here is one example: the normalized threshold difference between neurons as a function of the noise, together with the mean-field square-root prediction, and the actual data lie somewhat below it, following a slightly different exponent. It turns out that the scaling exponent in the retina with respect to the noise is about 0.4 instead of the mean-field 0.5, and the scaling exponent with respect to the noise difference
should be 1/3 according to mean-field theory but is more like 1/7, so the corresponding exponent δ is larger: 7 rather than the mean-field value of 3. Then you can ask how this compares with various physical systems, where these exponents are known. For the mean field the prediction is 0.5 instead of the 0.4 that we observe here, and δ should be 3 instead of 7. We can compare with systems of various dimensions: the 2D Ising model has its own set of exponents, and the 3D Ising model has values closer to what we observe experimentally, which also matches experiments in physical systems. We know that to go from mean field to real physical systems we have to take fluctuations into account, and it is interesting that the scaling exponents of our retinal array deviate from the mean-field predictions in the same directions as in the experimental physical systems. So this gives us at least a hint that if we take into account fluctuations across the array, we will be able to explain these exponents better. Any questions so far?
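As an aside for the notes: the two-neuron information calculation that underlies these exponent fits can be sketched numerically. The following is a minimal illustration, not the actual analysis code from the lecture; the logistic nonlinearity, the unit-variance Gaussian input, and fixing the mean threshold at zero (rather than fixing the mean spike rate, as done in the lecture) are simplifying assumptions of this sketch.

```python
import numpy as np

def mutual_info(thresholds, noise, x=np.linspace(-5, 5, 2001)):
    """I(X; R1, R2) in bits for two binary neurons with logistic spike
    probabilities, conditionally independent given the Gaussian input."""
    px = np.exp(-0.5 * x**2)
    px /= px.sum()                       # discretized standard Gaussian input
    # spike probability of each neuron (argument clipped to avoid overflow)
    p = [1.0 / (1.0 + np.exp(np.clip(-(x - t) / noise, -500, 500)))
         for t in thresholds]
    info = 0.0
    for r1 in (0, 1):                    # sum over the four response patterns
        for r2 in (0, 1):
            pr_x = (p[0] if r1 else 1 - p[0]) * (p[1] if r2 else 1 - p[1])
            pr = np.sum(px * pr_x)       # marginal probability of the pattern
            with np.errstate(divide="ignore", invalid="ignore"):
                term = px * pr_x * np.log2(pr_x / pr)
            info += np.nansum(term)      # 0 * log 0 contributes nothing
    return info

def optimal_split(noise, mean_threshold=0.0):
    """Threshold separation that maximizes information, at a fixed mean."""
    deltas = np.linspace(0.0, 3.0, 301)
    infos = [mutual_info((mean_threshold - d / 2, mean_threshold + d / 2), noise)
             for d in deltas]
    return deltas[int(np.argmax(infos))]
```

At low noise the optimal separation is close to the value that makes the three distinguishable response regions of a standard Gaussian equally likely (about 0.86), while at high noise it collapses toward zero; scanning noise values near the transition is how one would extract the one-half scaling exponent numerically.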
I think we are close to being out of time, so I will just end by saying that we observed scaling exponents matching the 3D Ising model, even though the retina is a 2D array; the match is in terms of the scaling exponents. It turns out there are papers and results showing that a 3D Ising model with nearest-neighbor interactions, which is this one, is equivalent to a 2D Ising model with long-range interactions. One can turn that result around: because we observe a match in the exponents, it makes a prediction for how the fluctuations across the array should scale as a function of distance along the retina, with the corresponding power. So I will stop here, since we are out of time, and I will leave you with these parallels between neuroscience and physical systems, where we have a theory of how information maximization drives symmetry breaking and biodiversity: instead of minimizing free energy you can think of maximizing information, temperature is equivalent to noise, magnetization is the difference between cells, and the various exponents carry over. So that is my last slide for today. Thank you very much.

Any final question or comment? If not, I will ask for your patience once again: unfortunately we had a problem with the group photo, so we have to take another one. I am going to ask all online participants to please switch on their cameras so we can do this quickly. Of course, if you don't want to be in the picture, that is perfectly fine. Anyone else who wants to switch on their camera? I think we are OK; should we go ahead with the picture? OK, smile, cheers. Are we done? Perfect, thank you. See you tomorrow, have a nice evening.