it looks like. We talked about representing the outputs of the retina as binary configurations, and we want to build maximum entropy models for them. So what we are going to focus on is describing the probability distribution over these activity patterns in a neural recording, and for that we'll use maximum entropy models in order to address this curse of dimensionality. Since I gave you the background yesterday, this is simply a recap. We have a bunch of samples, which I denote by spin-like variables σ_i, and I want to build a distribution from this finite number of samples using the maximum entropy method. Going quickly through what we said yesterday: I build maximum entropy models by choosing some functions f_μ of these binary configurations — same notation as yesterday — and I compute their average values over my experimental data, over the training data if you want; these are the expectations here. Then I construct a probability distribution of the form we derived, where the functions appear in the exponent as a linear combination of constraints, and to fit the model all I need is to find the Lagrange multipliers g_μ such that the model matches the constraints. This has been done in the field of neural code analysis using various choices of these constraint functions. If my only constraints are the individual neural activities, I get the independent model we described yesterday: a maximum entropy model whose only constraints are the firing rates — the average activity of every neuron — and no correlations.
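As a minimal sketch of this fitting procedure — made-up data, three neurons, only the single-neuron activities constrained (so it recovers exactly this independent model), and all 2^N states enumerated explicitly, which is feasible only for small N — the loop that adjusts the Lagrange multipliers g_μ until the model averages match the data averages looks like this:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 3
# enumerate all 2^N binary configurations (possible only for small N)
states = np.array(list(itertools.product([0, 1], repeat=N)))

# constraint functions f_mu(sigma): here just the single-neuron activities
def f(sigma):
    return sigma.astype(float)

# made-up "training data" and its empirical averages <f_mu>_data
data = rng.integers(0, 2, size=(1000, N))
f_data = f(data).mean(axis=0)

g = np.zeros(N)  # Lagrange multipliers g_mu, one per constraint
for _ in range(2000):
    logw = states @ g              # exponent: sum_mu g_mu * f_mu(sigma)
    p = np.exp(logw - logw.max())
    p /= p.sum()                   # normalize by the partition function Z
    f_model = p @ f(states)        # model averages <f_mu>_model
    g += 0.5 * (f_data - f_model)  # gradient ascent on the log-likelihood

# at convergence the model reproduces the constrained averages
logw = states @ g
p = np.exp(logw - logw.max())
p /= p.sum()
```

With richer constraint functions (pairwise products, a synchrony count) the same loop yields the pairwise and K-pairwise models discussed below, except that for large N the model averages must be estimated by Monte Carlo rather than by enumeration.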
So it's a kind of trivial, factorized model. The next one, which we also introduced yesterday, is a pairwise, Ising-like model, where I constrain the mean activity of every neuron and the correlation coefficient between every pair of neurons i and j. But today I will introduce more complicated models as well. Yesterday I showed you a hierarchy where you constrain first-order marginals, then second-order marginals, then third-order marginals, and so on — but that's not the only path forward. For this data, we'll see that it doesn't make sense to constrain the third- and fourth-order marginals, which are sampled more and more poorly in finite data. These are the references that have looked at such models. You can pursue various alternatives. For instance, you can constrain pairwise correlations between pairs of neurons, but also constrain the activity of each individual neuron conditional on some external variable — in this case it could be the stimulus, because a stimulus is playing; this is then not a number but a function, the activity of neuron i given whatever the stimulus is. I won't discuss that one today. You can also constrain what we call the K-spike statistics, which is basically synchrony. If I have, say, 100 neurons, at every point in time I can sum their activity: at one point in time 57 out of 100 are active, at the next 29 out of 100 are active, and so on. What I then constrain is the distribution of this quantity, which is nothing else but the probability that zero of them are active, the probability that exactly one of them is active — whichever one — the probability that two of them are active, and so on. So it constrains the synchrony, the joint activation. From the physics side this is a bit of a weird, global variable that I'm constraining.
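To make the K-spike statistic concrete, here is a sketch of computing it from a binary spike raster; the raster here is made up (independent neurons), purely to illustrate the bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(1)
# made-up raster: 5000 time bins x 100 neurons, each active with prob. 0.05
raster = (rng.random((5000, 100)) < 0.05).astype(int)

# K(t): number of co-active neurons in each time bin
# (e.g. 57 out of 100 in one bin, 29 out of 100 in the next, ...)
K = raster.sum(axis=1)

# P(K): probability that exactly 0, 1, 2, ... of the 100 are active
P_K = np.bincount(K, minlength=101) / len(K)
```

Constraining this one distribution P(K), on top of means and pairwise correlations, is what will distinguish the K-pairwise model from the plain pairwise model.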
That's perfectly fine within the framework of maxent models. The model I'll show you and motivate today is a specialization of this: a pairwise model where you constrain the mean activity of every neuron and all the pairs, and in addition you constrain this global synchrony — how many neurons, independent of their identity, are jointly coactive at each time. So again, maxent is a framework; you can build models using various types of constraints that are motivated by the problem you're looking at.

Okay, so let me start with the reference that sparked a renewed interest in maxent models for neural code analysis — Schneidman et al. What was the big deal? They looked at very small subgroups of neurons, 10 neurons. It had been known in neural coding for quite a while that if you record simultaneously from pairs of neurons — be it in the retina or in the cortex, qualitatively the picture is the same — and you measure the Pearson correlation coefficient (between −1 and 1) for many pairs, the histogram of these correlation coefficients looks like this: the values are really very small. The median of these distributions is maybe 0.03 or so; some pairs reach a correlation coefficient of 0.1, maybe even 0.2, but typically the values are small for any pair you choose to record. And the interpretation of that finding, in many works prior to the one I'll discuss, was that pairwise correlation effects are small, so maybe we can model these neurons as essentially independent — you can neglect these correlations. Now, if you test these correlations for statistical significance, most of them are significant: we have a lot of data, there are many samples in a neural recording, so we can estimate them very well. The question is not whether they're significant — most of them are — but they're all small, so for quite a long while the idea was that perhaps they can be neglected. But then what Elad Schneidman, Bill Bialek, and their colleagues did in 2006 was to say: let's take them seriously, and let's build a maximum entropy model for a small group of neurons — 10 neurons together — that constrains the mean firing rate of every neuron and the pairwise correlations, which come from this distribution: small values, but essentially all pairs significantly correlated, which is also what you find in this data. So they built this type of model, which you are now familiar with — a pairwise, Ising-like model — and this was the striking result of that paper; let me explain it, because it's very important. Each dot in this plane corresponds to a particular joint configuration of 10 neurons. For instance, this dot corresponds to the first neuron firing, then everything silent, then the ninth neuron firing and the tenth silent. Each dot is one such combination, and if you only look at 10 neurons, many of these configurations happen sufficiently many times in the data that you can estimate their probability just by counting. They repeat because the number of neurons is small: there are 2^10 = 1024 possible patterns, and you have hundreds of thousands of samples in your data, so you can just count.
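As a sketch of this counting step — with a made-up raster of 10 independent neurons, purely for illustration; real retinal data are correlated — together with the independent, product-of-firing-rates prediction it gets compared against:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, N = 200_000, 10
p_fire = rng.uniform(0.02, 0.15, N)  # made-up per-neuron firing rates
raster = (rng.random((n_samples, N)) < p_fire).astype(np.uint8)

# empirical frequency of each observed 10-neuron word, by counting
words, counts = np.unique(raster, axis=0, return_counts=True)
freq_empirical = counts / n_samples

# independent-model prediction for each word:
#   P(word) = prod_i p_i^{s_i} * (1 - p_i)^{1 - s_i}
freq_indep = np.prod(np.where(words == 1, p_fire, 1 - p_fire), axis=1)
```

Plotting freq_indep (and, likewise, a pairwise-model prediction) against freq_empirical on log axes gives the layout of the scatter plot described next.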
Okay. On the x-axis is the empirical frequency of each such combinatorial pattern of activity. On the y-axis there are two things, one in blue and one in red. The blue is a model of independent neurons: you neglect the correlations, and for each binary word you estimate its probability simply as a product — the probability of the first neuron firing, given its firing rate, times the probability for the second, and so on. The red is the pairwise Ising model, which takes these small correlations seriously. The diagonal is the equality line, where the observed pattern frequency equals the model's pattern frequency. At very low frequencies there is a spread — these are patterns that, even in a large dataset, are poorly sampled empirically because they only occur a few times — and the points towards the right are patterns that happen many times. But what you see is the drastic failure of the independent model — and this is a log scale. It means there are patterns, like the highlighted one, whose frequency the independent model mispredicts by orders of magnitude. So the statement of that paper — it makes other statements, but the central one — is that the independent model is a failure: even though every individual pairwise correlation coefficient is small, together many weak pairwise correlations lead to large collective effects, meaning large shifts in the probabilities of the joint activity patterns, here of 10 neurons. Pairwise correlations are weak, but their collective effects are strong already for these small groups of neurons.

And for small groups you have two advantages. One, as I said, is that you can empirically sample the pattern probabilities by counting — which will of course no longer be true when we go from 10 neurons to 100, as we'll do next. The other is that for groups of 10 neurons the solution of the pairwise maxent problem is exact: the partition function Z is a sum over 1024 states, so you can just enumerate and solve everything explicitly — no Monte Carlo, no approximation — and you can make sure you've done your job without technical difficulties. So that was the motivation, and a lot of neural code analysis took off following this particular paper. I was part of this effort, and of course the first thing we wanted to do afterwards was to scale this up to larger groups of neurons, because recordings were just coming online where you could record simultaneously from 100-plus neurons. What I'm going to show you is an analysis of that fish movie clip — the fish swimming around that you saw — which is repeated, exactly the same, 300 times to the retina; it's roughly a 20-second movie clip, the recording is from 160 neurons, and in the end we get about 300,000 samples of activity configurations in the data. Now, if you want to build models not for 10 neurons but — in this case — for up to 120 coupled neurons, there is no longer any possibility of empirically sampling the pattern probabilities: the state space is 2^120, so that's off limits. You also can no longer solve the maximum entropy problem exactly, because you would have to sum the partition function over this state space, so you need to solve the problem with Monte Carlo methods. And so we built two types of models. We built
the same model I showed you for the small groups — the pairwise Ising model — and I'll show you how it works; and then we generalized it to the K-pairwise model, meaning a maxent model that constrains the means, the pairwise correlations, and this joint synchrony statistic — and you will see why we added that last bit. So the pairwise model constrains the mean firing rates and the N-choose-2 covariance elements. Here is how it looks. In the left column you have what's measured: this is a roughly 100-by-100 correlation-coefficient matrix. The reason it shows structure is that the neurons have been sorted by their firing rate, and there is some correlation between the firing rates and the correlations, so a gradient appears. So you measure the correlation coefficients; the next small plot is the average activity of individual neurons, now in spin variables, meaning −1 if a neuron were totally silent all the time and +1 if it were firing all the time. The values are negative because neurons like to be silent most of the time — in most time bins they don't spike — so they're closer to silence. And again, the distribution of the correlation coefficients is the same thing I showed you before from the 2006 paper: a high peak close to zero, very small values, significant but small.

These are the inputs, and — skipping all the technical details — these inputs are the sufficient statistics for the pairwise maximum entropy, Ising-like model. From them you reconstruct the matrix of couplings J_ij — Ising-like couplings between pairs of neurons — and the individual biases, or magnetic fields if you want, that act on these neurons; and these two sets of parameters exactly reproduce the two measured quantities. Now, those of you with a statistical physics background, or maybe a spin-glass background, would be immediately tempted to look at the distribution of the J_ij in this matrix — this is the quenched disorder, if you want. You would observe — here are the distributions for groups of 10, 20, 60, 100 neurons and so on — that this distribution looks roughly Gaussian, centered on zero, with positive and negative values; there is frustration. We too were extremely enthusiastic and said, oh, it's like a spin-glass model, and so on. It turns out the story is much more complicated, and it's not the SK spin glass — but you see it's not an easy type of Ising model: everything is coupled with everything, the couplings are diverse, positive and negative, and there is frustration if you look at triangles. So, potentially, something interesting. And one can do train/test checks and verify that one doesn't overfit and so on — I'm not showing this, but it all works out. Now you have this pairwise model, and you can interrogate it. It's a pairwise model, meaning it's as random as possible while reproducing only the covariances and the means, nothing else. So you can ask the model to predict some higher-order statistic that you did not put in. One such statistic is this distribution of synchrony, as a function of the size of the network — we recorded 160 neurons, so we can build models for subgroups of 10, or 20, or 50, whatever. Here is how the synchrony looks; let me parse it for you. This is for a group of 10 neurons. In a group of 10, all of them can be silent — that's K = 0, and the probability of all being silent is here — or one of them can be active, which is here, or five of them can be active together,
and the curve shows the probability, over the whole dataset, that K of them are simultaneously active. The data is the red line — very sharply decreasing. The independent model, which we know is a failure because it doesn't get the collective behavior right, is this one. The pairwise model is the black line, and for groups of 10 neurons, as in the Nature paper, you can quantify the agreement and it's quite okay. But as you go to higher and higher numbers of neurons — say this is a group of 100 neurons — you start seeing a substantial deviation between the data (the red curve) and the pairwise model. First of all there is a deviation in the tail, but more importantly there is a deviation here, which is hard to see because this is a log scale: at the particular pattern K = 0. When I have 100 neurons there is some probability — actually quite high — that all hundred of them are exactly silent, that nobody makes a blip. That's the probability of silence, P(K = 0), and there is a pretty sizable mismatch up here, given that this is actually the best-sampled empirical pattern, because it's the most frequent pattern in the data. So it's a pretty strong deviation. What you can do now is supplement your Ising model with a potential: a maxent model that will also exactly match this P(K), this global synchrony — it will exactly match this curve. It means my maxent model, in addition to fields and couplings, also gets a kind of global potential term V(K), which is fitted the same way as the h's and J's. This is how it looks as a function of K for this particular network, and in particular it has high values at zero, which means it increases the likelihood of silence — because that's what you need to match the data.

And this has been observed, by the way, in many other recordings — cortical recordings and so on: even if you account for all the pairwise correlations, the neurons like to be even sparser, even quieter; they want to be jointly silent. Individual blips — a single neuron firing alone — are less likely than the pairwise model can account for, and silence is more likely in the data than the pairwise model can account for. Again, this has been reproduced in other systems later on. All right. I tried to keep the technical part about fitting maxent models short, because we talked about it yesterday. What I want to focus on in the next slides is this: okay, you fit the model — in this case the K-pairwise model — and you can check that it works; what do you do with it? What can we actually learn about the neural code? What I'm showing is a selection of results that I grouped into four categories, somewhat arbitrarily. The first category is that once you construct a maximum entropy model, you can make statements that are provably correct in some sense. For instance, we can put a bound on information transmission: how much information can this subpopulation — the 120 neurons for which I have a model — communicate, at most, in bits per bin? We cannot estimate the exact value, but we can upper-bound it, because what we are computing is a maximum entropy model: we can compute its entropy, as you will see, and that entropy is an upper bound on the entropy of the code words and hence on the information transmission. This is interesting because you could ask how many bits per second of information go through your optic nerve — or through the salamander optic nerve, because this is a
salamander recording, in this case. So that's one set of results I'll try to demonstrate. Second, there are certain types of predictions that maxent models make. You've already seen one: when we fitted the pairwise model, it made a direct prediction for the distribution of synchrony, and it was a wrong prediction — it mismatched the data — so we expanded our model class to include that as a constraint. This expanded model is of course again making predictions about other statistics of the data, which we can directly test. That's a nice feature of maxent models: because they have sufficient statistics, we know exactly what they fit, and everything else they do not fit is already a prediction that you can try to validate against the data. These models are not infinitely expressive, as neural networks are; it is very clear, already on the test set, what they are predicting and what they are not — of course you still check that they don't overfit — but by construction they cannot express certain things. For our model, for example, three-point correlations are a prediction, because we only constrain the global synchrony and the two-point correlations; so we can ask the model to predict the three-point correlations. The third class of results are hypotheses about the structure of the neural code that the model motivates — they are hypotheses, and they need to be independently verified and checked later. And the fourth class is that these models actually disprove some claims in the literature that were, shall we say, quite entrenched. I'll try to take you through all of this. So let me start with these bounds on information transmission.

— Yes, go ahead. I can repeat the question: what is the intuition, or the motivation, for why large groups of neurons need this global potential? I am not exactly sure. If you are very pedantic, you can see very small deviations already for groups of 10 neurons — the question there is whether they are already statistically significant — but as you look at many subgroups there is a consistent pattern: this deviation, say in the probability of silence, gets larger and larger with size. So it's not that there is some magic transition, below which pairwise models are fine and above which they are not. Do we have a mechanism? You can try to guess mechanistically, though note this is a feature that also appears in the cortex and elsewhere — it's not limited to the retina. One possible reason we observe this: imagine there is a joint inhibitory apparatus that homeostatically keeps the activity of these neurons at some particular rate — global circuitry that jointly inhibits, that keeps a lid, if you want, on the activity of the principal neurons that code for the information. Then we would be seeing a statistical signature of such a mechanism. But whether that's true or not is purely a hypothesis — this is a statistical model that we're fitting, and I'm not sure I can say much more about it. As I said, though, at least qualitatively this result about silence has been reproduced in many different preparations, in many different tissues, so I think it's more general than some experimental artifact; I think it's really something about the excess sparsity of the neural code. And — I mean, now I'm
hand-waving, okay — but even from the coding perspective it actually makes sense. Patterns where everyone is silent but one neuron makes a blip are patterns that are not robust to noise: there is some likelihood that a neuron will blip spontaneously, and there seems to be a mechanism to suppress those — to favor either complete silence or, when the neurons do respond, responses that are more synchronous than you would have imagined, with groups of neurons co-activating. Given that individual neurons are not very reliable, that might be a good idea for coding — but again, that's speculation.

Okay. So one type of result is that we can estimate, or rather upper-bound, the entropy of these code words. There are multiple ways of doing that, and just so you appreciate it: this is a non-trivial computation — we are computing the entropy of a distribution over 2^120 states, so by direct summation of −Σ p log p you will never get it. You have to use some tricks. We have this fitted probabilistic model, which looks like a statistical physics model, so you can actually use the tricks of statistical physics to get the entropy. There is a well-known method called thermodynamic integration: you pretend you're changing the temperature of your model — in our model the temperature is the thing that multiplies the energy function in the exponent — you introduce that fictitious temperature and integrate, and you can estimate the entropy. It's a mathematical trick that is correct even when you don't have a system at equilibrium; it just doesn't carry the physical interpretation, but the entropy you get out is the entropy. You can also use sophisticated Monte Carlo sampling methods that were designed to compute entropies — the usual Metropolis is not, but if you are familiar with Wang–Landau-type sampling, it gets you the partition function and the entropy directly. Or — and this is very interesting — our model has a fantastic, non-generic property: there are microscopic patterns that are sampled with very high precision empirically, even when you go to 120 neurons. The probability of silence — of everyone being quiet — happens many, many times in the data, so you can just measure its value. And as soon as you have the empirical probability of one microstate, then, by the parameterization of our model, the probability of silence is exactly 1/Z: remember, in the exponent there is all sorts of stuff, but if you insert zero for the activity of all neurons, all the terms vanish and the only thing left is 1/Z. So you can get Z directly, because there is this one microstate that is sampled very precisely even though the dimension of the problem is large. And once you have Z, you get the entropy from classical thermodynamic relationships. So you can do it all three ways, and you get very nice, consistent estimates.

And what you find is the following. This is entropy per neuron. For the independent model it has to be constant across subgroup sizes, because per neuron, without any interactions, the entropy is simply extensive. But for our maximum entropy models built for groups of 10, 20, 50, 100 neurons, the per-neuron entropy is still slowly decreasing as we make the network bigger and bigger — we are not in the extensive regime yet, even out at 120 neurons. What does that mean? It means that as we increase the patch we look at, this patch is
a sub-sample of a very tightly coupled system: we record more and more neurons, and there is a lot of correlation between them, so as I include more neurons they are still being constrained by other neurons that I am not looking at — I have not yet scaled into the regime where the entropy is extensive. We do believe — there are other estimates, done before — that in the salamander retina the relevant patch of tightly coupled neurons that all encode the same stimulus is about two to three hundred neurons. The retina you can then think of as a carpet composed of many such correlated patches: about two to three hundred neurons form a highly correlated unit encoding the stimulus in a certain little patch of the visual world. So we see this entropy per neuron going down, and as I said, it puts an upper bound on information transfer: for a hundred neurons, at about 0.17 bits per neuron, you can multiply by 100 — about 17 bits is the most this population of a hundred can transmit per 20-millisecond bin (these are 20-millisecond samples). The true information can only be lower — the samples are correlated through time, and so on — but it cannot be higher, because this is a maximum entropy model: the highest possible entropy consistent with the constraints. The true entropy can be smaller but not larger, which is why it upper-bounds the information transmission. You can make more progress along these lines; I just wanted to illustrate one type of result you can get, about transmission.

A second observation, also very interesting if you come from a statistical physics background — just to see how weird these patterns are. Ask yourself: if I randomly draw one pattern from my dataset, and then randomly draw another, what is the probability that they are the same one? This is the coincidence probability. How does it behave as you make the number of neurons larger and larger? If the neurons were independent, it's easy to convince yourself that it drops exponentially: as you take larger and larger binary words, it becomes exponentially less likely that what you sampled the first time and what you sampled the second time agree in every spin. That's the fast drop here. But for both the experimental data and the K-pairwise model — which reproduces this coincidence probability nearly perfectly — it decreases much, much more slowly than you would expect. And we can also show that this is not simply due to the probability of silence being high. You could think, okay, that's just a consequence of the all-silent pattern being very frequent: I sample once and get all-silent, I sample again and get all-silent, and that's the whole story. No — there are also other patterns that repeat exactly, many times, in the dataset, so you are much more likely than you would have thought to draw exactly the same configuration twice. There are interesting consequences here: it is a test of the model, but it is also an interesting way to think about an ensemble of high-dimensional code words whose coincidence probability decreases so slowly — it doesn't even look exponential.

Okay, so here is another type of prediction the model makes. We still have enough, or barely enough, data to estimate empirically the three-point connected correlation functions — the correlation between neurons i, j, and k: the expectation value of three neurons firing
together and so this is you know from the experiment and and this would be a prediction if you construct only pairwise models so ising like without this global term you see you know this now scatter plot of one versus the other and sort of these are bin predictions because there is many many right there's one hundred two three of these numbers so if you just plot dots you don't see anything so these are binned but you see that the pairwise model has this systematic deviation right so it kind of over predicts three-point correlations for positive correlations and it under predicts the negative ones right because this is the equality line and the k pairwise model the one that has the global constraint is actually doing a much better job it's not perfect there is some deviation for the negative three-point correlations but it's it's definitely much better so the bias is very small and maybe i skip that the point of that is that your error in three-point correlations is actually not increasing with the system sizes you take more and more neurons these errors you're making an error but you know the error is is flat and so the hypothesis is maybe you know as you make more and more neurons there is of course more and more things that you could mispredict because there is more and more three-point correlations but at the same time your model is capturing more and more of you know the neurons that otherwise would would constitute hidden units because you don't they influence what you observe but you don't model them but as you start increasing the group of neurons you of course start modeling them so there is less and less kind of latent factors that you are that could influence the observed neurons so that's one example where the max and model is simply predicting some higher order statistics that you can directly compare to the data okay so now you know we are coming to to these more hypothetical notes so you know these are so up to now there are things that you can 
either say with certainty or compare directly to data. Now we can look in more detail at what this model is saying and make hypotheses about how neural coding might work. One hypothesis we entertained at the beginning was this: because there is a lot of frustration in the inferred energy function, with both positive and negative J_ij, maybe we have an energy landscape, or probability landscape, that is very rugged. It would have basins of attraction, as you might know from Hopfield-type models. What this schematic diagram suggests is that there are particular configurations of activity that are local energy minima: if you flip any spin, you go up in energy, meaning down in probability. Information about the stimulus would then not be encoded in the exact microscopic state of which neurons fire; it would be encoded by the identity of the basin of attraction you are in. As people have discussed before, this would provide a code that is robust to noise and perhaps learnable downstream. That was our initial idea, and we could validate that our models do have multiple basins of attraction, and so on. Because we repeat exactly the same stimulus many times, you can check that on one repetition we are in a given basin and the retina emits some microstate, a pattern of spikes and silences; on the next repetition the exact microstate is slightly different but still in the same basin; on the third repetition, again, there are microscopic changes in which neuron is active or not, but you are always at the same point in the same basin when the same stimulus is displayed on screen. And so there is
some subsequent work we did that really pursued this idea of a basin code, which is what you are seeing here. This is the retina, in a later experiment, responding again to a natural-type movie. This axis is time; each of these lines is a repetition of exactly the same movie, and the color code denotes a collective state. It is not exactly the basin of attraction, because this was a slightly different model, but it is something very similar: a collective state of the retina that roughly corresponds to the basins. What you see is that across repeats the retina quite precisely takes on the same collective state, even though, if you zoomed in, there are lots of differences at the level of when individual neurons spike. The collective state, corresponding roughly to those basins, is the same; there is a bit of jitter in when you enter and leave it, and maybe that also depends on how we quantify it, but in general you see high reproducibility of collective activity patterns. Later we wanted to take this idea apart. The basin picture is of course how a physicist would like to understand it, and we followed up on it here, but it turns out it is not precisely that: you cannot really think of the landscape in terms of basins. It is more like the visualization we have here, so let me try to parse it for you. This is a probability landscape; of course the real thing is high-dimensional and discrete, so this is schematic, smoothed into a small number of dimensions we can visualize. The landscape looks more like this: there is one global probability maximum, the global attractor, which corresponds to the retina just
being silent. The silent state is where it likes to be; the highest-probability state is the one where no neuron does anything. From this silent state, you can imagine that along many of these discrete dimensions there are ridges and troughs. There is always this global attractor you can fall back to, but if you try to move orthogonally to it, you go through structures like this: there are patterns here that encode a particular stimulus and are very well separated from the other states, yet you can always return to the silent state, because that is what the retina prefers. If you do not drive it with any stimulus, it goes there; display the stimulus and it goes to a collective state that is well separated from the others; stop driving it and it returns to silence. So we think the picture is a bit more like this than like that, though of course it is hard to visualize a high-dimensional discrete space. [Question:] But if you stimulated the retina with a blink and then let it relax, it would go up and then back to silence, right? [Answer:] Yes, though you have to drive the retina strongly to do that. That is what we hypothesize, and it is very difficult to check. Why? Because the retina responds most nicely to natural stimulation, and under natural stimulation it is very hard to say what "the stimulus" is: it is a full movie in which many things happen at the same time. You can try; I think Michael Berry, the experimentalist, was following up on this, but I do not know the latest. You can try to display a discrete set of stimuli, so that you only have, say, 10 different pictures that
you show it, and then ask whether the resulting landscape forms 10 ridges, each corresponding to a stimulus. I know they made some progress on that, but I have not seen the results. So that ridge-like picture is a hypothesis: it is what the model suggests might be the organizing principle of the neural code, and there would be a lot of work, and lots of experiments, if you wanted to validate it. The next idea of this very hypothetical character connects, for those of you who are interested, to the idea of criticality in neural systems, and in particular to a construction you can do from the model and from the data which is rather different from the criticality some of you might know from avalanches and so on. What I can construct, both in the data and in the model, is the following: I can simply count how many distinct patterns have approximately the same probability. In my model the log-probability plays the role of an energy, because the probability is e to the minus energy, and the log of the number of patterns is the entropy. This is microcanonical: I just count states and take the log. So for the physicists, we are computing the microcanonical entropy as a function of energy, the log of the number of states that have a certain probability. If you wanted to pursue criticality as in statistical physics, you would look for the condition that the second derivative of the microcanonical entropy vanishes at the critical point. Because patterns actually repeat in the data, I have some empirical handle on this dependence. Here on the axes you see the entropy, the log number
of patterns, the entropy per neuron if you like, plotted against log probability, which is the energy per neuron. I can take small groups of neurons; the dark blue here is subgroups of 10 neurons, and then the color runs through 20, 30, 40. For small groups I can look at all the patterns, empirically estimate their probabilities, take the log to get the energy, and count how many possible patterns have that particular log probability; that count goes on the y-axis. Across many subgroups of 10 neurons I get a particular curve, which looks like nothing special. Here it is for 10 neurons, then for 20, then for 30. As I take more and more neurons my sampling gets more and more restricted, so I can only see a smaller and smaller part of the curve, because for many neurons I am limited in how many samples I can actually count. But what happens is that these curves, as I take larger and larger networks of neurons, approach a straight line. You can do a formal extrapolation of these curves, taking N to infinity, to ask where the curve would lie for a very large network; the extrapolated points are the black points you see here. The extrapolation runs from 10, 20, 30, 40 neurons up to infinitely many, and you find that all of these points end up essentially on the equality line, where the numerosity of patterns is exactly balanced by their probability. Now this is a very strange construction, because typically, if you have a critical point, the second derivative vanishes at one particular value of the energy; but here, if this is a line, the line has a zero second derivative everywhere
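The counting procedure just described can be sketched in code. This is a minimal sketch on toy data (independent low-rate neurons, not the retinal recording), the binning choice is arbitrary, and the extrapolation over subgroup size N from the talk is omitted:

```python
import numpy as np

def entropy_vs_energy(patterns, n_bins=20):
    """Empirical microcanonical curve: for each word observed in the data,
    energy E = -log p_hat(word); entropy S(E) = log(number of distinct
    words whose energy falls in a small bin around E)."""
    words, counts = np.unique(patterns, axis=0, return_counts=True)
    E = -np.log(counts / counts.sum())        # energy of each distinct word
    bins = np.linspace(E.min(), E.max() + 1e-9, n_bins + 1)
    which = np.digitize(E, bins) - 1
    Es, Ss = [], []
    for b in range(n_bins):
        n_words = np.sum(which == b)          # how many words share this energy
        if n_words > 0:
            Es.append(E[which == b].mean())
            Ss.append(np.log(n_words))
    return np.array(Es), np.array(Ss)

# Toy data: independent low-rate neurons; for such a code the curve is
# concave, unlike the S(E) ~ E straight line seen in the retinal data.
rng = np.random.default_rng(0)
spikes = (rng.random((50_000, 10)) < 0.1).astype(np.int8)
E, S = entropy_vs_energy(spikes)
```

On the real recordings one would apply this to subgroups of increasing size and extrapolate, as in the lecture; here the point is only the counting itself.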
so it is a very weird system: it does not look like a critical point corresponding to one particular energy; the whole curve is critical. In the model fitted to the same data you can take this further, because once the model is constructed we can sample from it as much as we want and compute this quantity directly. In the data we are limited by sampling to this little stretch; in the model we can compute the whole curve, and you see the same thing. It is perhaps not surprising that the model captures the extrapolations in this lower range, but it also extrapolates the behavior for the less probable patterns, which you do not have empirically in the data. So entropy per neuron equals energy per neuron, to a good approximation, both in the model and in the data. This very peculiar behavior has another signature: one can show analytically that if what you see here is true, it is equivalent to a Zipf law for the configurations. What does that mean? I look at the probability of every binary configuration, in the model or in the data, rank-order all the microscopic patterns by their probability, and plot log probability versus log rank. What you observe is that this plot is a line, a power law with slope exactly minus one. That is a direct consequence of the scaling; it is another way of describing it, and the equivalence of these two statements has been nicely shown in the work of Thierry Mora. I will just point out that this work motivated a very nice PRL paper by David Schwab, Ilya Nemenman and Pankaj Mehta, in which they explain how this minus-one Zipf slope can emerge in such systems without fine tuning. For those of you interested in criticality of this particular sort, it is a very nice reference. My own take: these distributions are strange, in the sense that you would not expect them from a generic Ising model, not even from an Ising model at a critical point in this particular form. Whether that is a consequence of some special adaptation in the retina, or of some special fine tuning, I will not speculate; I do not know. But if you are a downstream area, say visual cortex, receiving these spikes, this is a very peculiar distribution to learn from, and maybe it helps with learning. There are all sorts of hypotheses for why this might be good; you certainly have the property that some microstates exactly repeat, so if the downstream area has to learn something, maybe that is something it can rely on. These are very speculative ideas, but I bring them up because this is a very salient signature that the models helped us obtain. Now the last crazy bit, so let me just outline it. You can play a game and ask: is there anything special about the setting of the correlations, or of the mean firing rates, that produces this critical signature? What we did here is play with the model in a rather perverse way. When we fit the maxent model, imagine first that the alpha is not there: this is the model we fitted, with fields, pairwise couplings, and the global term. What I can now do, and I can only do this in the model, is put in a factor alpha that scales the interaction terms up or down, the terms responsible for all the correlations between the neurons. If I scale the correlations up, I can re-tweak the individual fields such that the result still has the firing
rates of the data: the mean firing rate of every neuron is always held fixed to its value in the data. So every individual neuron always fires as in the data, but I can make the population more or less correlated by changing alpha. Alpha equal to one is exactly where the data is; alpha equal to zero removes all correlations, giving an independent set of neurons; alpha greater than one makes them more correlated, while each still fires at exactly the same mean. Then I can look at what kind of code I get. Here is alpha equal to zero, uncorrelated: what you see are example firing patterns, sorted, so these are neurons and these are patterns; the most frequent one is everyone silent, and then patterns where more and more neurons fire. If you look at the correlations, consistent with alpha equal to zero, there is no correlation among them. Here is alpha equal to one, the actual fit to the data. By eye these patterns do not look very different from the uncorrelated ones, because all the correlations are weak; but if you histogram the correlation coefficients you get the familiar picture, a tail of small values. If I make them more correlated, though, you start to see this: just by eye, something like basins of attraction starts emerging. I made things very frustrated and very correlated, and now even in the sampled patterns you see similarities. This bunch presumably comes from one attractor, because these neurons are always on together with a few blips elsewhere; then there is the next attractor, and so on. Many of the correlations are still zero, but the tail has started to expand out
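This alpha-construction can be sketched by exact enumeration on a toy network. Everything here is assumed for illustration: the couplings J are random stand-ins for the inferred J_ij, the global V(K) term is omitted, and the target rates are made up; only the logic follows the text (scale the interactions by alpha, then re-tune the fields so every mean rate stays pinned to its target):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N = 8
# Hypothetical couplings standing in for the inferred J_ij.
J = rng.normal(0.0, 0.3, (N, N))
J = np.triu(J, 1); J = J + J.T                    # symmetric, zero diagonal

states = np.array(list(product([0, 1], repeat=N)), float)  # all 2^N words
quad = 0.5 * np.einsum('si,ij,sj->s', states, J, states)   # interaction term

def model_probs(h, alpha):
    """P(s) proportional to exp(h.s + alpha * s.J.s / 2)."""
    logw = states @ h + alpha * quad
    w = np.exp(logw - logw.max())
    return w / w.sum()

def fit_fields(target_rates, alpha, steps=3000):
    """Re-tune the fields h_i so mean rates match target_rates while the
    interactions are scaled by alpha (damped iterative-scaling update)."""
    h = np.log(target_rates / (1 - target_rates))  # independent-model start
    for _ in range(steps):
        rates = model_probs(h, alpha) @ states
        h += 0.5 * (np.log(target_rates) - np.log(rates))
    return h

target = np.full(N, 0.1)                 # every neuron fires in 10% of bins
for alpha in (0.0, 1.0, 2.0):
    h = fit_fields(target, alpha)
    rates = model_probs(h, alpha) @ states
    print(alpha, np.max(np.abs(rates - target)))   # rates stay pinned
```

The damped update is one simple choice for this small exact-enumeration setting; the real fits at N = 100+ require Monte Carlo and more careful optimization, as the talk notes.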
and so you have some pairs that are really strongly correlated: this neuron and that one are always on together. So I can make the code more and more frustrated, more and more glassy if you want, as I move in that direction. You have to solve another nonlinear optimization problem, purely numerically, but once you re-tune you can of course check that you got what you asked for. What is interesting is that as a function of alpha, as I change the correlations, I can do the usual thing in statistical physics: plot what in an Ising model would be the heat capacity, with an emerging peak as a signature of critical behavior. As I go from the independent code at alpha equal to zero to a very correlated code at alpha equal to two, with alpha equal to one being the data, you see that as I take larger and larger groups of neurons, and 120 is this last curve up here, the peak moves close to where the data is, the red line. If I make the code more correlated, by this signature I actually move away from criticality. Somehow the particular strength of correlations observed in the data is singled out as being close to the peak. Of course you can play the extrapolation game and ask whether we would be exactly at the peak with more neurons; 120 is just what we can record, and the correlated patches, as I told you, are presumably between 200 and 300 neurons. So again this suggests that the code is close to critical. What does that mean for neural coding? I do not know. There are many hypotheses floating around; they are interesting, but I think the final word has not been said. It does show, though, that these ensembles are in some sense special. And then the last bit I would like to show you is more concrete again, so
it is less crazy. This is the part where I said that this type of analysis actually disproves certain notions that had been quite persistent in the field. Neural coding, retinal coding, visual cortex coding was really dominated by the idea of decorrelation, if you are familiar with it: very correlated stimuli come into our eyes, and the circuit removes much of this correlation, because that is efficient for coding, producing responses that are quite decorrelated. Now, I have already hinted that the code cannot be independent, cannot be completely decorrelated, because otherwise we could throw the correlations away and get a bunch of independent neurons, and you saw that this does not describe the data well; you have to put the correlations in somehow. What I have not yet shown you is how strong that constraint is in terms of its effect on the code of individual neurons. So here is what I would like to show you. We display a movie to the neurons, and as time goes by, what you see in red is one example firing rate, the average activity of one neuron: it is quiet, then a little blip, then it likes something in the movie very strongly and produces this response, and so on; these are the peaks. Now, my maxent model completely ignores time and stimulus: it takes all the patterns without any order and builds a model for them; there is nothing about time or stimulus in the model. But what I can still do with the model is ask: could I predict the behavior of this one randomly chosen neuron by knowing what its partners in the network are doing? I have a joint distribution, this object P(x1, ..., xn), and I ask whether I can predict the first neuron from what the others are doing. Well, that
basically just means: can I compute P(x1 | x2, ..., xn), the probability of x1 given the state of all the others? If I have the joint distribution, clearly I can construct this, and I know nothing else about the stimulus or time; whatever the stimulus is doing is somehow in the activity of the others. So I take my data set and my model, for groups of 10 neurons, and predict one neuron from the nine others: you get something pretty bad. Then groups of 20, 40, 80, and when I take the full group of 120 and predict this neuron from its 119 neighbors, this is what I get, with the variability across repeats shown by the similar lines: a very good prediction of what one neuron is doing from what the rest are doing. And this is independent of which neuron I choose. That was one example neuron, but if you ask, for each of them, for the correlation coefficient between the prediction from the rest and the true activity, which are the blue dots in this plot, I basically get 80 percent correlation predicting one from the rest. So this type of coupling in the network, these weak pairwise correlations, still amounts, at the level of individual neurons, to very high predictive power, and therefore to very high redundancy in coding. The coding is super redundant if you can predict what one neuron is doing from what the others are doing without directly knowing the stimulus. To my mind this is still one of the best demonstrations that although the neurons are not much correlated at the pairwise level, there is still a lot of redundancy, which could in principle be used for error correction: if individual neurons are noisy, the collective state corrects the mistakes of an individual neuron, and you can
construct decoders that make use of that. The same analysis was done much later by the group of David Tank, by Meshulam, on hippocampal neurons; you will see basically the same plots in that paper, so this is not something restricted to the retina: you can take other brain areas and see signatures of this redundancy. [On a question about leader neurons:] No, there is no notion, at least in retinal coding, that some neurons would be master or leader neurons, special in the sense that the others are just coupled to them and mimic what a few leaders do. The notion, at least in retinal coding, is that it is a distributed code: redundant but distributed, without any special center or main neurons. In this case the whole retina is seeing the natural movie; you are only recording and analyzing a piece of the retina, which of course looks at a piece of the movie, but the whole retina is stimulated. [On unrecorded neurons:] I see what you are saying. The model of course only describes what you put in, the recording of these 160 neurons. But the fact of the matter is that these 160 neurons you are focusing on interact with the other neurons in the retina that you are not recording; those are still there, and they respond to the movie, so the model must somehow take them into account effectively. How it does that is very hard to say: you can only check that it works by comparing to the data for the things you do see. We were looking a little bit at that question. There was this interesting idea, which I still don't
know whether it is true or not, because I do not have the data to check it: would all these higher-order effects we had to put in, the V(K) potential or the J_ij matrix, get simpler if I were able to record more and more neurons and include them in the model? Right now they may be complicated in part because they effectively account for what all the other neurons are doing, and maybe they would simplify if I could record more; but we do not know. [On whether the inferred couplings generalize:] That is a very good question. Not exactly in this manner, but similar work was done later by Ulisse Ferrari in Paris with Olivier Marre. There, within one recording, the same neurons in one retina were exposed to stimulus one to fit the model and extract the J_ij that carries the correlated activity, and then used to predict on a different movie, same neurons, to show which components of the model carry over. It is a very nice piece of work, in PRE, and you can show that there is a part of the correlations, learned in the J_ij, that generalizes very nicely from movie to movie. These retinal experimentalists later went on to show that this nice piece that generalizes from stimulus to stimulus is the piece whose connectivity drops with distance, presumably because some of these neurons are coupled via gap junctions, so there is an actual mechanism coupling the activity of nearby cells together. I can point you to the reference. [On scaling up the experiments:] Yes, absolutely, that would be great. I mean, I think here in this, you
know, the history of this particular set of experiments: the first experimental setup could record 10 to 40 neurons, and the next setup went up to 160. And 160 was just at the boundary of covering the full putative size of a correlated group in this retina, which is about two to three hundred neurons. [Where does that estimate of two to three hundred come from?] It came from looking at how correlated the neurons are as a function of distance. In another experiment you can use arrays that are more widely spaced: you do not see all the neurons, but you can see farther away, and you can ask how correlated neurons are at this distance or that distance as they watch natural movies. For the salamander retina it was shown experimentally that once you go to a patch of more than roughly two to three hundred neurons, the neurons really do become decorrelated, in the sense that even the small correlation coefficients go away: they look at different parts of the stimulus and are no longer co-activated. So that is the effective range, and this experimental setup then asked: can we record a macroscopic fraction of the neurons in such a patch? 120 or 160 out of 300 is about half. It was not pushed experimentally to 300 at that point; now I think one probably could, and actually get the full patch, at least in the salamander, but everyone is also migrating from salamander to mouse and so on, which makes it complicated for other reasons. I think the cool thing would be to scale it up, do exactly this at larger size, and see many signatures simultaneously: correlations going away, the J_ij decaying at this length scale, the entropy becoming extensive, because you are coming to the point where this
patch is saying something different from that patch. You could see many of these things simultaneously; right now we only claim them on extrapolation, and nobody has done it yet, as far as I know. Okay, so let me conclude. This was a selection of highlights about what you can discover about the neural code, and what hypotheses you can make, if you use the maximum entropy principle to build models. I have not shown you anything yet about the null-model, hypothesis-testing use. To summarize what I did show: you see an interesting emergence of very strong collective behavior, synchronized activity, even though individual pairwise correlations are small. You see non-extensive entropy scaling and high coincidence probability, because microstates repeat. The retinal output is far from decorrelated, which flies against typical efficiency arguments for the retina; it is in some sense efficient, since many natural stimuli are more correlated than the retinal output, but that does not mean the output is decorrelated. It is not: there is still a lot of redundancy, which could perhaps be used for error correction or learning. The code seems to be organized into collective modes of activity that resemble basins of attraction, with some caveats, and that really do seem to encode information about the stimulus. And we have these interesting signatures of statistical criticality, which one can make a bit more precise; whether they are due to adaptation, to downstream learnability of the code, or, as Nemenman and Schwab and colleagues suggested, a generic result in a very large class of models without any fine tuning, is still an open question. All of that is on the table. So maybe I will end here and take any questions; we had some very nice ones during the talk, but are there any others? There is
something in the chat; let me check. No, everything is working fine. [Question:] The fact that retinal outputs are far from decorrelated: could that be used to model attention in a feed-forward manner? Meaning, if we assume there are bunches of neurons correlated to one single neuron responsible for the output, maybe an inhibitory type of neuron, do you think this kind of model can explain that behavior? [Answer:] I would not say there is anything like that here: this is the retina, a circuit that has some lateral connectivity but is mainly feed-forward, in the sense that information goes in and is sent to the brain without much coming back from the brain into the retina. Attention mechanisms are usually thought of as top-down recurrent connections, which do not reach the retina. But you could try to make similar arguments, or similar model constructions, for neurons in primary visual cortex, where there could be top-down attentional modulation. There has been statistical work, not exactly of this flavor, on recordings from primary visual cortex trying to account for shared modulation of activity, which shows up as correlated activity of a group of neurons, perhaps because that group receives some top-down input that gains it up or down. There is some very nice work, for instance from the group of Aerosim and Shelley, building statistical models that account for slow latent variables that you do not observe but that statistically explain how whole groups of neurons are slowly gained up and down together. And you know, one
question is whether these types of models, applied to that data, would reveal signatures of that, maybe through some sort of global potential like the V(K); it is perfectly possible to think about it in that system. The retina is more difficult, just because of how it is wired up. [Question:] Along the same line: the correlation between neurons has two different contributions, one related to the neural activity, how the network works, and the other due to correlations in the input to the retina, in the stimulus. Using this approach, is there some way to discriminate between the structural correlations, the ones that matter if we want insight into how the network works, and those pertaining to the stimulus? [Answer:] It is a perfect question, and if I can have the last 15 minutes on the blackboard, that is exactly what I would like to take up. To stylize the question a bit: this model only captures the total correlation between any pair of neurons, and the question is how we can use correlations to reveal something about the circuitry. This model does not do it: all correlation is the same to the maxent model, whether it comes from the stimulus co-activating two neurons, so that they are correlated, or from neuron a talking to neuron b through actual wires in between. The model does not distinguish sources of correlation in any way, and the question is whether this framework, or something else, can be used to discriminate the sources, because the correlations due to wiring could perhaps let us infer which neurons are connected; but for that we would have to discount the other sources of correlation, say the same
stimulus, keeping only the part that is about the neurons themselves. That is what I wanted to sketch on the blackboard in the last 15 minutes. (There was actually an interesting set of lectures on causality last week, so this audience will appreciate it.)

What I want to show, and I only have this one page, is a second use of maximum entropy models: as a source of null distributions against which we can do statistical tests. That is relevant for what you asked, even if it doesn't seem that way yet. The problem statement is this: by observing a correlation between two neurons, can I say something about whether they actually interact?

Take two neurons, a and b, with some observed correlation coefficient C_ab. The first-order question is simply: is that correlation statistically significant? That is very simple. Take the raster of neuron a and the raster of neuron b; by raster I just mean the binarized spike train laid out in time. Here is a zeroth-order thing anyone can do on a computer to assess significance: randomly permute the time bins of one raster with respect to the other, which destroys all correlation between them. Because the data is finite, the measured correlation will never be exactly zero, so you repeat the shuffle many, many times and build up a null distribution P(C_ab). You expect that null distribution to be peaked around zero correlation, with some spread due to finite sampling, and you compare the actual C_ab, the true value, against it. Then you can put a p-value on the significance of the correlation being non-zero. It is a pedestrian way to do it, but it is quick: shuffle the spike trains, create the null distribution.

Of course, even if you conclude that a pair is significantly correlated, that doesn't tell you why. For sensory systems, the first hypothesis you think of is that there is a stimulus driving both neurons: if the two neurons look at the same part of the sky, they see the same stimulus, so even if they never talk to each other, they will automatically be correlated to some extent. So what you really want to ask is something slightly different: are the responses of these neurons conditionally independent given the stimulus? That is, does

P(x_a, x_b | s) = P(x_a | s) P(x_b | s),

where x_a and x_b are the activities of neurons a and b and s is the stimulus? The joint activity of a and b is the marginal over the stimulus of this conditional distribution, and in that marginal distribution the neurons can be correlated, perhaps significantly so, yet this correlation could arise solely because the
stimulus drives each of them through the conditional: even though they are conditionally independent, they get correlated via the stimulus. So the question is really whether the conditional joint distribution factorizes. If it factorizes as above, then all of the correlation is explained by inheriting the stimulus. If it does not, then to describe the conditional joint distribution you would have to write down a model that, in the language of our maxent models, looks like

P(x_a, x_b | s) ∝ exp[ h_a(s) x_a + h_b(s) x_b + J_ab x_a x_b ],

where h_a(s) and h_b(s) are independent fields that can depend on the stimulus, since everything is conditioned on s, and J_ab x_a x_b is a coupling term. A non-zero J_ab means the distribution does not factorize: the neurons are interacting even after conditioning on the stimulus, so there is a part of the correlation beyond what the stimulus can explain. You can construct this type of model; it's called a stimulus-dependent maxent model, and you can search for it in the literature. It is somewhat complicated, so people in neuroscience don't usually do it, because for systems where you have perfect stimulus repeats there is a much simpler way to test whether neurons are correlated beyond what you'd expect from the stimulus. Let me say what that is.

Suppose I have rasters for neuron a and for neuron b, and not just one response each, because I can display the same stimulus over and over: I have multiple repetitions. For neuron a, repetition one gives some response, say 0 1 0 1 1 (I'm making it up); repetition two gives something slightly similar but not exactly the same, because the neuron is a bit noisy; likewise repetition three, and so on; and similarly for neuron b.

If I have stimulus repeats, there is another nice shuffle I can do to test for excess correlation while discounting the stimulus. Fix a particular time bin in the experiment (time runs along the raster). At every time bin, I randomly reshuffle the responses of neuron a across the repetitions, and independently reshuffle the responses of neuron b: where repetition one's response stood, I might put repetition 17's, and next to it repetition 36's, and so on. What has this shuffle done? The average response of neuron a stays locked to the stimulus, the average being across repetitions, because I only permuted within a time bin; the firing rates of both neurons are the same as in the original raster and remain stimulus-locked. What I have destroyed is the correlation between a and b conditional on the stimulus. Both neurons remain modulated by the same stimulus, but any extra correlation is gone, because I now compute the correlation between neuron a on repetition 17 and neuron b on repetition 36, and so on over many shuffles.

From these shuffled rasters I compute what I'll call C_ab-tilde: not the original correlation, but the correlation after the repetitions have been shuffled. After the shuffling, the only correlation between a and b that remains is the correlation due to the common stimulus, because that is still in the data; C_ab-tilde is correlation due to the common stimulus and nothing more. What is traditionally done in the field is to compare C_ab-tilde with the full, unshuffled correlation C_ab, and to define the noise correlation as the difference: C_ab^noise = C_ab − C_ab-tilde, the total correlation minus the part due to the stimulus. The question is then whether the noise correlation, which can be positive or negative, is significantly above or below zero; that is taken as an indication that the correlation is not just inherited from the stimulus, that there is some interaction between the neurons the stimulus doesn't account for. That is actually a bit questionable: why take the difference of correlations, and does that difference matter? A proper probabilistic model would let you apply all the tools of statistics; this is just a particular empirical measure. But it is a common one, so that is why I write it down.

The last thing I want to say as we wrap up: in many cases, also in neuroscience, two things happen. First of all, I may not have perfect stimulus repeats. For the retina it's great: I can show the same movie 600 times and it happily watches it 600 times. But show the cortex the same thing 600 times; are those really the same
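The two shuffles described above can be sketched in a few lines of code. This is a toy illustration on hypothetical binary rasters (all array shapes, rates, and data here are made up for the example), not the actual analysis pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def corr(a, b):
    """Pearson correlation between two binary spike trains."""
    return np.corrcoef(a, b)[0, 1]

def timebin_shuffle_pvalue(a, b, n_shuffles=500):
    """Zeroth-order significance test: permute the time bins of one
    train relative to the other to build the null distribution P(C_ab)."""
    c_obs = corr(a, b)
    null = np.array([corr(a, rng.permutation(b)) for _ in range(n_shuffles)])
    # two-sided p-value with the standard +1 correction
    return (np.sum(np.abs(null) >= abs(c_obs)) + 1) / (n_shuffles + 1)

def noise_correlation(ra, rb, n_shuffles=100):
    """Repeat shuffle: ra, rb are (n_repeats, n_timebins) rasters from
    repeats of the same stimulus.  Permuting repeats independently within
    every time bin keeps each neuron's stimulus-locked rate (the PSTH)
    but destroys correlations conditional on the stimulus."""
    c_total = corr(ra.ravel(), rb.ravel())
    c_shuf = []
    for _ in range(n_shuffles):
        sa = np.apply_along_axis(rng.permutation, 0, ra)  # shuffle each bin's column
        sb = np.apply_along_axis(rng.permutation, 0, rb)
        c_shuf.append(corr(sa.ravel(), sb.ravel()))
    return c_total - np.mean(c_shuf)   # C_ab - C_ab~, the "noise correlation"
```

With repeats, the distribution of the shuffled C_ab-tilde itself serves as the null against which the unshuffled C_ab can be tested, instead of just reporting the difference.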
I mean, the brain is learning, adapting, being modulated, and so on; I may not have perfect repeats, so what do I do then? And the second thing: what if I have in mind a more complicated model, where it's not only these two factors, the stimulus that I control from outside and can discount via repeats, plus some other putatively interesting correlation? There are cases where the situation is more complicated. How do I then decide whether two neurons are more coupled than I would expect based on the common stimulus and everything else?

The idea is that you can use maximum entropy models to do exactly that, and let me give you an example; I can give you the reference if you're interested. This is something we have done recently in hippocampus, with a freely behaving animal: the animal runs around while being recorded from, so there can be no stimulus repeats; the animal does whatever it wants, and there is no way to repeat anything. The setup is a circular arena, one meter across, with a mouse running around along some trajectory r(t). As it runs, a set of cells in the hippocampus, CA1 neurons, like to fire locked to space: if you record, you will find a cell that is active and spikes when the animal is in one particular place and is silent, or fires at a very low rate, elsewhere; another hippocampal neuron likes to fire somewhere else; and so on, many neurons with different preferred locations. Now, if we record from neurons a and b whose preferred places overlap, they will of course be somewhat correlated, because when the animal is there, both like to fire. And again we would like to know whether a and b interact beyond what that overlap explains.

But that's not all. Part of the correlation between a and b arises because their so-called place fields, their receptive fields for position, overlap. In the hippocampus, though, the common position of the animal is not the only confound. It is very well known that there is global modulation: rhythmic activity, various neural oscillations as they are called, and neurons like to fire locked to those global oscillations, synchronously as a population. So you can imagine some global signal, not the position itself but some pacemaker or rhythmic process, that also co-modulates a and b and can make them fire together more than we would expect. In a sense, what you'd like is a way to statistically test for excess correlation between two neurons while discounting these various other factors: the position, the global modulation, and so on.

The way we approached this in hippocampus is that, as a proxy for this global signal, you can use the synchrony itself; there are good reasons to do that which I don't have time to go into. If you record from hundreds of neurons, you take the global variable K, the summed population activity we had before, as a good proxy both for oscillatory activity and for modulation by things like running speed that I'm not discussing here. It is a kind of global activity pattern of all the neurons
together. What I now want to do is create a maximum entropy model for my whole population of neurons, P(x_1, ..., x_N), where somewhere inside are my two special neurons a and b whose correlation I want to test. I want a maxent model that exactly reproduces the following observables: the mean activity of each neuron x_i as a function of position r, and as a function of the synchrony K. In other words, a model that is as random as possible but correctly reproduces each neuron's place field, where each individual neuron likes to fire, and how each neuron is modulated by the global co-activation variable K.

Once we have this joint distribution, which accounts for the position-dependent activities and the global synchrony, we can ask it to predict a very non-trivial null distribution: what correlation between a and b would we expect, given only that the cells' marginal responses to position look like this and that this is how they are modulated by the joint activity? The model is a joint distribution, so it yields a full expected distribution for the correlation between a and b. Then we go to the data, empirically measure the real correlation between a and b, and see where it falls. It turns out that in this type of modeling in hippocampus, about 95 percent of cell pairs fall within the body of the null distribution, but you do find pairs that are correlated way more, or way less, than the null model predicts. For us, those are the pairs to focus on: to explain their joint behavior there needs to be something beyond shared place and shared synchrony. And it is a non-trivial null model because it is a joint model: the collective variable K couples all the neurons together, so it is really a large-population effect. You predict a pairwise statistic and compare it against the data, but the model itself is a joint model for the full simultaneously recorded population.

Notice this is slightly different from before. I could build an even more ambitious maxent model that constrains all the marginals and, in addition, all the pairwise terms x_i x_j; that is the full pairwise model we used in the retina. Then I could introspect the model: there would be a coupling term J_ij between the two neurons, and I could ask whether it is large or small and use that as an indication of whether things are strongly interacting. But constructing that model is technically very hard, whereas constructing the rate-and-synchrony model is, well, not easy, but not that hard, and using it as a null model to test against the data is also something you can actually do. So you have these two alternatives: either you build a full-scale model that precisely reproduces all the pairwise statistics and try to learn from its couplings, or you use a simpler maxent model as a null model and test the data against it. It is a different stance on the same problem.

All right, since I went a little over time, let me end here. Tomorrow we do something very different: the last lecture will be more statistical, more theoretical, still with slides though. It will be about connecting maximum entropy distributions to priors in Bayesian inference, so it will be really kind of
inference-oriented, whereas today was more data-analysis-oriented. So thanks, and sorry for going a bit over time.
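The hippocampus-style null test described in this lecture can be illustrated with a toy resampling proxy. This is my own simplified stand-in, not the fitted maxent model from the lecture: for every time bin it preserves the observed population synchrony K and (approximately) each neuron's mean rate by redrawing which neurons fire, then asks whether a given pair is more correlated than this null allows. All data here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def synchrony_null(raster, n_samples=100):
    """raster: (n_neurons, n_timebins) binary array.  For every time bin,
    keep the observed synchrony K = sum_i x_i, but redraw *which* K
    neurons fire, with probability proportional to each neuron's overall
    rate.  A crude proxy for a maxent model constrained on rates and P(K);
    it assumes no neuron is completely silent."""
    n, T = raster.shape
    p = raster.mean(axis=1)
    p = p / p.sum()
    ks = raster.sum(axis=0)          # observed synchrony in every bin
    for _ in range(n_samples):
        fake = np.zeros_like(raster)
        for t in range(T):
            active = rng.choice(n, size=ks[t], replace=False, p=p)
            fake[active, t] = 1
        yield fake

def excess_correlation_pvalue(raster, i, j, n_samples=100):
    """Two-sided test of whether neurons i and j are more (or less)
    correlated than the synchrony-preserving null predicts."""
    c_obs = np.corrcoef(raster[i], raster[j])[0, 1]
    null = np.array([np.corrcoef(f[i], f[j])[0, 1]
                     for f in synchrony_null(raster, n_samples)])
    lo = np.sum(null <= c_obs)
    hi = np.sum(null >= c_obs)
    return 2 * (min(lo, hi) + 1) / (n_samples + 1)
```

The real analysis additionally conditions each neuron's rate on the animal's position r(t); this sketch only keeps the synchrony part of the constraint, which is why pairs with genuinely shared place fields would still show up as "excess" here.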