OK. So thank you for inviting me to teach here. In the first hour, I'll talk about bridging neural activity and perception. And in the second hour, I'll talk about what's called the Bayesian hypothesis: whether the brain can be thought of as a Bayesian machine or not. So the problem here is, when we consider neural circuits, maybe a model producing spikes with a lot of connections, et cetera, or when we are looking at a given cortical area in the brain, the question we can ask is: how efficient is that neural activity for coding? And in particular, in terms of predictions for behavioral performance. So we want to understand what the impact is of specific features of this neural activity, and also of the variability in this neural circuit, on predictions for performance. And I'm going to argue that a very useful framework for answering this kind of question is the encoding-decoding framework. So I'm going to argue that we can view perception, and to some extent also behavior, in terms of an encoding-decoding cascade. In that model, we are going to use statistical models, and we are going to look at the transformation between a sensory stimulus and a population response in some area of the brain, through an encoding mechanism, through the properties of a set of neurons. So the transformation between a sensory stimulus and a population response, that's the encoding stage. And then the transformation between this population response and perception: what is the response of the subject, maybe in terms of what he or she is perceiving, or how he or she is behaving? So the response as you can measure it in a psychophysical task. The first part is the encoding stage, and the second part is the decoding stage. And we can frame those stages in a statistical way. The encoding stage is really about describing the statistics of the responses given a stimulus. And the decoding stage is about understanding the statistics of the stimulus in the external world given the response: trying to figure out what the stimulus in the external world is, given the response in the brain. So in the following, I'm going to describe this encoding stage and this decoding stage, and how we can use this kind of model to go from properties of neurons to predictions for psychophysics. So for the encoding stage, we want to describe the statistics of the responses in a population of neurons in the brain given a stimulus. Of course, most of sensory neuroscience is about this, about understanding what neurons respond to. But here we want to describe it in statistical terms. First of all, we have to choose how we are going to describe the activity in the population of neurons, and we have at least two choices: we can describe only the number of spikes that these neurons are emitting, or the exact sequence of spikes. So we have to make assumptions about what the signal is. In most models that people work with in this field, we make the assumption that only the number of spikes, or the rates, matter. Of course, it's a huge simplification, as I'm sure you know. But since we don't understand much of the coding at the level of spike sequences, it's a good assumption to start with. So the reason why we use statistical measures is because of variability. As you know, there's a lot of variability in the brain. If you show the same stimulus over and over again, you don't get the same sequences of spikes. You don't get the same number of spikes.
And that's why we want to describe things in statistical terms. So we are going to describe a spike count in terms of a random variable, and we want to describe the activity using a probability distribution, for which we want to find a model that fits the data. Is that clear? So to do this, we choose a model, a probability distribution, that fits the statistics of the data. Usually that means fitting the mean of the data over trials, and second-order statistics like the variance of the spike count and the correlations between spike counts. So we want to find a model that describes p of r given s, the probability of the response of our population of neurons given a stimulus, in terms of the mean and the variability, that is, the variance and the correlations. That would be a good description of the data we are interested in. So how do we go about describing the mean? The mean can be thought of as corresponding to the tuning curve. When people record tuning curves, what they do is show the same stimulus over and over again, count the number of spikes, and report the average spike count for different stimuli. That is the tuning curve. So the tuning curve, as people measure it, corresponds to the average spike count over trials for different stimuli. We know, for example, that in primary visual cortex, most neurons are selective to orientation, and we have those bell-shaped tuning curves, for which we have a very simple model, a Gaussian model. For other dimensions, for example disparity, we have response curves where a suitable model would be more something like a sigmoid. This would be true also for contrast: you would have a model looking like a sigmoid, where the neurons respond more for higher contrast and less for lower contrast. So we have all those dimensions: for example, in primary visual cortex, we have orientation, direction, contrast, disparity. And for each of those, we have tuning curves and response curves whose shape we know, like Gaussian tuning curves, sigmoidal response curves, or maybe more funky kinds of tuning curves, that we can use as a model of the average spike count over trials. Is that clear? Yeah. So the tuning curve is the first model we can use to describe the statistics of the response; that's the mean response. Then what we want to do is also describe the noise, of course, the variability. The simplest way to describe the variability, as a first stage, is to describe the variance. As I'm sure you know, people have also measured variances of spike counts over trials, and they have found that the higher the response, usually, the higher the variability. There's a relationship between the mean spike count that you record and the variance of the spike count, and this relationship is usually linear: the variance is a function of the mean. So the variance is usually a very simple function of the mean, proportional to the mean, and the coefficient of proportionality here is called the Fano factor. Often, people find a Fano factor which is close to one. It can be a bit more complicated: sometimes you would have a more complicated model where you raise the mean to some exponent. But often, this simple form is a reasonable model. So we have this first model, the tuning curve, and then we know that the variability is such that the variance scales with the mean.
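To make this mean-variance relationship concrete, here is a minimal simulation sketch. Everything in it is hypothetical illustration (the Gaussian tuning curve and all of its parameters are made up, not numbers from the lecture): spike counts are drawn from a Poisson process at each stimulus, so the estimated Fano factor should come out close to one everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gaussian tuning curve for orientation (degrees);
# peak, width and preferred orientation are illustration values.
def tuning_curve(s, pref=0.0, width=20.0, peak=30.0):
    return peak * np.exp(-0.5 * ((s - pref) / width) ** 2)

n_trials = 5000
for s in [-40.0, -20.0, 0.0, 20.0, 40.0]:
    counts = rng.poisson(tuning_curve(s), size=n_trials)  # spike counts over trials
    mean, var = counts.mean(), counts.var()
    print(f"s = {s:6.1f}  mean = {mean:6.2f}  variance = {var:6.2f}  Fano = {var / mean:4.2f}")
```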
So that's already a good start for our model. And we know that some probability distributions have exactly this property. As I've said, the Fano factor is often close to one, and the Poisson distribution has its variance strictly equal to its mean. So it's often used as a good model of variability, since, in particular, it has this property that the variance is equal to the mean. And I'll show you the Poisson distribution. So that's a first model that we can use to describe variability in the brain. Another model is to use a Gaussian distribution with a variance that is proportional to the mean, with a relationship like the one above, where we can manipulate the relationship between the variance and the mean. You've seen that before, I'm sure. So the Poisson distribution, I'm sure you've seen it before as well, but just to remind you of its shape: we want to describe the probability of the spike count n on a given trial, given a stimulus. And we imagine we know the mean spike count, because maybe it's given by the tuning curve, and we know the tuning curve of this neuron; that would be f. The equation for the Poisson distribution is then p(n | s) = e^(-f(s)) f(s)^n / n!. So the Poisson distribution gives us the probability of each outcome, of each number of spikes, when we know the tuning curve. For example, if the mean is supposed to be 10, like here, the Poisson distribution would look like the black curve, so we know the probability of obtaining each spike count over trials. The Poisson distribution looks very much like a Gaussian distribution for means that are quite high. The Poisson distribution is always positive; for smaller means, it looks a bit like a truncated Gaussian, a distorted bell-shaped curve. And if we want to model the number of spikes on one trial, we can sample from this distribution, and that would be a good description of the spike count that we can obtain with this model on a given trial. So the Poisson distribution is a first model to describe the statistics of responses, in terms of spike counts for a given stimulus, when we know the tuning curve. It has this property that the variance is strictly equal to the mean, which is similar to the data. Another model that is very frequently used is the Gaussian distribution, as I've said. In this model, we are going to describe the spike count on a given trial as the mean spike count, which is given by the tuning curve, plus some noise. Here this noise is Gaussian, centered on 0, with some amplitude. And we can fix that amplitude so that the variance is strictly equal to the mean, and then we would have something very similar to a Poisson process, or more generally some function of the mean. So the Gaussian distribution is also a model that you can use to describe the variability. Is that clear?
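Here is a small sketch of the two variability models just described, with one hypothetical mean rate: the Poisson model, and the additive Gaussian model whose variance is fixed to the mean. The means and variances match, but note that the Gaussian version produces non-integer and occasionally negative "counts", which is the price of the approximation at low rates.

```python
import numpy as np

rng = np.random.default_rng(1)
f = 10.0            # hypothetical mean spike count f(s) from the tuning curve
n_trials = 100_000

poisson_counts = rng.poisson(f, n_trials)                      # variance == mean by construction
gaussian_counts = f + np.sqrt(f) * rng.normal(size=n_trials)   # r = f(s) + noise, variance set to f(s)

for name, r in [("Poisson", poisson_counts), ("Gaussian", gaussian_counts)]:
    print(f"{name:8s} mean = {r.mean():5.2f}  variance = {r.var():5.2f}")
print("fraction of negative Gaussian 'counts':", np.mean(gaussian_counts < 0))
```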
Now we want to go further than this. We also want to describe the noise between neurons: we want to understand whether the noise in different neurons is independent or correlated. So it might be that neurons are correlated. For example, if neurons are positively correlated, then on trials when neuron one spikes more than its mean (here this line indicates the mean spike count, and each point is a given trial), neuron two also tends to spike more than its average. If this is often the case, the neurons are correlated. If there's no relationship between the noise in one neuron and the noise in another neuron, so that when neuron one spikes more than its average, neuron two can either spike more or less than its average, the neurons are independent. Often people plot spike counts like this, where each dot, again, is a given trial. If the neurons are independent, the cloud has a circular shape; if the neurons are correlated, you have an elliptical cloud. So we want to describe those correlations. In the data, it's found that neurons are usually positively correlated, with weak correlations. The amplitude of these correlations in the cortex is much debated, and people find very different amplitudes for these correlations. It's a very active topic, trying to understand what the properties of these correlations are, how to describe them in the data, and also what the influence of these correlations is for coding; I'll come back to this. So we want to describe the mean, the variance, and the correlations between neurons, the noise correlations. To do this, we can use our Gaussian model just like before. We describe the activity of each neuron as a number of spikes given by its tuning curve plus some Gaussian noise, like before, but now we can choose a model where this noise is correlated. So the noise is described by a multivariate Gaussian distribution, and you can parametrize this matrix Q, which represents the covariance between neurons, to give it the shape you want. So our model becomes this: the probability of the response given the stimulus is given by a multivariate Gaussian distribution, where the neurons are described by their mean and a covariance matrix, which can have whatever shape you give it. In reality, it's not very clear what the shape of this covariance matrix should be: whether it's more or less uniform, whether neurons that are closer to each other are more correlated than others, whether it depends on the stimulus preferences, et cetera. So often, when people play with models like this, they assume a given shape for the covariance matrix and look at the functional implications of that correlation structure. It's not clear, again, what the data say about this. Yeah? This is additive noise, obviously; it's not multiplying the response, it's adding to it. So if you had a neuron with quite a high underlying response, wouldn't the noise be a relatively smaller part of it? No, because the variance scales with the mean. There's a bit of confusion here: people talk about additive noise and multiplicative noise, and this would be an additive model, but it can still be multiplicative noise in the sense that the variance can be a function of f. So it's actually multiplicative in that it grows with f. It's a bit confusing. Is that OK? So that's it for the encoding part. We have abstracted away the level of circuits, and we have chosen to describe only the statistics of the responses of our neurons, in terms of the mean, the variance, and the covariance.
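As a sketch of this correlated encoding model, the code below draws population responses r = f(s) + noise from a multivariate Gaussian. Since, as just said, the true shape of Q is unclear, the covariance used here is only one common assumption (limited-range correlations that decay with the difference in preferred orientation, with made-up amplitude and decay constants), not a claim about the data.

```python
import numpy as np

rng = np.random.default_rng(2)
prefs = np.linspace(-90, 90, 16, endpoint=False)   # preferred orientations (degrees)

def f(s):  # hypothetical Gaussian tuning curves, one per neuron
    return 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2) + 2.0

s = 0.0
means = f(s)
# One assumed correlation structure (the talk stresses the true shape is debated):
# correlations decay with the difference in preferred orientation.
c = 0.2 * np.exp(-np.abs(prefs[:, None] - prefs[None, :]) / 30.0)
np.fill_diagonal(c, 1.0)
sigmas = np.sqrt(means)                 # Poisson-like: variance equals mean
Q = c * np.outer(sigmas, sigmas)        # covariance matrix

r = rng.multivariate_normal(means, Q, size=1000)   # 1000 trials of population responses
print("empirical corr(neuron 0, neuron 1):", np.corrcoef(r[:, 0], r[:, 1])[0, 1])
```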
So now the second stage of our exercise is the decoding stage. The decoding stage is the reverse exercise: now we are looking at this population response, and we want to figure out what the stimulus in the external world is. That's the decoding problem. There are different ways to do this, and it's not clear which is best, or which is most plausible as a description of this transformation in the brain itself. The first assumption we can make is that this transformation is somewhat optimal. That is, when the brain commits to a percept or a behavior, it uses all the information that there is in the population response. So in the decoding stage, we want to figure out what the stimulus in the external world is, based on this population response, extracting all the information that we can from it. We can choose to work with an optimal decoder, which extracts all the information there is in order to guess what the stimulus in the external world is. Whether the brain can do this, I'll come back to; it's not clear, but we can do it in the model. So here the question is: what would perception or behavior be like if all the information here could be used and decoded to form the percept or the decision? One tool we can use to decode is maximum likelihood. How many of you know maximum likelihood? It's simpler than the name suggests. The idea here is that we have a model where we know the encoding stage exactly, so we know exactly p of r given s, and we just want to invert the model to guess what the stimulus is given the response. With maximum likelihood, the idea is to choose as an estimate the stimulus which has the maximum probability of generating the response that you observe. So you know all about the encoding model, and you observe a population response like this one: we have a bunch of neurons in V1, for example, ranked by their preferred orientation, and each dot is the spike count of one neuron in this population. So this is my population of neurons, and they are responding like this. For example, this one, which is selective to zero degrees, is responding with 32 spikes; this one is responding the most. I'm looking at this, and I have to figure out what the stimulus in the world is. With maximum likelihood, the idea is that we know the encoding stage exactly, so we know exactly what the response should look like for each kind of stimulus in the external world. The idea really is just to fit this model to the response that we observe, and choose as an estimate the stimulus which would have generated the response that best fits the one we observe. Is that clear? We observe some responses, and we choose as an estimate the stimulus which has the maximum probability of generating the observed response. So we are just inverting the encoding model to decode. Say that again? My understanding is that you know your input and you know your output, and you just fit the parameters to get that output; but I don't see how you get this output. No, no: here, for all stimuli in the world, we know what the statistics of the responses in the brain should be. That's the encoding model. Now we are just using this encoding model, fitting it to the responses that we observe, to figure out what the stimulus in the external world is. That is our prediction; we are reading out the spike counts. The idea, and maybe I should have said this before, is that we can measure this population response in the brain. The idea is that we're measuring it.
So this is like a model of the situation we are describing. Either you are thinking of an experimental situation where we can record from a very large number of neurons... [inaudible question] ...No, no, that would hold for any part of the brain where you can record a large number of neurons at the same time. We assume you can record a large number at the same time, and that you have recorded them long enough that you know this model, p of r given s, well enough: you know about the mean responses of those neurons and their variability. In models this is always possible, of course; with data it depends on what you can measure. And this would hold everywhere that people record with arrays of electrodes, for example. People also do this decoding exercise when they record with single electrodes, one neuron at a time, and then pool the data and assume the neurons are independent. So it's applicable to all kinds of data where you have recordings from a large number of neurons. Is it, in a way, though, more a hypothesis about what neurons, say the next stage of neurons, could read out from an earlier layer of neurons? Because the next layer of neurons will, as a group, be able to get information from all the neurons in the first layer. Yeah, absolutely, yes: it is a hypothesis about what could be read out, given the information that is there. I'm really interested in this transformation between the representation at some level and the response of the subject. You can think of it as being another layer in the brain. In any case, it's not a homunculus, a little man decoding the activity from a layer of neurons; it's really the transformation I'm interested in. It doesn't have to be explicit, that's what I mean: there's no need for an explicit decoding stage, but this transformation always exists between some layer, some population of neurons in the brain, and the response R of the subject. Okay, so the first way to decode would be to use an optimal decoder, like maximum likelihood, but this is only feasible if we know the encoding model exactly. Whether the brain can do this, can use something similar to maximum likelihood to do this transformation, is not clear at all. I think people agree that it would have to be an approximation of maximum likelihood anyway, especially for complex distributions. There is some evidence from psychophysics, and I'll talk about this in the second hour, some evidence that the brain does something similar to that, and there is some literature proposing neural circuits that could do something similar to maximum likelihood. But the plausibility of those circuits, and whether they really work for the kinds of computations we are interested in, is still debated. It's not clear, but some people are trying to figure out whether maximum likelihood can really be implemented in the brain to do this transformation between some population response and the response of the subject.
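Here is a minimal sketch of maximum-likelihood decoding on a toy population (hypothetical Gaussian tuning curves, independent Poisson noise assumed, brute-force grid search over stimuli): for independent Poisson neurons, the log likelihood is the sum over neurons of n_i log f_i(s) - f_i(s), up to a constant, and the estimate is the stimulus that maximizes it.

```python
import numpy as np

rng = np.random.default_rng(3)
prefs = np.linspace(-90, 90, 24, endpoint=False)   # preferred orientations (degrees)

def f(s):  # hypothetical tuning curves; the +1.0 baseline keeps rates positive
    return 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2) + 1.0

def ml_decode(n, grid=np.linspace(-90, 90, 721)):
    # log p(n|s) = sum_i [ n_i log f_i(s) - f_i(s) ] + const, for independent Poisson neurons
    ll = [np.sum(n * np.log(f(s)) - f(s)) for s in grid]
    return grid[np.argmax(ll)]

s_true = 10.0
n = rng.poisson(f(s_true))          # one trial of population spike counts
print("true:", s_true, " ML estimate:", ml_decode(n))
```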
If this is too complicated for the brain to do, or if we as experimentalists are trying to decode from the brain and we don't know all about the encoding model p of r given s, there are simpler decoding strategies that we can use, and I'm sure you've heard about them. One of them is the winner-take-all mechanism. Here again we are observing this population of neurons and we have to figure out what the stimulus in the external world is, and the winner-take-all strategy is very simple: we just pick the neuron that responds the most, and we decide that the stimulus in the external world is the preferred stimulus of that neuron. So we don't have to know all about the encoding model p of r given s; we just have to know the preferred stimulus of each neuron. When we observe the response, we take the neuron that responds the most, and we decide that, because it responds the most, its preferred stimulus must be what is in the external world. Is that clear? So the winner-take-all is a very simple strategy, and it's very simple to implement with neural circuits. The problem is that, as a strategy, it's not very efficient at all. If you look in particular at the variance of this decoding strategy, it's very high, because it's based on the responses of single neurons instead of taking into account the whole population, so its efficiency is pretty bad. But it's still used in models, and it's very easy to implement. A more interesting strategy, which is still very simple, is the population vector. I'm sure you've heard about this as well. The population vector is a strategy where we assume that each neuron in our population votes for its preferred stimulus, with a vote whose amplitude depends on the amplitude of its response. So for example, this neuron, which responds a lot, votes for its preferred orientation, say 10 degrees, with an amplitude that here runs off the screen; it should not, but it's proportional to the magnitude of the response. Each neuron votes in the same way: this one is also voting for its preferred orientation, with an amplitude that is a bit less. The idea of the population vector is then to compute the sum of all these vectors, for all the neurons in the population. The resulting vector, the sum of all the votes, has an orientation which is the orientation we take as our guess for the orientation of the stimulus in the external world. Is that clear? So the population vector is also a strategy where we don't need to know the full encoding model p of r given s; we just need to know the preferred stimulus of each neuron. It's a very simple strategy, and it has much better properties than the winner-take-all strategy. In particular, if the tuning curves are broad enough, if the noise is Poisson, and if the neurons are uniformly distributed in terms of their stimulus preferences, it can approximate maximum likelihood very well. So it's a very simple strategy, but it can be very efficient in some situations.
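Here is a sketch of the population vector on the same kind of toy population (again with hypothetical tuning curves and Poisson noise). One detail worth flagging as my own choice, not something from the lecture: orientation is 180-degree periodic, so the usual trick of doubling the angles before summing the votes is used.

```python
import numpy as np

rng = np.random.default_rng(4)
prefs = np.linspace(-90, 90, 24, endpoint=False)   # preferred orientations (degrees)

def f(s):  # hypothetical tuning curves
    return 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2) + 1.0

def population_vector(n):
    # Each neuron votes for its preferred orientation, weighted by its spike count.
    # Orientation is 180-degree periodic, so angles are doubled before summing vectors.
    theta = np.deg2rad(2 * prefs)
    x, y = np.sum(n * np.cos(theta)), np.sum(n * np.sin(theta))
    return np.rad2deg(np.arctan2(y, x)) / 2

s_true = 10.0
n = rng.poisson(f(s_true))
print("true:", s_true, " population vector estimate:", population_vector(n))
```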
So there are a lot of people working on decoding from populations of neurons using these kinds of strategies. There's also a literature on using other kinds of decoders, which are simpler than maximum likelihood, or non-optimal in some way. And the question is really about what the cost of using those non-optimal decoders would be, because knowing all about the encoding model p of r given s is usually too complex. In particular, if your neurons are correlated, knowing p of r given s means knowing all about the covariance matrix, and that's very hard to get: you would need a lot of measurements for all your neurons. That would be almost infeasible, so you would have to have some approximation of it anyway. So people have looked at the cost of using things like the winner-take-all, or the population vector, or other kinds of decoders that are constrained in some way. For example, the decoder can be constrained to be linear; this is called the optimal linear estimator. So you try to decode, to figure out what the stimulus in the external world is, from a weighted sum of the responses of the neurons in the population. You train this decoder, that is, you choose the optimal weights on a set of training data, and then you give it new data and you see what the performance of the decoder is. That's called the OLE. It also has good properties in some cases. Then there's also some literature on understanding whether you need to know the correlations to decode: if you use a decoding stage that ignores the correlations, that assumes the neurons are independent, are your estimates going to be very wrong or not? In some cases, and again it depends on the structure of the correlations (I've worked on this as well), you can use a decoder that ignores the correlations, that assumes the neurons are independent, and still do a very good job at decoding. The question, again, is whether the brain, when it does this transformation, is also reading out, if you will, the correlations in the population activity, or not. Is that okay? Yes? Would the decoder you just showed, for example, work for any orientation? It works locally, so you can train it around a given orientation. You would train it for a specific region, and then it would do a very good job in that region, but it would not work for all orientations; otherwise, you would need a multi-layer kind of network. Is that clear?
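And here is a sketch of the optimal linear estimator, trained by least squares on one set of trials and tested on fresh trials. As in the answer just given, it is trained only locally, over a restricted stimulus range; all the numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
prefs = np.linspace(-10, 10, 20)    # work locally, around one stimulus region

def f(s):  # hypothetical tuning curves
    return 30.0 * np.exp(-0.5 * ((prefs - s) / 8.0) ** 2) + 1.0

def trials(stims):
    return np.array([rng.poisson(f(s)) for s in stims], dtype=float)

# Training set: responses R and true stimuli s; fit s ~ R @ w + b by least squares.
s_train = rng.uniform(-5, 5, 2000)
R = trials(s_train)
X = np.column_stack([R, np.ones(len(R))])      # add a constant column for the offset
w = np.linalg.lstsq(X, s_train, rcond=None)[0]

# Test on fresh data.
s_test = rng.uniform(-5, 5, 1000)
Rt = trials(s_test)
est = np.column_stack([Rt, np.ones(len(Rt))]) @ w
print("test RMSE of the OLE:", np.sqrt(np.mean((est - s_test) ** 2)))
```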
So now I have described encoding models and decoding models, and I'm going to move towards trying to predict psychophysics from neural data. But before that, I need to do a little reminder about assessing the performance of estimators. Maybe you've seen this before as well; also, I'm using "decoder" and "estimator" interchangeably, right? So we have a model like this, where we have a stimulus in the external world, we present it many times, and we have some variability. Each time, our model gives us some guess about what the stimulus in the external world is, and from trial to trial, even if the stimulus is the same, our guess may differ because of the variability at the encoding stage. And we can use different kinds of estimators, or decoders: winner-take-all, population vector, maximum likelihood, some kind of crazy stuff. The question is: how do people evaluate the performance of an estimator, of this whole transformation? How do you evaluate how good your guesses are? There are two quantities that you usually measure to characterize the performance of an estimator. The first one is the bias. The bias is the difference between the average guess and the stimulus in the external world. If the bias is equal to zero, the estimator is said to be unbiased. So that's the first quantity you can measure: if, on average, your guess equals the stimulus in the external world, your estimator is unbiased. The second quantity you can measure is the variability of the guesses, so you can quantify their variance. And there is a very important result that we know from estimation theory: this variance is bounded. It cannot be infinitely small; it is bounded from below by a quantity which depends on Fisher information. The variance is bounded by a quantity which depends on the inverse of the Fisher information and the derivative of the bias. The idea is that at the encoding stage there is a finite amount of information, and there is some variability, such that even if you have a perfect decoder, you can't be perfect: you are limited by the variability at the encoding stage. And this quantity tells you about this limit, which depends on the variability and the properties of the responses, and which is going to limit your precision. Is that clear? Yeah? How many of you have heard about Fisher information? Okay. So the following is going to be about Fisher information. Fisher information is really the tool that people use to do this bridging between neural activity and behavior, and this is what I'm going to describe now. In this framework, it comes up very naturally. The variance of the estimates is bounded, and this is known as the Cramér-Rao bound: for an estimator with bias b(s), var(ŝ) ≥ (1 + b'(s))² / I(s), where I(s) is the Fisher information. Fisher information depends on p of r given s, the encoding model; it is the expectation of minus the second derivative of the log of p of r given s: I(s) = -E[∂² log p(r|s) / ∂s²]. The idea is that Fisher information depends on the properties of the neurons, the mean and the variability: p of r given s is given, as we've seen before, by your tuning curves and either a Poisson model or a Gaussian model for the variability. So at the encoding stage you can compute Fisher information, and this gives you an idea about the limit on the precision of your estimator. Is that clear? The full expression is in terms of p of r given s; I have it on another slide.
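To see the bound in action, here is a sketch that decodes a toy Poisson population by maximum likelihood over many trials and compares the empirical variance of the estimates with the Cramér-Rao bound. It uses the Poisson-population expression for Fisher information, I(s) = Σ f'ᵢ(s)² / fᵢ(s), which comes up again in a moment; all parameters are hypothetical, and the estimator is close to unbiased here, so the bias term is dropped.

```python
import numpy as np

rng = np.random.default_rng(6)
prefs = np.linspace(-90, 90, 24, endpoint=False)

def f(s):  # hypothetical tuning curves
    return 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2) + 1.0

grid = np.linspace(-15, 15, 301)
F = np.array([f(s) for s in grid])          # rates for every grid stimulus

def ml_decode(n):
    ll = (np.log(F) * n).sum(axis=1) - F.sum(axis=1)   # Poisson log likelihood on the grid
    return grid[np.argmax(ll)]

s0, h = 0.0, 1e-4
fp = (f(s0 + h) - f(s0 - h)) / (2 * h)      # numerical slope of each tuning curve
I = np.sum(fp ** 2 / f(s0))                 # Poisson-population Fisher information
estimates = [ml_decode(rng.poisson(f(s0))) for _ in range(2000)]
print("Cramer-Rao bound on the variance:", 1.0 / I)
print("empirical variance of ML estimates:", np.var(estimates))
```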
So now, how can we work with our model to relate neural activity and psychophysical performance? Now it's very easy: we have all the tools that we need. What kinds of things do we want to predict in psychophysics; what is measured in psychophysics? We are interested in experiments, for example, where people do estimation tasks. You ask people, for example, what orientation they see, and they have to tell you, maybe by turning an arrow or comparing with another stimulus, what the orientation of the stimulus is. In some cases, they will have illusions, or they will be biased, and you measure, for example, this bias. We know that in adaptation, or in certain visual illusions, people are biased: on average, their guesses differ from the stimulus in the external world, and this is what we measure. So this is the kind of thing we want to predict, this kind of bias. Another thing we want to predict is the precision of perception, in terms of discrimination. For example, we have those experiments where we present one stimulus and then another stimulus, and we ask the subjects whether this is the same stimulus or a different orientation. And we measure the just-noticeable difference, the difference that they can reliably detect on a given fraction of trials. So can we predict this kind of precision as well? In a model like this, I hope you can see that the discrimination threshold is going to depend on the overlap between the internal representations of the two stimuli. The two stimuli are represented by probability distributions, because there is some intrinsic variability, and the overlap between those two distributions depends on the bias and the variance of the estimates. For example, if subjects have an illusion such that a difference in the external world is perceived as amplified, the two distributions will be further apart in the internal representation than in the external world; that's how the overlap depends on the bias. And the width of each distribution depends on the variability of the estimates. So the discrimination threshold depends on the bias of the estimates and on their variance. Is that clear? Yeah? Would it be possible for the bias to improve discrimination, if it's in the right direction? Yeah, a bias could actually make things better: if you have a repulsive bias, it helps you for discrimination. Although in terms of the discrimination threshold, finally, I'm not sure; that's a good question, I'll come back to it. I used to know that. So now, if we want to compare this model with the quantities that we measure in psychophysics, we can compare directly the statistical bias of our model and the perceptual bias. If we have this encoding model and decoding model and we have a constant bias in the output, this corresponds to a perceptual bias. We can also compute the variance of the estimates, or the standard deviation, together with the derivative of the bias, and that gives us the discrimination threshold. I've not shown the math for this, but you can find it in my paper; it's very easy to go from a model like this to predictions in terms of bias and discrimination threshold, that's the idea. And from the Cramér-Rao bound, we know as well that even if the decoder is optimal, the discrimination threshold is bounded by a quantity which depends on Fisher information. So Fisher information gives you the minimal discrimination threshold that you can obtain if your decoding stage is optimal. Is that clear?
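To make that link explicit, here is a sketch of the standard derivation (under the usual assumption that the estimates are roughly Gaussian with locally constant spread; the lecture points to the paper for the full math). Discriminating s from s + δ at a fixed criterion (d' = 1) using the estimates gives a threshold

δ(s) ≈ σ(s) / (1 + b'(s)),

where σ(s) is the standard deviation of the estimates and b'(s) the derivative of the bias. And since the Cramér-Rao bound says σ(s) ≥ (1 + b'(s)) / √I(s), the best possible threshold with an optimal decoder is

δ_min(s) = 1 / √I(s).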
So this is a slide on Fisher information, then. It's a very important quantity for doing this kind of exercise. Fisher information, again, gives you the minimal discrimination threshold, if you want, that can be obtained if you have an optimal decoder. Fisher information is expressed in terms of the encoding model, p of r given s; it's related to the peakiness of the likelihood. And p of r given s, as we've seen, can be expressed in terms of the tuning curve and some noise model, for example a Poisson model. If we have a Poisson model for p of r given s, Fisher information takes the form

I(s) = f'(s)² / f(s),

so it's expressed in terms of the derivative of the tuning curve over a quantity which, because this is a Poisson model, is the mean, but which really represents the variance, which is equal to the mean. I should also say: you've probably heard about mutual information, which is expressed in bits. Mutual information is a quantity from information theory. Fisher information, as we've seen, is a quantity from estimation theory. It's not expressed in bits at all; it's expressed in terms of an inverse variance. So it's a very different quantity, if you want; it's related to mutual information, but it's not at all in the same kind of units and doesn't come out of the same framework. Fisher information, also, is always expressed for a given stimulus: it's how much information, in terms of discriminability, you have in your neural population around a given stimulus. Whereas mutual information is usually about a set of responses and a set of stimuli: how well your neural activity codes for a set of stimuli. Fisher information is really how well you can discriminate around a given stimulus. And as I'll show you, it depends on the tuning curves of the neurons at that stimulus, and in particular on the slopes of the tuning curves around that stimulus. Okay, so now we have all our tools, and we can explore very interesting questions. We can play with our encoding models, our tuning curves, our noise, and we can ask what that does to performance. If I change the number of neurons, or the shape of the tuning curves, or the variability, and in particular the structure of the correlations, what does this predict in terms of the precision of the code, and in particular the discrimination performance? And we know that if the decoder is optimal, the discrimination threshold is given by one over the square root of the Fisher information. So we can compute Fisher information given the tuning curves and the noise, and we have a very good idea about the discriminability that we can measure in psychophysics. Is that clear? So that was the goal, and that's what we can do now. We can look at the factors that control performance. Intuitively, one factor that controls performance is the variability, for example the variance from trial to trial: we can think that, of course, the less noise, the better the performance. But in reality, we see that this relationship between the mean and the variance doesn't seem to vary much. For example, in a situation like learning, when we get better at a given task, it doesn't look like the variability in the brain is decreasing. What seems to be happening is more that the shapes of the tuning curves are changing. So a lot of people initially thought that neurons could sharpen their tuning curves, and that, for example, perceptual learning was related to changes in the width of the tuning curves. That is, of course, a way to improve discriminability around a stimulus: if you change the slope of the tuning curve around a given stimulus, you improve the discriminability around that stimulus.
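This slope intuition is easy to check numerically. For a single Poisson neuron with a hypothetical Gaussian tuning curve, the sketch below evaluates I(s) = f'(s)² / f(s) across stimuli: it comes out near zero at the peak of the tuning curve, where the slope vanishes, and largest on the flanks.

```python
import numpy as np

def f(s, pref=0.0, width=20.0, peak=30.0, base=1.0):
    # Hypothetical Gaussian tuning curve (all parameters are illustration values).
    return peak * np.exp(-0.5 * ((s - pref) / width) ** 2) + base

def fisher(s, h=1e-4):
    fp = (f(s + h) - f(s - h)) / (2 * h)   # numerical slope
    return fp ** 2 / f(s)                  # I(s) = f'(s)^2 / f(s) for one Poisson neuron

for s in [-40, -20, -10, 0, 10, 20, 40]:
    print(f"s = {s:4d}   rate = {f(s):5.1f}   Fisher information = {fisher(s):.4f}")
```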
There's not that much evidence for this sharpening of tuning curves happening in the brain, in particular with learning or attention or adaptation. What seems to be happening more frequently are mechanisms such as gain modulation. A lot of people have described perceptual learning, attention, or adaptation in terms of gain modulation, and gain modulation is of course also a way to change the slope of the tuning curve around a given stimulus. So there are a lot of models out there describing adaptation, attention, and perceptual learning in terms of gain modulation, with the idea that if you apply gain modulation, you change the slope of the tuning curve, and that improves the accuracy of the code. Now, Fisher information is a way to formalize this intuition and quantify exactly what the effect of gain modulation would be on the precision of the code. So if we have Poisson noise, we can use the definition of Fisher information, the expectation of minus the second derivative of the log of p of n given s, where this n is the same as what I call r when I write p of r given s. If we plug this model into the definition, we obtain the form I showed before: Fisher information for a given stimulus, in a given neuron, is related to the slope of the tuning curve of that neuron at that stimulus, how steep the tuning curve is around that stimulus, squared, over the value of the tuning curve at that stimulus, the mean response, which here is also equal to the variance. So Fisher information is really the slope squared over the variance, for a given stimulus and a given neuron. Now, the nice thing is that Fisher information for a population of independent neurons is just the sum of the Fisher information of each neuron:

I(s) = Σᵢ fᵢ'(s)² / fᵢ(s).

So the Fisher information of a population is given by the slopes of the tuning curves of all the neurons around that stimulus, over their mean responses at that stimulus. So now we have a very nice way to quantify how changes in the shape of a tuning curve impact Fisher information, and thus discriminability. That's for Poisson noise, but it's very general that Fisher information depends on the slope over the noise. For Gaussian noise, Fisher information has a similar expression, although it looks more complicated. If we have Gaussian correlated noise, the model I presented before, then your model for p of r given s is a multivariate Gaussian distribution. You can again plug this into the definition of Fisher information, and now it's three pages of math, but you end up with an expression like this:

I(s) = f'(s)ᵀ Q(s)⁻¹ f'(s) + ½ Tr[Q'(s) Q(s)⁻¹ Q'(s) Q(s)⁻¹].

Similar to before, the first term depends on the slopes of the tuning curves at this stimulus over the noise, the covariance matrix. And now there is also a second term, which depends only on the covariance matrix: on the inverse of the covariance matrix and on the derivative of the covariance matrix with respect to the stimulus. So the first term is modulated by the tuning curves, and the second term depends only on the covariance matrix. People have been interested in that term in trying to understand the impact of correlations on the precision of the code. Is that clear? Kind of?
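As a sketch of how this expression gets used, the code below evaluates the first (tuning-curve) term, f'(s)ᵀ Q⁻¹ f'(s), for the same kind of toy population under two covariance choices: independent Poisson-like noise versus limited-range correlations. The correlation structure and all the numbers are assumptions for illustration, and the Q'(s) trace term is dropped by taking the covariance to be stimulus independent.

```python
import numpy as np

prefs = np.linspace(-90, 90, 16, endpoint=False)

def f(s):  # hypothetical tuning curves
    return 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2) + 2.0

def fprime(s, h=1e-4):
    return (f(s + h) - f(s - h)) / (2 * h)

def fisher_gaussian(s, Q):
    # First term of I(s) = f'(s)^T Q^{-1} f'(s) + ..., assuming Q does not
    # depend on the stimulus (so the trace term vanishes).
    fp = fprime(s)
    return fp @ np.linalg.solve(Q, fp)

s0 = 0.0
var = f(s0)                                     # Poisson-like: variance equals mean
Q_indep = np.diag(var)
c = 0.2 * np.exp(-np.abs(prefs[:, None] - prefs[None, :]) / 30.0)
np.fill_diagonal(c, 1.0)
Q_corr = c * np.outer(np.sqrt(var), np.sqrt(var))

print("Fisher info, independent noise:        ", fisher_gaussian(s0, Q_indep))
print("Fisher info, limited-range correlations:", fisher_gaussian(s0, Q_corr))
```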
Let me see, okay. So to finish, I'm just going to give you a few examples of how these kinds of tools are used in research to try to bridge neural activity and perception, of how people use them. For example, once you have these tools, you can ask yourself whether the shapes of the tuning curves that you measure are optimal or not for the kind of task you are interested in. A lot of people have been interested in understanding whether adaptation, or maybe attention, or maybe learning, can be thought of as steps towards changing the tuning curves so that they become more optimal in the sense of Fisher information. And there is some evidence that this would be the case. For example, this is the work of David McAlpine at UCL: he is recording from the auditory midbrain of the guinea pig, presenting sound levels to the animals, looking at the neural activity, and basically adapting the animals to given ranges of sound levels, say around 40 or around 60, and measuring the activity, in terms of firing rate, as a function of sound level. And he sees that the response curves change: from the green curves when the animal is adapted to 40, to the red curves when the animal is adapted to 60. What he then does is measure, on the right axis, the Fisher information corresponding to those response curves. And what he finds is that adapting the response curves like this corresponds to improving the Fisher information around the sound levels that the animal is adapted to. So the point he's making is that adaptation in this auditory midbrain can be seen as changing the response curves so that their slopes are such that Fisher information is improved at the adapted sound level. Is that clear? So this is the kind of thing you can do: if you see tuning curves that are changing, you can ask whether this corresponds to improving discrimination or not, and this gives you a tool to quantify that. So the solid curves here, are those the firing rates? No, this is the activity, and this is the Fisher information; the dotted curves are the Fisher information. So he's making the point that the maximum of the Fisher information is at the adapted level, yeah. And that corresponds to the slope? Yeah, exactly: Fisher information is maximal where the slope is maximal. A lot of people have done the same kind of exercise for attention and learning, so this is a tool that is very widespread. Another very important research question is about understanding the impact of variability on the precision of the code, and in particular of those correlations I've told you about. Initially, people thought that even though neurons are noisy, the variability could perhaps be averaged out by pooling more and more neurons. But if neurons are correlated, then there might be a limit on the precision that you can achieve by pooling more and more neurons. What this limit is, though, depends crucially on the structure and the properties of those correlations, and it's still very unclear what the impact of those correlations in the brain is on the precision of the code. There is also a hypothesis around now that maybe attention, adaptation, and perceptual learning might act by changing the structure of the correlations, so as to improve the code, again in the sense of Fisher information, which is the tool people use. These are recent papers published in high-level journals.
They all propose that the structure of the correlations, the covariance matrix, this Q matrix in the multivariate Gaussian distribution I showed you before, could be changing with adaptation, attention, or perceptual learning, and that this might be a way for the brain to improve its representation. I've worked on something very related to this, which I'll describe only briefly, but I think it's an example of the kind of thing you can do. I looked at models of orientation selectivity; I'm sure you've heard about models of orientation selectivity in V1. There are models that assume that V1 is producing orientation selectivity: the thalamus would provide only a broad tuning, and then V1 would sharpen this tuning in orientation and produce the orientation selectivity. That's the sharpening model, and people have described this with attractor models as well. Other people propose that, in fact, the afferents from thalamus to V1 are precise enough that V1 is not creating orientation selectivity per se, but is basically making this orientation selectivity independent of contrast. So there are different kinds of models that were proposed, corresponding to different kinds of circuits in the visual cortex. One model, the sharpening model, was based mostly on recurrent connectivity, whereas the other model assumes more of a feed-forward scheme, where orientation selectivity is given by the thalamus and only slightly modified in the visual cortex. What I did in this paper was re-implement those two models in the same framework, with spikes and a large number of connections: I have a retinal level and a V1 level, with excitatory and inhibitory conductance-based integrate-and-fire neurons, and two types of connectivity, for the recurrent model or the feed-forward model. And the question I asked was whether these two models are actually similar or not in terms of the efficiency of the code, in terms of the predictions they would make for discriminability at the behavioral level: whether the models are equivalent in terms of information transmission. So what I did is this exercise of decoding from my models, to estimate the Fisher information around a given orientation for the two models. Here I used a linear decoder, a very simple decoder, which I trained on given input-output pairs that I knew, and then I gave new inputs to the model and looked at what it would guess for the orientation of the stimulus. A very simple decoding scheme, which we know gives a good estimate of Fisher information locally, for a given orientation. So I played this game of decoding from my models over a very large number of trials, to estimate Fisher information and compare my two models. The idea was that the models could spit out the same kind of tuning curves and the same variability in terms of the variance, but because they have different connectivity, they would actually produce different covariance matrices. And what I found is that the two models were not equivalent in terms of information transmission, or in terms of discriminability: the no-sharpening model, the feed-forward model if you want, was much better in terms of Fisher information than the other.
Even though the two models could produce very similar tuning curves, I could show that this difference was not due to the models receiving different information in the input. They do receive different inputs, because the connectivity from the thalamus to V1 is different in the two cases, but we made sure in these models that, in terms of Fisher information, the amount of information they received in the input was the same. Instead, we could explain the difference in efficiency by the fact that the variability in the two models, in terms of the covariance matrix, in terms of the correlations, was very different. The two models had different connectivity, recurrent connectivity or a more feed-forward type of connectivity, which generated different kinds of variability, which I could quantify. So these are the correlations between all my neurons versus all my neurons, and the diagonal is the variance. Attractor models tend to have very specific shapes for their correlations, with strong positive and strong negative correlations, and we could show that this structure of correlations was actually bad for Fisher information; the feed-forward model was much better in terms of Fisher information. So the finding was that those two models, which could be made very similar in terms of their output tuning curves and variability, actually made very different predictions in terms of how well they could be used for discrimination: they have different correlations, and therefore carry different information. That's related to the more recent work I mentioned, where people proposed that the covariance structure might change with learning or adaptation, and that this might explain how the precision of the code changes. And finally, the last example I'm going to tell you about: I've been more recently interested in the decoding stage. We know very little about how the brain reads out its own activity. We don't know whether the readout is optimal, whether it is constrained, what exactly is being read out; we don't know. And I've asked whether the study of visual illusions can inform us about how the brain reads out its own activity. There are models of this; it's supposed to be related to adaptation to contrast, which is read out by MT and interpreted in terms of changes in motion. I'm not sure it's completely explained yet, but it's a very strong effect. In any case, it seems to be related to sensory adaptation. So I've taken this example of sensory adaptation to try to answer this question about decoding. Maybe I can show you what I mean by sensory adaptation: you look at this grating, and make sure you see it as vertical. Now you fixate on a tilted stimulus like this for something like 30 seconds, which is very long, so we'll do much less. If you then look at the first grating again, I'm sure you know, you now have a bias: you see it as tilted away from the adapter. So I'm interested in this kind of phenomenon. And the question I'm asking is this: in a situation like sensory adaptation, what seems to be happening is that the encoding stage is changing. If you look at the responses, say in V1, you see that the neurons respond less and less. If you think about their tuning curves, it seems that there is a gain modulation such that they decrease in amplitude.
So you have a change in the encoding stage with sensory adaptation. The question I asked in this paper is: what about the decoding stage? If the encoding stage is changing, and the brain is trying to read out from its own activity, is the decoding stage changing at the same time as the encoding stage, as it should in order to stay optimal, or is it not? So I compared a situation where the decoding stage changes together with the encoding stage, and a situation where the decoding stage stays fixed at the time scale of adaptation. I played with very simple encoding models and decoding schemes, to see in which kind of situation I could predict the results as measured in psychophysics. I'll show you this very briefly; you can have a look at the paper, it's published. In psychophysics, people have looked at sensory adaptation for a long time, and they have measured those biases, the tilt aftereffect: how the bias depends on the difference between the orientation of the image you are looking at and that of the adapter. They have also looked at discrimination: if you look at a given image for a long time, and now you try to discriminate around that image, how good or bad are you? And you see very specific patterns of biases and discrimination thresholds, where you are biased away from the adapter, and in terms of discrimination, you are mostly worse around the adapter, not at the adapter but around it. There are different studies, but you can see very specific patterns of results, which you can then try to predict with a model, again in terms of bias and discrimination threshold. I played with a very simple model of the encoding stage, where I just assume that adaptation changes the tuning curves so that they decrease in amplitude: with adaptation, the neurons that respond to the adapter decrease in amplitude. Of course, I could have done something fancier, for which there is also evidence in the literature, although it's not as clear-cut: some neurons could change their width, or shift, or have their variability change, et cetera. But as a start I kept this very simple model, with just the gain modulation. So, an extremely simple model of the encoding stage, together with a Gaussian model for the variability. And then I played with different decoders, which were either fixed, not changing with adaptation, or changing along with adaptation to take into account the changes in the encoding model; and I played with either optimal decoders or simpler decoders, like the population vector or the winner-take-all. Then I could compute the bias and the discrimination threshold for those models and compare with the psychophysical results. It's quite qualitative, but since the patterns of results observed in psychophysics are very well defined, the predictions are still quite well characterized. And what I found, essentially, is that the decoders which were fixed, which did not change with adaptation, as if the readout were unaware that adaptation had taken place, gave predictions in terms of bias, the tilt aftereffect, and in terms of discrimination threshold, which were very similar to what is observed in psychophysics. So that's just an example of the kind of exercise that can be done.
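Here is a toy version of this "unaware decoder" idea (explicitly not the model from the paper: the gain profile, tuning curves, and all numbers are made up, and the readout is a fixed population vector rather than the set of decoders actually compared). The encoding adapts, the readout does not, and the mean estimates get pushed away from the adapter, a repulsive bias like the tilt aftereffect.

```python
import numpy as np

rng = np.random.default_rng(7)
prefs = np.linspace(-90, 90, 36, endpoint=False)

def gain(adapter=0.0):
    # Gain modulation: neurons tuned near the adapter lose amplitude (toy profile).
    return 1.0 - 0.5 * np.exp(-0.5 * ((prefs - adapter) / 20.0) ** 2)

def f(s, g=1.0):  # hypothetical tuning curves, scaled by the per-neuron gain
    return g * 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2) + 1.0

def population_vector(n):
    theta = np.deg2rad(2 * prefs)        # angle doubling for 180-degree periodicity
    return np.rad2deg(np.arctan2(np.sum(n * np.sin(theta)),
                                 np.sum(n * np.cos(theta)))) / 2

# The encoding adapts (gain drops around the 0-degree adapter), but the decoder is
# the same fixed population vector, "unaware" that adaptation has happened.
g = gain(adapter=0.0)
for s in [5.0, 10.0, 20.0, 40.0]:
    est = np.mean([population_vector(rng.poisson(f(s, g))) for _ in range(500)])
    print(f"test {s:5.1f} deg -> mean estimate {est:6.2f} (bias {est - s:+5.2f})")
```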
Here we relate explicitly changes in the tuning curves to predictions, in terms of bias and discrimination threshold, that can be compared with the psychophysical data. So that's it for this first part. That was about encoding and decoding, and a bit of estimation theory: the idea that we can characterize this cascade of encoding and decoding in terms of bias and variance, and that this can be used to relate the shapes of the tuning curves and the variability to psychophysical performance, in terms of bias and discrimination threshold. I introduced Fisher information, which people use a lot to try to bridge this gap between neural activity and performance, or to measure the precision of the code in a population of neurons. Fisher information is always expressed in terms of the tuning curves, in particular the slopes of the tuning curves, and the noise. It can be used to get a bound on the discrimination threshold: it gives you the discrimination threshold that would be obtained if you had an optimal decoder. And you can use it to explore the factors that impact the precision of the code, in particular the shape of the tuning curves, sharpening of tuning curves, and gain modulation. Also, a topic that is quite exciting now is whether or not the correlations are also changing, and what impact that would have on the precision of the code. So that's it for this first part. Do you have questions, maybe? It's a bit dry, maybe, but I hope it gives you an idea of the kinds of tools you can use to try to bridge this gap. And also, if you have any model that produces neural activity, you can play this game of decoding from your model, to see what kind of performance it predicts, for discriminability in particular. What kind of neural model does this assume? Is it a detailed circuit model? So, I should have said: this is at a high level of abstraction, where I've gotten rid of circuits and everything, and I'm assuming that starting from a probabilistic description of neural activity is useful for understanding this bridge. So it can be any kind of neural activity. I'm working with very simple models, models where we describe the activity in a very abstract way. Say that again? A group of neurons, a population: how many neurons count as one population? Is there any definition here? For example, is 100 neurons one population? What counts as one population? So, that's a very tricky question, and also one that can be addressed with this kind of framework: if you think about psychophysical performance, how many neurons participate in the task? The answer is that we don't know. In a model like this, you want to have enough neurons to cover the kind of feature you are interested in. So if you are working with orientation, you want neurons that cover the range of orientations, so that you can do the task. In reality, it's not clear at all; we don't know how many neurons we need. But people do this decoding exercise using measurements from arrays of neurons, where they have from 50 to 150 neurons at the same time, and they can predict behavior. For example, maybe you've seen the literature on decoding from motor cortex: in monkeys, trying to predict the intention of movement from motor cortex activity and reproducing it with a prosthetic arm. Yeah, yes, you can.
But it could be 10,000 as well. That's also an exercise that can be done with these kinds of tools, and the answer is a bit tricky: trying to understand how many neurons are involved in a given psychophysical task. And there's a bit of a mystery here, because people have often compared the precision of single neurons with that of the whole animal. In models like this, you find that with something like 50 to 100 neurons, you predict the same kind of accuracy as the whole animal, which cannot be right, because we have many more neurons than that; so there's a bit of a problem here. Maybe the readout is not optimal, or maybe there are other factors that come in. But the number of neurons that should be taken into account to predict a psychophysical task is still unclear. These are very simple constructs, if you want: we are trying to build the simplest possible model of the statistics of the responses of a population of neurons, to try to relate the two levels.