OK, so we are set up. Can everyone hear us? Can you hear now? OK. So, go ahead. OK. Thank you. Thank you, Matej. So, today is our second lecture, and it will be primarily about information maximization in neural circuits. As a recap: last week we discussed the basic concepts of information theory and entropy, and derived how Shannon entropy follows from three postulates: additivity, independence, and the branching property. Today we will discuss maximally informative nonlinearities. This morning, when I opened my email, I had a question from a colleague who does not work in neuroscience: can a neuron transmit more than one bit of information? We will discuss that. We will also talk about entropy and information for a Gaussian variable, about the approximation in which the noise is small and the optimal nonlinearities in that case, and about example applications from neural circuits plus a small application from fruit fly development. My first question: is there any overlap between this plan and what you heard earlier today or maybe yesterday? I cannot hear you, so I will take it as a no. Okay. Actually, I think it would be good to have some connection back from the audience, or at least to test the connection. Sorry? That's good. All right. Good. Now I hear something. Okay. So, I guess that's the plan; if there are any corrections, we can adjust, otherwise we move forward. And I will start with, and maybe it will carry over into the next lecture, depending on how many questions there will be, information transmission by multiple neurons. Thank you, Colin. We discussed this qualitatively last time, but now let's write it down, because this is the basic formula for much of the analysis. So, we will talk about the equation for information transmission in its various forms; it is the same equation, just written in different ways. The first statement, as we discussed last time, is that the information is the difference between the entropy of the neural response and the entropy of the neural response given the state of the system. In other words, from the examples I described last time: if you are asking me questions, this is the entropy of my answers, whether I say yes or no. If I only ever say yes, this entropy is zero. If I say yes or no with some probability, then I have a nonzero entropy and therefore a capacity for conveying information, but this can be offset by the entropy of my answers when you repeat the same question twice. That is the basic premise of information. Now, mathematically, the entropy of the variable y is minus the integral ∫ dy p(y) log2 p(y). In the conditional term the minus becomes a plus, and we integrate over the probability distribution of signals x; for a given x we have the same expression as before, but now conditional on x: ∫ dy p(y|x) log2 p(y|x). You can notice that, even though there is no x in the first integral, you can introduce it: I add an integral over dx as well as dy, the measure becomes p(x, y) while the p(y) inside the logarithm stays p(y), and in the second term the integral over dx and dy appears in the same way.
Then p(x) and p(y|x) combine into the joint distribution p(x, y), so we have ∫ dx dy p(x, y) log2 of, and it is symmetric, p(y|x) divided by p(y). That is why it is called the mutual information; maybe I will write it on the board: it equals ∫ dx dy p(x, y) log2 [ p(x, y) / ( p(x) p(y) ) ]. Matej, maybe you can write it on the blackboard if it is not clear. There is some writing on the blackboard now; thank you, thank you so much. All right. So this is our equation on the blackboard, and it is called the mutual information because it treats x and y symmetrically. Instead of writing, as on the first line, the entropy of the response minus the entropy of the response given the stimulus, you can also write it as the entropy of x, the entropy in the source, minus the entropy of x given y; and you can also write it as the entropy of x plus the entropy of y minus the joint entropy of x and y. Matej, would you mind writing that as well, for our reference? Thank you so much. OK. So now I have a quiz, an example. This appears in many neuroscience papers; the expression I will discuss is from Naama Brenner working with Bill Bialek, but this particular recording is actually from Lauren Zinčić, with whom I collaborated as a postdoc. In this case you have a visual stimulus sequence shown on top, this is a primate retinal neuron, and there are many, many trials, 50 trials or maybe even more. You can see that it is very reliable; it is like a machine: you put the signal in, and you see a clear pattern in the neural responses. Now we would like to evaluate the information that this neuron transmits. Instead of the integral over dx written on the board, you can integrate over time, because there is a one-to-one relationship between stimulus and time. We then compare the entropy of the responses across time, p(r), with the entropy of the responses at a given time. People compute the spike probability, shown in red, just by averaging all these little dots. One more time: this is called a raster plot, each dot is a spike, this is one trial, this is another trial, so the trials go this way, and in red is the average probability of a spike in a given bin. It never quite reaches one; if it reached one the response would be 100% reliable, but you can see the range of variability. At the bottom I have an equation, which is actually a rewrite of the equation on the board: it is the information, but divided by the firing rate of the neuron. The neuron is in one of two states, binary, with probabilities p(0) and p(1); the integral over time, (1/T) ∫ dt, plays the role of the integral over x; r(t) plays the role of the probability of a response y at time t; the ratio r(t)/r̄ is the probability of y given x, meaning y given t, divided by the overall probability of y; and the whole thing is normalized by the average spike rate r̄, so this is information per spike.
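For reference, here is a compact restatement of the expressions being written on the blackboard; this is only a cleaned-up transcription of the spoken formulas, in the same notation, not new material:

```latex
I(X;Y) \;=\; H(Y) - H(Y\mid X)
       \;=\; \int dx\,dy\; p(x,y)\,\log_2\frac{p(y\mid x)}{p(y)}
       \;=\; \int dx\,dy\; p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)}
       \;=\; H(X) - H(X\mid Y) \;=\; H(X) + H(Y) - H(X,Y).
% For the binary (spike / no-spike) neuron driven by a repeated stimulus of
% duration T, with time-dependent rate r(t) and mean rate \bar r:
I_{\text{per spike}} \;=\; \frac{1}{T}\int_0^T dt\; \frac{r(t)}{\bar r}\,\log_2\frac{r(t)}{\bar r}
\qquad\text{[bits per spike]}.
```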
So, now I have a few questions for you about this equation. First: we always say that information reflects and characterizes the uncertainty in neural responses, and sometimes reviewers ask the following question. Here is my expression for information, and it is based only on the average neuronal firing rate as a function of time; there is no term for how variable the response is, no delta r. How can the information characterize the variability of neural responses if the equation does not use it? A question from the audience: so, you mean it does not depend on the stimulus? No, it does depend on the stimulus, implicitly; and, we can also talk about this, if I take a smaller segment of time, then in practical terms this expression will change. But even for a given stretch of time, r(t) is just the mean firing rate of the neuron at time t, and there is no contribution from delta r as a function of t. Shouldn't we have a delta r? For example, in this red graph, this is the average firing rate, but I have not told you anything about the variability of the neural response around it. Of course the equation is correct, and information is based on variability, so it is somewhat of an unfair question; but because it does come up in reviewer comments from time to time, I think it is a useful exercise, to check whether we understand everything that is written here. So, I can start giving out hints. The question is: why does this expression not depend on the fluctuations of r(t)? Any guess? A student: well, my guess would be that it does, actually, in the sense that if you write r(t) as r̄ plus delta r, then there would be a contribution from delta r squared. OK, so you get a plus. But the key idea is that we also have a model of the neural response here. I told you that we treat the response as binary, zero or one, and r(t) is the probability that the neuron produces a spike at time t. For a binary neuron, if I know the mean, I already know the variance. If I tell you that the mean is one, you know that the variance is zero. If I tell you that the mean is zero, you also know that the variance is zero. If I tell you that the mean is 0.5, you know that the variance is maximal. So, given our model that this is a binary neuron, if I know r(t), I also know p(r) at that time, and therefore I know the variability and the entropy. Information does reflect the uncertainty, but in this case I know the uncertainty once I know the mean. It is like a Poisson process, for those of you who know it: once I specify the mean, I have also specified the variance. Is that okay? Okay. As I said, this comes up from time to time when submitting a paper, so I hope this will be useful.
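As a small illustration of the formula, and of the point that for a binary bin the mean alone fixes the per-bin uncertainty, here is a minimal sketch in Python. The raster, bin size, and variable names are invented for illustration; they are not from the recording discussed above.

```python
import numpy as np

def info_per_spike(raster, dt):
    """Information per spike for a binary-binned raster.

    raster : (n_trials, n_bins) array of 0/1 spike counts (at most one spike per bin)
    dt     : bin width in seconds
    Implements I = (1/T) * integral dt (r(t)/rbar) * log2(r(t)/rbar), in bits per spike.
    """
    p_t = raster.mean(axis=0)          # spike probability per bin (the red PSTH curve)
    r_t = p_t / dt                     # time-dependent firing rate r(t)
    rbar = r_t.mean()                  # average rate over the repeated stimulus
    ratio = r_t / rbar
    # bins with r(t) = 0 contribute 0 (x log x -> 0 in the limit)
    terms = np.where(ratio > 0, ratio * np.log2(ratio), 0.0)
    return terms.mean()                # (1/T) * integral dt (...)

# For a binary bin the mean p fixes everything: variance p(1-p), entropy H_b(p).
# Toy check with a made-up modulated rate versus a constant rate:
rng = np.random.default_rng(0)
dt, n_bins, n_trials = 0.005, 2000, 50
t = np.arange(n_bins) * dt
p_mod = 0.02 * (1 + 0.9 * np.sin(2 * np.pi * 2 * t))   # modulated spike probability
p_flat = np.full(n_bins, 0.02)                          # unmodulated control

raster_mod = rng.random((n_trials, n_bins)) < p_mod
raster_flat = rng.random((n_trials, n_bins)) < p_flat

print(info_per_spike(raster_mod, dt))   # > 0: rate modulation carries information
print(info_per_spike(raster_flat, dt))  # ~ 0 (small positive value from finite sampling)
```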
Another thing that comes up is a more practical application of this. Here is an example of a recording from the thalamus. In the brain, visual signals start in the retina, pass through a relay nucleus in the middle of the head, and then continue to visual cortex. This is a recording from that relay, from neurons deep inside the brain. You can put an electrode next to a neuron, and there is an interesting situation: you see what are called S potentials. You are recording the responses of one neuron, but you can also see signatures of the input to that neuron. So here is the reconstructed spike train from the retina, and this is the reconstructed spike train from the thalamus, the lateral geniculate nucleus, abbreviated LGN. Even though you are doing one recording, you can reconstruct two signals: one is the input signal and the other is the output signal. So we can measure information transmission across this synapse, by a subtraction procedure that I will skip over, and then analyze it. Both trains are driven by the same visual signal; one is from the retina, the recording is made here in the thalamus, in the LGN, and we see signatures of both the incoming neuron and the outgoing neuron. It turns out, as you saw from that recording, that not every input spike generates an output spike; in general the retinal input has more spikes than the output. But it also turns out that the LGN, the thalamic neuron, conveys more information per spike than the retinal signal. So when you multiply the two together, there are some neurons for which essentially no information is lost; the synapse operates almost without information loss. Any questions about this part? This is an example of an analysis obtained using the formula we discussed, the one with r(t) divided by the average r. Matej, would you mind writing that equation down for reference? This is the information carried by the spike train of a neuron with average firing rate r(t). I have a question: is there a minus sign here or not? No, I don't think so. I can't quite see the blackboard, but I think it should be r(t) over r̄ times the log of r(t) over r̄, without a minus sign. It looks like an entropy, because 1/T times r(t)/r̄ is in fact a normalized distribution; but the argument of the log is not a probability, it is a normalized ratio, and the whole thing is information per spike. So maybe add on the left-hand side that this is information per spike. So, say you have n repetitions of the experiment; then r(t) is essentially 1/n times the number of times the neuron spikes at time t, and r̄ is this averaged over t. So r(t) divided by r̄ is essentially the probability of having a spike at time t: if you pick a random spike, it is the number of spikes at time t divided by the total number of spikes. Is that clear? But then, if this is a probability, shouldn't the expression be negative? No, somehow it is not negative. Look at the left side of the board: in the information we have p(y|x) divided by p(y) inside the logarithm. Here that ratio is r(t)/r̄, with y being the spike and x being the time, so the argument of the logarithm is a ratio of probabilities, not a probability itself. Ah, okay. And x, the time, is uniform, while y, which is the response, is not uniform but binary. Yes, that's a good point: on the blackboard, y takes two values, zero and one, and y = 1 is the spike.
And then, if you could write that down, yes, go ahead. So y is the spike, and x is a random time between zero and T; then this should be something like a mutual information between the spike and the time. Is that the idea? Yes. In the general expression for the information we should really write the sum over y = 0 and y = 1: p(y=0|x) log of p(y=0|x) over p(y=0), plus p(y=1|x) log of p(y=1|x) over p(y=1). What happens is that if spikes are rare, the term for y = 0 goes almost to zero, and you are left with the contribution from y = 1, from the spikes only. OK, so then r(t), or rather r(t) divided by T, is essentially the joint distribution: the joint probability of choosing a time t uniformly between zero and T and the neuron spiking. Then p(t), which is p(x), is just 1/T, which is this 1/T here, and p(y) is just r̄. If you put all these things together, you get this formula. Is it clear? More or less; this is an exercise for you, to check that this is a mutual information (a sketch of the calculation appears just below). Yes, so there are a few more comments about this expression. If r(t) is not very modulated, imagine it is nearly constant plus a very small modulation, then r(t)/r̄ will be approximately one plus small corrections, and in that case the information will be very small. So when you look at this red trace: if it is not strongly modulated, the information is small. Another comment, which is interesting: how many of you know about Rényi entropies? One, very good. We will come to that. But first we have a question; just wait, I'll bring you the microphone. Thanks. The question: if the joint probability distribution factorizes, if the two variables are independent, the mutual information should be zero? Yes: if the two variables are independent, the information is zero. Looking at the general expression between x and y on the left side of the board, if the variables are independent then p(x, y) equals p(x) times p(y), the expression under the logarithm is one, and the information is zero. And in our expression, if r does not depend on time, meaning it does not depend on the stimulus, then r(t)/r̄ is one and the information is zero. Imagine that the neural responses in the brain are influenced by the stimuli but also by internal thoughts: suppose I am sitting at a concert but I am really working on my paper. Some neurons will be very activated by what I am typing, but if I try to compute their information about what I am listening to, it will be zero. So there are other assumptions that go into this equation. We have not decided what will be in the exam, but maybe I can assign some homework problems with example data sets that you can then present during the exam. For example: what happens if you have fewer than 50 trials? Is it okay to record for ten seconds, or do I need to record for an hour?
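As promised above, here is a sketch of that exercise, written in the small-bin limit where at most one spike occurs per bin of width Δt; this is just the calculation described in the exchange, not additional material:

```latex
% x = t uniform on [0,T], so p(t) = 1/T;  y \in \{0,1\} with
% p(y{=}1\mid t) = r(t)\,\Delta t and p(y{=}1) = \bar r\,\Delta t.
I(y;t) \;=\; \frac{1}{T}\int_0^T dt \sum_{y=0,1} p(y\mid t)\,\log_2\frac{p(y\mid t)}{p(y)}
\;\approx\; \frac{\Delta t}{T}\int_0^T dt\; r(t)\,\log_2\frac{r(t)}{\bar r}.
% The y = 0 term, (1 - r\Delta t)\log_2\frac{1 - r\Delta t}{1 - \bar r\Delta t}
% \approx (\bar r - r)\,\Delta t/\ln 2, averages to zero because
% \langle r(t)\rangle_t = \bar r.  Dividing by the mean number of spikes per bin,
% \bar r\,\Delta t, gives bits per spike:
\frac{I(y;t)}{\bar r\,\Delta t} \;=\; \frac{1}{T}\int_0^T dt\;\frac{r(t)}{\bar r}\,\log_2\frac{r(t)}{\bar r}.
```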
Those are practical questions, and we will discuss them in a moment. So: information is a positive quantity. Even if the two variables are independent, and the information should be zero, when I estimate it from a given finite sequence I will get a non-zero, positive answer, and so I might claim that there was information where there was none. There are several techniques to guard against this, but let me digress a little and give you an example from everyday life, about superstitions. An example from my own life: one of my students, returning from a trip, brings me back a necklace, and then I call my program officer and they tell me my grant has been funded; I was so happy. So what is the correlation? I wear the necklace, I get my grant. This is what happens if I do not have enough data: I see information where really the information should be zero. In the example of this recording, suppose I have the 50 trials that were recorded; I use this expression and compute a number. Then, instead of taking all 50 recordings, take a random subset of 45; the information you compute will again be positive, and it will be larger. The first person who wrote about this, from Trieste, was Alessandro Treves, so we can go over his paper on the upward bias in information measures.
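To see this bias concretely, here is a minimal sketch, not the analysis from that paper, just an illustration with made-up data: two variables that are independent by construction still give a positive plug-in information estimate, and the estimate grows as the number of trials shrinks.

```python
import numpy as np

def plug_in_mutual_info(x, y, n_x, n_y):
    """Naive (plug-in) mutual information estimate in bits from paired samples."""
    joint = np.zeros((n_x, n_y))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px * py))
    return np.nansum(terms)

rng = np.random.default_rng(1)
n_x, n_y = 10, 2              # e.g. 10 time bins, binary response (illustrative sizes)
for n_trials in (500, 50, 45, 20):
    estimates = []
    for _ in range(200):      # average over independent draws
        x = rng.integers(0, n_x, size=n_trials)
        y = rng.integers(0, n_y, size=n_trials)   # independent of x: true I = 0
        estimates.append(plug_in_mutual_info(x, y, n_x, n_y))
    print(n_trials, np.mean(estimates))  # positive, and larger for fewer trials
```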
Now I would like to ask: would you mind crossing out the logarithm? Actually, first, before that, we have a question in the chat about aged versus young neurons. I think the question is whether there is any experimental evidence that aged neurons have a larger information loss, whether the information content depends on the age of the neuron. I think this is a very interesting question. I don't know; I don't know of a direct experimental test. You would have to record from the same neuron multiple times as the animal ages substantially; I think it might be possible in an insect, but I don't think it has been done. One could do a study saying: this brain area, or neurons of this type, convey so much information, and then ask whether older animals convey less. That would be possible, and certainly during development one can do this, but I don't know of such a study. What we do know, from related work, is that in general variability increases with age. One reference is Schrödinger's book What is Life?, where he says that as we live, entropy keeps increasing, and when it increases beyond what the organism can control, that is the end. And there is experimental evidence along these lines. There is a paper, I would say within the past two years, on variability in blood samples, and on how quickly a person can recover from a perturbation, showing that this variability does increase with age. Using this variability, and extrapolating to where one over the variability goes to zero, they predicted a maximum human lifespan of about 120 to 150 years; you can look up the study, it is on the maximum human lifespan. It is interesting that in that study they measure variability even in movement. My initial guess would be that as we get older we simply move less, but it turns out that not only do we move less, the variability in the bouts of movement also increases with age: a young child bounces around constantly, whereas if I go for a walk, I go for a walk, but then maybe I will take a rest. So the variability in the amount of movement also increases with age, which is interesting. So there are these general ideas about entropy and life, which have been studied not for neurons at the level of spiking, but for neurons at the level of gene expression, and also for behavioral variability and blood samples. The idea is that you apply perturbations and look at how quickly the person recovers, and that variability also correlates with the response time, the autocorrelation time, in the sequence. Yes, go ahead; this is maybe a longer topic, but I think it is more useful to respond to questions. OK, so now we can go back. Tanya, we have a question from Miyako, just a moment. I haven't completely understood why we are computing this mutual information with time; we are not taking the stimuli into account. I haven't understood the passage from the stimuli to just the time, rather than the stimulus at time t. Yes. So let me write down, or maybe, Matej, can you write down, that (1/T) ∫ dt is really the sum over stimuli, ∫ ds p(s). Can you repeat that? So: 1/T times the integral from 0 to T over dt equals the integral over ds of p(s); let's put it in a box. Well, both of those are equal to 1, no? No, I mean that there will be something inside, an integrand, and the replacement is only approximate, so instead of the equals sign we can put an approximate equality. May I suggest one thing: this should be an upper bound on the mutual information between the spike and the stimulus, because of the data processing inequality; the time-based quantity is clearly an upper bound on the mutual information between the spikes and the stimuli, right? So there are two comments. I will just tell you what the papers say, and we can discuss whether that is correct or not. What people say, and this is in Bill Bialek's papers, is that in the context of these experimental studies there is a one-to-one relationship between stimulus and time. Ah, okay. So the stimulus is a one-to-one function of time, and if the stimulus is a one-to-one function of time, then this is exactly the same thing.
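Putting those two statements in symbols, as a restatement of the discussion rather than anything new, with s(t) denoting the stimulus presented at time t and F any quantity being averaged:

```latex
% Ergodic-style replacement of the time average by a stimulus average:
\frac{1}{T}\int_0^T dt\; F\!\big(s(t)\big) \;\approx\; \int ds\; p(s)\, F(s),
% valid when the recording is long enough to sample p(s) well.
%
% Data-processing comment: since the stimulus is a function of time, s = f(t),
I\big(y;\, s\big) \;\le\; I\big(y;\, t\big),
% with equality when f is one-to-one (invertible) over the recording, so the
% time-based estimate is an upper bound on the stimulus information.
```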
But let's say, for example, that in this picture you have the same value of the stimulus at different times, right? So there are two comments to this. On one hand you can say that if the stimulus is defined as just the light intensity at one moment in time, then indeed, if I draw a horizontal line here, there are many different times with the same stimulus, so the relationship is not one-to-one. But then the reply is that the stimulus is not just the light intensity at this instant but the history of light intensity, and that is true: the neuron responds not to the light intensity at one moment in time but to its history. And if you look at the history, then all of these segments are distinct, so between time and history the relationship is one-to-one. This question is important because it is also related to the question of how long my recording should be: if my recording is very short, then I have not probed the probability distribution of stimuli sufficiently, the sum over stimuli p(s) will not be well approximated by the integral over time, and we will get an incomplete answer. So this equation makes a quote-unquote ergodic assumption: that the history of stimuli in the recording has probed the full range of stimuli sufficiently well. OK, so, should we go ahead? Now, a related point is the choice of p(s): I can use different stimulus distributions. Sometimes, as in this case, the stimulus intensities are taken from the natural world, as pixels of an image, and that distribution is a little more peaky, as you can see; but I could instead use a Gaussian distribution of light intensities, and it would be a different distribution. The amount of information that the neuron conveys will depend on the probability distribution of stimuli; it depends on the entropy of the stimulus, because it is a mutual information. What has been published is that neurons convey more mutual information when the stimuli are taken from the natural environment, and less information when p(s) is taken from a Gaussian ensemble. So that is another comment: people have studied that many different kinds of neurons respond more vigorously to stimuli taken from natural scenes than to a Gaussian distribution. The disadvantage of that statement is: how do you define natural stimuli and their probability distribution? For a long time, I would say, there was no definition; we know that natural stimuli are not Gaussian, there are a lot of fluctuations, and one can perhaps approximate them as a modulated Gaussian process. But, to bring back hyperbolic geometry a little bit, although I wasn't planning to do this in this lecture: if we model natural stimuli as a combination of signals from multiple sources, as arising from a hierarchical network, or as a set of stimuli sampled uniformly from a hyperbolic space, then we do have a candidate probability distribution of natural stimuli. OK, so we have another question. Yes, there is another, more technical question: would the Nyquist criterion be sufficient to determine the needed length of the recording, based on the stimuli? First we should say what the Nyquist criterion is. I think I know one answer with respect to spatial information offhand, but let's try to think about this in the temporal domain. Let's look at the question again: would the Nyquist criterion be sufficient to know the needed length of the recording, based on the stimuli? My guess is that the answer is yes, because the Nyquist criterion tells you exactly how finely you should sample a signal for a reliable estimate.
Actually, I'm not sure; I was thinking of the Nyquist frequency, which would determine what frequencies of the stimulus I should use; that was my planned answer. But maybe the Nyquist criterion is different from the Nyquist frequency? No, I think there is the Nyquist–Shannon sampling theorem, which I think is what the question refers to, and I would think yes, this criterion would be sufficient, I imagine. Let me look this up; so far the answer is yes, but I have to think about it. Maybe you can have a look at it and then post the answer on Slack. So, should we go back to Rényi entropies? Yes. Another reviewer comment that you can sometimes get goes like this: because of all these sampling difficulties and the upward bias in information estimates, I need a lot of recordings; I don't want to do that, so I will just compute variance instead. So what does variance mean here? Notice that in this equation we have log base 2, or equivalently the natural log up to a scaling factor. Now let's remove the log altogether, and instead of r(t)/r̄ times its logarithm, write r(t)/r̄ raised to some power. What we get is (1/T) ∫ dt (r(t)/r̄)², and then minus 1, and somehow this should equal the variance of the normalized responses. OK, so you want me to write r(t) as r̄ plus δr(t)? Then this is (1/T) ∫ dt (1 + δr(t)/r̄)². The 1 gives 1, the linear term disappears because δr averages to zero, and what remains is the variance of r(t) divided by r̄ squared. So: instead of the logarithm of r(t)/r̄ you can take r(t)/r̄ to some power, and that gives a Rényi entropy, a Rényi information. The Rényi divergence is like the Kullback–Leibler divergence, which is our information, with its p log p structure, except that p is raised to some power. When we talk about information, it is a specific limit of the Rényi divergence between r(t) and r̄, but you can take any power, and if you remove the log and take the second power, you are getting the variance. Therefore all of the statements about biases that exist for information also apply to variance. So when people say, oh, information is so complicated, it has all these biases, I will compute variance instead: the variance is also biased, and in fact the bias in information is smaller than the bias in the variance; we are just so used to variance that we do not think about the fact that it is biased too. In other words, this also highlights the intuition connecting information and variance. If I say this variable accounts for so much variance in the neural response, that is one statement, but you can also say this variable accounts for so much information in the neural response, and those are related statements. And one can actually show that the Kullback–Leibler-based quantity has the smallest error bar, it saturates the so-called Cramér–Rao bound, and is more efficient than the higher-order Rényi entropies. We have a lot of questions in the chat. One is about the difference between using Rényi entropy versus Shannon entropy for firing neurons. The difference, let me write it on the board, is this: you can write (1/T) ∫ dt (r(t)/r̄) to some power K. K = 1, with the logarithm kept (the logarithm arises as the limiting, infinitesimal power), is approximately the Shannon case; K = 2, without the logarithm, gives the variance; and K = 3 and higher give higher-order quantities.
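Written out, the manipulation just described is the following, with δr(t) = r(t) − r̄ and angle brackets denoting the time average; this is just the blackboard calculation in one place:

```latex
\frac{1}{T}\int_0^T dt\,\Big(\frac{r(t)}{\bar r}\Big)^{2} - 1
 \;=\; \Big\langle \big(1 + \tfrac{\delta r(t)}{\bar r}\big)^{2}\Big\rangle - 1
 \;=\; \frac{\langle \delta r^{2}\rangle}{\bar r^{2}}
 \;=\; \frac{\mathrm{Var}[r(t)]}{\bar r^{2}},
% since \langle \delta r \rangle = 0.  More generally one can consider
\frac{1}{T}\int_0^T dt\,\Big(\frac{r(t)}{\bar r}\Big)^{K}:
% K = 2 gives the variance, while the K \to 1 limit (keeping the logarithm)
% recovers the Shannon-type information per spike discussed above.
```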
I think we have now covered information in neural circuits and some practical applications: you can measure and quantify information, and one thing to notice here is that this axis is the firing rate and this one is the information in bits per spike. There are two comments about this. One, practical, is that this relay neuron conveys more information per spike. The other is that per spike you are not limited to one bit, which is a good thing to know. Let's go over why that is the case. If I have a binary neuron and the probability of firing is between 0 and 1, then the information, I will just draw it on the board, is bounded by the entropy of the neural response: we have minus the integral of p(r) log2 p(r), and for a binary neuron that is minus p log p minus (1 − p) log(1 − p). Maybe, Matej, you can do a better job of writing this on the board. (I hear the transmission is not reliable, but yes, like that: minus p log p minus (1 − p) log(1 − p).) This is the entropy of the neural response, and for a binary neuron it is the maximal information it can convey, because that is the maximum entropy. If you plot this quantity as a function of p between 0 and 1, it looks like this. When p = 1, the first term is 0 because of the log, and the second term is 0 because 1 − p is 0 and x log x goes to 0 in the limit. When p = 0, the first term goes to 0 because p log p goes to 0, and the other term is 0 as well. And when p = 1/2, if we measure in units of log 2, the value at the top is 1. So this says that a binary neuron can convey at most one bit. Now, if we discretize the spike train into response windows such that there is at most one spike per bin, it seems that the maximum the neuron can transmit is one bit per spike; but when we do the measurement here, we get values bigger than 1. The trick is, first of all, that it is somewhat unfair because I divided by the rate; in our equation we have the integral of dt r(t)/r̄. And second, this is the joint information carried by responses and by the absence of responses: if spikes are rare, then when a spike does happen, it carries a lot of information. So, because it is information per spike, it can be more than one: the information per bin has to be less than one bit, but the information per spike can be bigger than one. That, I think, is a useful thing to know when you interpret data. Any questions here? In fact, when I woke up this morning and checked my email, as I mentioned, there was the question: can a neuron produce more than one bit per spike? But I think he was also asking per bin. Per bin, for a binary neuron, you cannot; but the neural response is not limited to one bin, it can span multiple bins, and then 0, 1 is different from 1, 0, you have a time sequence, and then one can get much more information.
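A quick numerical check of this point; a toy sketch in which the rates, bin size, and burst structure are made up purely for illustration:

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a single binary bin with spike probability p."""
    terms = [x * np.log2(x) for x in (p, 1 - p) if x > 0]
    return -sum(terms)

# Per bin, a binary neuron can never carry more than 1 bit:
print(binary_entropy(0.5))        # 1.0, the maximum of -p log p - (1-p) log(1-p)

# Per spike it can exceed 1 bit: made-up bursty rate, 100 ms bursts at 50 spikes/s
# once per second, silent otherwise.
dt = 0.002
t = np.arange(0, 10, dt)
r = np.where((t % 1.0) < 0.1, 50.0, 0.0)
rbar = r.mean()                   # 5 spikes/s on average
ratio = r / rbar                  # 10 inside bursts, 0 outside
info_per_spike = np.mean(np.where(ratio > 0, ratio * np.log2(ratio), 0.0))
print(info_per_spike)             # ~3.3 bits per spike: spikes are rare and precisely timed
```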
Any questions about this? There is a question in the chat from Colin: in which contexts are the two different entropies used for neurons? I think this refers to Shannon versus Rényi entropies, right, Colin? I don't see the question myself; it is in the Zoom chat. So, in which contexts do you use Rényi entropies? I can offer two answers. One is that, from our discussion, whenever you are computing a variance you are effectively using the Rényi divergence of order 2. Second, you can talk about the Rényi entropy itself: instead of integrating p log p over dx, you integrate p squared. This is connected to a famous trick, coincidence counting, as a way to estimate a probability distribution; the problem is known as the birthday problem, which you might know. We are in a room and we ask how many people have a birthday on the same day, and the logic goes that, for a year of days, my memory is that with around 30 people you will start getting coincidences. Counting coincidences is a more reliable way to estimate the spread of the probability distribution, assuming the values are roughly uniform over their range, than estimating each probability separately. So this is often used. For example, in the case of a neuron, generalizing from birthdays: we have spike patterns, the number of possible patterns is astronomical, and they are not all equally encountered; going back to the first lecture, we would like to estimate the size of the typical set. You can ask: when do I start seeing repeated patterns? And from that number you can estimate the entropy of the distribution. The reference for this, where I first read it, would be in...
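As a sketch of the coincidence-counting idea, with names and numbers that are purely illustrative: the probability that two independent samples coincide is the sum of p_i squared, and minus log2 of that is the Rényi entropy of order 2, which lower-bounds the Shannon entropy; so counting repeats among observed patterns gives a rough handle on the size of the typical set.

```python
import numpy as np
from collections import Counter

def renyi2_from_coincidences(samples):
    """Estimate the order-2 (collision) entropy in bits from repeated samples.

    The collision probability sum_i p_i^2 is estimated by the fraction of sample
    pairs that coincide; H2 = -log2(collision prob) <= Shannon entropy H1.
    """
    counts = np.array(list(Counter(samples).values()), dtype=float)
    n = counts.sum()
    p_coincide = (counts * (counts - 1)).sum() / (n * (n - 1))  # unbiased pair estimate
    return -np.log2(p_coincide)

rng = np.random.default_rng(2)

# Birthday-style check: 365 equally likely days has log2(365) ~ 8.5 bits,
# and with a few dozen people shared birthdays are already common.
birthdays = rng.integers(0, 365, size=1000)
print(renyi2_from_coincidences(birthdays))   # close to 8.5 bits

# Toy "spike patterns": 8-bin binary words with sparse spikes (illustrative only).
words = ["".join(str(b) for b in (rng.random(8) < 0.1).astype(int)) for _ in range(2000)]
print(renyi2_from_coincidences(words))       # a lower bound on the word entropy
```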