Well, good morning. We're going to change scales now considerably: Ignacio was talking about the smallest details, and I'm going to spend the first half of my talk on almost the whole brain and behavior. In the second half, I'm going to talk about neurons. Now, I was given the title of data analysis and neural coding, and I don't think in 45 minutes I can cover data analysis, and in another 45 minutes cover all of neural coding. So I've chosen to do what I'm going to call two case studies. The neural coding material is much more mature, but I'm going to show you some data from our lab on behavior because I can use it to talk about principles of data analysis that I think are important for young people to get early on. There are places where I know difficulties arise, and hopefully you'll find the science that goes with the data analysis as interesting as my colleagues and I do. So the question we would like to answer is: how does the brain process information? Information about the world enters through the senses, is compared with stored information, whether genetically determined or learned, and organized behavior, we hope, emerges. The goal of the whole laboratory where I work is to learn something about how macroscopic behavior emerges. I've worked on topics that range from visual perception to reward seeking, and both halves of today's talk are mostly going to be about visual perception, some of it very high level, some of it not so high. To do this, we need an approach to studying information processing in the brain, and the principle basically is: you design an experiment (I should put that first), you collect data, and you do some type of data analysis. Now, the problem is that these days data analysis is thought to be approximately equal to statistical analysis, and I'm going to deal mostly with the issues of statistical analysis. Before we go any further, we need to make two things clear. Richard Hamming was an engineer who did quite a bit of work on what we would today call exploratory data analysis, and he had two very nice quotes. The first is: "The purpose of computing is insight, not numbers." I think it is all too easy to get caught up in the numbers and forget that the purpose is to find out how the brain works, not how you can compute a number from a bunch of data. The other is: "Most of the time each person is immersed in the details of one special part of the whole and does not think of how what they are doing relates to the larger picture." It's just another way of saying that you're not doing the data analysis because you want the numbers; you're doing it because you want to understand something about how the brain does this fabulous job of looking around the room. If I asked each of you what is significant in this room, with no other instruction than that, you would start to tell me about things in the room that seem significant. Now, we may not agree about their total significance, but this really badly underspecified problem is something we would solve effortlessly. When I make that statement, you look around the room and have no trouble doing it. And yet we actually can't do it with a computer.
And in the second half, where I'm going to talk about neural coding: I once got locked into a conference room at the CIA in the United States. It's the only time I've been locked in to give a talk; usually they want to lock you out after a short period of time. But they wanted to know whether we had discovered some way to automatically analyze photos, because they had great computer algorithms for handling these thousands and thousands of satellite photos. This was in the 80s, so you can imagine how many more photos they have now. What they wanted to know was whether you could make an algorithm that knew whether something that changed from day to day was significant. People of my era all think back to the Cuban missile crisis, which grew out of satellite photo interpretation: some photo interpreter saw a photo of an area in Cuba, noticed there was new construction going on that hadn't been there a few days before, and this led to the discovery that the Cubans were building missile bases with the help of the old Soviet Union. That kind of task is something the brain does with no trouble at all. We have no idea how. So one of the points I want to make, and I'm going to show you, is how important visualization is. And to my British friends, I'm sorry about the Z, but I do come from the other side of the ocean. The value of adequate visualization cannot be overemphasized. The best pattern recognizer is the one between your ears, and if you can present your data in a way that simplifies it, so that you can make something of it visually, you just must try. Because if an analysis of your data reveals something you didn't expect, and you go back and look at your data and cannot see where the result came from, don't trust it, and certainly don't publish it.

Okay, now I'm going to talk about a couple of roles for statistics. There are really two parts to modern statistics. One is to test a specific hypothesis: for example, the data from two treatments, that is, two sets of experimental parameters; are they equal? This is something we all do. You run two conditions, one a control condition and one an experimental condition, and you ask whether they are the same or different. The second form, and I'm going to talk about both of them briefly but mostly the second, is to form a statistical model for assessing the efficacy of parameter fits. And I hope the next slide... okay, no, it's not, but we'll go on from here. So, hypothesis testing: to test a specific hypothesis, for example whether the data from two treatments are equal, or whether they are drawn from a known distribution. Many times you don't actually have two treatments; you just have some assumption about what the unaffected condition is. You do something and you expect your data to look different than, say, a Gaussian with a mean of zero; you might expect a Gaussian distribution with a mean of ten. We actually believe that the two sets are different; what we can ask is whether they are the same, the so-called null hypothesis. The key is always to construct the correct null. I'm going to show an example in part two, on neural coding, where from my point of view a poor null was constructed, and the whole field of neural coding chased a problem for a good 10 or 15 years because of it. That is troublesome, and it happens all the time. Don't think you're immune; it's easy to do.

Hypothesis testing is often parametric: we assume a distribution from which the data are drawn, most often Gaussian. These are things like the t-test, linear regression, ANOVA, where we just look at the p-values and ask whether we got a significant result. That is not all you can do, and we'll get to that in a minute. You can also do non-parametric statistics, which is a misnomer; it's just that the parameter you're measuring is different. You're always testing against some distribution, and in a lot of non-parametric statistics, instead of testing the data against a normal distribution, you're testing, for example, the ranks against a normal distribution, because there are a couple of theorems showing that for most real-world data the ranks are Gaussian distributed, so you can do some statistics on the ranks. These tests are generally less powerful but much less likely to violate the assumptions, so if you get a significant result with one, you usually have something that is reasonable to believe. But again, be sure you know where it came from before you go talking about it too much.
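To make that parametric-versus-rank-based contrast concrete, here is a minimal sketch in Python. The data, sample sizes, and effect size are all made up for illustration, and scipy's defaults are used throughout; this is just the two calls side by side, not anything from the experiments in the talk.

```python
# Sketch: the same hypothetical two-treatment comparison run both ways.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=30)   # "control" condition
treated = rng.normal(loc=0.8, scale=1.0, size=30)   # "experimental" condition

# Parametric: the t-test assumes the data themselves are Gaussian.
t_stat, t_p = stats.ttest_ind(control, treated)

# Rank-based ("non-parametric"): Mann-Whitney U works on the ranks instead,
# so it is less powerful but much harder to mislead with non-Gaussian data.
u_stat, u_p = stats.mannwhitneyu(control, treated)

print(f"t-test p = {t_p:.4g}, rank-based p = {u_p:.4g}")
```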
We're going to consider an experiment so I can show you what we've done in the way of data analysis and where you can get. We're interested in how primates, including humans, perceive visual objects. In these experiments we ask whether the weight of the mixture in morphed images is perceived, how well, and what happens after damage to relevant brain areas. So I'm going to talk about one of the oldest approaches to studying behavior: looking at normal animals, then damaging a brain area that you have reason to believe is involved in a function, and asking whether you've damaged the function or not, and further, in what way, if you have. Our subjects are rhesus monkeys. The brain looks roughly like the brain of a human. The reason we use rhesus monkeys is that it's a convenient species; in some countries they're pests. The human brain has a recognizably Old World architecture. I do not make a strong claim about how much we learn about the human brain from monkey brains, but I at least believe it's closer to a human brain than a rodent brain is, and when I'm looking at functions like perception I would like to believe that the monkey visual system works roughly using the same constructs the human brain does. But monkeys are stupid; let me just make that clear. They may look like primates, but they are not smart animals. The monkey brain is about 250 grams; your brain is about 1500 grams. That difference is huge. What we're going to talk about is the visual system. In the monkey (I think I have another diagram of this), there's the eye; the optic nerve goes into the middle of the brain to the lateral geniculate nucleus; the nerve cells there project to the primary visual cortex, which in the monkey is on the surface at the back of the brain; and then, through a bunch of experiments over the past 50 years, we've learned that information passes, at least in a feed-forward way, down what is called the temporal lobe through a series of areas. Today we're going to be concerned with this area at the front of the temporal lobe, area TE, shown both here and in cross-section, and a region here in light gray that I'm going to call rhinal cortex today. Now, anyway, let's skip this.
What we know about this: Hubel and Wiesel showed in the early 60s that neurons in the primary visual cortex respond very nicely to oriented bars of light, and they showed that these cells are organized in columns, where every orientation for a particular location in space is represented, and then the next little location in space is mapped into the next of what they called columns. They won a Nobel Prize for this discovery. This information is fed down the temporal lobe. In the early 70s a researcher named Charlie Gross found that there were cells in this area TE I just mentioned that are sensitive to faces and hands. Hubel and Wiesel had already speculated that vision was hierarchical, that you started with very simple constructs in the visual system and ended up with very complicated constructs, and this TE area was sort of the final piece of evidence for that hypothesis: you go from simple objects to complex objects as you move through the hierarchy. And there's this other area that I'm going to call rhinal cortex for now.

Okay, for the experiment I'm going to show you today, I'm going to show it in two ways. The first is, and this is the way we actually did it: we were interested in how easily monkeys could learn to tell cats from dogs. There's a long history of trying to train monkeys to do this, and for reasons that still aren't clear to me, it always took a long time to train the animals. I'm not going to go into it today, but we found some tricks for getting the animals to do this where they learn it as quickly as we think a human learns it. I haven't got time to deal with that aspect; if we were talking about animal behavior I'd be glad to, if someone wants to ask me later. We had three monkeys in each group. We asked the monkeys to respond in one way for one class of stimuli, say dogs, and respond in another way (a behavioral response is what I mean by "response" here) to cats. We have the monkeys learn the principle of the task with just a single pair of stimuli. This avoids having to account for the monkey learning how to respond; that is, it establishes what is called a learning set. For this experiment we weren't interested in how the animal learns to respond A or respond B; we wanted to know whether the animal could tell A from B, so we wanted to get beyond that. So we'll bypass the learning of the response and consider the categorization only. When we did this with three monkeys (this is visualization), we asked what percentage of items they got right out of all these sets, and it was 420 cats and 420 dogs that they saw, once they had learned the principle from seeing 20 dogs and cats and practicing with them. Then they were tested on 420 dogs and 420 cats they had never seen before, and they did well; they got it mostly right. If you think about it, there are a few dogs where you're not sure, probably from a photo from the side, whether it's a dog or a cat. These photos were in all views: we didn't just have them face-on or head-only; some were heads, some were side views, some were actually from the back. But the monkeys did this really well. We also damaged this brain area called TE, and on our initial testing it looked like the monkeys were actually having no difficulty. Remember, this is the area that's supposed to have these face-selective neurons; we thought these animals would be devastated in this task, and they're damaged, there's no question, but they are not devastated. Or rather, this is the question we need to ask, because we're worrying about data analysis: is this different than that?
And is that different than that? The answer is: this is not different than that, and this is different than that. Now, how do I know that? On what basis do I claim it? Can you just compare the average data? What if each animal has an idiosyncratic bias in response, a so-called random effect? It's something the animal does, and it happens to be part of our experiment because we picked that particular animal, but it is not part of what the statisticians would call the treatment. The treatment in our case is dogs versus cats; it is not whether one animal is different than another. The other part of the treatment is the damage to area TE. So we want to know: is there a systematic difference across all of the animals? The latter is a common problem in statistics, whether in biology or not: we take several repeated measurements from each animal, and we're interested in what is called the treatment, or in modern terms a fixed effect. If you're reading a modern statistics book you're going to encounter these two terms, fixed effects and random effects, and it's not always exactly clear which are the random effects, but we won't get into that today; it's just something to be aware of. So we need to account for the variability arising from the differences among the animals, and in modern, or semi-modern, terms this is called a repeated measures problem, because you're taking several measurements from each animal, and what you would like to do is remove the bias that each animal brings in. One thing you could think of doing, and it is reasonably valid, is to take the average performance of each individual animal and subtract that value from every measurement for that animal; now you're measuring the animal's performance against its overall mean performance. That isn't quite right, because the variability can change things a little bit, but it's close, and as a first pass people do a repeated measures correction. This is an example of a printout that I don't expect you to understand; I just want to make clear that what you will see is something called error that's attributed to the monkey. The statistic here is an ANOVA, an analysis of variance, which is just a version of linear regression, and hopefully most or all of you have heard of this. What happens is that the computation segments the variance that's due to the individual animals from the variance that's due to the treatments you've imposed. So the error for the monkey is the part you're not interested in, in this experiment anyway, and the error called "within" is exactly what we're interested in.
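As a sketch of the repeated-measures bookkeeping just described, here is one way to run it in Python with statsmodels. The monkey names and percent-correct numbers are hypothetical, and a full analysis of this design would also need the lesion group as a between-subjects factor, which this minimal call leaves out.

```python
# Sketch: repeated measures ANOVA, with each monkey as its own subject so
# the variance due to the monkeys is partitioned away from the treatment.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "monkey":      ["M1", "M1", "M2", "M2", "M3", "M3"],   # hypothetical animals
    "category":    ["cat", "dog"] * 3,                      # the treatment
    "pct_correct": [92.0, 95.0, 88.0, 93.0, 90.0, 96.0],    # made-up scores
})

# AnovaRM separates the "error: monkey" variance from the within-treatment
# variance, which is the part of the printout we actually care about.
result = AnovaRM(data, depvar="pct_correct", subject="monkey",
                 within=["category"]).fit()
print(result)
```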
In this case there were just two classes, so this should read: category is a significant effect, and group is a significant effect. So we know what we could already see: the animals really do know the difference between dogs and cats, and if the statistic had come out otherwise I would not have believed it, but it certainly confirms what the graph showed. The other part is whether this level is really a little higher than the level for the other group, and the answer is yes: there's a group effect, in other words the TE animals versus the normals, and there's a category by group interaction. The interaction, had I drawn it, would be: connect these points by straight lines and ask whether the two lines are parallel. If they are parallel, it is said there is no interaction; if they are not parallel, and the deviation is significant, there is an interaction. And we can see the interaction, because in the dog category the animals perform about equally well, but in the cat category they don't. Now, we understand this, and it has to do with the way we ran the behavior and not with their ability to tell cats apart, because we can reverse it and reverse the result; but that's not really the point today. So that's one level: we know we have a significant effect. Now we'd like to do what is called statistical modeling, not just simple hypothesis testing. Statistics has moved forward to modeling data, and ANOVA is actually a model, but so far I've only emphasized the significance testing. What is the difference between theoretical modeling and statistical modeling? It's forward versus backward. A theoretician, and I think Uppibalo later today is going to talk about some real theory, is someone who sits and thinks about all of the phenomena and makes up a model, and says: this is my model; now we're going to see not only whether we can sort of interpolate from old data, but whether we can predict conditions from the model that we didn't put in when making it. Statistical modeling is going backwards: I have a set of data, I want to fit it with some kind of model, and I want to know whether the parameters I have chosen to encapsulate the variability in the model are appropriate for the data I have collected. In general, a statistical model is not expected to extrapolate beyond the limits of the data, or the conditions, that were considered in the experiment; a theoretical model, to be a good theoretical model, ought to take you into a new domain. Is that clear to everyone?
Because it's a really important point. Now, to match data against a theoretical model you actually use some of the tools of statistical modeling, because you use the theoretical model as your null hypothesis: you ask whether you can reject it, and you hope not, if it's a good enough model. So the second role for statistics, statistical modeling, is estimating parametric descriptions of the data, with judgments about which parameters are better descriptions of the data. Contrast that with theoretical modeling, a forward process in which one hopes to extrapolate beyond the bounds of the current observations, or to explain the method of implementation; in a theoretical model you can actually say something about the mechanisms that made things happen.

Okay, so we're going to go back to our category experiment and model a little data. The experiment (sorry, I had a cold last week and I'm still coughing a little bit) uses category-ambiguous stimuli. We originally did the experiment just as I described, with cats and dogs, but in this case what we did was take cats and morph them with dogs. Now, this is an ill-defined concept; we never did find a rigorous algorithm to do it, but you can do something that you believe looks like a relatively continuous change from this cat to this dog. The question we wanted to answer was how well our monkeys would do with these morphed stimuli, both the normals and the animals with the TE damage, and this is a really nice experiment. We didn't use just one pair of morphs; we actually used 20 pairs, and I must say some of them I find not so easy. The reason I mention that is that I'm going to show you how our monkeys did, and I probably would not have done as well, even though they did it with no practice. We just took these monkeys, who were familiar with the cat and dog categories but had never seen this stuff in the middle, and they had never seen the endpoints either; these were cats and dogs they had never seen. What we did was, one day, just start presenting all of these images. What we wanted to know was whether at the 50% boundary condition they had a relatively random response, and when it was a little more cat-like did they call it cat, and when it was a little more dog-like did they call it dog. And you can all sit here and think about whether you could do this. [Question from the audience.] Yes; well, there's software where you pick some features and connect them, and then the software does the rest. What he's asking about is the morphing software that exists. The semi-automatic kind is what we used; all of the fully automatic software we found was terrible, actually, compared to the semi-automatic. When you pick a few features, like the tip of the nose and the tips of the ears and things like that, you get a much more ambiguous kind of morph in the middle than you do when you let the morphing software work on its own. It gets really confused unless the faces look pretty similar to start with, in which case it wasn't much good to us. It's really good for transforming one human face into another; it turned out not to work well for transforming a dog into a cat, even though I can't tell you how you know a dog is a dog and a cat is a cat. We would love to know that, and if anyone has a good idea and wants to come to my lab and try to prove it... So what happens with our monkeys?
This is the performance of the normal monkeys. What we're seeing is the proportion of dog in the image against the probability of answering that it's a dog, which is what's usually called a psychophysical discrimination curve, and we're going to model it in a minute as a logistic. These monkeys did really well. Three different animals; it's the average of the first three days they ever saw the morphs. I didn't put it in, but if anyone wants to see it later I'll show you how they did on the first day. [Question: are they adults?] Yes, they're adults, young adults, mostly probably in the range of four to seven years; I can tell you better what their weights are, which is between four and seven kilos, if that means anything. Okay. Anyway, there are a couple of things to note. They really do very well at the edges, and in the middle they have more trouble, but they keep working, which is non-trivial; these were all mixed up, and this is, again, the first three days they ever saw them. They did almost this well on the first day. In the statistical model I'm going to show you, I actually modeled the day, but I'm not going to show you that model; it made no difference, which actually surprised me, because it looked like it should have, but it didn't. The other thing is that there's just a little bias. For one animal the curve really does go through the 50-50 point, or close enough, but the other two have this little bit of bias, and I point it out just to show you that when you visualize your data you have to get things out of it. We understand why this happens: there was an asymmetrical reward. In this case the animal was more likely to be rewarded, or only rewarded, for dog-ness, so if the animal wasn't sure, it was a little more likely to answer dog, so as not to miss a reward; the asymmetry ran only one way. I'm not going to show the data, but the question is, can we be sure the bias is on the animal's side? Yes, because if we reverse the reward, you get the same bias the other way. We also know, and this isn't quite as good, that if you move the morph boundary over, within 10 minutes the animal has moved his discrimination criterion over also. So I think they're not having any trouble with the images; I think they're hoping not to miss a reward.

Okay. These are the TE animals; you already get the sense they're not the same. We also did rhinal cortex. I haven't got much time to go into the theory of that, but the reason the rhinal cortex data are interesting is that there's an argument in the field about whether rhinal cortex plays a role in perception or not. That's all I'm going to say about it, and I'll show you in just one second that our data argue against it. This is just putting all the data on top of each other, and you can immediately get the intuition that the TE animals are having some trouble that the normal animals are not, and that the rhinal animals are not. The reason I chose to show you these data is that they're actually very easy: we're going to want the statistics, but in a sense we don't need them, because we already know the answer, we think. Now, one of the things you noticed is that we're dealing with something that is not a nice Gaussian distribution. Even if I differentiate this, it will not be a Gaussian, and the reason is that it's a binary decision. In modern statistics this is not fit using ANOVA; you can transform the data if you want to use ANOVA, but there are more modern tools now, and what I want to do is introduce the concept of these modern tools. They are everywhere: all of functional imaging
uses these tools, in neurophysiology they're becoming important, and this, behavior, is the crudest level. Oh, I didn't want that, I wanted this. These models start with something called generalized linear models and then move on to linear mixed-effects models. Generalized linear models all use an exponential family of distributions: the data can be Gaussian, Poisson, binomial, quasi-Poisson, inverse Gaussian; for any family of functions that you think might describe your data and that can be put into an exponential form, the generalized linear model solution can at least be approximated. Every statistical package these days has the tools to do the modeling for you, so you don't have to implement it; it's a nightmare to implement the computation. And instead of minimizing the mean squared error, which is what we're all used to when we do ANOVAs and linear regression, we maximize something called the likelihood function, which for the Gaussian case gives the same answer as the mean squared error. So you're doing the same kind of thing, but you actually get a slightly different answer in general: if you minimize the mean squared error directly, you can solve it algebraically, exactly (you can even do it graphically); maximizing the likelihood is a hill-climbing technique, and you always end up with a slight difference. The goal here is to get the most likely parameterization of the tested models, and then we test one model against another with a likelihood ratio test, which turns out, if the models are constructed nicely, to be an ANOVA between the two models. So if we do that with these data, we now have something with random effects and fixed effects. Remember, the random effect here is the monkey, because we don't want to deal with that but we have to account for it, and the fixed effects are the morph level and the group. And what we see here is a morph level by group interaction, meaning that those lines wouldn't be parallel. So what's happened here is that we see that the means are different; that's what this intercept means. I haven't got time to go into it fully, but I will tell you what happens: this is being modeled (I took the formula out) using a binomial distribution, and a binomial is not a Gaussian, and we're working with the odds ratio. So what we're modeling here is not the animal's trial-by-trial performance; we're modeling the odds that the animal is going to say yes versus no. It's like going to the racetrack: you're getting three-to-one odds or four-to-one odds, and you're asking what the odds ratio looks like; that's how you interpret the logistic function. Sorry, I should have covered that. So what this is plotting is just the log of the probability of yes over the probability of no: are the odds three to one that at this level the animal is going to answer that way, or are they one to three? That's what you're getting out of this. And the reason to do it is that you get this very nice logistic function, and since you're taking the log, you're in the exponential family, so you can model it with the GLM machinery, and you get these very nice answers. And you get a correct assessment of the variance, because remember, this is a binomial situation, and the variance of a binomial is not the same as the variance of a Gaussian; it's p times (1 minus p), not the Gaussian form. So you need to account for that, and that's what's being done here.
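Here is a minimal sketch of that binomial GLM in Python with statsmodels. The morph levels and response counts are invented, and the last lines already show the back-mapping into data space that comes up in a moment.

```python
# Sketch: fit the proportion-dog curve as a binomial GLM with a logit link,
# then map the fitted line back from log-odds space into the data space.
import numpy as np
import statsmodels.api as sm
from scipy.special import expit   # inverse logit: 1 / (1 + exp(-x))

morph    = np.array([0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])  # proportion dog in morph
n_trials = np.full(morph.shape, 60.0)                      # presentations per level
n_dog    = np.array([3, 10, 22, 31, 42, 54, 58])           # made-up "dog" answers

# Binomial response as (successes, failures); predictors: intercept + morph level.
endog = np.column_stack([n_dog, n_trials - n_dog])
exog  = sm.add_constant(morph)

fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()  # maximizes likelihood
b0, b1 = fit.params               # intercept and slope, in log-odds space

# Back into the data space: predicted probability of answering "dog".
grid  = np.linspace(0, 1, 101)
p_dog = expit(b0 + b1 * grid)     # the smooth model curve over the data points
```

Plotting `p_dog` against `grid` is exactly the step described next: mapping the model back into the space where the data live.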
So the model against which this is being run is the binomial model, and you get these very nice answers out of it. I did that, and now I can take my model, which is here. The other thing you need to remember is that the odds ratio is being mapped into logistic space: every data point is mapped through that logistic function, and the advantage is that you end up with a straight line, so the machinery you're using to solve this looks very much like linear regression. When you get the answer back, you get something that's an intercept, and it is in the space where the data are being modeled, but it is not in the data space. The one problem with these models is that you can look at them in the transformed space until you're blue in the face and get no intuition at all about the underlying process. What is critical is to map the model back into the data space, and I've done that for you: this is the exponential of the intercept plus this number times the morph level, asking what the model predicts for the normal animals. These are the data, and the red line is the model that came out of these data for the normal animals. Now, in one fell swoop I got the answer not only for the normal animals; I happened to have modeled the TE animals in the same model, so I've done it all in one big operation. And when we do that, here are the TE data. I'm sorry, I accidentally made them both the same color, but you ought to be able to see a continuous line that has no dots on it; that's the model, and these are the data. We know from the likelihood test that these are significant coefficients, so we can live with them. Okay. So what we do is... and I didn't put in... what did I do here? Huh, I took out a couple of slides I meant to leave in. Okay: I had the rhinal animals also, and those two models, the rhinal model and the normal animal model, are virtually identical. So the conclusion is that damaging TE interferes with the animal's ability to distinguish these morphs at some mild but significant level, and that the rhinal animals are undamaged, which puts us on one side of this debate about whether rhinal cortex plays a role in visual perception of complex objects. So this is an exercise in modeling data, but it comes to the kind of conclusion I want, which is about which of these pieces of tissue actually has a functional role in this kind of perception. And it led to this other very nice experiment: because they had trouble with the morphs, we asked what would happen if you just blocked off parts of the images and did it again. Here is no mask, and again you see the little bit of trouble the TE animals, in green, are having, and here are the rhinals and the controls. When you use those masked images, the TE animals just fall apart completely. So they can do the categorization pretty well, and they can do the morphs pretty well, but when parts of the image are occluded they have lots of trouble; we've actually done the occlusions in a couple of different ways, and I'm not going to bother you with that today. So our conclusion from all this fancy statistics is biological: TE damage is followed by a small deficit in classifying dogs and cats, and when noise is added to the images the deficit becomes quite severe. So how does this map onto human patients, human patients with strokes? We don't know yet. I can tell you where this actually all started.
Our hypothesis when we did this experiment was actually that when we damaged TE, these animals were not going to be able to do the categories at all, because the evidence, while inferential, came partly from human MR data: we do know that the corresponding areas in humans light up when humans are doing face discriminations. We've never seen it for categorization, and I'm trying to get some of my colleagues to do that experiment; maybe they will when these data are finally fully published. These are only semi-published data right now; maybe I'm not supposed to be showing them, but they're so nice. What I can say is that at least the physiological correspondence between the monkeys and the humans is pretty good, and that's what led to this. And it leaves me with an uneasy feeling that I don't understand the connection very well between single-neuron selectivity and responsiveness on the one hand and actual brain function on the other. That's a big step, with a lot of assumptions. People record single units and get all kinds of selectivities, and then they make very strong statements about that brain region; or they do an MR study, show you these bright spots and dark spots, and then frequently title the paper "such-and-such is carried out by this piece of brain tissue," when in fact they didn't do that experiment, so they're not justified in that statement. But they make it anyway, because it sells really well.

Okay, so: neural coding. Now, I have in principle 35 minutes; do we want a couple minutes' break, or shall I just go on? Is everyone happy? Okay. I get tired of hearing myself talk, but I know I can shut that off. So we're going to talk about neural coding. This is going to change scales, and again it's going to mix in something a little closer to modeling than we were talking about before, but a lot of it was done using statistical approaches. What is the code that is used to carry information by single neurons? We'll use the visual system again as a prototype, and we're going to look at one simple process: identifying which stimulus appeared. Ultimately we want to know how the stimulus is interpreted, that is, what it means and how you choose which behavior to do; I even skipped that in the previous piece, and here we're going to pay even less attention to it. Again we're looking at the visual system, and mostly I'm going to be talking about neurons in primary visual cortex. I'm assuming everyone knows how a neuron generates action potentials; is there anyone here who doesn't? Okay. So here we redid the Hubel and Wiesel experiment: a monkey is looking at a spot, except it's an awake animal, so it's redoing an experiment Bob Wurtz did in 1969, where he showed that Hubel and Wiesel's data from anesthetized animals were about the same in awake animals. The trick is that you have the animal look at a fixation point, and the animal's job is just to look at that fixation point: if he looks away he doesn't get rewarded, and if he keeps looking he gets rewarded. If the animal learns this well enough, you can wave things around, and in primary visual cortex you can find the receptive field location and the orientation selectivity of neurons. It's a lot of fun. I've never seen anyone come into the lab when we're recording in V1 who didn't enjoy it, because you put that bar or slit on the receptive field, it comes on, and the cell goes; it's really nice. And this is sort of what it looks like; imagine these little pulses of sound. This is what's called the optimal orientation, the orientation that leads to
the maximal response you can get from an oriented bar, and this is the orthogonal orientation, and you can see roughly what that looks like. This is about 15 presentations of each stimulus, mixed in with a lot of other stimuli. If we consider the brain as a signal processing system, can we learn enough about the statistics of spike trains to decode the messages they represent? And then the more difficult question: does this help us in understanding brain function? A question about which I'm less sure than the first. So this would be a complete experiment, or a complete version of this experiment. Here we've taken the data, and I'm showing you all the impulses aligned on the stimulus that elicited them. This is a so-called orientation tuning curve, and one can see that this neuron was selective to something between 30 and 45 degrees. The reason we've reflected this plot is that we're using stationary bars. A lot of people use moving stimuli, but I don't believe primates actually look at moving stimuli very much; the reason you track things is so you can look at them carefully, and if they're moving on your retina, your acuity is terrible. Okay, so the first neural coding question: if I just counted the spikes, how good a job could I do? It becomes obvious that it's really difficult. What we've done is count the spikes in a 300 millisecond period, which is the length of time the stimulus was on, and put a dot on this graph for every time that spike count came up for this orientation. What you can see is that you could get 10 spikes for basically any one of those stimuli, so you've got a problem, and the question is what you're going to do about it. It gets worse, because if you look at the distribution of spike counts, you see that at high firing rates the distribution is much broader than at low firing rates. This is no surprise, because this is a counting process. Again, you could model this with a GLM; I'm not going to show you a GLM model of it, I'm going to show you something else, but people do do this now, using GLM models, because Poisson counting processes are in the exponential family, so you can make a very nice model of a counting process, or you can consider a binary process where every bin has a spike or not. But what this suggests is that this is what is called in the trade a point process, and for point processes (I didn't put the slide in) what we should expect to see is that as the spike count goes up, the variance goes up; and if the spike count were equal to the variance, it could be that it came from a Poisson process, because one of the significant mathematical features of a Poisson process is that the mean is equal to the variance. Theoreticians love Poisson processes; the other feature of them is that, too bad, they don't represent neurons, and I'm just going to assert that these distributions are not well fit by Poisson distributions. So it's a point process, but it's not Poisson.
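A small sketch of that mean-versus-variance check, with synthetic counts; the "bursty" construction below is just one crude way to push the variance-to-mean ratio off the Poisson identity line, not a model of these neurons.

```python
# Sketch: for a Poisson process the spike-count variance equals the mean
# (Fano factor 1); bursty counts sit above the identity line.
import numpy as np

rng = np.random.default_rng(1)

def fano(counts):
    """Variance-to-mean ratio of spike counts; 1.0 for a Poisson process."""
    return counts.var(ddof=1) / counts.mean()

poisson_counts = rng.poisson(lam=10.0, size=500)
# Crude burstiness: doubling a Poisson count doubles the mean but
# quadruples the variance, so the Fano factor lands near 2, not 1.
bursty_counts = 2 * rng.poisson(lam=5.0, size=500)

print(f"Poisson Fano factor ~ {fano(poisson_counts):.2f}")
print(f"Bursty  Fano factor ~ {fano(bursty_counts):.2f}")
```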
Ideally we'd like to decode as the responses unfold in time. This is intrinsically a probabilistic, or statistical, problem: how likely is each experimental condition, given the spike train, at any point in time? That's what I'm aiming for, and obviously we think we can do it, or I wouldn't be bothering to set this up. We try to derive a model for the statistics of the spike trains, so it's another way of using statistical representations. The problem might be relatively simple if all we had to do was count the spikes to estimate the rate; however, there's this other little complication. These two stimuli were chosen because there are two things about them that interest me. One is that the shape of this average of all the spikes, if you just collapse them down the page, shows that the shape of this response is different than the shape of that one, and if you look at the individual responses you can actually see that for this stimulus the neuron fires two or three spikes, then has a period where it goes silent, and then starts firing again. Same neuron, different stimulus, different at the beginning. I would argue that if that were accounted for, it would help in decoding the response, provided it's not carrying the same information as the spike count. And one way I can show by example that it's not carrying the same information as the spike count: it turns out that these two sets of responses have exactly the same mean firing rate and exactly the same variance. That's how I happened to pick them, but it turns out it's not hard to go through our data and find these kinds of examples, which means we now have to worry about the temporal aspect of the response. There's another kind of temporal aspect you can worry about. These are responses where the contrast is varied and the orientation is varied; this is a primary visual cortex neuron, and the contrast goes down as you go down the column. What you can see by eye, because this neuron is so silent between stimuli, is that the latency, the time of onset of the response, gets longer as the contrast goes down, but the firing rate does not go down to the same degree; it goes down a little, but not nearly to the same degree. What's quite dramatic is to look at the response to this very faint stimulus, which would be barely visible to you at 6% contrast: it has a long latency and a higher firing rate, while at this orientation you see a shorter latency but a much lower firing rate. And basically every cell we've recorded in the upper layers of primary visual cortex does this. I'm going to skip this. How to decode them: we want to know whether we're going to use the coarse firing rate or the individual spike arrival times; I'm not going to spend a lot of time on that. What code do you use, spike count or spike timing? I'm already roughly rejecting the spike count code. For spike timing, there are two ways to think of it: one is that the rate is varying but the spikes being elicited are essentially random given the current rate; the other is that there are spikes that are just unexpected under any kind of random model, if that makes sense. So the possible relevant parameters are: the spike count; the spike count distribution, as I showed you; the shape of the spike density, which is that rate function you get from averaging; patterns of spikes; and correlations across neurons. I probably won't finish all of this, but we'll try. So the first thing I want to establish, and this is the graph I mentioned earlier, plotting the mean spike count against the variance for all of our data in a reasonably long counting window, is that the variance-to-mean ratio is off this identity line. These neurons have a different variability structure. This actually can be understood: it's what you would expect to see with bursting, where you have some periods where you get a bunch of spikes close together and other periods where they're further apart, and in fact that's what most of our responses look like, so this shouldn't have been a surprise. I already showed you that; I already talked about this issue; sorry, these are repetitive. Okay. So you could decompose these responses; we did that a long time ago
using principal component analysis, which is a way of taking data and compressing it. This is the average response across a bunch of conditions, and these are the principal components, which add more and more complicated temporal dynamics; that's the way to think about it for now. That is not how the principal components are derived; I just don't think I have time to talk about it here, so if you want to know, ask me later, because I'd rather get on to something else. This is just showing you that the principal components are a good representation of the data. This is reconstructing these two responses; these are the same two responses for which I showed you the rasters and the spike densities, by the way, just magnified a little bit in time, and these two have the same spike count. These are the firing rates, and they look identical. This is the first principal component; it tends to track the average firing rate, and those are identical. I asserted to you that these have the same firing rate, and this is essentially a quantitative demonstration that that is true. The second principal component had this funny peak in it, with a little bit of depression, and what we see if we try to reconstruct these two responses using the second principal component is that all of a sudden we discover these two responses really are different, but the difference is not in the spike count: there is some slowly varying aspect of the shape of this thing. It still doesn't answer the question about whether the spikes are stochastic or not; we're going to get to that, and I'm going to skip the rest of this. The other observation you have to make is that this is a point process, and you can think of any period you want broken into, let's say, one millisecond bins. The reason we choose one millisecond is that lots of neurons have a refractory period in the neighborhood of one or two milliseconds, so the likelihood that you're going to miss spikes because they're closer together than one millisecond is low. There are a few places in the nervous system where you would have to be careful about that, but in most places you don't, and the bin is broad enough that you get every spike, so you get to look at the statistics of the spikes. If you were just throwing the spikes down at random into the bins, the statistics would depend on the number of bins and the number of spikes. This is what in statistics is N choose K: the number of ways you can put K spikes into N bins. If it were a uniform rate, you could just calculate this and know everything. It wasn't a uniform rate.
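To see how fast N choose K blows up, and why pattern statistics are hostage to the spike count, here is a two-line computation. The 300 one-millisecond bins match the counting window mentioned earlier; the spike counts are arbitrary.

```python
# Sketch: the number of possible spike patterns is set entirely by how many
# spikes there are to place, so patterns can't be interpreted without the count.
from math import comb

n_bins = 300                               # 1 ms bins over a 300 ms response
for k_spikes in (2, 5, 10):
    print(f"{k_spikes:>2} spikes -> {comb(n_bins, k_spikes):.3e} possible patterns")
```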
Now, I was going to skip this, but no, I'm not going to skip it, because this was an observation that again has to do with visualizing your data: always visualize your data. We didn't do this for a number of years, we missed this point, and we probably would have come to an understanding earlier had we made this simple graph. Remember, I told you the variance of the spike count goes up linearly with the mean, which means the variance of the first principal component goes up with the first principal component, because the first principal component is very close to representing the mean. What happens with the second principal component, which has this funny shape; what is it related to in terms of variance? We fully expected that the graph of the mean of the second principal component against the variance of the second principal component would look like the graph of the mean of the first principal component against its variance. That's just not what happens. It turns out that the mean of the second principal component is still related to the variance of the first principal component. And you can understand this (I see puzzled looks: why is this the case?); it's because of N choose K. The number of combinations of firing patterns that you can have is constrained by the number of spikes you have. So it means two things: the principal components weren't actually a good way to look at this, even though we had gotten a lot of mileage out of them, and we had not fully understood that you cannot talk about the patterns of spikes in a spike train until you know how many spikes you have available to make patterns. N choose K: you just cannot know until you know something about the spike counts. We also did some work looking for repeating patterns of spikes. I was going to skip the information theory; well, we did information theory, and I don't want to spend a lot of time explaining it, but it is a way of measuring the relation between two stochastic distributions. Basically, it is a way of estimating, if there is some random event and I send you some signal about that event, how much better you can guess what the event is given that you know the signal. So if I say I'm thinking of one of my four fingers, and that's all I tell you, you have a one-in-four chance of guessing which finger I'm thinking of. If I tell you it's one of these two fingers, I've doubled your chance of guessing correctly. What information theory does is formalize that idea: you take the entropy of the problem, the uncertainty of the problem, transformed by a logarithm so that you're not multiplying probabilities but adding and subtracting; you then ask again about the probability of guessing once you've got the signal, and the difference between those is the amount of information that's been transmitted. Either that means something to you or it doesn't; if it doesn't, don't worry about it too much at this point. What it allowed us to do was look at these repeating triplets as possible codes. We could do the spike count and discover it carried 0.4 bits; we could do the repeating triplets and discover that the repeating triplets carried information about which stimulus we had shown, but it was less than the spike count. Then the question comes: if I knew both the repeating triplets and the spike count, could I do better? And the answer we got was, surprisingly, no, you can't do any better at all, which should lead you to the intuition that somehow the repeating triplets again are related to the spike count, and that you need to go back and think more carefully about N choose K.
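Here is a toy version of that information calculation: a plug-in estimate of the mutual information between a made-up stimulus set and a made-up response variable. The joint table is invented purely to show the arithmetic, not taken from these recordings.

```python
# Sketch: mutual information I(S;R) = sum_{s,r} P(s,r) * log2( P(s,r) / (P(s)P(r)) ),
# i.e., how much knowing the response improves your guess about the stimulus.
import numpy as np

# joint[s, r] = P(stimulus s AND response r); 4 stimuli x 3 response classes.
joint = np.array([[0.15, 0.05, 0.05],
                  [0.05, 0.15, 0.05],
                  [0.05, 0.05, 0.15],
                  [0.10, 0.05, 0.10]])   # entries sum to 1

p_s = joint.sum(axis=1)                  # marginal over stimuli
p_r = joint.sum(axis=0)                  # marginal over responses

mi_bits = np.sum(joint * np.log2(joint / np.outer(p_s, p_r)))
print(f"mutual information ~ {mi_bits:.3f} bits")
```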
It suggests that it's a random process sampling something, and the question is whether we can guess what it is. Now, in the last 10 minutes I'm going to show you what we think it is. Not everyone agrees with us, but I have yet to see a data set, where we've been allowed to do this analysis, in which it didn't work pretty well; actually, spectacularly well. What if we assume that spikes are stochastic samples, not necessarily Poisson, of the rate function? We're going to ask what happens if you throw spikes down at random such that, if you do it a lot of times, you recover this thing back. The question is how you do that, and it turns out you could do it the hard way, which is to simulate all the bins and throw down spikes; but there's a much, much easier way, and many of you who have analyzed data will know this trick implicitly, though you may not know it explicitly. You take this thing and integrate it, so it becomes a cumulative distribution; you've just integrated it, and now it goes from probability zero to probability one. Now you say: I have a uniform random number generator, and I need a spike train with five spikes in it, but in the long run I need to recover this, not have it be uniform. There's a theorem proved by Kolmogorov, of the Kolmogorov-Smirnov test, a well-known mathematician and statistician, and the theorem says that if you map a uniform probability distribution through the cumulative distribution of any probability distribution, you'll get a point process back that matches the distribution you're after. The reason you all know about this is that most statistics packages these days have a way to construct samples from an arbitrary probability density function for you, and they make use of this theorem to do it. So we said: what if this is a model of a spike train? We know the counts are not Poisson, but we don't know about the spikes themselves. So what we did was take the experimental numbers, the numbers of spikes we got in the experimental data, map them through this function, and create artificial spike trains; we can create as many as we want, and we can match the distribution. So we call this the spike count matched model.
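A minimal sketch of that spike-count-matched construction via the inverse-CDF trick; the rate function's shape and the per-trial counts here are invented, and `np.interp` stands in for a proper inverse of the cumulative distribution.

```python
# Sketch: integrate the spike density into a cumulative distribution, then
# push uniform random numbers through its inverse to get spike times whose
# long-run distribution matches the rate function.
import numpy as np

rng = np.random.default_rng(2)

t = np.linspace(0.0, 0.3, 301)                          # 300 ms in 1 ms steps
rate = 1.0 + 5.0 * np.exp(-((t - 0.08) / 0.03) ** 2)    # made-up spike density

cdf = np.cumsum(rate)
cdf /= cdf[-1]                                          # normalize: runs 0 -> 1

def sample_spike_train(n_spikes):
    """Draw n spike times distributed over time according to `rate`."""
    u = rng.uniform(size=n_spikes)                      # uniform samples on [0, 1)
    return np.sort(np.interp(u, cdf, t))                # map through the inverse CDF

# Spike-count matching: reuse the experimentally observed count on each trial.
observed_counts = [4, 7, 5, 6]                          # hypothetical per-trial counts
fake_trials = [sample_spike_train(n) for n in observed_counts]
```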
Oops. If you do that and plot the number of these funny triplets in the data against the predicted number, you get regressions that are better than, I mean, you just can't expect to get this in biology: a slope of one and an r-squared of one. It basically explains all the data, and this works both in primary visual cortex and in the lateral geniculate, and we know it works in motor cortex also. I'm going to skip this. So: the spikes appear to be stochastically distributed according to the spike density, and the number of spikes must be accounted for to represent spike timing accurately. How does this help us? We can decode. You need to know one more thing, which is how we built the dictionary: in the spike count matched model we used the spike count distribution and the probability distribution of spikes over time, and the integration of the latter. So the spike count and the spike density function carry the information about the stimulus; that's our conclusion. Individual spike times are random, with probabilities determined by the spike density function and the spike count. Well, it turns out there's a well-established branch of statistics that knows all about this, developed in the 30s but not paid much attention to until about the 1980s, called order statistics. Order statistics describes the result of independently drawing an ordered, fixed number of samples from a continuous distribution. Doesn't that sound like what we did for the spike trains? Yes. Order statistics can be adapted to describe the spike trains, and it allows us to calculate what the spike count matched model simulated; you can just calculate it. All you need is this equation, the order statistic of the k-th spike out of n spikes at time t (actually, since it's a point process, I should say the k-th point, but I'll slip and call them spikes). This is n choose k multiplied by k, which is a normalization factor to turn what follows into a proper probability density summing to one. This is the integrated, cumulative distribution of the rate function, and raising it to the k minus 1 power is how I get rid of the probability due to spikes that have already gone by: if I'm interested in the third spike, I've got to account for the first and second spikes, and that's what this piece is doing. This is the probability density itself, because you want the probability right there. And then (1 minus F of t) to the n minus k deals with the probability that remains for the rest of the spikes that have yet to come. This gives you back the probability density. Now, this won't quite work the way I showed it: you can only use it for the first spike, because what I gave you was the unconditional density, and once a spike comes by, we want the decoding to be conditioned on all the spikes that went before. But we're going to do just a small amount of algebra that you either follow quickly or you don't. (Did you send them the reference for the reprint? Okay, there's a reference that I put in that has all this, if you really care.) All we're really interested in is the time of the first spike, which is n times F of t to the zero, and that's just 1, times f of t, times (1 minus F of t) to the n minus 1.
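Since the slide's equation survives here only as a verbal walk-through, here is a reconstruction of the standard order-statistic density in my own notation, matching the pieces just described; F is the normalized cumulative spike density, f its derivative (the spike density), and n the spike count.

```latex
% Density of the k-th of n spike times:
%   k * binom(n,k) : normalization, so the density integrates to one
%   F(t)^{k-1}     : probability already used up by the k-1 earlier spikes
%   f(t)           : the spike density itself at time t
%   (1-F(t))^{n-k} : probability reserved for the n-k spikes yet to come
f_{(k)}(t) = k \binom{n}{k}\, F(t)^{k-1}\, f(t)\, \bigl(1 - F(t)\bigr)^{n-k}

% First of n spikes, the case used for decoding (F(t)^0 = 1):
f_{(1)}(t) = n\, f(t)\, \bigl(1 - F(t)\bigr)^{n-1}
```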
Then we're going to condition on previous spikes: we're going to consider every spike as the next first spike, so every time a spike comes, we reconstruct this whole thing and do it again. This is some sample data, showing what happens to the probability density of spike 1 as the number of spikes goes up. If you have only one spike in the whole period, on average you recover the rate function you had in the first place. If you have the first spike out of 5, you've got to get the first spike in and leave enough time for 4 more. These are responses with more spikes in them, and as you have more and more spikes in the train, you see that the density becomes more and more peaked. That actually explains why latency looks shorter at higher firing rates: your sampling of the driving function, the rate function, is denser. To deal with the dependence on n, you can just multiply these first order statistics by the probability of having n spikes, and you get something that looks like this, the count-related first order statistic, and then you start decoding using Bayes' theorem. The other thing that's cute about this, as I'm going to show you in one second, is that you also get a probability for a stimulus when there is no spike, because 1 minus h of t, where h of t is the probability of a spike at a given time, is the probability that you won't have a spike at that time. So you can keep decoding even when there is currently no spike. This assumes you know the prior distribution, but all decoding assumes knowledge of a prior distribution, no matter how you do it. I'll leave this. And here are two spike trains being decoded, a 0 degree bar and a 67.5 degree bar; the vertical lines are the spikes. This is a very low firing example: the first couple of spikes don't do much for it, but all of a sudden, in this period where there are no spikes, this one starts to decode. This one has basically finished: it has decided that anything with these first three spikes, looking about like this, will be stimulus four, or whatever it was, and it just keeps going; and this one finally figures out that it was probably stimulus one. You can do all kinds of calculations with this; you can find out how well you can do the decoding at any point in time. I used to have an example, I guess I left it out, where it started decoding before any spikes had arrived: the fact that no spike had arrived meant it couldn't be one of the high contrast stimuli, and the spike train came from a low contrast stimulus, so the absence of an early spike meant it had to be one of these low contrast stimuli, and the decoding probability goes up really rapidly before you've seen a single spike. I guess I'll skip this, because we're done. If anyone wants to talk about how to use this for pairs or more of neurons, it works. The only thing I have to get to, because the second half of this talk is all there just to set up this quotation: there is in the United States, once a year, a contest to write the first sentence of the worst novel you can imagine. This was the winning sentence in 1999, as published in the New York Times: "It wasn't the best of times; it wasn't the worst of times; it was the times you'd get if you arranged all possible times, including fictional times in which the nights were usually dark and stormy, in order from worst to best on the real number line from zero inclusive to one exclusive, and then used a really good random number generator to select a value in that range, thus choosing the corresponding times. That's the times it was." And that's the times they are, when you're looking at spike trains. So thank you; it's 12:30 on the nose. If anyone has questions, I'm here the next couple of days and longer, and I'm happy to discuss any of it. I'm not as young as you guys are, but I hope you see that even at my age you can still be enthusiastic about doing all this work. It's been a great ride for the past 35 years doing this kind of neuroscience, and I hope all of you have as much fun as I've had. Any immediate follow-up questions?