For the past few days, we've been discussing the advantages of open science and how it can drive innovation, and this project would not be possible without open data sets. My group at the Salk Institute is primarily a computational lab, and the data I will be describing was collected in a laboratory at Berkeley. The general problem we would like to understand is object recognition. As many of you might know, the mouse is not working, so I will do an interpretive dance to point to things on the slide. If you look at the right side: if you show a hand, this IT neuron responds a lot; if you simplify it to a mitten, the response is reduced; if you show a face, the response is abolished; but if you show a hand rotated 90 degrees, the response comes back. So if we assume, to a first approximation, that neurons in V1 encode edges, then presumably to encode a hand I need a combination of at least ten edges for the fingers. But it has to work across all positions and all rotations, and not get confused with the mitten in between. So it is, obviously, an interesting theoretical problem: how to wire these signals together to achieve both invariance and selectivity. What was astonishing for me was learning about the degree of convergence and divergence. It turns out that even from V1 to V4, a single V4 neuron can collect signals from as much as one-sixth of the V1 surface. That is an incredible degree of convergence, and if it happened without any rules, my guess is that all hope of selectivity would be lost. So the goal is to figure out how signals are processed within the visual stream, with the hope that this will be a model for how we can understand other sensory modalities.
So the general approach we take is a little similar to the deep networks that are popular nowadays, although we were doing this even before deep networks. We present the animal with many natural stimuli, ideally as many as possible; in practice it is 20,000 to 50,000 different images. We collect responses, sometimes to the same set of images, across visual areas. This is one of the advantages of using natural stimuli: they stimulate the whole visual pathway, whereas if I were to use white-noise images, the primary visual cortex would respond, but V2 and V4 would respond progressively less, and the IT neurons would produce almost no response. Most of our effort then goes into finding statistical methods for analyzing the relationship between stimulus and response data, methods that can elucidate how the signals are processed. As I mentioned, today I will describe results from the secondary visual area, V2. This is just after the primary visual cortex, and although the primary visual cortex is considered one of the best understood areas in the brain, already at the second visual area it is not clear what essential computation emerges. Some recent work has pointed to the representation of images using textures, and that is the theme we will discuss; I will present some evidence for it as well. So, yes, about the quotes around "spikes": they are real spikes. The reason they are in quotes is that I work at the Salk Institute for Biological Studies, where most people are not neuroscientists, so for them a spike is kind of a metaphor; that is why they are in quotes. Sorry, I should have removed it. At least I removed the definition of a spike; I did that much today, but this one slipped.
So the data is the data that was graciously uploaded by Jack Gallant's laboratory to the CRCNS website, and we actually spent a lot of effort assimilating it, but that was still better than doing the experiments ourselves: for us that would be an infinite cost, because I don't have an experimental lab. And it is good from many other perspectives too. Just to give you a summary of the data: we are working with 80 V2 neurons, and on average each data set has about 6,000 spikes and 23,000 stimuli per neuron. These are not static images; they are little movies, about 3 seconds long. The animal is fixating, the movie is presented roughly within the area of the receptive field of the neuron, and the size of the movie is about two to four times the estimated receptive field of the V2 neuron. Now I will describe the model that we use to analyze the data. It has some elements of deep networks, but it also has a structure that we put in to model the neurobiology, and in retrospect I think that turned out to be essential for creating a model that is both interpretable and has good predictive power. The first element of the model: imagine that the V2 neuron has access to this large visual scene, but because we want to model position invariance, it processes the scene in patches. Within a small patch there is an unknown number of filters, to be determined during the analysis, so the neuron will be sensitive to multiple combinations of features.
Some of these filters will be positive, and those are shown here schematically as blue ellipses: if there is an edge that matches the orientation of a blue ellipse, then for that model of the real neuron we predict that the firing rate will increase. Other filters are suppressive. Suppression is an important part of the model, and I think it is often not incorporated explicitly into deep network models; it was actually one of the findings I will describe today. The reason cross-orientation suppression, as it is called in neuroscience, is important is that it can help enhance the selectivity of neural responses to edges. What does it mean to detect an edge? It means to respond to an edge of the correct orientation and not to an edge of the orthogonal orientation. So combining positive selectivity for one edge with suppressive selectivity for the orthogonal edge enhances the sharpness of tuning. But this is not assumed by the model; the model just allows some arbitrary number of filters, to be determined during fitting, some positive and some negative. The output of each filter is passed through a non-linearity, symbolized here schematically by a quadratic function. One can also view this as an expansion into a quadratic space, with both a linear term and a quadratic term; for those who work with primary visual cortex, it is a way of modeling both simple and complex cells. The outputs of these non-linearities are summed and passed through a sigmoid, and that is one element of computation at one position. The convolutional aspect of the model comes about because we repeat the same block, which we call a local quadratic model block (quadratic because its output passes through a quadratic non-linearity), across different positions, and then the signals are
weighted in a non-uniform manner across space and across time, and then the result is passed through a spiking non-linearity. So the model has three non-linearities: the quadratic one, the saturating one, and the spiking one. It has some elements of a deep network, and it has a convolutional aspect, but the connections are structured, or grouped, locally. If we remove any one of these non-linearities, the model maps onto one of the existing models that people have used in the past, and I can show, if there is interest, how the model performance drops if we remove any one of them and end up with, say, a model with an arbitrary number of connections but no convolution, or with convolution but only one filter per position, and so on. We fit this model for each neuron, finding the number of filters at each position. The filters are not assumed to be Gabors, although we will later fit Gabors to them. We want to find how many filters there are, whether they are positive or negative within each local subunit, how they are summed across positions, and how they are summed in time. All of this is fit to V2 neurons, and now I will summarize what we found by fitting this model. Are there any questions about the model? Otherwise I will just summarize the findings. I will be telling you three organizing principles for feature selectivity in V2: the first two concern this type of local non-linearity, and the third concerns pooling across positions. Locally, we find many features that are statistically significant: on average, about eight excitatory and six suppressive features affect the responses of a V2 neuron. But it turns out that this
complexity can be simplified, first and foremost by noticing that nearby features form so-called quadrature pairs. A canonical quadrature pair is a feature that is mostly a bar together with a feature that is mostly an edge: in cross-section, one is a cosine times a Gaussian and the other is a sine times a Gaussian, so the relative spatial phase of the oscillation, once it is multiplied by the Gaussian, is 90 degrees. Across the population, when we plot the spatial phase difference between centers of nearby features, it strongly peaks at 90 degrees. So that is one simplifying finding: even though there are about 14 features per position, they actually form seven pairs, and that is consistent with the fact that quadrature pairs are the canonical model of V1 complex cells, which provide the main output from the primary visual cortex to V2. So even though we are recording from V2, one can identify the candidate properties of the V1 neurons, the earlier stage, from which this V2 neuron is pooling signals.
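To make the quadrature-pair idea concrete, here is a minimal sketch in Python; the envelope width and spatial frequency below are arbitrary illustrative values, not parameters fitted to the data:

```python
import numpy as np

# 1-D cross-sections of a quadrature pair: same Gaussian envelope and
# spatial frequency, but spatial phases 90 degrees apart.
x = np.linspace(-3, 3, 601)
sigma, freq = 1.0, 1.5                            # arbitrary illustrative values
envelope = np.exp(-x**2 / (2 * sigma**2))

even = envelope * np.cos(2 * np.pi * freq * x)    # "bar"-like feature (cosine)
odd = envelope * np.sin(2 * np.pi * freq * x)     # "edge"-like feature (sine)

# The pair's summed energy is phase invariant (cos^2 + sin^2 = 1), which is
# why quadrature pairs are the canonical model of complex cells.
energy = even**2 + odd**2
print(np.allclose(energy, envelope**2))  # True
```

Shifting the carrier phase changes `even` and `odd` individually, but leaves their summed energy untouched, which is exactly the phase (position) invariance attributed to complex cells.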
That's one simplification, and the second one was cross-orientation suppression. The previous graph was a schematic, but this is now real data, where I am showing the local features for a V2 neuron: the most dominant positive feature and the most dominant negative feature. One can see they are oriented at approximately 90 degrees to each other. Here are other features for that neuron, and one can see that the positive features all have approximately the same orientation, while the suppressive features are all approximately orthogonal to the neighboring excitatory features. Across the population, the orientation difference between excitatory and suppressive features is close to 90 degrees. So we started with, say, 14 features per position; we simplified that to 7 pairs, so that each ellipse is now a pair of features; and the conceptual simplification is that positive and negative edges are orthogonal. This becomes like a motif, and there may be three different motifs that a V2 neuron uses to characterize its feature selectivity.
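The sharpening role of cross-orientation suppression can be illustrated with a toy tuning-curve computation; the tuning shapes and the 0.5 suppression gain below are made-up illustrative choices, not the fitted model:

```python
import numpy as np

theta = np.deg2rad(np.arange(180))          # orientation, 0..179 degrees

# Broad excitatory tuning peaked at 0 degrees (von Mises on the doubled
# angle, since orientation is defined modulo 180 degrees).
excite = np.exp(2.0 * (np.cos(2 * theta) - 1.0))
# Broader suppressive unit peaked 90 degrees away (the orthogonal edge).
suppress = np.exp(0.5 * (-np.cos(2 * theta) - 1.0))

# Combine excitation with cross-orientation suppression (rectified).
combined = np.maximum(excite - 0.5 * suppress, 0.0)

def halfwidth(curve):
    """Number of degrees over which the curve exceeds half its maximum."""
    return int(np.sum(curve > 0.5 * curve.max()))

print(halfwidth(excite), halfwidth(combined))  # suppression narrows the tuning
```

The combined curve is narrower at half-maximum than the excitatory curve alone, which is the sense in which subtracting the orthogonal orientation sharpens selectivity.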
So at this point I would like to acknowledge that this is only for one class of V2 neurons, which we call the orientation-tuned class. As I mentioned, area V2 is a very complicated area. Although it is up for discussion, anatomically one can think of V2 as maybe three different areas that are intertwined: there are thick stripes, pale stripes, and thin stripes, and they have different preferences for projecting either to V4, which is more of an object recognition area, or to area V5 (MT), which is more of a motion processing area. Most studies of V2 neurons typically find two sub-populations. That was true of the Gallant paper whose data set we are using, and also of other publications, where V2 neurons were characterized according to how spatially diverse their features were. We also found two sub-populations of neurons. Our measure of how complicated the local orientation tuning was is the angular deviation, which I will explain using a few examples. This is an example with angular deviation close to zero, where all the excitatory blue features have approximately the same orientation; the saturation of the color indicates the strength with which each subunit affects the responses of the V2 neuron. Then there are other neurons for which the excitatory features point in all directions and form a more complicated pattern that we do not yet know how to characterize, to put into words. For now, one can say that there is a clear bimodal distribution between neurons that are tuned locally to the same orientation and neurons that are tuned to a combination of local orientations. On one hand this reproduces previous results from a number of groups, but there was also another study of V2 neurons, by Jonathan Victor's group, and they also found two sub-populations of V2 neurons,
but they characterized them in terms of their dynamics: whether the neurons were integrating signals in time or differentiating them. So far we have not talked about dynamics, and neither did those publications, but we actually know something about the dynamics, because these stimuli are movies and we have analyzed the tuning profiles in time as well as in space. What is shown here is the average temporal kernel for neurons that are tuned to uniform orientations: one can see that it is biphasic in time, with some time moments weighted positively and some latencies weighted negatively. For neurons that are more diverse in orientation, the temporal tuning profile is uniform. So, using natural scenes and an open data set, one can resolve a question that was in the field: whether the two sub-populations of V2 neurons are actually the same. I will also briefly mention that cross-orientation suppression holds for the second class of neurons as well, the ones that are more heterogeneously tuned locally for orientation. One can see the pattern is much more complicated, but usually one can find a suppressive edge orthogonal to a nearby positive edge, and that holds across the population. So what I have described so far was the feature selectivity of V2 neurons at the first level of the model; now I will discuss their feature selectivity at the second stage of the model, across positions. It turns out that most of the time the tuning across positions was quite boring: it was uniform in space. But in 25% of cases it was biphasic, meaning that some parts of space were weighted negatively and some positively; here is another one, going from negative to positive to negative. What is interesting is that these masks are, of course, very reminiscent of the maps that Hubel and Wiesel plotted for V1 neurons with respect to luminance. The masks here, one can interpret them as how
V2 neurons pool the combinations of putative V1 outputs, so one can see evidence of the same cortical computation repeated across different stages. One can then ask what these biphasic neurons are good for, and one thing they are good for is the detection of so-called second-order edges. What are first-order and second-order edges? A first-order edge is a change of luminance across the boundary. A second-order edge is the example here, where there is very little change in luminance but there is a change in texture on the two sides of the edge, between an animal and a tree. One can then interpret these findings by saying that the local selectivity determines what kind of texture a neuron is most sensitive to; if the second-stage pooling mask is uniform, the neuron will be selective for a patch of texture, but if the pooling mask is biphasic, such as the one shown here, the neuron will be selective for second-order edges that represent changes in texture across positions. So, in summary, one can build on existing knowledge. We know that signals go from V1 to V2 and on to V4 and MT, and we know certain properties of signals in V1: the neurons are often sensitive to combinations of features that form quadrature pairs, which is the canonical model of complex cells, and their orientation selectivity is enhanced by combining excitatory and suppressive features in an approximately orthogonal manner. We see evidence of both of these computations for neurons in V2.
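As an illustration of the distinction, one can synthesize a second-order edge in a few lines: two noise textures with the same mean luminance but different contrast on either side of a boundary. This is a contrast-defined toy variant with arbitrary texture statistics, not the natural-scene stimuli used in the study:

```python
import numpy as np

rng = np.random.default_rng(1)
h, w = 64, 64
noise = rng.standard_normal((h, w))

# Left half: low-contrast texture; right half: high-contrast texture.
# Mean luminance is (nearly) identical on both sides, so a purely linear,
# first-order edge detector sees no boundary on average.
image = 0.5 + np.concatenate(
    [0.05 * noise[:, : w // 2], 0.25 * noise[:, w // 2 :]], axis=1)

left, right = image[:, : w // 2], image[:, w // 2 :]
print(abs(left.mean() - right.mean()) < 0.05)  # nearly equal mean luminance
print(right.std() / left.std())                # contrast ratio, roughly 5
```

A uniform pooling mask averages texture energy over the whole patch and cannot signal this boundary, whereas a biphasic mask that weights the two halves with opposite signs responds strongly to it.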
Now the neurons are selective not for one orientation but for a combination of orientations, where each orientation is represented by a pair of features, a quadrature pair, and the pairs also pair up as excitatory and suppressive; so there is evidence of the same type of computation. Some neurons are tuned more to a single orientation, and those are the ones that are more sensitive to changes in time; I think these are the neurons that project to MT. Other neurons are selective for more diverse shapes and integrate signals in time; those, I think, are the ones that project to V4. Across both of these neuronal classes, approximately one quarter of the neurons are selective for these biphasic masks, which would indicate selectivity for edges in the world that are defined not by changes in luminance but by more complex features. Finally, one can talk about how these signals might feed into higher areas, maybe area V4. In this case we analyzed V4 data from an earlier study, with a simplified model: we estimated only the two dominant features per neuron and then fitted them, not with regular Gabors, but with curved Gabors. Here is an example of the two features for one V4 neuron, and one can see that it is also something like a quadrature pair, but along a curved contour: in cross-section, one feature goes positive-negative-positive and the other positive-negative, so one would be like a cosine in cross-section and the other like a sine. Of course, V4 is an even less well understood area than V2, and there is a diversity of neurons: that was one V4 neuron, and another V4 neuron is sensitive to almost straight edges. There was actually a trade-off, just to summarize the findings, such that neurons that are selective for tighter curvatures had less
position invariance than neurons that were selective for straighter contours. I can understand this computationally: if the contour you would like to encode is an elbow, a very curved feature, then you would like to know its position more precisely than the position of a straighter edge, where you can allow more variance along the edge. One can actually check this finding using parametric stimuli, where neurons were probed with little segments. Here are two examples of V4 neurons that were probed with parametric stimuli composed of combinations of three edges, where the edges were either straight or had different degrees of curvature. There were some neurons with large position invariance: in this case we plot all the features that evoked more than 95% of the peak firing rate, and all the features have approximately the same orientation. But for a neuron that is not position invariant, the preferred feature changes strongly with position. With this I would like to thank you for your attention and thank my group; this work was done by Ryan Rowekamp, based, as I said, on the analysis of a public data set. Thank you for your attention.
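For readers curious what a curved Gabor might look like, here is one plausible construction: a Gabor whose carrier oscillates across a circular arc rather than along a straight line. This is a toy sketch with arbitrary parameters, not the fitting procedure used in the V4 study:

```python
import numpy as np

def curved_gabor(size=64, radius=20.0, sigma_r=3.0, sigma_t=0.4,
                 freq=0.25, phase=0.0):
    """Gabor-like filter whose cross-section (a cosine or sine times a
    Gaussian, selected via `phase`) is taken radially across a circular
    arc, localized to a wedge of that arc by an angular envelope."""
    y, x = np.mgrid[:size, :size] - size / 2.0
    r = np.hypot(x, y)                    # distance from the arc's center
    t = np.arctan2(y, x)                  # angle along the arc
    radial = np.exp(-(r - radius) ** 2 / (2 * sigma_r ** 2))
    angular = np.exp(-t ** 2 / (2 * sigma_t ** 2))
    carrier = np.cos(2 * np.pi * freq * (r - radius) + phase)
    return radial * angular * carrier

even = curved_gabor(phase=0.0)            # cosine cross-section
odd = curved_gabor(phase=np.pi / 2)       # sine cross-section, quadrature partner
print(even.shape, odd.shape)
```

The two filters form something like a quadrature pair along the curved contour, in the same spirit as the straight-line pairs in V1 and V2.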