So yesterday we did PCA — who did not finish the PCA exercise? Okay, so there are still quite a lot who did not. So what I'm going to do now is continue my lecture, and afterwards we'll have plenty of time to finish those PCA exercises, or to do demixed PCA, which I'm going to talk about now. Once you've finished the PCA exercise you should have results that look a little bit like this, okay? So here you have time. This is the projection onto the first principal component, and the different colors correspond to the different task conditions. Color in this case is the frequency F1, and dashed versus solid is the decision of the monkey — whether F1 was larger than F2 or not, okay? That is the type of plot you should get out of the PCA exercise. So the question is: what did you learn from this? There are some things we can see here that, at least to us, were a bit surprising the very first time we did this. One thing you see is that if you look at the first principal component, all of the different task conditions just overlap perfectly, okay? So it carries very little information about either the first stimulus frequency F1 or about the decision of the monkey. It is completely independent of what was happening in that particular trial — it's just locked to the rhythm of the task. And interestingly, the second principal component is similarly dominated by the rhythm of the task. In fact, what you'll also see is that most of the variance in the data is explained by these two principal components. So there's a lot of activity that is just related to the timing of the task. And I think it's an open question why that is the case, okay?
One possibility is always that, from the monkey's point of view, there are many other things going on that the experimenter did not monitor, and these components are reflections of that. Imagine, for instance — just to make something up — that the monkey synchronizes its breathing to the task, and there's a reflection of that, say, in the prefrontal cortex. Then you would expect to see some kind of rhythmic activity. There could be other things: maybe it's tapping its foot, or having other somatosensory experiences during the task, okay? What you see here are the weights of the principal components onto the neurons. So if you take the eigenvector — the axis in state space — and just plot the distribution of coefficients in that eigenvector, that's what you see here. It tells you how these principal components are distributed over the population. And one of the interesting things you see is that essentially every neuron is more or less a random combination of these principal components. Even if you look with clustering methods, et cetera, you don't really find distinct classes of neurons in this particular task in the prefrontal cortex. There's no class of neurons that only cares about the stimulus frequency F1, or a class that only cares about the decision — everything is very distributed. And that is something you don't just find in this data set; I think you find it in many data sets, okay? There's also a characteristic shape to these distributions: if you look at them more closely, they fall off exponentially on both sides, which would be called a Laplacian distribution, for instance. Now, why that is the case, I would say no one actually knows, okay?
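A minimal sketch of how one would look at these weight distributions — using synthetic data as a stand-in for the real recordings: run PCA and histogram the coefficients of an eigenvector across neurons.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 500                       # neurons, (time x condition) samples
X = rng.normal(size=(N, K))           # stand-in for trial-averaged activity

Xc = X - X.mean(axis=1, keepdims=True)
U = np.linalg.svd(Xc, full_matrices=False)[0]   # columns = PCA eigenvectors
pc1_weights = U[:, 0]                 # coefficients of PC1 across the N neurons

# the histogram of these N coefficients is what's plotted in the lecture;
# in the real data its tails fall off roughly exponentially (Laplacian-like)
counts, edges = np.histogram(pc1_weights, bins=20)
assert counts.sum() == N
```

With Gaussian stand-in data this histogram looks Gaussian; the point of the lecture's observation is that in real recordings it looks heavier-tailed than that.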
Again, it's curious that this happens not just in this data set but in many data sets — you always find this type of distribution of the eigenvector coefficients. So, to ask what we learned, I think it's useful to compare this again with the classical analysis I explained yesterday, which was just computing population averages. Remember that yesterday we checked each neuron for frequency tuning, or for decision tuning, then we got the class of neurons that was significantly frequency tuned, or the class that was significantly decision tuned, and we just took an average. And this is what you get. So I think one thing you learn from principal component analysis is that there's a lot more going on in the population — not least these strong components that are just related to the rhythm of the task, okay? In many respects, this simplified picture gives you the wrong picture of what is going on in the data, because it misses a lot of what's going on. So — there was a question about this, by the way. Okay, right. I have a specific view of why it is the wrong picture. When I was a postdoc with Carlos Brody, where I encountered this data for the very first time, Ranulfo Romo and Carlos and others had already analyzed the data a lot, from this perspective that there are classes of neurons. And so we developed a little model — just a two-neuron model, a little dynamical model — to explain this task, and it could reproduce these types of components. But we knew that there was other stuff going on in the data that didn't entirely agree with the model. So I got very curious and decided to run PCA on the data. And when I saw this, I was extremely frustrated.
Because our model would basically reproduce something like PC5, okay? And all the other stuff didn't occur in our model at all. So in that sense I had the feeling that, with this classical analysis, we had cheated ourselves into a very simplified view of what was going on in the PFC. And if you come from a theory perspective and you try to model something — if you don't know what's in the data, you're not going to model the right thing, okay? That's why I like to emphasize: first you really have to understand what's in the data. You have to have a representative picture of the data before you can make any reasonable modeling choices. Then you can still decide to ignore some of it — but at least you know that you ignored it, okay? So it's from that perspective that I say it didn't give us the right picture of what's going on in the data. And you're right, of course, that there are also some dynamics there — that's true. Still, I have to say that seeing these components didn't make me very happy either, because it's not like you look at these six components and say, okay, now that makes total sense. For me, it was basically nonsensical. So one of the things I tried to do back then was to say: well, these are six components; there are six coordinates in the space; it's really a six-dimensional subspace. And it's really a question of visualizing what is going on in that six-dimensional subspace. This is one particular projection of the six-dimensional data onto six axes, but you could choose any other set of six axes, okay? They would no longer be ordered by the amount of variance, but they might give you a different picture of the data. And each of these pictures may be interesting in its own right, okay?
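The point that any other set of six axes spans the same subspace can be made concrete. A minimal sketch with synthetic data: rotate the top principal axes by a random orthogonal matrix — the variance ordering is lost, but the reconstruction from the subspace is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 300))                # neurons x samples
X -= X.mean(axis=1, keepdims=True)            # center

U = np.linalg.svd(X, full_matrices=False)[0]
W = U[:, :6]                                  # top-6 principal axes

Q = np.linalg.qr(rng.normal(size=(6, 6)))[0]  # random 6x6 orthogonal rotation
W_rot = W @ Q                                 # a different basis, same subspace

recon_pca = W @ (W.T @ X)
recon_rot = W_rot @ (W_rot.T @ X)
assert np.allclose(recon_pca, recon_rot)      # identical reconstruction
```

Since `W_rot @ W_rot.T == W @ Q @ Q.T @ W.T == W @ W.T`, any rotated basis of the subspace reconstructs the data exactly as well; the choice of basis is purely about interpretability.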
And if you do that — if you mess around with those six dimensions and try different coordinate systems in that six-dimensional subspace — then at some point you can see that you can actually separate out coordinates that are only responsible for the decision, and coordinates that are only responsible for the stimulus. Okay? That was an interesting thing to see, because decision and stimulus are pretty mixed here, right? There's stimulus information here and decision information here in the third principal component; in the sixth principal component you have a bit of stimulus information and a bit of decision information; same in the fifth principal component. So it's similar to what we saw at the single-neuron level: the principal components show this kind of mixed selectivity. They care about the stimulus frequency and they care about the decision, but there's no component that says "I care only about the stimulus" or "I care only about the decision." Yet if you play around with the coordinate system, you find that such a coordinate system does exist. And that triggered the idea of developing a method that would do it automatically. I don't mind doing it by hand, but I noticed very quickly that if you're in a talk and you tell people "I messed around with the coordinate system until it looked right," it doesn't make them very happy, okay? So that led to the development of what we call demixed principal component analysis, and that's what I'm going to explain now. So let's look at this whole idea of dimensionality reduction again. We've said that you have the state space of neural firing rates — the firing rate of neuron one, neuron two, neuron three — and we imagine that, as the activity in the prefrontal cortex evolves, it traces out some trajectory in that state space.
And yesterday evening we saw a lot of nice examples of that. Now, if you have many of those trajectories, you can say that these trajectories span some type of subspace — or, if you want to be more fancy, a manifold, because subspaces are usually assumed to be linear and flat, whereas a manifold can also be curved, okay? You could imagine that you have many trajectories and they generally lie on some lower-dimensional manifold or subspace, yeah? And then the question you may want to ask is: it's kind of cool that they lie in this lower-dimensional subspace — that's what we saw with principal component analysis — but is there any meaning to the subspace? And what do I mean by meaning? Imagine, for instance, that specific movements on this manifold correspond to specific parameters of the task. Imagine that if you move in this direction on the manifold, that corresponds to the stimulus the monkey is experiencing changing — and only to the stimulus. But if you move in the orthogonal direction, then the thing that changes is the decision of the monkey. So now it's not just any coordinate system on that manifold; it's a very specific coordinate system where different coordinates correspond to different things that the monkey experiences or does, okay? In this case it's going to be stimulus and decision, but it turns out there are many things you can separate into these different directions. And one way of understanding what such a direction is, is to see it as a linear readout from the population: each direction is a linear weighted sum of the firing rates, and what you want this weighted sum to do is extract information about one of the variables in your task, such as the stimulus or the decision, okay?
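The "direction as linear readout" idea is literally one line of algebra; a minimal sketch with made-up population activity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 100                                     # neurons, time points
rates = rng.poisson(5, size=(N, T)).astype(float)  # stand-in population activity

# a "direction" in state space is just a weight vector over neurons;
# projecting onto it is a linear readout: one weighted sum per time point
w = rng.normal(size=N)
w /= np.linalg.norm(w)                             # unit-length axis

readout = w @ rates                                # shape (T,): the component's trace
assert readout.shape == (T,)
```

Every component trace shown in these plots — PCA or dPCA — is exactly such a `w @ rates` with some choice of `w`; the methods differ only in how `w` is chosen.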
So let's say that's actually what is going on in areas such as the prefrontal cortex, and maybe other areas. Then what you would really want is to find this coordinate system — not the principal-component coordinate system, okay? Because PCA just orders your axes by the amount of variance in the data, and that is not super informative. You would want to know which directions correspond to the stimulus, to the decision, to the reward — whatever else is going on in the task. First, I want to point out that not just PCA but many dimensionality reduction approaches are based on this idea: you take the neural activities of your neurons and project them down. And a projection can also be understood, intuitively, as a linear readout — a linear weighted combination of your firing rates, which is a way of reading out information. So what you end up with, from these neural activities, is a bunch of traces — these linear readouts are the components. And what we ideally want, just to show it graphically in a different way, is to go from these neural activities, which are very mixed with respect to the task parameters, to readouts that are no longer mixed: a readout that only carries information about the stimulus, one that only carries information about the decision, et cetera. That's basically what we want. But that's only goal number one, because that, in some sense, is only decoding. Goal number two is that we want to be representative of what's going on in the data. And that again is, I guess, my perspective as a theorist: I'm not happy if someone tells me "I could decode the information," because then I don't know what else is in the data. Is that everything that was in the data?
And so, to guarantee that, we're going to impose — just as in principal component analysis — that from these readouts you can actually reconstruct the full neural activity. That is the PCA-like part: we create a bottleneck that compresses the data in a way that lets us reconstruct it, but we want the compression to relate to the task parameters that we cared about when we designed the task. Another way of saying why it is a good thing to want to reconstruct everything: if the brain fired a spike, it was probably for a reason. Every spike matters, and we don't want to throw away 90% of the spikes just to interpret the part we're interested in — we want to account for everything. I had the paper posted somewhere — I sent it to Martin, so people may have had access to it — but there's this paper, "Demixed principal component analysis," from eLife 2016, that explains the method in detail. I'll give you some of the core intuitions now. So here's the core intuition for how you can find these readouts that reconstruct everything and only care about certain parameters. We'll do it in a very, very simple toy example. Imagine a task where there are three stimuli, those three stimuli evolve over time, and you record the firing rates of, say, two neurons. Here you have stimulus one, and you see that both neurons' firing rates change; here you have stimulus two, both neurons' firing rates change; and this is stimulus three. And there are basically five time points here. What we're trying to demix here is just the stimulus against time. I would have liked to show you stimulus, decision, and time, but we'd get into dimensionality problems — that's why we have to work with a simple two-dimensional example. So let's first look at how some classical methods work.
One method we haven't covered yet, but which is very classical, is called linear discriminant analysis. What linear discriminant analysis tries to do is find a projection of the data that separates the stimuli as much as possible. And visually you can see that this is a really good axis — this would be the axis for linear discriminant analysis, or LDA. If you project the firing rates onto this axis, stimulus one falls here, stimulus two falls here in the middle, stimulus three falls here, and they're nicely separated along this axis. So that's an axis that nicely separates the three stimuli. However, it's not an axis that allows you to reconstruct the data very well. Part of the reason is that, if you look at the data, you may have noticed that stimulus one and stimulus two are very close together, and stimulus three is out here — stimulus three is very separate. On the LDA axis, though, they're all roughly equally spaced. So you have obviously lost information by using this particular axis. And there's a general statement about decoding: whenever you decode information, there's always the danger that you lose a lot of the rest of the information. If you decode one bit of information, you may lose everything else. That's what's illustrated here: you lost the fact that stimulus three was way out there, and in turn that means you cannot reconstruct the data very well. So LDA allows you to separate the three stimuli, but it's not representative of what is going on in the data. PCA, on the other hand, is different. PCA really allows you to reconstruct the data. Here you have the PCA axis for this example, and if you project all the points onto it, you can see that the projected points stay very close to the original data points — so you're very representative of the original data. But there's a downside, and the downside is that you fail to discriminate the three stimuli.
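This LDA-versus-PCA trade-off can be reproduced numerically. A sketch under the toy setup just described (2 neurons, 3 stimuli, 5 time points each; the data here is synthetic, chosen so that stimuli one and two lie close together and stimulus three is far away):

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [1.0, 0.5], [6.0, 5.0]])
X = np.vstack([m + 0.3 * rng.normal(size=(5, 2)) for m in means])
labels = np.repeat([0, 1, 2], 5)
Xc = X - X.mean(axis=0)

# PCA axis: direction of maximal variance (best 1D reconstruction)
pca_axis = np.linalg.svd(Xc, full_matrices=False)[2][0]

# LDA axis: maximizes between-class over within-class scatter
Sw, Sb = np.zeros((2, 2)), np.zeros((2, 2))
gm = X.mean(axis=0)
for k in range(3):
    Xk = X[labels == k]
    mk = Xk.mean(axis=0)
    Sw += (Xk - mk).T @ (Xk - mk)
    Sb += len(Xk) * np.outer(mk - gm, mk - gm)
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
lda_axis = np.real(evecs[:, np.argmax(np.real(evals))])

def recon_err(axis):
    """Reconstruction error when keeping only the 1D projection onto 'axis'."""
    a = axis / np.linalg.norm(axis)
    return np.linalg.norm(Xc - np.outer(Xc @ a, a))

def fisher_ratio(axis):
    """Class separation (between / within scatter) along 'axis'."""
    a = axis / np.linalg.norm(axis)
    return (a @ Sb @ a) / (a @ Sw @ a)

# PCA reconstructs better; LDA separates better — the two goals conflict
assert recon_err(pca_axis) <= recon_err(lda_axis) + 1e-9
assert fisher_ratio(lda_axis) >= fisher_ratio(pca_axis) - 1e-9
```

Both inequalities hold by construction — each axis is optimal for its own criterion — which is exactly the tension dPCA tries to resolve.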
Because while stimulus three is clearly up here — it's very separate — stimulus one and stimulus two now overlap. If you look at this projected component, you see a lot of mixture between stimulus one and stimulus two. So the question is: can we get the best of both worlds — both decode something and be representative of the data? The key idea is that you change things a little, so that you find a compromise between these two conflicting goals. Here, for instance, I'm showing you the dPCA decoder axis. This axis is a compromise: on the one hand it still allows you to separate the three stimuli; on the other hand, you can see that the red stimulus is now out here and the blue and green stimuli are up there. Now you may say: but these projected points are very far away from the red points — how can we reconstruct the data properly? Well, it turns out that PCA makes a very specific assumption connecting the step of projecting down and the step of mapping back up into the original space: in PCA those two steps are tied together. But you can loosen that. You can take one linear mapping to project down and a different linear mapping to project back up. Visually: this was the dPCA decoder axis, and now we're going to find another axis — another linear mapping — that allows us to reconstruct the original data. Except we're not trying to reconstruct everything, because at this point we only care about the stimulus. All we try to reconstruct are the means — the mean of each of the three stimuli. So here you have these three stimuli, one, two, three, and the firing rate.
We're going to take the mean firing rate for each stimulus, and that's going to be the target of the reconstruction. Then you have a different axis, which we call the encoder axis, that maps you from that 1D subspace back to the neurons — this is the axis along which you encode the information. And the distance measure you have is the difference between these means and these different points. So that's explanation one, to give some intuition for what demixed principal component analysis does; I'll give another explanation in a second. First I want to show a little bit how this works mathematically. What we do is change the PCA loss function, in that we don't try to reconstruct the full data — we will eventually, but not in this single step. In this single step we only try to reconstruct these stimulus means: the mean firing rates for the three stimuli, which we call marginalized averages. These are the three points, and they sit in this matrix. Then we take the original activities and put them through the decoder and the encoder — two linear mappings, similar to PCA, except they can be two different linear mappings; they don't have to be the same. And then we take this difference and minimize it. That can be done through a method called reduced-rank regression, where X are the regressors and X_s is what you're trying to predict. Reduced-rank regression is a nice method because it has a simple closed-form solution through a singular value decomposition — there's no tricky business; you know you always find the best solution. And then, as promised, in the end we're going to reconstruct the full data. So far we've only reconstructed the stimulus averages, and we do the rest by decomposing the data into marginalized averages — a decomposition that actually existed in the statistics literature before.
It's part of the MANOVA literature — they also decompose the data like this. So basically what we do is take the data X — these would be the 15 points here, for the three different stimuli over time. First we average over the three stimuli; that gives us the average profile over time, which is this particular trajectory. Then we add the stimulus means — the mean firing rates for the three stimuli; those are the blue, green, and red points here. And then there's some remainder term, which in this case we just call noise, that captures whatever this decomposition couldn't account for. The interesting thing about this decomposition is that it's actually unique: you can go from here to here, and vice versa — from these three pieces you get back to the original data. The way the reconstruction works in dPCA is that we do a separate demixed principal component analysis, if you want, on each of these marginalized averages, and we try to reconstruct those marginalized averages through different sets of axes. Afterwards you can add them up and you get back the original data, as closely as possible. So the actual loss function turns out to be this one: you try to go from the data X, through the decoder for time and the encoder for time, to the marginalized average for time; from the original data, through the decoder for stimulus and the encoder for stimulus, to the marginalized average for stimulus; and the same for the remainder term, the noise. Okay? That's what we call demixed PCA. And here's another way of looking at the algorithm that goes through it more step by step. Imagine you have a single neuron, and this is its PSTH, the way we computed it yesterday afternoon. Here you have time, and this is the firing rate, from Romo's data.
The colors are the frequency F1, and solid versus dashed lines are the decisions. What we're going to do is take this PSTH and decompose it into these marginalized averages, and that works as follows. First we just take the overall average — that's what we did yesterday when we centered the data. The overall average is just a number, so I illustrate it here as a flat line; you can subtract that number. Then you can average the firing rate at each time point: at each time point here, say at time two, you actually have 12 lines, for the six stimuli and the two decisions, and you just take the average of those 12 points. Doing that for all the time points gives you what we call the condition-independent part, because we averaged out all the task conditions. And as you see in these individual neurons, there's some rhythmic activity left in some sense — here you see that the neuron fired more during the first and during the second stimulus, for instance. You can subtract that too. Next, you can average out the decision: we go through this PSTH again, after subtracting those two terms, and now we average out the decision. There are only two decisions in this case, so for each frequency there are two lines and we average over those two points. That gives us what we call the stimulus-dependent part. But we can also do it the other way around: subtract those two parts, but now average out the stimulus. Again we go to every time point and say, well, there are six stimulus frequencies for each decision, and we average over those. That gives us the decision-dependent part. And then there will be something left over that depends on both the stimulus and the decision — whatever you couldn't capture with this averaging procedure — and that's the interaction-dependent part.
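The averaging procedure just described can be written out in a few lines of numpy (synthetic numbers standing in for a real PSTH):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for one neuron's PSTH:
# 6 stimulus frequencies x 2 decisions x 50 time bins
psth = rng.normal(size=(6, 2, 50))

grand = psth.mean()                                  # overall average (a number)
cond_indep = psth.mean(axis=(0, 1)) - grand          # average out all conditions
stim = psth.mean(axis=1) - cond_indep - grand        # then average out the decision
dec = psth.mean(axis=0) - cond_indep - grand         # ... or average out the stimulus
interaction = (psth - grand - cond_indep
               - stim[:, None, :] - dec[None, :, :]) # whatever is left over

# the decomposition is exact: the four parts plus the mean sum back to the PSTH
total = grand + cond_indep + stim[:, None, :] + dec[None, :, :] + interaction
assert np.allclose(total, psth)
```

Because each part is defined by averages, the result does not depend on the order of subtraction — which is the uniqueness property mentioned next.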
I explained it in this sequential way of subtracting things because that's what you do in the end. But it's important to note that the decomposition is unique — it doesn't depend on the order in which you do things; there's a unique set of formulas that describes it. So you do this for neuron number one, then for neuron number two, then three, four, five — for all the neurons in your data set you do exactly this decomposition. And here is what dPCA does, compared to PCA: in the end you look at the condition-independent part, the stimulus part, the decision part, and the interaction part separately. Looking at them combined or separately is the same thing, and separately is easier, so that's what we do. Let's focus on the stimulus-dependent part, this one here. The goal of dPCA will be to start from the original data, find a decoder that projects everything down, and then map everything back up so that you reconstruct only the stimulus-dependent part. That's what's different from PCA: in PCA you're trying to reconstruct the whole data; here, for the stimulus, we only try to reconstruct the part that depends on the stimulus. If you do that, you build this readout or decoder that maps things down — that gives you what we call the demixed stimulus component — and then you have an encoder that maps things back up. Here in the middle you find the components, using reduced-rank regression, that allow you to do so. In this case these are the two main components that allow you to go from here to there. So the goal of dPCA is to minimize this distance: this is the reconstruction, and that is the stimulus-dependent part that we extracted from the data.
This reconstruction term should be as close as possible to the stimulus-dependent term. And just as in PCA, the more components you have, the better it gets; you can sort them by how much variance they explain, and that's usually what we do. So now we'll go to the exercises. For those of you who are still doing PCA, I would strongly recommend finishing the PCA exercise first, because nothing is going to get easier with dPCA, okay? I think it's good to have gone through it once, and Son and I will walk around and help you. And for those of you who have already finished PCA, we're just going to let you jump into the cold water. You can again start with the data in the mat file romo-allpsdh.mat. Then I would recommend reading the help for the function dpca.m. It's also on the Git — the Git has the MATLAB version of the dpca package. There's also a Python version, by the way, but it may not be on the Git; it is on GitHub, though, so you may have to download it from GitHub. Sunder is the expert on the Python version. The MATLAB version is more plug-and-play, and the Python version is more hardcore, so the Python people will be more challenged, basically. What we'll ask you to do first is just separate time from condition. Ignore the fact that there are frequencies and decisions; just say there are 12 conditions and there's time, and that's what you're going to try to separate — just time from condition. Then you can try to produce some plots, and for those of you who manage to do this before 12 o'clock, there'll also be the challenge of reorganizing the data matrix such that the conditions get separated into stimuli and decisions, okay? And then the explanation of what all of this means I'll provide at a quarter to 12. Well — no, wait a second. I'm not going to provide the explanation of what all of this means.
I'm going to provide what we think all of this means, at a quarter to 12. Sorry if I promised too much. So, many of you have managed to go through the dPCA and generate some plots, both in MATLAB and Python. For the MATLAB people I have a solution; the Python people will have to coordinate, because some of you have plotted it, so the solutions are there — if you want a solution you have to ask your neighbors; some of you had solutions. For the MATLAB plot there's a file called ROMO DPCA1.m, okay? It does the task I gave you, in terms of separating out the condition or category — in terms of stimulus and decision — from the time or condition-independent component. It's a stripped-down version of the code, basically. You combine these parameters — many of you did that — into stimulus, stimulus plus time, and then this is time, the category-independent part. Then there's this problem that many of you bumped into: dPCA can overfit the data, because underlying dPCA is a regression problem — reduced-rank regression — and regression tends to overfit in these very high-dimensional spaces. There are two ways of regularizing this. One is to just turn on a regularization parameter, lambda, and I think I recommended that to most of you. Another way of regularizing is to use what you know about the noise in the data: the PSTHs were computed as means, but you don't just get a mean when you compute a PSTH, you also get a variance — so you know how noisy your PSTHs actually are. You can use that noise, which is called Cnoise here — it's a matrix — as a way of regularizing dPCA as well. If you run that, you get this type of plot, which I think many of you had — I guess it's not super fast. Then there's also code called ROMO DPCA2, which separates out the stimuli and the decisions. Then you have the thing that is more or less what we did in the paper, except it's
only for one of the monkeys — in the paper we actually have the two monkeys, so that's why it's going to look a bit different from the paper: you only see one of the two monkeys. Now, if you use the two monkeys and you separate out stimulus and decision, this is basically what you get. That's now our representation of the data. Here you have the task: the first stimulus frequency, a delay of 3 seconds, then the second stimulus frequency. And we separate out these condition-independent components — these are actually 12 lines that just lie on top of each other. That means there's a lot of activity that is just related to the rhythm of the task, and it actually dominates the overall activity in the prefrontal cortex. Here are the different components, and this is the variance that each component explains. You see that the first four components are all condition-independent — gray here means condition-independent — so they don't tell you anything about either the stimuli or the decisions. Component number 5 is the first one that carries some information about the task, in this case about the stimulus, and component 6 is the first one that tells you something about the decision. The interesting thing here is that you can read out stimulus information separately from decision information. Even though everything was mixed at the level of single neurons, once you look at the population you can separate these two bits of information again. We then applied this method to many data sets — but first I want to say something about these condition-independent components: what do they correspond to? One possible explanation — it's usually not super popular with my experimental colleagues, but as a theorist I kind of like it — is to imagine that you have this manifold where the firing rates change, and there are some directions along the manifold that you control experimentally, because you present the stimulus and you measure the decision of the monkey,
But there may be all kinds of other directions that you do not control as the experimenter — say the breathing, or whatever else the monkey experiences — and anything in that sense that is task locked, that will, even if weakly, vary with the timing of the task, will show up in these condition-independent components. Then another thing you can do is really just visualization. You can think of these as the three stimulus components that we extract — that's one way of looking at them — but you can also think of them as trajectories. So this is the same information now plotted as trajectories over time: there was the F1 axis, now there's the delay period, and then when the second stimulus frequency comes on, the trajectory moves along this axis. So there are different ways of visualizing the data. The main reason I want to show something like this is that, yes, you get these components, but it's really just one way of visualizing the data, and I would always guard against fixing on one way and that's it; it's important to look at the data in many different ways. Then we analyzed some other data. I'm going to point out this one because I think it has some interesting results about the interaction terms. Here we analyzed an olfactory discrimination task in a rat, in this case from the Mainen lab. What the animal basically has to do is sniff at an odor port; it sniffs one of two odors, A or B. If it's odor A it has to move to the left, if it's odor B it has to move to the right, and then it gets a reward. So that's the task, which creates extra problems, because every trial has a different timing. There are different ways you could handle dPCA in this case: you could just cut the data around the different events of the task. We did something more radical: we re-stretched each trial to align it properly. It turns out that it doesn't really matter which way you do it; you more or less see the same thing. So we re-stretched each trial to the common structure of the task.
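One hedged way to implement this re-stretching is piecewise-linear time warping with `np.interp`: each trial's event times (odor on, movement, reward) are mapped onto a common reference timeline and the trace is resampled. The function name, event structure, and numbers below are invented for illustration, not the analysis code from the paper.

```python
import numpy as np

def stretch_trial(rate, events, ref_events, n_samples=100):
    """Piecewise-linearly re-stretch one trial's firing-rate trace so that its
    event times land on the reference event times."""
    t = np.arange(len(rate))
    # Map this trial's time axis onto the reference timeline
    warped_t = np.interp(t, events, ref_events)
    # Resample the warped trace on a common grid
    common = np.linspace(ref_events[0], ref_events[-1], n_samples)
    return np.interp(common, warped_t, rate)

rng = np.random.default_rng(2)
# Reference event times: trial start, movement onset, reward, trial end
ref = np.array([0.0, 40.0, 70.0, 100.0])

trials = []
for _ in range(5):
    L = rng.integers(80, 140)            # every trial has a different duration
    ev = np.sort(rng.choice(np.arange(1, L - 1), 2, replace=False)).astype(float)
    events = np.concatenate([[0.0], ev, [L - 1.0]])
    rate = rng.normal(size=L).cumsum()   # some smooth-ish stand-in trace
    trials.append(stretch_trial(rate, events, ref))

aligned = np.stack(trials)               # trials now share a common time base
print(aligned.shape)  # (5, 100)
```

After warping, condition averages and dPCA can be computed on the common grid; as noted above, cutting around events gives more or less the same result.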
Then there is the movement period, and then the animal goes into the reward port; there's a bit of an anticipation period, and then you either get the reward or you don't, depending on whether you got it right or not. And again, just as in the data from Romo, you see that the main components here are actually not related to the olfactory stimulus, the smell; they're just related to the rhythm of the task. Then here you have information about the stimulus, the olfactory stimulus, and here you have information about the decision. The decision, of course, is the animal moving to the left or to the right, so those could also just be movement signals — it could just be the animal moving left or moving right that you see here — and then you have these higher-order components. Now, the interesting thing is that you also find this interaction component, where certain combinations of stimulus and decision go up and other combinations go down. But this is a deterministically rewarded task, so if you know the stimulus and you know the animal's decision, you actually know whether the animal is going to get the reward or not. And it so happens that what this interaction component extracts are just the rewarded conditions, and, interestingly, its timing coincides exactly with the point when the animal is getting the reward. So in this case the interaction component is not truly an interaction between stimulus and decision; it's just the reward that the animal is getting. There's a separate axis in the population activity space that now lets you read out whether the animal got a reward or not. Then, one of the things that I find good about these dimensionality-reduction methods in general is that they also allow you to compare things across tasks, across animals, etc. So one thing we did, for instance, is take another data set from the Mainen lab.
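The decomposition behind stimulus, decision, and interaction components can be written out directly. This is a minimal sketch of the dPCA-style marginalization on made-up condition-averaged data (dimensions invented): the data are split into averages that depend only on time, only on the stimulus, only on the decision, or on the stimulus–decision interaction — which, in this rewarded task, is exactly where the reward signal ends up.

```python
import numpy as np

rng = np.random.default_rng(3)
N, S, D, T = 20, 2, 2, 50   # neurons, stimuli, decisions, time points
X = rng.normal(size=(N, S, D, T))  # trial-averaged rates per condition

# Marginalized averages: each term depends on only one set of task parameters
m_time = X.mean(axis=(1, 2), keepdims=True)       # condition-independent
m_stim = X.mean(axis=2, keepdims=True) - m_time   # stimulus
m_dec  = X.mean(axis=1, keepdims=True) - m_time   # decision
m_int  = X - m_time - m_stim - m_dec              # interaction (here: reward)

# The marginalizations add back up to the original data exactly
recon = m_time + m_stim + m_dec + m_int
print(np.allclose(recon, X))  # True
```

dPCA then fits a separate low-rank readout (via reduced-rank regression) to each of these marginalized averages, which is how the variance pie chart per component is obtained.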
In that data set they changed the task, in the sense that the animals did not have to discriminate two odors but had to categorize a mixture of odors. So there would not just be odor A and odor B; there would also be mixtures of A and B, and the animal just had to decide which of the two was stronger. And this is what you find. Here are the different odor categories — in this case odor strength, so the odor on the right or the left has different mixture ratios — and then, depending on left or right choice, we have solid lines or dashed lines, and these are the components you find. Here we just show that we reproduced something that had previously been found about confidence in this task. But what I want to show you is that, even though this is a different task in different animals, it is sort of surprising that the components don't really change that much. The only thing that changed is the type of stimulus, yet I would say there is a surprising overlap in the overall set of components you get, and that is something one could try to quantify. But that is one of the advantages of these dimensionality-reduction methods: you get a quick view of the whole data set, and you can then easily compare across animals, across changed tasks, etc. Now, one of the questions I got earlier was: okay, but what does all this mean, that we have these linear readouts? I'm apparently not quite answering that yet, but there is one more point I want to make, because I think I mentioned it earlier: you can look at the distribution of encoding weights. So for each neuron, how much does it participate in each of these components? That is what is shown here for the four tasks, and it's sort of curious that you get these very similar distributions in all cases; they all have exponential tails on both sides. So for the tasks we've looked at, we've never found any kind of categories of neurons.
There are no decision neurons or stimulus neurons or anything like that; we always find these distributions, and the distributions always have this sort of Laplacian shape. That is a curiosity that, I guess, we'll have to explain at some point: why does it look like this, given that it recurs in many different tasks? Maybe the explanation is trivial, right, but we still need one. Okay, so what does this all mean? Here's a simple schematic of what it could mean. Imagine you have something like a feed-forward network: some signal goes in — could be a visual stimulus, or maybe in this case a somatosensory stimulus — then a second layer of neurons, a third layer of neurons, et cetera; there could be a lot more layers, these days people like lots of layers. So this would be a standard feed-forward network. Now, a feed-forward network can also be understood as a visualization of what's happening in a recurrent network. The only thing you have to think about is that these are not layers but time steps: this would be time step one, and then all the activity of your neurons influences the activity of the whole network one time step later, which then influences the activity of the whole network one time step later, and so on. This is what people mean when they say a recurrent network is like a feed-forward network unrolled in time. So with this little graph I want to illustrate both what happens in feed-forward networks and what can happen in a recurrent network. Now, if you remember the picture we had for dPCA, you project the data down and move it back up; that's what we did. So let's take that literally for a moment and suppose the brain actually works this way. Then you'd have to think of these connections here as low-rank connections that project things down and then move them back up, the way you compute F times D in dPCA.
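This "project down, project back up" picture can be written as a tiny numpy sketch with invented dimensions: a recurrent network whose connectivity is the low-rank product F·D. Running it for T steps is exactly a T-layer feed-forward network with tied weights, and the whole computation can equivalently be tracked as a trajectory jumping through the compressed subspace.

```python
import numpy as np

rng = np.random.default_rng(4)
N, k, T = 60, 3, 10
F = rng.normal(size=(N, k)) / np.sqrt(N)   # encoder: subspace -> neurons
D = rng.normal(size=(k, N)) / np.sqrt(N)   # decoder: neurons -> subspace
W = F @ D                                   # low-rank recurrent connectivity

x = rng.normal(size=N)
xs = [x]
for _ in range(T):
    xs.append(W @ xs[-1])   # one recurrent update = one unrolled layer

# Equivalent computation in the k-dimensional subspace: node-to-node jumps
z = D @ x
for _ in range(T - 1):
    z = (D @ F) @ z         # latent dynamics, a k x k map instead of N x N

print(np.allclose(xs[-1], F @ z))  # True: the neurons just re-express the
                                   # trajectory living in the subspace
```

The equivalence holds because W^T = F (D F)^(T-1) D: after the first step, all the action happens in the k-dimensional latent space, which is the sense in which the subspaces, not the neurons, carry the computation.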
And what that means is this: if you think of it as a feed-forward network, you project things down into these subspaces — which don't have any kind of physical realisation, they're just subspaces that we extracted from the data — and then from these subspaces you project things back up, and then back down, and back up again. So this would be the way you would have to understand what is going on in the network, if we take these dimensionality-reduction methods, such as dPCA, literally. And these different components here, they could be the stimulus component, the decision component, etc. What this fundamentally says is that there is another way of looking at what the network computes: not going from one layer of activity to another layer of activity, but jumping from node to node, from one compressed subspace to the next, because that is then the actual computation the system is doing. It's not the actual neurons that show you what the computation is; it's what's going on in these subspaces. One thing you could also think is: well, if this is not a recurrent network but a feed-forward network, shouldn't you be able to see this if you record from one area and another area — say, one area here and another area here? That actually just came out today, so it's a bit of self-advertisement, but that's exactly what we did. We call it the communication subspace: the subspace along which two brain areas communicate. In this case it was a Utah array recording from V1, something like 100 neurons, together with tetrode recordings in V2, in the monkey, and we basically used reduced-rank regression to predict the V2 activity from the V1 activity, with a bottleneck in the middle.
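The signature of such a communication subspace is that prediction performance saturates at a low rank. Here is a hedged numpy sketch of that analysis on synthetic data (not the actual V1/V2 recordings; dimensions and noise levels are invented): the "V2" population is driven through a rank-2 channel from "V1", and rank-constrained regression recovers that.

```python
import numpy as np

rng = np.random.default_rng(5)
n_v1, n_v2, T, true_rank = 40, 30, 2000, 2

# Source ("V1") activity, and target ("V2") activity driven through a
# rank-2 channel plus private noise
X = rng.normal(size=(n_v1, T))
C = rng.normal(size=(n_v2, true_rank)) @ rng.normal(size=(true_rank, n_v1))
Y = C @ X + 0.5 * rng.normal(size=(n_v2, T))

B_ols = Y @ X.T @ np.linalg.inv(X @ X.T)   # full linear regression
U, s, Vt = np.linalg.svd(B_ols @ X, full_matrices=False)

def r2(rank):
    # Rank-constrained predictor: project the fit onto its top directions
    B = U[:, :rank] @ U[:, :rank].T @ B_ols
    res = Y - B @ X
    return 1 - (res ** 2).sum() / ((Y - Y.mean()) ** 2).sum()

scores = [round(r2(r), 3) for r in (1, 2, 3, n_v2)]
print(scores)  # performance saturates at rank 2: the communication subspace
```

Going beyond the true rank of the channel buys essentially nothing, which is how one reads off the dimensionality of the bottleneck between the two areas.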
Now, the bottleneck doesn't have a direct physiological representation, because it's a linear readout of the neurons, but you could imagine it as something like the next layer. I mean, that's what a neuron does, it takes linear readouts: to a first approximation, a neuron passively integrates the spike trains arriving on its dendritic tree. So what this basically says is that there is a specific way in which these readouts are organized, such that effectively there is this low-rank bottleneck. Okay, how can we compare these subspaces? That I didn't tell you; I didn't tell you how to compare them. The only thing I said is that if you have a trajectory in this subspace, and then another one in that subspace, then that's the computation the system is doing. That was my claim, our interpretation of the data, but I haven't told you how to track the computation; that's up for future research. I just claim that's the computation — one way of interpreting why you find these low-dimensional subspaces. Here's another way, very quickly, to show why that could be an interesting scheme. Imagine you have some source area, such as V1, and it projects to two target areas, A and B. One problem you face is that you want to separate information that flows to A from information that flows to B, because you don't want to send everything to both areas. So imagine that the neurons in areas A and B — the B neurons shown in red here, the A neurons in green — read out different directions of the space of source activity. This is the space of the source activity, and each neuron reads out a different direction, and together they cover this whole space. What that would mean is that if you vary the source activity, both downstream areas are going to see what's going on in the source area.
In this case, because they read out in all sorts of directions, both the green and the red downstream areas will see activity that was going on in the source area. But if you assume you have these low-dimensional subspaces, you could imagine a scheme like this: downstream area B only reads out from this red subspace, and downstream area A only reads out from this green subspace. If you have that — different subspaces for the two areas — then you can vary the source activity in a way that only one of the downstream areas actually sees. For instance, if the activity is orthogonal to this red plane, only the green downstream area is going to see what's going on in the source area; area B sees nothing. And vice versa: if you only vary activity orthogonal to the green subspace, only the red subspace is going to see anything. So it's a way of routing information between different areas, just as another example of why these low-dimensional subspaces could be useful. And then, I promised I was going to finish five minutes early — what time is it now, five to twelve? So this slide will be very fast. We did PCA; there are obviously other methods. There's factor analysis, which Alexa talked about yesterday, and Gaussian process factor analysis. There's the idea that you don't just have these latent variables and projections, but you try to fit linear dynamical systems to them; these are sometimes called latent dynamical systems. There are also nonlinear methods, which people have already asked me about. Nonlinear always sounds fancier — as if using a nonlinear method means you're somehow doing something better — but nonlinear methods are actually very dangerous for electrophysiological data, because they do not deal very well with noisy data. They're good if your data is more or less deterministic, but what we record in the brain is very rarely deterministic.
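The routing argument just made can be demonstrated in a few lines. This is a toy sketch with invented dimensions: two downstream areas read out from orthogonal two-dimensional subspaces of a source area, and a source pattern confined to one subspace is visible to one area and invisible to the other.

```python
import numpy as np

rng = np.random.default_rng(6)
n_src = 30

# Two downstream areas read out orthogonal 2-D subspaces of the source area
Q, _ = np.linalg.qr(rng.normal(size=(n_src, 4)))  # orthonormal directions
read_A = Q[:, :2].T   # area A's readout (the "green" subspace)
read_B = Q[:, 2:].T   # area B's readout (the "red" subspace)

# A source pattern inside A's subspace, hence orthogonal to B's subspace
msg_to_A = Q[:, 0] * 5.0

seen_by_A = np.linalg.norm(read_A @ msg_to_A)
seen_by_B = np.linalg.norm(read_B @ msg_to_A)
print(round(seen_by_A, 1), round(seen_by_B, 1))  # 5.0 0.0
```

By steering activity into or out of a readout subspace, the source area can target one downstream area without broadcasting to the other, which is the routing scheme sketched on the slide.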
So if you apply a nonlinear method, you really want to know what you're doing; you really need to understand that method very well. That's my three cents of advice. And so, that's it. As I said, our best guess for population representations is linear readouts from the population, which give us these subspaces. dPCA then seeks a set of decoders that provide readouts for individual task parameters, but without losing the essential features of the data, so that the readouts remain representative of what's going on in the data set. It rests on a decomposition of the data into marginalized averages, each of which is fitted to the actual data using reduced-rank regression. And application of the method can highlight similarities across different tasks and across different cortical areas. Thanks.