because, again, this is for you. So I'm going to start, like I did yesterday, by rewinding a bit. This was what I was trying to give you in the three lectures. I started on Monday by reviewing the areas of the brain that are involved in supporting visual object recognition, why we think that, and how we think about how the neural activity relates to the behavior. That's what I spent most of day one talking about: this place called IT cortex, and linear decodes. I spent most of yesterday telling you how you can take certain deep neural networks and use those as approximate models of what's going on in the ventral stream, and I showed you some of the evidence for why we think that's true. And next I'm going to say: OK, these are some approximate models of both the encoding and the decoding. I want to talk today about how these models might be incorrect as they currently stand, how we might think about improving them, and what we can maybe do with them even as they stand right now. And this will hopefully relate to the question that Professor Zoccolan asked at the end last time, which is: what do any of these models have to do with us understanding how the brain is processing visual information? So we may have time to talk about that.

So like I did yesterday, I'm going to review in one slide what I told you. Yesterday I reviewed what I told you on Monday; today I'm going to review in one slide what I tried to tell you mostly yesterday. The big-picture take-home that I wanted to give you yesterday was illustrated with this neural network model that was called HMO, but again, you don't need to know the name of the network. This is a deep neural network. And it was nice that you had the lecture on perceptrons. Unfortunately, you had it after my lecture, because I realized you guys barely knew what a neural network was. So I hope now that you've had that lecture on perceptrons, which was very nice, you have a better sense of what these things are.

So inside these planes, each of these feature maps, as you heard yesterday, is a set of simulated neurons, where each of them is a perceptron: you have a weighted sum of inputs that gives rise to an output, which is then propagated forward after, say, a nonlinearity. And there's a whole plane of them of a certain type, and there are four planes here, so four types here. The plane illustrates that they're locally performing the same operation across visual space. So that's what is in here. And then it's all feedforward operations to here, feedforward to here. So this is a feedforward network of these stacked perceptrons. And it's stacked like this, so it's called deep, because it's more than one layer.

What I tried to tell you yesterday is that the structure of this, the fact that you have these many layers, and roughly their sizing and their internal components, is an approximation of this architectural family of the ventral stream. So data from neuroscience drive those loose architectural constraints. And then I said, hey, we think this part of the brain does something about recognition, so let's optimize the parameters in here to do recognition. In machine learning, as you heard in that lecture yesterday, we call this the so-called cost function. So the cost function here is: separate a bunch of categories.
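To make that concrete, here is a minimal sketch, in plain numpy, of that "stack of perceptrons" idea. Every name and size below is invented for illustration; a real convolutional layer would also share one small filter across visual space (the "plane" idea) rather than use a full dense matrix.

```python
import numpy as np

def layer(x, W, b):
    # One set of perceptrons: weighted sum of inputs plus a bias,
    # passed through a nonlinearity (here a ReLU).
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.random(256)                      # a flattened input "image"
sizes = [256, 128, 64, 10]               # rough layer sizing, purely illustrative

# A "deep" network is just these perceptron layers stacked feedforward.
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
for W, b in params:
    x = layer(x, W, b)
print(x.shape)                           # (10,): one output per category
```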
And you can vary the number of categories; I think in this study we did something like 30 or so categories. And you say: I need you to be able to tell me these are boots and not people, and these are people and not boots. You do this with a large number of images, not just the four shown here. And then you optimize the parameters in here. As you also heard in the lecture yesterday, there are many ways you can optimize, and the most common in these kinds of networks now is called stochastic gradient descent. You're basically computing the error on this cost function, asking how you can get better on it with respect to the parameters in the network, and you update those parameters, which are mostly the weights in the network, to try to get better on that. That's what I meant here by this optimizer, and it's highly debated how biological things like gradient descent are. In my framing, I presented it to you as: I'm not going to assume this is biological at all. It's just a way to optimize the parameters of a network that's a stacked set of perceptrons to do a particular task. So we have neuroscience, some kind of science, and some engineering just to get it to work.

And then the main message I tried to give you yesterday is that after you do this, you get what I'll call a model. Each time you make an optimization run, you get one model. It's a hypothesis, to me, of how this part of the model might be approximating the part of the brain that you're intending it to model. And I call this "evolution" in the computer. You could call it development in the computer; you could call it both if you want. It's probably neither, as an exact model of either evolution or development, which is why I put "evolution" in quotes. But I want you to understand that each time you optimize, you get a model.

So when you say there's a deep network for vision, it should be a specific deep network. If you say there is some deep network that does vision, of course that's true, because the brain is a stacked set of neurons with feedforward and feedback connections. So that's not a specific hypothesis; it's just a claim that the brain is a deep network, which we already clearly know. The question is: what is the deep network? The job is finding the correct artificial neural network. This is a specific artificial network once it's trained; it happens to be a feedforward one, trained in a particular way with a particular architecture. And again, you can change these variables, especially the architecture; you can change the cost function; you can even change the optimizer. That will yield a new hypothesis, a new model of what might be going on.

So I showed you the checking of one model against the brain by doing these comparisons right here. Before I remind you of that, I want to make sure people are on the same page, especially after the lecture yesterday, about what this thing is and how to think about these other components. Do you guys have questions about that? OK, everybody now sort of knows what a deep neural network is and at least has a sense of how you might optimize it. And then I also told you how you take these simulated neurons and map them to the brain. The reason I'm mapping to the brain is not as an engineering exercise, but to ask: is this actually a good hypothesis? So we need ways of checking the hypotheses against the data.
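As a toy version of that optimization loop: a stand-in architecture and random stand-in data, with only the roughly 30 categories taken from the talk; everything else here is assumption.

```python
import torch
import torch.nn as nn

# Stand-in network and data; the real study used many more images and a
# much deeper architecture. Only the ~30 categories figure is from the talk.
net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(),
                    nn.Linear(64, 30))
images = torch.randn(512, 1, 32, 32)      # stand-in image batch
labels = torch.randint(0, 30, (512,))     # "these are boots, these are people"

loss_fn = nn.CrossEntropyLoss()           # the categorization cost function
opt = torch.optim.SGD(net.parameters(), lr=0.01)  # stochastic gradient descent

for step in range(100):
    idx = torch.randint(0, 512, (64,))    # random minibatch: the "stochastic" part
    loss = loss_fn(net(images[idx]), labels[idx])  # error on the cost function
    opt.zero_grad()
    loss.backward()                       # gradient of the error w.r.t. parameters
    opt.step()                            # update the weights to get better
```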
And the way we did that, and I went through this quickly, so I'm going to say it again because it's also important to what I'll show you later today, is that you take a weighted combination of features in here, neurons in here, to try to predict a single neuron in here. And then you do that for each neuron that you sample. So if I record 100 neurons here, I'm going to build 100 weight vectors that are trying to be models of the mapping between these model features and each individual recorded IT neuron. So each time I record a new neuron, I need to optimize a set of weights that gives the best linear mapping from these features, which are nonlinear functions of the image, onto the actual neuron. And then I test the performance of that as predictions on held-out images. What I showed you is that when you do that, you actually do pretty well at explaining and predicting IT responses, and also, from the middle layers, pretty well at explaining and predicting V4 responses. I showed you this with one network that we had built, called HMO, but I also showed you how more modern neural networks from computer vision, like AlexNet and VGG, have internal hidden features that can do even slightly better than the model we had built, by the exact same process. And again, the main differences are a little bit of architectural change, essentially a little change in the exact cost function and the images that were used, and even some changes in the optimizer. So changes in all of those, but they still yield a model which is a pretty good approximation of what's going on.

Yes? Yes? So you say this predicts the IT neurons and the V4 neurons pretty well. What about if you switched them? Yes. And if I go back; I think you maybe weren't at the lecture. I wasn't at the lecture. Yeah, so I will show you. It's a great question, and I did show that, but I'll back up here. This is the slide you want, which is that not everything predicts everything else. Here, I'll show you; this is the summary slide right there. So this is the prediction of the top level of the model against the IT data, and the top level predicts IT better than the lower layers do. And then here's V4, and the top level is worse than the middle two layers. But I also like to remind people that these are both only about half of the explainable variance. So these were big improvements over a bunch of models we had at the time, but still only halfway to perfection. This is noise-corrected explained variance, so there is probably still a real lack of fit. Does this answer your question? And I wish your lecture had been in front of mine, because I think that's where I should have given that lecture. So I'm trying to connect those two.

So let me just jump ahead, since I showed you other ways to connect, and go on to what's wrong with these things. I mean, at some level I've already shown you some things that are wrong, and I gave this glass-half-full, half-empty point of view, which is that, hey, we've got 50% explained variance, but you could also say we've got 50% unexplained variance. And I was taking you through all the things; I've got to really just keep going here. All the behavioral data that we can't yet predict: we do pretty well, but not perfectly, and I can quantify all of this. Sorry, I went way back here. That's the half full, half empty. We've got to find the better ANNs.
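Here's a minimal sketch of that per-neuron mapping procedure on synthetic stand-ins. RidgeCV is just one reasonable choice of regularized linear regression, not necessarily the study's exact estimator, and the noise correction is omitted.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
F = rng.standard_normal((1000, 512))     # model features: 1000 images x 512 units
# Y stands in for 100 recorded IT neurons' responses to the same images.
Y = F @ rng.standard_normal((512, 100)) * 0.05 + rng.standard_normal((1000, 100))

F_tr, F_te, Y_tr, Y_te = train_test_split(F, Y, test_size=0.2, random_state=0)
for i in range(Y.shape[1]):              # one weight vector per recorded neuron
    fit = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(F_tr, Y_tr[:, i])
    r = np.corrcoef(fit.predict(F_te), Y_te[:, i])[0, 1]
    # r**2, after noise correction, is the explained variance on held-out images
```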
I showed you Brain-Score to keep track of where we are, and I'm going to return to that later. Okay, and those were things I won't have time to talk about. So that was a review of yesterday in ten minutes. Now, today. This is now up to, like, the last couple of years, at least some work in our lab, and I'll try to show you what we think is one of the things that's missing.

So the first thing we did, one of the things we've done recently, and this was actually with a MURI team: Fei-Fei Li, Jitendra Malik, Jack Gallant, and others, working on where deep networks are wrong relative to brains. Our group was doing something, and this is Kohitij Kar, a postdoc in the lab, who carried out most of the work I'm going to show you. We said: let's start testing the current deep networks against the brain, figure out where the differences are, and see if that tells us something. So in that general spirit, we're going to have animals doing 10-way categorization tasks, not only with the kind of synthetic images that I showed you yesterday, but also with the kind of images that we take from photographic challenges for computer vision. This happens to be a database called MS COCO that also has these categories of objects, like bears and elephants and faces. They were picked to be within the 10 categories that the animals had already learned in the lab. And so we can do exactly the same thing I showed you yesterday, where either animals or humans can be tested on images of both types; here's an example of a naturalistic one, okay? And so what do you guys think? I didn't cue you, I didn't tell you what you were going to see: bird or elephant? Bird. Okay, but remember, there are at least 10 possible objects, and any pair of them can come up next. So I'm trying to neutralize attentional conditions.

We also inserted into the test a bunch of things that we intuited might be where humans somehow show advantages. We just stuck this in for coverage: things like the synthetic images I showed you, built alongside natural ones. We blurred things, we made things small, we did some occlusion, we cut out parts of the object, which is incompleteness, and there were some deformations. Ko had a couple of other things, and I already mentioned we showed some natural versions. I'm not going to talk about these in detail, just to say we tried to augment the set in ways that we hoped would expose more differences between networks and brains. And then we did a lot of behavioral data collection; I showed you a little bit of that yesterday, and I'm going to show it in a different format here today.

So this is an important plot for you to understand. On the x-axis is the performance of a computer vision system. This was AlexNet when we started; I'm going to show you a bunch of other deep networks. So that's its computer vision performance in d-prime units: zero is 50% accuracy and infinity is 100% in these units, and something like 4 or greater is like 98%. So really good if you're above 4. And then here's monkey performance. The star means it would be very similar for humans, which we've also measured, but this is actually monkey performance data. And now each dot on this plot is an image, in the same units, d-prime units, so everybody's treated in the same units. So what do you guys see? Sorry, what? Somebody, what do you see?
Monkeys, and primates in general, are on average better on these images. Okay, we see that because the points are above the diagonal. Anything else? Yeah, these points up here at 4: that means it's sort of unmeasurable for us, because the animals didn't make any errors at all on these images, so we can't quantify the d-prime. It's really high, but we don't know how high. So let's ignore those for now, but yeah, that's what's up here. I wanted you to notice also, though, and here I'm going to try to highlight it for you: you can pick out images where both systems are poor. So again, there's natural variability across the primates; it's not like you're at 100% on every image. And then there are some images where both systems are good. We consider these to be kind of control images for what I'm going to show you. And these were the interesting images to us: what is going on with these images where these computer vision systems were worse than the animals? And again, these are not constructed to be adversarial. You might have heard about adversarial images, where you manipulate pixels and take knowledge of the network to force the network to be bad. These were just taken out of random sets of images that we had screened both systems on. So they're sort of naturally adversarial, in the sense that humans find them relatively easy, even with short presentations of 100 milliseconds, yet computer vision systems at the time, at least this system, were struggling with these kinds of images. And so now you can see there are these two bags of images that we'll call control, or what I'll call CV-solved, computer-vision-solved, and computer-vision-not-solved, or so-called challenge images, as I'll also call them.

Yes? The average performance: the average over, I think, at least two animals here in these data, over many trials of the images. But we can also measure this in humans, where each human only gets to see each image once. We have monkeys versus humans; it's not perfectly correlated, but it's close. We're not clear whether there might still be some differences, but it doesn't look anything like this; most of those plots look very strongly along the diagonal. Yeah, I don't know if we made that plot. It gets into noise issues of how much data you have from each animal. I think there is a good correlation, but I don't know what that number is. I'm sure there is a correlation, and it's better than this, but I don't remember the number. And of course, those correlations depend strongly on how much data you get per image; what you're asking for is the noise-corrected correlation. As far as we can tell, the noise-corrected correlation is near one, but there's uncertainty around that, for both the monkeys versus the humans and the monkeys versus the monkeys. And there's a paper on this exact issue that I can point you to, published last year, and I think it has all of that in it.

Okay, one more thing: I told you that was AlexNet. Here's a bunch of computer vision systems at the time, with AlexNet at the top. And this is now plotted in delta units, the delta versus the primates. So numbers up here mean the system is doing worse than the primate on this plot. Here's the way this delta is computed.
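Concretely, using the standard signal-detection definition of d-prime, and, for the two-bag split, thresholds that are invented here purely for illustration (not the study's actual cutoffs):

```python
import numpy as np
from scipy.stats import norm

def dprime(hit_rate, false_alarm_rate):
    # Standard signal-detection definition: z(hits) - z(false alarms).
    # For a balanced two-way choice, accuracy is roughly Phi(d'/2), so d' = 0
    # is 50% accuracy, d' = 4 is ~98%, and 100% is only reached at infinity.
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

rng = np.random.default_rng(0)
primate = rng.uniform(0, 4, 1000)                  # per-image primate d' (stand-ins)
model = primate - np.abs(rng.normal(0, 1, 1000))   # the CV system, often worse

delta = primate - model                 # per-image "delta versus the primates"
easy_for_primates = primate > 3         # primates near ceiling on these
challenge = easy_for_primates & (delta > 1.5)  # CV struggles: "challenge" images
control = easy_for_primates & (delta < 0.5)    # CV also solves: "control" images
```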
And as for this dashed line here, I don't remember exactly how we computed it, so forget that. What I want you to see is that the red points are images that we took as challenge images out of AlexNet. And I want you to see how similar this is across AlexNet, this NYU system, and VGG at the time. These were three medium-deep neural networks of the time for doing computer vision. And I just want you to see: it's not like AlexNet had a particular set of errors and then VGG had a totally different set of errors. The errors are actually pretty strongly correlated across the networks. But mostly I'm going to show you AlexNet. Question? Yes, these are networks that were pre-trained, just like the humans and the monkeys. Whatever the designer of the network decided to do, they did; we took the network and we tested it. In the paper I just referred to, we actually did fine-tuning with our particular synthetic images, and it doesn't change these results very much. But I think this is an important tension in the field. People say deep networks can solve the problem, but as an experimentalist, I don't get to retrain the human. So if you're going to hand me a model of the system, you can't tell me I need to train it up in some imagined way that I haven't yet done. Otherwise you're not handing me a hypothesis, you're handing me a family of possible hypotheses. And I don't mean you in particular, but I get that kind of comment from deep neural network people a lot: some network can solve it. And then I totally agree with them, but I'm like, please help me find that one. All I can tell you is that these ones are broken, as handed to me, by these measures.

Here are examples of the images that we found in those two groups. And I'm showing you these not because something's going to pop off the page at you; in fact, it's the opposite. You can stare at them; these are some examples. You don't suddenly see, oh, everything's small, or everything's cluttered, or anything like that. In fact, I'll show you regressions on some of that later. When you regress out variables like position, scale, or occlusion, you don't explain a lot by single-variable regression. So it's hard to look at these as a human and infer what's making them hard for the computer vision system just by staring at the images. These are some examples.

And so, again, this is what I'm saying here: the understanding embedded in these models of what's making the images hard is going to allow us to design better experiments. Rather than going into the experiment saying, "I think clutter is important; let's show clutter to monkeys," and spend several monkeys' lives and who knows how many dollars to measure clutter, we're going to let the models tell us what they're struggling with, as an output of the models, rather than a single variable that human intuition might suggest ("why don't you do clutter?"), which is the typical way experiments would be done. So the models are going to tell us: test these images, because we find them hard and we don't know why; compare them to the brain and see if you can help us.

And so here's how we're going to do that. That's the plot I just showed you. These are those images I showed earlier, but I'm now going to isolate the top part here. Why am I doing that? Because these are really kind of special, in a way: these are ones where primates are very good. So all primates, monkeys and humans, find these easy.
And now you can see, though, there's a nice range of performance here. We can actually do this at different levels of human performance, and we have, but for simplicity I'm just going to show you this high level of performance. Humans find these easy, and so do monkeys; they both find them easy. But you can see computer vision systems find some of these easy too. We call those controls: both systems find them easy. And here are the special images, the challenge images, where the computer vision system is saying, "I don't know why, but I'm struggling on this image," and you primate systems somehow deal with it just fine. So those are the two groups of images. Of course we can test things in between, but we'll focus on these two extreme groups, which we're going to call control and challenge, or CV-solved and unsolved.

So what do we do as experimentalists? Well, I showed you on the first day: we think we know where the brain solves these things, in the ventral visual stream. I told you that solutions to object identity and category tended to be easily linearly decodable out of IT, and that those decodes could predict the behavior of the animal quite well. That was all of day one. So our first question is: what happens with these unsolved images if we just measure IT? Because there's a straight-up prediction that IT should somehow solve these, if you believe everything I told you on day one.

So Ko went and did our standard thing of putting triple arrays in. He actually got more arrays in than we typically do, to get more recordings within each animal. Record a bunch of spikes in response to each image, just like I showed you on day one, with lots of reps of each image, and the animal is either doing a task or not doing the task; it turns out that doesn't matter much, and I'll show you that in a minute. And you get these population vectors out. Remember, you can do things like count spikes over various time windows. Here we're going to take time windows that are big or small, and I'm mostly going to show you small time windows in what's coming next. So we're going to slide the window across time, and then you train these classifiers to see how well you can tell which object it is. You train 10-way classifiers on this neural data on some of the images, and you test performance on held-out images. I'm going to show you the test performance out of IT for both the challenge and the control images; there's a little code sketch of this coming up.

Okay, do we have predictions? What do you guys think? Any guesses? I'm trying to find out whether you get the setup, right? We're using the models as a hypothesis, and we're trying to ask the brain: what are you maybe doing differently from these models at the level of the readout of the brain, neurally, where we can look with high spatial and temporal resolution? Anybody have any thoughts? Is IT going to spit out the answer to these images? What do you think? Some people say yes. Based on what I told you on the first day, that would be the prediction. I'd hope it'd be yes, but it's a question, right? We don't know, right? Because these are especially challenging images. We do know the primate solves them, so somewhere in the brain there's an answer; we just don't really know if it's in IT. Maybe the images we showed before were somehow the easy ones, easy for us to find in IT, and maybe the hard images aren't going to be there. So that's one question. There's another question that I'll let come out of the data.
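Here is that promised sketch of the sliding-window, 10-way decode, run on synthetic spike counts. The window size, data shapes, and choice of classifier are all stand-ins, not the study's exact settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# spikes: images x neurons x 1-ms time bins (synthetic stand-in data)
spikes = rng.poisson(0.05, size=(600, 200, 250))
labels = rng.integers(0, 10, 600)          # which of 10 objects was shown
train, test = np.arange(0, 500), np.arange(500, 600)

win = 10                                   # ~10-ms window, as on the plots
for t0 in range(0, 250 - win, win):        # step the window across time
    X = spikes[:, :, t0:t0 + win].sum(-1)  # population vector: counts in window
    clf = LogisticRegression(max_iter=1000).fit(X[train], labels[train])
    acc = clf.score(X[test], labels[test])  # decode accuracy on held-out images
```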
So here's the question that was motivating us all along, which I haven't said, because I didn't want to give it away completely. Here's one of these images that's relatively easy for both systems. It's a face, again on an uncorrelated naturalistic background; it's one of our synthetic images. There's the sliding decode out of IT. Here's the monkey-level accuracy. Now, of course, the total accuracy depends on the number of sites that you're recording. I forget what the number here is, on the order of hundreds of sites, but whatever image I show, we have the exact same number of sites. So I wouldn't worry so much about the absolute level; we're going to want to compare across the images. But you see that this actually gets up to monkey-level performance, which again is in this band of three to four d-prime units, at about 110 milliseconds or so. That's the time marked there. And you remember I told you at the beginning, IT neurons have latencies of around 100 milliseconds, and this is the neural data that you're seeing here. Now, on the color scale you can see the responses of a whole bunch of neurons coming up across time. Okay, so that's one image.

Yeah. No, no, this is a sliding time window, and it's trained on a different image set than this one to say what the category is, and this is its decoder accuracy on that time window for this image. Well, it may be hard to see in this projection, but there's a lot of activity right here. And it's still high out here, you're saying? It's not showing up so well, but it actually still is pretty high. So a lot of this activity is maybe not helpful to the decode; that's probably what you're getting with your eye. You're right: visually, it looks like there's more activity here than out here, but the decode is still pretty good out here. This window, I think, is a 10-millisecond window. It might be 20, but I'm pretty sure on this plot it's 10. Sorry, I'll check; this is still a work in progress. Yes, if you go longer, everything gets better. But the point of this study was to look tighter in time, so I'm showing you the small time windows. In our earlier studies, we would do a 100-millisecond time window, between like 70 and 170 milliseconds, and just come up with one number. I don't think I have that plot for all of the results I'm going to show you. Okay, let me get through the result, and then we'll come back to that if there's still a question of that type. But to answer you, Peter: normally we would take a big, chunky time window, 70 to 170, get one number, and do a decode on that. And we showed that, on average, that predicts the animals' average behavioral performance per category pretty well. But now we're looking at image-grain performance at high temporal resolution. So we're making two changes at once.

So here's another image, a zebra, a relatively easy one for both systems. There you go again; it looks similar. It's a different decode up here, of course, a different sort of pattern coming out, but the decode is fine. Okay, here comes one of these challenging images. This one happens to be a little eccentric; they're not always eccentric. So again, I'm showing you examples, so I don't want you to read too much into it, but here's what happens. Okay. Here's another image: it's a dog, but a foreshortened view of a dog. It's going to be even harder for you to see, but you actually get this right.
Okay, so I showed you two more examples. What do you guys see now? Those are some data traces. Okay, so it's later, right? You noticed that in the two examples I showed you. But also, what about my first question, the question of the prediction? It's not down here at zero or one; it actually goes up, right? So the solution occurs in IT, at a later time point. We didn't know that IT would actually come out with these solutions, but it does. This is just two examples, but it's generally true. We can't say it's true for every single image, but for 90-some percent of the images, we think the codes exist within IT at human-level performance, like the ones I'm showing you here. And as you pointed out, they tend to be later; these two examples are clean examples, and I'm going to show you the whole distribution in a minute. So this is what I just said: it is linearly decodable, even though these images are hard for computer vision systems, but it tends to be a little bit later.

Okay, here are, I think, 15 or so images, I don't know how many, and their times of decode. You might have noticed I was putting that t-theta on there. That was our estimate of the time to reach a particular performance level, which was three d-prime units on the last slide. So this is doing that for all these images, plotting that t-theta time for each one, and the color indicates whether they're from the challenge category or the control category. We actually measured this for thousands of images. And these are actually the data I'm most proud of. It's not what I'm going to tell you about, but this is where I think the real power of these experiments lies: using these as constraints on models. But I'm going to give you the high-level view of the data, which is easier for humans to digest and is already kind of apparent on the slide, which you've already said: you notice the red dots are generally later than the blue dots, right? Not perfectly; there's overlap, and I'll show you the full distribution. But you see that. There's the mean delay: 30 milliseconds. That's the mean delay between the image types. And like the examples I showed you, some are very long, some are shorter. So it's a mean, 30 milliseconds.

Remember, this is all happening while we're only showing the image for 100 milliseconds. Human dwell times, as I said, are like 200 milliseconds at each fixation location on an image. Usually when people think about recurrent processing, they're thinking at much longer time scales, involving eye movements or attention. This is probably very subconscious to each of us, what's going on in here, but your system is actually requiring a little more time to execute on these images.

Yes, Peter. How does this mean delay depend on the number of neurons? Good question. I think, to first approximation, it doesn't. When the number gets really low, you have to lower the performance threshold to even say you've decoded anything. So what you're asking is: if I were decoding out of 10 neurons, I'd be well below the behavioral level; would I still see a 30-millisecond difference? I think on average we would, but I don't think we've done that simulation yet, of many samples of smaller sub-samples of our data.
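For concreteness, the t-theta being plotted is just a threshold crossing on the sliding decode. A sketch, with theta set to 3 d-prime units as on the slide; the estimator details here are assumptions:

```python
import numpy as np

def t_theta(dprime_trace, times, theta=3.0):
    # First time at which the sliding decode reaches the performance
    # threshold theta (3 d-prime units on the slide); NaN if never reached.
    above = np.flatnonzero(np.asarray(dprime_trace) >= theta)
    return times[above[0]] if above.size else np.nan

# The "mean delay" is then roughly:
#   np.nanmean(t_theta over challenge images)
# - np.nanmean(t_theta over control images)    # ~30 ms in these data
```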
Shuffle-corrected? Sorry. Different images from the animal? These images are actually collected over many days; this is chronic recording that we pool over many days. Oh, you mean the correlations across trials of the same image, because we're recording simultaneously. That's a good question. Previously, when we looked at that, it didn't affect our decoder-level performances very much, but we haven't tried it on these finer-grained data, at least not to my satisfaction yet. That's something we should try, so it's a good suggestion. Maybe we can talk, if we have time at the end, about what predictions we might have around that. But everything I've looked at in these data, more samples or fewer samples, you're still going to see this mean delay between these two types of images. And I think that's the thing I want you guys to remember that we see so far.

Yes, okay, lots of questions. Go. One of you two, yeah, front, yeah. Specific features? You're asking what's different in the challenge images versus the control images. I showed you a slide at the beginning where I said there's nothing that will jump off the page at you as a human when you stare at them; we've stared at them for a while. And if you do regressions where you try to regress things like the position of an object, or the pose angle, or things like that, nothing comes out of those regressions. So the answer is: nothing obvious. The best predictor of what's challenge versus control is the thing I used to define them, which is that the models find them hard and the humans find them easy. You're asking me to give you an image feature out of that, and it's very hard to do. And you may have an impression from a few images. I don't know if I've tried an animacy regression. So maybe this would be a fun exercise for the group: everybody could send me the thing they want me to regress on the images, and we'll see if any of them yield anything. It actually would be useful to the general research, too, to say: here are all the things that humans intuit might be going on in these images; let's try them as regressors. Not all of them are labeled in our data, but we could get them all labeled, to see how much power each of them shows. I'm going to show you a regression slide in a little bit, so we can talk about that. We didn't try animate versus inanimate, I don't think, so maybe that's one that will make a difference, but I don't think we tried it yet.

One question back there. So you want me to take the differences and then try to, but on what basis would I encode the data, beyond the pixels, which is how they're already encoded? That's the same question again: how do you want me to do that? But okay, maybe you guys could come up with some. Again, we had a bunch of characteristics; we regressed them; they don't cluster on single things. They're an interaction of many things, as far as we can tell. But those are the suggestions. If you guys have some ideas of what they might cluster on, we can try. There's an infinite number of things to try, and if you give them to me, we can try them all and see if you're ever happy. But yeah, I guess I already know how to do that, right? Yeah, so then maybe we'd have a network that would generate more such images. I don't know if it would satisfy the animacy question, but I like it as an efficiency argument: we could use it to make more of these things.
And the main thing that does is take the human out of the loop, because right now we can push them through Mechanical Turk and say: models and humans different, go; models and humans the same, no go. And your version would be: now we replace that delta with another machine learning system, and it embeds the knowledge of what's different. But how to extract that knowledge is still unclear to me; maybe there are ideas there that we could use. Yeah, maybe. I don't know. These are good suggestions that we could try. I'll show you what we see out of the brain now, and then maybe we can come back to this.

So, you saw what we saw. Okay, I told you there was this mean delay on the images, and I just keep saying mean delay. I also told you that this was with the monkeys actually doing the task when we collected the data. The numbers are all lower here because I've lowered the number of sites, as I mentioned, but you can still see this sort of average delay here. And here are the trained monkeys, the same monkeys, but we tell them: just fixate while we show you these images; you're going to get rewarded for fixating. And you still see this. One of the reviewer critiques of this was: yeah, but the monkey might still be doing the task; it could have depended on the training. So then we took a subset of our images and went back to some of our older animals, animals that were never trained to do anything, just passively viewing, on which we had actually collected some of these images. And we still see these lags over here. So from these kinds of data, we take this to mean that most of what we're seeing in this lag is extra processing, or lagged processing, that's automatically evoked, even in untrained animals. It's not something we had to train in, and the animal doesn't even need to be engaged in the task to engage it, just like your retina works even if you're not trying to make it work, as long as your eyes are open. So the circuits are automatically doing things that you don't have to consciously put effort into to make happen.

So what are the, oh, yes. Choice probabilities, yes. These are great questions about how the neural activity relates to the behavior: relating whether the neural activity is high or low to the animal's choices. These are all ways to make inferences about the type of what we'd call the decoder that intervenes between IT and the behavior. That's something we're working on now; we have done some of that. There is choice probability in these data. It's not what I'm focusing on here, and I'm not really ready to talk about it today; it's sort of a different question. Here, we're just saying: hey, these encoding models, something's broken with them; let's try to infer what's broken with them. And I'm trying to go down that road, right?

Sorry, could you say that one more time? If you separate the trials, the IT responses on correct trials versus incorrect trials, and then look at the similarity of IT cortex to the networks? Okay, so you're wondering if the incorrect trials look more like the deep networks than the correct trials, something like that. That's a cool idea. I don't think we've done that yet; I'd have to ask Ko if he's tried it. Oh, you're asking about whether the neural latencies are different? Yeah, we have done a lot of that, and the answer is no. If anything, the neural latencies are a little faster in IT for the challenge images. And actually, let me back up.
I should have put that on the slide here. I meant to, and I didn't. This is to your question. See these late decodes? Look at the response activity at the top. You see it's still out here, but the decode's somewhere in here. If you actually quantify all of that, the latencies are about the same. And if you look in area V4, the input to IT, they're very similarly matched across these two image types. We've also done a bunch of analyses of: let's find the individual neurons that like each type of object, and see if those are lagged in their latencies. And if you just look at response latencies, by all these different measures, if anything there's something slightly faster for the challenge images than the controls. So this isn't simply explained by some long lag getting through the retina. That was sort of my original worry: hey, maybe everything's slow coming through the retina; of course it's going to look lagged. But things are very well matched, even up to V4, in terms of the responses. There's something about the activity: the initial activity is not quite good enough, and then some churning happens that allows it to become better. But it seems good enough for some of the images and not good enough for all of the images. That's exactly how we think about it. But there are alternative views, which would be slow feedforward versions, which is where I thought your question was going. So imagine, again, it didn't make it through the retina as fast. That's kind of a boring answer. But the fact that IT is responding on time is one piece of evidence against that. Does that make sense? I didn't want to say it up front, but the thing that we think is going on is that there's probably some need for recurrence here that's automatically evoked, and that's what we were trying to investigate. But we're trying to rule out the other possibilities, which would be bad choices of images that are simply slow through the visual system.

You mean posterior IT and anterior IT? Yes, I don't have those plots for you. I think they generally show the same thing. Again, I would have to ask Ko, who's writing this up right now; I can't remember if there's any difference. I think they both show lags. I don't know if the lag is slightly larger in the anterior part, which is maybe what you're asking, so I don't want to say, because I don't remember. For now, you should just take this as all of IT, because that's all I remember: all of IT looks like this, though there might be subtle differences between PIT and AIT. But I know it's not that PIT looks matched and then AIT looks lagged.

So you're trying to get at the questions that we are on to, too, which is: where is this recurrence coming from? Is it within IT? Is it along the ventral stream? Is it through some other areas of the brain? I have some data on that which I don't have ready to show you, but we think there's probably a prefrontal circuit involved here. This is from silencing work that Ko has done recently. But my wager would be that it's not just one recurrent circuit. It's probably circuits within IT, maybe even some within the ventral stream, and some elsewhere. That's my neutral, agnostic guess, rather than trying to say it's one simple thing. It's probably a mixture of things. But that's where you're going with those questions, I think. Yeah.
I'm not sure if this question will make sense, but if you look at the images, is there a way to distinguish between, let's say, making a wrong guess and not knowing the answer? Let's say, if the accuracy of my guess is too low, I would just say the image is too complex, I don't know. Right, yeah. I think that's a great suggestion of how to do the behavior in a different way, and Shadlen's lab has done this in a different way. You can have a confidence judgment; you can have a bail-out choice to express your confidence: rather than two options, there are two options plus an "I don't know." And that's an interesting set of behavioral data that we don't have, that we should think about getting here, because it relates to how we're starting to think about better decoders on these things, which are now sliding evidence-accumulation decoders, much like what's been done on the dorsal stream, just at much faster time scales. That's actually what Ko's fitting now, and it relates to the choice probability question. But the issue of confidence comes up when you start building those kinds of decoders, and your question suggests a way to estimate confidence psychophysically. We don't have those kinds of data. That's a good suggestion of a behavioral experiment that we could do; we haven't done it yet.

Yes, we can get the networks' estimates of confidence; they're less confident on the challenge images. I mean, they're worse at them. So, well, you're asking about, say, the distance off the margin on the classifier, and we'll take that as the confidence. I'm already showing you they're worse; that's how the images have been defined. So if you want to also call that poor confidence, the two are related. Oh, I see. So you're asking if they make the wrong choices in a really confident way. I don't know. I always assume they're collapsing towards the margin, but maybe you're asking whether they should do something like the opposite of that. So, yeah, those are good suggestions, too. I don't remember what we've done there. So again, I'm sorry, guys; this is at the edge of the lab, as I said. You guys are helping me more than I'm helping you in this lecture, I think.

Sorry? Yes, we do have reaction-time data, and you do see trends in reaction time that are consistent. I didn't think you would see them, because the animal has choices and then gets to move its eyes around before it decides. But over many trials, you can pull out an average reaction-time lag increase on these challenge images for the animal, which is kind of what you'd expect from these data. And the reaction-time lag is about the order of magnitude that you see here. So it's all consistent, right?

Yes, let me try to move on, because I don't want to spend all our time on just this one; I have another topic I want to talk to you guys about. But let me try to finish this one. So here's this image-properties question. The question here is: which image properties create the need for recurrence? If you take this late IT signal as evidence of recurrence, which is what I was trying to argue with those lag slides that came up in the questions, which image properties predict that the IT response will be lagged?
This is a version of the question that was being asked earlier, and I've sort of already given you the answer. Here's what happens when we regress a bunch of things. Occlusion has a little bit of weight; if you regress all these things together, you get a little bit of effect. But the best predictor of whether IT will show recurrence is the deviation of the current feedforward networks from the primate behavior, which is this thing over here, right? So to me, this is an example of how, instead of using the words that humans would like to use about the images, the models already embed the knowledge of what's going on here, and we use them as the predictor of what will engage the recurrence circuits in IT. It may not be as satisfying to humans, but it's a better predictor. If you want to talk about occlusion, there's a little bit of juice in there for you. But if you only focus on occlusion, you're missing the bigger picture of what's going on here.

Okay, a little bit more evidence about what is missing from current ANNs. We said recurrence is important, but let me give you another sort of insight into that, one that comes from comparing different models. This is the prediction of a deep neural network model on the IT response, just like I showed you in the intro slide today and talked about yesterday, how well the model can predict the IT response, but now we're breaking it down as a function of time: not that solution time, but time from image onset. And what I want you to see is that the current feedforward networks are pretty good at predicting the front edge of the response, and then they fall off rapidly. This plot has been normalized by any drop in responsivity, which Davide mentioned; this is sort of noise-corrected explained variance. So there is variance to explain here, and it's already been corrected for that. This drop-off is not just because the responses are fading; it is actually a drop-off in the predictive power of the models. What we're doing here is asking the model to build a weighting function, as I said at the beginning of the talk, a different one in each time bin, doing its best job at each time bin to take the features and fit this response. And it just does better at the front than it does at the back. So that's what I'm saying here: that is consistent with the idea that what these networks are doing is approximating the feedforward edge, and doing a poor job of approximating the later part of the IT response, which we're viewing as probably the product of some recurrence.

Yes? You're on to my next slide, because this is an eight-layer network. That's exactly it. Again, I mentioned we have thousands of constraints to guide these, but let's go to the deeper networks. So here's this plot again, in a slightly different format. If you look at this window, here's the explained variance; see, it's pretty low. Here are these eight-layer-ish networks; they're all slightly different, but let's call them around 10 layers. And as Peter just said, the newer networks, these are networks that are 40 to 100 layers or more, and here are some of their names down here. And look, they're better. This is the late part of the response, and they're better at predicting that. And look, they're still only at 20%. They're not really great, but they're better. So what's going on here? We've got this big, deep network predicting the late part of the IT response.
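A sketch of that time-resolved fitting: the same per-neuron linear mapping as before, just fit independently in each time bin. All data here are synthetic stand-ins, and the noise correction is omitted.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
F = rng.standard_normal((1000, 512))    # model features per image
# resp: images x time bins for one IT neuron (synthetic stand-in)
resp = rng.standard_normal((1000, 20))

train, test = np.arange(800), np.arange(800, 1000)
ev_by_time = []
for t in range(resp.shape[1]):          # a separate weighting function per bin
    fit = Ridge(alpha=10.0).fit(F[train], resp[train, t])
    r = np.corrcoef(fit.predict(F[test]), resp[test, t])[0, 1]
    ev_by_time.append(r ** 2)           # explained variance in that time bin
# In the real data, this curve is high at the front edge of the response
# and falls off at later time bins for feedforward models.
```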
This is an important idea, I think, to keep in mind: computation and algorithm versus implementation. Okay, so we're saying the brain has recurrent processing, and none of these networks have recurrent processing. So what are they doing? Right. If you think about modeling a recurrent system, you can unroll the recurrent system into a big, deep feedforward system. To do that exactly, you'd have to have skip connections and weight sharing. These models are not restricted in that way; they're just deeper. They often do have things like skip connections, but they're not forced to have weight sharing in these examples here. So what's happening is that they're being optimized to do this kind of task, they're getting deeper, and the engineering appears to be converging on a solution that the brain has probably implemented in a recurrent form, while they're doing it in a feedforward form, just because those are the tools they can implement, or chose to implement, for whatever reasons. These are easier to implement; these are the tools we have. Whatever the evolutionary forces on the computer vision community are, they're driving toward something that the brain probably implements recurrently, but in an unrolled form. Not perfectly implementing the brain, but better than they were before, which is what this plot is suggesting. So there's a difference between algorithm and implementation here: in the brain, it's sort of contained in time within IT, whereas these are just deeper networks, so they'd have IT at different depths within the network. That's the way to think about the unrolled system. Does that idea make sense? This is some evidence for that. I'm going to show you in a minute a model that actually is recurrent and can capture some of this.
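Here's the unrolling idea in a few lines of toy PyTorch: one recurrent "area" iterated over time is exactly equivalent to a deeper feedforward stack with shared weights and an input skip connection, and the very deep computer vision networks simply drop the weight sharing and get deeper. Everything below is invented for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64)
cell = nn.Linear(64, 64)          # one recurrent "area" updating its state

# Recurrent form: the same weights applied over time within the area.
h = torch.zeros(1, 64)
for t in range(4):
    h = torch.relu(cell(x + h))   # state propagation within the area

# Unrolled form: a deeper feedforward stack. With shared weights and a
# skip connection from the input, it computes exactly the same thing.
layers = [cell] * 4               # weight sharing: the same module reused
h2 = torch.zeros(1, 64)
for layer in layers:
    h2 = torch.relu(layer(x + h2))
assert torch.allclose(h, h2)      # identical outputs: same algorithm,
                                  # two implementations
```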
So this is what we just said: it suggests that deep ANNs are partially approximating functions carried out by a more compact, recurrent brain network. Here is the model. This is a model in our lab that we call a base net with recurrence. We got tired of the computer vision community making things deeper and deeper; this is starting to look, to us, not at all like a brain. Let's try to make these things shallower again, things that are more like that HMO style. But we started to use some recurrent circuitry. This is not area-to-area recurrence, but recurrence of state propagation within each area here. And when I say "we," this is Jonas Kubilius, the postdoc, and Martin Schrimpf, the grad student, who did this work. It's a family of models that we call CORnet. There are different variants of it, and if you're interested, you could look at this paper. I'm not going to take you through the details; I'd rather say this is a kind of first version of a recurrent model for us. And we liked this recurrent model because it actually did really well on our Brain-Score, as well as some of the best deep networks, in terms of matching the brain on a bunch of the measures I showed you yesterday. And also, by our measures, it was much simpler, counted as the number of feedforward paths to get through the network, which was much smaller than in a lot of these networks. So we liked it in that sense. It's sort of comparable to AlexNet, but actually much higher on Brain-Score.

So we're trying to find networks that are more like the brain, not just higher performing. Yes. There are fewer parameters than in those other networks, but I don't have the numbers for you. It's a lot fewer parameters. But you should read the paper, because this is really Jonas's work that I'm just presenting for him.

So here's an example of a model. Remember I said yesterday that performance was how we found these networks before, that better performance led to better brain scores. And then I showed you at the end that, more recently, performance is not necessarily leading to better brain scores. That leads to the idea that you need to start incorporating other things into your cost function to evolve the model to be more like a brain. The brain has to not only perform; it has to fit this thing inside a small space. That's where recurrence might help you, too. It needs to worry about things like wiring, size, power, and energetics, broadly. And those things are not necessarily important to winning ImageNet, right? So this is a kind of first example for us of that: we need to start changing our metrics, not just performance, to drive the models to be like the brain. Now, how should we put in those costs? I don't know. This is one example of our first attempt to put in something sort of like a wiring cost. But I'm just trying to inspire you with the general idea: we can't just go performance-driven if we want to make it brain-like. We need performance in the broader sense of energy and size and so forth.

And so here's that network, CORnet. And back to this slide: this is the first pass out of CORnet, out of its IT layer. It looks pretty good, but you see that with more passes, it gets better, right? So it's sort of bridging between these models, in effect, right? And this model is not a breakthrough in computer vision. It's more a model that's better matched to what's going on in the brain. Its ImageNet performance is actually somewhat lower, not super low, but lower than some of these models. So the computer vision community is like, yeah, whatever, you guys got CORnet. I don't know what they'd say. But for us brain scientists, this is moving in the right direction of what's going on in the brain.

How does it do on the challenge images? I don't remember the exact numbers. It did better, but I don't remember how much better. I'm sorry, Peter. In general, it's doing worse than the deeper feedforward networks on all of them; it may even be slightly worse there. The real question is: is it differential? It's better on IT, yes. But again, you can see we're down at low explained-variance numbers here, so these may not translate, and they don't: you can't take these IT numbers and go, oh, if I maximize this, I get performance. That's actually one of the tensions going on right now: Dan Yamins is building other recurrent models, and they can fit IT better, but they're not necessarily higher performing. So we need to be a little careful about just chasing IT and assuming we'll get performance. But I should have a better answer to your question. Again, this is stuff from, like, last month's arXiv papers; I don't know the answer. That's another good question. Maybe somebody should be writing these down for me. Again, you guys are helping me. I'll try to remember those.
Okay, Davide, yeah. Eventually, if we have enough samples out of IT, yeah, we should. Eventually we should. But we're not there yet. So we have a model that looks like it's going in the right direction, but it still hasn't gotten performance and IT and everything perfect, right? That's how I would characterize it right now. So we have work to do still.

What else is missing from these networks? I just talked about recurrence as something that people have obviously said is missing, but I tried to do it in a way that was motivated by the data. You know, a neuroanatomist will look at the deep networks and say: obviously recurrence is missing, that's what's wrong. But the same neuroanatomist would say: well, synapses are missing, so that's what's wrong. And the physiologist would say: spikes are missing, so that's what's wrong. So it's easy to just list things that are not like the brain. We tried to do this in more of a performance-driven, algorithmic sense of what's missing, to get to this point. Now, you may have already believed recurrence is important. In that sense, I'm not giving you any knowledge: your prior was already that recurrence is important, that they obviously need to put that into the networks, and you don't need me to tell you that with some experimental data. That's why I said earlier in the talk: the important part is that if we put up on Brain-Score the exact time at which your network should spit out an answer to every one of those images, that's the constraint that will really help the modelers. It's not just saying, oh, I need recurrence, which you may already believe. If I say these are the times at which the brain spits out those answers, I hope that will be a more constraining piece of information, a large amount of information, rather than the single bit of "hey, why don't you think about recurrence," which is a pretty low bar from an experiment. Yeah.

Yeah, so the plan is, and I told Ko this: as soon as that paper is accepted, those data go up on Brain-Score, available for testing. And we're trying to figure out exactly how to do that, how to score the temporal capabilities of various networks. All networks can be scored; it's just that the ones that have no time component are just making one prediction across time. But that's exactly the point of Brain-Score, as I said earlier: there are some scores, and we're going to just keep adding our data as more scores on things. And we should really call it the ventral stream monkey brain score, and we hope that Davide's group will make a ventral stream rodent brain score. That's how we'd like our community to start doing more of these things. It's a way to expose our experimental data to modelers in a way that isn't just "here's our data," which is not useful, or "you need recurrence," which to me is almost equally un-useful. It's trying to find that middle ground of usefulness for the modelers. And that is exactly what we're going to try to do. But we haven't done it yet; it should be in the next month or two, I hope. It depends on some reviewers at this point.

Okay, I want to go on. I was going to leave this up, because maybe we could come back to what else is missing, but maybe we'll leave that to the end for discussion. Recurrence is not the only thing that's probably broken with these models, right?
It's just the one thing that we had been working on that I wanted to tell you about. We could talk about guesses of what else might be missing, but let's do that at the end. I want to now go to this question that Davide raised at the end of yesterday, which is: what do these models do for us in terms of understanding what's going on in the brain? And more concretely, I'm going to ask, what can we do with them that we couldn't do before? So I'm saying, look, the glass is half full: the models are already really good relative to what we had before. But if we're building models, aren't they for human intuition? Aren't we supposed to get some understanding? We had a whole dinner discussion about that last night, many of us in this room. So instead of talking about human understanding, let's talk about what we can actually do with the models in an engineering sense. I'm going to give you one example of that from the lab that we call neural control. Kohitij Kar, whom I mentioned earlier, did the experimental work, and Pouya Bashivan has been doing the computational side of this work; both are postdocs in the lab. And the setup of this use of models goes like this. You have some deep network model. Remember, there's a set of them that are all roughly equally good; they're slightly different from each other, but from our point of view they're equally good in lots of ways, as we measure in Brain-Score. But let's take any one of them. I think we're using a VGG for this; I can't remember, actually. It almost doesn't matter, but there's a network up here. And we take a layer of that network that we had previously determined was a good model of V4, as I showed at the beginning of the talk. So you have this encoding model, which you trained up with a bunch of training data like ImageNet. You lock down all those parameters so it's a single, fixed encoding model. You find the layer that you thought was like V4. You go make your recordings in V4. You build this map, this green thing, a weighted map from the features within the model to each V4 neuron that you record. If you have 100 V4 neurons, you make 100 maps. So you now have an approximate predictive model of every V4 neuron at the end of your array electrodes. Now you use this model to do things that you can do with models that you can't do with brains. And the thing that Pouya and Ko did was to say, let's try to generate images. People often do this for visualization; we wanted to do it for what we call control. Let's generate an image that will cause the neurons in this population to behave in a particular way. If you believe these models, you should be able to produce control images. And I say control because we're going to show luminance power patterns on the retina, and we are experimentally controlling those luminance power patterns. Those are called images, presented to monkeys on monitors. And just like I could shine light into a monkey's brain with optogenetics to do control, I'm going to shine light on the monkey's eyes and let the model tell me how to do it in a particular way, to see if I can control the neurons. So I find the neuron mapping, I synthesize a controller image that tries to push the neural activity into a desired population state, and then what we test experimentally is: how good is this control? It's a control test of the model.
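Here is a minimal sketch of that fixed encoding model, assuming you already have activations from the "V4-like" layer of a frozen network and recorded V4 responses to the same images. The plain ridge mapping and all names are illustrative assumptions, not the lab's exact fitting procedure:

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_v4_maps(layer_features, v4_responses, alpha=1.0):
    """layer_features: (n_images, n_features) frozen-network activations.
       v4_responses:   (n_images, n_neurons) recorded firing rates.
       Returns one linear map per recorded site (100 neurons -> 100 maps)."""
    return [Ridge(alpha=alpha).fit(layer_features, v4_responses[:, i])
            for i in range(v4_responses.shape[1])]

def predict_v4(maps, layer_features):
    # predicted response of every mapped V4 site to new images
    return np.stack([m.predict(layer_features) for m in maps], axis=1)
```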
How well can we move the neurons around in this direction? There are two types of control that we've tested so far. The first one is the one that neurophysiologists instantly think of when you talk about an experiment like this, which we call stretch control. Of course, physiologists have been recording neurons for decades, and ever since Hubel and Wiesel they've viewed their job as asking: what does this neuron like? If I only knew what this neuron liked, I would understand the brain. That is, I think, actually kind of a misguided direction for neuroscience, but it's what neurophysiologists would like to do: what is the image that this neuron likes? Then I'll understand what's going on. So we're not going to try to tell you what image the neuron likes. I'll show you some images, but our job was to say: if we really know what this neuron is doing in an algorithmic sense, we should be able to drive its response as high as we want, as low as we want, or to some midpoint; we should be able to move it around at will. Here we were going to try to push it beyond the maximum activity that we'd seen so far; our job was just to drive the firing rate up. So let me show you how we've done so far. This is now in V4, not IT. Here is the predicted firing rate of a neuron, and this is the measured firing rate. This is a V4 neuron, predicted by that mapped model I showed you earlier, on held-out images. And you see how good this is; the correlation is pretty good. That's basically a summary of what I said in yesterday's talk, that these models give us pretty good predictive models of what's going on inside the brain, and this is evidence of that. And now we can ask: what's the best image here? This is the thing a physiologist would typically do. There's the best image; these are just randomly sampled natural images. We showed all these natural images, and here's this image that we happened to show. There's the receptive field of the V4 neuron. Remember, the receptive field is the part of visual space where, when you flash bars of light, the neuron tends to respond, and it's probably a rough approximation of where the neuron sits anatomically. That's the classical receptive field, which we mapped with other stimuli. And there's a zoom-in on that part of the image in the V4 neuron's receptive field. And you could look at this and say, well, maybe it likes this corner, maybe it likes that edge, maybe it likes this curve, and you could look at a bunch of images and try to make those inferences and word models, and that's been done for decades. But our job was to find these controller images that the model predicts would drive the neuron up even higher than this thing. What's cool about this is that the model is not restricted to natural images. It just plays with the pixels under whatever constraints you give it. This is like deep dreaming in networks, but it's doing it for a goal: find some image that it thinks will drive this neuron up. So we're using the knowledge implicitly embedded in the models to manipulate the pixels, to find parts of this crazy, high-dimensional image space that we could never explore by hand, and let the model find things that we think will drive the neuron into some state that we as experimenters desire, in this case a high response state. There's the image that it came up with.
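Here is a minimal sketch of that synthesis step, assuming a differentiable `encoder_and_map` that composes the frozen network up to the V4-like layer with one site's linear map and returns the predicted firing rate; that function, the step counts, and the pixel range are all assumptions for illustration:

```python
import torch

def synthesize_stretch_image(encoder_and_map, shape=(1, 3, 224, 224),
                             steps=200, lr=0.05, seed=0):
    torch.manual_seed(seed)  # different seeds -> different starting points
    img = (0.01 * torch.randn(shape)).requires_grad_(True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -encoder_and_map(img).sum()   # ascend the predicted firing rate
        loss.backward()
        opt.step()
        with torch.no_grad():
            img.clamp_(-1.0, 1.0)            # stay in the displayable pixel range
    return img.detach()
```

Running this from different `seed` values gives the slightly different but similar-looking images described next.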
If we start from different initial seed points, it comes up with slightly different images, but you'll see they actually look psychophysically similar, which is interesting, and this is consistent with the work from Tony Movshon and Eero Simoncelli thinking about V2 neurons and similar image synthesis procedures. So you see, this is what the model says this neuron should like, what should control it, and here's what we actually got. There are the synthetic images, and you see that we were able to get this V4 neuron to fire well above anything we had seen under a large set of natural images, and that is in line with the model's prediction of what it should do. So this is a success case of being able to control this V4 neuron from the models: we made the predictions, closed the loop, and drove the neuron up. That's control. It's fun to look at these, because humans like to look at images like this, but I'm just showing them so you can see that they look different across different neural sites, but kind of similar in some ways when you repeat from different initial seeds, which is up and down the rows here. So these are stretch control images for five different V4 sites. And then I'm going to show you the degree of our control. Zero would mean we weren't able to drive the neuron any higher than we'd observed on naturalistic images. Sometimes we get much higher; sometimes we fail and actually go slightly lower. On average, we got about a 40% increase above the maximum response that we had previously seen, using this stretch control method that I described. Yeah, how do I quantify that? I'm trying to remember. One thing I should say is that for 20% of the sites, the model was already predicting that the synthetic images were not going to be able to drive the neuron up, which is itself interesting; you could view that as a failure of the model, but we still include those sites here anyway. So what you want is a plot of how much we went up against how much we predicted we'd go up, and I don't have that plot. Sorry, again, this is cutting-edge stuff. Somebody's writing down everything; Peter's asking all the good questions. Maybe you can remind me later. I'm sure we could get that to you in about a day, but I don't know the number. I have a few slides that relate to that later, and maybe we can come back, but I don't have a slide on that exact question. What I do have is this. I thought many people would ask: well, you're showing these natural images, but people know that V4 neurons like curves and so forth, and many groups, most notably Ed Connor's group with Anitha Pasupathy, spent many years testing the curvature stimuli they thought V4 neurons like. So we also tested their stimuli, which Anitha was kind enough to give us. We presented their stimuli within the receptive fields of V4 neurons to find the maximum firing rates we could get across a large set of these curvature-like stimuli, which are also even higher contrast than the ones in the natural images. And what's intriguing is that these stimuli didn't drive the neurons as much as many of the natural stimuli, so our control abilities, referenced against these maxima, looked even better.
So if we had only collected those and then done the control, we would feel like we were really doing great at control, but the naturalistic images were the reference set that we had originally collected. So again, if anything, it goes in the opposite direction of what you might have thought V4 neurons care about. Okay. Yeah, right. And that's consistent with the V2 work from Tony and Eero. Well, okay, you can read into that, but there's an important detail that relates to your question. When we generate these images, this is what you can do with the models: you can put in a cost that says, I need the response of the neuron to be position tolerant. Because you've got the model of the neuron, you can build an image that assumes the experimenter does not have perfect control, because we'd need perfect control of eye position. We actually built that in when we made the images. It's like, hey, we're going to have some eye jitter, and because there's a whole manifold of stretch images the model can come up with, please give us ones that will be robust to experimenter error along the position axis. So some jitter tolerance was built in; see the sketch after this paragraph. So I wouldn't read too much into any particular set I show you; I'd have to show you a set synthesized without position tolerance before you could make such an inference just from looking at the images. I'm just telling you that that's an important detail that might be coloring these images. But if your point is that, look, these are position tolerant, is this pattern repeated from here to here? Yeah, I mean, we already believe IT builds up invariance to things like position, so I'm not saying it isn't consistent with that. Which outcome? Yeah, it could. I'll show you some images in a moment that don't look like textures. Also, you've got to remember, this is V4 and not IT. It's fun for me to look at these things, and I show them because people like to look at them and do what you're doing. I'm just trying to say we can use the models to drive the neurons in a way that we would not have been able to do without the models. And here's an example that makes the point more strongly. Instead of doing what the physiologist naturally wants to do, let's try to do something with the whole population. This we call the one-hot target state. Could we take a population of V4 neurons and find control images that would drive one V4 neuron that we choose as experimenters up, while keeping all the other neurons down? It's not an experiment that a physiologist would even think about, let alone try to handle. They'd have to manage something like 40 neurons at once. It's like, oh, I think this neuron likes X, so I'm going to hand-optimize that; oh, all these other neurons don't like X, so I've somehow got to optimize over 40 things at once. Impossible for a human being to do, yet we can ask the algorithm to try to find such things. So this is called a one-hot target state. We could imagine other states: half on, half off, alternating on. These are just numbered neurons in no particular order, just the order they come off the electrodes here.
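To make that jitter-tolerance detail concrete, here is a minimal sketch: instead of optimizing the response to the image itself, optimize the average predicted response over small random translations, so the synthesized image stays effective under eye-position error. `torch.roll` is a cheap stand-in I'm assuming for a proper image shift, and the jitter range is made up:

```python
import torch

def jitter_robust_objective(encoder_and_map, img, n_jitter=8, max_shift=6):
    total = 0.0
    for _ in range(n_jitter):
        dy, dx = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        shifted = torch.roll(img, shifts=(dy, dx), dims=(2, 3))  # simulate eye jitter
        total = total + encoder_and_map(shifted)
    return total / n_jitter  # maximize this instead of the single-image response
```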
And this was, again, our first attempt at population control. Our first reference here is to search through the naturalistic images we had happened to show and find the one that was the best one-hot control. We have a softmax-like score to judge whether a control is good or not. And this was the best we found in our set for this neuron: it drove this one up, and these others somewhat lower. So whatever was going on, this was a partial control in that direction. And you might say, well, look, this off-target activity you're getting out of these neurons is maybe not surprising, because one thing I didn't tell you is that these neurons have highly overlapping receptive fields, right? That's what makes the problem interesting. If I told you I had two V4 neurons with different receptive fields, the physiologist would say, ah, obviously I can control these: I'll put energy over here, then I'll put energy over there, and I'll get independent control. But now I show you this field, and you're like, oh, that becomes a lot more challenging, because I need to know something about the actual preferences of the units underneath that receptive field. So that's just a reflection of the fact that this image puts energy in there that's of interest to many neurons within this area. But when we ask the synthesizer, tell us what you can do, here's what it comes up with first, to try to turn this neuron on and all these off. I don't know what it is. It's not an object. Maybe you like this one better; I don't know. It has repeated structure. Is it an object? I don't know. There are some weird patterns in it; I don't even know what to make of it. This is a full-field test. There's actually some energy over here, and we're not sure whether that's critical to any results yet; we were working within an eight-degree window to match what was done over here. And you see there's a little bit of energy outside, but most of the energy is within the field, which makes some sense. And then here's what happens. It's not perfect, but it's much better. So these other neurons are mostly off, and some of it is off-target activity, but it's like we found a way to put energy into the animal's eyes saying, I want that neuron on and the other ones off, and we got some partial success doing that. And the model enabled that, right? So that is a control example. Now you can start dreaming about things you might do more broadly with this kind of technique in a control sense. Here are a few controls for our control. Here's the control gain score on that one-hot test. The example I showed you, to be fair, was one of the better examples, which is out here. Here are our typical gains in one-hot population control. Here are our gains if we restrict to receptive fields that overlap as perfectly as we can get; you see there's still a 40% gain in one-hot control. And here's our gain in one-hot control over those curvature stimuli, which, again, generally don't control things as well as the naturalistic stimuli. Those are just quantifications. Here's a visualization of what we were dreaming about when we started this. These are actually eight simultaneously recorded neurons.
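Here is a minimal sketch of a one-hot population objective of the kind described above, assuming a `population_model` that returns the predicted rates of all recorded sites; the exact objective and the softmax-style score are my assumptions, not necessarily the paper's formulation:

```python
import torch

def one_hot_objective(population_model, img, target_idx):
    rates = population_model(img)                  # predicted rates, shape (n_sites,)
    off = torch.cat([rates[:target_idx], rates[target_idx + 1:]])
    return rates[target_idx] - off.mean()          # target up, everyone else down

def one_hot_score(rates, target_idx):
    # softmax-style control score: 1.0 means all predicted activity is on the target
    return torch.softmax(rates, dim=0)[target_idx]
```

Maximizing `one_hot_objective` over the pixels, with the same synthesis loop as before, yields candidate one-hot control images; `one_hot_score` then grades how concentrated the measured population response actually is.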
And what we're trying to see here is, we're asking the synthesizer, one by one, to turn on each of these eight neurons and turn off the other seven. The neuron that's meant to be turned on is in dark red, and the ones meant to be off are in light red. And you can kind of stare at the images; there might be some orientation stuff, some other things going on, but you can see as you go around the clock that it does a reasonable job here. And so this is the demo of what we were imagining: if we really understand a system, we should be able to have independent access and control over it. And now all kinds of cool questions come out of this, and Peter's already anticipated some of them, things like: is it even possible? Some neurons might be anatomically coupled, and it's not even going to be possible to independently control them. These are questions we don't know the answers to yet, but these are experiments that can now be done with these kinds of techniques. And I like to dream about IT. This is V4, and this can be done in IT, but places like IT are one step from the amygdala. And the amygdala is an area where you read tons of papers where people say, if only I could turn on the neurons in the right way, people would feel less depressed, they'd be happier, and they talk about sticking in optogenetic tools to turn neurons on and off. If we really have good models of a system, maybe we can push targeted energy through the system to noninvasively mess with neurons in a way that might relate to human health. Now, that's a bit of a stretch, but I'm putting it out there to say: these are the kinds of things you might dream about doing with models, things that aren't really in the words of understanding but are in the goals of the field. The last thing I'll say about this: this axis is the fidelity of the models to the V4 neurons, and this is the one-hot population control score. As the models get better, the control gets better, right? That's what you'd expect, but I'm saying it because the models are not perfect, and going back to the first parts of the talk, if we improve the models, we should improve the control, and there's already evidence of that. So these two things will tend to go together, obviously. And I think that's probably the last slide I have on that. Oh yeah, go ahead. So you're asking for a behavioral experiment, because you're saying adversarial in the behavioral sense. Yeah, we can already do that without recording, though. That's a behavioral experiment, right? You can take neural network models of the brain and ask what's going to be adversarial for the brain. I think there are good experiments in there, but I'm trying to get at what inferences you want to make. Are they about where the models are broken, or about the relationship? I think the problem right now is that the maximally adversarial image at the behavioral level is basically going to be the same one even if I didn't have the neurons, so you can run that experiment already without the neurons. But I think the interesting form of this question is: I have this monkey's neurons.
Can I make an adversarial example for this monkey that is not adversarial for another monkey? That is the interesting form of the experiment. We're not sure we have enough simultaneous neural measurements to do such a thing, but this gets at the idea that we each have individually tuned adversarial examples. That's what the neurons in the monkey might give you that we couldn't already get from behavioral testing with the existing models. You've got to remember the recorded neurons are a small sample of so many, so it might not work, but I like the spirit of that; we've actually been talking about it in the lab. We don't know whether we can do it yet, and we're trying to run some numbers. But you also probably know the recent work from Goodfellow and others on adversarial tests in humans, which came out a couple of months ago, showing that humans are susceptible to adversarial examples. This is of course debated in the field, but when you show the images for brief periods of time, humans show the same adversarial trends as the models. Again, to me the models are just approximations of the initial response, not of whole human vision, and when we play the adversarial game in an audience, you're looking at the image for two seconds. I don't think that result will hold over all images, but the jury's still out on how far these adversarial examples go toward taking down the models as incorrect. But the really interesting form of this is that we probably do have individualized adversarial attacks. Right now they're all black-box attacks on us. It wouldn't be a full white-box attack, because I don't have access to everything inside the brain, but I can get some internal measurements and make a sort of gray-box attack on that monkey. Sorry, I didn't hear that part. Yes. Yeah, let's talk offline. I don't know if the students are following all this, and I think we're getting into aficionado land. So, do you guys have questions about that before I do a little bit on understanding? Okay, yeah. This was all in V4. We haven't done that in IT, but it's on the action list; stay tuned, I'd say. Okay, remember the debate some of us had at dinner last night about what understanding is. I said let's talk about what understanding is and what you can do with the understanding embedded in these models. And I tried to give you one example: we can use it to do noninvasive neural control, which could lead to potential health benefits if done in the right way; at least I like to think about that. In principle it leads to noninvasive perceptual control and prediction, which is essentially what we were just discussing. I haven't shown you that, but that's what these models should already enable. In theory, these models should enable better BMI development even as they stand, by testing injections: when you put activity in, what should happen to behavior? We've been doing some of those tests experimentally, but again, this is something the models enable without further understanding of them. They can be downloaded by any of you interested in models of memory, attention, motor control, or other areas that use visual outputs to drive what you want to study. This is already happening in our field without people understanding the networks.
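Coming back to the adversarial thread above: for the purely behavioral, model-only version of that experiment, the standard fast-gradient-sign construction is enough. A minimal sketch, assuming a differentiable classifier; the epsilon value and the loss choice are illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial(model, img, true_label, epsilon=0.01):
    img = img.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(img), true_label)
    loss.backward()
    # one signed-gradient step away from the correct answer
    return (img + epsilon * img.grad.sign()).detach()
```

The individualized, gray-box version discussed above would additionally constrain this perturbation using the neuron-by-neuron maps recorded from one particular animal, which is exactly what hasn't been done yet.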
People are using them in these ways, and they can be used to generate hypotheses about disruption effects; we're doing some of that right now too. So the models as they stand are already powerful as hypotheses. The only thing they don't do is satisfy a lot of our neuroscience colleagues on the definition of understanding, and that's what we talked about at dinner last night, and that's what Davide's question was about. So I'm telling you things you can do, some of which are going on and some of which haven't yet, with the models as they are. That being said, our job is both to try to do some of these things and to improve the models as the encapsulation of our current understanding of the ventral stream. How do we discover even more accurate models of the ventral stream? If I knew exactly what to do, we would have already done it, but I'm going to go back to the very beginning of my lectures and give you a philosophy of how we should be doing this. It basically amounts to: get more measurements and use them as search constraints through a model space. There are more formal ways this could be done that haven't been done yet, but the story to me goes like this. The classic recipe by which neuroscientists used to do science is: they define something, maybe not well defined but sort of defined; they pick a domain of behavior; they make some measurements, measuring system components, usually internal ones, sometimes behavior; they report their understanding as an abstract in a paper; and if they're lucky, they get a theorist to model it in some way, to capture the data or to use the data as advertising for their particular viewpoint on how the brain must be computing. And that's the end of the chain. Then the modelers complain: please give us data so we have something to work on. And the brain scientists control the world and say, this is what we're going to do. I think that model is reaching the end of its time with respect to complex systems like vision, and the new model I keep arguing for is this reverse-engineering recipe, which makes several changes to the classic recipe of doing science. The first change is that the models become the leading thing: the models now become the hypothesis generators for driving experiments, as you saw all of today. Every experiment we were doing was driven by the models, not by intuition about what we should do, and that's of course what a maturing science should do: start to have models that drive the experiments.
The second big change I'm proposing for our field concerns this word understanding: the model is our encapsulation of the current state of understanding. Some people will know the models better than others, but the entire field can benefit from using the models as I was just describing. The model is the understanding, and that's a point of contention and debate. The last thing I would add is that you've got a loop; it doesn't just go measure and done. This came up in the question about Brain-Score and recurrence, about where you put the data. The data we measure become available to drive the next generation of models, and for us that goes through this thing I call Brain-Score. I showed you one or two turns of that loop within my lab: we build some models, we go get some data, and then we build some new models. But that takes a couple of years, and at that rate we'll be doing this slowly. So if we could move the humans a little more out of the loop, this whole thing could go a little faster. This is now starting to sound like automated science, and the humans want to be inside the loop, knowing what's going on and transferring the knowledge and so forth. But even in our lab we're doing what we call "tech-ifying" the recordings, so that the recordings are actually requested by the modelers: these are the images we think we should test. Then the data can come out faster, and we're trying, again, to present them as brain scores in a formalized way. And you can imagine doing searches over models that are more automated than individual people pushing on one particular model; that work already goes on in a number of areas of machine learning, and we think we could leverage it against these kinds of data constraints. This is my view on where we're headed. Maybe some people would call it dark and depressing; I just feel like this is what's going to happen naturally, and it's sort of what we're doing already. And even if you don't like that view and what we're trying to do with it, I hope you still appreciate the idea that going through this loop of defining and operationalizing something, then measuring, modeling, and measuring the models, is essentially what good science is, and the intersection of AI, machine learning, and brain science is quite exciting in that regard. So you can do this in your own domain of interest, and I encourage all of you students to think about whether some of the things we did, or related ideas, could be used in an area that we don't understand as well as we now understand object vision, which is still incomplete but is in this maturing phase of science. So that's all I wanted to say; I'll leave the rest of my time for discussion. I put that provocative thing up there so we could talk about it, or any of the other things I said, or we can just go have lunch, okay?