It's a pleasure to have Professor Jim DiCarlo, who will now deliver the first of these three lectures on visual intelligence. OK, great. Everybody hear me OK? So folks, listen, I'm here for you. I know we have an hour and 45 minutes, and if I just talk straight through, I probably only have an hour of material for today. I could move on to the next lecture, but I'm saying this because I want you to put up your hands and ask questions along the way. I also heard some comments after Professor Zoccolan, whom I'll call Davide, spoke earlier; somebody said, well, you know, I'm not really a neuroscientist, so I'm not sure. I want both neuroscientists and machine learning types to be engaged in this, and I'm trying to give a talk that speaks to both audiences. So there are no dumb questions from either side. I view a big part of my job in this course as trying to find connections between those fields, and I'm going to try to highlight them as we go. I'll do this in the context of vision broadly, but through a specific problem called visual object recognition, which I'll introduce to you next. I've titled my talk "Reverse Engineering Human Visual Intelligence," and I'm going to give you a meta-level flavor of what I mean when I say reverse engineering. I think this is an approach our fields broadly need to take to make progress, both in the brain sciences and in things like machine learning and AI. So let me define that very briefly, and then I'll get more specific. When I say reverse engineering, what I mean is that our goal is to somehow account for each ability of the mind, which we can measure as complex behavior. Broadly, we've put an umbrella over that called intelligence. You can't really define intelligence, but you can measure lots of complex, interesting behaviors.
You've heard some already. If you care about the brain, you want to be able to account for that behavior using models built out of components that have some mapping to the brain. For us, that's going to mean neurons and their connections. It could go down to finer scales at some point, but I'm going to stay at the level of abstraction of single neurons and their connections. And you'd like to do that in the language of engineering, which means you've implemented something that can make testable, falsifiable predictions, and that you can do things with. By the end of my third lecture, I hope to actually show you some of the things we do with these kinds of models. As my colleague at MIT, Josh Tenenbaum, likes to say: "Jim, this just means let's do science like engineers." So not just make measurements, but actually build models of what's going on. For this audience, I hope this is not a new idea; the question is just how we actually roll up our sleeves and do it. Now I'd like to stick in this slide; it's a bit entertaining. I first put it up when I was giving a keynote talk at CVPR, just to be a bit provocative. I think the goal of many of us is to build computational intelligence, meaning something in silicon for right now, because that's what we currently engineer with, that's how we synthesize things, that meets or exceeds biological, and mostly we mean human, intelligence. This is a longstanding goal, probably for many of you in this room; it drove a lot of us into the field. So if you think of that as a goal, and I hope it's the goal of many of you, maybe not all of you, then there are strategies you can take here. One is to say: forget the brain. Just ignore brain and cognitive sciences, just build technologies. You know, do something cool, right?
Let's just go build AI and we'll figure it all out. I don't think that's a crazy idea; I just think that many of you are here at this meeting, meaning you probably think there's something to learn from the brain, so you're not really taking just strategy one. Strategy two might be: ignore the brain in actual engineering practice, but when you go out and talk about your systems, or put them up in Science or Nature papers, say how brain-inspired or brain-like they are. That's really good for advertising, this "brain-inspired" stamp of approval. You've seen that, I'm sure. I don't like it, because I'm actually a card-carrying neuroscientist, so I like to think there should be some real interplay, not just advertising for the current AI wave. A next strategy might be: use human performance as a benchmark to report how good your systems are. This is kind of like a Turing test. Oh, we beat humans at chess; we beat humans at Jeopardy; we beat humans at Go. This is a bit better, because if you think about it, you're making comparisons with the biological system, measured at the level of behavior in certain domains. In some sense I like that better, because you're at least engaging with the biological system of interest. It's also good for advertising, as you've probably noticed from those examples, so you get a bit of both there, and that's actually been going on a lot. Now here is the one that most of my neuroscientist colleagues think we're supposed to be doing, really driven off of models from physics and biology and other successful fields of science: try to find a simple, reduced neural system, and hope that you can find some magic principles that are somehow discovered and then are scalable.
This is driven off of things like DNA science: once we figured out how phage DNA worked, we could think about whole human genomes. Those are examples in science where you say, let's do some reduced thing and then scale it back up later. This is not a bad idea; again, it's worked in other domains. But I'm going to try to tell you why I'm not sure how much traction we're going to get out of that for complex systems like brains and vision. So I prefer this fifth strategy, which is a bit long-worded here, but this is really what I mean by reverse engineering. What I mean is: forward engineer models within wisely chosen (and there's some art here) brain science measurements; test the deviations from those measurements as you go, so the models are making predictions and you're checking your errors on those predictions; and adjust your engineering direction if the deviations are growing rather than shrinking. We're following a big meta-gradient here to converge on a model that approximates the system of interest. That's what I call reverse engineering: it's forward engineering within constraints. I'm going to try to show you how we've been doing that. It will really unfold over all three lectures, but I'll give you a bit of a flavor today. And by the way, if you're an AI person, you may like strategies one, two, and three. But I think number five is the way forward, because if you succeed at it, not only do you build better AI, but you enable new advances in human health, educating kids, brain-machine interfaces, all the things you can do with an actual engineering understanding of the brain.
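The loop just described, forward engineer, check predictions against measurements, keep only changes that shrink the deviation, can be written down as a tiny sketch. This is my own toy illustration, not the speaker's actual pipeline; the "brain" here is a stand-in function, and the search is a simple random hill climb.

```python
import random

def reverse_engineer(measure, predict, params, n_rounds=200, step=0.1, seed=0):
    """Toy sketch of 'forward engineering within constraints': propose model
    changes, compare predictions against measurements, and keep a change only
    if the deviation shrinks. All names here are illustrative."""
    rng = random.Random(seed)
    stimuli = [i / 10 for i in range(20)]      # "wisely chosen" test conditions
    data = [measure(s) for s in stimuli]       # the brain-science measurements

    def deviation(p):
        return sum((predict(p, s) - d) ** 2 for s, d in zip(stimuli, data))

    err = deviation(params)
    for _ in range(n_rounds):
        candidate = [p + rng.gauss(0, step) for p in params]  # adjust direction
        cand_err = deviation(candidate)
        if cand_err < err:                     # deviations shrinking: keep it
            params, err = candidate, cand_err
    return params, err

# Pretend the "brain" computes 2*s + 1; the model is a line we must fit.
brain = lambda s: 2 * s + 1
model = lambda p, s: p[0] * s + p[1]
fit, err = reverse_engineer(brain, model, [0.0, 0.0])
```

The point is only the shape of the procedure: the model is always checked against held-out measurements, and the engineering direction is adjusted by whether the deviation grows or shrinks.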
And as a small aside: besides solving the great engineering challenges of AI, understanding our own intelligence is arguably the greatest open problem in the history of our species. So that might be a good thing for many of us to work on. I'm saying this to motivate you to try this fifth direction; that's what I'm going to talk about here today. If you're not motivated by this, maybe you could just leave right now and it'd be okay. Okay, so let's try to do it more concretely: reverse engineering in practice. Here's the recipe I like to use when I think about what we're doing and how you might do this operationally. First you have to define, I said complex behavior, you have to spec it. This is what engineers do: specify the domain. Define it; it should be interesting; you want to operationalize it; it should be rich somehow. Again, there's some art in how you choose this. You perhaps want to ask, as I alluded to on the last slide, what are the ways brains surpass current engineered systems? Usually we think in terms of performance: humans are better than AI system X at some task, like playing Go. But I encourage you to also think about things like power usage, energetics; the brain runs on about 20 watts, and nothing comes close to that yet. Or size: the brain is reasonably big, but these are other constraints that engineers have. We'll talk mostly about performance today, but I encourage you to think broadly about that. Then, once you specify something that's interesting about biology, you want to go and get some of the measurements that might constrain your forward engineering search. Again, you have to choose wisely; there's lots of stuff you could measure in a brain, right? You guys know this. You can measure behavior.
I mentioned that already. You can measure spiking activity of neurons; I'm going to show you that from our own lab, because we think it's critical to the problem we work on. You could measure anatomy; that's also important. You could measure blood flow; Davide referred to this this morning when he said the brain gets hungry, so it draws blood in, and this is what you measure with fMRI. Neural perturbations, subcellular genetics; the list goes on and on. Neuroscience is a very big tent, it's very confusing, and you can measure stuff forever. So you have to somehow choose wisely. I don't have a magic recipe for how to do this; I'm just going to say, think about it. But you can't stop there. The classic neurobiology loop is: sort of spec (or don't really spec), measure some stuff, then write a paper; that's what you'll find in a typical experimental neuroscience paper. But you want to go the next step: engineer something, a model that can actually explain what you measured, and ideally strive to cover the entire domain. Again, this is the part about engineering under the constraints of those measurements, and the model should not only capture these measurements, it should predict held-out measurements. This came up in the context of the last talk, where a question was asked essentially about this: what can you predict outside of the things you've already measured? That's the test of a model, how far it can generalize across a domain. You at least want to be able to generalize over some rich domain here, and perhaps beyond that if you're really lucky. Because I'm a systems neuroscientist, we're going to try to predict not only behavior but also neural spiking responses, as I mentioned already. And if I'm going to predict neural spiking responses, I need my models to have neurons in them.
My models have neurons in them and they're in silicon; they're called artificial neural networks. When I say ANNs or artificial neural networks in this talk, I don't mean the exact network architectures that are currently built and downloadable today; I mean, broadly speaking, silicon-based neural networks. I think we, as biologists trying to understand the brain, have to work with those kinds of networks as hypotheses of what's going on. Because, again, I'm measuring those things, and I want them to be predicted. To get accurate predictions, your models often can't be toy models. If you're going to measure visual processing in a monkey, as I'm going to show you, those are very complicated processes carried out by millions of neurons, so you can't work in a toy regime; you have to build these models at sufficient scale. Serious engineering expertise is needed just to pull off this side of the loop, if you like this direction. And then you don't just stop there; you don't just make measurements, model them, and say, here's the best I did, call it a day. You check the model against the measurements, again make predictions, and see where your deviations are. This is the loop I was referring to earlier: you go back, try to build the model better, and make more measurements. The faster you can make this loop spin, the faster, I would say, you reach an understanding of domain one. And this is controversial in our field: notice I'm calling the model the thing that encapsulates the understanding over domain one. I'd be happy to talk about this, and we could talk now if you want, but I think this is a point of contention right now in our field, whether this actually counts as understanding.
Let me just say that this form of understanding, as I'll show you in my later lectures, can lead to immediate use in various types of applications, because it's an engineered understanding, not words on a piece of paper, or in a textbook, or in the abstracts of a series of papers. So this is classically the domain of science: make measurements; brain science, neuroscience, cognitive science. And this is classically the domain of engineering: model stuff; for us that's AI, ML, computer vision, and I'm going to lump it all together. All I'm saying is, bring these things closer together, and if you do that well, you'll make progress. At MIT we have an effort dedicated to this called the Quest for Intelligence, which views these as two sides of the same grand challenge; if you're interested in that, I'd love to talk to you about it offline. Today, my series of three lectures will focus on the application of this recipe to one domain of human visual intelligence called core visual object perception. I'm not going to solve all of vision for you; I'm not even going to fully solve this problem. But it's a foundational problem, and I'm going to tell you the story of how we try to approach it, where we are, how other fields have contributed to this, and maybe what's still missing and what's next. That leads me first to remind you that when I tell you things, especially from my lab, I'm really just an ambassador; this is my group here. These men and women do all the work, so I'll try to highlight some of them as I show parts of the talk, but I'll probably forget, so I'm highlighting them now. I especially want to point out Dan Yamins, who was a postdoc in my lab and is now an assistant professor at Stanford; I'll highlight a lot of his work on convolutional nets and their relationship to the ventral stream, especially in tomorrow's lecture.
But again, a lot of folks here deserve a lot of credit, including Davide, who is a lab alumnus; Davide was a postdoc in my lab. I hope you're not ashamed of me saying that. ("I believe I was.") Yeah, okay. So let's go through the lectures I'm going to give you. First, I'm going to define the domain a bit today. Most of today is spent on what I call: where are the interesting computations done in the brain? You probably already have a sense of where this is; Davide nicely set it up for you. But I'm not just going to tell you where they are anatomically; I'll show you why we think that's true computationally, why the problem is challenging, and where the brain lays out its solution space, and I'll try to motivate you there. Tomorrow I'll talk about how we can model or approximate those solutions in silicon, and how the current models look. Then Wednesday we'll talk about the cutting edge: what's probably still missing and how we put it in, and if there's time, what we can already do with our newfound modeling power. We'll see how it goes, and I may tune these up depending on how things go over the next couple of days. Let me pause here, because this was all big-picture introduction, and see if anybody has questions or complaints. Complaints are good; I like complaints. Nothing? Wow, okay. Okay, let's start with where we're going here. I stuck this slide in; it's a little old, but I think this audience probably needs to hear it. Let's translate some language. We're talking about AI and ML engineering relative to the brain sciences. These fields use different terms, and I want to try to align them for you a bit. So, what problem are we trying to solve? I've been talking about complex behavior. In the engineering fields, this is usually measured with a series of benchmarks for object categorization.
For instance, there's the so-called ImageNet benchmark for computer vision; these fields have often done a better job than brain and cognitive sciences at establishing shared benchmarks, and we're working to do that too, as I'll show you later. What we mean in brain and cognitive sciences is that we're going to make behavioral measurements, which serve as core benchmarks: perception, complex behavior, measured with careful psychophysics. So I align those two as defining the domain to be solved. What do good solutions look like? Well, in computer vision, for instance, you'll talk about useful data representations, and in brain and cognitive sciences you'll talk about explicit neural population spiking patterns, as one example I'll show you. These are essentially trying to say: here is what an answer to the challenge could look like, and I hope that will become clear when I get to our problem. How do we build those useful data representations? In the engineering fields, this is called algorithms: you take some input, you do some interesting transformation on it, and you get some other form of data representation that is more useful. Given Chris's talk, to go from the measurements to inferring the latent content, an interesting algorithm needs to do that, and the latent content is expressed in a data space that, in the brain, is spiking patterns; that's how we think about it. Now, brain and cognitive scientists don't usually talk about algorithms, but they'll talk about neural wiring, weighting patterns, and nonlinearities, and if you put those together in the right way, you're essentially executing something like an algorithm. So an algorithm can approximate this, and this is again where artificial neural networks have really helped us bridge right here. And then, how do you actually construct these?
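Before moving on: the "neural wiring, weighting patterns, and nonlinearities" mapping above can be sketched concretely. This is my own minimal illustration (toy numbers, not a model from the lecture): each stage is a weighted sum over its inputs followed by a pointwise nonlinearity, and stacking such stages is what makes the tissue, or the silicon, execute something like an algorithm that transforms one data representation into a more useful one.

```python
def relu(x):
    # A simple pointwise nonlinearity, standing in for a neuron's output function.
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One stage of 'wiring + weights + nonlinearity': each output unit takes a
    weighted sum of its inputs (the wiring and weighting pattern) and passes it
    through a nonlinearity. Purely illustrative numbers below."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# "Pixels" in -> a transformed, hopefully more useful representation out,
# via two stacked stages.
pixels = [0.2, 0.9, 0.4]
hidden = layer(pixels, [[1.0, -0.5, 0.3], [0.2, 0.8, -1.0]], [0.0, 0.1])
output = layer(hidden, [[0.7, 0.2]], [0.05])
```

Nothing here is specific to any real network; it only shows why "wiring, weights, nonlinearities" and "algorithm" are two vocabularies for the same object.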
So these are meant to be instantiated solutions. You're all looking at the screen, recognizing letters, recognizing faces; you have a running algorithm in your head right now for vision. How did it get to be that way? Of course we think this is partly evolution, plus some postnatal development and plasticity, though we don't know the breakdown between those, and maybe even adult experience. Especially at a conference about learning, you may be quick to think of adult experience, but for a lot of our great behaviors, a lot is pre-built by evolution and then significantly tuned; adult experience is a very thin sheet on top of that core of evolution and postnatal development. That, I think, is the way you should be thinking about brain science: out of the factory, you're already very, very good. Computer vision and AI folks will talk about initial conditions and, broadly, learning rules, where they don't necessarily map them onto evolution, development, or adult experience; they just say learning rules. And of course they need training data to train things up, which is usually thought of as some notion of experience, though again it doesn't map well onto evolution. So these are things to think about as we go through at least my lectures, and maybe some of the others: how these words map onto the things in brain and cognitive sciences. Okay, that was just to align terms. Questions here? Wow, you guys have no questions; how do I provoke you? I'll try to think of something provocative to say. Okay, let's start with this problem of vision. In 2002, when I started my lab, I said, hey, I would love to figure out how primates do visual scene understanding. That's a huge problem, visual scene understanding.
I didn't think we could take that on, so let's try something simpler. The motivating problem for us at the time was what we'd call object categorization or identification, which had essentially already been framed in computer vision in this kind of way: you want to label a bunch of nouns in the scene, like cars, people, buildings, trees. You have a list of nouns, and you want to be able to say, oh, that's a car, so I've colored the box red, and where it is, which is what this bounding box indicates, where it is in the scene. Maybe also its other latent content, like its exact pose; these are the kinds of latent content variables you want to estimate. Now, you and I look at these images and go, obviously it's a car, it's pointed this way, and so forth. But, connecting to Chris's talk, these are just luminance values on pixels; this is just a flat screen of pixels if you came up and looked at it. All the rest is your brain making up what's out there in the world. In fact, there's nothing out there; it's just some light flashing at you, but you construct an inference: I think there's a car out there, and it's pointing this way. How you do that from pixels, which is an ill-posed problem, is essentially the key challenge of recognition. We'll go through that in more detail, but I want you to appreciate how powerful your system is to do that.
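To make the "just pixels" point concrete, here is a toy illustration of my own (not from the lecture) of why raw pixels are a poor space for recognition: moving an object a little changes its pixels more than swapping it for a different object does, so naive matching in pixel space picks the wrong answer.

```python
def shift(img, k):
    """Translate a 1-D 'image' right by k pixels, padding with background zeros."""
    return [0.0] * k + img[:len(img) - k]

def pixel_dist(a, b):
    # Summed squared difference: a naive pixel-space similarity measure.
    return sum((x - y) ** 2 for x, y in zip(a, b))

car     = [0, 0, 1, 1, 1, 0, 0, 0]   # a 'car' template (toy 1-D image)
truck   = [0, 0, 1, 1, 1, 1, 0, 0]   # a similar-looking 'truck'
car_far = shift(car, 3)              # the same car, seen 3 pixels over

# The shifted car is FARTHER from the car template in pixel space
# than the truck is, even though it is the very same object.
assert pixel_dist(car_far, car) > pixel_dist(truck, car)
```

This is exactly the selectivity-versus-tolerance tension the lecture builds toward: a good representation must pull the shifted car back toward "car" while still keeping "truck" apart.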
Now, I'm going to reduce the problem. Instead of working on full scene understanding, we're going to take lessons from biology; someone mentioned this this morning when they talked about eye movements. First of all, you should know that your retina is not uniform, the way a typical computer vision input is. You have high acuity at the center of gaze, and it falls off rapidly as you move into the periphery, so you have the illusion that you see everywhere, but you really only see very well at the center of gaze. So I'm going to focus on the central 10 degrees of vision. For you, it depends where we are in this room, but 10 degrees is about two hands at arm's length; that's the part of your visual field I'm talking about, if you want to hold up your hands and get a sense of it. I'm doing that partly to reduce the problem and make it simpler, but also because the part of the brain I'll tell you about, the ventral stream (Davide alluded to this when he drew the receptive fields up here this morning), is really focused on the central 10 degrees of vision as you get to its high levels. So there are anatomical and physiological data that motivate this as well. Think about 10 degrees; I'm going to be a little sloppy with that, sometimes it's eight, but say roughly 10 degrees, relative to your current center of gaze, which we'll define as wherever you're fixating. Okay, but as I just mentioned, when I put this image up, you didn't just put your eyes in one place and hold them still. As Davide mentioned this morning, you can train animals to do that, but it takes effort; you naturally, as primates, want to quickly sample the scene and get as much information as you can. You were all doing that when I put this up, without even knowing you were doing it. What you might have done is a scan path that might have looked something like this, where you
dwell for a couple hundred milliseconds, then make a rapid eye movement. Okay, who knows what the eye movement is called? I'm going to get some people talking. Yes, a saccade, right. Do you see while you're saccading? Who thinks you see while you're saccading? Raise your hand. Who thinks you don't? Okay, good; everybody else has no clue, I guess. I got 10 hands. Right, so you don't really see while you're saccading; your eyes are moving too fast for you to see. It's really a mechanism to get your eyes to some other sample of the scene, and the dorsal stream that Davide mentioned is very involved in choosing where that next location will be. I'm not going to talk about how you choose a location. What I want you to see here is that you typically dwell for 200 to 500 milliseconds (for humans; monkeys are a little faster) before you move on to some other location. And I'm doing this because now you can see I've reduced the problem to just 200 milliseconds: you're going to extract something in that window. So you can think of this in the reduced space I'm going to show you here: roughly 10 degrees, 200 milliseconds per image. I'm going to extract those snapshots out of the last image I showed you and show them to you now, once. Okay, those were about 200 milliseconds each, roughly 10 degrees. And what I want you to notice, and I'll do it one more time, is that you can recognize one or more nouns in each and every one of those images, even though you probably can't spit them out of your mouth fast enough, even if you tried. But you're probably going: car, person, car, building, sign. You get the gist, right? You're not seeing everything, but you're doing a lot of inference in just that brief time scale. You guys with me? Okay. That is the core problem we've been working on for essentially 15 years. We call it core object recognition, because we think it's foundational to all the other things that need to happen. So let's get
an engineered version of this first, and then we can build on it. Again, it's not all of vision, but it was a biologically motivated place to start: the central 10 degrees, 200 milliseconds, and a task that is typically to report object category, like saying it's a car. We've also done things like estimating the position of the car inside the 10 degrees; we can talk about that if you like, but for now we'll stick to object category, to keep things simple. Yes? Good. Right, so the question goes like this: it's well known, from the original studies of Yarbus, that if you cue subjects about what's going on in the scene, say, find the person, or give them contextual cues of that kind, they will make different scan paths through the scene. Right, that's your question. I'm not going to try to determine how you arrived at your scan path. Given a scan path, which extracts these images for 200 milliseconds each, I just want to know how your ventral stream processes each of those images into a successful estimate of the latent content. And again, we're going to show realistic-type images; I'll show you them in a moment. They're sort of like the snapshots I showed here, but they're going to be completely out of context. These were all from a street scene; I'm not going to do that. I'm going to show you, as if I'm taking photographs of arbitrary scenes, and just ask: here's 200 milliseconds, what did you think it was? I'm only introducing this to contextualize our problem in the broader sense of how vision works, to show you what I am going to take on and what I'm not. I'm not going to take on scan paths, for instance; that would be a talk about dorsal stream control of eye movements. And somehow there are interesting
interactions between the dorsal and ventral streams, as Davide referred to: hey, I made a good estimate, so the motor system can probably move the eyes somewhere else; versus, oh, I'm not sure, maybe you should dwell a little longer. At least that's the way I think about it. Interesting interactions like that probably take place that we don't really understand. I'm just trying to understand what happens in that first 200 milliseconds in the ventral stream. Does that answer your question? Yeah, great question, thank you. Yes? Right, right. Why don't you hold that. The quick question is that background context, for instance, can influence your speed or accuracy of recognition, and there have been studies showing such effects. What I'm going to concentrate on actually neutralizes those contextual effects, and I'll show you in a minute how well people do under those conditions. Again, I think of context as another add-on on top of the core system that I'm trying to get at first. Okay? So again, I'm just trying to keep the problem manageable before we engage all the possible complexities; that will come up when I show the images. Here's the core of the problem, and Davide mentioned this earlier: the reason recognition is computationally hard is that you never actually get to see the same image twice. You've all seen me now and you sort of know my name, but you're never going to see the exact copy of my face on your retina ever again. The world, everything, is just too variable: my hair could be slightly different, the lighting is different, all kinds of things change. Except in lab conditions, you never see the same image on your eyes more than once. But somehow you're able to say these are all cars, even though there's high uncertainty in the pixels striking your eyes. And you want to be able to say these are
still cars even across some subordinate-level variation. You want to say these are cars even though there's a background, and possibly a foreground, including things I'm not even including here. This is correlated background, but you can even do it with uncorrelated background, as I'll show you in a minute. Somehow you need to wrap all of this, and much more, together to say those all have a car in them. That ability to generalize over a large class of images, yet still maintain selectivity, say it's a car and not a truck, is the kind of thing you need to be able to do. We call that selectivity versus invariance, or tolerance. This is the invariance part of the problem; I'll give you a graphical, conceptual picture of it a little later in the talk that I think you'll find useful, but I want you to have it in mind. So, the question came up of how you do this out of context. Here are the images we use for a lot of our studies: rendered 3D objects placed on uncorrelated natural backgrounds. And I'll test you in a minute; I hope that when I put this up, you said, well, that's obviously a Volkswagen, even though you would never really see a Volkswagen floating in the middle of a field like this. I've removed the contextual cues, yet you still say it's a Volkswagen. Here, we'll try another one real fast, so you get a sense. Okay, sort of scary-looking, right, the plane. And here's a fun one. Okay, you're not going to see that unless it's a Texas Chainsaw Massacre cut-off head, yet you still clinically go, I think I see a face out there. So is this in context, or is it out? I don't know; we're not trying to decide that. These are just naturalistic backgrounds, and you can still do the task quite well. The reason we did this, and I'll have a slide on it in a minute, is because
computer vision systems at the time were claiming to do recognition, but they were doing a lot of cheats using the correlated background, the part you said humans do well at. So we said, let's isolate the part humans do well at without the correlated background, and that brought all the computer vision systems at the time to their knees, with a slide I'll show you in a minute. Okay, if you actually test someone in a lab, it might go something like this: fixate there. I'm not even going to tell you what the objects are, so all nouns are possible. Here you go. Okay, what was it? I'm going to give you a two-object choice of nouns. What do you guys think the answer was? The face, right. It could have been a dog, you don't know, so I'm not pre-cueing you: there's no feature attention, there's no spatial attention. I'm going to give you another pair. Okay, here we go. What was it? Bird. Okay, you see, it's pretty easy. Now, it's not always that easy, because objects can be small, they can be slightly to the side, but those are examples of what we do. Okay, so if you measure a bunch of humans doing this, here's their performance in d-prime units; accuracy here is like 98%, and this would be chance, 50%. These are psychophysical units; if you don't know d-prime, just think of this as accuracy. And here what I'm plotting is the uncertainty. Remember, I said what makes the problem hard is high uncertainty in the pose that the object presents to you. These are shown on white backgrounds just to give the idea, but of course they're on those naturalistic backgrounds I showed you a minute ago; this is just to tell you that there's high uncertainty here. And here you go, here are the humans: they're good even at low uncertainty, and they have little roll-off, though you might imagine it gets a little harder when things get more uncertain. But machines, as I mentioned, at the time, this is about 2009, did really well in this so-called pattern-recognition regime, where they could say, if I essentially memorize that particular view of that particular object, memorize the image, I'd be really good, even better than humans. They could win at barcode reading versus humans, for instance. But they would lose relative to humans in these high-uncertainty conditions. So that was the performance gap we were especially interested in when we started this work. Remember what I said at the beginning: find something interesting going on. This is how we tried to isolate it, that gap. Most of today's lecture is going to be about showing you where in the brain that gap is closed, and how we think the brain represents the data to support closing it, without telling you the algorithm that gets you there, which is tomorrow. Today I'm just going to give you the lay of the land. First, since we have time, I want to give you a sense of viewing duration. I've been talking about 200-millisecond viewing durations, and then I quickly showed you some examples with 100-millisecond viewing durations; I'm a little sloppy about this. We often run at 100 milliseconds, in part for data like this: even though your eyes typically dwell for 200 milliseconds or longer, you can get data twice as fast at 100. You can see this is human performance accuracy; chance is at 50, which is down near the bottom, and as you increase the viewing time you quickly get up to a pretty high level of performance. The absolute level depends on the particulars of the task; this is an 8-way categorization task. You see that it's relatively flat, almost looks flat, in this regime here, and then you pick up a little bit more if you get into seconds of viewing duration, where you're now making multiple eye movements on the image. So again, we're not trying to
study what actually happens during those extra eye movements; I'm just trying to show you how much you get even in just 100 milliseconds, and why I'm going to work at 100 milliseconds from these kinds of data. Okay, so now we're doing 100 milliseconds, central 10 degrees, and I want to remind you again, I already said this is not all of vision, but it is a very big domain. People have estimated humans know on the order of thousands of objects. If I want to be able to predict your behavioral ability, even just averaged over images, on pairwise discriminations, dog versus cat, tree versus house, that's on the order of a million pairwise object discrimination tasks that I should be able to correctly predict, tasks that span this domain behaviorally. And I should really be able to predict your behavior on each and every image of the types I've been showing you. If I'm in the central 10 degrees and I'm showing you things at roughly 200-by-200-pixel resolution, that's a 40,000-dimensional input I'm providing you. Even if I'm just black and white, that's two to the 40,000 possible images, for which I want to say I have a model that correctly predicts the neurons all the way up your ventral stream and your behavior, on that very large number of images. Of course, I can never test all those images, but I can always test held-out images and see how well our models are doing at predicting neurons and behavior. All I'm trying to emphasize is that this is a very, very rich space. Often I get questions that say, well, vision has context, or attention, or all the things you read about in brain and cognitive sciences. From an engineering point of view, this is a very challenging domain as it is, and my personal belief is that as you figure out these kinds of problems, those other problems become easier, layers on top of these challenges. So that's why we've been focused on this
core. Okay, I've been talking about humans; I've been testing you all to say look how great you are, and I showed you humans were better than machines in some areas of generalization, at least in 2009. But I also said I want to go in and measure things, and it's really hard to measure the things you want to measure, in spatial and temporal detail, in humans. You can do it in some cases; Davide referred to that when he talked about the Quiroga studies, where you have human patients with electrodes inserted and you can measure individual neurons, like that Jennifer Aniston cell. But those are very rare data, and it's hard to do systematic studies in that patient population; plus, you're never sure the tissue is entirely normal, because these are patients who have seizures, which is why the electrodes are in their heads. So it's very hard to do systematic work in humans, and we work on this species, the rhesus monkey. If I'm going to switch species on you, I need to justify that at the behavioral level, which I've been motivating all along. Okay, I've got to wake this guy up in the corner over there. There he is. So here is a monkey doing the task you were just doing. These monkeys are trained; I'll show you their training data. It takes them a couple of days to learn each new object; the monkey doesn't know what a car is when he walks into the lab. But they do the same task you were doing: he's triggering the presentation of an image that lasts for about 100 milliseconds, and then he's choosing among two objects, and notice it's not the same two objects every time. This animal knows something like 40 or so objects at this point; again, we can teach them a new object every couple of days. They do this in their home cage all day long, they run thousands of trials, so we get a lot of data this way. He gets a little squirt of fluid reward for being correct; the green is correct, the black is incorrect, just to give you a sense of what's going on. Okay, so you go ahead and collect a lot of data from that primate species. You put things up on Mechanical Turk, so you can measure a lot of other primates, i.e. humans, doing this, and we can also measure things in the lab to carefully control things, in both the monkeys and the humans. You do all that, and you get a lot of data that looks like this. One of these is a monkey and one of these is a human, and I'm not telling you which is which, because the key point is I want you to see how similar they are. What these color maps indicate is the difficulty, or if you like, the ease, of discrimination: blue means easy to discriminate, red means harder to discriminate. So tanks are more often confused with trucks, which you can intuit, than they are with birds. Humans are not perfect; if everybody were perfect, these would be fully blue. Some things are just harder to tell apart than others, and that's what's reflected in these color maps, averaged over hundreds of images in each case: the average tank confusion with truck, the average tank confusion with bird. These are symmetric matrices, if you haven't noticed, so you really just pay attention to one side or the other. But what I really want you to see is how similar they are. At least statistically, we can't tell the rhesus monkey from the human, at least for the kinds of tasks and images I just introduced. On the third day we'll have more precise data at the image level, and we still can't see differences between humans and monkeys even at the image grain. What I want to tell you now is that this gives us license to go in and measure the internals. Now that we've said this primate looks like that other primate, this primate I have license to
go in and make measurements of in a systematic way. So now we're going to go into the monkey brain. This is the human brain, which is what we really want to understand, but we're going to use the monkey as a model system, one that has the same behavioral patterns I just showed you, and take it as an almost literal model of what's going on in this system. There are interesting scaling questions we could talk about that still have to be solved, but for now we're going to take this as a one-to-one model. The primate brain is of course not the same size as the human brain. I'm going to talk about the ventral visual stream, which Davide has already introduced, because decades of neuroscience have told us a lot about the ventral stream, and most importantly they've told us that this area called inferotemporal cortex (IT) is important in object recognition. In fact, the whole ventral stream is important; it goes from V1 to V2 to V4 to IT, and I'll show you that again in a minute. IT is called the highest level of the ventral stream; it projects to areas involved in memory in the medial temporal lobe, to areas involved in decision and action in the frontal lobe, and to subcortical structures like the basal ganglia that execute movements, for instance the animal pressing the left button or the right button. It's known from decades of work that if you put lesions in here, you tend to have problems in tasks like recognition, although those data are quite messy, and I'll return to that at the end of my lectures. As engineers, we like to take this curled-up thing that's jammed in the skull, all crinkled up, the cortex, and lay it out as these unrolled sheets of neurons, shown here as schematic drawings of the different processing stages. This is the visual input, and these are the retinal ganglion cells: you've got a million retinal ganglion cells in the back of your eyeball transmitting spikes to the first spiking neurons in the visual system, the LGN, a nucleus in the thalamus buried in the center of your brain, not shown here. That then transmits primarily to visual area V1, which is why it's called V1, in the back of the brain, where you have hundreds of millions of neurons; I'll give you the exact numbers later. Then V1 connects primarily to V2, V2 to V4, V4 to IT, and again I'll give you numbers on this. I'm just giving a schematic, because I'm going to show it many, many times. These arrows indicate the feedforward flow, these the feedback flow, these indicate recurrent connections, and there are also skip connections I'm not showing; so this is an oversimplified schematic just to keep you oriented on the lay of the land. Okay, questions so far? Yes, back to this one. I think that's a great question; the point I was trying to make is that under these conditions the species look the same. You could say, well, there's a rate-limiting step of just being able to estimate the shape. Of course the monkey doesn't know he can get in the car and drive it, but that doesn't matter: humans don't get much of a performance boost out of knowing that they can drive that thing. That is another level of intelligence that has to build on this, but there's a rate-limiting step here of just being able to say that shape is a car and not a truck, and that we share between the two primates. These data were the out-of-context data, though, so we need to be a little careful; these are the images where the backgrounds are uncorrelated. A more careful examination would engage the question raised earlier: maybe there are contextual gains that humans will show. You can think of extreme cases, like with no objects at
all, and then I show you a car and, I don't know, a chicken; the human is probably going to pick the car, based on long-term experience and memories that cars go with streets. The monkey would have no way to know such a thing. Here, the objects are visible; if you make them invisible, you're relying completely on your priors about the world, and the monkeys won't have had that trained into them. So there are all these caveats; I don't want you to think that monkeys equal humans on all such visual tasks, just on this one that we've defined here. You're asking about the detailed correlations between monkeys and humans, if I've got your question right. This is one level of analysis; I have another plot, which I've alluded to, showing it image by image, and it's still almost perfectly correlated. So again, it's under these kinds of conditions that they turn out to be well correlated; I don't think it will hold under all conditions, and exploring that will be interesting. It didn't have to be this way; it turned out to be this way, so it gives us license to proceed with this as an almost perfectly quantitative model. I should add that the monkeys are detectably a little worse on average, something like 2% worse in overall accuracy, even though their patterns are quite similar; who knows, maybe the monkeys are screwing around, less motivated, but there are differences we can detect. On day 3 I'll show you a slide with the image-by-image comparisons, and you'll see it's even more impressive how similar these species look, especially compared to deep networks, which currently don't look like either species at the image grain. To cut to one of the punchlines: today's deep networks look just like this, but they don't look like it when you zoom in at the image grain.
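As a rough illustration of the kind of comparison being described, here is a minimal sketch of how one might quantify whether two confusion matrices share the same pattern of difficulty: correlate their off-diagonal (confusion) entries. The matrices and numbers below are made up for illustration; they are not the lab's actual data, and this is only one of several plausible similarity measures.

```python
import numpy as np

def pattern_similarity(conf_a, conf_b):
    """Pearson correlation of the off-diagonal (confusion) entries of
    two confusion matrices, ignoring the diagonal (correct responses)."""
    mask = ~np.eye(conf_a.shape[0], dtype=bool)  # select off-diagonal cells
    return np.corrcoef(conf_a[mask], conf_b[mask])[0, 1]

# Toy 4-object example (made-up numbers): entry [i, j] is the fraction
# of trials on which object i was reported as object j.
human = np.array([
    [0.90, 0.06, 0.02, 0.02],
    [0.06, 0.88, 0.03, 0.03],
    [0.02, 0.03, 0.92, 0.03],
    [0.02, 0.03, 0.03, 0.92],
])
monkey = np.array([
    [0.88, 0.07, 0.03, 0.02],
    [0.07, 0.86, 0.04, 0.03],
    [0.03, 0.04, 0.90, 0.03],
    [0.02, 0.03, 0.03, 0.92],
])
print(round(pattern_similarity(human, monkey), 3))
```

A high correlation here would say the two observers confuse the same pairs (e.g. tank with truck more than tank with bird), which is the sense in which the species "look the same" in those color maps.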
So I'm going to show you that later. We did this control at the time, and it's in these papers down here: if you put linear classifiers on pixels, or Gabor wavelets, or any kind of common computer vision feature space at the time, you don't get patterns that look like this. So it's not like this just pops out of any old representation. That's another important control that is in those papers but that I didn't show here; I just wanted you to see that humans look like monkeys, so far at this grain and at a higher grain, but I was saving that for later, when I get to the deep nets. Okay, I'll try it one more time; maybe I went too fast. Take a square, let me show you an example. See this little square there? That's truck and tank, I think. I don't have it written up here, but you have to translate: that's truck versus tank, and red, on this d-prime scale where this is high and this is low, means harder to make that discrimination on average. That means your performance averaged over many images corresponds to a d-prime of something like two or one-point-something. But if I go to the blue, those are examples like table versus watch, I think, over here; that's blue, a d-prime level of about four, so you do better at that. Some of this is intuitive when you think about it, these shapes are very different, but that's kind of the whole point: both species confuse shapes in the same way, and other visual representations don't. Where did you go? Did I answer your question? Okay. So it's sort of an average difficulty in a binary discrimination test. I was giving you single trials; when I showed an elephant and a person, that was like one trial of a binary task, but we run them interleaved, so you don't go into the trial knowing what you're looking for, and then we sort it all out later into binary average performance levels. The reason we run it interleaved is to neutralize attentional effects: you're not thinking, I'm looking for the elephant; you're looking for many things at once. We want a neutral attentional condition, and we sort it out later. Yes? Well, that's an empirical question; we'd have to put things in the background that you would want to test. A starting point for us would be, what if we had multiple objects up and you had to report both of them? We haven't done much of that; in very reduced conditions you can do it, but we have not done it with these complex images. It's a great forward-looking question. If you put subtle things in the background, there would be limits to your ability to do that, and it's also going to depend on how you cue the subject: are you looking for the background or the object? Again, you'd want to interleave those to neutralize the attentional effects on top of that. You're going to be able to do something, but I share your intuition: I don't want you guys to think you're extracting everything from these scenes in 200 milliseconds, or even from these images here. I was just trying to say that you extract a lot, monkeys do also, and you're better at it than machines were; that's all I'd like you to take so far. You guys are already onto the next vision questions, and I'm just trying to give you a history of where the field is. Yes? Okay, advantage relative to what? Did you mean relative to machines? Right, so again, I'm giving you the history here. We were working on this way back in 2008; we showed the computer vision field that they were using the wrong kind of tests, and by 2009 we were building GPU-based convolutional networks. I was showing you what we were doing at the time, and that's why I said 2009. And I already sort of gave you that the current
systems are up to human level on these kinds of tasks, but they're not fully human; that's the punch line. I was also trying to motivate you: we, the broader we, could do this again for some other aspect of intelligence. I don't want you to just take what I tell you, write it down, and go home; I want you to think about applying these strategies to other problems of intelligence. That's the main meta-lesson, because you guys are students, and you're going to solve the next problems at the intersection between these fields. So I'm trying to give you a historical perspective on how we approached this, and hopefully you'll see something that resonates with a problem of interest to you, whether it's the next vision problem, an auditory problem, some other sensory problem, or a decision problem; there are ways you can take that strategy of reverse engineering, and I hope the fun thing to discuss will be what the next problems are. So again, this is a bit of a historical perspective. Yeah, this one, this part here. What I wanted to highlight is that 50 is down here, so you get this huge gain up to 100 milliseconds, and then a little more gain if I let you linger for a couple of eye movements. What's the question on this one, this part? Why this is not at 100, or why does it go to 100? Oh, I'm not sure; look, there are error bars here. This was the data that came out when we ran this, and we've not investigated whether it's really actually flat right here, so I wouldn't read too much into that. It could be interesting, though, because that is your typical dwell time. We made this plot because people were complaining that we were showing things too briefly, and we wanted to say: this is not brief, you do a lot in this time. Not everybody realized that, so we just ran this function, and I still like to show it, but there's much more work that could be done on these timing questions; in fact, I'll talk about timing a bit more on day three, and maybe once you've seen day three we'll have some other ideas. But don't take Jim as saying this is flat; I'm not trying to declare that, I'm just showing you the data, and it happens to look a little flat. Yes, where I lay out the areas? Perfect setup, that's the next part of the talk. Okay, let me go on, in the interest of time. This is great, guys, I'm glad we're having all these questions; if I don't answer yours as I go through, ask me again. Okay, so how do we lay that out? That's what this is about: why V1, V2, how do we even know that, and why is one area higher than another? That's a monkey, of course, a rhesus monkey, and that's what its brain looks like when you take it out; it's fixed tissue. If you cut it and stain it, this is what it looks like, zoomed out; now it's a thin cut section on a slide, and you notice these funny little lines in here. This is cortex; it's about two millimeters thick all the way around. This is white matter down here, where the axons travel. The tiny dots that you can't really see, which I'm going to zoom in on down here, those dots are neurons; this is a Nissl stain. See all these little dots that look like grains of sand? Those are all neurons. They're more dense here, a little less dense here; this is the white matter. These are called layers in cortex: all neocortex, this outer cortex here, has a six-layered structure, and you see most of the cells are in between layers two, three, and six, and they
have different cell types within layers; we could go through that if you want, but I wasn't going to today. I wanted to say that when you slice tissue like this, this is partly how neuroscientists originally demarcated areas; these are Brodmann areas. You start to look at these sections, and you can even see it here: look, there's a thick line here, and it looks different over here. You can see by eye that this looks a little different from that, even at this scale. That's how Brodmann cut up the areas originally, saying we'll call this area 20, this is area 17, this is area 18, giving them all numbers. That was the first way to demarcate what later became areas; it's not the only way, but that's one way you define areas, based on differences in densities of cell types like this. For the visual areas, the other way you define them, Davide mentioned this, is by retinotopic maps. Back to this slide here: you have a full retinotopic map in V1, you have another retinotopic map in V2, and that coincides with the differences in cell structure. So there are two pieces of evidence saying let's treat these as two separate areas: the cell-type structure and layout, and the fact that you've got a full retinotopic map. You can continue that up to the higher areas; it all gets murkier, as I'll show you in a minute, as to whether you should call it one area or two when you get to areas like IT, but these are the basic ways the anatomists did this. As I mentioned, decades of work tell us these things; I'll show you a little more in a minute. I'm putting a highlight here to say: this is what we call an area, like area V1. In the CNN models I'll show you tomorrow, the neural network models, these are going to be modeled as a layer, and this is unfortunate terminology, because the ANN field likes to call these layers of models, but cortex has six anatomical layers within each area. So a layer in cortex does not mean a layer in a CNN; a cortical area equals a layer in the talk I'll give tomorrow, and maybe we'll come back to that. Now, those were the cell bodies. This, you guys should know, is called a Golgi stain. There are some neurons here, but it stains only a small subset of the neurons, and that is what allowed Ramón y Cajal to sketch out all these cell processes. Now you can see that each neuron is much more than its cell body: it has these very long dendritic processes, especially these so-called pyramidal cells, and then axons coming off the screen here that you don't see. I wanted to point out that the cell bodies are the tip of the iceberg of the full neural network that exists here; there's lots of room for connectivity in these kinds of things. So that's what's down at the bottom level, and what physiologists like Davide
and I do is go in and put electrodes, trying to have an electrode near enough to a cell that when it fires an action potential, a spike, you'll be able to record it, as shown here; the time scale across this action potential is about half a millisecond. These spikes are what I'm going to talk about today. They have privileged status in neuroscience, and the reason is that if you think about a task like vision, or anything that happens fast, I'm from Boston, so I'm a Red Sox fan, J.D. Martinez is going to hit a home run, spikes are the only way you can send information through the brain at speeds that can support actual online behavior like vision. There are lots of other ways to transmit information over slow time scales, molecular diffusion and so forth, but at fast time scales, like 200 milliseconds, you're going to need spikes as the main units of information transmission. I'm saying all this because it means spikes have privileged status: they are the units of information transmission, unlike blood flow, for instance, which is what fMRI measures, an indirect correlate of spikes. There are lots of other brain measures, but spikes are privileged in that they are doing the brain's transmission. If we talk about learning, we have to think more broadly, but for vision in action over 200 milliseconds, spikes have precedence. Okay, so we're going to record spikes. I'm also showing this so you notice that we're going to record hundreds of samples from areas that have millions of neurons, so we're making inferences from small samples onto what full populations are doing. That's always a question: how can we make such inferences? We do the best we can. It's like predicting the US presidential election from a small sample: you don't talk to 300 million people and ask who's going to win; you take samples. We're doing sampling here as well, and we hope those samples are reasonably random, but they're not always. Okay, questions about techniques? All right, back to areas. We talked about how areas V1 and V2 are demarcated in two ways: by retinotopic maps and by cell typing, as in the Brodmann areas, which gives you these kinds of maps. But I've also been talking about connectivity, these hierarchies, and those often come from making tracer injections. If you inject in V1 and look at where the tracer ends up in V2, you see patterns like this: mostly, if you inject in V1, you see a lot of axons ending up, that's what this white indicates, in layer 4, the so-called input layer of cortex, and a little bit in layer 6. When you look at this in more detail, what's going on is that cells mostly in layers 2 and 3 of area V1 project mostly to layer 4 of V2, with a little bit to layer 6; this is a schematic of what you're seeing there in an actual slide. If you do the opposite, inject in V2 and look at the patterns in V1, you get this kind of pattern, where upper- and lower-layer cells project to upper and lower layers. The pyramidal cells, these triangle-shaped cells, are the ones that tend to project to other areas; they're not the only cell type in cortex, but I'm showing you these two distinct types of patterns because this one is called feedforward and this one is called feedback. That allows us to say V1 projects feedforward to V2: V1 gets the input from the LGN, as I mentioned, projects forward to V2, and receives a feedback pattern from V2, as illustrated here. That allows us to place V2 higher than V1: not only does V2 not receive the direct input, but it has this particular laminar pattern. And why is
that important then we can take all the visual areas and sort out and this is mostly work of Feldman, Van Essen and others lots of anatomical work they just organized it into these kind of plots that you'll see of like v3 is above v2 or v4 is up here and AIT and PIT the areas I'm going to talk about are way up there on the upper left well let's just call this let's just call this people get confused let's call this type A and type B and then once I tell you the input comes to v1 then that gives me a symmetry break and says let's call type A feed forward right so this is just you see these different patterns forget this then I say look the brain sends its inputs here so I'm going to call that feed forward it's really just a name based on where the input is coming from the eyes we could call it type A and type B and this means A, A, A, B, B which allows me to organize these areas in a hierarchy like that does that answer your question? there are many different ways you can do this some of these studies are older than others there's lots of different tracers you can use some of the classic ones I'm not a neuroanatomist but you can do things one of the more recent things you do is you inject fluorescent beads and the cells will pick them up and slowly transport them along and some just go what's called anterior grade and some just go retrograde and then you can look at a microscope and you see the fluorescent beads under the microscope there's many ways that you can do this it's just a more modern way to do that I'm not prepared to give a whole lecture on neuroanatomy but there's several things you can inject and then there's treatments you can give to sort of follow those to sort of visualize those injections like I was showing you there does that at least get you started we could talk more offline if you want here I'd like you guys to see this this is a slide in one of our review papers because this now shows the areas of the virtual scale so remember there's only a 
million neurons coming in here, so there's a huge expansion of ambient dimensionality, if you will: V1 and V2 are very big in numbers of neurons relative to their input sizes, which I've schematized here. V3 is comparatively small, so we mostly ignore it; then V4, and then this IT cluster I was talking about earlier, which is actually three areas: PIT, CIT, and AIT. I want you to hear the numbers: there are roughly 10 million output neurons in IT, and you can see the rough counts here. I'm only showing the feedforward flow. These are the dorsal stream areas; I'm showing them because you might have heard of areas like MT or LIP, and whole industries have been built up around some of them, but you can see they're actually quite small in cortical area relative to the ventral stream. So where Davide said the ventral stream is important for other reasons, in sheer tissue volume the ventral stream is also a much bigger chunk of the visual system than the dorsal stream, which is mostly up here. V1 and V2 are shared by the ventral and dorsal streams, as illustrated in this picture. OK, so lesions here, in IT, cause deficits in recognition; I already told you that a few slides back. Davide told you some of these facts, so I'll just say briefly that if you measure neurons physiologically, they show increasing tolerance to things like position, size, pose, illumination, and boundary cues, many properties that have been studied by lots of people. That just means they maintain some selectivity in the face of those kinds of changes, for instance position and size within the receptive field, as Davide showed you earlier. There's also some spatial clustering, which Davide also mentioned: neurons with similar preferences tend to group together. This has been shown over millimeter scales by Tanaka's group, by Tanifuji's group, by our group, and others. It's
the most well-known case: faces versus non-faces. Doris Tsao and Winrich Freiwald, whom Davide mentioned, showed millimeter-scale clumping of units with similar face-like preferences, and Bevil Conway's group showed clumping for color-like preferences; there's clumping for body-like preferences too. I don't want you to think of these as modules for processing those things, which is what those authors might like you to think. I think of them instead as reflections of a bunch of dimensions that have been laid down on a piece of tissue: when you pass a map through a threshold you can make it look like a module, but it's really a continuum of clumping of different types of features within IT. I'd be happy to talk about recent work that supports that alternative view over the strong modular view. OK, that's all by way of background, because you've probably heard about this stuff. It's not critical to what I'm going to say next, but it's the IT physiology you should know, or may have heard about, that's relevant to what comes next. For instance, each area, and I already mentioned this, contains a map of the visual field. At the back of the retina that's obvious: the back of your retina is basically the back of your eyeball, so the optics of the eye project the visual world onto a map across the retinal ganglion cells. What's less obvious is that the map is preserved going forward. The reason is local processing: take what happens at the retinal ganglion cells and repeat it at this step and this step, and if you keep the processing local, you will tend to preserve maps. The maps get sloppier, but they persist all the way up, so you have full maps of the visual field in V1, V2, and V4, and then it
gets murky in IT. Again, we think there are at least three different maps there, but for these three lectures we'll talk about IT as if it's one place. So here's the way we've conceptualized the recognition tasks I showed you at the beginning. You take an image like this; it impinges on the retina as a pattern of luminous power. That activates the photoreceptors, then the bipolar cells, which are non-spiking neurons in the retina, and gives rise to spiking activity in the retinal ganglion cells, which send their spikes out of the back of your eye, through the optic nerve, into the LGN, a long distance for spike transmission. Those retinal ganglion cells, the RGCs, are essentially sending a nicely processed copy of what's out there in the world. My retinal colleagues would not like me to say it exactly that way, but for the purposes of this talk, think of the retina as a really good camera that hands you a nice pixel image, adjusted for luminance and gain control, and sends basically a one-to-one copy of the pixels forward to the LGN. The LGN does something similar, and then you get these patterns of activity all the way through to IT. I forgot to mention on the earlier slide that latencies increase along the way: it's about 100 milliseconds from when the image strikes the retina to the elevation in spike rates out of IT, and I'll show you those data in a minute; it's roughly 50, 60, 70, 80, 90 milliseconds at the intermediate stages, about 100 on average at IT. I'm showing you this as if it's a feedforward lockstep chain. It's not that clean in biology; it's much more an average increase in latency across areas, which probably has to do with recurrence and feedback connections. So I'll talk about it as if it's feedforward, but keep in mind that's just a first approximation. When I show a new image, it produces a new
pattern in IT. The thing I really want you to keep in mind is that this pattern, and this is just a schematic, is not a photograph. It's a set of activated neurons, typically about 10% of neurons in response to any image, that stands for, or represents, what's out there in the world; that's the way we like to talk about it. Really, all we're doing is showing an image, and this is what we observe: spikes come out. Show the same image again and these different patterns move through the ventral stream; show a new image and I get a new pattern; show the old image and I reproduce the old pattern, modulo the variability I already told you about. Remember his plot of the spikes: you get roughly the same count in the post-stimulus time histogram, but the spikes are slightly jittered. So when I say "same," I mean same in the sense he was showing you, and I'll show you some of those data in a minute. But you should not think that every new image produces a completely new vector of activity out of IT; there's an average rate of activity out of each neuron that is maintained over long periods of time. What's really cool is that IT, the whole ventral stream, can follow along at lags even faster than 200 milliseconds. This is actually 150; you can go down to something like 25 milliseconds and these patterns will still spit out, albeit weakly. The system's ability to follow and evoke these patterns is quite strong, which is why we run at about 100 milliseconds: we can collect much more data that way than at the natural 200-millisecond time frame. OK, so these patterns are quite important. I mentioned I was going to show you this: here are the spiking patterns out of IT. This is one neural site recorded in IT; Davide already showed you this. These
tick marks are action potentials; these are raster plots, as he showed you, where each row is a different presentation of an image. These are four images from an older study, just as an example, for one IT site. When I say "the same pattern," I want you to see this elevation that exists on all repetitions here, but not so much here and here. If you squint at this, there are higher rates here and here and lower rates here and here. Go to another site and you see it tends to like this image, and maybe these two a little, but not so much over here. Here's another one: it really likes this one, and somewhat these over here. So there's variability, but there's also stability, and you have to keep both in mind at once. I'm going to talk mostly about the stable part. We view the variability as a nuisance for now, which is another interesting topic, and we do our best to average over it, as I'll show you in a minute. The way we do that is to count spikes, as Davide alluded to, in big chunky time bins. We'll relax this on day three to smaller time bins to look at finer temporal resolution, but for now think of one big chunky bin: averaging over time, about 100 milliseconds, and averaging over repetitions. Here we're showing 10; we typically do 50 per image, and the images are randomly interleaved, not shown 50 times in a row, then sorted out again later. What you get out of this is one number per image per IT recording site, in units of spikes per second, so your life is nice and simple. Here are some examples of these numbers; I'll show you more in a minute. We scale this up not by recording with single electrodes but by implanting chronic recording arrays, so we record hundreds of sites at once.
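[Editor's aside: the averaging procedure the speaker describes, counting spikes in one ~100 ms window and averaging over interleaved repetitions to get one number per image per site, can be sketched in a few lines of Python. All names, array sizes, and the Poisson stand-in for spike counts are illustrative, not the actual recordings.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spike-count data: images x repetitions x recording sites,
# each entry the spike count in a single ~100 ms window.
n_images, n_reps, n_sites = 8, 50, 4
counts = rng.poisson(lam=10.0, size=(n_images, n_reps, n_sites))

# Average over the 50 interleaved repetitions to wash out trial-to-trial
# variability; divide by the window length to convert counts to spikes/s.
window_s = 0.1
feature_matrix = counts.mean(axis=1) / window_s  # shape: (n_images, n_sites)

print(feature_matrix.shape)  # one number per image per IT recording site
```

Each row of `feature_matrix` is one of the "neural feature vectors" shown in green on the slides.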
not about simultaneity for us; it's more about how much data we can get. Again, think of it as how many people we can call up in a census to ask, "What do you think about X?", where the questions we ask are the images we present. These methods have let us dramatically increase, really just in the last 10 years, the amount of data, number of images times number of sites per day, that we can collect. The arrays are implanted in a sterile surgery; the animal has connectors on its head, comes down to the lab, we plug it in, get fixation and eye calibration as Davide mentioned, show a bunch of images, and collect a lot of data. When you do that, you get datasets that look like this: hundreds of neural sites in IT, now showing the mean response in green over that big chunky time window, so you're not seeing the spikes anymore, just the response of each IT site. Just to fix ideas: here are four neurons for one image, so those are four numbers; here are eight images, and you can see the vectors look different. These are called neural feature vectors. And here are two thousand images; again, we collect as many images as we can, with 50 repetitions averaged inside each of those bins. That's a typical data volume out of one animal, or a couple of animals combined, and those are the data I'm going to analyze next. OK, is everybody on board with where we are? I've glossed over a lot of spiking details that you may want to pull back on later in the talk, but this is a first pass. So here we are again: there's the spiking activity; take the chunky 100
millisecond rate code, as we call it, and there's the feature vector that was in green before; now it's in yellow, but it's the same thing. This is n neurons, and the intensity indicates the activity level of each neuron for that image. You can think of that as a point in an n-dimensional space, where each axis is the activity of one recorded IT neuron, so the dimensionality of the space is the number of recorded neurons. This is called the neural state space. It's useful conceptually, at least for me, to think about what's going on this way. So this is that image in that space, with a little bit of jitter around it reflecting the neural noise I mentioned; it's shown in green. The key thing is that we're interested in the layout of the images in the neural population state space. Why? Because if you're trying to say there's a face out there, you need to be able to look at some data representation like the one we've proposed here and say "face" or "not face," and do it in a way that generalizes to new instances of the face, as I mentioned earlier when I showed the car variability. So what we've been doing for more than a decade is asking how well simple linear decodes can work on IT and the other brain areas we record: is it a face or not? For a linear decode, think of it like this: the images schematized in green are the face images, and the ones in red are the non-face images; this is schematic data. If the world were pretty like this, you could draw a line: in a two-dimensional space it's a line, in three it's a plane, in a high-dimensional space it's a hyperplane. It essentially cuts the space into two parts, so that if I see an image on this side, I go ahead and call it a face, even though I've never seen it before, and if I see
an image that appears over here, I call it a non-face. That's a linear classifier. Does this group know what linear classifiers are? Should I go slower? You haven't done linear classifiers? OK, then let's pause here, because this is an important concept. Right now I'm just plotting the space; it's schematic, but for now think of the axes as spike rates, say 100 spikes per second on each axis. Each dot says: here is the activity pattern out of IT, as sampled, for image one, two, three, up to two thousand. There's no distance metric yet; that's just what the dots are. Then you ask: is this Euclidean? Can you say more about what you mean? It's just a way of viewing the data. We can analyze it in different ways, and those are important details of how we should analyze it, but what I'm showing you here is just the points. Think of each image as giving rise to a particular population pattern of activity, which can be thought of as a point in this space, where the axes are the activity of neuron one, neuron two, and so forth, in units of spikes per second. We measure each unit: neuron two fired 12 spikes per second, neuron three fired 6 spikes per second. Maybe this is a concept I stated but should stress: every neuron is tested with every image, so we can construct the full population vector. We don't collect some images for some neurons and a different set of images for other neurons; then we wouldn't be able to make such a plot. Every neuron is tested with every image, and that allows us to make a population estimate of what's going on for each image; that's what I'm schematizing here. You still look puzzled, though; does that make sense so far? Yeah? OK, is everybody on board with this neural
state space? Yeah? In principle this is a schematic, but for the dataset I showed you there would be 2,000 dots, because I showed you 2,000 images, and with, say, 100 neurons it would be a 100-dimensional space with 2,000 dots in it. Now our job is to say: our hypothesis is that when the monkey is deciding face versus not face, or face versus dog, or any of the two-way or many-way tasks you could run, and for now I'll run face versus not face, he has to instantiate some mechanism to go from the spikes in his head to actually pressing the right button. We approximate that with a linear classifier: put a line through the space and say on this side it's a face, on that side it's not. So where do we put the line, or really the hyperplane in a high-dimensional space? We have to tune its parameters. How do we set them? We take some of the images as so-called training data and then test on held-out images. Those details matter; we think of the training images as like the images the animal had to see when it was learning where the heck the face was in the first place. So we train linear classifiers to do this; I'll show that schematically in the next couple of slides. But I want to pause here for the neuroscientists. Machine learning types like this geometric picture, and I like it too; it helps me think about what's going on with the data. But it's really nothing more than a downstream neuron taking a weighted sum with a threshold. This equation here, the response of a simple perceptron-like neuron, is essentially what the hyperplane is implementing: if the neuron is firing, you're on this side, and if it's not, you're on the other side.
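[Editor's aside: the equivalence the speaker draws, a downstream neuron's weighted sum with a threshold being exactly a separating hyperplane, can be sketched with a toy perceptron. The data here are synthetic stand-ins for IT feature vectors; the dimensions, learning rate, and class means are all illustrative assumptions.]

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "IT feature vectors": two Gaussian clouds standing in for face and
# non-face population responses (purely illustrative numbers).
n_train, n_dim = 200, 50
faces = rng.normal(loc=1.0, scale=1.0, size=(n_train, n_dim))
nonfaces = rng.normal(loc=0.0, scale=1.0, size=(n_train, n_dim))
X = np.vstack([faces, nonfaces])
y = np.array([1] * n_train + [0] * n_train)

# A downstream "neuron": weighted sum plus threshold. Learning the weights
# (here with a simple perceptron rule) IS choosing the hyperplane.
w, b = np.zeros(n_dim), 0.0
for _ in range(20):                      # a few passes over the training set
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += 0.1 * (yi - pred) * xi      # nudge the hyperplane on mistakes
        b += 0.1 * (yi - pred)

# Held-out images: the point is generalization, not memorization.
test_faces = rng.normal(1.0, 1.0, size=(100, n_dim))
test_nonfaces = rng.normal(0.0, 1.0, size=(100, n_dim))
acc = np.mean([tf @ w + b > 0 for tf in test_faces] +
              [tn @ w + b <= 0 for tn in test_nonfaces])
print(f"held-out accuracy: {acc:.2f}")
```

The "firing above threshold" side of the hyperplane corresponds to `xi @ w + b > 0`: pressing the face button.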
The two pictures are exactly equivalent, and what you need to do is find the weights that best separate faces from non-faces in a predictive sense, meaning new images will come along and you'll still be able to say face or not face: not just separate the training data, but do well on the held-out test data. Do you see this linkage? I see puzzled faces, so maybe I should slow down, because this is an important idea. You take a weighted sum with a threshold, and that amounts to cutting the space into two partitions: either you like it or you don't, and for the neuron, either it's firing or it's not. It would fire more and more the further out you go here, but that doesn't matter for classification tasks; if it fires above some predetermined level, you just declare that you're going to press the face button. Why do I assume it's this simple? This is IT; there's a motor system and all these other nonlinearities. It's a great question. We use this first as an approximation, as one idea of what could be going on, and the reason we keep using it, as I'll show you in a minute, is that it explains the data: put linear classifiers on IT and you can predict the animal's behavioral performance almost perfectly. So it's a good approximation of what's going on, and it maps onto possible neural mechanisms, but it's an oversimplification, like all models. You want a simplification that captures the data, not a precise neural model, which would require synapses, millions of neurons, basal ganglia, and so on. It's just an approximation, but it predicts the animal's behavior quite well. And the main point I want you to take from this: with these classifiers on these IT data, if you had had these IT data in 2009, when I said those machine systems were bad, you would
have been winning the competitions, because when you do this you're basically explaining the animal's advantage over the machines on that generalization task in 2009. It's like we found the gold inside the brain. We didn't know how to produce it at that point, but we knew: here it is. You measure this, do the things computer vision and machine learning people know how to do, build linear classifiers on this feature set, and it's really powerful; you get high performance on held-out images. That was the cool result, in this paper especially, and we expanded on how well it predicts in a later paper. I don't know if I answered your question; did I? The features here are the spike rates, and I'm calling that the neural code, this simple 100-millisecond rate code. We can play with that. This is the simplest code, and I'll take you through it in a minute: count spikes from a certain number of neurons and call that the code. It's a hypothesis about how IT presents information to downstream areas, but there are many possible hypotheses, millisecond spike codes and so on; the list is endless. This is almost the simplest thing you can do, and we're sticking with it because it works so far. The dimensionality? You're getting ahead to the next question. To predict the behaviors, we think you need somewhere between 100 and 500 feature dimensions, but for real neurons at real SNR levels, our estimate is about 50,000 neurons to support each behavior; that's coming in a few slides. Those are different numbers for neuroscientists versus machine learning types. I could go through this more slowly; I see more hands coming up. Yes. Right, and that's a fair point: an over-simple way to think about this is, oh,
you take basically about a hundred neurons, average them together, and put the weight on that line, and then these are equivalent; that's fair. This is just schematic, by the way, but sorry, keep going. Yes, here it's nicely separated and there's a plane, but if I had put the line this way it wouldn't work, so I have to tip the line the right way. And the answer is: you can't always find a line. When you train these classifiers they don't work perfectly, but you and I aren't perfect either, and neither are the animals, and the imperfections come naturally when you build these kinds of things. That's partly why we like this hypothesis: you build it and you predict the imperfections that you actually see. That's what gives us inferential constraint; that's where our inferential power about the downstream mechanism comes from. Not just that it's good at everything, but that when it's bad, it matches the ways that you and I and the monkeys are bad. Here I'll talk about this with respect to the average object-grain performance, the matrices we showed earlier, but it's also true at the image-grain performance, though I don't have that for you here. Does that make sense? That's a great question. OK, one more, and then over there. Yes, I think I have that on the next slide. It varies: here we were typically training with about a hundred training examples, and everything else would be test, but we can vary that, of course, and you get curves out of that. I haven't shown you any results yet, so it depends on which plot I'm showing, but typically when we make these estimates we cycle the data: we do training/test splits and then cycle them to get a stable estimate of the held-out performance. It sounds like you're familiar with those ideas.
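[Editor's aside: the cycling of training/test splits described here is ordinary cross-validation, and can be sketched as follows. The feature data, the ~100-example training size, and the simple difference-of-means read-out are all illustrative assumptions, not the lab's actual pipeline.]

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: 300 "images", each a 20-dimensional feature vector,
# two classes (e.g. face vs. not face).
n, d = 300, 20
X = np.vstack([rng.normal(0.8, 1.0, (n // 2, d)),
               rng.normal(0.0, 1.0, (n // 2, d))])
y = np.array([1] * (n // 2) + [0] * (n // 2))

def fit_prototype_classifier(Xtr, ytr):
    # Simplest linear read-out: difference-of-class-means weight vector,
    # with the decision boundary at the midpoint between the class means.
    mu1, mu0 = Xtr[ytr == 1].mean(axis=0), Xtr[ytr == 0].mean(axis=0)
    w = mu1 - mu0
    b = -w @ (mu1 + mu0) / 2
    return w, b

accs = []
for split in range(10):                  # cycle the train/test splits
    idx = rng.permutation(n)
    train, test = idx[:100], idx[100:]   # ~100 training examples, rest held out
    w, b = fit_prototype_classifier(X[train], y[train])
    accs.append(np.mean((X[test] @ w + b > 0) == (y[test] == 1)))

print(f"mean held-out accuracy over splits: {np.mean(accs):.2f}")
```

Averaging over the cycled splits is what gives the "stable estimate of held-out performance" the speaker mentions.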
we hold out, and the exact numbers, is an interesting question. Typically we get away with about a hundred training examples. We don't know how to map that onto a monkey's learning; we haven't tried things like a full reinforcement-learning classifier that follows the exact path of the monkey, to see whether it would get to where the monkeys do. That's ongoing work in the lab. This is just saying: this idea gets you pretty far with a lowish number of training examples, certainly within the few thousand a monkey sees to get up to human-level performance, for instance. One more, and then, OK, let me see where we end up; I think we're getting close to what I hoped to get through today. So look, this came up in the questions. If you like this idea, remember you don't just build one classifier for face versus non-face; you build one for every task. Then you can ask: can the same kind of strategy, with a similar number of training examples, predict the average performance on any task? You build a new classifier for car versus no car, a new classifier for car one versus car two, and you can go crazy with all the subtasks you can build, and ask how well it does. Here it is graphically: some of these images get randomly selected as training images, and these become the test images; predict the response to these. Here's another one, car versus not car: these are car images, these are not-car images; predict the response to these. And here's a data plot; this is from 2015, sorry, it's an old slide. Here's actual behavioral performance, actually human d-prime, but I just showed you that monkeys and humans are very similar, and this is the predicted behavioral performance of a trained classifier. There's face versus
non-face, which is actually a pretty easy task for humans, high d-primes, and also pretty easy for the decoders out of IT; this is consistent with all the face neurons you heard about from Davide. And here's a whole bunch of other tasks: fruit versus not fruit is harder, car versus not car, and so on. I can't say this model is perfect, but you can see it's pretty good, and with these data we couldn't reject it as a possibility. Again, our job is to reject models, but we couldn't reject it here; that's what we published there. This relates to the earlier question about the code: we're using about 500 features, randomly selected over IT, listening for 100 milliseconds, learning an appropriate weighted sum. The output of that neural computation we take as the probability that the object is present; connecting to Chris's talk, it's an estimate of the latent content, how likely is a face out there, how likely a car, how likely a fruit. That's how we think the downstream brain areas instantiate it; at least it's our working model. I call this "learned weighted sums of approximately 500 random average single-unit responses distributed over IT"; these parameters are unknown, and we're still manipulating them to figure them out. That's a big long phrase, so I shorten it to "LaWS of RAD IT" as an acronym, and when I say that I just mean this simple idea, really a family of possible decoders. The correlation here is 0.92, which is actually within the bounds of human-to-human consistency, so we couldn't reject the model. But if you decode other ways, from the input to IT, from V4, from the computer vision algorithms of the time, from V1 simulations, or from pixels, you don't get anywhere near that level of predictive correlation. So it's not as if you can just apply linear classifiers to anything and get this result.
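[Editor's aside: the 0.92 figure is a correlation between per-task behavioral d-primes and the decoder's predicted d-primes. The computation can be sketched as below; the per-task hit and false-alarm rates here are made-up numbers purely to show the d' formula and the correlation step, not the published data.]

```python
import numpy as np
from statistics import NormalDist

# Hypothetical per-task (hit rate, false-alarm rate) pairs for humans and
# for an IT-based decoder (illustrative values only).
tasks = ["face", "car", "fruit", "animal", "chair"]
human_rates   = [(0.99, 0.02), (0.90, 0.10), (0.85, 0.20), (0.92, 0.08), (0.80, 0.25)]
decoder_rates = [(0.98, 0.03), (0.88, 0.12), (0.82, 0.22), (0.90, 0.10), (0.78, 0.26)]

def dprime(hit, fa):
    # Signal-detection d' = z(hit rate) - z(false-alarm rate)
    z = NormalDist().inv_cdf
    return z(hit) - z(fa)

human_d   = np.array([dprime(h, f) for h, f in human_rates])
decoder_d = np.array([dprime(h, f) for h, f in decoder_rates])

# Pearson correlation of the two per-task performance patterns.
r = np.corrcoef(human_d, decoder_d)[0, 1]
print(f"task-level d' correlation: {r:.2f}")
```

A high `r` against held-out tasks, compared to the human-to-human consistency ceiling, is the sense in which the model "couldn't be rejected."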
this result: it comes out of IT almost for free, but not out of other areas. So the summary of what I'm telling you here: think of IT at the top of the ventral stream; you've got these population patterns that we measure, and an approximately linear linking mechanism can reasonably predict this behavior; in fact we couldn't reject it at this grain of resolution. That's what's in both of these papers, if you want to read more. These parameters actually matter if you want to do things like brain-machine interfaces, zapping IT neurons to turn them on and off, which I may talk about on the last day if we have time. If you want to predict what will happen to behavior when I perturb the IT neurons, you need these parameters to be correct, so those details matter to aficionados. For all of you, I just want to end this part of the lecture with: Jim tells me there's a really powerful feature set up there, and all I need to do is take linear combinations of it, and I seem to predict all the behavior we thought was interesting. That's the summary: the IT feature set is the penultimate product of the brain's algorithm. When I say algorithm, I mean it's one algorithmic step, one linear classifier, away from completeness for core recognition. It explains the gap I already mentioned: why this is so good relative to that, because these features are powerful, almost magic; we didn't know how to produce them, but we could measure them. I think I had a few slides on untangling; let me pause for a minute for questions before I give you a conceptual picture at the end. Yes: your framework is not constrained? Say that again: not constrained? No, not this framework. We have tried it with shuffled
and unshuffled data, because, as I mentioned, these were simultaneously recorded, and we don't actually see any difference when we do that. Yes, the 500 comes from matching the accuracy. I have a plot of this that I didn't include, but there are two actual targets we're trying to hit. One is the so-called pattern of performance: think of it as the error pattern, which tasks are good and which are bad. The other is the overall percent correct. I often joke that DARPA just wants you to be super accurate, say at human level, and doesn't care whether your pattern is human-like, whereas NIH usually wants us to find the neural mechanism that matches the pattern and doesn't care about performance. What we want is to be up here, matching both. As you add more units, the curve rises: you get within the pattern criterion at around 150 units, keep ramping up, and cross the overall-accuracy point at around 500. Again, these are features averaged over many repetitions. Does it get worse? Now we're getting into overfitting with linear classifiers. I have some plots on that; in our hands it improves up to some point, but I don't remember the answer for when it gets worse. No, all questions are good. It contains the latent content that's in the stimulus; it doesn't represent everything in the stimulus. OK, so, correlations. What you mean by correlations is that if I took two IT units and looked at their patterns of response over many images, they would not be independent; they would not be uncorrelated, for instance. That's what
you mean by correlation, yes? OK. I'm washing out the noise correlations; let's take those offline. I averaged over them and, to relate to the question over there, they don't seem to matter much for us; I'd love to talk about that with you any time. But let's talk about the signal correlations: what are we doing with those? The way I like to think about it is that in building decoders we're pretending to be a downstream brain area: "I don't know what the heck's going on here. I'm a monkey; I've got to get juice; I've got a bunch of wires coming at me from IT and a reinforcement signal. Some scientist says I should worry about correlations; I don't know what a correlation is. I've got a bunch of neurons coming at me, and I have to build a classifier." So I'm trying to instantiate something close to what the animal might actually be doing. And when I say five hundred, here's what I'm trying to say: if you had a clever decoder that could align with the true underlying dimensionality of the space, you could get away with fewer units, if you had that built in. So the correlations exist, but the attitude is: who cares, I still build my classifier and it works. Exactly, and that's why we're giving you an empirical estimate: if you sample randomly, you need about five hundred; if you sample non-randomly, you can get away with fewer. But then you'd need a mechanism in the brain to say "find that neuron and that neuron and decode them"; you'd have to build in more smarts. That's where the interesting questions come up as machine learning meets neuroscience: do we have pre-built face decoders? This came up, I think, at the end of one of the other
talks: do we have pre-built decoders for other things? How smart is the downstream circuitry? I'm making a sort of dumb classifier, and when I make that dumb thing it kind of works, so maybe that's all the monkey does when it learns things — and I can't reject that yet. But I'm agreeing with you: a smarter version could get away with less, and we could try various versions of that. How would we tease them apart? Maybe they make slightly different predictions about which images will be confused — there are probably ways we could decouple these alternatives. We can't do that yet, because, again, the simple model has passed so many of the tests I showed you here. But maybe with finer-grained, image-by-image data we can tease these things apart better, and that's the kind of data I'll show you on day three. So I resonate with the spirit of your question; we just haven't done everything we'd like there yet. Does that answer it? Okay, one more in the back, and then we should try to wrap up the questions.

So this was with 100-millisecond data, but you can get away with less — yes, with trade-offs, SNR kinds of things. This will come up again on day three, when we look at finer time scales. For the grain of behavioral data I showed you — like, how good are you at cars, on average — that's pretty easy to fit with many combinations of IT readouts; but that wouldn't hold for the same question as last time. He was asking the spatial version — the spatial decorrelation question — and you're asking the time version. These are both great code-space questions that we haven't fully answered yet. Let me take maybe one quick one. Yeah — well, again, I don't mind; to me, all models should be on the table. I don't want you guys to think Jim's coming up here and saying, this is the way it works. I'm an empirical scientist, and I'm having a hard time falsifying this simple model. The models you guys are coming up with are more interesting and probably also consistent with the data. So which experiment is going to distinguish among them — memory versus not? Build it, and maybe we can figure out how to tease those apart with a clever experiment. That's what we're supposed to do. I want to beat up my own model. I'm not an advocate of it; I'm just saying, here's a model that works so far, and our job is to break it down. I'm probably not right, but it's a good enough approximation at the moment. But I'm not going to talk about that much — maybe at the end of day three. For the rest of the time, I mostly want to talk about what's going on to get you these good features. That's what I'm trying to set you up for — where these magic features come from.

I want to give you this conceptual picture, because people really seem to like it. It's from an old conceptual paper that Dave Cox, then a grad student in my lab, and I wrote about ten years ago. It goes like this. Think about the state space of a population of neurons — think about pixel space: this is the image of Joe in pixels, and here are all the activities of retinal ganglion cells. These are not IT neurons; they're in the retina, or pixels if you like. The image is one point in this space. Now, Joe has degrees of freedom that you want to be invariant to, and as he varies he sweeps out other points — this one-dimensional sweep here is what you get as you change one degree of freedom of the latent sources, the underlying hidden causes of the image. Then imagine two degrees of freedom: over those two degrees of freedom, those are all the potential points that Joe could produce, schematically, in this population space. Are you with me so far? The slide isn't actually showing you all those images, but those are all the possibilities under two degrees of freedom. Now imagine he has position and size, and you can add more degrees of freedom. The point is, you get a curved, complicated manifold of possible points that Joe — one object; I'll talk about another one, Sam, or some other object — can produce. What you want is a feature space where you can say, here's Joe and not-Joe, or Sam: Joe's potential manifold and Sam's potential manifold are cleanly separated — somebody asked whether you can do this perfectly — cleanly separated, so that you can use simple decoding tools like separating hyperplanes, like I just showed you. If you can do that, then you'd call this an explicit neural representation of Joe versus not-Joe, and you can do this for all possible objects. Davide used the word "explicit" earlier as well; this is our version of explicit: linearly decodable, to some level of accuracy. I'm showing you this because it connects to the idea that downstream areas project linear classifiers — it says they should be able to learn from low numbers of training examples, which came up in the discussion. I'm talking about shape, and I'm being deliberately vague about shape, category, and identity being closely related. The reason this is useful to think about is this: here is what it actually looks like in the pixel space of two objects, projected into three dimensions. These manifolds aren't actually fused, but they're very complicated — like crumpled-up sheets of paper intermixed with each other — and that makes separating hyperplanes hard to find. They actually exist, but they're very hard to find with low numbers of training examples. So this is what it might look like in the pixels, and so we say the object manifolds are tangled. Somehow you have to transform the data to a new representation where they're less tangled, so you can say, aha, this is Sam and not-Sam, and put a hyperplane in there. I'm showing you this not only to give you a graphical version of how we think about what's going on, but also to remind you — Davide mentioned this too — that we don't throw away all the other latent content: it's just unraveled here. I didn't collapse this to two points; you still have the other generative causes of the image represented in IT, and position and scale, for instance, are some of them. Davide mentioned that this morning. So this unfolding, or untangling, of manifolds is a way to think about what the ventral stream is doing: producing untangled, explicit object information, not just object category information. Basically, you go from a poor encoding basis to a powerful encoding basis somewhere in the brain. I usually put this slide at the beginning of the talk — you guys already know where we think the answer is: we think this is in IT. And of course this has to be nonlinear. You can intuit this now: if I just do linear rotations of this space, I'm not going to do anything interesting; I have to do something nonlinear, which we know the brain does across these areas. So this is a good graphical version of how we like to think about what's going on in the ventral stream. It doesn't mean this is exactly the way the data are laid out; it's a conceptual idea that helps motivate you. But I hope you'll remember these things: IT spike-count codes — rate codes, what I've been talking about — are easy to compute for downstream neurons, and they're rapidly and reflexively computed. I didn't say this earlier, but in the original data I was showing you, the animals were not even trained; they were just
fixating. They come into the lab, we plug the arrays in, they fixate, we flash images up — and you can decode cars out of the responses. Then you can take animals that you do train on the car task, like the one I showed you, and do those decodes; they're a little better, but really not that much different. So these representations are powerful in the adult: a powerful adult representation that accounts for the behavioral performance, that I said is shared across primates — and all these other potential mechanisms are secondary. I'm going to skip through this next part — maybe we'll come back to it, because I think we're out of time. I told you that it takes monkeys only one to four days to learn this — the learning data of the monkey — and I'll maybe come back to that if we have time. The way we think about it, when that animal learns — I showed you monkeys actually doing the task, and they take a few days to learn each object — we think most of that learning is happening downstream of IT. I didn't have time to take you through exactly why we think that, and why we think this powerful adult feature set is already in place. To end today: this was reverse-engineering progress, because we found a good spot and I showed you how to build approximate models to go from IT to behavior. But it's not a fully satisfying answer, because I didn't tell you how to go from the image to IT. How are the IT features computed from the image? Similarly, what are the intermediate features? We had a breakthrough on this about five years ago that I'll show you tomorrow. And then there are a bunch of related questions I'll just leave you with to think about: how do the IT features evolve, develop, and get learned — I'm not going to talk about that; how can we manipulate IT to change perception — I'm not going to talk about that; and, maybe if we have time, why does IT have a patchy organization — I referred to that, and we think we can explain it, but I'm not going to talk about it today. These are things we can discuss on day three, depending on your interest. I'm going to spend tomorrow talking about what's going on between the image and IT, and maybe some of those other related questions. I think that's the end — it's 6:30 — so I'm happy to hang around and take more questions, but thank you for putting up with me. Yeah.
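[Editor's note] To make the manifold-untangling picture above concrete, here is a minimal numpy sketch — not the lab's actual analysis. Every ingredient is an illustrative assumption: each "object" is a ring in a toy two-dimensional pixel-like space (the ring's angle standing in for a latent pose variable), the "ventral stream" is a fixed bank of random ReLU features, and the downstream "hyperplane decoder" is a plain least-squares linear readout. The point it demonstrates is the one in the lecture: the same simple linear decoder that fails on the tangled pixel-space manifolds succeeds after a nonlinear re-representation untangles them.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_manifolds(n_per_object=200):
    # Each "object" sweeps out a 1-D manifold (varying a latent pose angle)
    # in a toy 2-D "pixel" space. The two rings are nested radially, so no
    # single hyperplane can separate them in this space.
    theta = rng.uniform(0, 2 * np.pi, size=2 * n_per_object)
    radius = np.concatenate([np.full(n_per_object, 1.0),   # object "Joe"
                             np.full(n_per_object, 2.0)])  # object "Sam"
    X = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    X += 0.05 * rng.standard_normal(X.shape)               # measurement noise
    y = np.concatenate([-np.ones(n_per_object), np.ones(n_per_object)])
    return X, y

def linear_readout_accuracy(X, y):
    # A separating-hyperplane decoder: least-squares linear readout plus bias,
    # standing in for the simple classifiers discussed in the lecture.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return float(np.mean(np.sign(Xb @ w) == y))

X, y = make_manifolds()

# 1) Decode directly from the pixel-like space: manifolds are tangled,
#    so a hyperplane does roughly at chance.
acc_pixels = linear_readout_accuracy(X, y)

# 2) A fixed nonlinear re-representation (random ReLU features, a crude
#    stand-in for the ventral stream's transform) untangles the manifolds,
#    making the very same linear readout work.
W = rng.standard_normal((2, 100))
H = np.maximum(0, X @ W)                                   # ReLU feature bank
acc_features = linear_readout_accuracy(H, y)

print(f"linear decode from pixels:   {acc_pixels:.2f}")
print(f"linear decode from features: {acc_features:.2f}")
```

Note the design choice: the feature bank is random and fixed, echoing the lecture's "dumb downstream classifier" stance — all the task-specific learning lives in the cheap linear readout, while the nonlinear basis that does the untangling is generic.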