OK, thanks so much to the INCF and the organizers for having me speak here. It's a great honor and a pleasure to be with you today. I'm going to talk about a bunch of different data sources, some internal to the Allen Institute, some collaborative. This will be a bit of a tour, with some occasional political commentary interjected to keep it from being too dry. I'll try to move around a few different topics.

Over the years, we have done a lot of different kinds of projects. We started by producing atlases. Probably most people are familiar with the so-called Allen Brain Atlas, which is this large-scale in situ hybridization atlas of the mouse. But as technology has changed, we've moved more towards cellular-level neuroscience and profiling, like most of the world. It's funny how those things go, because biology and neuroscience are of course so technology dependent that when a new technology emerges, people tend to essentially redo all the same experiments they did using the previous technology. And it's not even clear how many lessons we learn from that. It's a question whether all the information is ever extracted from these data sets. I think of all the microarray data sets out there, for example, that were run in the world: how is that information used, or usable?

But in any event, now, of course, we can examine data at a cellular level. It can be done across several different modalities: transcriptomic, morphological, and physiological data. Many of these can now be profiled in an increasingly high-throughput manner, some a little more than others, and there are pipelines for large-scale physiology and even morphology. So what is it striving for? Why cellular-level profiling in the brain, right?
And in some sense, it's because we can, right? The cell is, of course, a basic unit of biological function, and related to computation, no doubt. And in some sense, we don't have a lot of other great ideas other than to collect more information about the cellular components of the brain. There are a lot of theories and ideas about how cells might be involved in circuits, but in the absence of that knowledge, we're starting to collect more information and see what it all means. You can do this in a variety of ways, right? You can do it using genetic tools; you can look at the shape of neurons and their projections; you can do it at different scales, using light microscopy or electron microscopy; and then you put things into databases organized by features and, hopefully, build ways of retrieving information that make sense.

This becomes increasingly challenging. As Jan pointed out so well, this problem of information architecture and organization is not going away, and it's a serious issue for the brain. We study this organ, and the question is how to facilitate the study of it with a neuroinformatics approach. That's kind of what we're all here for. And it presents formidable challenges of organizing and interpreting data.

We have these different data pipelines. Now we're becoming more able to do multiple-modality profiling using these Patch-seq related methods, where you patch a cell, you typically record from it first, then you take out its contents, usually its nucleus, and you sequence its RNA. And then, if the cell is still stable, you can attempt a biocytin fill and potentially reconstruct its morphology. That's by far not a fail-safe method.
It tends to work a modest amount of the time, maybe 25 to 50% in very optimistic cases. But the idea is to build up enough information that we have this multimodal picture of what's going on.

Previously, and this is very related to the nice work that's going on at RIKEN, we had pursued mouse connectivity. Using very similar tracer injections and reconstruction with these TissueCyte machines, we reconstructed the mesoscale connectivity of the mouse brain. A couple of different things came out of it. More recently, we did a deeper dive in visual cortex, and some analysis showed some really interesting modular structure and how it relates and maps to the thalamus. That's work that was just accepted in Nature and is just coming out, the work of Julie Harris and Stefan Mihalas.

Of course, you'd like to take this further and understand projections at the cellular level, right? It seems unequivocal that connectivity will be a very strong determinant of cell type identity, and that it will be very important to understand it. Unfortunately, it's also one of the hardest things to get, because you need cellular-level resolution and you typically need long-range axonal connectivity, which means big imaging of whole brains and things like this. There's been some real progress there. Some nice work has been done with the fMOST approach, where you typically use some kind of genetic approach to label a neuronal type, then you can clear the brain, and then you can use this high-throughput imaging to get whole-brain reconstructions. This has been done really nicely.
In fact, there's a nice group in China at the Nanjing University that has a whole pipeline for reconstructing this data. We've done some preliminary analysis with this work by reconstructing a whole bunch of claustrum neurons, and some interesting cell types and their connectivity have been determined. My colleague Hanchuan Peng, who has been at the Institute for a long time and was formerly at Janelia, is a real expert in neuronal reconstruction from, typically, light-based images. He has done a lot to set up that center in Nanjing and to take some of their algorithms and pipelines to scale. So here is a slide from the center. Many of you will recognize our colleague Giorgio Ascoli there, who visited. And I think Sean Hill, if he is still in the audience somewhere, was there just recently, because I know that the Human Brain Project, or Blue Brain, is also interested in this technology.

There's also, of course, electron microscopy, and there's a whole other level of informatics challenges and problems in dealing with this kind of data. There's a very big project funded in the US by IARPA that is basically proposing to reconstruct and segment the entire neuropil of a cubic millimeter of mouse cortex. The reconstructions use a lot of deep learning, of course, to do the annotation. Earlier attempts at this were just vastly inferior to the work that's been produced by the deep learning approach. Sebastian Seung's group from Princeton is our collaborator and does that. This is a very, very rich data set that's only beginning to yield information about cells and cell types. Certainly the motifs that will emerge from here will be very, very interesting.
Let me go back to the transcriptomic approach for a little bit. If there's one technology that is very amenable to high-throughput profiling, it would certainly seem to be single-cell transcriptomics, whereby you run these pipelines either with genetically modified animals or, now in humans, with isolated nuclei. The material is amplified, sequenced, and mapped, taking advantage of powerful mapping algorithms. And then one typically clusters this data to find putative groups, or putative cell types.

We started off by doing this in two rounds in the mouse visual cortex. One was finished in 2015. The other was finished just at the end of last year, in which both visual cortex and anterior lateral motor cortex were run to a great sequencing depth using 10x technology, followed by clustering and reinterpreting this data. A lot of interesting facts were found from that, essentially the comparative homogeneity of excitatory neurons and the real diversity of the interneurons across the areas. Now this is being done for the whole cortex, both by us and by this BRAIN Initiative group, the single-cell census work, which I'll talk about in a little bit. So this is nearing the end of actually having an entire cortex transcriptomic map at single-cell resolution. And there's very good agreement between the different platforms and between different approaches. One can use high-throughput epigenetic profiling techniques too.

It can also be done in human. In recent work from Ed Lein and his lab, Rebecca Hodge produced a pipeline to do this on the middle temporal gyrus, in tissue that comes from healthy regions extracted as part of epilepsy surgeries.
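A minimal sketch of the clustering step mentioned above, under loud assumptions: each cell is reduced to a short vector of normalized expression values, and a plain k-means loop groups the cells into putative types. The gene panel, the toy values, the `kmeans` helper, and the choice of k-means itself are all hypothetical and chosen only for illustration. This is not the pipeline used in the work described, which involves thousands of genes and more elaborate clustering.

```python
# Toy illustration: group cells into putative types with k-means.
# Gene names and expression values are invented for the example.

GENES = ["Pvalb", "Sst", "Slc17a7"]  # hypothetical marker panel

# One row per cell, one column per gene (toy, already-normalized values).
cells = [
    [9.0, 0.0, 1.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [8.0, 1.0, 0.0],
    [1.0, 9.0, 0.0],
    [0.0, 1.0, 8.0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=10):
    """Plain k-means; the first k points serve as deterministic initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(cells, k=3)
print(labels)
for j, centroid in enumerate(centroids):
    top_gene = GENES[centroid.index(max(centroid))]
    print(f"cluster {j}: {labels.count(j)} cells, highest mean expression: {top_gene}")
```

On this toy input the three clusters separate cleanly by their dominant marker. Real data is rarely this clean, and the result of k-means depends on the initialization and on the chosen number of clusters.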
And it's quite interesting to take this and try to determine whether there is homology, just like you have in the genome, between, say, mouse and human. So part of the work that they did was to use alignment methods between the taxonomies that resulted. And one finds pretty good agreement down to moderate levels in this tree, this hierarchical organization. Some classes map one to one; others have more of a many-to-one relationship. But there's largely some homology of identity. The regulation is undoubtedly different in some cases: certain genes are seemingly doing different things in these different groups. It's not fully interpreted what the story is there yet; this is a recent alignment. There are also certain laminar shifts in the mapping, where certain cell types seem to be based more in one layer in the mouse versus another in the human. But essentially there's quite a bit of comparable structure there.

Another really big technology, which is going to be really important in solving this cell type puzzle, especially for linking modalities, the transcriptomic to the morphological in particular, will be these so-called spatial transcriptomic methods. These are the great-great-granddaughters of the in situ hybridization of the 1990s and 2000s, where now one can multiplex and combinatorially label relatively large panels of genes, fluorescently image them, and try to sort out, by various image processing and combinatorial approaches, what the common expression patterns are. There are a lot of issues with actually segmenting the cells, because the way they get filled ends up producing very complex geometries.
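The one-to-one versus many-to-one relationships between mouse and human taxonomies mentioned above can be illustrated with a very simple stand-in for alignment: correlate each human cluster's average expression profile with each mouse cluster's profile over shared genes, and assign the best match. This is only a sketch under assumptions. It is not the alignment method used in the work described, which the talk does not specify, and the cluster names, the values, and the `best_match` helper are hypothetical.

```python
import math

# Average expression per cluster over the same four shared genes (toy values).
mouse = {
    "m_Pvalb": [9, 1, 0, 2],
    "m_Sst": [1, 9, 0, 2],
    "m_Exc": [0, 1, 9, 3],
}
human = {
    "h_PVALB": [8, 2, 1, 2],
    "h_SST_a": [1, 8, 1, 2],
    "h_SST_b": [2, 9, 0, 1],
    "h_Exc": [1, 0, 8, 4],
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def best_match(profile, reference):
    """Name of the reference cluster whose profile correlates best with `profile`."""
    return max(reference, key=lambda name: pearson(profile, reference[name]))


# Map every human cluster to its best-correlated mouse cluster.
mapping = {h_name: best_match(profile, mouse) for h_name, profile in human.items()}

# Invert the mapping to see which mouse clusters receive several human clusters.
by_mouse = {}
for h_name, m_name in mapping.items():
    by_mouse.setdefault(m_name, []).append(h_name)

relationship = {
    m_name: "one-to-one" if len(h_names) == 1 else "many-to-one"
    for m_name, h_names in by_mouse.items()
}

for m_name, h_names in by_mouse.items():
    print(m_name, "<-", h_names, f"({relationship[m_name]})")
```

Note that this maps in one direction only. A stricter definition of one-to-one would also require the mouse cluster's best match to be that same human cluster, and real alignments have to cope with batch and species effects that a raw correlation ignores.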
And then reading and quantifying the signal is a problem, and there's a lot of study going on. But if one can really do this in a principled way, it will be possible to disentangle a lot of spatial relationships with regard to cell types and their identity.

Part of the problem, of course, is that there are a lot of issues with multiple resolutions and with hierarchy: whether these cell types are well represented by taxonomies, or whether they have more set-theoretic structures that can better describe them. We tend to use t-SNE, or these more recent UMAP embeddings, to describe it. And I think it's funny, because there's always a tendency to look at these and see that there's evidently structure, but one doesn't really know quite how to interpret it. The people who use it will praise the fact that they've gone to three dimensions from two, thinking that now they see so much more, but in reality this is just a three-dimensional reduction of a very high-dimensional space. It makes you wonder: what if you had a 10-dimensional brain and could actually see that? Still, that would be an approximation.

There's also this issue of gradients, a fascinating thing about these types: what do we mean by these transcriptomic types? When you look at the morphology of a neuron, you think, well, okay, this shape is unequivocally this kind of a tree, it looks something like a chandelier cell, I think that's what it is, I've looked at 10 of them now, and I'm pretty darn sure that's what it is. It is not so easy to do this in a genetic context.
When we did the human brain atlas, and published that paper in 2012, it was all based on microarray data from six different people, and we sampled several hundred regions in the cortex, summarized by these little dots here. We found the most differential genes with a kind of voting scheme, where each sample got to vote and say which gene most differentiates it from any other sample. If you collected all those votes and used a reconstruction method to recover where each sample came from geometrically, you could actually get a pretty good representation of the cortical geometry. Namely, you could predict where these samples came from to decent accuracy, let's say 50 to 60%.

But what we saw in this were these inevitable gradients. Nothing was really unequivocal; genes would be expressed, and they would tend to change more or less continuously from one sample to another. Now, the first reaction to that is: well, of course, the reason is that you've done bulk tissue sampling, these are partial-volume effects and smoothing effects, and if you were to go to a cellular level this would all sort itself out and you'd be able to see cellular identity. The unfortunate fact is that it's not true. Now, down at the cellular level, we see yet another level of gradient and imprecision in determining cell type. It's something like this. Here's my simplification, my natural t-SNE plot, where you're looking out at this beautiful horizon and you see, well, there's Parvalbumin Peak, that's
unequivocal, and there's Mount SST, and there's Reelin Ridge. Well, these are definite features; I can see that they're true. But there's this unexplained gradient here. I don't know what that is. It somehow connects these types, but I'm gonna ignore that because I don't wanna call that a type, okay? And then, well, there's this unremarkable biological noise there, but forget that too, okay? But that's not really a satisfactory answer. It boils down to the question of what types are, and we need a more rigorous definition of what you'd call a type than "I clustered it 20 times and I got more or less the same answer," because these types are involved in some very complex, biologically interacting process.

Anyway, a lot of people are interested in this problem, right? The NIH is interested in it, international groups are interested in it, and it's at the heart of many things: how can we classify cells in the brain, or in the body? You've probably heard of the Human Cell Atlas, which is another generalization, an attempt to profile the molecular structure of all cells in the human body. So the NIH formed this consortium called the BRAIN Initiative Cell Census Network. There was a bit of a proto-group that got together first, and they did okay, and so a bigger award was made. It consists of these different groups, which take different approaches; they're supposed to be hitting all the big ways of looking at cellular characterization. There's a data center associated with it, and we have that. Marianne is involved in that, and Jim Gee (I don't think Jim is here anymore; Jim left, maybe), and others. We're trying to bring this data forward, organize it as best as possible, and put it in a context that will enable people to understand it. So an initial project that was just done for this... how's my time, how much? Five minutes?
Five minutes, okay. So the initial project was a kind of mini-atlas, to show that this group could work together, and it was chosen to work in the primary motor cortex of the mouse. It got a little stalled because people couldn't really agree on what the primary motor cortex was; it's still under discussion. And this brings us back to the coordinate frameworks, right, which are so important for mapping and positioning data in space. I think we have to accept the fact that the image volumes and the image blocks will always be stable; they are at least data you can bank on, whereas annotations are subject to interpretation and can be subjective. That's why registering data to frameworks, using informatics approaches to position and interpret your data, is always better than strictly ontological arguments of the form "well, I think this is motor cortex according to someone."

There's a website, biccn.org, and I won't go through all these things, but there are a lot of different projects and a lot of different people involved. I'll just highlight a couple of them very briefly. There's Hongwei Dong from USC; they're doing a lot of injection and tracing, different profiling approaches. Joe Ecker is doing epigenetics, ATAC-seq, and other things in the mouse. Hongkui Zeng is doing whole-brain transcriptomic profiling and other things. Arnold Kriegstein from UCSF is doing developmental human work over many stages. So there's both mouse and human data. The mouse data is much more comprehensive at this point and is intended to be whole brain, but the human is just getting started. And there are others.

As part of our portal at biccn.org, you can get all this data. It was important that it be brought forward and made public, and so we made this cell registry at first that enables you
to see what's there and search by experiment and things like that. It doesn't have terribly strong functionality at the moment, but we're trying to enhance it and get that going.

Let's see, we don't have much time, so let me make a few pitches here that are a little bit off topic. We're all friends here, we're all neuroinformaticists, so we have the same plight in approaching all these different things. But it's funny: in the astronomy community, as much as they're dealing with perhaps simpler entities, their laboratory is horribly constrained. They can't manipulate anything; all they can really do is observe. Yet their data is really much better organized in many ways. Plenty of people get tenure in astronomy just by looking at databases; they never run any experiments themselves. This is actually pretty normal.

And there are issues of classification, of what we mean. Just recently, and talk about the ultimate PhD result, I was reading about this individual here, Matthew Baron, I don't know if anyone knows his name, from University College London. He somehow stumbled upon something and figured out, by looking at certain hip positions, that the dinosaur tree was all wrong. He did a reclassification, and it made super big news, and, like, T.
Rex was affected and everything. I mean, it really brought the house down. And I think this is the kind of thing we have an increasing need for, right? We have a mess of cortical nomenclatures, and we're very much prey to this dinosaur phenomenon if we're not careful. There's definitely progress on this. There's nice work, be it the Human Phenotype Ontology or the Human Cell Atlas work, and nice work by Tom Gillespie here and his colleagues on the Neuron Phenotype Ontology. But we need to organize this information in good ways and build ontologies that enable us to really make progress.

I'm gonna give a shout back to Jan's work with these knowledge engineering environments. I do feel that cell types would be something very, very amenable to a knowledge environment. I just don't see how else we're not gonna fall into this dinosaur trap (nothing against the dinosaurs, it wasn't their fault), this trap of looking in PubMed to find someone's ontology and hoping that they send you their data so that you can see if this is a cell type or not. So I think we want some kind of knowledge resource for cell types, right? I think it's very akin to the kind of architecture that Jan is talking about, and maybe his goals are in some ways even broader than this, but here I'm just thinking about it as a cell type entity, where one could base it on perhaps transcriptomics, layer other information on there, use it as a deductive environment, and so on. So anyway, I'll stop there. Thank you to everyone involved, all the collaborators and all these people too, and thank you, everybody.

Wow, that looks like a large number of collaborators. You've touched on so many different things that I'm completely overwhelmed. So while we're asking Mike questions, maybe the
rest of the speakers can just sit down, and we'll go smoothly into the discussion after that, right? Yes, can I start?

I'm here. It's incredible research. I have a quick question. All this really advanced technology has been applied to understand neurons, at the Allen Institute or in the BRAIN Initiative. Is anybody trying to take a similar approach to the non-neuronal cell types in the brain?

A similar approach to do what?

Non-neuronal cells, such as glial cells, not just neurons.

Oh yeah, other cell types in the brain. There's a neuronal bias in neuroscience, if you haven't noticed. No, Anita hasn't seen that. But yeah, it's funny; there should be more, and in fact some people are doing it. Probably the best way to untangle the structure of the non-neuronal types in particular will be through EM, through electron microscopy, because that's gonna shed a lot of light on glial structure, I think, which will be interesting. It is funny that there's a neuronal bias, but it's not so clear that you need all that molecular structure, I think. That's the thing; they're a little biased on that one, you know.