I'm going to talk about NeuroElectro, just sort of moving from models to data. So the idea behind NeuroElectro is really simple. There's a lot of published data on neurophysiology. This is just a simple histogram of a PubMed search for articles whose abstracts contain "neuron" and "electrophysiology". So there are about 45,000 articles that probably contain some data on electrophysiology. So the idea behind NeuroElectro is: what can we learn by trying to compile it? Can we even compile it? And maybe if we could compile it, we could have really awesome data to put to these good models. So yeah, that's the idea, to try and get at this data in the published literature.
Okay, but there's this massive problem. The literature is notoriously heterogeneous. People have known this for ages. So, two simple examples. We know that simple electrophysiological parameters like resting potential and input resistance are going to change depending on the kind of electrode you use to record them. If you record from neurons in a slice, you could choose to heat your slice or not heat your slice, and that choice is going to affect the data that you collect. And there really aren't any conventions for doing it one way or the other, and scientists kind of just do whatever they want. And so these differences are going to affect the data. And so everyone's like, man, these differences are big, so I'm just going to go collect my own data. I think this is a problem, so it's something to keep in mind.
Another difference is that even the nomenclature itself is heterogeneous. A simple property like input resistance is measured a number of different ways. So scientists don't even agree on the simple meanings of things. Okay, so these are things to keep in mind as I talk about NeuroElectro.
Okay, the overall methodology of NeuroElectro is that we download tens of thousands of papers from the published literature. We have scripts that download them. And then from those papers, we try and extract structured information about the electrophysiology properties of different neuron types. So for example, for CA1 pyramidal cells, we may want to extract the actual values corresponding to electrophysiology properties like input resistance, resting potential, spike width, and so on and so forth. We want to put this data into a central database, and once it's in a database, it can be used for other needs, like data analysis, or adding this data to models.
Okay, so here's just a quick overview of how the current methodology works. From these articles that we've downloaded, I have text mining scripts that identify data tables that are published within these articles. So occasionally, when scientists record electrophysiology data, they'll summarize their measurements with a nicely formatted HTML data table. And that's nice because it's relatively straightforward to extract information from that data table. So first I need to identify what type of neuron was being recorded. Here in this example, the type of neuron is a hippocampal pyramidal cell. And so what I need to do is map this mention of a neuron type to some canonical listing of neuron types. I'm using a listing of neuron types that's produced by NeuroLex.org, which is an INCF-sponsored, expert-defined listing of neuron types. Then I also need to identify which biophysical property was being measured. Here in this case, the scientist uses the term R subscript N, and that corresponds to the property input resistance. And so I have algorithms that look for terms like that.
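The term-matching step described above can be sketched in a few lines. This is only an illustrative sketch, not the NeuroElectro implementation: the synonym lists, the canonical names, and the helper functions (`normalize`, `build_index`, `match_term`) are all hypothetical, and a real system would draw neuron type names from a curated vocabulary such as the NeuroLex listing.

```python
import re

# Hypothetical synonym lists, invented for illustration only.
PROPERTY_SYNONYMS = {
    "input resistance": ["input resistance", "rin", "rn"],
    "resting membrane potential": ["resting membrane potential",
                                   "resting potential", "vrest"],
    "spike width": ["spike width", "spike half-width"],
}
NEURON_SYNONYMS = {
    "hippocampal pyramidal cell": ["hippocampal pyramidal cell",
                                   "hippocampal pyramidal neuron"],
}


def normalize(text):
    """Lowercase a table header and strip markup, units, and symbols."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", "", text)          # drop HTML tags such as <sub>
    text = re.sub(r"\(.*?\)", " ", text)         # drop units in parentheses
    text = re.sub(r"[^a-z0-9\- ]+", " ", text)   # drop remaining symbols
    return " ".join(text.split())


def build_index(synonyms):
    """Map every normalized synonym to its canonical name."""
    return {normalize(s): canonical
            for canonical, names in synonyms.items()
            for s in names}


def match_term(raw, index):
    """Return the canonical name for a raw mention, or None if unknown."""
    key = normalize(raw)
    if key in index:
        return index[key]
    # Abbreviations are sometimes split by markup, e.g. "R_N" -> "r n".
    return index.get(key.replace(" ", ""))


PROPERTY_INDEX = build_index(PROPERTY_SYNONYMS)
NEURON_INDEX = build_index(NEURON_SYNONYMS)

print(match_term("R<sub>N</sub> (MΩ)", PROPERTY_INDEX))
print(match_term("Hippocampal pyramidal neuron", NEURON_INDEX))
```

A dictionary lookup like this returns `None` for any header it does not recognize, which is one reason a manual check of the extracted data is still needed.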
And then once you identify both of those, you need to extract the biophysical data value. So in this case, the authors just report a mean and a standard error, so we extract both of those. And I just want to mention again that this is done using algorithms, you know, text mining algorithms. After we extract the electrophysiology data, we also extract some basic information about how the experiment was conducted. So in this case, here's an example sentence from a methods section saying that these experiments were done in brain slices, in vitro, in rats of this age, and in this rat strain, Wistar rats. And so we extract those highlighted bits as well. And then lastly, because text mining is very error prone, and because the literature is very diverse, people don't use the same terms for things, all the data that we extract is also checked by experts.
So all this extracted data is available on the web at NeuroElectro.org. Just a show of hands, who's seen NeuroElectro? Okay. Who wants to see it and hasn't seen it? Okay, I'll just give a really quick demo. Okay, sorry. Nope. Where is it? Do you guys see it? Oh, here it is. Perfect.
So this is the NeuroElectro web page. It's at NeuroElectro.org. And here, if I just click on this first tab, Neuron Types, we get a listing of neuron types. These are the neuron types from NeuroLex. Does someone have a favorite neuron type? Just yell it out. Okay. Oh, boy. Oh, that's going to be hard. I'm actually not going to... oh, here it is. Great. So this is a page for the inferior colliculus neuron. And each one of these is a summary electrophysiology table of measurements that were found for inferior colliculus neurons.
And so these are the input resistance values that were found for inferior colliculus neurons. Each of these data points is a measurement of input resistance for an inferior colliculus neuron, referenced to the actual article in which that data point was found. And so we found these values from maybe a few articles for these neurons. And this is all interactive. So clicking on any of these data points takes you to the original article from which the data was extracted. And so the original table is shown here, and the colors indicate my algorithm's markup. And so you as the user can just check whether this was done correctly or not.
Can I go here? Okay. So it's kind of hard to see, but over here, this is the Contribute tab. And I just want to mention that we've added features to allow users to contribute information to NeuroElectro. And, you know, in the comfort of your home, I'd like you to go check it out. Because what I want to start doing is maybe trying to crowdsource the curation of data and have people begin to adopt a neuron. Because I'm not an expert on all neuron types, nor do I really want to be. But I would like people to consider adding data about neuron types that they really care about.
Okay. So let's get back to the presentation. Is that here? Oh, and the code's on GitHub, and we have an API to the data at NeuroElectro.org.
Okay. So here are some simple statistics about the data in the database. We currently index about the 100 most popular neuron types, and the data for those 100 comes from a little bit over 300 articles. Those 300 articles are probably somewhere between half a percent and maybe 10 percent of all available data.
It's by no means all or the majority of the data. The database is still heavily underpopulated. And for most neuron types, we only have a single article or a couple of articles that talk about that neuron type. So we definitely need to do better with the redundancy of data and with getting more data into the database. And some properties are more likely to be mentioned in a data table than others. So, you know, we're always trying to add more data to the database.
So, getting back to this issue of extensive variability that I mentioned earlier. This is a plot of resting membrane potential for maybe the five most common neuron types. Each one of these dots comes from a paper, so each is itself a mean and standard deviation that was reported in a data table. And we see that there's probably a spread of about 10 millivolts in the range reported for resting potential for any given neuron type. So this data is variable. It's highly variable. And this is a feature of all extracted properties.
And so what we've started to do is try to account for these differences in the reported property values based on differences in the experimental conditions. So for example, for input resistance, what really matters is what kind of electrode was used. Did the scientists use a patch clamp electrode, or did they use a sharp electrode? Because if they used a sharp electrode, then the input resistance value that they report is going to be much less than the input resistance value reported using a patch clamp electrode. So that's just a simple example of an experimental condition that matters for variance in a property.
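One simple way to account for a condition like electrode type is a linear correction: regress the reported value on an indicator for the condition, then subtract the fitted effect. The sketch below is only an illustration under stated assumptions. The input resistance values are invented, the `fit_line` helper is hypothetical, and the actual NeuroElectro analysis may differ (for instance in the number of conditions modeled or in the scale on which values are fitted).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented input resistance reports (MΩ) for one neuron type, one per article.
reports = [
    ("patch", 200.0), ("patch", 240.0), ("patch", 220.0),
    ("sharp", 60.0), ("sharp", 80.0), ("sharp", 70.0),
]

# Encode the experimental condition as 0 (patch clamp) or 1 (sharp electrode).
is_sharp = [1.0 if electrode == "sharp" else 0.0 for electrode, _ in reports]
values = [value for _, value in reports]

slope, intercept = fit_line(is_sharp, values)

# Remove the fitted electrode effect so all values are patch-clamp equivalents.
adjusted = [v - slope * x for v, x in zip(values, is_sharp)]

print("fitted sharp-electrode effect (MΩ):", slope)
print("adjusted values:", adjusted)
```

With a single binary predictor, the fitted slope is just the difference between the two group means; the same function extends to a continuous condition such as animal age or recording temperature.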
And so, given that we're extracting a number of these experimental conditions, we're just applying linear corrections as best we can to account for the variance that we see across the electrophysiology data. I want to mention that we're extracting probably the most obvious things now, like electrode type, animal age, recording temperature, and junction potential correction. And we're continuing to extract more, like recording solution contents.
Okay, so now that we have this giant database of electrophysiology properties across a number of neuron types, we can start to do some interesting analyses with it. So here, this is a hierarchical clustering analysis where we just clustered the neuron types. Each row is a different neuron type, clustered based on its set of six commonly reported electrophysiology property values: three passive properties, like input resistance, resting potential, and membrane time constant, and then three active properties, like spike properties. The hierarchical clustering is shown on the left, that's the dendrogram. And the actual neuron types are shown on the right, just indicated by the names. And then on the far right, I have these neuron super groups that I've just sort of defined based on the hierarchical clustering. And if we examine this, we see that certain clusters that we would expect emerge. So for example, fast-spiking basket cells in different parts of the brain cluster together. And that indicates that maybe this procedure is working like it should. And same with cortical projection neurons. But we also see maybe novel classes that we wouldn't have expected beforehand.
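The clustering idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the neuron type labels and property values are invented, only three properties are used rather than six, and average linkage on z-scored values is one common choice rather than a confirmed detail of the analysis shown in the talk.

```python
import math


def zscore_columns(rows):
    """Standardize each property (column) to zero mean and unit variance."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled_cols.append([(v - mean) / sd if sd else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented values: input resistance (MΩ), resting potential (mV),
# spike half-width (ms). Labels are placeholders, not database entries.
names = ["fast-spiking cell A", "fast-spiking cell B",
         "projection neuron A", "projection neuron B"]
features = [
    (100.0, -65.0, 0.30),
    (110.0, -66.0, 0.35),
    (250.0, -70.0, 1.00),
    (260.0, -72.0, 1.10),
]

clusters = average_linkage(zscore_columns(features), n_clusters=2)
for members in clusters:
    print([names[i] for i in members])
```

Standardizing each property first matters because the properties are in different units; without it, input resistance in megaohms would dominate the distance.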
So that's an example of this sort of analysis that maybe is telling us something new that we wouldn't have known without this database that we just created.
So lastly, a direction we're moving towards is trying to integrate this database with other databases to say novel things about science. So for example, we have this dataset, NeuroElectro, where we have different neuron types with different electrophysiology property values. And what I'm doing now in my postdoc lab is trying to integrate that with gene expression datasets. For example, the Allen Institute has this really nice gene expression dataset with quantified gene expression in every region of the mouse brain, across the mouse genome. And so what we're trying to do is map the electrophysiology phenotypes that we see for different neurons to the patterns of gene expression that those same neurons express, as indexed by the Allen Institute database. And so we want to be able to explain, okay, this is the reason why the neuron has this electrophysiology phenotype, based on this pattern of gene expression. And what we want to do then is just make arbitrary hypotheses relating gene expression to electrophysiology. So maybe if we see that the neuron has a particular electrophysiology phenotype, say for example a large sag current, maybe we can say that that's actually due to this specific pattern of gene expression. And then that would give the experimentalist a hypothesis that they can then further test.
Okay. So just to conclude really quickly with future directions, we're currently trying to expand NeuroElectro to get more neuron types. We're expanding it to more domains, like trying to add information about synaptic plasticity.
And then I'm really interested in trying to continue to demonstrate the value of integrating different databases and datasets. Because I think that's the way that we can move to a situation where the experimentalists see the value in these approaches and then are willingly sharing their data with us. Because I want to move to where we're not actually trying to mine the data, but where they're actually sharing it with us. That would be much better, and better long-term, than trying to just mine the data from papers.
Okay. Let me end by acknowledging my postdoc lab, the Pavlidis lab at the University of British Columbia; my PhD lab, the lab of Nathan Urban at Carnegie Mellon; and Rick Gerkin, who's going to talk next.
Okay. I have a, I was going to start with the microphone.
Have you looked at intracellular solutions? Because I'd have thought that, for example, the BAPTA or EGTA concentration would be absolutely crucial as a variable. And I wondered if you could pull that sort of thing out of the papers.
Yeah. We haven't done it yet, but we are working on that right now. Yeah, we could do that.
And does that introduce a lot of variance, basically?
I don't know yet. Actually, my sense is that probably most of the data that we have does not include BAPTA or a calcium chelator. It's what we'd call a standard vanilla intracellular solution.
Shreejoy, I don't want to sink your battleship, but how do you know that a cell X reported in paper one is identical to the cell Y reported in paper two? This is the fundamental problem of this. How do you solve it?
You do it the best you can, or I've been doing it the best I can. I don't think there's a complete solution.
You know, one of the main grants for the NIH BRAIN Initiative was, let's define the cell types. So this is an open question.
Okay. The second question, which is related: can you go back, please, to the cluster analysis? So how many instances do you have there, how many types do you have there?
About 40.
40.
Yeah, the 40 most common ones, the 40 most popular neuron types.
Popular, what do you mean by popular?
We have the most data for them. At least three articles.
Okay. Maybe you were at the previous talk, where for the primary somatosensory cortex alone there are about 207 neuron types based on morphology and electrical properties.
Yeah. So, I mean, hmm. So, I mean...
You know what I'm trying to say?
I think I see what you're saying, and it gets back to the cell types question, like whether you're a splitter or a lumper, and I'm, I'm a [unclear].
Yeah. I'm talking about how many types you are including in the analysis.
Oh, I guess here, this is 40 NeuroLex cell types. But then this is showing that, in terms of these electrophysiology properties, there are only really about 10 superclasses. So maybe the definition of cell type that's based primarily on where the cell is, is not the same as the definition based on these electrophysiological phenotypes. Maryann's shaking her head. No, I think Maryann's nodding her head. So I think, yeah, she likes my answer.
We really don't know on what basis we're supposed to be splitting and lumping these things. So morphologically, we've come up with a whole lot of cell types. They are not commonly referenced. So these 40 were the experts in these areas around the world saying, well, these are the ones we're sort of certain of.
We know we're missing a whole lot, but when you actually ask what constitutes a cell type in the nervous system, the answer is, well, do you go on function? Do you go on structure? Do you go on location? Molecular expression? You know, I think these are the sorts of things that are telling us that if we have to account for 100 million cells, we're in trouble. And I suspect, you know, listening to one of my molecular biology colleagues who said, the nice thing about the nervous system is there are only two types of cells, neurons and glia, I'm like, no, that doesn't work either.
Yeah. It's some tractable number in between.
Right, that's right.
And, you know, based on whether you really want to split or really want to lump, you'll come up with something somewhere in between two and, you know, 100 billion.