So this is a project that's a lot younger than some of the ones you've seen. I have neither 140 collaborators in 87 countries nor 100 million euros a year for 10 years; I'm hoping to have both of those very soon. The basic problem is one of model validation. Verifying that your model does what it says it does at the software level, meaning that each time you run the simulation you get the same trace, is one question, and one we've had a better handle on for a long while. But what we really want to start doing is making sure models are valid in the scientific sense: that they match experimental data. Specifically, we want models to have both in-sample and out-of-sample validity. We want models that are consistent with data we already know about, and then we want models that predict future experiments. This is something we all of course want when people build models, but actually doing it formally, in a rigorous way, is something that really hasn't been possible. It's still only barely possible, but it's becoming more so with the development of related tools.

So, to test validity, here's an example of a model trace and some data from a model that I worked on about nine years ago. Do you think the simulation matches the data? I probably thought so when I published it, and the reviewers were OK with it, but I don't know. The problem is that this is a really informal claim about whether the model is valid, and it's difficult to evaluate it formally. I don't even know how I would go and track that down and figure it out now. It's also difficult to even find out what the claims of a model are across papers and across the literature. And claims that are very broad, that give models scope, claiming validity across a wide range of data, are rare, because usually the data a model has access to is something the authors or their collaborators collected, and that's the focus of their model.

This is becoming a real problem, because there's more and more data to account for. The number of papers has been growing exponentially for 100 years, and PubMed, if I had it on this plot, would be somewhere up in the upper right corner, as Shreejoy said before. So it's just going to be really hard to informally validate models. You can't track down all this data by hand, and you can't manually write some sort of test to make sure your model is valid in any reasonable amount of time. So how are we actually going to achieve testing? Well, if we don't figure this problem out, and we haven't, then modelers will not know whether the phenomenon they're trying to model has already been explained by some other model, or which aspects of the phenomenon and which data sets still need to be explained. We really need a framework for doing this in a formal way, and software engineering already has a framework we can borrow, called unit testing. So what is a unit test? From Wikipedia, a unit test is something you use to make sure a piece of code works and is fit for use. You try to make it as small as possible, so it tests only one aspect of the code base, and it's a strict and formal way of ensuring that if something passes a unit test, it really does what you say it does, where what you say is encoded in the test itself.
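To make the analogy concrete, here's a minimal sketch of an ordinary software unit test in Python; the function and the numbers are made up purely for illustration:

```python
import unittest

def firing_rate(spike_times, duration):
    """Toy function under test: mean firing rate in Hz."""
    return len(spike_times) / duration

class TestFiringRate(unittest.TestCase):
    def test_rate_is_spikes_per_second(self):
        # One small, specific claim about the code, encoded as a test.
        self.assertAlmostEqual(firing_rate([0.1, 0.5, 0.9], duration=1.0), 3.0)

if __name__ == "__main__":
    unittest.main()
```

The test encodes one small, checkable claim about the code, and a test runner can execute it automatically every time the code changes.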
So what does that mean for science? Well, some kinds of scientific models can be viewed as functions that take some sort of observation as input to parameterize the model, and produce predictions as output. So it's like a function in software. The observations should ideally include some metadata to direct the model, because if I tell you the firing rate was 20 Hz and you're a modeler, well, when was it 20 Hz? What were the conditions under which it was 20 Hz? You have to know something about that. So, just as an example, if you had a complete physiological model of some cell type, you'd really want it to replicate things that are observed experimentally, like spike shapes, firing rates, and inter-spike interval distributions. For scientific unit tests, you want to encode each of those kinds of features in a single test, and then have a suite of tests you can use to test your model. That's what SciUnit is all about. We can't just show an informal match to data in journal articles; that's not going to work. But if we can build these unit tests collaboratively, then we can continuously test models and characterize their validity according to the set of tests they pass, or how well they do on those tests. So that's the project we're working on now.

Visualizing that, what would it ultimately look like when you're looking at models and data? Here's an example from another field, solar dynamics. You have different models that try to predict something about what's going on in solar cycles; maybe it's luminance or sunspot number or something. Then you have different tests corresponding, in this case, to different solar cycles. You have some measure of goodness of fit, which is the output of a test. And you can summarize that with some sort of test suite that averages or aggregates the tests somehow and reports a final value, and you can say, OK, I think I know which model I want to use; it accounts for the data that matters to me.

So what are the challenges? This is really hard, because there's a very wide range of model scales, languages, and goals. Development time is an issue: you don't want to have to sit down and write a big suite of unit tests every time you make a model; you'd spend years doing that. And how do you really know whether the test you're looking at, or writing, is a fair test? What if someone disagrees with you and says, no, I think we should really be testing it this way? So we have three basic approaches, which I think can address all of these: unit testing philosophy, domain standards, and collaborative development.

First, what does that mean here? I'm not going to go into the details of the implementation of one of the key features in SciUnit, which is about examining and determining model capabilities, but I've written a small play that will help illustrate the issue. Essentially, a test should be able to say to a model: this is what I need you to be able to do in order for you to take this test. If you don't have these capabilities, you won't be able to take this test, and the test just isn't applicable to you. So what you have to do is formalize what those capabilities are, figure out whether models have them, and then leave it to each model to implement those capabilities. One model says, OK, I can't actually do this thing, so I won't be tested. Another model says, I can do all of these things, and then the test can be executed. Behind the scenes there's a lot of code that does this, but that's the basic idea.
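Here's a schematic sketch of that capability idea in plain Python. This is not the actual SciUnit API; the class names, the observation format, and the crude score are all just illustrative:

```python
class ProducesSpikeTrain:
    """A capability: something a model must be able to do to take certain tests."""
    def get_spike_train(self):
        """Return spike times (s) from a 1 s simulated recording."""
        raise NotImplementedError("model must implement get_spike_train")

class SpikeRateTest:
    required_capabilities = (ProducesSpikeTrain,)

    def __init__(self, observation):
        self.observation = observation  # e.g. {'mean_rate': 20.0}, in Hz

    def judge(self, model):
        # The test first asks whether the model has the required capabilities.
        if not all(isinstance(model, c) for c in self.required_capabilities):
            return "N/A"  # the test simply doesn't apply to this model
        prediction = len(model.get_spike_train()) / 1.0  # spikes per second
        # A very crude score: absolute difference from the observed mean rate.
        return abs(prediction - self.observation['mean_rate'])

class MyNeuronModel(ProducesSpikeTrain):
    def get_spike_train(self):
        return [0.01 * i for i in range(22)]  # 22 spikes in the 1 s recording

test = SpikeRateTest({'mean_rate': 20.0})
print(test.judge(MyNeuronModel()))  # 2.0
print(test.judge(object()))         # "N/A": this model lacks the capability
```

The point is that a model either implements the capability and gets scored, or it doesn't and the test is simply reported as not applicable.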
So how do we solve development time? How do we make this fast? Well, there are emerging standard data sources that people can use to parameterize tests, instead of just looking in a paper and writing down the values, or scraping CSV files or something. NeuroElectro is one of those, and you just saw Shreejoy present it. So here's the one example of code I'm going to show; a rough sketch of it appears a little further down. Essentially, generating a test based on data from NeuroElectro is really easy. You basically provide the name of the neuron and the name of the property and, optionally, some metadata, which I'm not including here. You can also find out other information from NeuroElectro, like where that data is coming from and other things you might want to know. Then you create a test that's parameterized by that observation, and it can test models against that observation. It uses the NeuroElectro API, and the neuron names are defined by NeuroLex.

And then collaborative development. Well, if you have a test and someone doesn't like it, they can always fork it. If you think someone should have used a Z score instead of a difference in means, you can fork the test and make the revision. There can be discussion about issues, and hopefully one day the fork can be merged back in. So you can always fork it.

You can also use this sort of approach to have competitions between models. One example of a competition that's actually been done before is the quantitative single-neuron modeling competition, which happened a few years in a row in the latter part of the last decade and was eventually hosted by the INCF. It was a battle between reduced models of neurons: there was a reference data set used to see whether these models could predict spike times, with what accuracy, and which models did better on which data. It was done in a slightly more informal fashion, but it's the closest thing to a comparison of models across data sets that I think we've had in electrophysiology. I'm rebooting it with SciUnit; it's actually on the INCF website now and in development. The point of redoing it is that people are always making new models and always releasing new data sets, so rather than one paper that compares them, frozen in time forever, this would be a living competition, and you could constantly see what the state of the art in reduced neuron models is, for example. You can also have competitions between algorithms or techniques, because essentially they're just models that take raw data and give you some information that's assumed to be in that data: spike times, for example, if it was calcium imaging, or spike assignments, if it was spike sorting. You can run competitions like that too; as long as there's a ground-truth data set available, you really can run a test.
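Going back to the NeuroElectro example from a moment ago, here's roughly what generating such a test looks like. I'm paraphrasing the NeuronUnit interface from memory, so treat the specific class and argument names as illustrative rather than as the exact current API:

```python
# Illustrative sketch only: names approximate the NeuronUnit package and may
# not match the current API exactly.
from neuronunit import neuroelectro, tests

# Pull a pooled observation (mean and spread across the literature) from the
# NeuroElectro API, identifying the neuron by its NeuroLex id and the property
# by the name NeuroElectro uses for it.
observation = neuroelectro.NeuroElectroSummary(
    neuron={'nlex_id': 'nifext_50'},     # NeuroLex id of the cell type (placeholder)
    ephysprop={'name': 'Spike Width'},   # the electrophysiological property
).get_values()

# Create a test parameterized by that observation...
spike_width_test = tests.InjectedCurrentAPWidthTest(observation=observation)

# ...and judge any model with the required capabilities against it:
# score = spike_width_test.judge(model)
```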
So what are the active use cases we're working on? There are basically three scales. One is the quantitative single-neuron modeling competition I just described, which involves reduced models of neurons. Another is Open Source Brain, which you've probably already heard about: those are biophysically detailed neuron models that are significantly more complex than the reduced models. It's one platform, but the models span many different cell types. And right now, because of the standards Open Source Brain uses, I can test about a dozen models against some of the 26 electrical properties that are in NeuroElectro and write validation tests for those. Then you can ask, OK, which of these granule cell models best reproduces the spike widths observed in granule cells under these conditions, and so forth. And then OpenWorm: there are definitely validation tests to do there. You have the worm moving, and you want to know whether it moves in a way that matches the real worm, so you can create tests and test that model.

The reason this is so hard is that there are a lot of pieces to getting data and models and testing them. These are really pretty much all the steps, and each of them has to be connected to the others. If everyone is using different implementations of each of these steps, it's an impossible problem, and that's why I think this would have been impossible five years ago, even. But now, as we've converged toward domain standards, code development time has been reduced substantially. Here are some examples of domain standards. This isn't meant to be an exhaustive list; in some cases there are multiple standards, and some of these projects have varying levels of maturity. But if you can link these together, then you can have a pipeline for model testing that relies on those standards, and anything that's compliant with those standards, or can be converted into them, can be tested. That's the basic goal of a project called NeuronUnit. For a particular domain, you want a way of linking those standards together, and for single-neuron electrophysiology I personally have, I think, enough understanding of the standards to know what they are and how to link them together. That's what the NeuronUnit project is all about.

So what are the benefits of all this? Being able to test your models, I shouldn't have to sell that too much, but you want to know what previous models can and can't do, so you have a path forward in knowing what your model needs to do. It would be nice to do continuous validation of your model during development, rather than just creating the model and finding out whether it's any good when you get ready for publication; that accelerates model development. You want to have the best model, so this is a way of proving it, at least against some set of tests, and of addressing the reviewers who demand that your model pass them. It even enables post-publication review of your model: it shouldn't be frozen in time; it should be on Open Source Brain, and you should be able to continuously test it as new data comes out and see whether it's still a good model or whether it needs revision. And there are many other benefits as well.
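To connect this back to the score-matrix picture from earlier, here's a small schematic, in plain Python rather than the actual SciUnit API, of what a test suite amounts to: every test is judged against every model, and the resulting matrix is what lets you ask which granule cell model best reproduces, say, spike width. In SciUnit itself, a TestSuite plays this role.

```python
# Schematic only: in SciUnit a TestSuite aggregates tests and judges many models.
def judge_all(models, tests):
    """Run every test against every model and collect a score matrix."""
    matrix = {}
    for model in models:
        for test in tests:
            # Each entry is one test's score for one model, or "N/A" if the
            # model lacks the capabilities that the test requires.
            matrix[(model.name, test.name)] = test.judge(model)
    return matrix

# Hypothetical usage, assuming tests built from NeuroElectro observations
# (spike width, input resistance, ...) and models loaded from Open Source Brain:
# matrix = judge_all([granule_model_a, granule_model_b],
#                    [spike_width_test, input_resistance_test])
# The row with the best aggregate scores tells you which model best reproduces
# the measured properties.
```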
So, as an experimentalist or a tester, what is it that you want to do? You want to put your observations in context. When you observe something, you don't know whether there's a model that would have predicted it. If there is, you should probably look at it; if there isn't, you might want to build it. And what other data does that model explain? It can give you ideas for future experiments you want to do. Here's an idea: when you write a grant, you might say, I want to do experiment X, and if it turns out this way, that proves this, and if it turns out that way, it proves that. You can actually do that before you write the grant, by writing a test for that observation using hypothetical data and running it against models that already exist. Then you can say in your grant, yes, if I get this result, it really will demonstrate formally that model X is more correct than model Y. And I think that's something people want: to have their experimental results in context, so that no matter what result you get, there is some implication. And then, of course, increased awareness of the data that you collect, or of data that could become the gold standard. I think we all know, in some domains, data sets that are the gold standard for a model, or just the reference for what a phenomenon looks like in a particular cell, say.

I'd like to acknowledge my main collaborator, Cyrus, a software engineer who has figured out a lot of the hard problems of putting things together; Shreejoy with NeuroElectro; Sharon at Arizona State; and Padraig with Open Source Brain. And thank you to the organizers for inviting me. Questions?

So, maybe I can start then: for a single-cell model, how do you decide what is a good test to start with? Well, you can do some very basic tests. In the abstract, I think most physiologists agree about the kinds of things they might like to test. As for which particular data sets to test against and what constitutes passing a test, those are entirely subjective. But simply laying it out and saying, here are the data sets: I personally, with my domain knowledge in that field, will get the ball rolling by starting tests, for example with data collected from NeuroElectro. But there are other options for tests. Electrophysiology properties are something for which, until recently and even now, there isn't really a good ontology, but we can still list things like input resistance, spike width, spike height, and so on.

What about good models that work really well for only a very tiny part of parameter space? Don't you need to work with the person who is going to develop the model, to make sure the tests match? Yes; obviously you can sweep parameter space to find the set of parameters that will pass a given test. Hopefully you don't have to sweep it independently for every test; if you do, it's probably not very robust, right? But for something like this: if someone has published a model, they will sometimes claim that the model reproduces a system, and it turns out, when you look into the details, that they actually varied the parameters for every figure they made. That's something you have to deal with. But in SciUnit, a model is a class that is not parameterized. You instantiate it by parameterizing it with a particular set of, let's say, model parameters. Then, when you execute the test, run-time arguments come in, like how much current you injected, for example. That's the layering from general to specific for models. Any other questions? Well, thank you very much.
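To illustrate the layering described in that last answer, a model class, a parameterized instance of it, and run-time arguments supplied by the test, here's a small schematic sketch; the class, parameters, and numbers are made up:

```python
# Schematic sketch of the layering: class -> parameterized instance -> run-time args.
class HodgkinHuxleyModel:
    """The model class itself is not parameterized."""
    def __init__(self, g_na, g_k, g_leak):
        # Instantiating the class fixes the model parameters.
        self.g_na, self.g_k, self.g_leak = g_na, g_k, g_leak

    def run(self, injected_current, duration):
        # Run-time arguments (e.g. how much current was injected) are supplied
        # by the test when it executes, not baked into the model instance.
        ...  # simulate and return a voltage trace

# One particular parameterization of the model...
model = HodgkinHuxleyModel(g_na=120.0, g_k=36.0, g_leak=0.3)
# ...which a test would then drive with the protocol taken from the
# observation's metadata:
# trace = model.run(injected_current=0.1, duration=500.0)
```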