Through several of the efforts that I'm involved with in the International Neuroimaging Data Sharing Initiative and the 1000 Functional Connectomes Project, we're amassing and sharing thousands of datasets from a variety of different populations. Here are two examples. One is the ABIDE dataset, which covers 1,100 individuals, about half of whom have autism. The other is the recently released Consortium for Reliability and Reproducibility (CoRR), a collection from about 1,700 individuals drawn from a variety of test-retest datasets, with about 5,000 resting-state scans and about another 3,000 MRI scans. So these are some really massive datasets that we're putting out there.

Our objective in doing this is to engage a larger group of researchers in neuroimaging analysis. There are a lot of machine learning experts, statisticians, and others who can teach us a great deal about how to analyze our data, and one of the major goals of assembling these data is to put them into their hands so they can help us do that. Of course, one of the first barriers is whether they actually know how to do anything with the data in its raw, unprocessed form. There's a lot we need to do to make the data usable for them, things they may not know how to do themselves. So another initiative we built on top of these data releases is the Preprocessed Connectomes Project: we preprocess the data using a variety of different tools and then make the data derivatives available for people to use more directly. These projects have been quite successful, garnering a lot of publications and being used by quite a few people.

One of the major challenges when we use these data is teaching people, and figuring out for ourselves, which of the data are useful and which meet an appropriate quality standard. One approach you could take is to just throw in all the data and hope that the massive N, or at least the relatively large N, swamps out whatever nuisance variation is in there. But we don't actually know how well that works. There's definitely a trade-off between dataset size and dataset quality that we all understand, I'm sure, but none of us probably knows exactly where those numbers lie.

Alternatively, because there are no quantitative metrics or best practices currently available for quality assessment, most researchers are left to do tedious visual inspection of their data. That tends to work quite well, but there are a lot of problems with inter-rater reliability, and it's not clear that the human eye can actually pick up all of the relevant details about data quality. Particularly as we start talking about more advanced techniques like multiband imaging, there may be subtler defects in the data that we could pull out with ICA or other measures but that you may not be able to catch just by visually inspecting the data. So a variety of quantitative metrics have been proposed.
A lot of these measures come from earlier QA efforts; the fBIRN project, for example, came up with quite a few. But we still don't know which are the best metrics to use, and we don't have normative distributions that would let us derive thresholds from them, so that we could say, for instance, that a ghost-to-signal ratio above 0.2 is bad, or something of that nature.

So my colleagues and I have been working on a quality assessment protocol. Essentially, we've tried to compile a list of all of the different measures that have been proposed in the literature, take the ones that seem most reasonable, and implement them in a software pipeline that we've made freely available, so people can begin to apply these measures to their own analyses. You can see here what we have; again, a lot of these come from initiatives like fBIRN and others, and we've broken them up into spatial measures and temporal measures. One thing you might notice is that many of these measures overlap quite a bit in what they capture. For example, signal-to-noise ratio is very similar to foreground-to-background energy ratio. So rather than being prescriptive and only including the measures we think are most likely to work, we've tried, at least initially, to be as comprehensive as possible, with the hope that once we start looking at these measures we can learn which are actually the most sensitive to the data quality issues we deal with.

As for the toolset itself, as listed here, it's freely available. It's a Nipype pipeline, it runs pretty efficiently, and it's built on FSL and AFNI tools, so you need those in addition to Python. It takes maybe two minutes per dataset to run, for a functional and a structural scan.

The next step we've started on, because we do have so many large datasets, is building normative distributions and sharing those. We've created a repository of quality assessment values calculated from some of these very large datasets; we have values from ABIDE and CoRR right now, and we're extending it to the other releases. Our hope is that, at least initially, we can use some of these values right off the bat, before we learn very much about their distributional qualities. For example, say this plot showed full width at half maximum, which is the smoothness of your data: the smoother the data, the worse off you are. Right off the bat we could compare all the data and say, if we wanted to keep only the best 50% of it, we could do that. That's actually a fairly sensible way to go about things that doesn't require you to know a lot about neuroimaging or what visual inspection would show. So that's one thing we hope to do, at least initially.
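To make that concrete, here is a minimal sketch of the keep-the-best-50% idea, assuming the quality values are exported as simple tables with one row per scan; the file names and the fwhm column here are hypothetical, for illustration only, not the actual layout of our repository.

    import pandas as pd

    # Hypothetical files: one row per scan, one column per quality measure.
    normative = pd.read_csv("normative_qc_values.csv")  # values from the large shared releases
    my_study = pd.read_csv("my_study_qc_values.csv")    # values for your own scans

    # FWHM smoothness: higher means smoother, which here means worse quality,
    # so keep only scans at or below the normative median (the best 50%).
    fwhm_cutoff = normative["fwhm"].median()
    keep = my_study[my_study["fwhm"] <= fwhm_cutoff]

    print(f"Keeping {len(keep)} of {len(my_study)} scans "
          f"(FWHM <= {fwhm_cutoff:.2f}, the normative 50th percentile)")

The same pattern works for any of the measures, as long as you know which direction of the scale means better quality.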
We've also done some other comparisons to begin to learn how these different measures behave. As I mentioned, a lot of these things are highly correlated, so we wanted to look at the correlations between them. On these correlograms, these are the measures calculated on the anatomical data, and these are the measures calculated on the functional data; the top triangle is, I believe, CoRR and the bottom triangle is ABIDE. If a cell has an X through it, that correlation is not significant; the values that aren't crossed out are significant at an FDR-corrected threshold of 0.05. What we can see, for the temporal measures at least, is that the correlations between them are very consistent across the two datasets, and while a few of the measures are correlated with one another, a lot of them carry independent information, which may be a good sign. Something that's harder to explain for some of the anatomical measures is that some of the correlations actually switch sign between the datasets: a correlation that's fairly negative here is, I believe, fairly positive there. We're still trying to figure that out, but this is the kind of thing we can learn once these resources are available.

Also, because CoRR has test-retest data, we can use that to look at reliability. Sorry, my slides are slightly out of order. Another thing we've done is that, for the ABIDE dataset, we have four manual raters: from various publications that used the data, we've obtained the subject lists they kept, so we effectively have four raters' quality judgments. With those we can begin to do discrimination analyses to see which of the measures are the most sensitive to what the raters flag. You can see here that QI1, the percentage of artifactual voxels, may be a better one to look at. Our hope is that, now that these resources are available, we can begin to use automated methods, machine learning techniques, to help pull out the most sensitive measures and also build automatic classifiers for some of these data quality issues.

We also have test-retest reliability on these measures. I've been given the one-minute warning, so I can't go into too much detail, but I think it's kind of interesting what we can pull out. DVARS, for example, is a measure of the intensity variation between subsequent images, and you can see that it varies quite a bit over time, whereas other measures, such as the entropy focus criterion or the foreground-to-background energy ratio, don't vary much at all. I think what we're actually seeing is that the first kind of measure is much more sensitive to subject-specific effects on data quality, whereas the others reflect more technical sources of variation, like the scanner type or the sequence parameters, and that's why they're more reliable.
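For the curious, here is a rough sketch of how a DVARS-like value can be computed with NumPy; this is a simplification for illustration, not the exact implementation used in our pipeline.

    import numpy as np

    def dvars(data, mask):
        """Root-mean-square intensity change between successive fMRI volumes.

        data : 4D array (x, y, z, time)
        mask : 3D boolean array selecting brain voxels
        """
        ts = data[mask]                              # voxels x time
        diffs = np.diff(ts, axis=1)                  # frame-to-frame intensity changes
        return np.sqrt(np.mean(diffs ** 2, axis=0))  # one value per pair of volumes

    # Toy example: a 10x10x10 volume over 50 time points.
    rng = np.random.default_rng(0)
    data = rng.normal(loc=1000, scale=10, size=(10, 10, 10, 50))
    mask = np.ones(data.shape[:3], dtype=bool)
    print(dvars(data, mask)[:5])

A measure like this changes whenever the subject moves or the signal fluctuates, which fits the point above that it tracks subject-specific effects more than scanner or sequence differences.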
A lot of people have been involved in this, as I said, and all of these data and resources are available via GitHub; if you have any questions about accessing the data, please email me. I'd like to thank these people for being involved.

I also wanted to briefly let everybody know that we have a next event in our Brainhack series: we're hosting Brainhack Americas, a series of hackathons in a variety of different cities throughout the Americas. If you're interested in organizing a regional Brainhack near you, please let me know. I'll be at the Mexico Brainhack, not the Miami one, which I think will be very nice. And Samir always complains that the Brainhacks don't have some nice touch to them that really helps get his creativity focused, so Samir, this one's here for you: you can imagine drinking a michelada and hacking brains. Anyway, thank you very much.