I've been working on streamlining the data management and analysis process for a project called NCANDA, which is investigating adolescent brain development and the effects of alcohol. The challenge here is to support this large-scale study by reusing the neuroinformatics tools being developed by this community, and to use them to help a traditional lab scale up to a multimodal, longitudinal, multi-site study. The study has five sites across the United States, with a total of 808 enrolled participants and three time points, and all of this data is sent from the sites to the data analysis component at SRI. One of the primary challenges for this project was that we needed to start collecting data within six months of being funded, and we had very limited resources for developing and deploying a framework. So one place to get started is the set of system requirements we needed to meet in order to develop this platform. First, we needed to support a number of different data acquisition tools. These were all set in stone by the time we started building the platform: the consortium had already decided to use a variety of web-based tools, tools implemented on laptops, and paper-and-pencil tests. This is a heterogeneous set of environments; these are not interoperable systems by any means. We also needed to maintain an imaging protocol: T1, DTI, resting state, plus same-day ADNI phantoms and weekly fBIRN phantoms. And we needed to ensure ongoing data quality at each of the sites.
We also needed to ensure that the data is collected within a comparable time window between the imaging and the neuropsych and clinical assessments, and we wanted a platform that automates as much as possible. We were faced with this ecosystem of different neuroinformatics tools, and when we started building this system in 2012, the two leaders at the time were XNAT and REDCap, both of which support an API. An application programming interface allows us to script and automate ingestion from each of these different types of tools, and I'm going to describe how we went about doing that to create an integrated platform. First, a quick overview of what the platform looks like and how it's set up. Each site has a set of laptops that are used to acquire data and to enter data into web-based tools like PennCNP, as well as scanners. For the form-based data, we use REDCap to pull everything in; for imaging, everything goes into XNAT. We then have a set of Python scripts and a framework for pulling data out of REDCap and XNAT into a computational environment that is configured for running all of our image analysis, and for storing all of the additional metadata in a joint REDCap database. Furthermore, the system has a data release mechanism that incorporates NIDM as a way of releasing data, first within the consortium and then more broadly to the wider community through data repositories. So first, how we work with the clinical and neuropsych assessments. Again, we have a heterogeneous set of instruments that we needed to start pulling in within six months of the start of the study.
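As a rough sketch of what this API-driven ingestion looks like: the real scripts talk to the REDCap and XNAT REST APIs, but the shape of the step is a fetch, a transform into the common data model, and a push. In this self-contained sketch the fetch and push are stubbed out with plain callables, and the field names (`subject`, `visit`, `np_score`) and event naming are illustrative, not the actual NCANDA data dictionary.

```python
def to_common_model(record, site):
    # Map one instrument's raw export row onto shared key fields so that
    # data from heterogeneous tools lands in the same REDCap structure.
    # Field names here are hypothetical examples.
    return {
        "study_id": f"{site}-{record['subject']}",
        "redcap_event_name": f"visit_{record['visit']}_arm_1",
        **{k: v for k, v in record.items() if k not in ("subject", "visit")},
    }

def ingest(raw_records, site, push):
    # In production, `push` would wrap a REDCap record-import API call;
    # here it is any callable that accepts the transformed batch.
    batch = [to_common_model(r, site) for r in raw_records]
    push(batch)
    return len(batch)

# Stand-in for the database: a nightly run would pull each site's new
# records and push them into the cross-sectional REDCap project.
staged = []
n = ingest([{"subject": "A0001", "visit": 1, "np_score": 42}],
           site="site_a", push=staged.extend)
```

Keeping the fetch and push behind callables is what makes the same transform reusable across the different acquisition tools.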
So we didn't necessarily have all the databases fully configured, but we were able to set up an SVN repository, a version control system, that is checked out to each of the laptops at the five sites. Each night, a script runs that checks any new data into this SVN repository, and similarly for the web-based tools. This allowed us to start collecting data from all of the sites asynchronously, in the sense that we could pull the data in and at least stage it for upload into the database once the database was ready. At that point we were able to transform the data into a common data model and push it up into REDCap, initially into a cross-sectional REDCap project that places all of the form-based data in its appropriate location. An additional automated step then extracts that data into a longitudinal REDCap project, which gives us an integrated view across all the sites and reports the completion status of each instrument: green is complete, red is not complete, and yellow is still incomplete but with some data present. For our multimodal imaging, all the data from the sites is loaded into an XNAT server. We then run some semi-automatic QA, and there is also manual review of all of the images, as well as neuroradiologist review; through this we have found some clinically relevant cases, which allowed us to reach out to participants' physicians and have those cases followed up. We also extract a variety of metadata from XNAT and populate REDCap with information that allows us to validate which scanning protocols were completed, and we run nightly and hourly QA reports to keep everything running smoothly.
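The green/red/yellow completion status can be sketched as a simple classification over an instrument's fields. This is a simplification under assumed semantics (the real project tracks REDCap's own form-status fields); it just shows the traffic-light logic.

```python
def completion_status(fields):
    """Classify an instrument into the traffic-light scheme used in the
    longitudinal view: 'green' if every field is filled, 'red' if none
    are, 'yellow' if partially complete.  (Simplified sketch; field
    names and emptiness rules are assumptions.)"""
    filled = [v for v in fields.values() if v not in (None, "")]
    if len(filled) == len(fields):
        return "green"
    return "red" if not filled else "yellow"
```

A status report is then just this function mapped over every (subject, visit, instrument) cell of the longitudinal project.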
And so once scans are marked as usable, this triggers our pipeline to pick up the NIfTI files and import them into a framework we call Lightweight Data Pipelines; this is on NITRC if you want to take a look. It's actually just a very simple shell script that lets you define the inputs and outputs of a given analysis script, and when new data comes into the system it will automatically pick that data up and analyze it. It's fairly agnostic to the script it actually runs, so it stays compatible with other workflow systems; for example, our resting-state preprocessing is all in Nipype. We also use FSL's TBSS for our diffusion processing and FreeSurfer for extracting measures such as anatomical volumes. With all the data analysis we've been doing, we've currently completed the sections shown in red and green, so we have a platform that streamlines all these different processes for research. We're now working on the piece where we have an integrated set of curated REDCap data dictionaries that we want to use, with NCANDA as a use case, for the NIDM and BIDS work you heard about in the previous two presentations. We'll then be able to apply the NIDM framework that Dr. Keator presented and incorporate the idea of developing object models, so that we can distribute our data using this NIDM-based approach. The general idea is that we can use these REDCap data dictionaries to make a mapping to the NIDM ontology, and then generate specifications that describe exactly what we're distributing to the community.
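The contract of that pipeline framework — declare inputs and outputs, rerun a step only when new data arrives — is essentially a make-style staleness check. Here is a minimal Python sketch of that idea (the real tool is a shell script on NITRC; the function names and file names here are illustrative, and the wrapped action stands in for a FreeSurfer or TBSS invocation).

```python
import os
import tempfile

def needs_update(inputs, outputs):
    # A step is stale if any declared output is missing, or if the
    # newest input is newer than the oldest output.
    if not all(os.path.exists(o) for o in outputs):
        return True
    newest_in = max(os.path.getmtime(i) for i in inputs)
    oldest_out = min(os.path.getmtime(o) for o in outputs)
    return newest_in > oldest_out

def run_step(inputs, outputs, action):
    # `action` stands in for the wrapped analysis script; it is invoked
    # only when the step's outputs are stale.
    if needs_update(inputs, outputs):
        action()
        return True
    return False

# Demo on throwaway files: a newly arrived scan should trigger the step.
workdir = tempfile.mkdtemp()
scan = os.path.join(workdir, "scan.nii.gz")
stats = os.path.join(workdir, "stats.csv")
open(scan, "w").close()
stale_first = needs_update([scan], [stats])       # output missing
open(stats, "w").close()
os.utime(scan, (1000, 1000))
os.utime(stats, (2000, 2000))
fresh = needs_update([scan], [stats])             # output up to date
os.utime(scan, (3000, 3000))                      # new scan arrives
ran = run_step([scan], [stats],
               action=lambda: open(stats, "w").close())
```

Because the check only looks at declared inputs and outputs, the framework stays agnostic to whether the step underneath is Nipype, TBSS, FreeSurfer, or anything else.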
And then we can also take the CSV files that we already release as part of our data release process and create NIDM mappings that, with some semantic technologies, expand the types of queries we can run and let us incorporate annotations from external ontologies like the Cognitive Atlas. This gives us a machine-processable, self-describing format that also links out to the documentation. Through this process, we want to use NIDM as a way of distributing NCANDA data sets. So, what we've learned from this process: we found that the ecosystem of available neuroinformatics software is now mature enough that you can build your own scalable neuroinformatics platform. We also found that it's really important for informaticians to be part of the design of a study; I think we could have minimized some of the complexity of all the different instruments if we had chosen more common platforms, rather than having many different data acquisition devices. We also found that version control systems are a great way to start collecting data before all of your infrastructure is set up. In conclusion, we found that by developing this system and reusing tools from the community, running a longitudinal multi-site study is no longer something reserved for large labs; you can build on the tools and infrastructure already available to the community, enabling more investigators to participate in large-scale multi-site studies. All of the software and the framework I've discussed here is released on NITRC, so go check it out, and let us know if you have any questions.
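The data-dictionary-to-ontology mapping described above can be sketched as attaching an ontology term to each released CSV column. In this sketch, the field name `np_wais` and the Cognitive Atlas URI are hypothetical placeholders, not the actual NCANDA mapping; the point is only the shape of the self-describing annotation.

```python
def annotate_columns(data_dictionary, term_map):
    """Map REDCap data-dictionary fields to ontology terms so a released
    CSV becomes self-describing.  `term_map` would be curated by hand;
    fields without a known term are left unannotated (None)."""
    annotations = {}
    for field in data_dictionary:
        name = field["field_name"]
        annotations[name] = {
            "label": field.get("field_label", name),
            "isAbout": term_map.get(name),  # e.g. a Cognitive Atlas URI
        }
    return annotations

# Hypothetical one-field data dictionary and term mapping.
dd = [{"field_name": "np_wais", "field_label": "WAIS total score"}]
terms = {"np_wais": "https://www.cognitiveatlas.org/concept/id/trm_example"}
ann = annotate_columns(dd, terms)
```

Serialized alongside the CSV (as NIDM/RDF in the real release), these annotations are what let semantic queries reach across data sets and out to external ontologies.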
And with that, I'd just like to thank NIAAA for funding, and also the data analysis core, particularly Torsten Rohlfing, who was the initial implementer of the framework, and also the INCF; I'm a member of the data sharing task force as well. To all of these folks: I look forward to working with you as we try to incorporate NIDM, BIDS, and other standards into the NCANDA framework. With that, thank you.