Okay, so my name is David Keator, I'm from the University of California, Irvine, and I'd like to thank the program committee for selecting our abstract for this oral presentation. This is very much a collaborative effort by a group of people who are not paid for this work but do it because it makes a difference in the field, or we hope it will. It has been supported by the INCF, which provides meetings where we can get together face to face and discuss this work, and that's why I'm wearing an INCF shirt today, to show my appreciation.

As all of you in the field of neuroimaging probably know, and for those of you who don't: there has been a large proliferation of available neuroimaging data and tools over the last ten years. Ten years ago we were saying we need large data sets to be freely available for the public to share. Now we have a lot of data, with more coming online all the time, but what we're really missing is the ability to reproduce and replicate results. We have imaging formats, and those formats are fairly good at representing binary data, but what we lack are formats for the metadata: all the extra information you need to know about an experiment, a workflow, the provenance of what was done to the data, and so forth.

So our goals are comprehensive data sharing; enhancing the reproducibility and reusability of existing data; and being able to discover and access data where it lives, to enable new research. Our problem right now is that we have no common standard for the metadata. When you get a bunch of data sets, sometimes they're organized in different directory structures, which makes them hard to use, and you'll hear some later speakers talk about solutions for that. But also: how do you know what experiment a subject came from, or who the PI was? Where is all that information? For pipeline provenance, how do you know what was run, in what order, and with what parameters? Sometimes this comes in the form of text files and log files, but we think we can do better than that, something more relevant to today's semantic technologies, something that is both extensible and descriptive.

The Neuroimaging Data Model (NIDM) Working Group is a subgroup of the Neuroimaging Data Sharing Task Force within the INCF, and it focuses on metadata standards for neuroimaging. We've built NIDM on available semantic web technologies such as RDF and the SPARQL query language, on top of PROV, a family of specifications for provenance that is basically composed of entities, activities, and agents: three kinds of objects connected by a fixed set of relationships.
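To make those three building blocks concrete, here is a minimal sketch of a PROV-style graph in Python using rdflib. The resource names (ex:scan_001, ex:acquisition, ex:researcher) and the example namespace are hypothetical placeholders for illustration, not terms from NIDM itself:

```python
# Minimal sketch of the PROV pattern: entities, activities, agents,
# connected by fixed relationships. Names are hypothetical placeholders.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")  # placeholder namespace

g = Graph()
g.bind("prov", PROV)
g.bind("ex", EX)

# The three PROV building blocks: an entity (a piece of data),
# an activity (something that happened), and an agent (who did it).
g.add((EX.scan_001, RDF.type, PROV.Entity))
g.add((EX.acquisition, RDF.type, PROV.Activity))
g.add((EX.researcher, RDF.type, PROV.Agent))

# Two of the fixed PROV relationships connecting them.
g.add((EX.scan_001, PROV.wasGeneratedBy, EX.acquisition))
g.add((EX.acquisition, PROV.wasAssociatedWith, EX.researcher))

print(g.serialize(format="turtle"))
```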
Building on top of PROV, we develop what we call object models to describe experimental metadata, workflow metadata, and, so far, mass univariate statistical results. We put these object models together and iterate with a group of international collaborators: every Monday we have a video conference call with people involved in this work from around the world. We come up with object models and test them out, and the key is that once we have an object model we like, we write tools, or help the developers of software packages such as SPM and FSL incorporate these models into their software, so that you as the user don't really have to do much to use these technologies, and you still get the expressiveness that comes with them.

Let's take a quick example of the process we would go through to use something like this. Say you have an Excel spreadsheet, or any kind of tabular data, where the columns are variables and the rows are subjects. We model the variables and how they relate to one another with an object model, using entities, activities, agents, and the fixed set of relations that comes with the PROV data model. So we create this object model, a graph of how our variables relate to one another and to the subjects, along with the attributes we think are important for describing those variables. We construct the attributes in the same form as RDF: a namespace, a colon, and then a term, such that if you dereference the namespace, which is just a URL on the web, you can look up the term and find out exactly what it means. It's hard to read on the slide, but one of them is heart rate, annotated with an NCIT term; NCIT is the National Cancer Institute Thesaurus, so if you went to the Thesaurus you could look up heart rate average. Time point is a good one too: you could look up time point in the Thesaurus and get a definition. That's what I mean when I say a data file in this form is semantically annotated and self-documenting. If you receive a plain CSV file in ten years, you might not remember what the variables mean; if you receive this file in ten years, the hope is that the terminology persists, and that's an important bit here, so you can look up the definitions and know exactly what someone meant by, say, time point.

Since these object models are graphs, we take the data, parse it, put it into a graph, and serialize it into a text-based format, in this case one called Turtle, though you could use a variety of formats here. Then there are graph-based databases, and much like doing SQL queries on a relational database, you can do SPARQL queries on these graphs. This is just an example from the Conte Center where you can build web applications on these back ends, much as you would have done with relational databases. The benefit is that you're using semantic technologies and providing semantics for your data, and if you use terms that exist in proper terminologies that are being integrated with ontologies, you've now linked your data sets, through those shared terms, to potentially millions of other data items out there on the web.
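Here is a minimal sketch of that pipeline in Python with rdflib: tabular rows become triples whose predicates are namespaced terms, the graph is serialized to Turtle, and a SPARQL query runs over it. The namespaces and term names (term:heart_rate, term:timepoint) are hypothetical stand-ins, not actual NCIT or NIDM identifiers:

```python
# Sketch: tabular data -> RDF graph -> Turtle -> SPARQL query.
# Term URIs are placeholders; real annotations would point at
# curated terminology entries (e.g., NCIT) so they can be looked up.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/study/")    # placeholder namespace
TERM = Namespace("http://example.org/terms/")  # stand-in for a terminology

rows = [  # columns are variables, rows are subjects
    {"subject": "sub-01", "heart_rate": 62.0, "timepoint": 1},
    {"subject": "sub-02", "heart_rate": 71.5, "timepoint": 1},
]

g = Graph()
g.bind("term", TERM)
for row in rows:
    s = EX[row["subject"]]
    g.add((s, RDF.type, TERM.Subject))
    # Each cell becomes a triple whose predicate is a dereferenceable term.
    g.add((s, TERM.heart_rate, Literal(row["heart_rate"], datatype=XSD.float)))
    g.add((s, TERM.timepoint, Literal(row["timepoint"], datatype=XSD.integer)))

print(g.serialize(format="turtle"))  # the text-based serialization

# Much like SQL over a relational database, SPARQL queries the graph.
q = """
PREFIX term: <http://example.org/terms/>
SELECT ?subject ?hr WHERE {
    ?subject term:heart_rate ?hr .
    FILTER(?hr > 65)
}
"""
for subject, hr in g.query(q):
    print(subject, hr)
```

If the placeholder term URIs were swapped for real terminology entries, the same predicates would also link this little graph to every other data set annotated with those terms.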
The NIDM experiment model is one of the models we're currently working on, and it describes experiments. You have a whole block of entities, agents, and activities that describe investigations: the minimal set of data you need to describe an investigation. Because this is based on semantic technologies, you can keep adding attributes to these objects as appropriate for your experiments, and you won't break any queries, so it's extensible in that sense. There's a minimal set of things we think are important for describing an experiment, and then you can add attributes to your heart's content without breaking existing SPARQL queries. Then there are session levels and series levels, and there are collections that bind things together, like anatomical scans and functional scans. We're currently working on this object model and testing it out, so if you have any suggestions you're free to join the working group or contribute; there will be some links at the end where you can get involved.

All right, workflows. We have an object model for workflows, and in a similar way it uses activities, agents, and entities, and the same RDF. In this case we're describing workflows: you create a model that describes your workflow, what happened to the data, what the parameters were, and in what order things happened; you serialize that into the RDF format, and you've captured provenance about what happened to your data. Nipype, in Python, already supports this and will output it, and we're working on incorporating it into the other tools.

Mass univariate statistical results: we have an object model for that as well. Depicted here are entities and activities, and the things that differ between the SPM software and the FSL software, which is pretty interesting. If you're combining mass univariate results from both software tools into a meta-analysis, without a picture like this it's unclear what's different about the two packages. We've worked with the SPM developers, and SPM12 will allow you to export your univariate results into this object model; then you can build tools on top of it, or use SPARQL to query it.

One of the important parts is terms. You need to make sure your terms are defined in terminologies, and where you need new terms added, we've highlighted NeuroLex in red here because NeuroLex has been particularly good at helping us get terms into their terminology with a minimal amount of overhead, and they let you iterate on the definitions of terms. The important bit is to pick namespaces and terms that you know are going to persist, because if the term definitions are gone in three years, all of a sudden you have a file that's no longer semantically meaningful. We've put a DICOM terminology into NeuroLex: all the DICOM tags are defined there, so if you need terms for DICOM tags, they're in NeuroLex.
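As a sketch of what querying such a results export might look like, assuming a Turtle file results.ttl and using made-up placeholder predicates (hyp:Peak, hyp:statistic) rather than the actual NIDM-Results vocabulary:

```python
# Sketch of querying an exported results graph with SPARQL.
# The hyp: predicates are hypothetical placeholders, not the published
# NIDM-Results terms; a real query would use the specification's vocabulary.
from rdflib import Graph

g = Graph()
g.parse("results.ttl", format="turtle")  # e.g., an export from SPM12

q = """
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX hyp:  <http://example.org/nidm-placeholder/>

# Find every peak, its statistic value, and the software agent that
# produced it -- useful when pooling SPM and FSL results.
SELECT ?peak ?stat ?software WHERE {
    ?peak a hyp:Peak ;
          hyp:statistic ?stat ;
          prov:wasGeneratedBy ?inference .
    ?inference prov:wasAssociatedWith ?software .
}
"""
for peak, stat, software in g.query(q):
    print(peak, stat, software)
```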
Here is the list of resources. All our code is on GitHub, and our specifications are posted on the NIDM website, along with a NIDM primer and a template for specifications. So if you create an object model for your local site that you think is interesting, you can download our template and make yourself a W3C-style specification that describes your object model. The hope is that you would contribute it back, but even if you don't, you at least have something well defined and specified in the form of a W3C-style spec.

These are the contributors. I apologize: some affiliations are wrong because people move around, and some people are missing (sorry, Samir), but it's a long list of people. And that's it. Thank you.