follow up a talk about the cybernetic eye by basically talking about file formats and standards. So you made my job very, very hard. Yes, shame on you, sir.
So yes, I'm going to talk about a way of organizing neuroimaging data. Human cognitive and clinical neuroimaging is at the moment at a stage where generating data is relatively easy and relatively popular. We're talking about thousands of papers published every year in fMRI alone, and that doesn't include structural scans such as diffusion or anatomical. And it turns out that many, many of these studies are very similar in terms of the type of data they produce. There are differences in experimental design, and they address different questions, but in bulk there is a brain, it's put in the scanner, and there is data output. And we in neuroimaging are blessed with actually having some pretty commonly used file formats. It's not just DICOM, it's also the NIfTI file format, which is supported by the vast majority of software packages.
So what's the problem? The problem is that everyone doing an experiment and obtaining this data arranges it and describes it in a different way. That even happens within a lab: you can have multiple PhD students and multiple postdocs in the same lab, and they all treat their data differently. And why is that a problem? Well, if you are a PI and you want to hand over a dataset acquired by a PhD student a few years ago to a postdoc to address a different question, you might be in a pickle, because that dataset will not be described very well. The student didn't follow any instructions or any specification, because there was none. It's also hard to write software that would automatically analyze the data; there's a lot of manual labor when you have to specify the metadata for each dataset by hand.
And it's also a bit difficult to figure out whether something is missing if you don't know how the dataset is supposed to be structured.
So we decided to try to address this problem by proposing something called BIDS, the Brain Imaging Data Structure, which is nothing more than a specification, a formal specification for arranging the results of a human neuroimaging experiment, think MRI.
Before we go to the principles, let me tell you who it is for. As I said, it's for PIs. You tell your students how they should arrange the data, so you are in control, and you know that you can transfer it between people in the lab as well as share it with your collaborators, or even release it publicly. You can also build workflows, so it's for workflow developers: they know what to expect and can rely on a certain organization of the data. And last but not least, it's also for database people. It will be easier for them to import existing datasets into a more structured form.
And we have certain principles. First of all, adoption is crucial. We want to make a specification, a standard, a recommendation that would be for everyone. We want it to be simple, and we want it to follow practices that are already out there. We also don't want to reinvent the wheel. We don't want to come up with a new file format that is superior to everything else but that no one actually cares about. We also want to keep it very simple. We noticed that in the community a lot of people are working with files, and many labs cannot afford to maintain a server for a database. And a lot of people in the field are actually not very advanced when it comes to the most recent standards. So we want to keep it simple. We also want to capture 80% of existing designs instead of trying to make everyone happy.
So this is the implementation we came up with. I'm only going to go through certain major rules. You can read the rest on the website, which I'll show at the end.
Basically, we encode a lot of information in the folder structure and file names, because that is what people do right now. We just make it more formal. So from how a file is named, you can quickly tell which subject it came from, which session it came from, what kind of modality it is, and what kind of task it is. We also use comma-separated, sorry, tab-separated values for tabular data, and we use JSON for key-value stores. And we use the aforementioned NIfTI because it's so well supported by existing software. We have some exceptions. There are some legacy file formats for b-values and b-vectors that we also use, and they break the rule of using tab-separated values, but that's because we want this data to be ready to be processed and used by existing software. And it's also extensible. If there's something that is not covered by the standard, we allow the researcher to add files the way they want, and in the future we're going to extend the specification.
So we support multiple sessions or visits as well as different acquisition types and modalities: different types of field maps, structural data, diffusion data, functional data, resting state, and so on and so forth, together with sufficient metadata to process it. We basically looked at the different types of metadata that are necessary to be able to, for example, do unwarping using the different types of field maps, and we added them to the specification. If you want to use field maps, you need to add, for example, the effective echo spacing and things like that. We also support behavioral data on different levels. You can describe your subjects at the population level, for example, these people had this age, but you can also have a multi-session scenario where you took a measurement at every session, and then you can attribute different measurements to each session. So I can show you what it looks like.
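To make the file-naming idea concrete, here is a minimal sketch of how a workflow might read subject, session and task back out of a file name. It assumes names built from underscore-separated `key-value` pairs followed by a suffix, in the style of BIDS; the helper `parse_entities` and the example file name are hypothetical and were not shown in the talk.

```python
def parse_entities(filename):
    """Split a BIDS-style file name into key-value entities, a suffix and an extension.

    Hypothetical helper for illustration only; it is not part of any BIDS tool.
    """
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    name, _, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix (e.g. "bold");
    # every chunk before it is expected to be a "key-value" pair.
    *pairs, suffix = name.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"not a key-value pair: {pair!r}")
        entities[key] = value
    return entities, suffix, extension


# Illustrative file name; the subject, session and task labels are made up.
entities, suffix, extension = parse_entities("sub-01_ses-02_task-rest_bold.nii.gz")
print(entities, suffix, extension)
```

Because the subject label is repeated in every file name, a script like this can tell where a file belongs even if it has been copied out of its folder.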
We have this fairly simple, intuitive file and folder organization, and you probably noticed that there is some redundancy there, and that is by design. We wanted every file name to encode, for example, which subject it came from, and that prevents accidental confusion, basically mixing up a functional scan from one subject with a scan from another. We tried to make it as clean as possible, and the whole process involved talking with the community. Okay.
And this is how we encode tabular data. This is an events file. For a functional MRI dataset, we have to encode what actually happened during the scan: what kind of stimuli were presented and what kind of response was given by the subject. And that is nothing more than basically two columns. We say when something happened and for how long it happened, and then we give people the flexibility of adding any number of columns that can encode other things, for example response time, or strength of a response, or maybe a label for the trial. We give people flexibility, and that actually allows us to encode lots of different designs.
And this is an example of a key-value store, a JSON file that can encode certain properties of the scan itself. For example, here we have the repetition time, echo time and flip angle, things that you would usually want to know about the scan. And yeah, okay. We are following the naming conventions from DICOM here, so again we don't want to reinvent the wheel. This is what demographics files look like.
And the key to the success of this project is basically to get everyone involved in it. So we had numerous discussions. I can tell you those are the most exciting email exchanges you can get, when people argue about file names. It's just wonderful. We also built a validator.
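The events file and the JSON key-value file described above can be sketched in a few lines. This is only an illustration: the `onset` and `duration` column names and the `RepetitionTime`, `EchoTime` and `FlipAngle` keys follow the BIDS specification, but every value, the extra columns and the helper `read_events` are invented for the example.

```python
import csv
import io
import json

# Invented example content: two events with the two required columns
# plus two optional, user-defined columns.
EVENTS_TSV = (
    "onset\tduration\ttrial_type\tresponse_time\n"
    "1.2\t0.6\tgo\t0.435\n"
    "5.6\t0.6\tstop\tn/a\n"
)

# Invented scan parameters; times are in seconds, the angle in degrees.
SIDECAR_JSON = '{"RepetitionTime": 2.0, "EchoTime": 0.03, "FlipAngle": 78}'


def read_events(text):
    """Parse tab-separated events text and check the two required columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = {"onset", "duration"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    events = []
    for row in reader:
        # Onset and duration are numeric; extra columns are kept as strings.
        row["onset"] = float(row["onset"])
        row["duration"] = float(row["duration"])
        events.append(row)
    return events


events = read_events(EVENTS_TSV)
sidecar = json.loads(SIDECAR_JSON)
print(events[0]["onset"], events[0]["trial_type"], sidecar["RepetitionTime"])
```

Both files are plain text, so they can be read with standard-library tools and need no database server.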
So whenever you actually tell your students to conform to the specification, they have an easy-to-use web-based tool that they just point at a folder on a hard drive, and it will tell them: hey, this is a valid BIDS dataset. Or no, it isn't, and you have to correct this and that. We also talk to developers, and we already have some on board, building workflows that can benefit from this file organization and make things much easier to process: automatic analysis, C-PAC, Nipype, and others in the works. We are also talking to database developers, because it's actually a very, very good opportunity for them to import data. I think it's LORIS, COINS, SciTran and many others.
And I can give you a bit of the bigger picture of why we actually spend so much time discussing how files should be named. We have this initiative at Stanford where we're trying to improve the reproducibility of neuroimaging research. In this initiative, we would like to provide researchers with the most up-to-date and robust methods for analyzing data. The first step is actually to get the data into the system, and there's nothing easier than basically telling people how they should organize the data. So therefore we are working on the specification that will allow us to ingest the data into the system and then provide them with robust methods of analyzing it.
Of course, this is not just me doing this. It's a big group of specialists, and many of them are here: GB, Cameron, Nolan and Dave and many others. If I were to try to list everyone, this slide would overflow. So I'm just going to say thanks to my home lab, the Poldrack lab, and the INCF data sharing task force. Some of this work was also funded by the Arnold Foundation. But the take-home message is: go to this website, have a look, and let us know if you'd like to change something, and then we can discuss. Thank you very much.