Anyway, I'm going to talk about brain imaging data sharing and the development of the Neuroimaging Data Model (NIDM) that we've done in the INCF task force. My subtitle is "the missing principles and tools". So, the outline: a brief introduction, then why NIDM, a couple of use cases of NIDM, and then its future.

Let me start with a little background on neuroimaging software. Why do people develop neuroimaging software? Why were SPM, FSL, AFNI, the tools that we've heard about, developed? Usually it's a methods group that wants a medium to propose its methods to the world. There's also a bit of a branding aspect to it: it's quite useful to be able to quickly put out your new research and methods in software, and in many instances that was the critical reason the software was developed.

However, software development itself is usually poorly regarded in the academic world. If you are a developer for one of these tools, it's not exactly a stellar position. And there's poor interoperability between the tools. One of the fundamental problems I want you to think about is that publication is still the main currency people have for their careers, and that's the background we're working against at the moment. One aspect of that is that there's no provenance: there's no push to produce provenance that people can extract from results and data. And there's also very little testing of these things.

So, the consequences. A group has to promote its software, so when there's a bug in FSL, you don't advertise it too much. I'm not saying there is a bug in FSL, I'm being recorded here, but that's the sociology of it, right? Code to reproduce a paper is rarely available: when a cognitive neuroscientist publishes a paper, it's very rare that the code to actually reproduce the analysis and the figures is released. And I picked up this number looking at the NIH sites: out of the roughly 400 million dollars funding neuroimaging research, or fMRI research at least, how much actually goes into the development of tools? Very little.

So what are the consequences? A very deep lack of harmonization and standardization. I could tell you the story of the NIfTI standard. NIfTI is one of the standards we all work with, and it's a very good thing to have. Many people say it's a really bad format, that there were mistakes in the way it was developed, and so on. Sure, but it is great to have NIfTI; that's my message. There's a basic lack of reusability, and an enormous waste: if you look at the research being produced, there's an enormous waste of resources.
When the only product of two or three years of work is a couple of papers, and no one can actually reuse the methods and the things you developed to produce those papers, that is an enormous waste, and it's one of the things we really have to think about.

And there's a lack of reproducibility; there is a reproducibility crisis. How many of the papers out there can actually be reproduced and give us the same results? It's not very clear. There are a number of papers now, starting with Button et al., showing the power issue, but there's also the availability of the code and the software that needs to be there. In my field, neuroimaging, there is definitely a reproducibility crisis, and we have to ask: how much of my colleagues' results can I reproduce before I build on them?

So, the missing principle. If you are publicly funded to do research, the code and the data you acquire or develop are not yours: they are what you've been funded for, so they should go back to the scientific community. And actually, if you're a good scientist, if you're interested in the science, that's the natural thing to do. The problem is that we work in an economy of science where you need to publish and advertise your work to survive in that environment. That's one of the key problems. If you are developing software, one thing you should be thinking about is reproducing, validating, and testing that software, and making sure it can be reused efficiently. So we should adopt a culture where we spend more time on the slow-paced work needed to make sure our products can be reused. It's a long-term investment rather than a short-term one, and that's the cultural aspect of the problem.

So what are our goals in this task force and in the development of the Neuroimaging Data Model? We're going a little against the current. We want, if possible, comprehensive data sharing, enhanced reproducibility, enhanced reusability, increased interoperability, discoverable data and software, and to enable new research and ideas once those things are available. The challenges: there are no easy tools; metadata are missing; data are not discoverable, and if you look for data on the web it's very hard to get them, and very hard to get them documented; there are multiple software packages, because of the economy of the thing; and there's very limited funding, as I was telling you before. And there are few common standards; thank God there's NIfTI. So we need a common language, and we need a method to construct it.

So what are we trying to do? Start from the paper, which is the end result and, unfortunately, almost the only result that funding agencies or hiring committees in universities really look at. Instead of having a single publication as our only product, we want everything that was done before to be reusable and linked to that publication, so that we can reproduce, validate, and reuse.
And the whole thing should be linked: if you have a paper and a result somewhere in that paper, you really want to be able to go back to the raw data, to the analysis and workflows, and so on.

How do we avoid the classical problem? We need a standard, and we need to communicate. There are already n ways of doing these things, and by trying to unify them there's the risk that we just create an n+1th one. So we have to think about both the technical issues of linking and sharing data and software, and the social and engineering aspects, which are probably even harder to solve. And precisely because they are harder, those are the things we should concentrate our attention and energy on as well.

Anyway, NIDM: a data model, shared and co-developed. That's one of the answers: it has to be a community tool. The methodology: we have weekly calls, with sometimes painful and long discussions that get people frustrated. There are a number of technical tools that help in that process; Git and GitHub have been a huge help, with pull requests, some hackathons, and getting the tool developers on board. If I'm doing something in neuroimaging, I need the FSL, SPM, AFNI, and FreeSurfer people to be on board with what I'm doing; otherwise it's not going to go very far. I just want to acknowledge those people, especially the people involved in NIDM development: Kenny, Karl, Jess, Guillaume (that's his photo), Nolan, and of course Satra; I'm going to miss some of them, I'm sorry about that, but it's been a great pleasure to work with these people. The social side really started with the Neuro Bureau and the Brainhack people, and Cameron is one of the main players in that field. Really getting people together to develop things is one of the answers to the social problem.

So what is NIDM? It's a way of describing data, software, methods, and the results of methods, in a way that is as common and queryable as we can make it. What does it use? Semantic web technologies: basically RDF and SPARQL, the PROV specification, and vocabularies that we take from elsewhere when we can. It goes from the experimental design and data, to the workflow, and then to the results, with the idea that all those things can be queried across these layers. We have focused a lot on NIDM-Results, so you will see some examples of that.

Let me just go back to the PROV model. (Can you give me a heads-up when I'm five minutes away? Thank you, I forgot to set my timer.) The PROV model is something that helps you structure how you describe things. It's not a magic answer to anything; it just helps to structure things as agents, entities, and activities, and you can do many things with just that. That's enough for us to describe our world. NIDM-Experiment is the idea that you use the PROV model to describe an experimental data set, from the project, to the actual study, down to the actual acquisitions and all the files, and use those agents, activities, and entities to describe what was done to the data, how it was acquired, and so on.
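To make the agent/entity/activity idea concrete, here is a minimal sketch using the Python prov package, describing a single MRI acquisition; the ex: namespace and all term names are placeholders invented for the example, not the actual NIDM-Experiment vocabulary.

```python
# Minimal sketch of the PROV agent/entity/activity triad for one acquisition.
# Everything under the "ex:" namespace is illustrative only; the real
# NIDM-Experiment terms live in the specifications on GitHub.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace('ex', 'http://example.org/nidm-demo#')

scanner = doc.agent('ex:scanner-01')                       # agent: who/what acted
acq = doc.activity('ex:acquisition-001')                   # activity: what was done
image = doc.entity('ex:image-001',                         # entity: what was produced
                   {'ex:filename': 'sub-01_T1w.nii.gz'})

doc.wasGeneratedBy(image, acq)       # the image came out of the acquisition
doc.wasAssociatedWith(acq, scanner)  # the scanner carried out the acquisition

print(doc.get_provn())               # human-readable PROV-N rendering
```

A document like this can also be serialized to RDF, which is what lets it be merged and queried with other graphs in the use cases that follow.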
Thanks to Dave Keator and all the group, we've got the NIDM specifications on GitHub, where you can go and see: OK, how do I start describing experimental data with the NIDM model? We've also got another view of that, a primer, and we're starting to run training sessions on these things; there are experiment and results aspects to it. Again, the social side of GitHub really helped us in the process.

The first use case I'll go through quickly, because you saw it yesterday: it's from Dave at the Conte Center. You take the data, build a model of those data, spit out Turtle files, a serialization of RDF, and then put up a Virtuoso database, which is a SPARQL endpoint where you can query things directly from the Turtle files, from the set of triples. From there you can build your UI or anything you want to display that information (a minimal sketch of such an endpoint query follows at the end of this section). One point that's important and interesting, and that Dave also made yesterday, is that the markup is comprehensive and long-lasting. Comprehensive means that if heart rate was recorded, it was actually the heart rate of both the mother and the fetus, and that is encoded. All this information, where we don't yet know what will be useful in the future, we put in if we can. And it's a long-lasting thing.

FreeSurfer is the second example; FreeSurfer is of course a very widely used tool in the neuroimaging community. Again, the idea was to go from the experimental information, to the workflow information, and then to the results information. With a few lines of code, which you sometimes have to dive into (it's good to go diving in Cairns, that's always good), you can get that information encoded using our Python tools. And once you've done that, you can link to other data, like the FMA, and to other tools if they are already marked up, because you can merge different graphs of information using this semantic web technology.

The third use case is fMRI statistical results; that's the most developed one, thanks to Camille. What you're looking at here is the usage of fMRI software in the community, and you can see that we took the three most used packages to start developing this model across software: we really have SPM, FSL, and AFNI.

And this is the model. In fMRI, with task-activation tools, we start from an experimental design; we construct the model; we take the data; from the data we construct the T-maps that you've seen in Marta's and other talks before, which are statistical maps; and then on those statistical maps we do some inference to select the regions that are significant. This model is just a model of that process and of all the entities created by the activities: doing an inference, doing a contrast estimation, these sorts of things. And the model is not going to be exactly the same for SPM and FSL; these are the differences.
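Going back to the Conte Center pattern above (Turtle files behind a Virtuoso SPARQL endpoint), here is a minimal sketch of querying such an endpoint from Python with the SPARQLWrapper package; the endpoint URL and the query predicate are hypothetical placeholders, not the actual Conte Center deployment or NIDM terms.

```python
# Minimal sketch: querying a remote SPARQL endpoint (e.g. Virtuoso) from Python.
# The endpoint URL and the query pattern are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")   # hypothetical endpoint
sparql.setQuery("""
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?image ?acquisition
    WHERE { ?image prov:wasGeneratedBy ?acquisition . }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["image"]["value"], row["acquisition"]["value"])
```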
Back to the SPM and FSL models: you can query the things that are common to the two models, but you can also query for something specific that is only in FSL, or only in SPM. You can do that with these queries.

What is the use case for all this? How are we going to use these things? Because if you don't have a good use case to convince the neuroscience community that this is useful, you're not going to go very far. The use case we started with is meta-analysis. Encoding all this information about the results of these software packages is really important for meta-analysis. The usual way meta-analysis was done in the past, and will be done for a long time I'm sure, is to take the XYZ coordinates out of the papers and construct maps of those coordinates for a specific domain, a cognitive domain or a pathology domain. That's what Neurosynth was built for; it is very successful, a great tool developed by Tal Yarkoni, and it's just a great thing to have. We think it's not enough.

The meta-analysis we can do with results from SPM and FSL marked up with the NIDM model is different: instead of having just the XYZ coordinates from tables in papers, you have the actual T-maps, images that can be shared, and I'll show you how. But you also have all the information about those T-maps that is really useful for meta-analysis: you now know the smoothness, the error model, the contrast direction, things that were completely missing from previous meta-analyses (see the sketch after this passage).

Brainspell is another tool we're working with; it's a tagging tool for papers. There's a vast number of legacy papers, and we still want to be able to use them. But how do you do that if you don't have the right information coded directly from the output of software like SPM or FSL? Well, you can take Brainspell and start to tag: OK, this contrast is in this direction, it comes from this cognitive domain, and so on. So you have more information you can encode, and our plan is to extract a NIDM model from that.

I was telling you the contrast images can be shared easily; that's thanks to Chris. He's built this great tool, NeuroVault, where you can push the contrast images spit out by SPM and FSL together with the NIDM information. So we now have in NeuroVault the capacity to hold all the information SPM and FSL have when they construct their T-maps. That's a fantastic thing; we're really making progress there.

There's also this little project with NIF, Jeff, Maryann, and Dave Kennedy, to see how we can take papers that are not yet published but will be, and mark them up with terms from a NIDM model. That gives it a more social aspect, where we can say to authors: that's great, your paper is accepted, but if you really want to make it useful, you should mark up all those things in the paper. It's a parallel little effort that should then be matched with the automatic exports already coming out of SPM and FSL. We're linking those things because we use the same model to mark them up.
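Going back to the meta-analysis point: here is a sketch, written with rdflib, of the kind of extra metadata a marked-up result can carry alongside a T-map, which a bare XYZ coordinate table lacks. The predicate names are invented for the example, not the actual NIDM-Results vocabulary.

```python
# Sketch: metadata a marked-up statistical map can carry, beyond XYZ peaks.
# Predicates under "ex:" are illustrative, not the real NIDM-Results terms.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/nidm-demo#")
g = Graph()
g.bind("ex", EX)

tmap = URIRef("http://example.org/results/tmap-001")
g.add((tmap, EX.contrastName, Literal("faces > baseline")))
g.add((tmap, EX.errorModel, Literal("homoscedastic, independent errors")))
g.add((tmap, EX.smoothnessFWHM, Literal("8 8 8 mm")))

print(g.serialize(format="turtle"))
```

The design point is that once these attributes travel with the map as triples, a meta-analysis can filter or weight studies by smoothness or error model instead of treating every coordinate table as equivalent.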
Queries. The key thing here is that once you've got all this information marked up, you really want the power to query it, and the standard W3C language for those queries is SPARQL. The great thing is that you may have a graph that is the output of one analysis somewhere on the web, and another graph that is the output of another analysis somewhere else, or something local; you can merge all those graphs and run a query across them, creating a common graph across those resources (there's a small sketch of this after this section).

Problems: I don't know enough about this topic, so I won't linger on it, but how is this going to scale? How efficient are these tools? We have been using Virtuoso, but other tools are possible. How is it all going to work in real life? Those questions still have to be experimented with, and I'm sure my colleagues, Nolan and others, will jump in and tell you a little more about it. I think we also need tools for neuroscientists: not everyone is going to write a SPARQL query, so we need tools that make those queries easy across various resources. And we need to train neuroscientists. If you really want to be a good cognitive neuroscientist or neuroimager, you do need to learn these techniques at some stage, to free yourself from the tools being developed and to develop your own ideas and your own queries. The training of the community is going to be critical in that respect.

This is an example of a query. I won't go into it, but we have a number of examples in our IPython notebooks, Jupyter notebooks now, in the GitHub repo of the task force. You probably can't see it, but it shows how simple it is to merge two graphs and run a query across them. The point I was making, though, is that queries can start to get a little complicated. This is the query to get the XYZ coordinates from the SPM and FSL models; not everybody is going to write that, so we will have to have tools that do it for people. And these tools are starting to be developed; I think that's essential. We've started to develop a NIDM-Results viewer that works in the browser: you can point it at a NIDM result, FSL or SPM, and get all the T-maps, all the coordinates, all the information NIDM has marked up, in tables.

So where do we stand with NIDM-Results? We've got a FreeSurfer converter, an FSL converter, and a native SPM exporter; the AFNI people are engaged in the project. We've lost one developer who has gone to industry, but hopefully we'll recover from that. So I think we have the wide community of neuroimaging tools, the most widely used tools, on board with this project, so hopefully there's a chance this is going to work. OpenfMRI has NIDM results for both SPM and FSL there; NeuroVault ingests NIDM-Results; we're working with Brainspell and others. And BIDS, the new format Chris was telling you about yesterday, is going to have a NIDM export, where you can tag more information than BIDS can hold, because BIDS is deliberately a simple data structure, so that it's easy to write for people who don't usually code.
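For the graph-merging idea above, here is a minimal sketch with rdflib: two result graphs, say one exported by SPM and one by FSL, are loaded into a single graph and queried together. The file names and the coordinate predicate are placeholders, not the exact NIDM-Results terms.

```python
# Sketch: merge two result graphs (e.g. one SPM, one FSL export) into a
# single graph and run one SPARQL query over both.
# File names and the "ex:coordinateVector" predicate are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("spm_results.ttl", format="turtle")   # hypothetical SPM export
g.parse("fsl_results.ttl", format="turtle")   # hypothetical FSL export

query = """
    PREFIX ex: <http://example.org/nidm-demo#>
    SELECT ?peak ?coord
    WHERE { ?peak ex:coordinateVector ?coord . }
"""
for peak, coord in g.query(query):
    print(peak, coord)
```

Because RDF merging is just a union of triples, the same query works unchanged whether the graphs come from local files or from remote endpoints.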
So these things are happening; it's a very exciting time. NIDM has been used in the Conte Center that Dave was talking about; that's a real-life application. It's used in NCANDA, we're working with Cameron on the NKI data set, and so on. So there will be more and more big data sets, spread across the internet, that are marked up with NIDM. And as a result of these interactions, four grants have been submitted that include some NIDM development. So I think that's exciting.

One thing I want to point out quickly before finishing: how do we understand each other? How do we construct a common language? First, we have to know exactly what our goal is, and that's not always easy; we don't all have exactly the same goal. We have a common grammar, somehow, with the NIDM model and the W3C PROV. Then we need a common vocabulary. We need to make sure we don't reinvent the wheel each time, and we need to take the time not to reinvent the wheel. So if you are developing something with NIDM, you first have to search for terms, using the standard ontologies and lexicons that are on the web; only if nothing is appropriate do you add something. And you add it by discussing it on the web, using the kind of methodology we've developed for NIDM-Results, where you discuss the terms one at a time; it's boring, but very useful in the end. Where do you look for terms? NeuroLex, the OBO Foundry, FMA, Cognitive Atlas, but also the standard vocabularies on the web, Dublin Core of course, and so on. And if you haven't found anything, then you create something, and you discuss it with your colleagues.

So why might this whole enterprise work? I think there's a good spirit; we all understand the problem we want to solve. We base our way of working on the open-source communities' way of working. We have a very strong neuroimaging Python and open-source ecosystem that is an underlying layer for this enterprise. We're trying to solve a problem in a domain we know: if I add up the people in and around the task force working on this, that's hundreds of years of experience in the domain. And we have a little funding from the INCF, and soon, hopefully, a bit more from the NIH. But I think the key thing is that we have great individuals in this task force who are really motivated, who do this on their leisure time, taking maybe two days for a hackathon and so on. They're knowledgeable and they have the right spirit.

But there are difficulties. It's hard because everyone has their own agenda and their own next goal, so coordination is not easy. It's still a small developer community; we need to do more training. Software development is not well regarded in the academic research world. Most researchers have domain knowledge but little software skill, so there's a really steep learning curve to get there; but I think that training is necessary.

Current and future work: NIDM-Experiment, which I've told you about. We should link with NDA as well, and with all the database world: XNAT, LORIS, and so on; Samir is around here to help us with that.
NIDM-Workflow: there's a little specific group with Tristan and others from the workflow world, like Nipype of course with Chris, and we're trying to get this NIDM markup working for workflows. And NIDM-Results will expand to other software once it's proved its utility. Again, I'd like to acknowledge people. I also would like to acknowledge Matthew, Rosa, Eva Christina, and Linda, and all the secretariat. And thank you for the invitation to talk today. Thank you.

Thank you, Jean-Baptiste. This is Guillaume; he didn't want to have his photo up there, but this is Guillaume down here. OK, sorry. Gary, you have a question?

That was great, JB. My question is about how to move ahead in having researchers adopt some of these improved reproducibility practices. Clearly, having the tools available is a key requirement, but I'm wondering, from a publication perspective, whether you think journals could have a staged way of increasing the quality of the research being submitted. For example, as you mentioned, power analysis could be one requirement, and also some of the other metrics you pointed out here. Do you see a way forward in trying to implement this across the community?

I think there are several questions there. There's the question of how we move the general academic evaluation system to account more for software, and there are little things starting. In Nature Neuroscience, I think, there have been small advocacy pieces on how software is really important, how you should be testing it, and so on. Because of the reproducibility crisis, people are starting to understand that this part is really critical, and that we shouldn't focus only on the publication of scientific results that are possibly not scientific results in some cases. So that's the general advocacy aspect: we just need to increase the community's awareness that this is critical. If you work in neuroimaging, you would never think of not training yourself in neuroanatomy; you have to know some anatomy. In the same way, if you work in neuroimaging, you have to know a little about software development. These are the tools of the trade, right? So that's one aspect.

The other aspect is to engage publication journals specifically, to say that software is a valid contribution to science. I think GigaScience, for instance, is doing something where you have a paper and, in a Galaxy kind of environment, the tool itself available. So more and more journals are on board with that. As for engaging the whole community of tool developers, it just has to come from the bottom up, from people realizing that working on your own little thing is maybe the best strategy in the short term, to get the next grant or the next paper, but it's not the best strategy for the scientific community.

Any other questions or comments? A first one from the chair. We all talk about the need to share data, big data, and this sort of thing, and yet, as you commented, there's little money for software development, while so many people are seeking funds to develop yet another alternative, taking 14 methods and developing a 15th to go with them.
So how do we as a community say: let's work together? You're doing a great job at that, but we as a community have to come along and be part of it, and say we don't need to go and develop yet a 15th or a 16th method.

Yes, and universities really have to develop ways of hiring, positions for people who are really good at that: developing and linking things for neuroscience, in this instance. I think that's one of the key aspects.

So it's a bit of an argument we've got to have with our universities about what the benchmarks are for our staff. At the moment it's, first of all, publications, and unfortunately grants, that sort of thing. We all understand that problem. But I think you demonstrated really well why we need to go down the road of common data curation. OK, so thank you very much.