Yes, I'm an alien here: I don't know anything about neuroscience and a lot about data. There are no monkeys in this talk, and you can tell that I'm from a different community because the collection of stickers on my laptop is completely different to the collection of stickers on everybody else's laptop, so that's a sure sign that I'm from somewhere else. Okay, so I'm going to talk about FAIR. Now I know it's after lunch, and it's all about data, and no groovy pictures of brains, but let's try to at least get some enthusiasm. So, FAIR. Marianne, who actually gave practically all my talk this morning in her opening, and who's also an author of the FAIR Principles paper, which has done our H-indexes no harm at all, Marianne, already told us it stands for Findable, Accessible, Interoperable and Reusable. [A stretch of the recording is unintelligible here.]
[Unintelligible.] No, way, no. Because the European Open Science Cloud (which of course isn't European, isn't open, isn't about science and isn't a cloud, but apart from that it's perfectly named) is actually a research commons, and there's always this attempt to put a single entry point, one web page, onto all data for Europe. This is just bonkers; it's not the way a commons actually works. A commons is also collectively created, owned or shared by the whole community, with mixed degrees of control, and I wanted to highlight that a little bit. So this is a picture from a research commons for cancer research, funded at the NIH, and this was an attempt to quite carefully control how it would be managed, by organising a few key data sets and organising a community, in order to really work on very well defined standards. You're all building research commons, so I found a couple of examples. You know the Human Brain Project: that's building a research commons. The open science brain community: that's building a research commons. I think over the next two days we're going to hear numerous people building research commons. The one I'm associated with is called Elixir. Elixir is an equivalent to INCF in some ways, for life science data in Europe. It's 23 nodes from European countries, and 15 different communities that it recognises and particularly supports, but it supports all life science data. So it's about how we organise and manage life science data for all life science problems and projects. It's trying to coordinate a zoo of community resources.
So, you know, you all use ArrayExpress and ENA and all these kinds of other things; all of those, as well as all the national data sets. And we try to marshal all the different tools of the community, compute resources and so on. It's massively distributed: 220 different institutes in Europe, and each one of these has its own APIs, its own funding, its own web interfaces, its own submission tools, its own tool deployment. And it's held together with common identifiers, registries, workflows, love, hope and an awful lot of politics and drinking. That is how we actually put this together. And because that's easy, we are now doing the European Open Science Cloud Life project, which is to build an uber FAIR life science community commons that attempts to federate 13 research infrastructures, of which Elixir is one. So for example BBMRI, which is to do with biobanking, or EMBRC, which is to do with marine science. So this is clearly a challenge. What we're dealing with here is a zoo of catalogs, a zoo of tools, a zoo of data, a zoo of workflows, a zoo of computing resources. So my first FAIRy story is to pick up on something that Marianne said, which is that there's a temptation, if we're talking about FAIR, to jump right in and talk about interoperability and about doing really complicated things with standards. But the first thing you must do before you can link something is find it.
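The "common identifiers" glue mentioned above can be sketched concretely. One real mechanism used across life science resources is identifiers.org compact identifiers of the form prefix:accession; the tiny helper below is only an illustrative sketch of building a resolver URL, not any official client.

```python
# Sketch: compact identifiers (prefix:accession) are one of the things
# that hold a distributed commons together. identifiers.org resolves
# them to the hosting resource; this helper just builds the resolver URL.

def resolve(compact_id: str) -> str:
    """Turn a prefix:accession compact identifier into a resolver URL."""
    prefix, accession = compact_id.split(":", 1)
    return f"https://identifiers.org/{prefix}:{accession}"

print(resolve("uniprot:P69905"))
# https://identifiers.org/uniprot:P69905
```

The point of the pattern is that the same compact identifier works regardless of which institute actually hosts the record, which is what lets hundreds of institutes interlink without one central database.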
So how would we, when we're working in such a complicated community commons, where we have such a range of every kind of life science data (imaging data and omics data and model data and synthetic biology data and so on), what do we do there? Because there is no one model. So we looked at how we do simple finding. Our first FAIR story is conventions: lightweight conventions just for finding, accessing and registering data sets where you have existing data models and existing interfaces. We have two things here that we're working with particularly. One is called EDMI, the EOSC Dataset Minimum Information, which does exactly what it says on the tin: it's the minimum information needed to describe, for each of the resources, what that resource actually does, in order to drive the underlying infrastructure. I'm not going to talk about that one; I'm going to talk about Bioschemas. Bioschemas is a set of conventions for using schema.org to find, access and index data sets and other objects, in particular to make them ready (well, not in particular, but it also incidentally makes them ready) for Google dataset search as well. It's very small, very lightweight, it's very viral, and it follows a model of a little bit of semantics everywhere. Who's heard of schema.org? Yay, right, good, so I don't have to explain this slide. For the handful of you who didn't: it's structured data descriptors in web pages. Very low barrier, universal markup: a little bit of embedded markup in a web page. But the key thing is it's a little bit, so that you can expose what's in that web page or that data set or whatever it is you're trying to expose, such that you can then exchange information about it, extract it into your registries, and do automated curation with it. So FAIRsharing, which was also mentioned by Marianne, a registry of standards, FAIR standards, that's managed as part of the Elixir family, uses this kind of mechanism.
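To make the "little bit of embedded markup" idea concrete, here is a minimal sketch of a schema.org Dataset description of the kind Bioschemas embeds in a page as JSON-LD. The specific property choices and all the values are illustrative assumptions, not the official Bioschemas profile.

```python
import json

# A hypothetical, minimal schema.org Dataset description. Bioschemas
# profiles pick a handful of properties like these out of the 91 that
# schema.org defines for Dataset.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example proteomics dataset",       # illustrative values
    "description": "Mass spectrometry measurements from project X.",
    "url": "https://example.org/datasets/123",  # hypothetical URL
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "datePublished": "2019-09-01",
}

# A page would carry this inside <script type="application/ld+json"> ... </script>
markup = json.dumps(dataset, indent=2)
print(markup)
```

Because the block is plain JSON in a well-known vocabulary, any aggregator, registry or search engine can consume it without knowing anything about the resource behind it.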
So it follows the Goldilocks principle: not too hot and not too cold, just enough. And Goldilocks is the principle that we're trying to follow throughout all of our activities in our Elixir FAIR commons, because the temptation to over-engineer everything is strong and we have to constantly hold back; I'll mention that a little later on. So what we've managed to do over the last two or three years (it takes a long time) is take schema.org and cut it down. Consider Dataset, for example: in schema.org there are 91 properties for Dataset. We probably don't need 91 properties; in fact we needed 5. So we chose 5, with 8 optional ones, but really just 5 that we were interested in if you wanted to describe a data set: basic information like who owns it and when it was released, boring stuff like that, which tools could then use when they were automatically processing over it. So we have generic specifications; we have some to do with scientific activities, like marking up workflows and marking up protocols; and then we have things to do with bio data, the few, the six or seven, things we want to say if it's about a protein or a sample. We then embed that into our resources, and it can be picked up by aggregators and registries and search engines like Google and various different applications. So up to now we've had 200 people participating in that activity; 68 resources are marked up so far; we have about 11 million pages; we have seven candidate new types, like Sample, going to the schema.org community, where they will become part of the regular schema.org activities; and we have 30 profiles, which are conventions for using current schema.org. 14 countries have participated in that. A little example of this is our marine partners. They have a marine archive with no API; they've got no ability to build an API, it's a very small activity. So they mark up their web pages in Bioschemas, and we have a little search, a little harvester, that goes and crawls and
picks it up; and then BioSamples, when it wants to register this data, extracts that information, registers it into the BioSamples curation and links back to it, so you keep the provenance of where these things came from. So this is an example of very lightweight FAIR in a FAIR commons, and it was a happy ending: it was endorsed by Elixir, and our first types are going into schema.org. And as I said, this Goldilocks principle was very important, because, believe me, there were huge amounts of debate trying to explode into ontology land. Ontologists are great people, some of my best friends are ontologists; however, they could argue about the semantics of angels on the heads of pins until one loses the will to live, and particularly, as an engineer (because I'm actually an engineer), we lose the ability to do anything with it. This is a very engineering process, so it was important that it remained tractable to software engineers. So when it all got a bit too ontology, we went back and said no: unless it's a very simple thing that somebody can immediately put up, that just requires understanding how to scrape a bit of JSON, forget it. That's the difference between elegance and what's best for tools. And of course we have trolls, because everybody thinks we are reinventing ontologies. This only uses a subset of the FAIR principles, so now I want to go into a little bit about the FAIR principles (or quite a lot about them, actually, having spent some time writing them with Marianne), and the FAIR principles versus FAIR the nice intention, because they are quite different things. So our second story: once upon a time, at a meeting at the Lorentz Center in the Netherlands, there was a gathering together of 40 people and a woman, so there are issues there; I had to nip out for a day, and so that was 'the last female' left. Those of you who watched Ice Age will get that joke, with your children.
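As a sketch of the "scrape a bit of JSON" step described above, the snippet below pulls JSON-LD blocks out of a page the way a lightweight harvester might, using only the Python standard library. The page content is a toy example; a real crawler would fetch live pages and tolerate malformed markup.

```python
import json
from html.parser import HTMLParser

# Collect the contents of <script type="application/ld+json"> blocks,
# which is where schema.org/Bioschemas markup lives in a web page.
class JsonLdExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_ld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld and data.strip():
            self.blocks.append(json.loads(data))

# Toy page standing in for a crawled archive entry.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Marine sample archive"}
</script>
</head><body>...</body></html>"""

parser = JsonLdExtractor()
parser.feed(page)
print(parser.blocks[0]["name"])
# Marine sample archive
```

This is essentially all a consumer needs to implement, which is exactly why the approach stays tractable to software engineers with no API on either side.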
So anyway, we produced this paper called the FAIR Guiding Principles, though there had been many, many efforts before; it was a grassroots activity that has actually become a bit of a top-down one. Here are the principles, which I took off the website; I did my due diligence, I did go onto the INCF website, although I'm now told that it's all changing anyway. But anyway, these are the FAIR principles; there will be a short test afterwards. They appear in the paper only in a breakout box; they are never explained. That's actually proved to be good, because we can get away with anything, and bad, because we can get away with anything, and some of them we're not quite sure what they even mean. 'Metadata includes qualified references to other metadata': people puzzle about what the semantics of that actually are. What they're intended to mean is that data and metadata are locatable and accessible by identifiers, that there are standard access protocols to access them, and that they have the least restrictive licences you can get away with. The second piece is that they're in machine- and human-readable data formats, compliant with many community standards, not just one community standard; because if you have a data set, you might want to mark it up with respect to a standard to do with registering that data set, which comes from the library community, as opposed to marking it up with a community standard which comes from the domain, like proteomics. So there are multiple standards; the metadata persists; and it tells you the provenance of the data and how it crosses to other data. So that was the intention; they're more than a fuzzy feeling. In fact, on your own website you say it's enhancing the ability of machines to automatically find and use data, or any object, and that it supports reuse by individuals. So it's all to do with access automation. That was the purpose, but it's also got into policy and proclamation and
vision, because it has this cool word, FAIR. So the message has spread across the lands. There are numerous papers about the cost of not having FAIR and about what FAIR means in practice; there's a whole grant program called the FAIRification of data sets, and things like this; there are FAIR hackathons at Bio-IT World; and there's a whole bunch of projects with the word FAIR written into them just to be on the safe side, including one of mine (I'm not stupid), and a whole bunch of these are projects I'm funded on, actually; and all these different organisations claiming that they own the whole meaning of FAIR. So there's lots going on, and this tells me something: simple words are powerful things, and simple concepts are not so simple to implement. So we're going to have a little bit of a look at what we mean by FAIR. One size does not fit all, and do beware (if you remember one thing out of this talk): beware of FAIR zealots, because there are lots of people out there claiming that they will solve your FAIR problems; that they have always been FAIR, whatever that was; that you should use their technology because it's the only way you can possibly be FAIR; that they are the only people who control FAIR; and that, although they don't know what it means, they're going to measure it anyway. That's one of the things I'm going to talk about next, because it's very, very worrying. Even Barend Mons, who kind of came up with the word, wrote in his paper, which immediately followed, that there are emerging indications that this word may sometimes be stretched, which raises concerns and confusions; and in fact he's just produced, or led the activity of, another paper, on interpretations and implementation considerations, which was still equally contentious. So let's get down to some reality, because this morning FAIR came up at least four times, by my count, in different talks. The principles are actually an aspiration; they're a journey; they are a call for this
machine actionability. They're very ambiguous; it's a spectrum, and you can have different degrees of FAIRness, as I'll briefly say. They have to be domain respectful; they have to be implementable with today's protocols (that's something else people are trying to invent, the notion of a FAIR protocol; there's no need for a new protocol, it's more to do with how you use the protocols we have, not the invention of a new one). They're just a subset of what we mean by good indicators for a good data set, and they are very much a work in progress. They are not a standard; they're not strict; they are not about one size fits all; they've got nothing to do with quality; they are not open; they are not tablets of stone. So they're not about being open: you can be FAIR and not open, because the A stands for Accessible, not open. And they're not about privacy preservation or regulatory rigour either (I'll mention that a bit later on as well), and they're not about a resource's quality or impact. For example, in Elixir we help choose which resources we're really going to put our energies behind supporting; we call them the core data resources, and amongst a whole set of indicators is: are they FAIR, do they adhere to these principles? But there are many other indicators, like what is the quality of the data, is it a good database, does the community care about it? You can have a FAIR data set which is perfect (all the metadata is there, it's readable, it has all the APIs, it has all the identifiers) and nobody cares a damn about it, because it's irrelevant, it's out of date, it's not supported by the community. And you can have another data set which is absolutely critical and doesn't meet all the FAIR measures, but that's okay, because it's an aspiration of where to move on to. It's not about harmonising all metadata to one schema, either. So there's a bit of hype, and we're about here at the moment, about to descend into the trough of disillusionment, because we need to do a bit more of this clarity
business. In particular, we need to do more on cost-benefit analysis, because it is quite expensive to make an existing data set FAIR, and so you need to be very careful about how you choose your battles and what you're actually going to do. This European report is actually very good (I know most European reports aren't, but this one's alright): Turning FAIR into Reality. It sets out a roadmap, which I've summarised here so that you can actually read it: first define it (good point), then implement it, and then embed it and sustain it. So this is a journey that we're currently on, and you'll notice that some things, building a culture, building skills, building incentives, are actually critical. When I looked at your current website, this is what it said about FAIR in INCF's reports: adherence to the principles is a requirement for the standards, and you have this document about review criteria for standards and best practices. So you've got quite a lot of things to do with standards and best practices, but nothing on the website (I'm looking at Marianne here) about the actual resources themselves, which I thought you would have. So let's go into what we mean: how would INCF then account for, and be able to say, your data set is FAIR? Not your standard, but your data set, or your registry, or whatever it is. That means being clear: having clear metrics and models and maturity models; you need to have some sort of assessment; and then you have to have some sort of FAIRification methodology (and FAIRification is a word, by the way; well, I made it up, but you know, it's a word). And when do you do FAIR? Do you do it when you create the data? Do you do it when you want to store the data somewhere else, in a public archive for example, or one of the splendid databases that was presented earlier? And what do you do with all the legacy? So here the thing to worry about is that FAIR actually has three steps: it's a contract at
first. So it's all about setting up expectations, as I've already said, so that you can do some self-evaluation and do some reporting. Then there's a step to do with compliance, when you want to be able to comply with what your community judges to be FAIR, in order to be able to review and to do comparisons and monitoring. And then there's this third step, which is judgment and regulation: by whom? And I presume, from what Marianne said this morning, INCF has a view to do this for some of its resources. The reason I put that up is because it's amazing: as soon as you put out a paper, immediately people say, right, we've got to measure it, because the Commission wants to know, are you FAIR or are you not, so that we can decide what we're not going to fund anymore, because you don't get a tick. So a whole bunch of people, including the NIH, came up with ideas for how to do this assessment, and we're going to have stars and scores and things like this. And this is a nightmare, because you've got something that's ambiguous, that is domain specific, that really needs to be worked out in the community, and then you're going to have some folks who you've never heard of declare that your data set isn't FAIR. That's not a good thing to do. So this was a very unhappy ending for these efforts: subjective, hard to interpret, judgmental, when the intention was not to be judgmental; the intention was to be supportive, to be able to change how we did things, not to measure things. They tended to drift into quality reviews, and occasionally just barking mad things came up, you know: this resource clearly isn't FAIR. Oh, okay, so shall we just tell the EBI they should close their data set down? I don't think so. And you know what, they didn't anyway. So this is partly because of Dunkelziffer. Dunkelziffer is a splendid German word meaning 'dark figure': it is the things that you cannot see that matter. And so not
everything that can be counted counts, and not everything that counts can be counted. That's why this kind of evaluation has to be done very sympathetically, because FAIR is non-trivial at anything other than the most superficial level; that's from Mark, who is our lead author. So when you at INCF go about defining your FAIR assessments, you will need to do some very careful work. This is going to be a big piece of work; this is going to be tough, because you've got to pick your indicators, you've got to design your transparent evaluation, and then you've got to eat your own dog food, which means design, build, test and learn those indicators at the same time as you do the evaluations, all together, not separately. And you're going to have to take indicators that are robust, have humility, are transparent, diverse and reflexive. There's this brilliant report called The Metric Tide that describes all about metrics and measures, as well as putting them into context, particularly the cost-benefit. And so now there's a whole bunch of activities, not just throughout Europe but elsewhere, figuring out what a maturity model looks like. Maturity models are where you have different levels, like a sort of value-based assessment. So this is a sort of naive maturity model showing the different stages, the different points you could be at in your FAIR assessment, and there's a lot of work being done in different projects to identify what are the indicators that suggest you might be meeting a FAIR principle, what is the process, and how you choose what the different levels of those indicators should be. So this is a process or capability model being developed in the FAIRplus project, which is to do with FAIRifying drug databases, and this is from the Research Data Alliance FAIR data maturity model working group, which was set up in part to deal with quite a lot of these different challenges, and its next meeting is on
the 12th of September, with various different sessions at the Research Data Alliance plenary in Helsinki. For those of you who haven't been participating in this, can I urge you to participate. Although it runs through the Research Data Alliance, it's actually a PricewaterhouseCoopers contract for the European Commission, which means that whatever they come up with is how your grants are going to be judged. So you want to be in there, because I think this is actually very important, and in my opinion it needs a little bit of re-steering (I'm being videoed, aren't I): it needs a little bit of nudging, shall we say. So here's an easy indicator: metadata are released with a clear, accessible data usage license. So you can say, okay, what's the license property, what's the format, and what does it allow you to do, and that is what you are actually measuring. That was an early attempt; this is a new attempt: it is mandatory that metadata includes information about the license under which the data can be reused. Well, okay, I guess; it's kind of less clear, it's less precise, but this is the kind of thing that will become part of the judgment of all resources. This one's a trickier indicator: it meets the domain-relevant community standard. And it's trickier because all it says is that you will meet the domain community standard, which is not very helpful to measure. Suppose there isn't a standard; suppose the standard isn't up to it. These have to be community specific: which standard, how is it going to be validated, how are you going to capture it? So things like your shapes, your metadata, your portals, your reviewers: this is the sort of thing they're really going to have to work on, because basically it's quite easy to do some of the F and the A (you can do them in a non-domain-specific way), but as soon as you get into the interoperability and reuse, it's up to you guys to come up with it, and
there has been an effort to build an automated system to devise and create maturity indicators, register them as collections in FAIRsharing, and then write tests for them, so that if you have a starting URI you apply those tests and then you report. That infrastructure is called the FAIR Evaluator, and there are companies already starting to turn this into a business. At least it has community-governed indicators, so you can decide your own; the idea is to completely automate FAIR detection, and at least they are trying to sanity check it while it's being developed. Of course this doesn't do anything about that indicator I just mentioned, because that's impossible, but it will do things like: have you got a license, does it resolve, is there metadata there? And the thing that we, well, that Mark has been finding, doing this for some time now, is that most things that say they're FAIR aren't. This effort to do with FAIRification of legacy databases (the new magic word) asks: what is the process and the methodology for taking legacy databases and making it so that their identifiers are persistent, and their metadata is FAIR, and so on, without getting all muddled up with harmonisation pipelines? It's non-trivial, and this too is turning into a business; people are turning this into a business. So the conclusion of this part: just saying you are FAIR doesn't make it true, okay? It is uneven: some parts of FAIR are easy, some parts aren't, and it's multifaceted. Identifier use is chaotic: that automated system Mark Wilkinson has produced, where you state what you think you should be measuring and how, and then really put resources through the automated process, has really shown that identifier use is terrible, actually, and that separating out what is metadata from what is data is really hard, okay? And yet that's part of this. It's non-trivial; it's a set of behaviours, not a specific technology. And also, we really need to be investing in first mile
FAIR. That means we can't be doing this as a legacy effort; we can't be saying, oh, we're going to try and just fix it at the end; we can't try to fix it when we're putting the data into a public community archive or shared repository; we have to actually do it at the beginning. So that is my third story, and my third story is: first mile, last mile, same thing. This is how far away you are from the actual end goal of where you're going to deposit your data. So I've built a system, for years now, a decade now, which is a commons for self-managing systems biology projects: systems biology projects where there are no curators, they're doing it themselves, managing data and models and standard operating procedures and workflows and samples. What we want to do is to bridge from the infrastructure and the standards and the databases of the community to the actual investigator, and it's been quite widely used, and the infrastructure is very widely used. The key point is what this tries to do. One of the problems we have is that you're doing lots of different types of work in systems biology, and then you have to put it into all these different repositories for the different types, and you completely lose the context, so then you have to rebuild it all again. Because there's a really good database for proteomics, which is PRIDE, and there's a models database for the models, and there's another models one, and so on and so on. What you want to be able to do is to say: we respect the existence of all of this, but we want to create a way of interlinking all the different types of data while enabling them to still be in their own repositories, to still be there. So this is references between metadata, including the repositories that may be at your own institution, because that's where your data is stored. So this is a commons that attempts to effectively bridge across the ecosystem: the ecosystem of your local environment, so your local lab, your
local university, but also the public databases where your data will eventually be deposited, or that you want to be able to refer to when you're trying to organise your metadata. It also tries to link things like your protocols with your data and your models and so on, and it does it by using this thing called ISA, the Investigation, Study, Assay model (that's the little bit you can briefly see there): how do I relate metadata together? Why is that important? Because what we want is that when you're starting your project and you're looking after your data like this, you can put it up to a repository, because it's already organised, it's already prepared, it's already got a pathway; it's a staging post to these other resources. Because when you're in your ecosystem, when you're in your commons, what we frequently think about is this world of all the different public data sets and how they all work together, but nobody works in that world. They all work in their labs, and their labs use their normal environments, or they're working with their national infrastructures, their national store, their national Galaxy installation. A commons has to incorporate all of these things. So part of the challenge for INCF is: how do I go from the data actually being collected and managed by the researchers in the field, such that it's ready to go into the public archives in the right format? Because here, quite a lot of the time, a miracle occurs. And within that miracle there are two different kinds of agendas: INCF has the agenda of wanting really good, long-term, high-quality data collections, and these researchers want to get a Nature paper out really fast. That's completely different, and sloppy science wins. So that's going to be a challenge, and here's the picture; that's exactly what Marianne said, so I just had a different picture, but that's what she was saying. So, at the end of this part of the story: there's a tragedy, a tragedy of the FAIR
commons: it's only as FAIR as its tenants, only as FAIR as they support it, allow it and contribute to it. Project sovereignty rules. I've been running this thing called FAIRDOM for a long time, so I've got a hell of a lot of data about how people actually share in projects, and the answer is they don't, for all the reasons that we all know. Professional stewardship in projects is essential, because it's the only real way; just having a PhD student read up a bit about how to do a bit of data management isn't enough. And there has to be significant community socialisation and values, by which I mean in primary investigators, because a villain of the piece is the primary investigator. PhD students, in my experience, are very keen on doing the right thing; primary investigators are very keen on them writing a Nature paper. Of course there are some very great primary investigators who have done this well and have benefited, but many kind of don't see that it's important to do good data management. And this might be a useful little roadmap for you when you're doing this at INCF. I mention nudging here because there are four incentives in the world of commons production: love, money, fame and nudge. Those are the only four things you can do to persuade people to behave well in a commons environment. PhD students do it for love; everybody else does it for grants and for papers; but the way to really make a difference is to do it by stealth, by sneaking it into processes, by smuggling very good quality metadata collection into your spreadsheets, things like this. This is how you really manage it. The next story is about FAIR digital objects. The Turning FAIR into Reality paper talks about digital objects, not just data: that means everything has an identifier, there are formats, there's metadata, and there's the object itself. In particular, in the European Open Science Cloud Life project we are building a workflow commons, really putting FAIR into workflows, because, as JB
told me, it's workflows in neuroscience too. I've run, for the last 15 years, what I think is the first general, and still the only general, workflow registry; but there is a zoo, again, of workflow registries for all the different kinds of systems, including GitHub itself. How many workflow systems are currently available for science? Any guesses? No, not 1,500, though that's a good guess: 255. We know of 255 that are real, as opposed to just described in a paper; they had to actually have some code, things like that. So, 255 and counting.

The point of FAIR was to make metadata and data machine actionable, so that you could actually use them in things like workflows. Even better if that FAIR data could then be generated from the workflows, because then it would be automated, and it would be good FAIR data. That means all the workflows have to operate on FAIR, not proprietary, formats; your software has to produce FAIR, not proprietary, formats. It also means you have to propagate identifiers and licences and authorisation through your workflows, which is non-trivial. It also means you have to mint identifiers and track provenance, and know how to license the end product, when that end product is actually the combination of many other data items, all with different licences, and code that may also have different licences. So that's quite interesting. And here's a little picture, a view of our forthcoming workflow commons.

We can treat workflows as FAIR objects themselves, but if they're treated like software then the principles don't work any more, because workflows are composite, they have portability issues, and there's lots of versioning. The FAIR principles were really designed for data; they weren't designed for highly composite, executable, changeable objects. In fact, things like software maturity, maintainability and documentation practices are really important. But we can also treat workflows as if they were data, and if we treat them like data, then we
can give them machine-actionable metadata, and luckily we have some. The Common Workflow Language is a way of describing workflows so that they are portable, scalable and interoperable across the different systems, and so that they can be used with containerised tools. We also use things like the EDAM ontology to mark up the inputs and the outputs and the different steps, and we're using something called the Research Object specification to bundle together the descriptions of all the different components of the workflow: to add some context, to relate the different components to other collections, like the standard operating procedure or the protocols it might be associated with, and also to link these descriptions to their native workflow systems. This is going to be the basis of the European Open Science Cloud Life Workflow Collaboratory. It's also the basis of something called the BioCompute Object specification, an IEEE specification that is currently going through standardisation, which is to do with how I describe a high-throughput sequencing pipeline such that it can go through regulation as a medical instrument. That means I really have to describe it so that it's safe, which means I really have to describe the parameters properly for once, because that matters. This is all part of that activity, part of the standards work that Marianne mentioned that is going on somewhere else, and that maybe INCF will be interested in.

So, to finish up: how's this workflow stuff going? Well, it's work in progress. Again, the issue here is to keep everything developer friendly. Nobody reads specs, nobody; everybody copies examples. So the most important thing we should do is make lots of examples. The European Bioinformatics Institute in particular has really moved forward on this: their metagenomics division now designs all their workflows as Common Workflow Language building blocks, makes them, and then implements them in various different systems. It means
that they've enabled pipeline exchange amongst projects; they can compare the versions of the different workflows; they can recycle sub-workflows; they're building libraries; and that workflow library is now becoming part of the genomics standards consortium's activity. And Nipype, for those of you who use Nipype: CWL is coming. Woohoo, how exciting! Anyway, I find it exciting because I'm a geek.

So the last slide: what is FAIR, what should be FAIR, and how to implement it? I heard that a whole bunch of times this morning, and every time I hear that word I go, ooh, I wonder if they know what they meant. Because it's not simple, and it is not good intentions; it means something very specific. All the stories that I hope I've told you are not technical ones, they're social ones. They're all to do with how we get society, how we get the community, to work together in order to be able to use a small bit of technology to make things FAIR. But without incentives and cultural normalisation and long-term investment, everything is just going to be a story. Anyway, I suggest that this kind of roadmap that's been come up with by the European Commission might actually be a good roadmap for INCF, because it lays out the steps you need to take: what are your incentives going to be, how are you going to skill up, what is your FAIR ecosystem, how will you build a FAIR culture, what is it you actually want to declare as a community that you believe can be measured as a FAIR assessment, and how do you want to measure it, before some other guy comes along and says, aha, I've got the contract to measure you. Because you don't want that.

So that's the end of my FAIRy talk, and I hope it actually gave you some stimulation. I ought to mention the name at the bottom: that's Ian Cotten, and he's my husband. The reason I put him in is because, for the first time, and we've been married 37 years now, he met
me when I was a child, honest. And for the first time, he came over breakfast and said: I've just been allocated as a research software engineer to this neuroinformatics project, and it's something to do with pipelines. I said, oh, I do hope you're going to do a workflow pipeline. And he came back to me and said, my god, it's so bad, they're merging files by eyeballing spreadsheets. I'm not going to tell you who the group is. So let's have a bit of reality here. While we're all in this exciting world of beautiful data archives and beautiful tools that I saw this morning, and great talks that I'm going to hear later on, the majority of the community are struggling to write a bash script that is actually under version control, and are merging their data files by eyeballing Excel spreadsheets. So let's just worry about that for a second, and then about publishing papers.

Of course, and a wonderful talk, Carole. When you said we haven't got to evaluating resources yet: we actually deliberately did not do that, because I felt there was a mistake going on with FAIR, in this headlong emphasis on technical validation, when the communities who are referred to all the time in the FAIR standards had not yet come to grips with what that actually meant at a level at which they care about it. They care about file formats; they care about things that are in the laboratory. A lot of people who talk about FAIR are talking about semantic graphs that sit on top of these public databases, and I think that's great, but when you talk to most neuroscientists it's at the data level and at the operations in the laboratory, and that's where those people cannot help us. That's why I've been emphasising that there's a community infrastructure required for FAIR, and that this work is hard, because this is where the scientists themselves have to come to some agreements. And that's why we decided to start on the community standards, to ask: where does
neuroscience stand? Do we even have any standards? And the good news was yes, they're starting to come out in a form that can possibly be used. But we deliberately said we're not going to do that yet, because we don't really even know how, because we haven't defined what FAIR means for neuroscience.

Exactly, and of course I expected that, because you are a wise woman and one of the authors of FAIR. I had a little Skype chat with Mark Wilkinson, who's the lead author of FAIR, doing a lot of the work, and I sent him Marianne's picture, and he said, whoa, they get it. I said, well, Marianne does, anyway. Because you're right, absolutely. What worries me is that there are a lot of missionaries, a lot of FAIR missionaries out there, a lot of people who are FAIR wizards who will say, I can magic up your data set to be FAIR, and I can be contracted to do it. And people believe them, and important people believe them, like funders, right, and journals and other places, for fame and money. So this is why it's important to engage and to put your foot in the door. I spend a lot of my time in these RDA telcons putting my foot in the door, saying no, but I'm beginning to feel a bit lonely. So you're in the right place now. It's really good that you as an organisation say, this is what we are going to do, and that's a powerful thing, really good. Yes, you should, absolutely.

Hi Carole, thank you for the wonderful and inspiring talk. I wanted to ask a question about two things that you said. One is that you said it can't be about building better tools; FAIR is not about building better tools. That is absolutely correct. But then I want to ask the question: you said we have to mint and propagate identifiers, and I'm picking on that one because it's one of my hobbies and pastimes, but it's also one of the things that we have found to be, just at a practical level, one of the most difficult things to get anybody to do. Forget bash scripts, I mean, this is a whole other level.
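The practical difficulty raised here, minting an identifier for every object and propagating identifiers and licences through each step of an analysis, can be sketched in a few lines. This is a minimal illustration only: the class and function names are invented for the example, and a real system would use persistent identifier services and a provenance standard rather than an in-memory model.

```python
import uuid
from dataclasses import dataclass, field

# Minimal sketch of identifier minting and provenance propagation.
# All names here are hypothetical, for illustration only.

@dataclass
class DataItem:
    name: str
    license: str
    # every object is minted an identifier at creation time
    id: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    derived_from: list = field(default_factory=list)  # provenance links

def run_step(tool_name, inputs):
    """Apply a (pretend) workflow step, propagating provenance.

    The output records the identifiers of every input, so lineage
    survives across steps. Licences of the inputs must be reconciled,
    and that reconciliation is exactly the non-trivial part."""
    licenses = sorted({i.license for i in inputs})
    return DataItem(
        name=f"{tool_name}-output",
        # combining differently-licensed inputs has no automatic answer
        license=licenses[0] if len(licenses) == 1 else "UNRESOLVED:" + "+".join(licenses),
        derived_from=[i.id for i in inputs],
    )

a = DataItem("counts.tsv", "CC-BY-4.0")
b = DataItem("annotations.gff", "CC0-1.0")
merged = run_step("merge", [a, b])
print(merged.license)       # conflicting licences are flagged for a human
print(merged.derived_from)  # the identifiers of both inputs
```

The point of the sketch is the awkward part: when inputs carry different licences there is no automatic answer, so the output is flagged rather than silently licensed.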
So my question is: do we have the tools to build a FAIR data ecosystem right now, and if we don't have them right now, what do we need to do first?

Oh, okay, that's a damn hard question; luckily we don't have time to answer it now. So, yes, absolutely: people have been worrying about things like identifier creation, and in particular the propagation of identifiers through workflow systems, for a long time. It's really a research topic; it hasn't even got to the development stage, it's still trying to figure out what that means. So this has to be done, I think, at the moment, in practice: what do we have to do in order to solve a particular problem? We don't try to do it philosophically; we do it from an engineering perspective. I would say that, because I'm an engineer. And then, after we've done it over a series of different engineering examples, we worry about what the general principles are. Because at the moment, I would say, there's a lot of work on principles and theory, but it doesn't ground out in practice. We're better off starting by doing some bespoke practice and then coming out of that and asking, well, what are the general principles, the other way around. That's how we're doing it in Elixir, actually. We're grounding everything in our experiences with Nextflow and Galaxy and Snakemake; we've got about 15 workflow systems that we're using in Elixir routinely, from different communities, and they all have different behaviours. And then we're trying to bring that together in the Common Workflow Language community.

I wanted to ask a follow-up to that, which is, I would agree, but my question would be: how do we sustain, not necessarily the diversity, but sustain within the principles of FAIR the idea that there won't be one? How do we prevent a single implementation from trying to say, oh, our implementation is FAIR? And Marianne is part of this exercise: there is no one
implementation; there's no one technology. This ship already sailed at one point: there's a community, the Semantic Web community, who declared that the only way to do FAIR was RDF. Now, I come from the Semantic Web community, I founded the first journal of the Semantic Web, and no, that is not the only way. Because these are principles, which we can then turn into practices and approaches, which can then be grounded out in specific technologies. So the only way we can do this is by building a diverse community that is implementing it in diverse ways, and by putting our foot in the door. INCF and Elixir are part of that foot in the door, to say there isn't one technology, and please don't believe people when they say there's only one way of doing it, just like there isn't one workflow system and there isn't one commons and there isn't one database. You want diversity; diversity gives you strength. We're all trying to retain biodiversity because it gives us robustness, and in an ecosystem of infrastructure and data we likewise need diversity in order to retain robustness.

I don't know if I'll be brief, but it's actually a good segue into my question. You mentioned that some of your best friends are ontologists but they don't really produce anything useful. (I didn't quite say that; I want to go home and not have rude things written on my front door.) The reason I mention that is, it's easy to start with something simple and practical to make progress, but at the same time that sweeps some serious issues under the rug. What we have seen in the Alzheimer's and neuroimaging world is that people hurriedly produced a lot of garbage, to put it plainly. So is it not better to bear the brunt in the first mile, so to say, take all the pain and at least try to get whatever standards we can agree upon, than to just propagate garbage forward?

There's a big trade-off between doing something practical and giving yourself a route forward. What you don't want to do is to paint
yourself into a corner. So in Bioschemas, what happened was that the ontologists, and I've written 50 papers on ontologies, so just to put my hand up, I was that ontologist, it's like an AA meeting, the ontologists got really interested in trying to describe the science in the resource, and that isn't what we were trying to do. We were trying to describe how you found it, because there were ontologies already in existence that described what the content was. What we were able to do is to say: this is the minimum, these are the few things we want in order to be able to navigate, and then you drop out into the very rich ontologies. In the end we had to introduce a few things like gene and protein and so on, because we needed it to work with the infrastructure, which is what his question was about. As engineers we said: actually, what we're going to do is have a property whose value is an ontology term. Otherwise the harvester becomes hard to build if you're just an off-the-shelf JSON programmer, and that kills it; that kills the ability to do it. So we had to make a compromise: in Bioschemas we have mechanisms for you to incorporate the ontologies that already exist, and the ones that will come, without breaking the principles that made it Bioschemas rather than something else. And that's what I meant. It's about how you build that roadmap, and that took two years to work out, because we zigzagged: we'd try something and then realise, hang on, we've gone down a rabbit hole here, we've now made it beautifully ontological and completely unprogrammable.

Howard, that was fantastic, thank you so much. I really love the fact that you ended your talk on the training and education aspect as well, because if we want to get there, that's definitely the work we have to do, and at INCF we're running all those projects.
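The Bioschemas compromise described in that answer, a minimal set of navigation properties in plain JSON plus one property that drops out into an existing rich ontology, might look roughly like this. Every value here is made up for illustration, and the property choices are only schema.org-style approximations; the real Bioschemas profiles define the actual required and recommended properties.

```python
import json

# Rough sketch of the Bioschemas idea: deliberately minimal metadata for
# findability, with a single escape hatch into a rich domain ontology,
# instead of reproducing the whole ontology in the markup.
# All identifiers and term codes below are hypothetical.
entry = {
    "@context": "https://schema.org",
    "@type": "Dataset",                              # minimal, harvester-friendly typing
    "name": "Example protein interaction dataset",
    "identifier": "https://example.org/dataset/42",  # hypothetical identifier
    "description": "Just enough metadata to navigate by.",
    # the escape hatch: one property pointing into a pre-existing ontology
    "about": {
        "@type": "DefinedTerm",
        "name": "protein-protein interaction",
        "termCode": "EXAMPLE:0001",                  # hypothetical term code
    },
}

# an off-the-shelf JSON programmer can still harvest this with plain tooling
doc = json.loads(json.dumps(entry))
print(doc["about"]["termCode"])
```

The design point is the one made in the talk: a harvester only needs to read a handful of flat properties, while the `about` term hands the hard semantics off to an ontology that already exists.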