To start off, what are the FAIR data principles? The FAIR data principles were drafted by FORCE11. FORCE11 is an international community of scholars, librarians, archivists, publishers and research funders that came together organically in 2011, hence the 11 in the name, and has been around ever since. What this community works towards is facilitating change towards improved knowledge creation and sharing. As they were working on this in 2015, they decided it would be good to have some principles around research data, the sharing of that research data, and how you can go about doing that. So late in 2015 they drafted the four FAIR data principles, and in early 2016 they published an article about them in Scientific Data, one of the Nature journals. From that moment onwards the ball started rolling, and the principles started to receive attention and international recognition: people realised this is actually quite useful. There are a number of things to keep in mind if you look at the FAIR data principles, and they are probably the reason why the principles are attracting so much attention. One thing to notice is that they don't just look at making research data human readable, they also look at making it machine readable. I think that offers a lot of opportunities by pointing towards a future in which research data is machine readable, can be harvested by machines, can be pulled together, can be used for big data approaches, and can be used for novel approaches to exploring data, finding patterns and creating new knowledge out of that data. I think it's an interesting step into the future. Another thing that's quite valuable about the FAIR data principles is that they are technology agnostic. 
If you read them, you'll find there's no recommendation to go with one specific technology. They are formulated in a way that allows different types of technology to be used to solve the challenges. Another thing they've done quite well is to create a set of principles that is discipline independent, so the principles can be adopted across different disciplines in different ways, meeting the needs of each specific discipline. Also, if you look at the principles, they talk not only about the metadata and not only about the data, but about the two combined, where working on the metadata can enhance the visibility of the data, for example, or its reusability. As you've probably noticed by now, FAIR is an acronym, and it stands for findable, accessible, interoperable, and reusable. The R, reusable, is the one that sometimes causes a bit of confusion: people think it means reproducible, but it's actually reusable, which is a broader concept. Just keep that in mind. We'll talk about each of those principles in more detail in the coming weeks. Before we move into the first one, findable, I have a few general pointers which are worth keeping in mind as we look at the FAIR data principles. One of the questions I sometimes get is: do you want all data to be FAIR? I don't think that is the case, I don't think it's necessary, and I don't think it fits with research practice. If you look at the process in which researchers create research data, there are various steps, and in some cases huge volumes of data are created in experiments, coming off instruments, et cetera. These huge volumes of data can't be kept or stored in their original form; they often need to be manipulated, analyzed, processed, et cetera. So these huge volumes of working data are probably not suitable to be made findable, accessible, interoperable, and reusable. 
It's rather that, as data moves through those steps, the final analyzed data is probably more suitable for that purpose. Researchers also sometimes use scratch data to explore different experiments and different settings and see how things work; not all of that data is worth keeping or worth using right to the end. There are also cases of research with commercial interests, maybe even commercially funded. In those cases there can be arguments why the commercial parties in particular are not interested in having any of that research visible or public to the outside world; they may even want to keep the fact that the research has taken place to themselves. This also happens with national security and defense research. So in those cases it probably does not make a lot of sense to make any of that research data findable, accessible, interoperable, and reusable. One question we sometimes get is: what about data about human subjects, where there are privacy or ethics considerations around the data? Should that data not also be kept hidden or private? There is a distinction here between open data and FAIR data. In the case of open data, you're talking about making everything open. In the case of FAIR, you're talking about making it accessible through the appropriate routes, and that doesn't have to be open. So for identifiable data that refers to human subjects, there may well be a very good argument why that data cannot be made openly available, but it can still be made accessible through appropriate routes. In that case it would still be FAIR, because it would still be accessible; it just would not be open. We'll talk more about that next week when we get to the accessible principle. Now, something the FAIR data principles are not about, and this sometimes crops up: in copyright law there's talk of fair use and fair dealing. 
That "fair" is not capitalized, it's in lower case, and it's something completely different, not related to the FAIR data principles. Another thing I ran into recently is that a number of market research companies have developed their own "fair data" mark, which is about how those companies treat the data they collect while doing their market research. That is also lower case and completely unrelated to the FAIR data principles in capitals. One other thing worth keeping in mind is that FAIR is not an actual standard. Some people say, well, I want to make my data FAIR and I want to make sure it ticks all the boxes exactly. You'll notice as we start talking about the FAIR principles and digging into them in more detail that it's actually not that black and white. It is a set of principles, a set of ideas about how you can approach it, and how you actually approach it in practice will probably depend on the discipline. So there's not one standard there that will work across all disciplines. Another thing to keep in mind about the FAIR data principles is that if you want to make more data more FAIR, it's not just about the research data itself; it will actually require some work around it. It will require a layer of underlying infrastructure, which can be human infrastructure or technical infrastructure, so that a researcher does not have to do it all on their own, but there are actually things in place that make it easier for the researcher to make their data FAIR. Things you can think about there are policies around making data FAIR, and procedures and guidelines that might be in place. It would also be great if there are tools, platforms or software in place that actually make it easier for the researcher to make their data FAIR at the end of their workflow. 
And finally, it's going to be important to have the skills and the skill sets available to researchers, research managers, data managers, librarians, e-research analysts and e-research staff, all the different staff members involved in that process, to make it easier to make the data FAIR down the track. One of the questions I get is: why are these FAIR data principles coming up now, and why are they being adopted so widely? Well, for one thing, they've got an attractive acronym. The other thing is that they cover quite nicely work that is already being done. If you look at them in more detail, you'll find that some of the things covered there are actually things that organizations around the country have been caring about for a while, and caring about more and more. So it's less about a completely novel approach and more about bringing existing work together under a nice acronym in a well-packaged form. There are other reasons why it's proven to be useful. First of all, it's receiving a lot of international recognition; it's not just a national initiative. If you look at the principles, there is actually quite a lot of useful detail hidden below them. The fact that they are discipline independent makes adoption easy. And it is not as hard a sell as making all data open. The only challenge, and this comes back to the point about the FAIR data principles not being a standard, is that FAIR is hard to measure. It's hard to hold data up against a list and say this data is FAIR and this data is not FAIR at all; there is more of a scale from less FAIR to more FAIR. So if you look at where FAIR has been picked up, in various ways, there are plenty of examples out there. I've just picked a few here: some international, some national, some disciplinary. 
So for example, in the European Union, the high-level expert group working on the European Open Science Cloud picked up the FAIR data principles and embedded them in their work and their thinking about what the European Open Science Cloud should look like. The Horizon 2020 funding program of the European Commission has also drafted guidelines for data management, and those guidelines use the FAIR data principles as well. In the US, the NIH has just set up a Data Commons pilot, exploring what a cloud for sharing research data would look like, and there they are also looking at the FAIR data principles. In the Netherlands, an initiative has been set up called GO FAIR, which is now reaching out to gain more international momentum and more international partners. That's also a very interesting development, in that they've looked at the FAIR principles and also at how you need different elements to support them, including cultural change, training, and building infrastructure to make sure that data can be made FAIR easily. In the UK, there's currently a project going on around FAIR in practice, taking the FAIR principles and exploring what they mean in different disciplines. The American Geophysical Union has just started a project, I think the press release only went out yesterday, looking at what it will mean to make data open and FAIR in the earth and space sciences. And closer to home here in Australia, one thing you might have already heard of is the FAIR Access to Research Outputs policy statement, which was drafted and is now available for endorsement by institutions around the country. The focus there is very much on research outputs in the ARC and NHMRC definition, that is, publications, conference proceedings and all sorts of published materials, and how those materials can also be made FAIR. 
So that was a long-winded, general introduction to the FAIR data principles. The one principle I want to talk about today is the first of those: findable. If you look at the actual principles and the way they are described, findable is broken down into four elements. First, for research data to be findable, the principles say that the data and the metadata should be assigned a globally unique and eternally persistent identifier. In practice that means it needs a DOI, a handle, a PURL, or some other identifier which is globally unique and eternally persistent, with an organization behind it that cares about making sure that identifier will keep resolving to the data set even when the data set moves. This is where that point about being technology agnostic comes up: they don't recommend one over the other, and any of those solutions works as long as the identifier is in place and actually resolves. The second element says that data should be described with rich metadata. That's great; however, they don't specify what rich metadata means. This is one of those places where it's not black and white whether your data is FAIR or not. What we'd say is: make sure there's enough metadata alongside the data that it can be found, and that it answers the first questions somebody searching for your data would have. The third element talks about the metadata and the data being registered and indexed in a searchable resource. There are several ways to tackle this. One is to have a local search interface or database, some way of making sure that your data collection can be found through a search interface. But what we'd also definitely recommend is making sure that the descriptions of the data collections are passed on to aggregators, such as national aggregators. 
For example, Research Data Australia, but there are also other aggregators out there, including more disciplinary aggregators like TERN. Descriptions might go out to an international disciplinary aggregator like OLAC, the Open Language Archives Community. And data can also be published in international disciplinary repositories, like PANGAEA for the earth and environmental sciences, or, in the case of astronomy, the International Virtual Observatory Alliance and the systems they have in place. So there are various possible routes to publish your data; just make sure it goes into a place where it can be searched, can be found, and will also be indexed by search engines like Google Scholar. Finally, the last point, and this really comes back to the first one: if you're going to have a globally unique and eternally persistent identifier for the data collection, like a DOI, a handle or a PURL, make sure that it's actually captured in the metadata. Okay, so that was a quick overview of findable and the way that they have described findable.
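The findable elements above can be sketched as a small check over a metadata record. This is a minimal illustration, not part of the principles themselves: the identifier patterns are simplified, and the example record and field names are hypothetical, not taken from any particular repository schema.

```python
import re

# Simplified patterns for the three identifier schemes mentioned above
# (DOI, handle, PURL). Illustrative only, not full syntax validators.
IDENTIFIER_PATTERNS = {
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    "handle": re.compile(r"^\d+(\.\d+)*/\S+$"),
    "purl": re.compile(r"^https?://purl\.org/\S+$"),
}

def identifier_scheme(identifier):
    """Return the first scheme whose pattern matches, or None."""
    for scheme, pattern in IDENTIFIER_PATTERNS.items():
        if pattern.match(identifier):
            return scheme
    return None

def check_findable(record):
    """List the findability problems a record has.

    Checks two of the elements discussed above: a persistent identifier
    must be assigned, and that identifier must itself appear in the
    metadata record.
    """
    problems = []
    pid = record.get("identifier")
    if not pid:
        problems.append("no persistent identifier assigned")
    elif identifier_scheme(pid) is None:
        problems.append("identifier is not a recognised DOI/handle/PURL")
    if pid and pid not in str(record.get("metadata", {})):
        problems.append("identifier is missing from the metadata record")
    return problems

# A hypothetical data collection record with a made-up DOI.
record = {
    "identifier": "10.4225/03/abc123",
    "metadata": {
        "title": "Example rainfall observations, 2015-2016",
        "creators": ["A. Researcher"],
        "identifier": "10.4225/03/abc123",  # the identifier travels with the metadata
    },
}

print(check_findable(record))  # an empty list means no findability problems found
```

Richness of metadata and indexing in a searchable resource are deliberately left out of the sketch: as noted above, the principles don't pin those down precisely enough to reduce to a yes/no check.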