Thank you all for coming to this session. This is, I believe, the second in a series of presentations for the ARDC Co-Investment Partnerships, providing information on our requirements but also on the services and support we can offer in the Co-Investment Partnerships. This session today is very much about our FAIR requirements in the projects. What I aim to do is provide a bit more clarity around what FAIR is, very briefly, and in more detail what our actual requirements in the projects are, but also what support we can offer in that regard. At the end, I'll dig into data licensing in a little more detail, because that is a specific topic that requires more in-depth exploring. So welcome everybody. My name is Keith Russell. I am Manager Engagements at the ARDC and I've been working on FAIR and what FAIR means for the ARDC for several years now. First, I'd like to acknowledge and celebrate the First Australians on whose traditional lands we meet, and I'd like to pay my respects to the Elders past and present. The first thing I'll briefly speak about is the FAIR principles: what they are and why they are important. I'll keep that very short, because I'm guessing that most of you will have already heard about the FAIR principles and what they mean. The FAIR principles go back to 2014. There was a workshop in Leiden at the Lorentz Center in which a range of different stakeholders got together, including researchers, librarians, research infrastructure providers, etc. These people came together and agreed on a set of principles using the four letters, for findable, accessible, interoperable and reusable, to form the word FAIR. And it really resonated and it really took off.
It was further elaborated on in an article which appeared in the Nature journal Scientific Data in 2016, and since then it has really taken off globally and been recognised and acknowledged by a range of different organisations that play an important role in this space. Think for example of the European Commission in Europe and the NIH in the US, very large funding bodies that have recognised the FAIR principles, but also publishers like Nature and PLOS. Here in Australia, the NCRIS programme, of which ARDC is a project, has also recognised the FAIR principles and their value. And if you look at the NHMRC, for example, alongside the recent Code of Responsible Conduct is the Managing Data Guide, and that guide also references the FAIR principles as a very useful way of thinking about how to enable maximum reuse of data. So the principles have received a lot of uptake and a lot of recognition, and I think there are a few reasons for that. For one, I think they describe very thoroughly how data can be set up to enable maximum reuse not only by humans but also by machines. This is working towards a future in which data can be taken up and used in AI, machine learning and other techniques, and in which data can be combined to enable maximum discovery and reuse down the track. I think another reason why FAIR is very useful is that it can apply to all sorts of data. There's a range of research data out there which, for very valid reasons, cannot be made openly available. Think for example of sensitive data. The FAIR principles are still very useful there, and allow for the fact that data can be made accessible, if not open, in the case of for example sensitive data. I think another reason why the FAIR principles have been very successful is that they are technology-agnostic. They've consciously been formulated in such a way that they're not dependent on any specific technology and can be used in different technology stacks, different platforms, etc.
They also address not only how the data can be set up to enable maximum reuse, but also the metadata alongside it and how those two need to be related to each other. I think that combined thinking is really helpful and enables maximum reuse. And finally, these principles have been set up in such a way that they are discipline independent. So irrespective of which discipline you're in, you can apply the FAIR principles. Now, that very broad, principle-based approach is great, but it has a drawback: it's not completely clear how you'd actually implement the principles in practice, and it leaves some room for different implementations. That was a bit of a puzzle for ARDC as well. As we started working on FAIR we said, well okay, how do we translate the FAIR principles into practice? So what we've done is set requirements for a range of our data-related projects, and I'm speaking here about the programs Australian Data Partnerships, Public Sector Bridges, the Cross-NCRIS program and also the Platforms 2019 and Platforms 2020 programs. In the projects in these programs some of the outputs will be data, and today I will be speaking about what our requirements are around the data that is being produced in these projects. So just for the scope of today, what I'll be speaking about is the requirements around making data FAIR. If you're interested in making the software that is being produced in the project FAIR, or in making your platform FAIR at a different level, I won't be speaking about that today, because there are a few other considerations when you start to think about making software FAIR or making platforms FAIR. So the focus today is very much on making data FAIR.
Now also a slight disclaimer: ARDC is also working on a few other co-investment partnerships, and the outputs of those partnerships are often different; frequently they are not producing a data output. So for those projects these FAIR data requirements may well not apply. This is very much for those in Australian Data Partnerships, Public Sector Bridges, Cross-NCRIS and Platforms 2019 and Platforms 2020. Okay, so the requirements that we've set based upon the FAIR principles are listed on the ARDC website. There's a link here and I think Kerry's just put it into chat. So please go to that PDF document and there you can find them listed neatly. There you'll find not only the requirements but also an appendix with some metadata fields that we either require or recommend. As you go through those requirements you'll notice that some of these elements are actually required and some we've made recommended. The reason for that is that in some cases it will depend on the existence of community-agreed standards, vocabularies, approaches, etc. Where those exist, we would say please use those community-agreed standards and approaches. If you're in a discipline where they don't yet exist, then we understand that, and we can't require you to adopt something that doesn't yet exist. So that's why you'll see a little bit of variation in those requirements, where some elements are really required and others are recommended. You'll notice as we go through this that it can be quite an ambitious task, and there are quite a lot of elements here in making your data FAIR. We perfectly understand that not everything may be achievable in the timeframe of your project, and that's fine. What we will be asking in your progress reports is that you report back on how you are making your data FAIR.
And if there are elements there that you are not able to achieve, then we ask you just to explain why you are not able to achieve them and provide an argument for that. We also ask you, in the course of the project and before you get to that point, to have a chat with the project liaison in the regular contacts you are having with ARDC, to tease out what is achievable and what is not achievable in your project. Okay, so now I'd like to get into the nitty-gritty. The first of the four letters is F for findable. What we have done is take the principles that sit under the four letters and try to translate those into things which are relevant to the Australian context and where ARDC can provide support and infrastructure to enable this. The first of the principles asks that you assign a globally unique and eternally persistent identifier to the data and the associated metadata. The thinking around this is that there will always be a link to the data set so that people can actually find the data. And when we say find the data, we usually mean find a landing page which provides information about the data, and from there you can get to the data. The persistent identifier we prefer for this is a digital object identifier, a DOI. There are also other technologies out there; if you prefer to use a different technology because that might be the standard approach in your discipline, that is fine. But if you're not sure, we're happy to support you in minting DOIs for those data sets. If you're interested in what that means and how to achieve it, please come to the session in two days' time, in which we'll talk more about the DOI minting service and how that's possible. The next principle is that data is described with rich metadata, so that people can find the data and understand what the data is about.
In the requirements we have an appendix, and that lists a number of required and recommended metadata fields that you should have alongside your data so people can find the data. The FAIR principles also state that data should be findable through relevant discovery mechanisms and relevant discovery platforms. Now here in Australia we have a national-level discovery platform called Research Data Australia. It's run by the Australian Research Data Commons. We think it's a good idea that your data should be findable through Research Data Australia, so we'll be happy to help you make sure that you have a connection to Research Data Australia and that your data is findable through it. There will be a session on that on the 19th of April, which will describe in more detail how you can achieve that. Also, because Research Data Australia is across all disciplines, if there are discipline-specific discovery aggregators out there that are relevant to your discipline, then please also make sure that the data is findable through these. Those can, for example, be international discovery aggregators. If you're interested in seeing which possible aggregators are out there, there's a portal called re3data. It's an international portal and it lists a whole range of different repositories and aggregators that exist around the globe. It might be worth having a look in there to see if there are any aggregators through which you would like to make sure your data is discoverable. Finally, a small practical point which comes from the FAIR principles: you should mint a persistent identifier for the data set, and that persistent identifier should also be included and listed in the metadata for that data. So those were the requirements on findable. Now moving on to accessible.
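As a rough illustration of the Findable points above (rich metadata that also lists the persistent identifier), a minimal Python sketch could look like the following. The field names, the list of required fields and the DOI are made up for illustration; the actual required and recommended fields are the ones in the appendix to the ARDC requirements, and the DOI pattern is only a loose shape check, not a full validation.

```python
import re

# Hypothetical field list for illustration only; the real required and
# recommended fields are in the appendix to the ARDC FAIR requirements.
REQUIRED_FIELDS = ("identifier", "title", "creators", "description", "publication_year")

# Loose shape check for a DOI: "10.", a registrant code of 4 to 9 digits,
# a slash, then a non-empty suffix. It does not check that the DOI resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_findable(record: dict) -> list[str]:
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append("identifier does not look like a DOI")
    return problems


# A made-up record; the DOI below is a placeholder, not a registered identifier.
example_record = {
    "identifier": "10.1234/example-dataset",
    "title": "Example survey data set",
    "creators": ["Smith, John"],
    "description": "Illustrative record used to show the metadata check.",
    "publication_year": 2021,
}

if __name__ == "__main__":
    print(check_findable(example_record))
```

A check like this could run before a record is sent to a discovery platform, so that a missing identifier is caught early rather than after publication.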
First of all, I'd like to emphasize again that FAIR speaks about making data accessible, and that data does not always have to be openly available. What it does say is please make your data as openly available as possible. So where possible please make it openly available, but in certain cases there are very valid reasons why data cannot be made openly available, for example if it's sensitive data, if it contains personal information or national security information, etc. In those cases, the data can be made available and accessible through closed means. I'll speak a little bit more about that in a second. We recommend that you make your data available through a repository, and having it available through a repository will actually provide some of the features that we'll be talking about below. One of those is that data should be available as a download or through an open, documented API. In the case of small data sets, it can be useful to make the data downloadable. People can access it either openly or through authentication and authorization and then download it. In other cases the data sets can be very large, or in some disciplines there is more of a standard around accessing data through APIs and pulling in parts of the data through APIs. In that case, please make the data available through an open, documented API, so that humans can easily pull in that data but machines can also easily pull in the parts of the data they want to use. If for very valid reasons the data cannot be made openly available, please do include a clear description of how a potential re-user can request access to that data. So for example, if it requires ethics committee approval, please state how that can be achieved, etc. Under Findable, we talked about having a persistent identifier, and that points to a landing page.
A landing page is a page that sits next to the data, on top of the data really, and it provides information about the data set. This is where you can find a series of metadata about the data set, and from there is a click-through to the actual data. If the data is open, you can click through from that landing page directly to the data. If the data for very valid reasons is closed, then that landing page contains information about how to request access. If data is not public, then clicking through from that landing page to the actual data will require authorization and authentication, so that only the people that need to get access to it can get access to that data. And the final point here, and I find it's a nice one, is an extra thought that the group that formulated the FAIR principles came up with: in some cases data may have to be deleted for whatever reason. In that case, it still is valuable to have that landing page in existence. If somebody does want to find out what happened to that data, they can go to that landing page, and there will be information there about the fact that the data is no longer available and perhaps the reason why it is no longer available. If people need to trace back what happened to the data, for example to enable reproducibility of research, then that is very useful background information to still be made available. So moving on to the I for interoperable. This is probably one of the more complex elements in the FAIR principles. It is very much focused on making sure that data can be taken up and combined with other data that is available in the discipline or area. For that, we recommend that the data use community-agreed standard data formats. Now this is very much dependent on what already exists in your discipline or community.
We understand that. If those community-agreed standards already exist, then please use those existing standards. If they don't yet exist, have a look with the community at establishing them, but we understand that that will take some time. In the same way, for the metadata, please use the community-agreed standards and approaches that exist in your discipline and area. We also encourage that the data and the metadata use values that come out of community-agreed vocabularies. You might already have specific vocabularies that you are using, or you might want to establish a few vocabularies that are relevant to your data collections. If you are interested in doing that, we have a service called Research Vocabularies Australia, which enables you to create a vocabulary and to publish it. Once it's published, you can use it in your own tools, but others can also use it and reference it. So that can be a very helpful tool in making sure that you use vocabularies in your data collections. For more information on Research Vocabularies Australia, please attend the session on the 19th of April, which will talk about how you can best use it. Finally, there is the metadata that goes alongside the data and provides information about the data set. It would be great if that contains references to relevant research objects and entities. Think, for example, of publications. If the data set is referenced by specific publications or used in specific publications, please include a reference to those publications. Or if the data set is being created by a series of researchers and contributors, please include the ORCIDs for those researchers and contributors alongside the data. So you include that information alongside the data, preferably using globally unique persistent identifiers, because that makes it easier to trace back who that actual person is.
So in the case that John Smith was one of the contributors, by using an ORCID you can be sure that you reference the correct John Smith. For more information about those persistent identifiers and which persistent identifiers would be most relevant for which use, I would encourage you to attend the session on the 14th of April, where we will be speaking about the different persistent identifier systems that are out there and which one you should use for what. The last letter in the FAIR principles is R for reusable. I would say it contains the other things that are not yet listed under F, A and I, and together all of those elements make sure that the data is actually reusable down the track. The first thing that was listed here by the authors of the FAIR principles is that a data output should be assigned a machine-readable standard license. I won't speak about that right now; in a moment I'll speak in more detail about data licensing and what that entails. The authors also asked that the license should be in a machine-readable form on the landing page, so that a machine can actually find and get access to the data and understand what it can do with that data. The next element in the FAIR principles is that the landing page that sits on top of the data contains information about citation of the data set. Imagine a researcher chooses to reuse that data set. Then there should be a clear citation statement that they can just copy, so they know that they are properly attributing those that put the effort into creating that data set. The next one I always enjoy, because it's easy to write down but it's actually a little harder to achieve in practice. And that is to attach provenance information alongside the data. Now this is really helpful for those that want to reuse the data, because it provides a lot more context on how the data was created.
However, there is no one standard for how to attach provenance information, and that will depend on approaches and standards that might exist in your discipline. Think here, for example, of information about the instruments with which the data was captured, and the settings and calibration of those instruments. As the data is then processed through a workflow, think of information about that workflow, about the steps that were taken in that process, and perhaps the algorithms that were used to manipulate the data and to come to the final data output that is created and then published. So again, this will depend on what is available in your discipline. But if there are any good practices, please consider those and please attach that information alongside the data. Final point here. Under Findable we talked about a few metadata fields to make sure that data is discoverable and a researcher or a machine can find the data. When the researcher actually wants to use the data, they will probably need more contextual information around the data to better understand under which circumstances the data was created, which methodologies were used, etc. Now there are discipline-specific metadata standards for that. If you have such discipline-specific metadata, please attach that alongside the data, because that will make it a lot easier for a researcher that wants to reuse the data to understand how the data was created. So that was a very quick run-through. Oh, sorry, I just saw a comment from Steve McGeck and that's a good one. I forgot to mention that earlier. Yes, the slides from this presentation will be made available to all of you after this session. So, if you're interested in addressing these requirements, the first, easy step is to have a chat with the project liaison. We've assigned, or we will assign, project liaisons to each of the projects.
They know these requirements and they also know who within ARDC can best help you with specific elements of this. So if you're wondering about a specific part of it, please have a chat with the project liaison. They'll be able to get you in contact with the right people and assist you, perhaps in implementing services, to make it happen. You will be asked to report against these FAIR requirements in your progress reports. That is also the place where you can write about the fact that you might not be able to address some of these requirements, and that's fine. But this is just a heads-up that it will come up in your progress reports. So that was an overview of our FAIR requirements as a whole. One of those FAIR requirements that I skipped over under reusable was the licensing of data, and I want to spend a little bit more time speaking about data licensing here. This is a really important and sometimes overlooked element in making data reusable, and I'd like to dig into what it means. Now, first of all, I'd like to put in a disclaimer. I am not a lawyer and not a legal expert, so the advice I'm giving here is to the best of my knowledge. If you want to make sure that you have proper legal advice around this, I do recommend that you get in touch with any local legal expert you might have in your organization. First of all, we consider it extremely important that data is properly licensed. If you make your data accessible but you do not attach a license to it, it makes it impossible for somebody to reuse the data. If you find a data set without any license attached to it, it actually means you can't reuse it, because it does not provide any clarity. So that's why we find it really important to make sure that you attach a license of some sort alongside it.
To provide you with more background and information on what data licensing looks like, we've provided a research data rights management guide. At the back of the guide, you'll find a few helpful workflows, for example for data providers: what decisions you need to make and what steps you need to take to be able to decide on a license and assign a license to a data set. The first step here is that if you want to assign a license to a data set, it needs to be assigned by the person or the organization that holds the copyright over that data. So we get a lot of questions like: Is there copyright in this data? Does copyright exist over this data? And it is quite a complex question. You'll find in this diagram on the right-hand side a bit of an overview. On the very left is the case where the data has just been generated off a machine and arranged; telephone directory data was a famous example from a court case. In that case, it has been ruled that there is no copyright in the data. However, once data starts to get manipulated and human authorship and originality is involved in creating the data, so for example effort, skill and judgment are involved in coming to that data set, then copyright does start to exist. In the case of really well-developed data where a lot of expertise and effort has been put into it, there definitely should be copyright in that data. So if you're unsure whether the data you're talking about holds copyright or not, please consult the legal experts within your organization to be sure which side you're on. If the data does have human authorship and originality involved, then it would hold copyright, and then we prefer that you attach a Creative Commons Attribution license to the data. Now, we highly recommend Creative Commons as a license because it is a standard approach.
It's a standard license that's available out there. It's already in a form that's internationally recognized. It is set up in such a way that it works across a number of international jurisdictions, and it's also in a machine-readable form. So rather than reinventing the wheel and coming up with your own license, we do recommend using, if at all possible, one of the Creative Commons suite of licenses, and we recommend the most permissive license, which is CC BY, the Creative Commons Attribution license. If you're sure there's no copyright in the data, you can still release it, and rather than not assigning a license to it, we would ask you to put a Creative Commons Public Domain Mark on the data. What you're doing there is not asserting copyright over the data; you're asserting that there is no copyright in the data and that it is available in the public domain for anybody to use. By having that mark on top of it, people know that that is the case and that the data is free to use. So if you have any considerations around data licensing, please use a machine-readable license as much as possible, one of those standard licenses that machines can interpret. That's why we recommend Creative Commons licenses, because they have been set up in such a way that they are ready for that. On the landing page, have a link to that license. And if you can, include that link in a machine-readable form; it's very simple. All you need to do is have a little tag hidden under the surface that says to a machine that comes to this landing page: the license that is associated with this data set can be found here. The Creative Commons licenses are great for data that is openly available and can be openly used and reused. However, if the data you're creating is sensitive data, obviously a Creative Commons license won't be applicable and won't work.
In that case you'll probably need to use a restrictive license. That will depend on the nature of the data and under which restrictions you want to release it. There are a number of templates available for that, and a number of players have already worked on examples. If you're interested in having such a restrictive license for your data, please get in touch and we can see if we can broker connections with other organizations that have templates available for sensitive data. One final point: this is all about data licensing and attaching a license to your data set. If you're interested in licensing something else, like your software, you'll need to use something else. The Creative Commons suite works really well for data and for publications, for example. It does not work well for software, and for that you need different licenses. We have a session on the 15th of April where we will speak about software licenses, which ones are most applicable and work best, and what the different considerations are there. So that was a quick overview of the FAIR principles and the FAIR requirements we have for our projects, and a little bit of an overview of data licensing and the considerations there.
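The machine-readable license link on the landing page, mentioned in the licensing discussion, can be sketched roughly as follows. This is an illustration under assumptions, not a description of how ARDC or any repository implements it: it assumes the landing page marks its license with the standard HTML `rel="license"` link type, the landing page markup is made up, and the class and function names are hypothetical. It uses only Python's standard-library `html.parser`.

```python
from html.parser import HTMLParser

# Made-up landing page for illustration. The <link rel="license"> element is
# the small tag a machine can read to find the license for the data set.
LANDING_PAGE = """<html><head>
<title>Example data set</title>
<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">
</head><body><h1>Example data set</h1></body></html>"""


class LicenseLinkFinder(HTMLParser):
    """Collects the href of every <link> or <a> element marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # rel may hold several space-separated values, so split before testing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag in ("link", "a") and "license" in rel_values and attr_map.get("href"):
            self.license_urls.append(attr_map["href"])


def find_license_urls(html: str) -> list[str]:
    """Return all license URLs a machine can discover on a landing page."""
    finder = LicenseLinkFinder()
    finder.feed(html)
    return finder.license_urls


if __name__ == "__main__":
    print(find_license_urls(LANDING_PAGE))
```

A harvester that finds a Creative Commons URL this way can decide automatically what reuse is permitted, while a page with no such link leaves it to guess.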