Welcome everybody to this, the fourth in the series of webinars about the FAIR data principles. This is the webinar on R for reusable. I'll just briefly introduce myself. My name is Keith Russell. I work for the Australian National Data Service. I'm the host for today, and thank you to my colleague, Susanna Sabine, who's co-hosting this webinar and organizing things in the background. Just as a general introduction, the Australian National Data Service works with research organizations around Australia to establish trusted partnerships and reliable services, to add value to research data, and to enhance capability in the research sector. We're working together with two other NCRIS-funded projects, Research Data Services (RDS) and Nectar, to create an aligned set of joint investments to deliver transformation in the research sector. So this webinar is part of a larger series of ANDS activities which aim to support the Australian research community in increasing our ability to manage research data as a national asset. So this is the fourth and final of the four webinars in the series about the FAIR data principles. We've had webinars on findable, accessible and interoperable, and now we're up to the fourth one, reusable. Please note, and this is one that comes up every now and then, the R stands for reusable; it does not stand for replicable or reproducible. So reusable is actually broader than those other terms and means that the data can be used for more purposes than purely to replicate or reproduce the original research. Today I'll give a very brief introduction on what FORCE11 says about reusable under the FAIR data principles, and then I'd like to hand over to Nerida from Creative Commons, who will talk about licensing frameworks and choosing a license to make your data more reusable.
And after that, Margie Smith from Geoscience Australia will talk about provenance information, and not only why it's important but also how GA has actually approached attaching provenance information to research data. Now first of all, I'll give a brief introduction to what FORCE11 agreed on as part of the FAIR data principles under the heading of reusable. So first of all, I'd like to emphasize that to actually make your data reusable you will also need to incorporate the elements under findable, accessible and interoperable. So if your data is not going to be findable or not going to be accessible, then it will ultimately not be reusable anyway. So it's best to see this as sitting on top of making your data findable, accessible and interoperable: these are extra elements that you need to think about to make it reusable. The way they've talked about it, well, first of all there's this high-level heading saying that the data and the metadata should have a plurality of accurate and relevant attributes. Well, that's pretty general, and they then drill down into three specific attributes that are required. Now the first of those attributes is that the data and the metadata are released with a clear and accessible data usage license. If you make your data available without any license at all, it makes it very hard for a potential re-user to actually use it, because it's completely unclear what the agreement is, whether there's any copyright over the data, whether there are any restrictions, things like that.
So that's why it's very important to have a license, so it's clear what you can do with the data. And if you do assign a license, please make sure you use a standard license, which is definitely preferred, and ideally in a machine-readable format, because that way machines can actually interpret whether the data can be pulled in and incorporated in an analysis, or whether they need to skip it because it's not licensed for that purpose. Nerida will talk in much more detail about a possible framework to use to assign a license to your data. The second point they make under these attributes is that the data and the metadata should be associated with information about their provenance. Now this provides clarity on the steps that were taken in collecting, selecting and analyzing the data. So all the steps that have been taken to turn it from raw data into derived data and into that final data set that is made available under the FAIR data principles. So for a potential re-user this is extremely useful information, because it gives you much more information about the context and the background in which the data was created, and whether the data will also be suitable for the purposes that the re-user wants to use it for. Attaching provenance information is easier said than done, and I'm really grateful that Margie is going to be able to talk a little bit more about what's happened in practice and how GA has tackled this and is incorporating provenance information. Now the third and final point they make about these relevant attributes is that the data and the metadata should meet domain-relevant community standards. Under findable they talked more in general about having metadata that allows the data to be findable, and under interoperable they talked a little bit about the data and using standards.
The point they're making here is that it's very useful to make sure that the data and the metadata are in a data format and a file format that is commonly used in the discipline. So that means another researcher in that same discipline can easily pick it up and use it. And if you use a metadata format, think about using one that is common in the discipline too, so that it contains specific fields that are relevant to that discipline, and so that a researcher in that discipline can easily understand more of the detail: what columns are in that data set, what the context is around the date on which the data was collected, etc. So that makes it much more useful for a potential re-user from that community to pick up the data and reuse it. Now I'd like to first of all hand over to Nerida, Nerida Quatermass from Creative Commons Australia, based at QUT. I've asked Nerida to present on licensing frameworks and choosing a license to make your data more reusable, to give you a bit of a sense of a possible framework you could use that is a standard framework and that can be made machine readable, so that re-users have a much clearer picture of how the data can be used. So I'd like to hand over to Nerida.
So thanks very much Keith. I think I've just got time for a very quick overview of how the open licensing framework provided by Creative Commons achieves FAIR with regard to reuse rights. The slides will be made available to you, and you'll notice that each slide has a link to relevant information, and there's a slide at the end of the presentation which lists good resources as well. So copyright law grants a monopoly over a work in material form to its owner. Creative Commons licenses have filled a need for a public license. That is one that anybody can rely on as a permission to reuse a work.
Before CC licenses, the only way to get reuse rights was through the exceptions allowed in copyright law or through licenses directly negotiated between a copyright owner and a licensee. So a public license like a Creative Commons license is central to opening up access to research outputs, including the sharing of data associated with these. I've put an open access spectrum representation on the slide because it's really important to distinguish between free access and reusability, which starts with permission to share but extends to the right to make derivative works. These permissions to reuse are communicated with a clear machine-readable license. So you probably all know a little bit about Creative Commons licenses, but as a quick overview, there are four license elements that can be combined, and that results in six licenses, and they're featured on this slide, again on a spectrum from allowing more to allowing less reuse of a work. The most open or permissive license is known as the Attribution or CC BY license, and the most restrictive is the Attribution-NonCommercial-NoDerivatives license. You'll see in my slide the free cultural works seal. It's just put there to show you that there are two licenses that qualify for that, but the relevance of that seal is that it was developed for Wikimedia and Wikipedia content, and it signals an important delineation between less and more restrictive licenses applied to works in the digital commons, so that just fills out the story with that. In addition to the licenses, Creative Commons offers two public domain tools. Now CC0 is the public domain tool for creators to use, but there's also a public domain mark, which is represented by a copyright symbol with a strike-through, and that's something that is used to mark works that are already in the public domain, so that's being used commonly by cultural heritage institutions in their digital collections, for example.
But I'm just going to focus on CC0 because it can be particularly important for maximizing the reuse of data and databases, because it otherwise might be unclear whether highly factual data and databases are restricted by copyright or other rights. So CC0 is intended to cover all copyright and database rights, so that however data and databases might be restricted, under copyright or otherwise, those rights are all surrendered. So CC0 is foremost a waiver. It means you waive all of your rights so that you have zero rights left in a work, effectively dedicating it to the public domain. It has a legal code beneath it because you need a legal mechanism to relinquish your rights. So when you release content under a CC0 waiver you're explicitly stating that you do not expect attribution. Now there's a little uncertainty around CC0 because Australian moral rights are fairly new, but the licenses have been designed as carefully as possible to respect the author's wishes. So the intent and the general understanding is that you do not need to provide attribution. So probably the main point that I would like to make, and Keith has already referred to this, is do license your data. International rules are too variable to rely on the public domain. CC0 ensures maximum compatibility with other licensed works and it prevents attribution stacking. For example, attributing many contributors in a project, or where you attribute not only the immediate source of a derivative work but also, plus, plus, plus, the upstream works. And there are other ways to acknowledge contribution. The next best thing is probably CC BY, the attribution license, if you really want attribution to be a legal requirement. The licenses communicate reuse rights through the three-layer design built into the license. Now the first layer is the legal code; that's the legal instrument which states the terms and conditions of the license. The second layer is the human-readable format.
It's the plain language summary that we usually see if we click on the link to a CC license. It's got the relevant icons that clearly indicate the conditions of your licensing and the reuse rights under the license. You might recall the words "you are free to" and "under the following terms". In addition to supporting reuse by individuals, the FAIR principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, and that brings us to the third really important layer of the license, which is the machine-readable translation of the license, which attaches itself to digital works or digital copies of works. The translation code, which is called Rights Expression Language, becomes embedded in the digital source, and that helps search engines and other applications identify a work. I might say this can also be achieved by uploading a work to a content sharing platform that supports CC licensing and takes care of the machine readability for you. It's also important to actually mark a work with a license, and I'll talk about that shortly. Regarding the robustness of the legal instrument, the Creative Commons licenses have been upheld in every jurisdiction in which litigation concerning them has occurred, but to date there have been no recorded cases of litigation concerning a CC license in Australia, which would tend to support the quality of their construction. I just will make the point that CC licenses are irrevocable, and so they last for the term of copyright. The licenses are also non-exclusive, so it's open to the rights holder to apply another license to the material should the need arise. That's called dual licensing. So for example, if you release material under a CC BY-NonCommercial license but a commercial partner wishes to exploit the material, then you are free to enter into a separate license with the commercial partner that permits the commercial use.
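To illustrate the machine-readable layer described above, here is a minimal sketch of how a harvesting script might detect a license on a web page. It assumes the page marks its license with a link carrying rel="license", which is the marker used in the HTML that the Creative Commons license chooser produces. The sample page, its dataset title and the function name find_license_urls are made up for the illustration.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> or <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="license noopener"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href:
            self.license_urls.append(href)


def find_license_urls(html_text):
    """Return every license URL declared in the given HTML text."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    return parser.license_urls


# Hypothetical landing page for a data set (illustrative only).
SAMPLE_PAGE = """
<p>Rainfall observations, hypothetical data set landing page.</p>
<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>
<a href="https://example.org/about">About this collection</a>
"""

print(find_license_urls(SAMPLE_PAGE))
```

A page with no rel="license" link gives an empty list, which is the case where a machine cannot tell what reuse is permitted and would have to skip the data.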
Now to maximize discoverability by search engines and software systems when you are licensing a work, you should make sure to use our license chooser tool to get the machine-readable HTML code. The license chooser also works to mint the license for the purpose of marking the work itself. There are four important things that I'll just point out with regard to the license chooser, and that's that it gives you a framework to select your license and to provide attribution and citation, and I'll just talk about each of those things a little bit. With regard to license selection, the license chooser guides you through a series of questions about what reuse you'll allow. So will you allow adaptations of your work to be shared? Will you allow commercial uses of your work? And depending upon the answers that you give to those questions, the appropriate license for you to select will be offered to you, and you can see an example there. You do need to remember that if your work is an adaptation of a work licensed under a CC ShareAlike license, and there are two of those, then your derivative work must be made available under the same license, as per the ShareAlike condition. With regard to attribution, attribution is a base condition of all of the CC licenses. There is flexibility around attribution requirements, though, which you'll read in the license. It says reasonable to the means, medium and context. This is really helpful. It enables you to do things like not having attribution within a work if it's not reasonable to do so. You can link out to a separate resource that would provide the required attribution. It's also flexible in that a licensor can waive some or all of the attribution requirements. The next really important feature of the license chooser is with regard to citation. So being able to locate the work and perhaps also the source works that led to that work.
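The license selection questions described above can be sketched in code. This is not the license chooser's own implementation, only an illustration of how the two questions map onto the six licenses; the function name choose_cc_license and its parameter names are assumptions made for the example.

```python
def choose_cc_license(allow_adaptations, allow_commercial, version="4.0"):
    """Map the two chooser questions to a CC license code and URL.

    allow_adaptations: "yes", "no", or "share_alike"
        ("share_alike" means adaptations are allowed only under the same license).
    allow_commercial: True if commercial uses are allowed.
    """
    if allow_adaptations not in {"yes", "no", "share_alike"}:
        raise ValueError("allow_adaptations must be 'yes', 'no' or 'share_alike'")

    # Attribution (BY) is the base condition of all six licenses.
    elements = ["by"]
    if not allow_commercial:
        elements.append("nc")
    if allow_adaptations == "no":
        elements.append("nd")
    elif allow_adaptations == "share_alike":
        elements.append("sa")

    code = "-".join(elements)
    url = f"https://creativecommons.org/licenses/{code}/{version}/"
    return code, url


# Most permissive: adaptations and commercial use both allowed.
print(choose_cc_license("yes", True))
# Most restrictive: no adaptations, no commercial use.
print(choose_cc_license("no", False))
```

Three possible answers on adaptations combined with two on commercial use give the six licenses, from CC BY through to CC BY-NC-ND.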
I think that probably answers some of the concerns from data creators about being able to find the original data. There are a few other requirements here. If the work you're licensing is a derivative of another work, then you need to communicate that your work is a derivative, you need to include the source URL of the original work, and you also need to describe the modifications that you've made. Now when you're modifying materials under the new version 4 CC licenses, you actually have to make a note of any modifications that you make to the materials, regardless of whether the modification is significant enough to merit it being a derivative work, and you have to provide the URI back to the source. So again I think that's a good reassurance that has been built in to the version 4 licenses. It might be unfeasible to include attribution, and I've already referred to that, perhaps within a merged data set, in which case include a URI back to the unmodified version. Lastly, the license tool gives you the option to provide a more permissions URL. So for example, if you license something CC BY but you're okay with people not attributing you in certain cases, then this is your chance to specify that in that resource document that you've got. Remember that you can't change the terms of a CC license, but you can always grant additional permissions or warranties beyond what the license allows. The other thing is that CC licenses allow you to incorporate elements of third party materials into your works just by marking these and providing attribution to them. So I referred to the need to mark a work to convey the license as well, and on this slide you can see a number of ways in which to do that, and there are some useful source documents there, like CC's download page that gives you all of the icons, buttons, etc.
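The requirements for modified material mentioned above (name the source, link back to it, state the license, note the modifications) can be sketched as a small helper that assembles an attribution notice. The wording of the notice, the function name build_attribution and the example data set are all assumptions for the illustration; the license itself does not prescribe this exact format.

```python
def build_attribution(title, creator, source_uri, license_name, license_uri,
                      modifications=None):
    """Assemble a plain-text attribution notice for a reused work.

    modifications: short description of any changes made, or None if the
    material is reused unmodified.
    """
    notice = (
        f'"{title}" by {creator}, {source_uri}, '
        f"licensed under {license_name} ({license_uri})."
    )
    if modifications:
        # Version 4 licenses ask for changes to be indicated, however minor.
        notice += f" Modified from the original: {modifications}."
    return notice


# Hypothetical data set details (illustrative only).
notice = build_attribution(
    title="Coastal Sediment Samples",
    creator="Example Research Group",
    source_uri="https://example.org/datasets/coastal-sediment",
    license_name="CC BY 4.0",
    license_uri="https://creativecommons.org/licenses/by/4.0/",
    modifications="units converted and duplicate rows removed",
)
print(notice)
```

Where a notice cannot reasonably sit inside the work, as with a merged data set, the same text can live in a separate resource that the work links to.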
Regarding content platforms, even if there isn't a license field in a content platform, there's usually a description or some sort of free-form field where you can enter information about a work. So that was a very brief overview of the Creative Commons licenses and the license chooser framework. And I guess my key message for today is that reuse is a core component of FAIR data, so do license your data to enable reuse. I think that the Creative Commons licenses provide a simple mechanism to ensure that the users of research have the rights they need to reuse, replicate and apply research outputs and data, and to disseminate and communicate research output in order to maximize the impact of work, while protecting, very importantly, the intellectual property and the academic integrity of a work. I think the built-in attribution and citation create a clear path to the original data. And that's the useful resources link, and you'll all be able to get your hands on that when the copy is made available. And that's it, Keith.
Thank you very much, Nerida. That was really interesting. And great to see not only the human-readable side of things but the machine-readable side of things and the way that that information can be made available to machines. So yeah, thanks. That's good. So now I'd like to hand over to Margie, Margie Smith from Geoscience Australia, who will be presenting on how Geoscience Australia has been working on collecting information about the provenance of research data and attaching that to the research data.
Thank you, Nerida, for the background on the CC license suite and the different options there, all the way from CC0 to the most restrictive licenses. And I think it's a really useful way of seeing how the framework works and especially how you can make it machine readable and attach that. Margie, thanks for the presentation on attaching that provenance information, how you do that and what that means.
And I'm especially interested in the drivers for GA in actually collecting that information and making that available. Finally, in case you're interested in more information about reusable, first of all, the slides will be made available from Nerida and from Margie. So they have links to relevant information. There's also information on the ANDS website. We have some information on licensing data for reuse. So it's worth maybe going to that link and having a look. So just a few resources on reusable. So there's also, if you're interested in the topic of provenance and attaching provenance information to research data, an interest group on this topic, and you can join that interest group to be involved in the discussions and hear what's going on there. Finally, if you're interested in different types of metadata, metadata that are specific to different disciplines and that allow for maximum reuse within that discipline, then I would recommend following this link, and there are links off to a whole series of metadata standards in use across a series of disciplines. Finally, last year we did 23 research data things, and one of these research data things is also relevant to the discussion today, and that was thing number nine, around licensing data for reuse. So if you want to not only learn a little bit more but also have a bit of hands-on experience, I recommend going to thing number nine and trying out the assignments there. Okay. Finally, as we've now come to the end of these four webinars on the FAIR data principles, I just thought I'd give you a bit of an update on where we are at. So what will we be doing around FAIR in the coming year?
So in the coming year, we are interested in continuing work on what it means to make data FAIR, and that includes collecting and sharing examples of making data FAIR in specific disciplines, because there are different ways, different elements and different aspects to making data FAIR which are relevant in different disciplines. We're working in that space and trying to share some examples and good practices in that space. We'll also continue to engage with data providers, with research organizations, research facilities and institutions to work on aspects of policy, human and technical infrastructure, but also skills that can be put in place to make it as easy as possible for researchers to make their research data FAIR. We would like to acknowledge the National Collaborative Research Infrastructure Strategy program that provides the funding for ANDS. Thank you very much.