Hello and welcome to our webinar from the Australian Research Data Commons. We are funded through the NCRIS scheme and we're built from ANDS, Nectar and RDS. If you'd like any further information about our merger and where we're going in the future, please sign up to our newsletter. Today we're going to be talking about "How FAIR is your data? Copyright, licensing and the reuse of data". I'm Kate LeMay from the ARDC, and I've got my colleague Greg Laughlin here, also from the ARDC, and we have Baden Appleyard with us, who will be speaking to us about our new guide on copyright, licensing and reuse of data. We're going to cover a little bit of background on things that have been happening in the research sector around the Code for the Responsible Conduct of Research and the FAIR data principles, and then Baden will get into the details of our guide on licensing research data.

So, the Code for the Responsible Conduct of Research. This is a document that is co-owned by the NHMRC, the ARC and Universities Australia. It has recently undergone an extensive review. The new Code is a principles-based code: it's a few short pages of principles, and it has guides accompanying it that go into more detail for some of the principles. One of the guides has been released and there are further guides coming out soon. The principle that is most relevant to what we're talking about today is Principle 3, which says to share and communicate research methodology, data and findings openly, responsibly and accurately. There are two responsibilities, one for institutions and one for researchers, that relate to this principle about sharing and communicating these items, and those are available in the Code, as linked at the bottom of this slide, for you to have a look at.
These are quite high-level statements, and there are lots of things that you could do underneath these umbrella statements. To help research institutions apply them more specifically there are, as I said, guides being released to accompany the Code. There's going to be a Management of Data and Information in Research guide that will speak to this principle of sharing and communicating research methodology, data, findings, recordings and primary materials in an appropriate way. It will be released by the NHMRC, the ARC and Universities Australia soon, and the ARDC has been involved in the drafting and editing of that guide.

The research sector has also been talking about the FAIR data principles for the last few years. You can see there's a link there to the ANDS website, which has quite a lot of information about what the FAIR data principles are, how you can apply them, and what each of the findable, accessible, interoperable and reusable principles might mean for a data set. There's also a FAIR data assessment tool that has been developed by the ARDC, and you can find that there to assess how FAIR your data set is. We're focusing today on "reusable" from the FAIR data principles, because if a data set doesn't have a license then the secondary user doesn't know how they can reuse it. Having said that, I will now pass over to Baden, and he will speak to our new licensing guide.

Thank you very much, Kate, I really appreciate it. Good morning everyone and thanks for your time. Before I begin, it would probably be remiss of me as a lawyer not to raise a bunch of disclaimers before we start talking about the guide a little further. I would also like to acknowledge Greg Laughlin here as co-author of this guide, because for his insight, and for putting up with some of the things we sometimes have to discuss around the technical detail, I've been most grateful.
I'm a lawyer, but I'm not your lawyer, so what I have to say here should not be considered legal advice from me to you. Rather, the material in the guide is really just that, guidance, and it invites readers to obtain further advice if their needs require. Most institutions or government agencies will have their own legal services people, and in that circumstance they should probably be speaking to those people for further information. The other thing I'd like to say is that while copyright is internationally a fairly consistent phenomenon, its characteristics do vary around the world, and so the guide, and what I'm talking about today, really only addresses the law and licensing as relevant to Australia. The other thing is that the guide is primarily directed toward people, organisations or facilities that are motivated to share their information or data generally, particularly so as to encourage broad reuse of that data. So the guide is not intended to address situations where you might be dealing with data that is of importance to national security. It would probably only be a very small percentage of data that falls into that category, but in any event it's worth setting the scene for this guide. The guide is, as Kate indicated, compatible with the FAIR principles and also with Legal Interoperability of Research Data: Principles and Implementation Guidelines, which was published a little while ago by the Research Data Alliance and CODATA. I also had a small hand in that particular document. And finally, I'd like to say thank you to all the people who were involved in providing feedback on some exposure drafts of this guide and some of the flowcharts within it. That feedback was immensely helpful and I'm very grateful for the time that was taken. So, moving right along, there are a number of diagrams in the guide and this is the first one.
When we start talking about licensing of data and copyright, what I think about in terms of licensing is actually legal interoperability. What will happen to this data set if I apply a particular license to it? What can't happen to that data set if I apply a particular license to it? When I was doing some background reading for this guide I happened across a delightful lady in the United States, Nancy Sims, who's both an attorney and the copyright librarian at the University of Minnesota. With much respect, I slightly modified a diagram that I saw her present, because whilst there is legal ownership of, in this case, data, the licensing decision is necessarily more complicated than just who has the legal ownership; the decision is often informed by other things. I thought Nancy's diagram was an excellent way of articulating that issue. So here we have a diagram, with legal ownership being one important characteristic. Legal ownership in and of itself can be a curious thing: if there's copyright attaching to data, the default position is that the creator of the data set will own the copyright, but if you have prepared the data in the course of your employment then, generally speaking, it may be the employer that has the copyright in it. In the university and research setting it becomes a little more diffuse, because some employment agreements that universities have with staff provide that the employee shall own the material but then supply a license back to the university to enable it to exploit the material, or vice versa. Looking at the other side of the diagram, there might be other business issues at stake. You might have certain grant funding requirements in relation to the data which require you to license the data under a particular type of license.
There might be other contracts in place. Certainly if you're doing research that's bound for commercialization there'll probably be a range of secrecy provisions and a range of complexities around the decision as to which license to apply. There's also the matter of relationships and norms in your particular field of endeavor. It may be that your lab has a particular policy about how it licenses materials. It may be that your colleagues or your supervisors have a particular position on how materials will be licensed. Over and above all of that, I guess the point of the guide is to explore options as best we can to maximize the potential for data reuse. It's hard to see, perhaps, in the PowerPoint presentation, but I've made that fourth circle, the top one in the diagram, a little bolder, because I think that, depending on the complexion of the data and the situation you're in, it is probably the one area where a little more focus needs to happen. And right in the middle of all that is the sweet spot: working out the licensing decision that takes all those different things into account.

One of the issues that Greg raised with me in the brief for the guide was nailing down the relationship between copyright and data. Does copyright subsist in data or doesn't it? To answer that question graphically I created this fairly high-budget graphic, which I've just come to call the grey area graphic. On the left-hand side, the law as it stands in Australia, pretty much since 2010 or 2011, maybe a little later, is that no copyright subsists in data that is machine generated. I think we've used an example in the guide of a data logger in a stream that might be placed there to measure turbidity or pollutants, things of that nature.
The data logger may, by telemetry or otherwise, report back or store data at certain intervals. That raw dataset in and of itself is not going to have copyright subsisting in it, despite the fact that potentially a great deal of expense and expertise has gone into placing that data logger in a particular part of the stream to guarantee that the recordings are the most accurate, and that only a scientist with special knowledge could have done that. That may well be the case, but copyright doesn't protect ideas, it protects their expression, and in this case the expression is whatever the machine has expressed. Because there's a lack of human authorship, and arguably of creativity, there is no copyright subsisting in that data. At the opposite end of the scale you have, for example, a book, or something else created by human authorship that demonstrates creativity in the content or in the selection or arrangement of the content. Data of that type may well have copyright subsisting in it. I think that position is now fairly consistent around the world. The difficulty we have is the grey area, because whether copyright subsists in data or not is often a question of fact and degree. Between those two polar opposites, a number of interventions can happen by humans that may or may not cause material to become copyright protected. For example, take that raw, machine-generated data logger dataset that we talked about a few moments ago. If a scientist comes along and thinks there's an error in one particular element of the data, and they put in their own figures based upon what they think it should be because they have some expertise in the area, or they change the selection or arrangement of the data, then that dataset that was once devoid of copyright protection now becomes protected by copyright. Over and above all of that, if you're looking to share your data, does this really present a problem?
In my submission it doesn't. A better description is a storm in a teacup. So the guide talks about some Creative Commons tools. I think most of the people on the webinar today will be aware of Creative Commons, but just briefly, for those who are not, Creative Commons licenses are a suite of licenses and copyright-related tools that are freely available on the internet. The slide that appears before you at the moment is a graphical depiction of each of the Creative Commons licenses. Each license is available on the internet as, for want of better words, a human-readable description. That description doesn't contain the actual license terms, but in a nutshell it tells you what the license provides, and if you click a button on that page you also get the full legal license, which can be very, very long. For the sake of usability there are also these icons that have been prepared by Creative Commons. Creative Commons is an organisation headquartered in the United States but with a global affiliate membership, and they are effectively the stewards of these licenses. There have been a number of versions over the years; the current version is version 4, and if you Google "Creative Commons Attribution license", for example, you can see what that looks like. The guide makes reference to three Creative Commons tools in particular, and they're represented on this slide with a tick beside them. Briefly, these icons represent the terms of each of the licenses that are available for you to use. The top left is the Creative Commons Attribution license. It's the most liberal of the licenses. It basically says that once the license is applied to the material, anybody can come along and reuse that material however they like, as long as they attribute the licensor as part of that reuse.
Moving across, just opposite that is the BY Share Alike license, which basically says take this material and use it however you wish, provided that if you make a derivative of what I've produced you must license that derivative under the same license that I've supplied it to you under, which is the Share Alike license. Then there are some that we don't recommend at all for data, certainly not in the context of the guide. One is the Attribution Non-Commercial license. Non-commercial basically means the reuse must not be directed toward monetary compensation or financial advantage. That license is a mashup of attribution and non-commercial, and the people who reuse that material must comply with both of those terms. Moving a little way across from that is the BY Non-Commercial Share Alike license. Next down is the No Derivatives license, which basically says take this material and use it however you wish, but don't make a derivative of it. You may cookie-cut some content out, but you can't make a derivative of, in our case, the data. That's no good for data, so a quip I often use is that ND doesn't just mean no derivatives, it also means not for data, because in effect whenever anybody uses data they're bound to be creating a derivative of some kind. The final two are not licenses at all; they're referred to as public domain tools. The one on the right is fairly well known to people in the scientific community as CC0. CC0 operates on a number of levels. Firstly, it operates to surrender any copyright in the material to which it's applied, so in effect the person who applies it to their material is abandoning, or seeking to abandon, their copyright over that material. Now, in some countries doing something like that won't necessarily comply with the law of that country, and indeed in Australia it may be that CC0 is not entirely compatible with the Copyright Act,
particularly, for example, in relation to moral rights. Moral rights are non-economic rights: you can't sell them or buy them as you can the copyright in the material. They go to the rights of the author to ensure that the author is properly attributed and that no derogatory use is made of their material. Those can't be extinguished by CC0, so CC0 has built within it a kind of fallback position, such that if you cannot abandon all of your rights it starts to operate a bit like a CC BY license, where you can pretty much do whatever you like with the material but you've still got to respect some of the things that the law in your country requires, for example attribution. So we've made reference in the guide to the use of the CC0 tool, but, consistent with what we've said in the RDA and CODATA document, if it gets to that point then what's the harm in using the Creative Commons Attribution license? Sometimes people in the data space talk about the concerns they have with the Creative Commons Attribution license. One of their concerns, for example, is attribution stacking: if you've got a range of data sets and you're required to attribute every single one of them, that's very hard to do when you're mashing a ton of data sets together. But it's important to read the fine print of the Attribution license, because in there, and in fact in all of the CC licenses, the attribution requirement applies in a manner that is reasonable to the medium. So if attribution can't be given in the data set itself or in the derivative that's made, then maybe a hyperlink to another place that gives the attributions would be suitable. Things like that can be dealt with under the Attribution license. Another one we often hear is, "Well, I don't want to be attributed." That's fine, and you can choose not to be attributed under the CC Attribution license; attribution can equal null. So I think what we're really getting at in the guide is saying, well, you
could use the CC0 waiver, but equally you could use the CC BY license. The last one, on the left-hand side, is the CC Public Domain Mark. That's not a license or a waiver or anything; it's really a placard to apply to material that you know does not contain any copyright, and it simply notifies users who find your data that it doesn't in fact contain copyright, so it's in effect in the public domain. The term public domain is often confused in discussions about copyright. Public domain can mean access, as in I can access it because it's in the public domain, but public domain can also mean that copyright does not subsist in the material. In this situation I mean the latter: copyright does not subsist in the material.

So, moving right along. To make the licensing decision a little easier we've tried to put together something simple and fairly easy to read that may be of assistance. When I first got this brief, Greg asked me, I think, to produce a copyright licensing guide of about one and a half pages. I think it was something like that; Greg, if it's not, chime in and tell me. I didn't think that was possible. We've tried to keep it pretty small, though. I think we're down to about 12 or 13 pages, and it's about as brief as we could make it, but probably the key elements that carry a lot of the content are the flowcharts themselves. I'll just briefly refer to them and leave them to you to have a look at more at your leisure. The first one is the data rights holders flowchart, and this is primarily directed to people or organisations that create data, appreciating though that we are also creators when we merge disparate data sets together and come up with something new. There may be some mathematical processes around that which cause things to be a little bit different, with different columns, but people who are data users can become data creators as well. So, looking at the flowchart, there is a bit of a
connection between the data rights holders flowchart and the data users flowchart. This and the other flowcharts also presume that it is the intention of the creator to share the data with others, so you get spat out fairly early if you don't wish to share your data with anybody else. There's always a bit of a trade-off. Honestly, you could write a four or five hundred page book about this, probably two or three if you really wanted to go to town, but we didn't have that luxury. So there's a trade-off between making something simple enough that it's usable and making it so simple that it's unusable, or so simple that you can get yourself into trouble. We've tried to find a fairly conservative approach to a workflow for deciding when, how and what to license. On the right-hand side you'll see there are a number of red boxes. Those red boxes indicate reason for concern or caution, or a need to obtain legal advice. Generally speaking, wherever you see a cautionary box, that's something to be curious about too, because there may be something in it that you need to address before you decide whether to seek legal advice. But as you can see, it's fairly straightforward. Start here: do you own all of the data set? If you do, that's great, go down to the next one. If you don't, then do you have permission to reuse or republish the data components that you have? No? Well, you've got to go and get further advice. There are breakout points all the way through. When you're complete you've hit the green box, and the flowchart really can't take you much further than that. In the bottom box of the first flowchart you eventually find your way down to selecting a suitable Creative Commons license. Based upon what we say in the guide, we unashamedly prefer the CC BY version 4 license, cognisant of the fact that you must only select a license that is compatible with all other data
components of the data set, if any. That's one of the reasons why we have some concerns about the Share Alike license. Depending on what type of share-alike licenses you are using, particularly if you're intermingling, for example, Creative Commons Share Alike licensed material with material under other types of license with share-alike features, maybe some of the European ones, those are not the same licenses. They have a bit of a difficulty in almost cancelling each other out, because when you make a derivative you can only choose one license, and it may not be the one that's appropriate. So just take a little bit of care with share-alike licenses. If in doubt, ask the licensor.

The next is the users flowchart. Again, start here. That flowchart is directed towards people who are not creating data. Its primary intention is to ensure that users comply with their obligations in relation to those licenses. The secondary intention is to manage the risk to your organization and to support good organizational data management practices. If you have researchers who are using data, mashing data sets together, and they have designs on publishing data later on, it's very helpful to make sure that they go through a process like this, so they know that what they've actually done with the data is legal before they go doing anything else. This particular flowchart also has an additional check box down the bottom right-hand side.

Moving along to the third one, this is the data suppliers flowchart. Its primary purpose is to ensure that, to the extent possible, data sourced through the supplier is legally interoperable, and also to manage risk, not only for the supplier but also for users, and to reduce transaction costs for users. For example, a data supplier may be considered to be authorizing the infringement of copyright in a data set because something is supplied without an appropriate license or with an incorrect one, or because there aren't procedures in
place for ensuring that data supplied through a facility is matched with an appropriate license. So that's a bit of a rundown on the guide.

Some take-home points. If you're a data rights holder and you publish a data set, apply a license. It's just that simple. Don't put your data set out there without a license. The reason is that the law presumes that if a license is not applied to something, all rights are reserved. So a user's right to communicate, to publish or to reproduce that data set does not exist unless the copyright holder has offered a license with the data set to enable users to understand what they can do with that data. I really don't think that needs to go any further than that. If you want your data to be widely reused then you should apply an open license. And a bugbear of mine that I've had for a long time: don't go and create your own open license. The last thing the world needs is yet another open license. There are many, many good open licenses. The guide talks about the Creative Commons licenses because, in my humble opinion, they're pretty much the standard these days in most parts of the world. They're very good licenses, very well drafted by some very good people, so apply one of those. If you need to go outside of CC BY, make sure you only apply the ones that we recommended in this presentation. For data users: comply with the licenses that have been applied to the data you're using. In the case of CC licenses, you accept the terms of the license through your use; that's when the contract, as it were, under the license is formed. Your use of the licensed material indicates your acceptance of the terms, so make sure you comply with them. And for data suppliers: please ensure that your facility is equipped with the right policies, procedures and functionality to supply data with an appropriate license.
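For readers who think in code, the first few steps of the data rights holders flowchart, as described in the talk, could be sketched roughly as follows. This is only an illustration of the spoken walkthrough, not the flowchart in the guide itself: the class, field and function names are hypothetical, the real flowchart has more breakout points, and treating "CC BY 4.0 is not compatible" as a reason to seek advice is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class Outcome(Enum):
    OUT_OF_SCOPE = "no intention to share: the flowchart does not apply"
    SEEK_ADVICE = "red box: obtain further advice before licensing"
    APPLY_LICENSE = "green box: apply the selected license"


# The guide's stated preference, per the talk.
PREFERRED_LICENSE = "CC BY 4.0"


@dataclass
class Dataset:
    # All fields are hypothetical names chosen for this sketch.
    intends_to_share: bool
    owns_all_components: bool
    has_permission_for_other_components: bool = False
    # Licenses that every component of the data set is compatible with.
    compatible_licenses: FrozenSet[str] = frozenset({PREFERRED_LICENSE})


def rights_holder_decision(ds: Dataset) -> Tuple[Outcome, Optional[str]]:
    """Walk the simplified steps and return (outcome, license or None)."""
    # The flowcharts presume an intention to share.
    if not ds.intends_to_share:
        return Outcome.OUT_OF_SCOPE, None
    # Do you own all of the data set? If not, do you have permissions?
    if not ds.owns_all_components and not ds.has_permission_for_other_components:
        return Outcome.SEEK_ADVICE, None
    # Only select a license compatible with all other data components.
    if PREFERRED_LICENSE in ds.compatible_licenses:
        return Outcome.APPLY_LICENSE, PREFERRED_LICENSE
    # Assumption: anything else is treated here as a cautionary box.
    return Outcome.SEEK_ADVICE, None


if __name__ == "__main__":
    outcome, chosen = rights_holder_decision(Dataset(True, True))
    print(outcome.value, chosen)
```

The point of the sketch is the ordering: the intention to share is checked first, ownership and permissions second, and the license is only chosen once compatibility with every component is known.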
I think that's about all from me. Greg, you've been remarkable. I think I lost you halfway through; I felt like I was flying solo there. I think we had a minor technical glitch. Are you back?

Yes, apologies Baden, we lost our sound in the webinar room so we couldn't hear you, but luckily everyone else could. As you can tell, we've relocated, and we are now available to help with any questions that have come through. We've got a couple in the question box, and if anyone has any other questions please type them into the question box. The first question in here, Baden, is: if there is no copyright in data from a data logger, would that be the same with an image from the Hubble telescope if it wasn't retouched by human analysts?

Well, this is a really interesting issue, and I think there are a lot of legal academics around the world looking at it. In fact, I've seen it written somewhere that somebody has suggested a new right be introduced, not like the database right that they have in Europe, but something that recognises some form of right in things that do not have copyright. Think about property and the concept of ownership of property. We have real property: we can buy a house, it's something tangible, we can stand on it, live in it, and we understand that. Intellectual property only has a value, and only really exists, because a law says that it exists and has value. If you have something that you'd like to have value but that isn't otherwise recognised in law as having existence or value, that makes things very difficult. I think the general position taken by most lawyers is that there will be copyright in some of those images, and if not copyright then this nebulous concept of ownership. I see this increasingly in journal publishing agreements with authors, which are turning away from the strict notion of copyright to talking about ownership: do you own this data set, do you own this image that's come from a satellite? But I'd also hasten to add that the raw
image that comes from a satellite could very well be, and probably usually is, treated by a human when it lands at the base station. So it could be that someone over at Airbus, or whoever is receiving some of these images, is stitching them together or making them into something other than the raw format in which they were received from the sensor. I probably can't take it further than that; it's a bit of a moving feast. I would say there is probably some ownership and there probably isn't any copyright, and then it comes down to grounding that ownership in contract. That's the way I think the majority of the world has gone. They talk about ownership, but I think it's an answer that raises more questions. I've certainly had good discussions with some IT and science lawyers in Sydney who have been grappling with this problem for their clients for some time. They've chosen the pragmatic view, to walk along the contract path and simply say, well, we have some ownership in this; we may not have copyright in it, but we have ownership. The other thing too is that people can declare ownership over the physical medium upon which the material sits, so they can control access to that material via the medium, or via the network, by admitting somebody to a network that allows them access to that material. I hope that answers your question.

We've got a couple more questions that have come through, so we'll try to get through them quickly. "There's a lot of difference between an archive to hold data securely and an access repository where the data may be accessed under whatever conditions." So, Greg, what is the ARDC view of these two different levels, data survival versus data access and use, this difference between just archiving the data and putting the data somewhere where it can be accessed, in the framework of licensing?

I think that goes back, and I'll be very, very brief because there are some really interesting questions piling in here, Baden, to something I think you've already answered,
because if the intention is not to share then the data is just being held in a secure place, and if the intention is to share it needs to be licensed. I hope that answers the question. So one is put in a particular place so it can be shared, and the other is just like a very safe version of a hard drive.

Okay, so Baden, you and I have actually had this discussion previously. The question is: how do we license anonymized data sets about human research subjects? It says that typically there will be a formal process to apply for access before the data can be released, and the applicant is not free to share the data further.

So in this case it's one of those mediated access arrangements. If you tried to pull that through one of the flowcharts it would probably spit you out and say you need to obtain further advice. I don't mean to sound like I'm avoiding the answer; I don't intend to. It's a complicated area, and naturally it will be mediated, and I think the guide quite properly points you to obtaining advice from the various people in your organization who deal with it. In those situations there will be some form of restrictive agreement around what can be done with the material. It may even call for the destruction of the material after the use has been made of it. There are a million and one ways that type of arrangement could go. It would be great, in fact, if we could come up with a suite of restrictive licenses for certain situations. Maybe that's something to think about for the future. We can't solve all the world's problems in one day.

So here's another question: is the copyright holder the appropriate person to add the license to their data, or could a data supplier do that on their behalf?

Well, only if the data supplier has an agreement from the copyright holder to do so. It should be the rights holder of the data set who does that, and if I was a data supplier facility I would be making damn sure that what my policies
say is that before anybody places their data on my facility they have to put a license on it, and if they have any designs on the data set being reusable it ought to be an open one.

We've got an interesting question here: what are the practical consequences of publishing data without a license? For example, if a public data set with a license is mixed with an accessible data set which has no license, and the result is posted online. And the questioner asks, could I be jailed or fined?

No, you won't be jailed and you won't be fined, not for that anyway. If something is placed on the internet that might have other material in it that is openly licensed or even public domain, well, it depends on what treatment it has had and what the mix is. The user may be misled, but nevertheless the user will take it as they find it. If they find it without a license on it, then the astute, informed user will think, well, this material is all rights reserved, I can't do anything with it, I can't reuse it. I may be able to employ some of the fair dealing provisions in the Copyright Act, very carefully, to utilise it, but if I don't fit within one of those provisions then I can't. Now, what are the options? Well, the person can always go back to the licensor, or in this case the publisher, because they haven't applied a license to it, and say, look, can you give me a license to reuse this material? In fact that happens quite a bit. You quite often find material licensed under a Creative Commons Non-Commercial license on the internet, quite deliberately, to expose the material to others in the hope that someone will come and say, I love what you've done, can I reuse this commercially? And they'll negotiate another form of agreement, maybe with a fee. The other thing too is that copyright is necessarily something that needs to be monitored by the copyright holder. If you're Apple Computer, for example, you'll have ranks of lawyers and firms that go scouring the internet looking for infringements of Apple's
trademarks and copyright material, and you've probably got about 30 minutes of use before you get a cease and desist. But for those of us who don't have those resources, it may be that your misuse or inappropriate use of the material goes unnoticed. If and when it does become noticed, though, you may have some questions to answer.
I guess that's one of the other things I've often thought about, too, with licensing of material. If you put material onto the internet but you don't intend people to reuse it, then don't put it on the internet without a license. Just don't do it, because you may as well not. And if you're scared about somebody reusing your material commercially, for example, don't do it; don't put it up on the internet. If you haven't got the resources to pursue infringers of your copyright, then put it up under an open license and see what happens. You'll probably be pleasantly surprised. But that's just my two cents' worth.
Excellent, thanks Baden. We've probably got time for one, maybe two more questions, and if we don't get to anyone's question, we might be able to put a Q&A document on the website with all the materials from the webinar, answering any other questions that have come through that we haven't had time to address.
The Dryad repository applies a CC0 license to all work submitted, which she thinks is unusual. They state that CC0 does not exempt those who reuse the data from following community norms for scholarly communication, in particular the citation of the original data authors. Is that the case?
Well, I think we're talking about two different things. There's attribution and there's citation. They may be one and the same thing, but they can also be two very different things. My view on that is, well, even if the CC0 does indeed do that, and I reflect again upon the fallback provision in the license (I'd have to go back and read the CC0 license; I've read the code of that), but if that is the case, but yet you're in
an academic environment, I think the norms of your organization and the norms of academia nevertheless require you to cite your material. It is the basis, I think, of scholarship and scientific inquiry that you point to where your sources are from. I don't think there's any harm in that. It certainly won't disturb the CC0, that's for sure. I've got a lot of researchers so maybe, I don't know, Greg, you've been actively cited?
Certainly attribution is a norm in the scholarly field, definitely.
Okay, thank you so much, Baden. You passed on a wealth of information to all of our participants in the webinar today, and the guide that you and Greg have written is a very valuable resource for our research community. We thank you and Greg for all the hard work that you've put into it. We know it's been a very long labour of love, and we really appreciate the effort that you've both put into making this issue as clear as possible for the research community. Thank you for your time today speaking to us all about this. Thank you very much, Baden, and thank you everyone for attending.
Thank you very much.