First of all, my apologies for the delay, but I'm very happy to have Venkat with us from the Digital Curation Centre, who's going to give a presentation about research data management. I hope we'll be able to follow this with a presentation by Thomas Margoni from CREATE, the copyright centre at the University of Glasgow, but he had some trouble with his sound, so we're hoping to get that fixed while we're having this first webinar. If you have any questions for Venkat, there is a Q&A button at the bottom of the screen; you can just enter your question there, and at the end of the presentation we will go through each of the questions and see if we can find an answer. So if that's okay for all of you, then Venkat, the floor is yours. Okay, yeah, hi everyone. My name is Venkat, and I work for the DCC, the Digital Curation Centre, which is based mainly in Edinburgh, in Scotland. Just a quick background about us: we were established around 2004 and we work at a national and international level. We deal with consultancy, training, policymaking and advocacy in digital data management. So the best practices, and also, on the flip side, as well as the researchers, we deal with the people that actually provide these services, and with how to build or improve those services. We are also involved in international consortia and projects, lots of EU-funded projects, including OpenAIRE. And one key thing to remember is that we do all of what I just said, but we're not actually actively doing any data curation ourselves, although the name Digital Curation Centre might give a different impression. Okay, so I'm going to take you through some basic tools on how to better manage your data. To start off, I'm showing you this slide from a Nature paper from 2016. In it, they surveyed 1,500 or so researchers to find out if they thought that there was a reproducibility crisis.
And interestingly, more than half of the respondents actually said that, yes, there is a significant crisis in the reproducibility of published research. So this immediately raises a problem, and this is what we're trying to address here. The other question that you may see is: why make data available? I think this slide sums it up quite nicely for me: it was never acceptable to publish papers without making the data available. Note that we are talking about publishing data here, not peer-reviewed papers. So when we talk about data publication, we're talking about repositories and other means. Something that you'll probably encounter a lot, and maybe have already been taught about, is the FAIR principles. This is quite a recent thing, only from the last few years, and it's something that is being pushed quite a lot in EU-funded projects as well as globally. In case you don't know, FAIR stands for findable, accessible, interoperable, and reusable, and what I'm going to show you in the subsequent slides will hopefully address many of these points. If you don't understand what all these words actually mean, if English isn't your first language, then hopefully we'll be able to clarify later in the Q&A. You can also find out more about the FAIR principles from this publication from just last year. I should point out that these slides will be made available to you, so you'll be able to follow the links shown in them and find these documents. And here is a final breakdown of what each of those terms, findable, accessible, interoperable, and reusable, means. One common misconception about the FAIR principles is that data needs to be made open. Now, although I said in one of the earlier slides that data should be made open, this only applies when you're actually able to; not all data can be, and the FAIR principles do acknowledge that. Maybe the data is of a sensitive nature.
And of course, in those cases, it might not be possible to actually make it open. The principles don't specify any particular technologies or implementations, so they are more a wide-ranging set of principles. FAIR is not a standard to be followed or a strict set of criteria; it is a spectrum, so, yes, it's trying to be as accommodating as possible. And it doesn't only apply to the life sciences. I say that because it originated in the life sciences, but it now encompasses the full gamut of research. Just to tie that all together: what we're seeing here is that if you have this overall large circle, which shows managed data, FAIR data is essentially a subset of that. Meanwhile you have open data, which might not be managed in any particular way, but which can also overlap with FAIR and managed data. What we're trying to do here is to increase this overlap between FAIR data and open data. You will hear this saying many times in your RDM travels: as open as possible, as closed as necessary. So the rest of the talk will now focus on the actual nuts and bolts, how you go about practising good research data management, that is, RDM. When we talk about data here, we typically talk about a life cycle: from the point that a piece of data is created all the way through to its preservation. Now, the life cycle I'm showing here is our version of it, but you might encounter other versions too that use different terminology. We typically break it down into six nodes: creation, documentation, usage, storage, sharing of the data, and preservation. What I'm going to do now is go through each of these steps and show you some practical things that you need to do.
So, some data creation tips. We encourage people to use consent forms, licences and agreements that don't restrict the opportunities to share the data, to choose appropriate file formats, to adopt a file naming convention, and to create metadata and documentation as you go. Just going through those: consent forms. Now, this seems like something that you may not think is valuable, but it is, especially, obviously, if you're talking about sensitive data. But even when you're talking about non-sensitive data that doesn't require any permission from anywhere, you should do this. A consent form is required by many repositories. So when you're thinking about the long term, when you're eventually going to deposit your data into a repository, most repositories will actually make sure that you have consent to use and deposit that data. So from the outset of your research project, make sure you use a consent form. Choosing appropriate file formats. Now, this again seems like a very trivial thing, but let's take it step by step. When we're talking about file formats for good research data management, we typically mean open file formats. There's a difference between open file formats and what we call proprietary file formats. Proprietary file formats are things such as Word files, Excel sheets, and so forth: specialized file formats for specialized, usually commercial, software packages. But say in a hundred years' time you wanted to make sure that the data you deposited is still accessible and readable in some way. It might be that the file format you used, if it were a Word format, is not readable anymore, because that software package doesn't exist anymore. Now, of course, in the case of Microsoft products, that's highly unlikely. But there are other file formats that might not survive the test of time, so you need to be careful.
So for text documents, we typically recommend that people use .txt files, for instance. The other thing to remember about open and proprietary file formats is the use of lossless file formats as well. Lossless meaning uncompressed, basically. So when you're talking about images, for instance: JPEG images, which are very widely used, are actually compressed. Now, if you're doing research and doing analysis on some images, you would want to do the analysis on the uncompressed, or lossless, file format, because that contains all the data that was captured. And, of course, when you're doing analysis, you want to make sure that you have the full set of data. Breaking this down, this table here has been adapted from the UK Data Service, and you'll be able to access it when I share the slides, but you can find many tables like this with simple Google searches. The table shows the different types of data that you might encounter, with the preferred, recommended file formats and the merely acceptable file formats. So, again, using image data, for instance, you can see that the recommended format is the TIFF file: an uncompressed, open file format. Whereas acceptable, but ones that you should avoid if you can, are JPEGs and other compressed formats like that. And so for different file types, whether they're text or audio or video, you have these recommended and acceptable file formats. Organizing your data, again, seems like a very trivial thing, but you do need to be aware of your file naming conventions and your directory structures. Especially if you're using the command line or Unix operating systems, this can be a very powerful way of enabling better and faster analyses. If you have a standardized way of naming your files and folders, then you can do command line operations, for instance, which allow automation.
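To make that last point concrete, here is a small sketch of how a standardized naming convention enables automation. The convention, file names and field names below are invented for illustration; any consistent scheme of your own would work the same way:

```python
# Hypothetical naming convention: <date>_<site>_<measurement>.csv
# e.g. 2019-03-12_siteA_temperature.csv
# Because every name has the same structure, a script can select
# and group files without any human inspection.
from pathlib import Path

def parse_name(filename):
    """Split a conventionally named file into its parts."""
    stem = Path(filename).stem  # drop the .csv extension
    date, site, measurement = stem.split("_")
    return {"date": date, "site": site, "measurement": measurement}

def select(filenames, measurement):
    """Return only the files recording the given measurement."""
    return [f for f in filenames if parse_name(f)["measurement"] == measurement]

files = [
    "2019-03-12_siteA_temperature.csv",
    "2019-03-12_siteB_temperature.csv",
    "2019-03-13_siteA_humidity.csv",
]
print(select(files, "temperature"))
```

With inconsistent ad-hoc names, the same task would need manual sorting; with the convention, it is one line.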
So, again, just following these simple rules is very key. I'm going to move on to the next two here, documentation and usage. Documentation, again, seems very trivial, but think about what is needed in order to evaluate, understand and reuse the data. Why was it created? What did you do, and how, and how have you documented that? Did you develop code to run the analysis? If so, this should be kept and shared too. It's important to provide a wider context for trust, and this goes back to this idea of reproducibility and trying to stem any kind of research fraud. So what is metadata? Now, this is a word that you will frequently encounter in research data management, and it is essentially a subset of documentation. It is something that is machine and human readable, and it is standardized and structured. It helps to cite and disambiguate data, meaning it allows you to tell one particular dataset apart from another. Some metadata standards that you might encounter are ones that are more generalized in scope, such as Dublin Core, and others that are more discipline-specific; these are a few examples right here. If you are starting a project, one of the first things that we recommend you do is try to find out if there is a metadata standard that fits your particular field of research. These two links here take you to catalogues of known metadata standards, where you should try to find a standard that you might be able to adopt. Moving on from that, another thing that we encourage people to do is use controlled vocabularies. To try and illustrate this, here is an actual use case where a group of researchers were asked to describe the subject of an experiment, subject meaning who was being experimented on or analysed. They were asked to describe it using free text, meaning they could just write or type in the answer.
And this was the list of answers that was given. Looking through this list, I hope people can see that all the respondents actually gave the same answer. They were talking about humans, except they've written it in different ways: H. sapiens, Homo sapiens, et cetera. Now, for us as humans ourselves, we can read that and see that they are all the same answer. But for a computer, that isn't the case; it cannot figure out that they are the same thing. By using controlled vocabularies, we can get rid of this problem. For instance, when you are filling out a web form, you might get some dropdown menus which have pre-written options: those are controlled vocabularies. One step beyond controlled vocabularies are ontologies. These are controlled vocabularies, but now they are structured, meaning that you will usually find parent-child relationships. For instance, on this slide I'm showing two different trees. Looking at one of these trees, you can see that term B1 is a child of term A3. So these are different controlled vocabulary components, but now structured into a tree. The double-headed arrow shows what you can do here, in a computational sense: if you have two different trees of ontologies, one for one organism and another for another organism, and you want to do some kind of matching between the two trees, these trees could conceivably have thousands upon thousands of different components. If we were trying to do that matching between the two trees manually, it could be very difficult, because you have so many different terms. For a computer, however, this could be done in literally seconds, because it can process this information so much quicker. That is the power of using ontologies.
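Both ideas, a controlled vocabulary and a structured ontology, can be sketched in a few lines. The synonym list and the tiny tree below are invented for illustration; in practice a published vocabulary or ontology for your field would supply them:

```python
# Invented controlled vocabulary: map free-text variants to one canonical term.
CANONICAL = {
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "homo sapien": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
}

def normalize(free_text):
    """Map a free-text answer onto its controlled term, if known."""
    return CANONICAL.get(free_text.strip().lower(), "UNKNOWN")

# Every respondent's answer now resolves to the same term,
# so a computer can see that they all agree.
answers = ["Human", "H. sapiens", "Homo sapien", "homo sapiens"]
assert {normalize(a) for a in answers} == {"Homo sapiens"}

# An ontology adds structure: parent-child links between terms
# (an invented miniature tree, stored as child -> parent).
PARENT = {"termB1": "termA3", "termB2": "termA3", "termA3": "termA1"}

def ancestors(term):
    """Walk up the tree and return all ancestors of a term."""
    out = []
    while term in PARENT:
        term = PARENT[term]
        out.append(term)
    return out

print(ancestors("termB1"))  # ['termA3', 'termA1']
```

Traversals like `ancestors` are what let a computer compare two trees with thousands of terms in seconds.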
Now, we again recommend that you look for an ontology for your field of research, if possible. On the slide we have a link to a catalogue where you can hopefully find a suitable ontology. It may be that you will not find one, but we recommend that you at least try to look. Going on to storage. Where will you store the data? Again, it seems very trivial, but make sure that you store your active data, meaning the data you're doing your analyses on, in an area where that information is backed up. Don't only keep your data on laptops or flash drives, et cetera, because if you lose those or they break down, you will have no way to recover that data. We recommend using departmental or university servers; many universities have provision for this, and if yours does, then please do use it. The other option is to use cloud storage, which is becoming more and more prevalent. There are third-party tools that you can use for collaboration, of course. Some are very widely used, like Dropbox, and there's another option called ownCloud. Here at the University of Edinburgh, where the DCC is based, we have our own systems that have been built on ownCloud, for instance. But when using commercial services like Dropbox or Google Drive, do be careful. Again, if we're talking about sensitive data or any kind of data breach, then you need to make sure that ownership of your data is actually addressed. Preservation and backup: not the same thing, and this is something that some people do get confused about. Backups are done and maintained regularly, whereas archiving and preservation of data is usually what you do at the end of the lifecycle, for the long term. Sharing of the data: this generally means licences. In the FAIR principles, the R stands for reusable, and this is typically where we are talking about licences.
So one of the most widely used families of licences that you will encounter is called Creative Commons, and when I say most widely used, I'm talking about research data. This table here shows what the different Creative Commons licences are and what level of openness they provide. Typically, when we are talking about open data, we are asking people to make it public domain or to use this CC BY designation here, where the BY means attribution. The more closed licences are the ones further down here, the ones with the NC (non-commercial) and ND (no derivatives) elements. I'm not going to go through each of these step by step, but please do go and look up the Creative Commons licences, which you can easily find online. And when you want to apply a licence but don't know the most appropriate one to use, there are tools to help. This tool here, for instance, will take you through a wizard, ask you questions about your data, and suggest a suitable licence for you to use. Finally, in the preservation stage, what we are talking about here is typically repositories. Repositories can be institutional or discipline-specific. A place where you can find suitable repositories is this catalogue called re3data.org; again, you'll be able to access this when you get the slides. In many cases we recommend that when you are starting your project, you actually look for a suitable repository first. This is a useful way of doing things, because if you can find a suitable repository for your data, that actually settles many of the questions we were just discussing, including metadata, ontologies and licences. So using this catalogue, you can hopefully find a repository. We recommend that you try to find something that is domain- or discipline-specific, because this adds value to your data.
The more you keep your data with other data of a similar nature, the more useful it becomes, because any kind of cross-comparison between the data makes it more valuable. If you are only able to deposit into a more generalized repository, maybe because your research is so different from anything else out there and there is no discipline-specific repository, then your data does become somewhat diluted amongst other data types. Unfortunately, in those cases there's not much you can do about it, unless you were going to build your own repository, which of course could involve a lot of time, money and effort. Check how well a repository matches your particular data needs: which formats are accepted, and whether open and restricted access are both supported. You will also encounter this notion of persistent and globally unique identifiers. Something that you have probably already encountered are digital object identifiers, or DOIs; that's one particular type of persistent identifier. Many repositories will provide unique identifiers for any data that's deposited. Not all of them, but many do, and this is something that you should look out for. The final thing to look out for is a trustworthy digital repository. This means that the repository has been accredited by an independent body. This might not always be possible, but do look out for it. So, yeah, just finally mentioning persistent identifiers. PIDs come in various forms. I described DOIs; you'll also encounter ORCIDs, which are for individual researchers, or ISBNs for books, and so forth. Typically they're actionable, meaning that if you were to type the identifier into a web browser, it will be resolved. And many repositories will assign them on deposit, as I previously mentioned. So that's the end of my presentation, and I'm going to take questions if there are any. And hopefully Thomas will be able to join us as well if his sound is working now. Thank you very much, Venkat.
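As a brief aside on the "actionable" point from the presentation: a DOI becomes a clickable web address simply by prefixing it with the doi.org resolver. A minimal sketch (the DOI below is invented for illustration):

```python
# Turn a DOI into the URL that the doi.org resolver will redirect
# to the current location of the object. The example DOI is made up.
def doi_to_url(doi):
    """Return the resolvable https://doi.org/ form of a DOI."""
    doi = doi.strip()
    # Accept either a bare DOI or one written with the "doi:" prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return "https://doi.org/" + doi

print(doi_to_url("doi:10.1234/example-dataset"))
# https://doi.org/10.1234/example-dataset
```

This indirection is the whole point of a persistent identifier: the DOI stays the same even if the data moves, because only the resolver's record needs updating.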
Again, my apologies for putting you on the spot, but I think you did great. So if there are any questions for Venkat, feel free to send them either in the chat or in the Q&A. There's one question in the Q&A; I don't know if you see it. Maybe we can read it out loud. I'll read it out. Sorry, I started to cough at the end. So the question is: having a lot of different repositories and archives seems to have two drawbacks to me. A, due to the fragmentation across different databases, findability will certainly be hindered to a certain extent. And B, research consortia may put data in multiple repositories, making the workflow sticky. So we would ask, or recommend, that you stick to one repository. When you finish your research project, or when you've finished with any particular dataset, that's when you're going to deposit your data, of course. Try not to fragment. You are absolutely right: try to find one repository that you can deposit your data in. If you do end up putting it into more than one repository, it's not that it's not allowed; it just makes things more complicated. So yes, I agree, try to stick to one repository. As for research consortia putting data in multiple repositories: unless you know of specific examples where that is true, I'm not sure it actually is. So maybe you have particular experience or examples of this, but I'm not aware of that. Okay, I've just got another reply there saying: I mean, you have three different universities in an EU project, and each of the unis has its own repository. So where to put the project data? Yeah, okay. So it depends on the policy of the university or the institution, I admit. There might be a local policy, meaning the institution has its own policy that needs to be adhered to. Now, in that case, if you are in a consortium or some kind of pan-European project, then this is something that you need to discuss from the very start.
You can establish that you will use a common repository, perhaps, because that is the solution there. If that is not possible, then I'm afraid you will have to live with depositing into multiple repositories, but hopefully that's not the case. Well, of course, thankfully, aggregating services like OpenAIRE should make this a little less sticky, a little more seamless as well. But that might not always settle the question of whether a local policy takes precedence, which one is more important, basically. And unfortunately, maybe some institutions make that a priority, but that's this idea of ownership of data. Okay, so there's another follow-up question on this. Sorry, there's another question saying: wouldn't it be better to have the EU forcing Horizon 2020 projects to use one central repository? Yeah. The problem here is this idea of forcing. I would somewhat agree, but it's better to try and persuade people rather than force them to follow these ideas. And that's, I think, where the EU stands with the way these projects are funded: they don't want to enforce any particular solutions onto the researchers. Yeah. We are talking about standardisation, but leave consortia alone with the choice. Yeah, I agree in many ways. We're trying to persuade people; we're trying to make a culture change here. That's essentially the point. We need to try and encourage people to follow these principles, and hopefully that will happen. And it is happening, believe me, it is. But we want to allow enough room for consortia to be able to make their own choices. I don't know if there's anybody from EOSC-hub in the audience who might be able to elaborate a bit on the potential role of EOSC in this process. Mm-hmm. I'm looking to see if I spot any familiar names, but not immediately. Feel free to shout in the chat. So do you want me to go to the questions that were written before? Oh, yes, please. Yeah. Let me just find those. I can hold on. Yeah. Okay. Can you hear me?
Oh, look who's there. Great. I just thought I'd use this short pause to check whether I'd fixed the audio issues. Oh, perfect, Thomas. Then we will just finalize the questions. Yeah, I can go to these questions that Gwen emailed to me; these were sent before this webinar started. Do you mind, Thomas, if I just quickly go through these? Please go ahead. Yeah, okay. So the first question was: what is the human and financial cost of RDM? That's a very good question, of course. We typically say that maybe 5% of the budget of any given project should be put towards proper data management. It will depend on the project, of course, because some projects might produce vast amounts of data, and others might not produce anything, in fact, so it needs to be taken on a case-by-case basis. But on average, we think about 5% of the budget should go towards that. The next question was how to choose the best repository, file format, or file naming; I'm not going to say any more on that, as I hope I've addressed it already. Will Plan S also affect the publisher requirements for research data? I actually don't know the full answer to this; I'm not the best person to ask about it, so I'm not going to say any more. How do academic libraries participate in RDM to provide for users' information needs? Again, that's a very good question, and it will depend on your host institution. Perhaps they need to be involved; certainly the librarians should be involved in building your host institution's RDM infrastructure. And this is something that we at the DCC actually do, by going to many different institutions around the world, assessing what the needs are at that institution, and helping the librarians, the IT professionals and the research officers to come together, along with the actual researchers themselves, to build RDM infrastructure.
Since Horizon 2020 focuses on innovation, do you have any feedback about how project partners from industry and other privately funded entities perceived open data? Were there any legal concerns, especially on intellectual property rights, or embargoes on opening the data? I think I might leave that to Thomas; he may well have a better answer to that question than me. Next question: I would be interested in hearing more concerning the use of data from social media in research, especially in terms of sharing and ownership. Okay, that is a great question, especially with what's happened with Facebook and other platforms, perhaps. I'm afraid, again, I don't have enough knowledge of this. Next question: I would like to hear about working with datasets within RDM systems, and also about the roles and experiences of academic librarians in this part of RDM. Okay. Yeah, I don't really have an answer there either, because you would really need to speak to your local librarians, perhaps. Working with datasets: yes, certainly, as a researcher, a librarian should be someone you can turn to, if your institution has established good training and points of contact. They are the people researchers should be able to turn to to find out how to create data management plans, find out about repositories, and other things like that. But specific examples are beyond the scope of this webinar. What's your recommendation for future applicants from universities to Horizon projects: what are the key things to do and not to do when writing dissemination and communication activities? I think the key thing here, particularly for Horizon 2020 projects, is to create a data management plan. Again, it's beyond the scope of this webinar, but you can find lots of information about Horizon 2020 projects and data management plans, DMPs.
If it's something that you've not learned about yet, please do a simple search for data management plans and Horizon 2020, and you'll be able to find out more about that, and also templates that you can use to put together a DMP. And finally: what's the best and most efficient way to store your data? Okay, hopefully I addressed that in my actual slides. So, I'm going to shut up now and hand over. Okay, thank you very much, Venkat. I just want to make use of this little interval to point you to a couple of resources that were also created by OpenAIRE. If you go to openaire.eu/guides, you will find quite a lot of guides and fact sheets and resources that are linked to the topics of both Venkat's presentation and Thomas's. So feel free to take a look there and browse around. But that being said, I would say, Thomas, let's try to start your presentation. Yeah, absolutely. You can hear me, right? I can hear you fine. So, everybody, this is Thomas Margoni from CREATE, which is the department of the University of Glasgow that deals with copyright issues. Thomas is going to talk about copyright and legal issues related to research data management, and I think he's also going to address ownership issues, right? Yes, I hope so. At least, these are very complex issues, and I really hope that we'll be able to do this in an understandable and accessible way. Can you see my slides for now? Okay. Do we have to do that again? Yes, okay. Now we see it. Is it fine, or do I have to full-screen it? I think you have to full-screen it, because now we see your navigating mode. Okay. Now? Still navigating mode. Yes, this is full screen. Perfect. It's full screen? Okay. Okay. So I think it's better to keep interest high and the time within a manageable amount, let's say. Is that okay? Yes, 20 minutes is fine.
I mean, we can go over time, since we started late. No, just, you know, as often happens with legal analysis, it's really exciting and interesting, but over time, I think we discover that this applies only to lawyers, and other people find it extremely boring. So also to counter this lawyer effect, I'll try to be as concise as possible. So yeah, well, thank you very much. Thank you to Gwen for inviting me and organizing these activities. Thank you also to Venkat for a very interesting presentation, and sorry for the initial problem with my audio. I will go over some of the issues that Venkat mentioned, because I found his presentation extremely inspiring, and I do think he has identified a couple of basic concepts that are the reason why we have problems with data from a legal point of view. One of these concepts, the first one, and the aspect on which I focus most of my research attention, is precisely that of data ownership, which is a very strange and, to some extent, recent concept. Because if we look at the area of law that traditionally regulates, let's say, information property, well, one of the basic assumptions is that data as such is not really protected. What is protected is an original expression. So the usual example is a plot about a doctor who goes mad and creates a monster that looks like him, which eventually will underline the philosophical tension between the human condition and creating life. Well, everyone can do that: this is an idea, and it is not protected. Now, if you start using the expressive form that Mary Shelley did, or any other author, then you are in the area where this form of so-called property, though here too we have some clarifications to make, applies: that of copyright. But copyright doesn't protect data. It doesn't protect basic information. It doesn't protect facts. It doesn't protect any of these aspects. Why? Well, because they are the basic bricks of our knowledge.
The reproducibility crisis that we were mentioning shouldn't, unfortunately, be too surprising once we notice how much research has switched to a model where data has an economic value. And obviously, if it has an economic value, then the closest tool that the law offers you is some form of property. So there are often these associations: oh, that's my data. When the truth is that, you know, it's not really your data; it's simply some basic information. And international conventions in the field of copyright and related rights usually clarify this aspect: only your original expression, the output of your ingenuity, the original choices that you make, only this can attract protection. But the basic data, no. Which doesn't mean that there are no instruments that can offer some sort of proprietary protection to data. But then what is protected in this case is not the data themselves; it's not a single datum, and it's not any form of data aggregation. Rather, we have specific legal requirements that tell us that the data is protected if it is structured in a specific database. So there needs to be some sort of methodical or systematic arrangement, et cetera, et cetera. I will not go too much into these legal details; we have guides where we try to explain them, some of which Gwen showed us just now, and others are in my slides, and I refer you to those guides if you want the legal details. But I think the main message that is important to understand now is: data ownership is the enemy of reproducibility. Because to the extent that you have to ask for permission to use someone else's data, you are creating barriers to this reproducibility. And this is the first main aspect to keep in mind.
The fact that data as such is not protected, and should not be protected, really has to do with basic freedoms, such as freedom of scientific research, freedom of information and freedom of speech. And this is quite clearly stated in international conventions. In Europe, we're a bit peculiar, in the sense that we decided to offer one additional layer of protection to data: the famous, or infamous, sui generis database right. And in that case, you do not need creativity. You need a substantial investment in collecting, verifying, or presenting the data, but not in creating it. Once again, all the necessary information is in the guides, but the main problem here is the same. You do not want, as a legal system, to favor forms of property over information, because this limits basic fundamental freedoms. And again, if you impede the free flow of information, if you condition it on specific barriers, if you create transaction costs around it, then these are the spaces where reproducibility issues arise. And now, after this nice statement, I am basically going to tell you exactly the opposite. So what has happened over time? You have to imagine that copyright laws, most of them, were written 200 years ago. Back then, the attention was mostly on books, maps, songs; data didn't really attract much attention. All this relevance of the role of data is certainly the result of the last few decades, the last few years, of developments in data analytics, from text and data mining to machine learning, et cetera. So a legal system that was designed to protect knowledge 200 years ago, with minimal updates, had to offer some sort of answers to modern questions, questions around data. 
And whereas the main principle survives (data as such is not protected), it turns out, unfortunately, that at the end of the day you have to ask for many permissions. Because sure, the data as such is not protected, but because you need to make a copy of almost any support where data is saved, then almost always, not always but very often, you have to make a reproduction of, for example, a database, and that is protected, and then you need to ask for permission. So there is this kind of mismatch between what is the basic legal principle and what is the applicability of the law to a specific case. Now, you can perhaps connect this hopefully not too abstract scenario to the debate that we had around text and data mining, on whether we need a text and data mining exception or whether "the right to read is the right to mine". This dichotomy, I would say, mirrors very well the tension that I just explained to you. Of course, the right to read is, or should be, the right to mine: if I read something, I don't need permission, but if I do the same through a tool called a computer, then I need permission, and that doesn't make too much sense. But unfortunately, there is so much legal uncertainty in this area that, at the end of the day, in the opinion of the majority that approved the directive, it made sense to have a specific exception. So the text and data mining exception allows you to do certain things with data. But this is also an implicit acknowledgement that, well, we can say that data as such is not protected, but at the end of the day a lot of the ways, or the forms, in which data is stored or contained or saved or presented, well, those forms are protected. So we are kind of surreptitiously denying those basic principles of the free flow of data and creating some legal barriers. 
The other aspect I want to mention, before we spend a few words on article three, the text and data mining exception, which has been approved and should enter into force over the next, I think, one year and eight months, is another point that Venkat identified in his presentation, but I will not go into the detail. From the point of view of a researcher, data is data. So whatever the legal impediment, this is, I guess probably rightly so, not something that should bother the researcher. What really matters to you is whether you can or cannot do a certain thing. Whether the reason why you cannot do it is grounded in copyright, or personal data, or freedom of information, or public sector information, or contracts, or technological protection measures, becomes a bit more of a secondary issue, I would imagine, to many researchers. Unfortunately, from the legal point of view, we cannot work that way. Even limiting our analysis to copyright and personal data, these two areas could not be further apart in terms of what the principles are, what the reasons are, what the aspects are that we want to protect. So many of the things that we can say about data ownership look at a certain area of law, but when we talk about personal data, we are looking at a completely different area of law, and the tools, the permissions, the kinds of activities that you can or cannot do work on a completely different basis. And within OpenAIRE, we have a task force where a colleague of mine and I are working on these two issues. I am focusing, as you have probably gathered at this point, on the copyright, proprietary and ownership issues, whereas the colleague, whom you probably know, Prodromos, focuses on the data protection aspect. I will not say much more about data protection, for this reason. 
I don't know if Prodromos is giving a similar talk within the open access week, but certainly the guides that OpenAIRE has prepared cover both copyright issues and privacy, or data protection, issues. So, very briefly, on the text and data mining exception. First, a basic clarification: certain member states have implemented a text and data mining exception on their own, the UK for example, but because this was done within the old European framework, all these text and data mining exceptions have to be limited to non-commercial purposes. The European text and data mining exception, by contrast, is article three of the Copyright in the Digital Single Market Directive. The directive was approved last April, and since its publication member states have two years to implement it. So member states are right now in the process of implementing this exception. This is probably something that you don't have at your disposal today, but it will become available very soon. And this will be the big advantage: it will apply in almost exactly the same way across all the, at this point I don't know whether to say 27 or 28, countries. So that's a big advantage also for the issues connected with large consortia, which sometimes have to follow different national legislations. In this case, article three would apply in a very, very similar way everywhere in the EU. And article three did some good things. The definition of text and data mining (I skim through it quickly) is broad enough to cover almost any data analytics. The literature hasn't found a big problem here. The scope, unfortunately, is quite limited. It's limited to the right of reproduction. So you can make a copy, for example, of a database if you want to extract data, but you cannot communicate or distribute that copy any further. This should be contrasted, for example, with the situation in the United States, where fair use says that these activities are transformative, that they create added value. 
So there, you don't have this limitation to reproduction, for example. You can do whatever you want; you can communicate your results to the public. On the basis of the EU solution, well, if your results reproduce in part the original, in theory you cannot. So you see that we have a big problem here. Beneficiaries: again, the EU solution limits the availability of this exception to research organizations, for research purposes. Probably many of the attendees of today's webinar belong to one of these organizations, so that's probably less of a concern, but we have to keep in mind that here we are cutting out all the small and medium enterprises. One could say, well, you know, the commercial sector should pay for it. Sure, but we're also cutting out all the startups and small and medium enterprises. And the difference is that, you know, incumbent players do have the money to pay for, let's say, a license, whereas startups don't. So this condition, which at first sight might sound reasonable, has the effect of marginalizing further the new entrants in the data analytics field, whereas the big ones don't have a problem with it. And again, I understand this can get a bit technical, but the good thing is that if you're doing text and data mining, either under, let's say, the UK exception or the future EU one, and the terms of use of the website say you cannot do this (for example, the terms of use say: by accessing this website you accept the terms and conditions, which no one ever reads, but at some point, in article 23.1.6.8.44, it says you cannot text and data mine), well, that provision is void. So this is a good thing. However, if the same effect, that you cannot data mine, is achieved through encryption, then, well, the discourse becomes a bit more complex, but the general answer is that you have a problem. I'm happy to discuss this further; I don't want to go into the details because it can get quite technical. 
And again, I find it extremely exciting, but perhaps, you know, I'm the only one here. Now, the good thing, or apparently a potential good thing, of the exception is that, in its latest version, it created a new article that allows everyone (you see in point number three, not only research organizations) to perform text and data mining. However, in this case, right holders can limit it contractually. So the famous article 4(3) point, et cetera, et cetera: in this case, under article four, such a clause could be enforceable. So then again, you see there is always this tension between openness and non-openness in this specific area of data and data ownership. Again, if this sounds complicated or tedious, well, that's understandable; you're not alone out there. We tried to create a few guides, and here I have reported the main guides that we developed with OpenAIRE. These focus, as I said, on copyright, ownership and reuse. So Venkat earlier on showed some licenses, Creative Commons; here we have a bit of an analysis of what you can do, what you can't, how you can combine different licenses, for example, et cetera. And I'm sure that those slides will be available, so feel free to click on these guides, have a look; hopefully they are easy enough to understand. And, you know, if they're not, please let me know. So, this is an extreme summary of my presentation. I think the main message is that, surprisingly, and probably also disappointingly to some of you, data is not and, most importantly, should not be owned. And however much you may feel a strong relationship to your data (because, I don't know, maybe you have spent the last five years collecting it), there is a much larger public interest that requires that data to be open. Obviously, we're not in a black and white situation. 
No one will oblige you to disclose all your data immediately, before you have had the opportunity, for example, to verify it or to perform a number of activities. But, you know, these are details if we compare them with the problem connected with the fact that we think that data should be owned. It's like saying that ideas should be owned: that if you think the idea of, I don't know, a flying object is good, well, then it's yours, and no one else can build airplanes. And I'm afraid that, in the case of data, we're only starting right now to understand the far-reaching consequences of these assumptions that we are making. So hopefully the application of open science principles to the field of data will guide us in the right direction. Thank you very much. Thank you very much, Thomas, for this very interesting presentation. I'm not going to let this Q&A run for too long, because we're going a bit over time, but I would just like to refer back to the two questions in the list that we put in the chat, which Venkat also referred to. The one question was: are there any legal concerns when it comes to IPR with private partners in industry? I don't know if you have any feedback on that from Horizon 2020 partners? Well, you know, sorry, it's a huge question. Usually when you sign a grant agreement, there is a specific section dedicated to IPRs: the IPRs that already exist within one entity, whether those should be considered owned or not owned by that entity, and what happens if part of those IPRs are reused by a partner. But really, it depends on what you write in that grant agreement. So, unfortunately, I'm not aware of whether there are guidelines, and if there are not, it's something that probably should be done: a guideline or a best practice on how to write that clause. Unfortunately, here we are always in the same situation. 
As a researcher, it's usually not you but your TTO or grant team at the university who writes that clause. And their job, if they want to do it well, is to protect their employer. So they normally apply a very restrictive clause: everything that belongs to the University of Glasgow belongs to Glasgow, it's ours, and you cannot reuse it unless specifically authorized. And I agree (again, sorry, I keep citing Venkat) with his statement that we're trying to change the culture here much more than the law. I mean, this question that you just reported is not so much a legal question as it is a cultural question, because honestly, it depends on what you want to write in that clause. The underlying question is: should universities funded with public money really own the IP on what they have produced? Well, it is quite clear in the literature; there are clear cases showing that it's inefficient to recognize these IP rights. Every now and then, if, let's say, the IP right is a patent, you hit the jackpot and you make a few millions. But the truth is that the general cost of managing and administering all this IP largely outweighs the advantages of exploiting IPRs, from a university point of view. But then again, this is not actually the direction where many universities are moving. Obviously it has to do with bigger questions connected to public funding, et cetera. But yeah, I think I would stop here. Okay, thank you. So the next question was about social media, but maybe I'll propose that we refer that question to Prodromos; am I correct that he would be the one? There's a couple of questions in the Q&A window. Yeah, yeah, I was going to get to them; I just wanted to make sure that we covered all the questions that people sent in beforehand. So the one question was about the use of data from social media in research, especially in terms of sharing and ownership. 
So, Thomas, do I understand you correctly that that would be more a question for Prodromos? It depends what data, right? Whether it is personal data. Yes, I mean, don't get me wrong, it's not that I refuse to even consider personal data issues; I do have a working knowledge of those, but the guides have been written by him. But don't get me wrong: even when you, I don't know, do some scraping of Twitter or, you know, whatever other website you want to acquire data from, there are two types. The data that you are obtaining could be personal data. If it is personal data, meaning that it identifies or could identify an individual (let's say an email address), then in this case you have to look at data protection issues. And yes, consent: so the main thing of data protection, of the GDPR, is that it tells you that you need to have a legal basis in order to reuse that information. And consent is certainly a very important legal basis. However, it comes with strings attached, in the sense that consent can be withdrawn. So, you know, you acquire consent, but in, who knows, one year and a half, someone writes to you and tells you: please, I am withdrawing my consent, eliminate the data that relates to me from your database. In that case, you have an obligation to do that. Consent also has to be specific, specific for a purpose. So if you acquire consent because you want to, I don't know, analyze social interactions, you cannot reuse the same data under the same consent for a different purpose, such as analyzing, I don't know, biological interactions, if that means anything; I'm making up examples. But the problem with consent is that it has to be specific for a purpose. And it has to be time limited. So, you know, you have to indicate the time, and at the end of that time you have to delete the data. 
So what we're doing in the guides is actually identifying situations where, you know, a different legal basis operates better for you. Sometimes, depending on the situation (but again, I refer you to the guides), a legal basis other than consent may apply and may offer advantages for the kind of activity that you're doing. So in this case, for example, legitimate interest or the fulfillment of a contract may offer you a different set of opportunities or limitations. This is something that cannot be answered in general; it has to be analyzed in the light of the very specific situation. And then again, the data ownership or copyright aspects are completely different. One would need to see what the terms of use of that specific website say and whether they limit or permit certain data analytics; you have to analyze whether, depending on the jurisdiction where you are, there is a text and data mining exception that applies to you, whether it is limited to specific scopes, whether it is limited to specific entities, or whether it is limited to specific types of uses. And then again, you know, you almost have to develop a case-by-case analysis; there is really no other way around it. But what we're trying to do with the guides (and sorry that I keep promoting the guides, but I really think this is one of the main contributions of OpenAIRE in this specific field) is trying to standardize a set of common issues that researchers might encounter and offer answers to these types of situations. They won't be as detailed as your individual situation, that's simply impossible, but hopefully they will get as close as you can get. And you can use those guides when you go to talk to your technology transfer office or to your university grant team, and show them that the guide says something else than the usual default disappointing answer. So maybe, again, this can help change the cultural approach to these types of issues. Okay, thank you. Before we close, I'd like to go back to the Q&A window. 
There are three remaining questions there. The first one is by Sebastian Lange, who basically makes a statement rather than a question. I'll just read it so that you all see it: part of science is also that it arises from itself and is not dictated by a superordinate political organization, even if it would be nice to have a uniform structure; structure is always a constraint on science, especially when it comes to measurability. As there is not really a question there, I'll take the liberty of going to the next question. Sorry, Sebastian, but we're running out of time. There's a question for Thomas, by Petra, again related to the ownership of data: how can you explain to patients that their data cannot be owned by them? I don't know if you want to comment on this, Thomas; I think you already partly answered it in your previous comments. Yeah, it really depends on the data, right? It depends on whether these are personal data, meaning that they identify the individual. Well, ownership is not the right word, because ownership means property, and if you own something you can sell it, whereas you cannot sell your personal data; you can do certain things, but not others. So it's not ownership, but there is certainly a connection: it's like, you know, your personal data are extensions of your personality. But you wouldn't say that you own your personality. I mean, there is some sort of attachment between you and your personality, but it's not something you can take to the market and buy or sell. So ownership is not the right term here. There is certainly a power to control, but only to the extent to which certain data are identified or identifiable. 
Because otherwise, there is, you know, this balance that you have to make between public and private: combining and linking all these gigabytes of data can help make new discoveries, and that's a good thing. Okay, so let's take the final question. It was to be expected, and I'm surprised that we kept it at bay for 90 minutes, but the question is: what will happen to research data stored in the UK after Brexit? For example, data from research projects in which a number of universities from different EU member states participate: which law applies, UK or EU law? I don't know. Venkat, Thomas, you're both based in the UK, so I don't know if you both want to answer. I will say that it largely depends on, you know, what the law will say. So right now there is a negotiation of, you know, this whole withdrawal deal. I would imagine that it doesn't go specifically into this very scenario, but it probably indicates something about the future relationship between the UK and the EU. But then again, don't expect that the law can offer you every single answer to your question. There are certain things that are not regulated by the law, but maybe by contract. So if you store the data in the UK, what does the agreement say? Can you get it back? Again, it really depends on the details. Yeah, I agree. It's too early to say, I think, but if it's EU-funded research, then I think it will be very difficult for the UK to block access. Sorry, yeah, I was just going to finish there. Okay, well, let me finish the entire webinar on that slight cliffhanger. So thank you, thank you very much to Thomas and Venkat for agreeing to talk here; I think both of the webinars were very interesting. 
Again, my apologies to you, the audience, for the slight technical problems at the beginning, and to you, Venkat, for making you rush through your presentation because I communicated the time wrongly. All of these presentations and recordings will be put online: they will be on the OpenAIRE YouTube channel and on the OpenAIRE webinar pages, and we'll also distribute them via social media. You will also receive an email from us next week with an evaluation form, and it will also link to all of these recordings. You'll definitely get them sometime in the next week; it might not be today or tomorrow, but rest assured they'll be there. So, have a nice afternoon, I'd say, and I'm hoping I will see some of you back in one of our next webinars this week. Thank you very much. Thank you.