Hi again, everyone. I am Ellie, from the research center. I don't know how many of you were present during the last course we had. You can't see me at the moment, so let me quickly share my camera so you can see what I look like. Hi everyone. I will now turn it off because we have network and bandwidth constraints; there has been too much rain in Greece these days.
Today is a very exciting day. Not only will you see demos of many tools that help you apply the open science practices and principles we covered in the past courses, you will also get the chance to practice them yourself by following a few exercises. You will see how FAIR your data actually is, and whether an article you have already published somewhere can be offered in green open access by depositing it in a repository. These are the things we will touch upon today.
Towards the end we will also have a guest speaker. He will talk about the more sensitive aspects of data processing and showcase a tool that supports anonymization specifically for data. He will also give us some insight into the differences between anonymization and the other mechanisms used for data protection, and into what the GDPR guidelines say.
All right, it's time to have some fun today. I tried to follow a logical path for this session. It's not a presentation, as we said, just a quick recap.
By now I think you know more than me and Emma about the research data management life cycle, its steps and the activities under each step. But just to recap, this is the research data life cycle with its different activities.
We start from the hypothesis: when we plan our data, we have to consider the costs associated with each activity. We then move on to data collection, where, if we are reusing data, we must make sure we don't violate any copyright: we follow a copyright clearance process and we give credit through citations to the people whose data we are reusing. During processing, we use open source software and open interfaces to clean and tidy our data and prepare them for analysis.
For storing data and results, we use service infrastructure. Throughout the research project we might store results locally, or use infrastructure provided by our institution, such as cloud drives or storage facilities. We attach a persistent identifier to our results, we describe them with metadata, and we publish the metadata under an open license. For long-term preservation, we use services that safeguard the preservation and integrity of the materials and that produce standard metadata. That means we don't just introduce some fields of our own for information exchange; we follow an established standard for data exchange.
For publication and distribution, we publish the data under an open license and use open evaluation if applicable.
We ensure links between publications, data and methods, and we make use of repositories. For reuse, we ensure the accumulation of credit and clear citations. Again, this is just to revisit what has already been said during the past courses.
So let's see how and where we can find open access information that we can immediately use for our research. This first mini demo is of OpenAIRE Explore, which is the search engine of OpenAIRE. Let's see how it works.
Do you still see the PPT, now that I'm using the browser? I think we see the PPT. Okay, so let me stop sharing. This is such a fun session when we are face to face; we're really trying very hard to make it as seamless as possible. In the meantime, I will paste the links to the tools that Ellie is showing into the chat so that you have them. That would be a good idea, thank you. So they can check Explore now. Okay, let me share my screen. Do you see Explore? Yes. All right, perfect.
So this is explore.openaire.eu. As I said, it is the search engine of OpenAIRE. You can see how much content we aggregate and all the different sources we use, too many to count. We have tried to categorize this content and create entities from it, to make it easier for people to find and access information, and to do so more accurately.
You can search all content, so whatever is indexed in OpenAIRE. You can search by research outcomes, meaning publications, research data, software and other research outputs; under other research outputs you might find workflows, reports, posters, presentations and similar outputs. You can search by projects, and there you will be able to see all the results that they have produced.
You can find content providers, such as journals, data repositories and literature repositories. There are differences in the type of content stored and preserved in repositories, so you have to make sure you search for the repositories that hold the information you want. And of course you can search by organization and see what outcomes they have produced.
Let's say I want to search for this output, and I will use all content. Let me minimize this. Is it still visible? Yes. Okay, great. Once you click search, you can see the results, and you can check which of the categories I mentioned each result falls under. Here we are talking about a research outcome, not a project, not a content provider, not an organization. It's open access, so we can also see the access mode of this output.
If we had run a broader search, we would have more results, and we could filter them by publications, research data or software. There are several criteria here: you can set a year range or pick this year, last year or the last 10 years, and you can filter by funder. Here the funder is preselected and there is only one result, because the search immediately found what I was looking for, but you could also filter by type, language and all the other extra information that helps you refine your search and find what you're looking for. You can always download your results as a CSV here and use them in your research, for instance for some statistical analysis.
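As a rough illustration of the kind of statistical analysis a downloaded CSV allows, here is a minimal Python sketch. The column names (`Title`, `Publication Year`, `Access Mode`) and the sample rows are assumptions made up for this example; the headers in a real export may differ and should be checked first.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded results file.
# Column names are assumptions, not the actual export format.
SAMPLE_CSV = """Title,Publication Year,Access Mode
Poster on data sharing,2020,Open Access
Sensor dataset,2019,Open Access
Journal article,2020,Closed Access
"""

def summarize(csv_text):
    """Count results per publication year and per access mode."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    per_year = Counter(row["Publication Year"] for row in rows)
    per_access = Counter(row["Access Mode"] for row in rows)
    return per_year, per_access

per_year, per_access = summarize(SAMPLE_CSV)
print(dict(per_year))    # results grouped by year
print(dict(per_access))  # results grouped by access mode
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper; the counting logic stays the same.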
Now, this is my record, and I will click to view it. As I said, OpenAIRE aggregates information, so it doesn't store the content itself in its database; it provides all the necessary links to the accredited sources that store and preserve this information. We can see here that this is a publication, specifically a conference object. It was published in 2020, on 17 November. This is the title, and these are the associated authors. It's open access. We can see how many people have tweeted about it, and, although we cannot see them here, we also provide other metrics that you can consult.
If I want to check the full text, I have to click on the DOI. I'm then redirected to another record, where it's actually preserved. We see that this was a poster. It has a DOI, the keywords, everything, and I can preview and download it from this platform, whatever platform that is; in this case it's Zenodo. I think this is what I wanted to show you here.
Can you see the PPT now? No, we still see your browser. Stop and share it again. Okay, now you see it, right? Yes, not in presentation mode. Okay.
So now we would like you to explore Explore. Find a CHIST-ERA project that you have worked on. It should be recent, let's say from 2017 onwards. Find the project, and we can give you the rights so you can share your screen and show us what you're doing. We want you to show us the datasets and publications of this project. You have the link to Explore in the chat. When you are done, you can raise your hand and I can give you the rights. Exactly.
Okay, what do I see? Yes, I see Maria is ready. So Maria, let me promote you as a member.
You should now be able to speak and also to share your screen through the button at the bottom of your screen. Okay, thank you.
So I think you are seeing my Firefox. Yes. I entered VICINITY, and there were so many projects that I filtered by European Commission here. And here is the project. There are many publications, but I found this one because I had looked for this dataset before. This is the publication. There should also be a dataset linked to it, but it's okay. Then I have to show the publications, right? The data and the link are here. So let's see, for example: these e-word experiments are the data associated with an open access publication. There are two entries in Zenodo because we had to make some changes after the review. This is the second one, which has the data and also the software and the R files. And then I think you asked for this link to the project.
I was about to ask; you can foresee things. Because I saw it on the slide, sorry. Ah yes, so that was the first part. You can move on to the link part. Okay, so this one. Yes. Oh, I'm not... so I cannot continue, I guess. Do I need to show something else? No, it's okay if you can't do this part, don't worry. Okay, then I stop sharing, right? Thank you very much.
Yes, next I wanted you to find a dataset in this project and link it to the project. So, is there anyone else who was fortunate enough to find something? Or not; that can also be a use case that we look at together. I see no raised hands. Okay.
I just wanted to say that CHIST-ERA is a consortium with many funders, so each funder is a different entity with a different ID. What we're doing now is creating a unique and persistent ID for them. Well, we're not doing it; CHIST-ERA is doing it, to facilitate other research data management needs that they have.
We're creating a unique persistent ID for CHIST-ERA, which we will also be using from now on. Sorry, in the example that you showed us, you searched by the European Commission as the funder, I think you mentioned, right? Which funding organization was it? I don't remember. You mean the one I showed? Yes. So I looked for the VICINITY project, and it is a European Commission project, an H2020 project. Okay. And it's under ICT. Okay.
So let me find one CHIST-ERA example. Let me see the projects. I don't know if you have a CHIST-ERA project that you want to suggest we search for together. Maybe there are some CHIST-ERA researchers who would like to search for their own work; otherwise I could give you many examples. One example you could go to... well, let me do it myself, perhaps. I found this one. Let's go for course model. Yeah, that's one example.
So if we search, it says no results, but that's because it's not a research outcome, it's a project; so here we can see the project. But we see that it currently has no publications or research data. Because the project is more than a year old, I'm pretty sure there are publications, but we cannot see them here. We have to know exactly where to go to find them, and not only us but other researchers as well, so it creates a bit of confusion.
If we know where these publications are, we can link them. I have to sign in; you need an OpenAIRE account for that. Then you can search OpenAIRE, find the result, and link it to the project. Very easy: you just click link. This is after I clicked link.
It shows me this: I have to search OpenAIRE research outcomes. I don't know whether I can find the research outcomes here. Let's see. If I had a paper to search for, I could add it here and link it to this project, but currently I have nothing. So this is how you link. Is it understandable?
Yes, I guess so. We can also say that this is normally done automatically by the OpenAIRE platform, but for some repositories we are not able to retrieve the links. So this is how you can manually link your output to the project. It's not something you always have to do, but if you do not find a particular publication or other result assigned to your project, you can still add it manually in this way. In the end you will have everything in one place, so you can provide this information immediately, without losing time, when you're called to report on your activities within a specific timeline.
Is there something in the chat? Where are the projects retrieved from, and can we create a new project in OpenAIRE? So, as you saw, we aggregate many sources; I've lost count, but it's some thousands. One of them is CORDIS. Another one will be CHIST-ERA. There are journals too; for projects, though, it's CORDIS. As for whether you can create a new project in OpenAIRE: no, you cannot, because it has to be a legitimate project that receives funding, not something at the proposal stage. Only submitted proposals that are successful are indexed in OpenAIRE. I think we answered that question; I don't know if Emma has anything to add. No. Okay.
And then: can you add to Explore publications that are not open access, for example from IEEE Xplore?
I think you can. Yes, so the thing is that we have a list of sources from which we take this information. For publications, we sometimes retrieve information directly from the publishers, or we use Crossref. You can also add a publication manually using its DOI. So this is possible, but then the record will not be open access. Sometimes, and I don't know if Ellie can show this, you already find versions of an article that are not open access; some of them are taken directly from the journal website. I hope that answers it.
Then we have another question, Ellie: I'm asking because I can see our project is there, but only from the Austrian partner. Yes. This will change; we are working with CHIST-ERA so that you can view all the different partners and funding organizations under a project. Ahmed, if you want to add something on that.
Well, yes, it's a bit of a test. As Ellie already mentioned, we will soon assign identifiers to all projects, not just the new ones. We will try to collect as much data as possible on older projects, so that it is easy to monitor projects that have already started. But the complexity is that there are a number of identifiers and a number of grants related to national funding agencies, and not all projects acknowledge CHIST-ERA in the same way. This diversity creates the complexity, but we're working on it.
There is also a question in the chat from Julia. She's asking: if the publications of a project are missing, can you add them, or is it all automated? The thing is that publications and other results have to be deposited somewhere. If you need to deposit something, for example an open access version of your paper or a dataset, you should use one of the repositories that are linked to OpenAIRE, or, if you have none, you can use Zenodo.
In order to link a result to the project, it must be deposited somewhere. You choose a repository, and then OpenAIRE goes to the repository and retrieves the information. The only repository that is connected to OpenAIRE directly and immediately is Zenodo; for all the others you have to wait a couple of weeks to see your deposited record in OpenAIRE. So this is how it works. Ellie, I don't know if you have anything to add here. No, I'm also looking at the chat, but I think that was it.
Some extra information was provided for this question: I have a project I work on called Clotilde, I don't know how to pronounce it, with no datasets yet, only publications. That's because you can see which publications are open access. Okay, I understand. Is this a CHIST-ERA project or is it from another funder? If Julia can... no, it's an ERC project. Okay, so it's probably not there because the European Commission gives us the data about new projects, I think twice a year, so sometimes for the first part of a project we don't have the information. No publication has appeared yet. Okay. You do have a paper, but is this paper deposited somewhere in an open access version? Yes? Okay, so it's probably a matter of time. It would be good if you could use the link functionality; if you have any problems, just let us know, but I'm sure you will be able to resolve this.
One thing we can add here, since you are all from the ICT domain, is that there are three ways OpenAIRE connects projects and results. The first is by looking at the metadata of the repository where you deposited the result. It is not always the case that the repository provides a field for entering the grant, and sometimes, even if it is available, the connection with OpenAIRE does not pick it up. So the second thing we do is that periodically we download all the texts.
That is, the full text of the papers. We have an algorithm for text and data mining, and we check whether the paper contains a reference to some project. The third way is what Ellie just showed, the manual link. So if neither of the first two methods works, you can still search for the publication and link it to your project. These are the three methods.
Julia, if you want, you can check whether your publication is in OpenAIRE, and if it is already there, you can link it to your project as Ellie was showing, because probably the information was not... I'm just realizing, sorry. I see Julia says: we acknowledged another project, I was confused. Okay, so that's good. Another project, no problem. But this is the way it works; these are the three methods we use for linking.
I have to say that in my daily work I get many, many requests from people asking to remove a link to a project. In some cases people tend to acknowledge all the projects they are working on. This is done for many reasons, but it's not always a good idea, because when you acknowledge a project in a paper, we find it and we send the publication to the funder. So sometimes people get in trouble because they acknowledged a project they shouldn't have, and then they have to report scientifically, within a project, on something that has nothing to do with it. This is something you should be aware of. No, no, don't worry; these are very useful things to know.
Okay, I think you answered that, so I can move on. We were here, looking at metadata. We're about to see a tool that can help us with metadata and with finding which metadata standard to use. So this is a very useful tool.
It is an index of metadata standards, categorized by subject area, which makes it very easy to find the best solution for you based on the research activity you're doing. Let's see what it looks like. I'll have to share it. Okay.
So this is how it looks. The metadata directory was created by the Research Data Alliance. I don't know how many of you know it: the Research Data Alliance is a global forum with many working groups and interest groups, where people can share interests and find others to work with, voluntarily, on the major needs and issues that their specific discipline is dealing with at the moment.
Under this category you can view the standards that are out there. This is one way you can view them. In the whole list, you can see the title of each standard, you can edit it, and you can see a brief description of what the standard is about. You can then view the standard itself: whether there are any extensions, and whether there are any use cases you can consult, which may help you find commonalities with your research and so make it a better candidate for you. You can be directed to the standard's web page and view the schema and its different versions.
So yes, this is how it looks. You can view the extensions associated with those standards, because some have only minimum metadata while others also provide extensions for you to use in your platforms.
You can view the tools that support metadata here, you can view use cases from people who have already used some of those standards, and you can browse by subject area, which I think is very handy. If we search for materials science, we see that the standards that come up include one about crystallography and a core scientific metadata model. This one is an international standard for the storage and exchange of neutron, X-ray and muon experiment data, so it is for describing this kind of data. You can explore how this standard has been used, and you can learn how to use it by going to its website.
So now I want you to go to this website and find a metadata standard for your data, for what you've been working on recently, or just choose a metadata standard for your data from this list. Once you have selected a standard, raise your hand so we can give you the floor.
I think there is another question from Julia in the chat: I have another question; this paper is acknowledging two projects, but I guess, related to what you were saying, it can only be linked to one. No, it's completely fine to link more than one project, if the work is something you have done in conjunction for both. No problem. The only thing you have to take care of is that, when you have your project review, you should be prepared for the reviewers and the project officer to ask you to explain how this single paper is connected to both of your projects. It's fine to have synergies and overlap in the work you do for two projects. The important thing is that it is related to both.
What I was referring to before, the request I always get, is to remove an article from a project because the article is not related to the project at all. This happens quite often; many Italians, at least, ask me to remove papers from projects. I don't know if you have had any such requests. Yes. Sometimes people add projects that are not related, for many reasons, for example administrative ones. But the thing is that OpenAIRE provides a monitoring service for the EC, and one of the things the EC and the other funders ask us to do is to measure impact, also financially speaking. So, for example, if you pay for a conference trip with project money because you have a conference paper, you should provide the funder with proof that the paper is scientifically connected to the project.
Okay, Julia has something else: about linking the paper to the CHIST-ERA project, because it doesn't appear linked. Yes, you can do that. You will probably see it linked in the coming weeks or months, because when we run the text and data mining algorithm we will find it in the acknowledgement. But this is probably because CHIST-ERA for now doesn't... I don't know if this is an old project or a new one. We haven't really looked at all the CHIST-ERA projects at this moment; we will, but not now.
Isn't it weird that it was linked to one project and not the other? Yes, it can happen, because if the CHIST-ERA project is not linked to a funder that we already serve, we probably don't have the project, or we don't have the text and data mining algorithm set up to search for that project. It can be this, or the other option is that, for the other project, we found the link in another source. Maybe this is in a repository that only gives us the option of a single project.
I wanted to say that it also has to do with the data you add and the information you provide to the journal where you publish. If you acknowledge both funders, it will go under both. But if something is missing on your side, for instance if you forget one piece of information, then the link may not be found.
If you want, Julia, you can paste the link to the paper in the chat so we can do a quick check to see if we understand why. It's not always the case that we can; sometimes we need to go back to our technical team and ask them. In the meantime, I don't know if people had the chance to search for metadata standards to use in their data activities. Would anyone like to...? I'm just putting the link to the metadata standards in the chat. Sorry, yes; otherwise they won't find it. Don't worry, I can do it. Okay, that's the link, apologies. If you can, go to this link and find the proper metadata standard for you. Note that in some disciplines, and for some specific kinds of data, a metadata standard might not have been invented yet, let's use this verb. So it's not a problem if you cannot find one, but try to find something that is very close to the purpose of your research and that characterizes your data.
In the meantime, I found that the article Julia was pointing us to is in one of the repositories, and there the metadata only has the link to the Clotilde project. This is probably why we associated it with only one project. I will give you the link to the repository, and there you can see that the only European Commission project mentioned is Clotilde. I see that this was published in 2020, so we probably haven't yet downloaded the text and run the text and data mining on it. That's probably why we haven't associated it with all the projects yet.
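To picture how a text and data mining pass could pick up a project that the repository metadata missed, here is a minimal Python sketch. It is not the algorithm OpenAIRE actually runs; the grant numbers, acronyms, registry and regular expression are all hypothetical and only illustrate the idea of scanning an acknowledgement for grant references.

```python
import re

# Hypothetical registry: grant number -> project acronym.
KNOWN_PROJECTS = {"123456": "PROJECT-A", "654321": "PROJECT-B"}

# Assumed acknowledgement phrasing: "grant agreement No 123456",
# "grant number 654321", and similar. Real texts vary far more.
GRANT_PATTERN = re.compile(
    r"grant\s+(?:agreement\s+)?(?:no\.?|number)?\s*(\d{6})",
    re.IGNORECASE,
)

def find_projects(full_text):
    """Return acronyms of known projects whose grant number appears in the text."""
    found = []
    for number in GRANT_PATTERN.findall(full_text):
        acronym = KNOWN_PROJECTS.get(number)
        if acronym and acronym not in found:
            found.append(acronym)
    return found

acknowledgement = (
    "This work was funded under grant agreement No 123456 "
    "and supported by grant number 654321."
)

# Links declared in the (hypothetical) repository metadata: only one project.
metadata_links = ["PROJECT-A"]

mined = find_projects(acknowledgement)
missing = [p for p in mined if p not in metadata_links]
print(mined)    # projects found in the full text
print(missing)  # projects the metadata alone would have missed
```

In this toy case the metadata carries one project while the acknowledgement names two, which mirrors the situation described here: until the full text is mined, or the link is added by hand, only the project declared in the repository metadata shows up.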
But you can still do it manually.
Any volunteers? Okay, let's give people one more minute, until 55. If no one has anything to add now, please feel free to interrupt us later, raise your hand, or add anything you want regarding this exercise. In the meantime, I see that our colleague Manolis has joined us. He is responsible for a tool that helps anonymize data. We will continue with two or three more exercises and then we can move on to Manolis, all right? Yes, this is fine with me.
This is the presentation again. During the previous session we also touched upon how you can license your data, and Emma explained in detail the different conditions attached to each of the Creative Commons licenses, which are a standard set of machine-readable licenses. So let's look at that together, and then you can have an exercise on it. It would be good if you could also go through the tool and try to license your own data following the different steps. Let me stop and go back here.
Is there something in the chat? Yes, I put the link to the Creative Commons tool in the chat. There is also a question about plans to add the projects funded by the Irish Research Council. I have no news about that; it is usually the funder that asks OpenAIRE to provide the monitoring. Maybe Ahmed knows about the Irish Research Council: is it part of CHIST-ERA or outside it? Yes, they are members of CHIST-ERA as well. So you will be able to find them under the CHIST-ERA ID once it's ready. And then we can also check with the Irish funder to see whether they would like to have something separately, as we are doing with the rest of the funders, like the European Commission, the ERC or the NSF. All right, perfect.
So moving on, let me share my screen again. This is a tool provided by Creative Commons that helps you work out how you would like to license your own data. There are six steps, which are also the questions, starting with whether you know which license you need. That one is self-explanatory: if you know, you don't have to do anything, you know what to use and you don't need this tool. So let's assume that I do not know and I need help selecting my license.
The next question is whether I want attribution for my work: do I want others to cite and acknowledge my work later? Yes, anyone using my work must include proper attribution; or no, anyone can use my work even without giving me attribution. Let's say I would at least like people to attribute this work to me.
Do you want to allow others to use your work commercially? Yes, others can use my work even for commercial purposes; or no, others cannot use my work for commercial purposes. Let's say I don't want them to use it for commercial purposes. Maybe I want them to use it for educational purposes, and that's fine.
Do you want to allow others to remix, adapt or build upon your work? Yes, others can remix, adapt or build upon my work; or no, others may only use my work in unadapted form. Yes, I would like others to remix and adapt my work, because this will bring me more credit, since I will get acknowledged.
Do you want to allow others to share adaptations under any terms? Yes, others can share adaptations of my work under any terms, so they can take their work in addition to mine and share it under any license they want; or no, others must license adaptations of my work under identical terms. Let's say I want them to license it under identical terms. Next.
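The questions above behave like a small decision tree, which can be sketched in Python as follows. This is an illustrative reading of the walkthrough, not code from Creative Commons; the function and parameter names are made up, and the assumption that answering no to attribution leads to CC0 should be checked against the chooser itself.

```python
def choose_license(attribution, commercial_use, adaptations, any_terms):
    """Map yes/no answers (True/False) to a Creative Commons license code.

    attribution    -- must users credit the creator?
    commercial_use -- may others use the work commercially?
    adaptations    -- may others remix, adapt or build upon the work?
    any_terms      -- may adaptations be shared under any terms?
    """
    if not attribution:
        # Assumption: without attribution the chooser ends at CC0.
        return "CC0 1.0"
    parts = ["BY"]
    if not commercial_use:
        parts.append("NC")   # NonCommercial
    if not adaptations:
        parts.append("ND")   # NoDerivatives; the sharing question no longer applies
    elif not any_terms:
        parts.append("SA")   # ShareAlike: identical terms required
    return "CC " + "-".join(parts) + " 4.0"

# The answers given in the walkthrough: attribution yes, commercial use no,
# adaptations yes, adaptations under any terms no.
print(choose_license(True, False, True, False))  # CC BY-NC-SA 4.0
```

The ordering matters: ShareAlike is only meaningful when adaptations are allowed, which is why the sketch checks `adaptations` before `any_terms`.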
So we see here that all this time, in parallel, the tool was building the license that corresponds to each of the answers I provided. For my answers, it seems that the correct license for me is Attribution-NonCommercial-ShareAlike 4.0 International. Under this license, credit must be given to me as the creator, only non-commercial use is permitted, and adaptations must be shared under the same terms. If I want, I can also fill out this form by adding my information here. While I'm doing this, you see that this part changes: this is a text that I can copy and paste and add under my work, in a presentation for example. And this is how others can also share my work: they can copy and paste this and acknowledge me in this format. Let's create a profile, let's say. Oops, what did I do? Something appeared. Let me set the URL of my profile, let's say it's that one. This is my work's title, and the work URL, which is where others can find my work. And now I have built this: I know the exact license that suits me, and others know how to properly cite me and acknowledge my work following this text, if I copy and paste it somewhere. Okay. Well, we have a question in the chat. Yes: the other day you said that commercial purposes include publication in journals, so we need to allow this if we want to be cited or used in papers, right? So the thing is, publishers are commercial. If you want others to adapt and reuse your content for a publication, then yes, definitely, you should not say that you do not want commercial reuse. You should not include NC in the license, because this will not allow people to reuse your work for the commercial purpose of publishing a paper in a journal.
This is well highlighted in the fact sheet from Creative Commons. I already gave you the link in my presentations, but I can paste it again here. For other purposes, for example if I wanted to share the presentations of this course and say that I do not want others to reuse them for commercial purposes, then I can do that. It depends on what you are sharing. Of course, what you're saying is correct, Emma: you have to check the publisher's options, what licenses they assign to the papers and to the data, or want you to assign to the papers or data, and whether commercial use is involved or not. And now it would be good if you could also go through the tool and choose a license for your data: first choose the data that you might want to license and then pick a license for them. That is one option. The second option is that you can check another researcher's license to learn how to reuse their work. That's also a good option: search for data, find the license associated with those data, and tell us how you can reuse them. Do you have the link in the chat? Yes, I put the link to this tool. Let me paste it again, because I also gave some other links. Once you're done, please raise your hand so we can give you the rights to use the microphone, and then you can let us know which option you chose, what steps you followed, and what the outcome was. Let's take five minutes for that. I see a question in the chat; I don't see raised hands. How limiting is the ShareAlike attribute? I mean, if we allow commercial use but then add ShareAlike, isn't that contradictory?
Yes, so ShareAlike is something that limits the license that can be applied to derivative works, and the Creative Commons framework provides a table where you can find the licenses that are compatible with it. I can find this table and link it for you. When you add ShareAlike, you basically limit the license that other people can apply to derivative works. Not all licenses are compatible, so when you reuse something that is under ShareAlike you should also be aware that your license needs to be chosen accordingly. Then we have another question: how is one expected to give credit when someone uses, for example, an image? In our case we made some images available for people to reuse or adapt, and we are not sure how they keep the attribution when adapting, so we included guidelines with the explicit text they should be using. Are we doing it wrong? So I'm not sure, is the text related to the license? Yes, I have opened that link. Okay, so it says: if you want to use the figures, please follow the guidelines. If you want to use the figures without modifications, use the figure with the current license and authors, and let us know in which project you are using them so that we can keep track of their adoption; we will appreciate it. And then, if you need to adapt the figures or the methodology to your particular use case: fork the LOT repository is the first guideline. The second is to keep the CC BY-NC-SA license with the statement "derived from LOT methodology", and so on. And the third is to upload the figure to a new folder in the forked repository with the name of the project and create a pull request to the LOT repository so that we can keep track of adoption. "You're not doing it wrong" is my first reaction to your question. But it is true that you cannot always know whether people are following those guidelines or not.
That is true. If, for example, you come across this image without it having been properly cited and acknowledged, then you can report this. There are some cases on the Creative Commons website where misuse of Creative Commons licensed content has been found and reported. You can always report it. But unfortunately, and I don't know if Emma has anything to add, at this stage there is no way to know whether people are following the guidelines or not. The thing is, as always in this legal framework, if you find someone reusing your material improperly, then you should ask them to fix it. This happens with OpenAIRE as well: we find many people using our material without following the rules. But this is something that you cannot avoid, I guess. Yes. There are a few raised hands, by the way. So maybe give the floor to Jean, Martin. I think, yes, there you are. Hello. Yeah, I'm trying to get the right to share my screen. Am I supposed to do so? Try now. Does it work? Yeah. So actually, it's a dataset that doesn't exist yet, so the URL is blank, but I basically followed your advice for all the steps, and it is quite nice to have this rich text automatically generated here. Okay. The only thing that doesn't exist is the URL, right? Is the example fictional or not? Yes, it's fictional. We are in the process of designing the dataset, but it doesn't exist yet. Okay, good. Perfect. Thank you. You're welcome. And just one question: if we have several authors, should we write them here? Yes, of course you can. It doesn't matter how you write it. It's not like in citations, with standards such as MLA, Chicago or Harvard; it doesn't follow this kind of rule.
It's free text, so you can use commas or "and" as you wish. Okay. Thank you. Anyone else? We have another hand raised, from Moods. Can you open the mic now? Yes. Hello. So I don't know how to share my screen. Oh no, now I can. Okay. Yes, I created this one; it is a real thing. I do have a question though: does this website automatically assign a license, or do I need to do something else? No, it doesn't. It helps you find the correct license. Then you need to copy and paste this information and use it in the context of the research output that you want. For instance, if I copy this and put it... so this is not exactly research work, but it is work: I did a book on robotics for children, and it's published online. I don't have any license on it. So I can just put this on the website, copy the HTML, and then my work has this license? Exactly, yes. Yeah, that's kind of easy. Okay, I didn't know that. And then, what if I want to undo it? For instance, if one day I want to publish this book commercially and I find an editor who is willing to do it. Because I chose non-commercial share-alike, can I just remove it or not? Well, that's a good question. I'm the author, so can I decide to change it or not? The only thing is that you cannot affect other people who have already used your work. You cannot revoke things that have been done in the past. But is it really as easy as copying this onto my website? I could just remove it and nobody is really going to know. I mean, I don't understand. Yeah, but someone may have used it before. Okay, so if you put it online today, I download it tomorrow and use it for something, and then you remove the license, I am not aware of it. Yes. Yeah, okay.
I understand. Okay. Yeah, you should ask some legal experts. Yeah, that's always my problem with licenses: they are very confusing to me and I don't want to get in trouble. The thing is, be aware that if you do not provide a license, that is where the trouble is, because you are not protecting your work. I understand. Yes. This is totally unprotected, because I just shared it publicly. It's for children, and we actually want as many people as possible to use it. But we've been talking about the possibility of actually editing this book and selling it. So then is it better if I don't put anything? No, I think the non-commercial license you picked is good, and you will be fine if you also tell the editor, or the publisher, that you have been using this license. So I cannot really sell the book? I don't even need to sell this book; it would be to donate to some cause. But I'm not sure whether I have to follow my own license. Yeah, the thing is that this license is for others to reuse your work. But you should ask a legal expert, or maybe we can check on the Creative Commons website whether they have an FAQ about this; they probably do. So this is for others to reuse your work. So then I could publish it. I'm not saying that you could, but probably, and you have to check the Creative Commons framework or FAQ; I can go and check for you. I mean, if she retains copyright in the contract agreement with the publisher, then she will be fine, but yes, I agree with Emma. Okay, so this is just a website where you can see the book, but you cannot download it, so I have already kind of protected it. And then you can download some material, which is fine, just to build a toy and things like that.
I can just add this text here and then that would be covered by the license. Yes. It would be better if you add the license than leaving it like this. Okay. And well done, it looks like a good book. Thank you for sharing that with us. I'll stop sharing. Okay, perfect, thank you. Great. All right, so let's move on. Oh, do we have any raised hands? No, I don't see any. Just one thing while you share your screen: Maria and the others are back as attendees, but if you are a speaker for this course you cannot raise your hand, so just open your mic and talk. I see you put them back as attendees. Okay, just go on. I was about to give the floor to Manolis, since we have been talking about metadata and licenses, which are all things that we do throughout the processing and analysis steps of the data life cycle, so it fits well if Manolis introduces us to anonymization now. Manolis. Thanks a lot. Sorry, I was searching for the window. Thanks for the invitation; I'll be happy to talk about Amnesia and data anonymization. Let me share my screen right away. I understand that the rest of the presentations are more hands-on, but because anonymization is a new process, I think there is a need to set some context and background. I will focus on giving a PowerPoint presentation and then describe what Amnesia does. Maybe we will have time for a very quick demo, or I would be happy to give a full demo some other time. So, I'm Manolis Terrovitis, I'm a researcher at the Athena Research Center, and I'm leading the development of a data anonymization tool called Amnesia, which is offered to OpenAIRE and is available in the EOSC catalogue. Before going into the details about Amnesia, I'm going to talk a bit about anonymization. You all know GDPR.
So the introduction of GDPR cleared the ground to define what anonymous data is and what it is not. The use of personal data is now regulated: you can use personal data only if the law permits it, if a contract you make with the person permits it, or if there is explicit consent. Also, you can use data for research without consent if they have already been collected for research purposes. Even when you are allowed to use personal data, there are limitations and several overheads on how you use them. For example, if you use the data based on consent, then consent might be withdrawn. You have to take a lot of security measures to prove that at any point you are doing whatever is possible to protect user data. If you use the data for research, then you need internal processes that assess that the data are used correctly, with internal oversight ensuring they are used only for research purposes. Also, data is hard to share with third parties for research without consent, and sharing is very common now that data processing happens in distributed environments. So what anonymization does is unlock the potential of the data by removing the barriers of GDPR. When we want to use data for research, marketing or some kind of knowledge extraction process, we are not really interested in the personally identifying information; we are mostly interested in statistical properties, patterns and hidden knowledge artifacts. So the idea of anonymization is to remove the personally identifying information and preserve the interesting knowledge in the data. Before going further into anonymization, I would like to make two terms clear in the way they are defined in GDPR, because when we use the term anonymization in everyday language and in everyday practice, we usually refer to what GDPR defines as pseudonymization.
Pseudonymization is the removal of personal identifiers from the data and the possible insertion of an arbitrary identifier, and its main characteristic is that it is reversible. Using some external knowledge, you can take the pseudonymized data and retrieve the identities of the original persons. This can be done through the arbitrary identifier, but it can also be done through unique combinations of descriptive information, which we call quasi-identifiers. This information can be the date of birth and the zip code, for example: if someone has a unique combination of zip code and date of birth, which is very common, then this information can be used to re-identify them. This is called pseudonymization, and pseudonymized data remain personal data, so the data curator or the data owner has to adhere to all GDPR limitations. But if we do anonymization in the sense of GDPR, true anonymization, then the anonymization process is an irreversible transformation of the data. There is some kind of guarantee that anonymized data cannot be reverted to the original data. This is what Amnesia does, and what we are going to talk about here is the transformation of the original data into an anonymous form where there are guarantees that this transformation cannot be reversed. The motivation for anonymizing is that if we do real anonymization, then the data are no longer personal, so they fall outside the scope of GDPR and you don't have the limitations of GDPR. Anonymization provides a statistical guarantee on the data transformation, and this allows the data owner to prove that they have taken all measures needed to protect user privacy. Apart from GDPR, it also offers users a tangible guarantee for the safety of their data. So I have told you the great things that anonymization does.
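To make the point about quasi-identifiers concrete, here is a minimal sketch that counts how many pseudonymized records remain unique on the combination of zip code and date of birth. The records, field names and the helper `unique_combinations` are all invented for this illustration; they do not come from the talk or from Amnesia.

```python
from collections import Counter

# Hypothetical pseudonymized records: names removed, arbitrary IDs kept.
records = [
    {"id": 1, "zip": "13053", "birth": "1990-04-12", "diagnosis": "flu"},
    {"id": 2, "zip": "13053", "birth": "1990-04-12", "diagnosis": "asthma"},
    {"id": 3, "zip": "13068", "birth": "1985-11-02", "diagnosis": "heart disease"},
    {"id": 4, "zip": "14850", "birth": "1972-01-30", "diagnosis": "flu"},
]


def unique_combinations(rows, quasi_identifiers):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# Anyone who knows a person's zip code and date of birth can single out
# these records, even though the names are gone.
at_risk = unique_combinations(records, ["zip", "birth"])
print([row["id"] for row in at_risk])
```

Records 3 and 4 are the only ones with their zip code and birth date, so removing the names alone does not protect them.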
Let's see its limitations. This one-way transformation by definition loses some important information. For example, if you have the date of birth of a person, then you might have to replace it with the year of birth or with the decade. So the accuracy of the information is reduced so that it is no longer identifying. This might be okay in some applications; in others, this reduction in data quality might be unacceptable. GDPR makes a clear distinction, as I told you before, between anonymized and pseudonymized data. In practice, this clear distinction has some gray areas. For example, even if the data transformation cannot be fully reversed, maybe it can be partially reversed and some information can be leaked. There is a discussion in the working group, which identifies these problems. So it always falls to the curator to make decisions in these gray areas about what adequate protection is. I would like to comment here that the methods we present are beyond what is usually used in practice, so I think there is no doubt that someone using them can justify that they have applied the best state-of-the-art methods available for protecting their users' data. Part of the gray area is that what we offer are statistical guarantees that the data transformation cannot be reversed. Privacy is a social notion. Sometimes information that can be leaked or inferred might be enough to break a user's privacy even though every statistical guarantee holds. This is a more philosophical problem in a way, because according to GDPR, if you have taken all reasonable measures, then you have adhered to the law. And because we still lack experience in practice, since these methods have not been in use for years, the process cannot be fully automated. You need some user input, and decisions have to be taken. So that was what is good about anonymization and what its limitations are; now, when to use it. Anonymization does not replace encryption. It is complementary to it.
Anonymization is suitable when you want to give your data to a party that you do not fully trust. A good example of this is when you want to publish your data publicly, so you don't know exactly who will get it, or when you give it to a large audience. You are not going to sign an NDA, and it is very difficult to take all the measures that GDPR requires, so instead you reduce the data quality a bit and transform everything from personal to statistical data. So if you are a practitioner and you want to give your data to the research community, anonymization is a good fit. The same holds when you want to do open publications and a reduction in information quality does not cause a real problem. Anonymization is the simplest method in terms of safety and regulations: you do not have to take all the precautions that GDPR requires if the data are anonymized. After this summary of what anonymization means, I'm going to tell you a few things about how Amnesia works, and the first thing I'm going to talk about is why you would choose Amnesia. One thing that we put real effort into is making it user friendly. If you use it, you will find it, despite our efforts, a somewhat complicated tool, but we believe that it is one of the simplest compared to the very few other tools that do related things. It is a complicated process, and the more user feedback we have, the more we will be able to automate things and make it even more user friendly. Amnesia works locally. We do have an online version that you can use for demo and training purposes, but it has some limitations on the data files it accepts, and using an online service to anonymize your data is not a safe scenario. But we also have the application available, which you can download to your local premises to anonymize the data locally.
It is up to the users to customize the solution and decide which information loss should be chosen. There are many ways to lose information to make the data anonymous, and you can put different weights on different attributes of your records so that the data are optimally anonymized for your analytics. We offer algorithms for complicated data like set-valued data, that is, records that contain a collection of arbitrary length of events or attributes, like the bill from a retail store. We have a special form of k-anonymity for that, k^m-anonymity, and we are the only tool that does that. And finally, you can use Amnesia as a completely standalone application with the user-friendly interface, but you can also use just the backend engine through a REST API. This is very useful if you are developing your own information system and you just want to incorporate the anonymization engine of Amnesia through your own interface. Okay, these statistics are one or two months old. They refer to the 1.5 or two years that the Amnesia site has been up: we had 32,000 visitors, more than 100 page views and 2,000 unique downloads. In general, it becomes more popular as time passes. We offer k-anonymity and k^m-anonymity, we deal with object-relational datasets, and we have algorithms that scale to very big data that do not fit in main memory. We offer a REST API, so you can use it with information systems, and we have had it up for two years, so the main components are no longer beta; bugs have been removed and it is quite stable and robust. Let me give you a quick example of what k-anonymity is and what Amnesia actually does, after all this description. Imagine that we have this table with very simplified medical records of patients. This is a pseudonymized dataset: the names have been removed and we just have an arbitrary ID, and then you have the zip code, the age and the nationality of a person. And for each person we have the diagnosis.
So these three attributes, the zip code, the age and the nationality, can act as quasi-identifiers, because they are information about a person that can easily be retrieved from other sources. This pseudonymized dataset is not safe, because if I know, for example, that John is American, has a zip code of 13068 and is 29 years old, then I can uniquely identify his record and see that he has been diagnosed with heart problems. So pseudonymization is not safe if someone has additional descriptive information to link with. Now, what Amnesia would do is transform all the quasi-identifiers in a way that creates groups of k records in which the values of the quasi-identifiers are identical. The way it does that is by replacing specific values with more generic ones. We call this process generalization, and it does this as much as needed for every record to belong to a group of at least k records. In this example, it removed a few digits from the zip code, created three age categories (less than 30, 30 to 40, and more than 40) and completely removed nationality. Now, if I know that John has a zip code of 13068 and is 29 years old, I cannot identify which of the first four records belongs to him. He might have heart disease or he might have a viral infection; I can no longer decide exactly which diagnosis has been given. To do this, some user input is needed on how specific values are replaced with more generic ones. If we have a continuous domain like numbers, this is simpler: we have a total order and we can define different ranges. We can say that the exact numbers can be replaced by ranges of size 10, and if we had a larger domain we could have more levels here, saying that if ranges of size 10 are not enough, use ranges of size 50, then 500, and so on.
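The generalization step just described can be sketched in a few lines. This is a minimal illustration, not Amnesia's algorithm: the records are invented in the spirit of the slide, and `generalize` and `is_k_anonymous` are hypothetical helpers that apply one fixed generalization (truncate the zip code, band the age, drop nationality) and then check the group sizes.

```python
from collections import Counter

# Hypothetical quasi-identifier values: (zip code, age, nationality).
rows = [
    ("13053", 28, "Russian"),
    ("13068", 29, "American"),
    ("13068", 21, "Japanese"),
    ("13053", 23, "American"),
    ("14853", 50, "Indian"),
    ("14853", 55, "Russian"),
    ("14850", 47, "American"),
    ("14850", 49, "American"),
]


def generalize(row):
    """Replace specific values with more generic ones, as in the example."""
    zip_code, age, _nationality = row
    if age < 30:
        band = "<30"
    elif age <= 40:
        band = "30-40"
    else:
        band = ">40"
    # Keep three zip digits, band the age, remove nationality completely.
    return (zip_code[:3] + "**", band, "*")


def is_k_anonymous(records, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(records).values())


generalized = [generalize(row) for row in rows]
print(is_k_anonymous(rows, 4))         # every original row is unique
print(is_k_anonymous(generalized, 4))  # two groups of four identical rows
```

A real tool searches for the least generalization that reaches the requested k, whereas this sketch hard-codes one choice to show what the check looks like.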
The user has to provide what we call a generalization hierarchy, which instructs the algorithm how specific values can be replaced with more generic ones. The work of the algorithm is to choose how much each value should be generalized to guarantee the desired anonymity, and it will generalize as little as possible to achieve the desired privacy requirement. Okay, since this is a first lecture, I'm not going to get into the details of k^m-anonymity, which is used for high-dimensional data. The idea of k^m-anonymity is that we do not provide protection against adversaries that might know all the quasi-identifiers, because there are situations where the quasi-identifiers can number in the thousands. Instead, we limit the potential knowledge of an adversary, saying that we are going to protect against adversaries that know m of them, say three or five, and then we make certain that every combination of m quasi-identifiers appears at least k times in the anonymized dataset. This technique is suitable for sparse, high-dimensional data. As for the limitations of the tool, you should be aware of the following. The first problem is not actually about the tool: the concept of anonymization is new, so users do not know what to expect from the tool, which makes using it a lot harder. Another limitation is that Amnesia does the anonymization, but it cannot provide guidance on how to define the main parameters of the privacy guarantee. For example, in k-anonymity, it cannot decide on the k; this decision has to be taken by the user, and it is based on practice. If people ask me, I just refer to what the statistical authorities usually do. In Greece they use groups of size three when they publish aggregate results (not k-anonymous results), and Eurostat, I think, uses groups of size five. And the last thing is that k-anonymity and the other guarantees of this family protect against certain types of attacks.
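As a rough illustration of the k^m-anonymity idea mentioned above, the following sketch checks whether every combination of up to m items that occurs in a set-valued dataset occurs in at least k records. The shopping baskets and the function `is_km_anonymous` are invented for this example and say nothing about how Amnesia implements the check or the anonymization itself.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Check that any itemset of size <= m present in the data appears in >= k records.

    An adversary who knows up to m items of a person's record then always
    matches at least k records.
    """
    counts = Counter()
    for items in transactions:
        unique_items = sorted(set(items))  # sort so each combination has one canonical form
        for size in range(1, m + 1):
            for combo in combinations(unique_items, size):
                counts[combo] += 1
    return all(count >= k for count in counts.values())


# Hypothetical set-valued records, e.g. items on retail-store bills.
baskets = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "bread", "wine"},
    {"milk", "wine"},
]

print(is_km_anonymous(baskets, k=2, m=1))  # each single item appears at least twice
print(is_km_anonymous(baskets, k=2, m=2))  # the pair (bread, wine) appears only once
```

Enumerating all combinations like this is only practical for small m, which is consistent with the remark that the adversary's knowledge is limited to something like three or five items.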
These guarantees are not safe under every scenario, but this falls under the problem of the gray areas between pseudonymization and anonymization. I think that for most common applications k-anonymity is way beyond what is happening now in practice. That was all. I think I've used up my time. Ellie, do I have a few minutes to do a very quick demo? Yes, you can take five minutes to show us a demo. In the meantime, as you are preparing, I think you have to stop sharing your screen and share it again. I will stop sharing and share again. Yes, because I'm not sure the browser or whatever you will be using will be visible. In the meantime, there is a question in the Q&A that I will read out loud for you: is it possible to choose which variables you might want to keep, for example if you wanted to keep the nationality because it was necessary for further analysis? This is the whole idea of guiding the anonymization process: you can choose which variables will be preserved at the end, and you can even choose which variables will be preserved better than others, and all these choices are up to the user. I'm going to show you quickly in the demo how to do that. Now you see the first screen of Amnesia, I guess, everyone. I'm loading a dataset. Amnesia accepts delimited files. Sorry to interrupt you, but at least on my end I cannot read them, the letters are very tiny. Can we zoom? I don't know how to zoom, because on my screen it's the whole screen. Is it better if I resize? Has something changed here? Yes, it has. Okay, better. In the text files, each column is distinguished from the rest of the columns using a delimiter that you can define; here it's a comma-delimited file. Amnesia then presents a preview of the dataset so you can understand it.
By unchecking some of the columns we simply remove them, so if you want to pseudonymize the data you would just remove the direct identifiers, as I did; the data here is already pseudonymized. Now, Amnesia has guessed the type of the data, but this is just a guess. It uses only the first few records for this guess, so it might not be correct, and it is the work of the user to check it. In this case everything is correct, so I go on, and this is the dataset as it has been loaded by Amnesia. It's again simplified medical records, where you have the codes of the diagnoses, the date of birth and the marital status of each user. Now, what we have to do is provide the generalization hierarchies so that the algorithm will know how to replace specific values with more generic ones. We might use predefined hierarchies like this one, which I have made in advance, and it is based on the values that appear in the dataset, because this is a real dataset. We were really interested in whether someone was single or married, but people would answer more eloquently, saying divorced or widowed, or maybe did not answer at all, so we put those under the more generic values. Amnesia also allows you to create a hierarchy based on the data. This works a lot more easily on continuous domains; here we are going to do this for the date. Date is one of the most complicated domains, because it does not follow the decimal system: we have to count in terms of years and in terms of months. And then we group all groups into other groups of size three. So here I'm instructing Amnesia on how to create the different ranges in the date domain, and it will create this hierarchy. Here we have ranges in terms of weeks, weeks are grouped in three-month periods, three-month periods are grouped in two-year periods, and from then on all records are grouped into nodes that have three sub-nodes, except for one that might have the leftovers.
Then we can use this information to anonymize the data. We have to instruct the algorithm to use the birth date hierarchy we created for the date of birth, and the marital hierarchy for the marital status field. Here we have k-anonymity with k equal to three, so that's 3-anonymity. And here I get this thing. This is a visual lattice that represents all possible solutions for anonymizing the dataset. Here we had two quasi-identifiers, the date of birth and the marital status, so these numbers reflect how many levels we have gone up in the generalization hierarchy to generalize each attribute. So (4, 0) says that we have generalized the date of birth four times and have left marital status as it is. In (3, 1), we have generalized the date of birth three times and marital status once. Now, the red nodes represent solutions that do not provide k-anonymity and the blue ones represent solutions that do. Now, with respect to the question about whether we could keep nationality instead of some other attribute: by choosing a different node, we can choose how much information we are going to lose in each field. Here we preserve the marital status completely, and we lose a lot of accuracy in the date of birth. Here, we lose information in marital status and we preserve the date of birth a lot better, although it cannot be preserved perfectly. The reason that we see solutions that do not provide k-anonymity is that sometimes k-anonymity is not achieved just because of a very few records. For example, here it's only 1.2% of the records that violate k-anonymity in this red solution.
So it's this solution, where the date of birth has been generalized three times and the marital status once, and it is not a solution just because of 1.2% of the records. So instead of choosing a solution that would generalize all the data more to get the desired anonymity, I could just remove this 1.2% of the records, and then my result would be k-anonymous. Let me show you. Right now, as you see, the exact date of birth has been replaced by a two-year period, and the detailed marital status has been replaced by single or married, more generic answers than the original. And this is how Amnesia works. So that's all from me, and I would be happy to respond to questions.
Yes, about the uploading. As I mentioned before, there are two versions of Amnesia. We have the online version. The server is in the EU, but we do not propose to use the online version for anonymizing data. What we propose is that you download the application. The Amnesia I'm showing you is a local application: the server runs automatically in the backend on your own computer, so you don't upload the data to any other place. Everything happens at your own premises, so you do not have to worry about where the data goes. This is the way we propose that Amnesia is used. The online version is just to play with a dummy data set and see how Amnesia works. It is just for demonstration purposes.
Thank you very much, Manolis. Thank you for answering also this question, I saw it in the Q&A. I don't know if there are any other questions about anonymization and Amnesia in particular. I don't see any now, so let's continue. Nothing arises, so we can move on to the next topic.
Okay, thanks a lot, Ellie. Thanks for inviting me. If anyone wants to use Amnesia or has additional questions, feel free to contact me anytime, through email or any way you want. Thank you.
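The lattice from the demo (one node per combination of generalization levels, red when some records violate k-anonymity) and the option of suppressing the few violating records can be sketched as follows. This is a minimal illustration, not Amnesia's algorithm: the year widths, the marital hierarchy, the sample records and all function names are assumptions made up for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical hierarchy for year of birth: level 0 = exact year,
# level 1 = two-year range, level 2 = ten-year range.
YEAR_WIDTHS = [1, 2, 10]

# Hypothetical one-step hierarchy for marital status.
MARITAL_PARENT = {"single": "single", "divorced": "single",
                  "widowed": "single", "married": "married"}


def gen_year(year, level):
    width = YEAR_WIDTHS[level]
    if width == 1:
        return str(year)
    start = year - year % width
    return f"{start}-{start + width - 1}"


def gen_marital(value, level):
    return value if level == 0 else MARITAL_PARENT[value]


def violating(records, node, k):
    """Indices of records whose generalized quasi-identifier combination
    is shared by fewer than k records at lattice node (year_lvl, marital_lvl)."""
    keys = [(gen_year(y, node[0]), gen_marital(m, node[1])) for y, m in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]


def lattice(records, k):
    """Map every lattice node to the fraction of records violating k-anonymity.
    A fraction of 0 is a 'blue' node, anything above 0 is a 'red' node."""
    nodes = product(range(len(YEAR_WIDTHS)), range(2))
    return {node: len(violating(records, node, k)) / len(records) for node in nodes}


def suppress(records, node, k):
    """Drop the violating records instead of generalizing everything further."""
    bad = set(violating(records, node, k))
    return [r for i, r in enumerate(records) if i not in bad]


# Made-up sample records: (year of birth, marital status).
records = [(1980, "single"), (1981, "divorced"), (1981, "single"),
           (1984, "married"), (1985, "married"), (1986, "married"),
           (1999, "widowed")]

for node, fraction in sorted(lattice(records, k=3).items()):
    print(node, "blue" if fraction == 0 else f"red ({fraction:.0%} violating)")
```

In this toy data set the node `(2, 1)` is red only because of one record, so `suppress(records, (2, 1), 3)` yields a 3-anonymous result without generalizing the remaining records any further.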
Okay, so we saw how to process and analyze our data, yes, but where do we deposit our data? I think this tool has already been mentioned, so it needs no introduction. We can check it online together now. I'll have to stop sharing and share again. Okay, let's see. Has any of you already used re3data? Re3data is a registry that indexes and holds information about data repositories, and it is very useful for many reasons, which we'll see in a few minutes. So this is the homepage. You can add any keyword that you want to search the registry with. You can also browse by subject, by content type, as I mentioned earlier in the session, since there are different types of repositories, or by country, if you want to find something that is closer to you, maybe your institution if it provides a data repository. There is also an API, if you're building an application that wants to use it, but I don't think it will have a big impact for the rest of you. So let's say that I have a data set and I don't know where to deposit it, and let's say that I'm working in the environmental area. I just used the keyword environment and all these results, 268 of them, appeared. I can refine my search; I can filter with all these different criteria that you see on the left-hand side. This is very useful. Re3data is also aggregated by OpenAIRE, so most of the information that you see here you will be able to find in OpenAIRE as well. The very interesting and important thing that re3data does is that it provides you with a categorization of the information about every data repository. For example, you can refine by subject, you can refine by country: say that I want Greece, there are three.
Let's say that I want to refine by the API that it uses, or the data license that it uses. Now I have reset it. By the data license or the database license that it uses: if I have to comply with something regarding those licenses, I can quickly find a repository that lets me do so. By the metadata standards that I want to use: which repository supports which metadata standard, I can also refine my search with that. And there are, as you see, many other criteria that I can refine my search with, such as whether there is a versioning mechanism integrated with the repository, and so on. There are also the certificates. I forgot to mention that you can filter on whether the repository has a certificate; you remember that this was also mentioned, and we had a discussion during module two about how certification of repositories is an important component in the RDM ecosystem. So let's try to refine by country. I will click Greece, and I will have these three options, and I can check and compare what the three of them do. I can check the subjects and whether those subjects correspond to my research. Let's say here there is oceanography, but I don't see oceanography anywhere else, neither in that one nor in the last one. So if my data have to do with oceanography, then I can click on this data repository and select it. But this is only one of the key pieces of information that I can check and select my repository by. I can also see the content types. This one, for example, ComBase, indexes standard office documents, software applications and different statistical data formats; the other indexes plain text, structured text, software, images, etc. And there are affiliated countries, because sometimes data repositories are part of a larger research infrastructure, like, for example, I guess,
yes, SeaDataNet, which is part of a pan-European infrastructure for ocean and marine data management. So it is for use not only by Greek researchers but by all the rest of the countries of the region, plus the Russian Federation, for some reason that I don't know. You can also check the brief description of this data repository here. Another interesting thing, apart from the filtering, is these little icons on the upper right-hand side of the record. These icons help you get more insight into the internal processes and policies that are in place in the repository. One shows whether the repository provides additional information about its service, which helps you with your data management plan: to know how many times your data are backed up, for how long they are retained and so on. So you can check that it has this information for you. Another shows whether the research data repository provides restricted access to its data. See, it's orange. If it were gray, like this one, it would mean that it's not applicable, but whatever has a vibrant color is applicable, so here you have the possibility to have restricted content in it. Another is the terms of use and licenses of the data provided by the research data repository, so you can check them. Another thing that is provided for this specific repository is a policy, which you can also go and check. What is not provided is a persistent identifier system, so your data will not get a DOI or any other identifier like the ones we saw during session three. And this data repository is neither certified nor supports a repository standard. These are again very useful things to check before selecting your repository, as they will affect the FAIRness of your data, how FAIR your data will be at the end, when you publish them.
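The selection process just described, filtering by country, subject and content type and then checking the icons for persistent identifiers and certification, amounts to applying a set of criteria to repository records. A minimal sketch, assuming an in-memory catalogue: the `Repository` fields, the `shortlist` function and the three example entries are hypothetical, and the re3data API itself is not used here because the session does not describe it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Repository:
    name: str
    country: str
    subjects: FrozenSet[str]
    content_types: FrozenSet[str]
    pid_system: Optional[str] = None   # e.g. "DOI"; None means no PIDs are assigned
    certified: bool = False


def shortlist(repos: Iterable[Repository], *, country: Optional[str] = None,
              subject: Optional[str] = None, content_type: Optional[str] = None,
              require_pid: bool = False,
              require_certificate: bool = False) -> List[Repository]:
    """Keep only the repositories that satisfy every criterion that was set."""
    selected = []
    for repo in repos:
        if country is not None and repo.country != country:
            continue
        if subject is not None and subject not in repo.subjects:
            continue
        if content_type is not None and content_type not in repo.content_types:
            continue
        if require_pid and repo.pid_system is None:
            continue
        if require_certificate and not repo.certified:
            continue
        selected.append(repo)
    return selected


# Placeholder catalogue entries, invented for the example.
CATALOGUE = [
    Repository("Example Marine Archive", "GR",
               frozenset({"oceanography"}), frozenset({"images", "plain text"})),
    Repository("Example Generalist Repo", "CH",
               frozenset({"oceanography", "computer science"}),
               frozenset({"images", "software"}), pid_system="DOI", certified=True),
    Repository("Example Humanities Library", "FR",
               frozenset({"humanities"}), frozenset({"images"}), pid_system="DOI"),
]

for repo in shortlist(CATALOGUE, subject="oceanography", require_pid=True):
    print(repo.name)
```

Starting from the criteria that a funder or institutional policy makes mandatory (for example `require_pid=True`) and only then narrowing by subject keeps the shortlist small without excluding suitable repositories too early.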
Okay, so enough from me. I think that it's now your turn to explore and find a repository for your data. Emma, I don't know if you had the time to add the link. You're amazing. Yes, you had the time to do it. Okay, so you can follow the link that Emma sent in the chat and find a repository for your data, and raise your hand so you can let us know and share your experience of trying to select the proper repository for you. Just a couple of minutes for that. Try to think whether you already have a data set that you could upload somewhere. That's a good scenario.
Ellie, there is a comment in the chat that there are so many repositories that it is hard to choose.
So that's why you have to refine your search. Find out what the most important criteria are for you and what you need, and possibly what the research data management policy of your funder or institution needs, and refine the search following these criteria. Okay, give it one more minute. Please raise your hand if you want to share your screen and show us your activity in re3data, or if you have anything to say or comment. There is another comment: using the search with objective criteria can lead to a slightly out-of-focus repository, and that might have an impact. Would you like to share with us what you're doing and discuss that whilst we're viewing your screen? John, let me. Okay, now you can open your mic.
Yes. I'm sharing my screen now. Can you see it? Well, I have so many windows. Sorry, it was not the right window. I will try again. Okay, I don't know why it doesn't work. Well, while I'm trying to share my screen, I can explain. I'm trying to find a repository to deposit the event data set that I was mentioning earlier. Again, it's quite a specific data set with visual data, like images and videos, but I can't
seem to find the proper repository for this. I just typed in a keyword that is very generic, like computer science, and that leads to many repositories, of course. I've tried to refine by countries and keywords, and then I'm led to myExperiment, which you might be able to see now.
Yes, I'm also searching while you work; I'm following your steps at the same time. Yes, myExperiment. It's for workflows. I assume that this is not what you want, right?
Sorry, say again?
It's not a workflow that you want to publish, or is it? Is it an image of a workflow?
It's visual data like images and videos, but in a nonstandard format, acquired with a nonconventional sensor. It's a nonstandard file format.
Yes, let's see. Well, that's not a problem now. I guess you would have to consider first converting it to a different format and then searching for a repository, but okay, it doesn't affect this.
Yeah, I'm sorry, it was not very clear. It's not only about the format, it's also about the sensor, the type of visual sensor. You might have an infrared image, or satellite imaging, or a medical image. It's a type of sensor that is not the same as for normal photographs or videos.
I see. If you click content types, images. Yes, images. So I'm following a different path. I'm starting my search with the same keywords, computer science, and then I refine by content type images. We want the content type to be images. Can you follow this path instead, playing around with the criteria to see? You have to reset this myExperiment filter first. Okay, so you have 175, let me check, was it 175? Yes. Now refine by content type images, since that's the key thing that you're looking for. And then maybe countries. Which country is your affiliation in?
I'm in France.
And perhaps the CHIST-ERA project involves some partners in Spain, Greece and Switzerland as well.
Check one of those countries, check France for example. Okay. The programme serves heritage documents and pursues research associating skills in human sciences and computer science. So it can be that this is the one.
Actually, this is a very good example of what I was trying to explain. This might meet all the objective criteria that I'm looking for, but it doesn't seem to be the main purpose of this repository, which seems to mainly deal with humanities and social science. So what I wanted to express is that if I ever deposit my data set there, maybe it will not have the same visibility. If I would, for instance, select some other country, say I would like to host my data in Europe, maybe re3data would lead me to some repository that wouldn't be the best one. I don't know.
Your concern is in terms of visibility, you are saying.
Yes, because Bibliothèques Virtuelles Humanistes is exactly the subject that was listed here, humanities and social science. So it looks like the main target of this repository is humanities and social science. Of course it also qualifies for image databases, but that's not the main purpose.
So in that case you can always deposit your outputs in Zenodo. And also check what we have in OpenAIRE, because it's guaranteed that all these repositories are in OpenAIRE, so I wouldn't be that skeptical about it. But I understand and I agree that this one is more humanities-oriented. So it seems that it's not for you, and that's fine. You can either refine your search again or use Zenodo to deposit your data.
Okay. Yeah, that's what I think as well.
Thank you. Thank you.
If I may add something here: it is often the case that specific subjects and data don't find the right repository. It is easier to find a repository for those communities that have a strong collaboration and that have the need to share their data in a proper way. So sometimes, for specific data sets and specific communities, you will not find a repository, but then you know that Zenodo is the right place to put your results.
Yes. Okay, so I will skip Zenodo because I understand that most of you, if not all of you, have prior experience with it. But if you have any questions about depositing, or about related identifiers, that is, how you link other deposits that you have in Zenodo with a record that you're uploading now, please let us know offline and we can support you with any difficulty that you have. There are two more tools, and I will skip all the others, that are very important and that we should all pay attention to before we finish this module. The first one is this: okay, we've done everything, so how FAIR are we at the end? We gave licenses to our data, we assigned the proper metadata standards, we followed all the guidelines that open and FAIR practices give, so how do we know how FAIR our data are? It's good if you could follow this link yourselves as well. Let me see. Yes, it's in the chat. Okay, great, so I will also go there and show you how it works. Let me share my screen. So this is a tool created by ARDC. It's trending right now: many tools are trying to build on the methodologies that are out there for FAIR assessment and measure how FAIR our data are, using different technologies and different workflows. This is one of them. It was developed by ARDC, the Australian Research Data Commons, and it's a very easy one.
It doesn't need you to know any technicalities or to know coding, for example, because there are some more technical tools out there that can measure FAIRness in a programmatic way, but this is a fairly easy and also effective one. All right, so what we need to do is, similarly to the licensing tool, follow the questions, answer them, and then see what changes. These are the questions. For findability, the first one we have is: does the data set have any identifiers assigned? It's not only yes or no; you also specify what type of identifier it is. Let's say that we have a globally unique, persistent identifier like a DOI. And you see this bar: when it reaches the end it means that you're 100% FAIR, so it will keep building up until we see what level of FAIRness we have reached. Is the data set identifier included in all metadata records and files describing the data? Do all our metadata include the identifier? Let's say yes, I made sure that the metadata include the PID as well. How is the data described with metadata? There is brief title and description; you see there are more detailed answers and more brief answers, so we select according to the practices we've applied during the research data management activities. What type of repository or registry is the metadata record in? Let's say that it's in a generalist public repository, because I uploaded it to Zenodo, which is a generalist public repository. And I'm done with findability. So, for accessibility: how accessible is the data? Is it fully accessible? Is it fully accessible to persons who meet explicitly stated criteria, like for ethical issues and so on? Is it embargoed? Is there access to metadata only? Let's say that it's unspecified conditional access.
Is the data available online without requiring specialized protocols or tools once access has been approved? Let's say yes, it doesn't require any specialized protocols. Will the metadata record be available even if the data is no longer available? So, even if the data retention period and the data preservation period end, will I still be able to find, through the metadata record, what this data set was about? Yes. What file format is the data available in? Is it in a proprietary format, like in the example of your colleague before? Is it in a structured, open standard, machine-readable format? It depends on the standard. Let's say no, let's use mostly in a proprietary format. Next, describe the types of vocabularies, ontologies and tagging schemas used to define the data elements. So whether the ontologies and the vocabularies I'm using are described or not, and how. Let's say that I'm using standardized vocabularies without global identifiers. How is the metadata linked to other data and metadata, to enhance context and clearly indicate relationships? Does the metadata record include URI links to other metadata, or is the metadata represented in a machine-readable format? Let's say that there are no links to other metadata. For reusability: which of the following best describes the license or usage rights attached to the data? Is there a license, and is it standard? Is it a standard machine-readable license like Creative Commons, a standard text-based license, a non-standard text-based license, or no license at all? Let's follow the same paradigm as for the book, which has no license. How much provenance information has been captured to facilitate data reuse? Did we make sure that we keep track of how the data evolved over time, after all this processing and analysis? And can we ensure that we can go back and review all the different versions?
Can we go back to the raw data, to the first set of data? The options are fully recorded in a machine-readable format, fully recorded, partially recorded, or no provenance information recorded. Let's say it's fully recorded in a text format. Now we see that, according to the FAIR metrics that are used for this tool, in terms of findability we are fairly good. The bar is here, so I would say more than 75%. In terms of accessibility we could have done better. By seeing this, we understand what we could change, maybe by introducing a new version where we change how access is defined in the data record. Here it's not great, it's like 20% or 30%. Interoperability is again very low, and reusability is similar to accessibility. Overall we are less than 50% FAIR, which means that we have plenty of room for improvement. I won't ask you to go through this now, but it's very good if you do. It's a very good tool to have, to understand how FAIR your data are. But what I want you to do before we end this course is... no, we have two more, sorry, I said two but it was three. Very quickly, let's see what tools can help you understand whether you comply with the scientific publishers' policies. One is this. Okay, so if we go to Sherpa Romeo, which was also mentioned in module two. Is there a question? No, it's me. So if we go here, we can search for the title of the journal in which we want to publish our paper. Let's say I want to search for the ACM Journal of Emerging Computer Systems. I can see all the basic information, like the ISSN, I can click here and check the publisher's website, and I can see who it is published by: the journal is published by the Association for Computing Machinery.
And then this is very useful to know: the submitted version, the accepted version and the published version. It shows you here what you can deposit in open access, following an open access route, as Emma indicated in her course. These are just the different terms used to denote preprints, postprints and the final PDF version. The submitted version is the preprint, before any changes or any review has been made. The accepted version includes the review, so it's considered a postprint. And the published version is the PDF, the exact version that you see on the publisher's website; you can download it directly from the website and deposit it in a repository with open access policies. Here you can also see the other elements that you have to keep in mind. First of all, you're free to publish any of these versions. You can publish your preprint, and there is no embargo attached to it, so you can immediately provide access to it. You can publish the accepted version of your paper, and again there are no embargoes attached to it. And you can publish the exact PDF, but there you will have to pay in order to provide immediate access to the paper. So there is a cost, which we call an article processing charge, an APC, which you pay to give the published version immediate access. The license that is attached to your published version is CC BY. This is all very useful information that you should know before publishing your paper,
and after, so that you cross-check it with what your funder or institution wants, and let that guide your selection of where to publish. Because if the publisher that you have in mind doesn't satisfy open access needs, and your funder, for example, does want you to publish open access papers, then the paper that you end up publishing in a non-open-access journal won't be counted in the reporting. So these are things that you have to keep in mind: how to comply with both the funder's and the publisher's policies if you publish in that particular journal. I don't know if Emma wants to add anything on that before I move on to the next one.
Okay, great. What I can say while you move to the next one is that if you have any problems and you're not sure about the policy, then you can refer to us, or to your NOAD, because OpenAIRE has a NOAD in every country and you can find the list on our website. So you can refer to them, or to me or Ellie, for any problems you have or any support.
Of course, yes. Please do. And now, I added here the test area of Argos, the environment where we test new things before we push them to production, before we push them to the Argos that you see here. I've added it here solely for the purpose of this course. It's useless for you otherwise, because you won't need it: you will need Argos at argos.openaire.eu, from January, in order to create your data management plan following the CHIST-ERA template. The reason why I added the website here is because I would like you to... can you see my presentation? Okay, Argos. No, okay, stop. Okay, because I would like you to follow the link that I added in the chat and do the following.
First, start a DMP: create a DMP, invite a colleague by adding their email, send the invitation to your colleague, and please raise your hand when you're done, so you can let us know and show us that your colleague has received the email and now has access to your DMP. The second one is to complete a section of the CHIST-ERA template, and then again raise your hand and show us exactly what you did. And the third one is to find a data set in OpenAIRE Explore and describe it in the CHIST-ERA template. It would have been easier if we were not online but face to face, of course; then we could split people into groups. Emma, can we do that, deliberately arrange groups? Say that the first few people we see work on the first exercise, the next few people work on the second.
Okay. So, in the attendees list, I will take it as I see it, I guess it's alphabetical. We are 16, so the first five or six people: Alba, Alexis, Aris, Dimitris and Athena. These people can start the DMP and share it with each other.
I'm not sure they have each other's emails, though.
Yeah, so that could be a problem. So can people that work together raise hands or let us know in the chat who can do the first exercise?
We could maybe have a volunteer, and then he or she can share with us, Ellie. We can give our email, because maybe they don't even see the participants. I think, also because we are running out of time, maybe we have a volunteer and then we make them a presenter. Okay, Jean can volunteer. Yes, he is already in the speaker list. So you can already share your screen, and you can use our email. I can copy and paste my email or Ellie's email.
So, do I need the email for our OpenAIRE account, or...?
No, no, no. I'm sharing that.
Okay, so you can use that email. Yes, so you can share your screen and go through the steps so that you send me your DMP. This last demo session is always funnier when we are face to face. It's not that easy to plan online, remotely. Okay, yes, we see your screen, but we don't hear you, if you're speaking.
Yeah, sorry, I was speaking.
Okay, yes. So now I guess you're going to provide a title. Great. The description, the language.
So here I should... Did you paste your email address in the chat? No? What email address should I use then?
This field is to add colleagues that have been working with you on this DMP and on the data sets. So you can search for colleagues. Yes, the list is long. And if you can't find someone you can insert them manually. That's fine. Organization: you can search for your organization, or where your colleagues work. Okay. And then you are the contact for this DMP in the future, if we have any follow-up questions.
I'm sorry, should I go to the next section?
Yes. For the funder we want CHIST-ERA, right? It's not there yet, so you have to create it, but from January it will be listed. We're working with CHIST-ERA to do exactly this.
Can I just write it?
Yes, exactly. Okay. You can search for the grant or insert it. Since this is fictional, let's insert this project.
This field is to be completed only for projects where multiple grants apply. Is it multiple grants?
No, just a single grant.
Do I need to fill in this one?
The field is to be completed only for projects where multiple grants apply, so if you don't have multiple grants you're fine. Then the license. Here you can also search.
Okay. I can pick a free license.
Okay. And now we want CHIST-ERA, the template that we want to use. This exists, right? Yes.
And then, should I save?
Save and add data set, or you can just save it. You have two options. Okay, I don't want you to go through the data set now. If you go to My DMPs, yes, then you will see it in your dashboard. And I want you to share it with me. Oops, sorry. So not export, invite. Right. And you can copy my email, it's in the chat.
Oh, it's in the chat. Okay.
Click enter. Great. Now let's see. It's at Athena RC, right? Okay. So I received it. Let me share my screen now. Thank you very much. And you can see that I received the invitation, and from there I can join this DMP.
Hello, can you hear me?
Yes. Okay, sorry. So then I can join this DMP. I'm not going to do it now, but I would have access to this DMP of yours, I could view it and I could work with you on the DMP. Okay, we have to go to the next one. No, you don't see my screen, I have to share it again. Okay. The second one is this: complete a section of the CHIST-ERA template. Just create the DMP, start creating a data set, and just randomly select a section that you want to write in the DMP that you create.
Okay, we are already over time. Maybe we have a volunteer who could do that, someone who has not spoken yet.
I have to leave. Thank you for the sessions.
Okay, thank you. Understood. Yes, it will be very quick and easy. We saw how this can be done in the previous course.
Maybe you can ask John, if he's still there.
Yeah, I'm still here.
He's still there. So, okay, since we have you, we volunteer you. That's fine.
Okay. So I'm still with the same DMP that I created.
Perfect. We want you to add to it, to describe a data set for this DMP.
So can I use a fake one? I don't have a data set to describe.
Yes, don't worry. Select one.
Can I suggest that we go to the last section? Yes, the last one. The reused data, yes, exactly. Okay. How will existing data be reused? So how are you planning to reuse data in the context of your research?
To compare.
If you know the URL of the DMP where these data are described, you could add it.
Well, I don't know.
Don't worry. So you can search for a repository, maybe. Yes, we have many here. We have a university one. Yes, that's fine. Thanks. Okay. Which data will be reused? If you knew the data set, you could add it directly. Maybe you can quickly open a tab in Explore, explore.openaire.eu, and search for a data set, and then we can add it here.
Yes. So I search for it. Oh, I'm sorry.
No worries. Okay, we want a data set, right? Remember that. Research data. Okay, yes, there's audiovisual also. It's the second one, audiovisual.
All right, sorry, I was looking at the left-hand side.
So if you open it. If you click on the DOI, no, no. If you click on it. Okay, is it a data set? I don't understand.
Where do you see audiovisual? It seems like it's a poster.
That's all right. If you go back to Explore and paste the title of this one. Yes, this one. And paste it there. Yes, that's the one, computer science. I think it's a good one. Which one was it? It was the second one. Okay, yes, you're right. But if you knew exactly the correct data set, because it was yours or from someone you knew, you could avoid going to Explore. You could search directly here. Right. And then state any constraints on the reuse of existing data, if there are any. So we assume that you would check this data set.
Okay, so I could check that.
Can I check here in Explore? You have to get the full text in order to check. The full text. Yes. The full text, there you have more information. So here we have the license. This is CC0. So there is no constraint. Right. There is no constraint. Yes. Sorry. We're here. So the reasons, if there is a... Okay. So here. You didn't write "no". Okay. Yes. You could write, if you wanted, just to say in a sentence, no constraints, because it's... is it binding? Yes. Exactly. So, the reasons if the use of any existing data sources has been considered but discarded. Let's say that. No. Okay. Completed for the section. Okay. So shall I save? Yes, you can save it. You can save it, and then you can continue working. If you save and close, then you will go... Yes. You will go to your dashboard. Okay. So, to the whole thing. Okay. So, any questions also from your side? Was something confusing when following this process? Well, from my side, it was very nice to do it online, as we would do it if it were face to face. And so maybe I would encourage the other participants to do so now. Yeah, that would be good. Thank you. Anyone who would like to quickly go through the steps, or has any questions: the floor is now open. Okay. But if you do have questions when you start playing with Argos at OpenAIRE in January, we will let you know when everything is set. And we're happy to answer any questions and to support you, as Emma already mentioned, in any of your endeavors regarding following and integrating open and FAIR principles. Also, about Amnesia: if you have questions about k-anonymity, the methods in Amnesia, or Amnesia workflows, we are here to help. Okay. I think we can conclude, or we can actually end it.
Would you like to say anything? Yeah. If I may add a few words. Let me just quickly share my screen. All right. So first of all, I want to really thank you a lot, Ellie, Emma, and all the OpenAIRE team, for having organized such great training sessions, training courses, for researchers and actually for us as well. It was very beneficial, very clear, and very interactive. Thanks a lot for all your efforts. I hope it was very beneficial for them and that they will be able to use the tools efficiently and help us also improve them, because I think we will strive as well to constantly improve the tools and the processes in terms of open science. So obviously any feedback throughout the lifetime management of the project is welcome. So, quickly. Yes. Roughly speaking, we all know now why open science is nice, what the different possibilities are, the different roads to open access to publications, what the main issues related to data sharing are, as well as the main tools and resources at the disposal of the researchers, in particular thanks to OpenAIRE. So overall, we've seen a very nice, modern approach to open science, and a very flexible one. From our side, from CHIST-ERA, we've taken a lot of measures. First of all, we will grant all CHIST-ERA projects, not just the new ones, unique identifiers, in such a way that it will be very easy for them to monitor all the outputs related to their projects. This will happen very soon. We strongly encourage the use of the tools we've seen today, as well as all the good practices.
We will obviously prepare a lot of guidelines for the future and keep optimizing the tools and the guidelines, thanks to all the feedback that we will get. And obviously the most important thing is the implementation of this new multilateral open science policy for the Call 2020, which is actually already online. So on the website of CHIST-ERA, we can already see the pre-announcement, with parts of the open science policy. But obviously this can apply to all ongoing projects, which are all invited to follow the same good practices. And again, roughly speaking: all publications in open access without embargo, and the underlying generated data openly available, following the FAIR data principles, on FAIR-enabling data repositories. Together with the assignment of this new role of open science coordinator, whose role obviously is to monitor the implementation of the right dissemination activities within the projects, as well as the planning of the open science activities and the DMP. We are again, as has been said already, very much committed to the transparency of our own processes in CHIST-ERA. We've signed the DORA declaration and will keep improving our procedures throughout. So thanks again. If you have any questions, now or later throughout your projects, related to open science, feel free to contact me or to contact your national contact point directly. And we're happy to guide you and accompany you in all the efforts and support you with it. Thank you again. And have a nice weekend and a lovely end of the year, even though in very special circumstances. Thank you. Thank you everyone. Thank you. Thank you very much.