Good morning all, and many thanks for joining this webinar. Thank you for accepting our invitation to another OpenAIRE webinar, this time about the Open Science Observatory. This webinar is part of a series that OpenAIRE is organizing to present its services, and today our colleagues Ioanna and Alessia will present the Open Science Observatory. Just a few housekeeping rules: the event will be recorded, and participants' microphones are off. If you want to participate, you can use the chat to introduce yourself, interact with other participants and write questions to the speakers, and you can also raise your hand to speak. The presentation and the recording will be shared with you by email and also on the OpenAIRE portal. If you want to share thoughts on this presentation on social media, you can use this hashtag or mention the OpenAIRE channel. So today, as I said, we have Ioanna Grypari and Alessia Bardi from the OpenAIRE team. Ioanna is from the Athena Research Center in Greece, and Alessia Bardi is from CNR in Italy. I will now pass the floor to Ioanna to start this webinar. Many thanks.

Hi, everyone. A warm welcome from me as well, and thanks for joining. So I'm going to present the Open Science Observatory. I hope you can see my screen. I'm going to give a brief demo later, but first I will present the main points so that we have a shared understanding of what we're going to see. At some point I will pass the floor to Alessia Bardi, who is our OpenAIRE Research Graph expert; if there are more technical questions at the end, she will help us answer them as well. You can visit the Observatory at osobservatory.openaire.eu, and we invite you to go and play with it. What is the purpose of the Observatory? Well, we wanted to build a platform that helps users better understand the European open science landscape.
In particular, it provides the information necessary to monitor, and therefore enhance, open science policy activities. Moreover, besides the open science views in particular, we wanted to present a European and within-country view of research activities, especially open access research activities and their impact on society. We present information at different levels of interest so they can be compared. This way, and I will show you how in the demo, a user (a policy administrator, policymaker, research administrator and so on) can see what works and what does not, which areas lag behind, where there is hidden potential, and so on. So basically the aim is to take the OpenAIRE Research Graph, which is this giant ball of information with a 360-degree view of the research field, let's say, and turn all this data into actionable insights, and in particular, since this is about open science, to promote good practices as well. How do we do this? Well, this is OpenAIRE, so it's all about open data, the completeness and relevance of the data, transparency of the methodology, and therefore replicability of any results and indicators that you view on the platform. It's built on the OpenAIRE Research Graph, which we will describe in a couple of minutes. The graph includes interlinked scholarly information from different content providers, and it goes beyond publications: there are different types of research products, such as datasets, software and others. The Observatory is based on open science principles, so we are talking about open data sources, open APIs, and well-documented metrics and indicators. Later on, I'm going to show you the methodology page on the platform and how you can use it. And we work hard to provide indicators that are relevant for the community; in fact, we very much welcome any type of feedback, both on the numbers and on the type of indicators that you would like to see on the platform.
I will show you how you can send feedback through the platform, and you can also see my email at the end of the slide deck. So what is the Open Science Observatory? It's a user-friendly (we think) data visualization platform with a bunch of exporting capabilities, so that the data can be used for analysis and the visualizations for reporting and so on. It contains indicators that track and evaluate the progress of open science uptake across Europe as a whole and within each country. Just a small note here: the OpenAIRE Research Graph has global coverage, but the Observatory is for now focused on Europe. It is also possible to have a funder, institutional or research infrastructure view on open science, so not only per country; for that we have another service, the OpenAIRE Monitor, which is on-demand at this point. You can also contact me about that if you're interested. Now, the first question we are always asked, and it is important to address, is: what is the data behind the numbers that I'm seeing? How do I know it is reliable, and what is it exactly? To give a brief description of the OpenAIRE Research Graph, I will keep sharing my screen but pass the floor to Alessia.

Thank you, Ioanna. So the graph is an open metadata research graph of interlinked scientific products. As Ioanna was saying before, we have descriptions of publications, datasets and software, with semantic links between them, with semantic links to funding projects, relevant research communities and infrastructures, and organizations, and also access rights information. This graph is generated by collecting metadata records from sources that are trusted by scientists. So we have a network of institutional and thematic repositories; we include content from Crossref and also from the funders. The funders give us information about the projects, their funding streams and so on.
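Since the graph is exposed through open APIs, country-level slices like the ones behind the Observatory can also be fetched programmatically. Below is a minimal sketch of building such a request; the endpoint and parameter names (`country`, `OA`, `size`, `format`) are assumptions based on the public OpenAIRE search API, so check the current API documentation before relying on them.

```python
from urllib.parse import urlencode

BASE = "https://api.openaire.eu/search/publications"  # assumed endpoint

def build_query(country: str, open_access: bool = True, page_size: int = 50) -> str:
    """Build a search URL for publications affiliated to a country."""
    params = {
        "country": country,              # ISO 3166-1 alpha-2 code, e.g. "GR"
        "OA": str(open_access).lower(),  # restrict to open access results
        "size": page_size,
        "format": "json",
    }
    return f"{BASE}?{urlencode(params)}"

url = build_query("GR")
# then fetch with urllib.request.urlopen(url) or requests.get(url)
```

The same pattern applies to the `datasets` and `software` endpoints of the search API.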
But the OpenAIRE Research Graph is not only a collection of metadata from these thousands of sources worldwide: OpenAIRE also finds and merges the duplicates, because, for example, the preprint and the published version of an article should appear as two versions of the same publication in the final graph. This is very important, since we have to provide statistics for the Observatory and for Monitor. We also have enrichment algorithms that run on the metadata and on the full text of open access publications. These algorithms enrich the graph with inferred links and properties, like subject classifications, links to projects, links to datasets, links to patents, and many other things. At the end of this process, after deduplication and enrichment, we provide the graph to our portals: explore.openaire.eu, the portals under connect.openaire.eu, and of course the statistics that you can find on Monitor and the Open Science Observatory. You can find more information on the dedicated website, graph.openaire.eu, and if you want more information about the type of content or the metadata that is available, just ask a question at the end of this webinar. So, back to you, Ioanna.

Perfect, thank you, Alessia. So this is what's behind the indicators. One more thing to say here; if this is too abstract or too technical at this point, we can discuss it later, but it's important to cover it. We show country-level indicators, and there are two types, both created using results. When I say results, I mean publications, data, software: any research result. We show results affiliated to a country (most of the indicators), and we also show results deposited in a country's repositories. So most of the numbers you see have to do with country affiliations, and it is important to understand how we get these affiliations. There are three ways.
Two of them are active right now, and the third is coming up next. The first is the result-organization country: for example, a publication is written by an author at the University of Athens, which is in Greece, so just from the institution of the author I know that this publication belongs to Greece. The second way, which would also assign the publication to Greece in this example, is the result-institutional-repository country: a publication is deposited in, let's say, the University of Athens repository, which is in Greece, therefore this publication is affiliated to Greece. So these are the two ways; again, if this doesn't make sense right now, we can just skip it. The next one to be included is affiliation to a country coming from national aggregators. This is just for anyone who is wondering, when we talk about country level, what exactly we are showing, and to have it in the slides that we will share. Okay, let's move on to the indicators. Some of these are already included, as I will show you, and some are due to be online by the spring of 2022, so this is a kind of short-term roadmap. We have two types of indicators at this point. The first type is open science indicators, where we cover different openness metrics, such as open access colors or the number of open access results; different indicators that measure compliance with the FAIR principles; indicators on Plan S compatibility and transformative agreements (this is work by our colleagues Eliah Fava and Iko Yan); and indicators on extrapolated, estimated article processing charges. We expect to have a full set of these indicators by the spring of 2022; if you're interested in the details of exactly which ones will be online, please contact me. And then there is a set of indicators that relate to open access research output and its performance overall.
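The two active affiliation rules just described can be sketched as a small resolution function. The record fields here are hypothetical and only for illustration; the real graph data model is richer.

```python
def result_countries(record: dict) -> set:
    """Assign countries to a research result using the two active rules:
    1) country of an organization an author is affiliated with;
    2) country of the institutional repository the result is deposited in.
    """
    countries = set()
    # Rule 1: result -> organization -> country
    for org in record.get("affiliated_orgs", []):
        if org.get("country"):
            countries.add(org["country"])
    # Rule 2: result -> institutional repository -> country
    for repo in record.get("deposited_in", []):
        if repo.get("type") == "institutional" and repo.get("country"):
            countries.add(repo["country"])
    return countries

paper = {
    "affiliated_orgs": [{"name": "University of Athens", "country": "GR"}],
    "deposited_in": [{"type": "institutional", "country": "GR"}],
}
# result_countries(paper) -> {"GR"}
```

A third rule, country assignment via national aggregators, would simply add another clause to the same function.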
Besides the number of outputs and so on, there are indicators on networks and collaborations, and there will also be usage statistics (downloads and so on), citations, and eventually SDGs, that is, which SDGs are targeted in each country, let's say. This may come a bit later, but we're aiming for sooner rather than later. Indicators are broken down by different fields of interest, as I will show later, so one can make meaningful comparisons, and I will give you some examples in the demo. These breakdowns include the different types of research outcomes, time, countries, data sources, organizations and so on, so one can make meaningful analyses and gain some insights. So, enough talking; let's do some hands-on showing. I'm going to give you a short demo. This is the landing page of the Open Science Observatory. As you see, we start off with a map; at this point it is showing open access publications across different countries. Note, as written here, that all publications in the Observatory refer to peer-reviewed publications unless stated otherwise; for the definition of a peer-reviewed publication, one can visit the methodology page, which I will show you later. Here is an example of what one can see just by viewing the map: out of the 1.8 million publications affiliated to an organization in the UK, only about a third are deposited in the country's institutional repositories, whereas, for example, if we look at Turkey, a higher share of the publications affiliated to the country is deposited in its repositories. Now, of course, this can be explained in many ways, and we're not here to claim we know the reason, but one could investigate whether this is because of some mandate or policy, or whether there is a need to increase the number of institutional repositories, and so on; I'm just giving some examples here. Clicking on open access datasets, one can see the corresponding information for datasets across different countries.
Of course, the numbers are much lower here, mostly because the requirements to report datasets and provide them as open access have only started to increase lately. That is one reason; a second reason is that we find, overall, that the metadata quality of datasets is not yet up to par with that of publications. Why do we care about this? Because we may miss some of the country affiliations if they are not included in the metadata. Further down, we provide some summary statistics, and there are other sections here you can click and browse. This is a good opportunity to show you the methodology page. Here we see that there are so many repositories, but only 31.9% are validated. What does that mean? I can go to the methodology page and click on terminology and construction. Here you can see the different types of entities and their inherited and inferred attributes, with links, so that it is clear what we show, and also the constructed attributes. For example, "validated": what is a validated data source? We first provide the definition, a data source of research outcomes that upholds metadata standards. Okay, very nice, but very abstract; the actual construction here shows how we calculate it. This was just to show you the methodology page; back to the Observatory. We see that repositories in general either don't have good metadata quality standards, or the standards are not upheld by most of the records, whereas, for example, open access journals have higher standards, which is not surprising because the deposits there are also reviewed, and so on. This is just an example. Further down, we provide a table that is basically the table view of the map. Besides repositories and journals, we also include open access publications, datasets, software and other research products. And there are several options, like sorting, or viewing percentages versus the actual numbers.
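The map comparison described above (publications affiliated to a country versus those deposited in its repositories) boils down to a simple share. A sketch with illustrative round numbers, not the actual Observatory figures:

```python
def deposited_share(affiliated: int, deposited: int) -> float:
    """Share of a country's affiliated publications that are also
    deposited in its institutional repositories, as a percentage."""
    if affiliated == 0:
        return 0.0
    return round(100 * deposited / affiliated, 1)

# Illustrative only: if ~1.8M publications are affiliated to UK organizations
# and ~0.6M of them are deposited in UK repositories, the share is a third.
share = deposited_share(1_800_000, 600_000)
# share -> 33.3
```

Sorting countries by this share versus by the absolute counts is what produces the different rankings shown in the table view.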
As an example, for the number of open access publications we can see that Belgium is near the top, whereas if we look at the share of open access and sort by that, Belgium is further down: a lot of open access publications, but as a share a bit lower. There is a drop-down here that switches between the affiliated publications (what we're viewing now), the same number including the non-peer-reviewed, and then the numbers for deposited publications, with and without the non-peer-reviewed. There are also some over-time charts for the different research products, just to give you a summary of what's going on. We can click here, or here at the top, to see a detailed view of open access research outcomes in Europe. We follow the same structure throughout, so there are different tabs that you can click across. Let's start with the overview. There are different charts, for example open access publications by peer review, over time, and by type, and then a set of charts with tabs, so you can see, let's say, open access publications by country, or by data source, organization and funder. Now, for every single chart that you see here, you can click on these three lines and download the image, in a static or interactive format, and insert it into a report or website or whatever, or download the data behind the visualization and use it for analysis. And this is the same everywhere. Okay, let's go to the open science tab. These thematic tabs have buttons at the top where one can view all the indicators for a particular type of research product; for example, here we have software. Viewing publications here, first of all we have the gold versus green open access publications; you can visit the terminology page for how they are defined and how they are constructed.
Let's say someone wants to see just the green open access publications and export that image and analyze that data: you can simply deselect the series you're not interested in, and if you download here, you will download this particular selection, not the original chart. So we have gold and green open access publications; let's view them by organization. For example, we have an interesting case here: the Autonomous University of Barcelona has only gold open access publications. They actually rank towards the top of the top 15 by number of gold open access publications, but these are not deposited anywhere, or, if they are deposited, the affiliation is not provided in the content providers that OpenAIRE harvests. There are several possible reasons for this. Perhaps it's a matter of what is required by the university, or the publications could be provided under different names of the organization, or the affiliations are missing; I don't know. But someone who is interested in green deposits could look here and try to understand the picture a bit better. We also have some other indicators on metadata completeness. I'm not going to go through everything, so that we have some time for questions, but let me say that in all these tabs there is a more detailed view where you can see all the indicators, as you see at the top, by country. And again, you can sort, view within-country percentages, and select the type of publications you want to view, or, in the case of datasets (let me click "more details" here), either affiliated or deposited; of course, there is no peer-review distinction here. You can use this for comparisons. There is also a collaboration tab, and there will be more indicators, as I described in the presentation a few minutes ago. Now let's say we want more details on a country. We saw the comparison tables and what's going on in Europe, but what's happening in a particular country?
There are several ways to get to the country view: one can type the country here, or activate the map at the top and then click on the country, or go through any of the "more details" pages. So let's say, as an example, that we want to examine Spain in further detail. We start up here with a general research overview of Spain. Here we see, for example, that OpenAIRE has research linked to five funders, but 208 funding organizations; if one is interested in the difference between these two, just go to the terminology page: inherited and inferred, funder versus funding organization, with links to how they are defined. We see here, for example, that Spain has a lot of datasets, and a lot of open access datasets. So one may be interested in the quality of the data in these datasets: are they, in general, FAIR? There are many definitions, but how do they fare against the FAIR principles? What one can do is go here to, let's say, open science, and then click on datasets. We start here, for example, with licenses. What we see here is the open access datasets with any type of license, and here those with a CC license in particular. First of all, we observe that the blue bars are pretty similar from one chart to the other, so most of the Spanish open access datasets that do have a license have a CC license: whatever has a license has an open license, which makes sense as we're looking at open access data. However, we do observe that there is a bunch of datasets without a license, so that's potentially an area to work on; or, again, there may be an issue with the deposits and the metadata quality of these deposits. One can also view this by organization, let's say here with the CC licenses. And what do we see? Most of the datasets come from universities, which is great; there is a lot of production of open access datasets.
And for some reason a lot of them enter the graph without a CC license or another open license, so that is potentially something to look at. This is just an example, and we do not claim to know what's behind the numbers; it's the type of reasoning that one can follow. However, if we switch here to PIDs, we see that in fact most datasets do come with a PID, and this holds very well over time; and, viewing by organization, almost all of them are at one university. Of course, these are just some insights that one can get from the visualizations. One can always download the data behind the indicators, or, if one wants to go into a lot of detail, start with the statistics and indicators here, get some insight, then download the open data itself, and then the analysis can expand a lot. Also, if there are any issues with the numbers, or some feedback you'd like to give, you can just click on the button here and it will direct you to the email. Okay, that's a lot of information, so please go and play on the platform and let us know what you think. What is the big picture for the Observatory; where are we heading further down the road? Right now we're unfolding the open science story across countries and across Europe in general. What we want to do is also provide this type of analysis per scientific field, which we think will provide very important insights. We are also considering moving from the European view to a global view for the Open Science Observatory. And lastly, the vision in terms of indicators is to enable impact analysis, so one can understand how the progress of open science uptake across and within countries and scientific fields has affected the entire research landscape, which I think is one of the main questions that anyone interested in open science, or in research in general, has. So I think I've talked enough for now.
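Indicators like the license and PID coverage of a country's open access datasets, shown in the demo, can be recomputed from the exported data. A minimal sketch over hypothetical dataset records (the field names `license` and `pid` are assumptions for illustration, not the graph's actual schema):

```python
def coverage(datasets: list) -> dict:
    """Compute simple openness indicators over dataset records: the share
    with any license, with a CC license, and with a PID (as percentages)."""
    total = len(datasets)
    if total == 0:
        return {"licensed": 0.0, "cc_licensed": 0.0, "with_pid": 0.0}

    def pct(n):
        return round(100 * n / total, 1)

    licensed = sum(1 for d in datasets if d.get("license"))
    cc = sum(1 for d in datasets
             if (d.get("license") or "").upper().startswith("CC"))
    with_pid = sum(1 for d in datasets if d.get("pid"))
    return {"licensed": pct(licensed), "cc_licensed": pct(cc),
            "with_pid": pct(with_pid)}

sample = [
    {"license": "CC-BY-4.0", "pid": "10.5281/zenodo.1234"},
    {"license": None, "pid": "10.1234/abcd"},
    {"license": "proprietary", "pid": None},
]
stats = coverage(sample)
# stats["with_pid"] -> 66.7
```

Comparing `licensed` against `cc_licensed` reproduces the "blue bars" observation from the demo: when the two are close, almost every licensed dataset carries an open license.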
I'm going to stop sharing my screen and open the floor for questions. Let me see if there's something in the chat.

Yes, we have. Many thanks, Ioanna and Alessia, for the presentation. We have some questions in the chat. Let's start with a question from Janusz: how often are the country indicators updated?

Well, every time there is an update of the OpenAIRE Research Graph, which is at least monthly. The indicators are computed on the content of the graph, so they're always automatically up to date with it. You can also see it here; let me show you briefly. On the home page: data last updated, for example, December 9th. This always shows how up to date they are.

Okay. We also have a question from Bojan: affiliation is not always part of the metadata in institutional repositories; how do you extract it? I suppose from the PDF, but what if not?

Okay, let's break the question down. For institutional repositories, we validate that they only include publications from the institution, so we do not need to extract an additional affiliation. If I belong to the Athena Research Center, and we have a repository, and that repository only has publications by Athena Research Center authors, then we don't need an additional affiliation; the fact that the publication is in the institutional repository is enough. And this is validated in practice. Bojan, I don't know if this answers your question; you also ask whether we could share some ideas for improving repositories with respect to affiliations. Yes, we absolutely can. Alessia, how can we coordinate this?

If the repositories can expose their metadata according to the latest OpenAIRE guidelines, then, for example, they can add the ROR identifier of organizations in the affiliation metadata field. This would really help us, because ROR is a persistent, unique identifier for organizations.
This gives us a correct and precise way to link publications to affiliations, because otherwise we have to rely on the names of the institutions, which can vary a lot over time, or because of different languages, and so on. So adoption of persistent identifiers for organizations in the repositories would be really helpful, I believe.

We have another question from Janusz: what is the source of the open source software numbers? I think this is for me also. So, we collect metadata about software from different trusted sources that are used by scientists. For example, we can find research software in Zenodo and on DOE CODE, and we have bio.tools, a registry for software in bioinformatics. So we have these trusted sources that are often closely related to the research communities. But we also exploit full-text mining: thanks to the full-text mining, we are able to find links from papers to software, which can be referred to in footnotes or in the text. What we do in this case is create a link between the publication and the software, and the metadata of the software is produced based on the information we can find on Software Heritage. Software Heritage is a wonderful initiative for the preservation of software that is deposited in different software repositories like GitHub or Google Code, which may disappear, just like Google Code did. So by linking to Software Heritage, we have links that still resolve, and this really helps the reproducibility of the research itself.

Thank you. I see the next question; actually, there are two questions on the same issue, which is: what is the share of the Open Science Observatory's publications, data and so on, out of the total number of national publications? We show everything we know that belongs to a country, according to the definition I gave before.
And as you saw from the share of open access, this includes both open access and non-open access. What we are in the process of doing, and would like to enhance in case someone is interested, is to validate the data with responsible people from the countries themselves. So we are aiming to show the entirety of national production, and if something is missing, it is important to identify the source and figure out how to improve the graph in that respect. Is there a follow-up on these two questions from Gareth and Janusz?

No, thank you. That was enough.

Okay, thank you. Let me check whether there are more questions in the chat. Alessia, I think there is a comment that software can sometimes be deposited on GitHub or on an organization's GitLab, and a question whether there is a plan to harvest these as well.

Well, we don't harvest from GitHub, but the mining algorithm is capable of identifying the links to GitHub, so in this sense we are covered, and the same goes for GitLab. The problem is that if it is an organizational GitLab that is not accessible from the outside, it doesn't make a lot of sense to have the link, because we would land somewhere we cannot access. In this case, what we think the best practice is, is to tell researchers that when they publish their paper, they should also publish their software. To do so, they can use, for example, the integration that Zenodo has with GitHub: they can make a release on GitHub and automatically publish the software on Zenodo. And they can do something similar with GitLab.

We have another question: do you count URN:NBN as a PID for datasets? A lot of our datasets have URN:NBN identifiers, but the graph states that only a few of our datasets have a PID. From the point of view of the indicators, and then I'll pass the floor to Alessia, we did not discriminate on the type of PID that was accepted.
So if, in the graph, there is a link to a dataset with any type of PID, we count it as having a PID. About URN:NBN in particular, Alessia, do you have a further comment, or should we investigate and come back?

Yes, we should probably investigate this particular PID type, because the OpenAIRE guidelines consider as PIDs a given list of PID types. If this one is not yet included in the guidelines, that could be the reason why these PIDs are not counted among the PIDs. So we can investigate that.

Okay; Bojan says URN is listed there. Bojan, if you can send us an email, we'll take a look and investigate this further to make sure we have the right numbers on that. I'll put my email address here, just in case. Also, Janusz is saying that it is extremely difficult to present all the products of research from a country; national CRIS systems seem to be reasonably close to that, at least where there is sufficient motivation of researchers and institutions to report. That is right. And I think when I mentioned including data from national aggregators, this is what I meant. Alessia, the national CRIS systems?

Well, we have national aggregators that are not CRIS (CRIS are much more than simple aggregators), but they are for sure related. And we can put on the plan to include some CRIS in the graph as well; this is a work in progress.

Okay, I'm learning things too. If someone wants to ask a question, you can open your microphone and ask directly. There is an additional comment from Janusz: from a researcher's perspective, an organizational data repository is very similar to an organizational GitLab, so the policy should be very similar. We'll take a look at organizational GitLabs and see if there's something additional that can be done there. Yes.
And, Janusz, if you're aware that GitLab offers an API that we could use to harvest rich metadata records about the software, we can of course investigate this, but I'm not aware of such an API, or of any other way to collect OpenAIRE-compliant metadata from this kind of service. Please let us know if this is in fact possible.

Some additional minutes for additional questions. Yes, we have another one, from Juliana: why are some resources which are visible in OpenAIRE Explore not visible in the Observatory?

Yes, there are different kinds of data shown in the two portals. OpenAIRE Explore links results to a country by any means possible: by an affiliation, by a national aggregator, by a project link, by a data source located in that country, any way at all, because that is the need of those users. Whereas in the Open Science Observatory, we restrict to affiliations in the sense that the author or creator is at an institution in that country; and, as I mentioned previously, we're also going to add national aggregators there as well. So there are different users with different requirements, hence a different slice of the graph is shown in each case. But what we're going to do, because this is an excellent question, is create a section in the methodology documentation of the Open Science Observatory that clearly describes the differences between what is shown in the Observatory and what is shown in Explore. Thank you for the questions.

Juliana also replied: OpenAIRE Explore is a visual presentation of the OpenAIRE graph, or am I wrong? That is right, and the same holds for the Observatory; it's just not everything, because not everything is related to country affiliations. But yes, you're right.

If you have more questions, we still have some minutes; we can also open the microphones if you prefer. If not, I think we can close this webinar. I think we can close the webinar if there are no additional questions. Many thanks, Ioanna and Alessia. Thank you.
Thanks to everyone for being here. Thank you. Please contact us if you have any additional comments; we want you to be using the platform, so please let us know if something is amiss. I thank all the participants for accepting our invitation; we will share the presentation slides and the recording, in case you want to revisit the presentation. Many thanks. Bye-bye. Thank you. Bye-bye.