Okay, as far as I can see everybody has the audio connected, so welcome everybody to this first OpenAIRE Graph community call. We are happy to see how many people are interested in our community. For today, what we ask is that you stay muted during the presentation; afterwards we will have about 30 minutes of discussion for you to ask questions. Please drop your questions in the chat and we will moderate, and also give you the floor if you would like to speak. Today we have Paolo Manghi, who is basically the father of the OpenAIRE Graph, to introduce what the graph is and why we are starting this community call series. Okay, Paolo, the floor is yours.

Good morning everybody, I hope you can hear me. Let me start by sharing my screen. Can you see it? Yes, all good. And it's in presentation mode, right? Yes. Okay, good. So, good morning and thank you for being here. I'll try to introduce the work we are doing behind the OpenAIRE Graph, where we started from and where we are today. In fact, one reason behind these community calls is to understand how we can collaborate to improve the graph's quality and its production, because the graph, after all, belongs to us as a community as a whole.

Let me start from the basics. There is a new term going around: scientific knowledge graph. It has been around for a while now and has also been adopted by the Commission in recent calls. Scholarly knowledge graphs, scientific knowledge graphs — we don't know yet which is the right name, but SKG is, in the end, the acronym being used. The OpenAIRE Graph is exactly one of those. It is a collection of metadata describing the entities that are common in the research lifecycle, together with the relationships among them, which we use to detect relationships and understand how these objects are connected: organizations, grants, publications and, as we know in open science, also research data and research software.

The aim is for the graph to be as open as possible, to the extent of being CC0, and CC BY where necessary, because some subsets are still open but come with limitations from our community data providers. It aims to be complete, which means we try to collect from all sources that are pertinent to science; again, we treat data and software in the same league as publications, so we are not focused on publications alone. We want it to be deduplicated — and we know why: since we collect descriptions, that is, metadata records, about the same objects from different data sources, we need to deduplicate them. We want it to be transparent: we want to know where the data comes from, whether we collected it from a specific source or inferred it through a method or an algorithm, and we want to track that provenance. We also want our consumers to know it, so they can knowledgeably reuse the data. This is one of the reasons for the community calls: we want communities to take the graph on board, care about it, tell us where we are wrong, and contribute with their research results. This has already been the case in many scenarios; we have external collaborators who are helping us improve the overall quality.
We also want it to be decentralized, in the sense that the graph is here, but we want the data we collect to be returned to the original sources. We have mechanisms for that, which I will not describe today, but they can deliver the metadata that we infer, and that is missing from a specific data source, back to that data source, for the purpose of enriching the local collection, where — presumably in a repository — it can be persisted forever. And we want it to be trusted. Again, one of the reasons to engage with the communities is exactly that: we want people to trust the graph and to contribute to it, and we trust the data; that's the other important thing.

So it is a big data collection, which requires a quite resource-consuming data backbone. It is a scientific knowledge graph, we know that, but at the same time we need to comply with some basic requirements, which come from the fact that we need to represent a snapshot of the whole of research outcomes worldwide at a specific moment. So we need comprehensive coverage, and we need to make sure this makes sense. For that, we try to deliver monthly updates. We go through rigorous and, as I mentioned before, transparent cleaning, deduplication and enrichment phases, which are critical. We use full-text mining and artificial intelligence to, let's say, compensate for the lack of quality, completeness or accuracy of metadata, where this is possible. And of course we run it on a professional infrastructure, located at ICM in Poland, a well-known data center in Europe.

Why do we want to do this? Because it was something that was missing — well, it's something that others are doing as well, because this is what the open science community would like to do: to build open data collections on top of which we can transparently build services useful for science — discovery services, monitoring services. For that we need to track science, and we need to track it using open data as much as possible, because we don't want the vendor lock-in experiences that we are still suffering from. So we need to build these collections, and what was missing — and this is why we wanted to build the OpenAIRE Graph — is a big open data collection that goes beyond publications, that tracks more than the publications other collections track. In some cases we take advantage of exchanging data with the other SKGs out there; I can mention many, from Crossref to DataCite, Research Graph from Australia, and many others in Europe, even at the national level, like CRIS systems, since these too are graphs of information. So we track science in an open way, we interconnect it, we clean it, and then we offer this data for the purpose of discovery and monitoring mechanisms, which are critical for science in terms of its development but also in terms of its planning. Monitoring is really important for planning and evaluation.

This is a bird's-eye view of the model. As you can see, we have products, which carry indicators; products are publications, data, software and other products, in the sense that they cannot be classified as one of the three above. There can be many relationships between these products, from citations to versioning, supplements, etc. We have persons, who are related to products in many ways, mainly as authorships, represented with ORCID identifiers.
Then we have organizations, projects, data sources, and funders connected with the projects. All of this graph is built and made available to the world.

Some special features. We embed indicators — that's one important thing — so we collect indicators that can be useful for science, also from other services, and this is where collaboration comes in. We collect APCs at the level of persistent identifiers. We collect COUNTER metrics, like downloads and views, for publications, data and software. We collect other popularity metrics from BIP!. We actually compute citations, because we collect the actual citation links, and we are producers of citation links through COCI. So the overall collection allows us to build indicators that we attach to the individual objects. We are very flexible with respect to persistent identifiers: we adopt all the ones that are known out there and that the communities care about, including accession numbers like PDB, etc. We do that for all sorts of products — publications, data and software — and we keep stable identifiers based on a stateless approach: we generate IDs that depend on the original identifiers, and these in turn stay stable with respect to them (a sketch appears after this passage).

These are the numbers we are calculating today, after deduplication — the total number of records we collect is close to half a billion (five hundred million) — and as a result we have the numbers you can see here: 173 million publications, almost 400,000 software records, 60 million research data, and 80 million other research products, which again cannot be classified as one of the first three. We have 168 funders, 30 of which are providing us grants data, and 3.4 million grants, together with around 200,000 organizations. In terms of links we have quite a big amount. We are very much aligned with all the known databases out there in terms of publication-to-publication citations; we just add some more on top, because we ourselves infer citations from PDFs. But beyond that we have close to 80 million publication-data links, which are very important for open science indicators, as well as 400,000 publication-software links. As you can see, we also infer quite a lot of links to grants: close to five million for publications, one million for data, and 700,000 for software. We work a lot on affiliations — product-organization links — and so on, because of the final objective of producing monitoring information and statistics for organizations and funders where this is possible.

These are the kinds of data sources contributing to the graph. What is quite unique about our work is that we collect from around 2,200 data sources. We are trying to enter the world of the scientific communities, in Europe and beyond, so we collect from research data repositories specific to certain communities but also to certain countries. At the same time we look at the software universe, so we have a close embedding and integration with Software Heritage; we integrate with other databases out there, like the EGI applications database and GitHub itself; and we of course care about existing large collections like Unpaywall, OpenCitations, Microsoft Academic, and so on. We have the usual suspects — the thematic repositories like arXiv, PubMed, HAL, RePEc — and we tend to be very flexible in that respect, so we can of course include others in the future.
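To make the "stateless identifiers" idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a graph ID is a namespace prefix plus a hash of the original identifier; the real scheme is described in the OpenAIRE documentation.

```python
import hashlib

def stable_graph_id(namespace: str, original_id: str) -> str:
    """Derive a graph identifier deterministically from the original one.

    Because the output depends only on the inputs (a 'stateless'
    function, no lookup table), re-running the workflow over the same
    source records always yields the same graph identifiers.
    """
    digest = hashlib.md5(original_id.encode("utf-8")).hexdigest()
    return f"{namespace}::{digest}"

# The same DOI always maps to the same graph-level ID.
print(stable_graph_id("doi", "10.5281/zenodo.1234567"))
```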
The number of data sources is growing. But most importantly, we care about repositories. We have an approach oriented toward institutional and thematic repositories and the organizations behind them. We really care about repositories: we have built a network around them, and we have built guidelines that try to align the way repositories export their metadata to contribute to this graph and to other graphs in Europe. We call these the onboarded data sources — data sources compliant with the guidelines we provide — and they number close to a thousand and beyond at the moment. The guidelines are built by the community itself: we have a group and a community around them that updates the guidelines and makes sure they are implemented by repository platforms, so that repository managers can find them out of the box in their technologies. Through these repositories we collect metadata aligned with what is today called the European Open Science Cloud, which has of course adopted the OpenAIRE guidelines as one of the standards for repositories to export their metadata. They are not the only ones, but they are one of the tools to be used.

Then we have what I call the instrumental data sources. These are very typical in many SKGs out there — Dimensions, OpenAlex, Scopus itself. We collect from the very large data sources which are not necessarily compliant with the OpenAIRE guidelines but are critical to build a graph like ours. These include Crossref, DataCite, PubMed, arXiv, OpenCitations, DOAJ, and then the registries. Of course we care a lot about the registries, because we want to be aligned and interconnected with the scholarly communication landscape out there. We have ORCID, from which we collect not only persons' data but also the works behind it, and which we enrich a lot — we are ORCID members. By connecting to the OpenAIRE portals you can enrich your ORCID profile through the tools we are making available, but ORCID is also enriched by our work, because through the graph we can extend the range of objects connected to specific ORCID identifiers, thanks to inference techniques and so on. So we go through, let's say, a prompt suggestion process. And as you can see, we have many data source registries — the ones adopted by the community: OpenDOAR, re3data, FAIRsharing, and also the EOSC service catalogue, which has lately been added to the list. Here too we deduplicate them, so we are bringing this useful view of common profiles for data sources across different registries, which is useful to us today but also, of course, to all of you and the community.

So, briefly, the graph is built by aggregating a quite large number of data sources, the majority of which are compatible with the OpenAIRE interoperability guidelines, thanks to this community-driven approach that led the guidelines to be implemented across Europe and in the platforms. That helps a lot in our aggregation process. On top of that we have national, thematic and community data sources that are onboarding — registering — to the OpenAIRE Graph. This aggregation of data, which is not yet deduplicated, is then enriched by mining: we collect PDFs where this is possible.
We are close to 25 million PDFs at the moment, and we have a dedicated infrastructure to run full-text mining, AI and so on to enrich these objects; each PDF brings content to its own record. After that we deduplicate, so we merge records that describe the same entity, keeping provenance of the records from all the data sources that contributed, and building a richer record. Then we also enrich by inference, because now we have an interesting collection that connects the dots, so we can pass information from one node to another. I'll give you an example: if I have a publication that is supplemented by a dataset, they have the very same authors, and the publication has ORCID iDs, we can move the ORCID iDs to the authors of the dataset (a sketch of this rule appears at the end of this passage). Or we can pass the project funding the publication on to the dataset, because they are part of the same experiment. After that we finalize the graph and ship it out through APIs and dumps.

We have loads of users; I'll mention just a few here. We have researchers, of course, who use our data to do science and science-of-science, bibliometrics, this kind of thing. We have organizations using the data to perform monitoring and so on. We have funders doing the same, including the European Commission. And we have service providers — Scopus and Springer, for example — actually taking our data to enrich their own collections; we have very good collaborations, based on MoUs, with many publishers, with whom we exchange PDFs, mining and data to enrich our collections bilaterally. And we also have, of course, organizations like countries or universities that are using the data for monitoring, as I will show you later for Ireland.

Access to the graph is available through datasets in Zenodo: there is a specific collection called OpenAIRE Graph where you will find different slices of the graph as well as the whole graph. You can collect the whole thing or subsets of it, which over the years have been requested more and more, so we decided to expose the individual datasets for people to use. Access to the graph is of course also available through APIs; here are the links — you can go there, request a token, and use the APIs as much as you want. For more, we have documentation: graph.openaire.eu is the site of the graph, where you can find documentation — information about the data model, the properties exposed, the schemas we use, the whole graph production workflow — and you can also find all the scientific publications we have written about it. We are trying to be as transparent as possible. Please send us any questions, requests or inquiries through the forum that we have lately provided, because we really care about interaction, improving our services, and getting feedback on our mistakes.

And just a quick scroll through the features we introduced last year: we now have Fields of Science and Sustainable Development Goals, so we are tagging all the records with these very important topics. I think we have a coverage of 245 million Sustainable Development Goal tags linked to our products, which I think is a very relevant and important result. And we did that thanks to an interaction with a couple of research centers that actually contributed a lot to this activity.
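As a rough illustration of the inference rule Paolo describes — propagating ORCID iDs from a publication to the supplementary dataset that shares its authors — here is a minimal sketch. The record layout and the naive normalized-name matching are hypothetical simplifications; the production algorithm is more careful than an exact name match.

```python
def propagate_orcids(publication: dict, dataset: dict) -> None:
    """Copy ORCID iDs from the publication's authors onto matching
    authors of the dataset it supplements. Matching here is a naive
    normalized-name comparison, purely for illustration."""
    def norm(name: str) -> str:
        return " ".join(name.lower().split())

    with_orcid = {norm(a["name"]): a["orcid"]
                  for a in publication["authors"] if a.get("orcid")}
    for author in dataset["authors"]:
        if not author.get("orcid") and norm(author["name"]) in with_orcid:
            author["orcid"] = with_orcid[norm(author["name"])]

pub = {"authors": [{"name": "Ada Lovelace", "orcid": "0000-0001-2345-6789"}]}
ds = {"authors": [{"name": "ada  lovelace"}]}
propagate_orcids(pub, ds)  # the dataset author now carries the ORCID iD
```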
As OpenAIRE we don't, of course, have the whole knowledge of the world, so we definitely need others to contribute — because we really care about this — to improve the quality of the graph and to make it more trusted. Fields of Science classification: I'm not going into the why and what, you know that, but it is very important for a graph like this to have a uniform interpretation of what the research fields and research topics are. We picked Fields of Science because we know it is widely adopted out there. So we implemented algorithms that build on solid research, and integrated and embedded them in our data provision chain.

Applications: discovery and research assessment. Once we build this large collection, we can use it for many purposes, and one of them is certainly discovery. There are a few things you can do through the OpenAIRE portals. One is EXPLORE, a generic search interface on top of the whole graph, through which you can search across the different kinds of entities, navigate between them, and find information about the statistics of the objects therein. If you log in through ORCID, of course, you can interact with your profile and enrich it, because in this gateway you will find your publications, your data and your software — not only publications. We can do this also for specific communities that ask us to provide them with a gateway, and in that case we have, let's say, a very meticulous approach: we take care in carving out the slice of the graph that pertains to a certain discipline, and we do that by applying very specific mining processes, very specific inclusion criteria, and so on.

Again through the portals, once you are logged in, you can also enrich the graph. You can manually add links between products — links that connect different products — or between products and projects, for example if you want to claim that your object is funded by a specific project. For most communities this process is validated by the administrators of the community; in the case of the digital humanities, for example, there will be a team of people validating your claims.

MONITOR is really interesting — it is one of the things that excites a lot of researchers in the field, because you have such a rich collection, with so many links around it, that you can start coming up with indicators. What I suggest you do is visit these two sites and take a look: monitor.openaire.eu and observatory.openaire.eu. In MONITOR you will find a lot of closed dashboards, because monitoring is typically a private matter for the community or the organization, but the European Commission one is open. As you can see, we are already producing a wide range of indicators regarding FAIRness, open science and so on, which are of course the outcome of processing the graph with special, dedicated statistics databases. Recently we started working on a tender, and I want to mention this because it is a very interesting scenario that actually explains one of the main reasons for these community calls.
What Ireland wanted was access to a dashboard through which they could monitor the overall production of Irish universities, Irish repositories and so on, with an open science lens on — tracking, again, Plan S, the evolution of open science, open access, and so on. We started doing this using the graph, and what came out is a very interesting flow through which they can use the graph to produce these indicators. They now have a dashboard — it's in pilot mode, but it's visible — through which they can interpret the data coming from the different repositories and universities, and of course also from Crossref and DataCite and all the data sources I listed before, to understand how Ireland is doing and, of course, to steer decision making and so on. At the same time we have established a workflow through which we can also detect the anomalies we identify in the original data — because the original data is not always published in a proper way, not always aligned to standards, not always curated as it should be — and we feed them back to the original repositories, so that over time they can improve the way they produce the data. In some cases, and possibly in the future, they can also instruct scientists on how to behave when they publish, what exactly they have to do, because in too many cases scientists are really left alone in this process. They act by common sense, and when we all act by common sense we are each right, but we are not aligned, and that causes problems in the final data. So this virtuous, let's say, life cycle is very interesting, and I think it is what we should do in Europe, and actually beyond, to make sure we have a fully open, high-quality scholarly communication record.

These are just a few screenshots of what the Irish OS monitor provides: open access routes, FAIR, Plan S, APCs, cross-country collaborations, etc., number of citations of course, and you can dig in by university, by repository. The top RPOs have full monitors, and the funders from Ireland are also in focus. So it's a one-stop shop — a single entry point — for information about national monitoring.

The take-home message here is that the graph is probably the largest open-science-monitoring SKG out there. Coverage, quantity and size actually matter in some cases, because we want an overall view of what the production is. We care about publications — we have all of them — but we also try to have all datasets, where "all" means those in data sources trusted by scientists, integrated in their publishing workflows. The graph is open, of course — that's one key element. It's transparent: everything we do is transparent, in the processes and in the data itself, where we try to describe the part of the process that generated each piece of data. It is a public good — that's another very important thing. By public good I mean that the OpenAIRE Graph is operated by OpenAIRE, a non-profit based on membership from the community, so anybody who is a member of the organization can contribute to and steer the roadmap of the graph. It is an entity that goes beyond the people, beyond the individual organizations — it is actually a community effort, and that's very important. It's like a public highway, in a way: it is not the result of Paolo, or of the current team, and it can go on forever.
And that's a very important aspect: we don't have vendor lock-in, and the funding behind it is public. And it is participatory — again, going back to the reason we are having these community calls, which are very important: we welcome contributions from communities, service providers, organizations. We establish relationships with them, we learn from each other, we exchange data and methods.

To conclude: why community calls? Because there are many gaps that need to be filled and problems that need to be addressed. I'll mention the three that came to mind when I wrote the slides — there are probably more. One of them is the interoperability issue. We have protocols, models, metadata formats and vocabularies that vary a lot, especially when you dig and dive into the communities. So it is extremely important to leverage organizations like the EOSC itself and the OpenAIRE working groups to come up with common ideas and solutions. For example, collaborating with the repository platforms — DSpace, EPrints, Dataverse — has enormously improved alignment worldwide, and today we can really count, in the majority of cases, on data sources that are already compliant with some basic structural requirements. Of course we all know there is something to be added on top, because resource-typing vocabularies may change, but it is really important to move in that direction: to align, to have a common view and a common mindset. Otherwise we cannot build collections that are useful for all of us.

There is, in general, a lack of standard publishing practices across disciplines. Some disciplines are very mature; they know what they do. In other disciplines, scientists are just left alone, so they pick a repository here and there and choose what to do, and this results in many cases in incomplete or inaccurate metadata. Nobody has checked whether what the metadata declares is the truth or whether there is a mistake, and when you build these kinds of collections, on top of which we want to build research assessment, this may be an issue. Resource typing is an issue too: what research data is in one discipline is completely different from what it is in another, and these decisions have to be taken at the discipline level, not centrally by an organization like OpenAIRE. So these interactions are critical. Any compensation technique — which typically comes down to full-text mining, artificial intelligence, or feeding back to the communities on how to behave, as I mentioned before with the Irish monitor — requires a huge amount of skills that cannot be gathered in one organization alone. This is where the membership approach of OpenAIRE comes in handy, because we have many organizations and research and data centers worldwide that can support and enrich what we do, but of course we want to go beyond: the more we know, the better. I recently exchanged ideas with the French Open Science Monitor — we exchanged methods, ways to perform some mining, for example. These are the kinds of things we all need in an open science community.

The graph is a public good, and this is where the community comes in handy again. We are seeking collaborations because we want to improve quality and coverage, so we want, of course, high-quality data sources to join us.
Especially when they can ensure a trusted publishing life cycle, where metadata is curated and where the products being published are trusted by the communities and by the data source itself. But we also want to integrate solid research results: for example, the experience we had with the Sustainable Development Goals has been extremely successful — we loved it, we gave it full visibility, of course, and we all gained from it. End-user feedback is critical. Of course we make errors, and of course we need somebody to tell us, because when you produce such amounts of data, all the monitoring tools we have are driven by the fact that we know what we have to monitor. We investigate potential anomalies and find many, but many of them come from the end users — from the researchers, from the services — like the experience we are having with Ireland. We gain from it by improving the quality life cycle. We care, of course, about community expectations; this is why we are asking you to use the forum to tell us where you would like us to go in the future. What do you think is missing? How could we extend the graph to serve use cases different from today's, or something we are completely missing from our viewpoint at the moment? And of course we want to improve scholarly communication infrastructures and workflows, so we want to establish synergies with service providers, researchers, publishers, because we'd like to align on the way we publish data. It starts from the researchers themselves — or whoever performs publishing on their behalf — and from the data sources, the publishing venues, which should provide the right tooling to make sure science is published in a less burdensome way and in the right way. The same holds for policy and guidelines: we are, in fact, building a community in OpenAIRE where we are trying to align on common standards and ideas, and we do this in several fora — we have the EOSC, we have the RDA, for example, and the SKG interoperability framework we are building — because we want the graph itself to be easily interconnected with other services, and vice versa. And this is something we can only establish together as a community. So thank you for your attention. I'm now very happy to answer all your questions, and of course beyond: once the call is over, I'd be happy to reply to your questions via email. Thank you.

Okay, so I'm giving the floor first to the people who are raising their hands here. Katia, go ahead.

Hi, hello, good morning. Thank you, Paolo, that was a brilliant talk, thank you very much. I was wondering — you emphasized a lot the transparency that underlies the OpenAIRE Research Graph. I was wondering if you have published, or if there is public access to, all the rules and guidelines that underlie the inferences and the deduplication rules you use to build the graph.

Yes — if you go to the documentation site, you will find the description of those processes. We are also very happy to share the software, and there are a couple of publications that try to explain what we are doing. The deduplication in the context of the graph, especially, has some side issues that are not easy to address. For example, when you merge two nodes, you also have to merge the relationships those nodes have with other nodes (see the sketch below). These are things we care about
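A minimal sketch of the point about merging relationships along with nodes. The representation (edge triples) and names are hypothetical, for illustration only, not the actual OpenAIRE implementation.

```python
def merge_duplicates(edges, duplicates, canonical_id):
    """Redirect every relationship that touches a duplicate node to the
    surviving (canonical) node, collapsing duplicate edges that result
    and dropping self-loops created by the merge."""
    redirect = {dup: canonical_id for dup in duplicates}
    merged = set()
    for src, rel, dst in edges:
        src = redirect.get(src, src)
        dst = redirect.get(dst, dst)
        if src != dst:  # a merged node should not cite itself
            merged.add((src, rel, dst))
    return merged

edges = {("pubA", "cites", "pubB"), ("pubA_dup", "cites", "pubB")}
print(merge_duplicates(edges, {"pubA_dup"}, "pubA"))
# {('pubA', 'cites', 'pubB')} -- the two duplicate edges collapse into one
```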
and take care of; they are described in the publications, and in the documentation you find the general rules. In fact, soon — within the month — there will be a refinement of those rules, which we are going to publish together with another paper, because we want to explain the process exactly. But you will find a lot of it already on the documentation site.

We will definitely take a look, but I was leaning towards that last part of your answer — the very specific decisions that have to be made when you are in the record.

Yes, because those are the details that make a big difference. True. This is something that is also not easy to write down, because it changes in many cases. But of course we can have a dedicated chat on that; we are very happy to share.

Okay. The other question — and I would have more questions, but I will finish with this — is about the feedback loop you described, whereby you try to feed the enrichments you make to the records back to the data sources. How is that actually done? Do you already have experience with a big uptake of this information?

Yes, yes, we do. First of all, let's make a distinction between two feedback flows. One is the flow through which we realize that something in a data source is not done in a proper way — for example, authors are not properly described, or there is quite a number of mistakes made in providing this kind of information. So we go back and say: guys, there's something wrong here. That's not the case for Ireland, but for example we had cases in the past with repositories that were exposing the same metadata record ten times. You just go there and tell them: guys, please fix this, there's a technical issue.

Then there's a second flow, which is the one we implement through the broker service. The broker service basically looks at the graph from the point of view of one data source. Say this data source gives me ten records — just as an example. I look at these ten records and, since I know how they were in their original form, I can by subtraction detect which enrichments I, as a graph, provide, and I can send these enrichments, at the level of the records, to the original data source (a sketch of this idea follows below). So we have a mechanism through which the data source manager can subscribe to specific enrichments — for example: tell me all the open access versions of the records I have and where they are located, the URLs, and we can send notifications of this; or give me all the subjects I don't have; or tell me if there is another title; or whatever. These are things repository managers can do: if you go to provide.openaire.eu and you are a repository manager, you can configure your notification settings.

As to the first flow, which is the one we apply mostly to cases like Ireland: there we try to detect the misses in the metadata that may compromise monitoring. If datasets come without a date, or with author strings I cannot really connect to an ORCID iD, or with no ROR identifier but just a string naming the organization — these are things we can detect and discuss with the repository managers to fix.
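The broker's "enrichment by subtraction" described above might look, in spirit, like the following sketch; the field names are invented for illustration.

```python
def compute_enrichments(original: dict, graph_record: dict) -> dict:
    """Diff a record as harvested from the source against its enriched
    counterpart in the graph: whatever the graph knows and the source
    lacks becomes a candidate notification for the repository manager."""
    enrichments = {}
    for field, value in graph_record.items():
        if not original.get(field):          # field missing or empty at source
            enrichments[field] = value
        elif isinstance(value, list):        # list field: report only new items
            extra = [v for v in value if v not in original[field]]
            if extra:
                enrichments[field] = extra
    return enrichments

harvested = {"title": "On SKGs", "subjects": []}
enriched = {"title": "On SKGs", "subjects": ["knowledge graphs"],
            "oa_url": "https://example.org/oa/123"}
print(compute_enrichments(harvested, enriched))
# {'subjects': ['knowledge graphs'], 'oa_url': 'https://example.org/oa/123'}
```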
And this is actually very important, because there is some magic you can do at the level of the graph, with AI, mining, whatever, but it will always leave a margin of error, and you need somebody to validate it — you are never sure enough. So producing the right metadata is actually key, and so is instructing people on how to provide it and why they should care about providing it, because otherwise your institute, or you, will not get the reward you should. Thank you. Thank you for your questions.

We have many questions, so Paolo, try to be brief in the answers. I would like to give the floor to Laura from DARIAH, who asked a question here in the forum. Could you turn on your microphone?

Yes, I'm just moving because I'm not in a place where I can talk. Okay, all good. I wanted to ask about products that don't have identifiers — in our environment they exist. I would like to know whether identifiers are a strict entry requirement for the graph, or whether it is something you are also investigating, or something you leave to the repositories, basically.

No, we don't require identifiers. Of course we care: if there are persistent identifiers, we collect them, and we also identify them with a specific scheme. But we also collect from repositories that don't have persistent identifiers — they have local URLs. We count on the fact that these URLs are stable enough, though, because, again, the OpenAIRE Graph is a picture of the outside world: if you migrate your data from one place to another and completely change your identifiers, then you will suffer from that in a way — the previous data will disappear and the new data will reappear. But the OpenAIRE Graph is completely agnostic of the existence of an identifier: if there is one, we exploit it; otherwise, we just go ahead. Thank you. Thanks.

From the registration form we also have other questions, like: how can datasets and research publications be integrated into the OpenAIRE Graph? Well, all you have to do is publish your datasets, software or publications in a data source that is compliant with the OpenAIRE guidelines and is integrated into the OpenAIRE Graph; then this will happen automatically. In Europe this actually comes in handy for many, because if you publish in your institutional repository, and that repository is in OpenAIRE, then you can immediately report your results to the Commission, because the Commission's participant portal is integrated with the OpenAIRE Graph. They will also appear in Scopus, for example: if you have nicely specified links between your publications and data, people navigating Scopus will, when clicking on the publication, see your datasets, because they are collected from the OpenAIRE Graph.

Thanks. And Maurice, would you like to speak? I have seen a lot of comments and questions from you here, so I would like you to put your questions to Paolo directly.

Well, thanks. I think these questions are very specialized, so perhaps Paolo can answer them on the list. It's about the links — the relations. I saw the slide about the relations between publications and publications, publications and data, and projects. I'm not seeing relations between affiliations and researchers, or publications and researchers. Is this coming? Was it just not in the presentation, or is it already there?
Okay, so we have links between publications and authors that are not reported here, of course, but we could in principle calculate them. We are not keeping the link between the publication, the author, and the affiliation of that specific author in that publication; instead we are, let's say, flattening those. What we care about is knowing which organizations are involved in a publication and which authors are involved in it — not exactly which author has which affiliation in a given publication. The reason is simple: in the majority of cases that information is not provided, so it doesn't come in handy and would be left empty most of the time. Since we do know what the organization is and who the author is, keeping the full triple would complicate the representation in the record, if you think about it. So basically we decided to represent the two independently, and we can always try to recombine them where possible, but that is not often the case. I can go deeper into the discussion if you want — it was really a pragmatic choice. And you don't really need that, right? You need to know whether the publication belongs to a certain organization, and who the authors of the publication are. For statistics and research assessment you don't really need to know which affiliation the author had for that specific publication. It is also a natural reflection of the fact that this information is not provided in the metadata in many cases.

Well, we can discuss that, definitely. I have a second question, but I don't want to take up the space of other people who also have interesting questions. Okay, go on with the second question and then we switch.

Okay, the second one is about — I was triggered by "enriching by mining". We did participate with OpenAIRE on the SDGs, and it's a pretty good model, but we also have another model, built with Aurora, that achieves higher recall while keeping precision about the same: you get more results, labeling more publications with an SDG, without getting false positives. The nice thing is that the model we use is also multilingual, which could be handy because in the European context many languages are used. I was wondering — we offer this whole thing for free, of course — could you put it in as a second labeling instrument in the graph, like OpenAlex does right now, using our model to enrich the data?

I think it is doable. This is the kind of thing to discuss with the team that is currently doing it. We had a collaboration with you, and Athena Research Center is contributing their AI models for that. But of course we'd be interested — it would be great actually to do it and check that there is no conflict in the data production, of course: whether the two models are aligned, whether they can learn from each other, and whether they can both enrich with their contributions. What I know is that it is very resource-consuming and we need GPUs, so we need to perform these kinds of actions on separate systems — there is no chance we can do it on our HDFS and the same infrastructure we have. We need dedicated machinery for that. So if you think this is possible —
— and if you want to collaborate, I think a connection can be established. But I think we discussed that, right? Sort of, but then — I had capacity, a GPU available, but the project ended, so now it's not there anymore. That is the issue with these kinds of models: you need to make them sustainable over time. You can take advantage of some resources in a specific period, and then what? So we need to think thoroughly about it. But on our side we found some solutions, so we may discuss this. Okay. Thank you, Maurice, as usual — always a pleasure. Thank you both.

I would like to give the floor to Joao Mendez Muregra with his question. Hi, good morning everyone. I wonder if you have a public dashboard about the quality of the data that can help with curation based, for instance, on a ROR or an ORCID — where you can see the data that is missing and suggest corrections, that kind of thing. And the other question is whether you have public APIs for subject classification, I mean Fields of Science or SDGs. Thank you.

Okay, as to the first question, I'm not sure I fully understand what you mean. You're noting that we include ORCID data and ROR data — so we are collecting their data, and by integrating it into the graph, and vice versa, we are in a way enriching their databases with information they may miss, because we establish connections between researchers and ORCID, for example, and we also propagate ORCID iDs where they are not yet present today. Sorry — my question was about the quality of the data: whether there is a tool one can access. You mentioned Ireland, and that they can see what data is missing. Do we have a public tool that we can also use to assess the quality of the data?

No, we don't have a tool that does exactly that, but we have of course all the portals, through which you can search, browse and verify the quality, and you can send feedback — the forum is now one very nice way of doing it, but we also keep receiving helpdesk requests of that kind. And that's very useful for us, because we can detect whether the damage is one we have produced through our own mistakes, or whether it is to be blamed on the original data source, which unfortunately is often the case. Mistyping is one of the most common things: people publish an object and claim it's a dataset when instead it's software. Then somebody at the end of the chain tells us, "the graph claims this is a dataset, but it's software" — no, we didn't write that, and in some cases there is nothing we can do to improve it.

As to the way forward: I'm not sure I have the time to say this, but we have on the roadmap, in the next few months, the idea of implementing a specific labeling system that basically allows us to tell whether a record is missing information that is crucial — whether it should include something else to be, let's say, exactly in line with open science expectations. For example, a software record left alone with a title, one author, no URL to any repository, and no link to any publication is not something we would like to include in monitoring, so it should trigger, let's say, a warning to the author (the sketch below illustrates the kind of heuristics involved).
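The kind of completeness heuristics Paolo sketches could, in spirit, look like this — the field names and thresholds are purely illustrative, not the planned labeling system.

```python
def completeness_warnings(record: dict) -> list:
    """Illustrative heuristics only: flag records too sparse to be
    safely counted in monitoring. All field names are hypothetical."""
    warnings = []
    if record.get("type") == "software":
        if not record.get("code_repository_url"):
            warnings.append("software record lacks a URL to a code repository")
        if not record.get("related_publications"):
            warnings.append("software record is not linked to any publication")
    if not record.get("date"):
        warnings.append("record has no publication date")
    if len(record.get("authors", [])) <= 1 and not record.get("abstract"):
        warnings.append("record has at most one author and no abstract")
    return warnings

orphan = {"type": "software", "title": "untitled-tool", "authors": ["A. N. Author"]}
for w in completeness_warnings(orphan):
    print("WARNING:", w)
```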
The idea is also, if we know the author, to let them know there is something wrong, and if they really care they should fix it — because the graph, and all the systems relying on data coming from DataCite, for example from Zenodo or Figshare, suffer from the fact that typing is mistaken. Think of supplementary material: people publish a paper together with its figures, and the figures are deposited as datasets in Zenodo or in Figshare. When you collect that, it looks like the publication comes with a huge list of extremely powerful and useful datasets — but they are the pictures, the figures in the paper. So how can you detect these differences? It is actually crucial, because if you count them in the monitoring of your university's production, you are failing. This kind of detection is something we really care about, and in the coming months we will try to implement heuristics that detect these kinds of anomalies and tag them properly, so the monitoring can be done right — at least as far as our techniques are good, but we'll see. You should expect this before the summer, in a beta.

With regard to the APIs for subjects: we have public APIs, but I'm not sure you can search by SDGs — the classification is something we compute ourselves, based on the information we have. We are publishing it, so this part of the graph is published; you have the information and you can collect it (a small query sketch follows below). In some cases you can download subsets — I'm not sure there is one for your university or your country, but these actually come together with the gateways: if you have a gateway for your organization, you also have a subset of the data for your organization. The same holds for the monitors of a country, a ministry or a funder. In general we don't generate datasets on request, because it is actually expensive to keep all these dumps — we are talking about terabytes of data — but we can discuss it offline; if you send me an email, we'll see. Maybe we can tell you how to easily find what you need in the graph. Thank you. You're welcome, Joao. Thank you. Thanks.
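For reference, a minimal query against the public OpenAIRE search API mentioned above might look roughly like this; treat the exact parameters as illustrative and check the documentation at graph.openaire.eu for the authoritative list.

```python
import requests

# Sketch of a call to the public OpenAIRE search API; an access token
# (obtainable as described earlier) can be supplied for higher rate limits.
response = requests.get(
    "https://api.openaire.eu/search/publications",
    params={"title": "scientific knowledge graph", "format": "json"},
    timeout=30,
)
response.raise_for_status()
data = response.json()
print(list(data))  # inspect the payload's top-level structure first
```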
Here in the chat, and also in the registration form, Jens asked about the plans for the citation numbers. Claudia answered that we currently have two billion citation links. Maybe we can say something about our collaboration with OpenCitations.

Actually, the citations are exactly the same ones we have in OpenCitations; in fact, we enrich OpenCitations with some citations that we infer ourselves. So for publication-to-publication it is exactly the same number, plus a little bit more that we contribute. We are really integrated with OpenCitations — we ingest COCI — and on top of that we infer citations from the publication PDFs we have. It's a sort of circle: we are basically re-ingesting what we give them, but that's natural. On top of that we have the citations from publications to data and from publications to software, the majority of which have been inferred, because software publishing is still not a widespread practice, let's say, and is often not done in the proper way. So we mine the PDFs, we find the links to the software, we create a record for the software, and we build a link. But that is more of a reference, as I mentioned: we are not sure whether it is a citation, positive or negative, or whether it is a supplement — it is a starting point for our investigations.

One step still to be made is to understand what to do with the links coming from DataCite, because in DataCite we have publication-publication links, but also publication-data and publication-software links. Whether to consider them citations or not is unclear; it is something we are discussing with OpenCitations. OpenCitations ran an analysis picking some key data sources within DataCite, and they realized these have completely different practices: some publish citations as references, and some publish citations as citations. So in the end it is really hard, when a third party publishes a reference that is not a citation, to make the distinction between the two. At the moment we have decided to treat all references from some data sources as citations, which complicates quite a lot the approach we have in mind. But fortunately there are not so many — I think they are on the order of six million, which, compared to the two billion, is nothing.

Okay, we will answer the other questions offline. I would like to give the floor to Elena for the last communication; we will share the remaining answers in the notes document. Elena, the floor is yours.

Thank you. I would just like to take a minute to announce to everyone the new OpenAIRE Graph website, which is set to launch this week. It is an updated version of the previous one. It features new things such as the user forum — the same one Paolo mentioned earlier — which you can access from today. The website will have links to the forum as well, an FAQ page, and a page that clearly lays out how you, in your respective role, can use and contribute to the graph via the OpenAIRE Graph APIs, the datasets and the OpenAIRE services, so definitely check it out. You can see when the updated page goes live on our Twitter page, @OpenAIREGraph. It is a very new account, from the past couple of days, so at this moment it is empty, but we will soon be populating it with news about graph developments and any developments in open science that relate to the OpenAIRE Graph and scientific knowledge graphs. So go there to see when the new page is live, and to keep up to date with any and all developments around the OpenAIRE Graph.

Thanks again to everyone who came today and showed interest in our community call series — this is really exciting for us, and we look forward to all our future discussions and to hearing your feedback. Thanks again, everyone. Thank you. We will follow up with you by email. Okay, bye everyone. Thanks. Bye bye. Thanks a lot for joining, everyone. Goodbye. See you at the next one. Okay. Bye. Thank you. Bye.