Hello and welcome to this joint FREYA and OpenAIRE webinar on new developments in the field of persistent identifiers. We have two excellent presenters with us today. Our first speaker is Ketil Koop-Jakobsen from PANGAEA at the University of Bremen in Germany; he will speak about the FREYA project and new persistent identifier developments. Our second speaker is Dr. Amir Aryani, head of the Social Data Analytics Lab at the Swinburne University of Technology and a member of the board of directors of the Research Graph Foundation. My name is Irina Kutschmann; I work for OpenAIRE, and I will talk about how OpenAIRE uses persistent identifiers. Ketil, would you like to start your presentation? For questions, please use either the webinar chat or the Q&A functionality, and we will answer them as we go. The session is meant to be interactive, so please type as many questions as you can. Thank you, and over to you, Ketil.

Sorry about that, a little technical problem with a toolbar that was in the way of making my presentation full screen, but here we are. I would very much like to thank you for joining this joint webinar from FREYA and OpenAIRE. My name is Ketil Koop-Jakobsen and I am from PANGAEA at the University of Bremen. PANGAEA is a data publisher, and we work with data from many different scientific disciplines; our primary area is the natural sciences. Today I am going to talk about the FREYA project and the new developments we have made in the area of PIDs. A little introduction to what FREYA is doing: overall, we are concerned with connecting open identifiers for the discovery, access and use of research resources.
In a nutshell, FREYA is all about the use of persistent identifiers. If we take this little citation from our description of work, it explains what we are actually doing: extending the infrastructure for persistent identifiers as a core component of open research, in the EU and globally. What that means is that FREYA is concerned with facilitating a better infrastructure for persistent identifiers. We are not the people who actually mint the PIDs; we are the people facilitating a better infrastructure to connect existing and emerging PIDs. FREYA is a Horizon 2020 project funded by the European Commission. We build on two preceding projects that some of you may have heard of, which were also concerned with persistent identifiers, namely THOR and ODIN. FREYA started in December 2017, so we are a little more than a year into our work now. If you would like to see more about what we do, please take a look at our web page, which you can see here, and you can also follow us on Twitter. One thing that really characterizes FREYA is that we work across disciplines and draw on expertise from a very diverse group of partners. As you can see here, our partners are DataCite, Crossref, PLOS, CERN, the British Library, STFC, PANGAEA, Hindawi, DANS and EMBL-EBI: a very diverse group including data repositories, publishers, research institutions, PID providers and libraries. This means we can face the challenge of improving the PID infrastructure from many different angles, representing many different scientific disciplines. And when you hear about FREYA, you will pretty much always be presented with a PID graph.
The PID graph is the basic tool that we work around. Here we have a basic PID graph that illustrates how different PIDs, most of which already exist, are connected. Let me walk you through the graph, and afterwards I will extend it so you can see what we mean by expanding it to include new PID types. If you follow my mouse, you can see that we have author one and author two; they both have an ORCID, which is their PID. Together they write a publication; that publication goes to the publisher, it is accepted and published, and it gets a DOI, so the publication now has a PID. The author PID, the ORCID, is included in the publication, and the publication DOI is included in the ORCID record of each author, so they are interconnected. The publication is based on a dataset. That dataset could come from the authors themselves, in which case there would be a link between the dataset and the author, but it can also come from an open repository, like PANGAEA where I work, where we have datasets from many different disciplines available. You can search our database and find a dataset that suits your purposes; the datasets we have available are curated and published and have a PID in the form of a DOI. So the dataset PID is included in the publication and linked to the author through the publication.
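The graph walked through here can be sketched as a tiny adjacency structure. This is only an illustrative toy in plain Python; the identifiers below are invented, not real DOIs or ORCIDs.

```python
# A toy sketch of the PID graph walked through above; all identifiers
# are invented for illustration, not real DOIs or ORCIDs.
node_type = {
    "orcid:0000-0001-0000-0001": "researcher",
    "orcid:0000-0002-0000-0002": "researcher",
    "doi:10.1234/article":       "publication",
    "doi:10.1594/dataset":       "dataset",
}

# Links are bidirectional: each ORCID record references the publication
# DOI, and the publication references the author ORCIDs and the dataset.
links = [
    ("orcid:0000-0001-0000-0001", "doi:10.1234/article"),
    ("orcid:0000-0002-0000-0002", "doi:10.1234/article"),
    ("doi:10.1234/article", "doi:10.1594/dataset"),
]

def linked(pid, wanted_type=None):
    """PIDs directly connected to `pid`, optionally filtered by type."""
    out = {b for a, b in links if a == pid} | {a for a, b in links if b == pid}
    if wanted_type is not None:
        out = {p for p in out if node_type[p] == wanted_type}
    return sorted(out)
```

Following such links from a dataset to its publication and on to the authors is the same kind of traversal the second talk in this webinar demonstrates against live PID services.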
Now, to analyze the dataset the author found, he or she might have used software, perhaps software available from a repository like Zenodo, and there too the software can be given a PID. So the software used to analyze the dataset can be included in the publication and linked to the PIDs of the author. In this way FREYA is trying to link different PIDs, and what I have described so far pretty much already exists. But if we go on to talk about institutions, it becomes more difficult, because there is no clear PID scheme for institutions. If we talk about new PID types like organizations, there are definitely people working on assigning PIDs to organizations, including some of our partners within FREYA, and progress is ongoing, but there is no mature PID system for organizations yet either. Then there is the sample: the dataset might have been generated from a physical sample that exists somewhere, and that sample has the opportunity of having a PID. There are PID systems for samples out there, the IGSN number for instance, but it only covers a certain part of the physical samples that could be identified; I will get into that later. Then there are instruments, the instruments used to analyze the sample and generate the dataset that goes into the publication. There is a lot of debate about PIDs for instruments right now, because there is a great need: it would be great to have a PID that identifies the instrument used to analyze the sample. However, this is not an easy task. Where do you start when assigning a PID to an instrument? Do you give it to a platform that carries several instruments? Do you give it to the instrument itself, as an individually identifiable unit? But what if the instrument consists of several sensors; do you give each sensor a PID? And what if you then recalibrate the sensors after two years, so that the whole basis on which the instrument works is different: do you assign a new PID in that case, or leave it as it is? These are challenges that I know a very good group in the RDA is working on right now, but there is definitely a need for an instrument PID. Other new PIDs could include data repositories, which need a PID; grants, which fund all the research, where PIDs exist but a more universal system is needed; and conferences, which need a PID too. This is what we in FREYA mean by extending the PID graph to also include new PIDs. But the question is where to start: which new PIDs are the most important, and which are the easiest to take to the next level where they can actually be implemented? Those are some of the questions we are concerned with in FREYA. To analyze the landscape of new PIDs, we did an assessment of the PIDs that are out there and identified the gaps, which basically means the PID systems that are needed but do not exist yet, or exist only in a still-developing form. We explored the need for these PIDs by asking the community for use cases, and I will get into what I mean by that in a little bit, but all of this enabled us to identify the requirements for potential new PIDs. Three bullet points summarize what FREYA is doing on the new PID landscape. First, we assessed the current landscape of new PIDs; that is done, and there is a report you can look up on our web page. Second, we are currently identifying the needs for new PIDs and their requirements; this is almost done, and there will be a report in a couple of months. Finally, we would like to develop prototypes for new PID types, but that task is still in progress. Starting with defining the PID landscape: we did an analysis of the entire PID landscape, looking at what's there, what's existing, what are the
PID initiatives, what's emerging, what people are talking about, what they need, and what the maturity is of all of the PIDs we could find across the different research communities and scientific disciplines. Altogether we identified 25 entities, and by entities I mean things that could potentially be assigned a PID, like publications, conferences, researchers, organizations, data, and so on. These entities either have a PID or need one. However, this is not a simple task either, because there can be significant overlap among different disciplines, which complicates determining maturity. Take samples again: for geological samples there is an existing PID system, the IGSN number, but for cultural artifacts there is no good PID system, even though a cultural artifact can also be a physical sample. Or take a historical persona: it needs a PID for persons, but it does not really fit the ORCID system; and if we talk about the remains of a historical person, we could call that a physical sample, but there is no good PID in that regard either. Altogether, for all 25 entities we identified the PID types that already exist and assigned them a maturity level. Only three entities, researchers, publications and data, have services that were deemed fully mature. The rest are either emerging, meaning a PID system exists but is under development, or immature, meaning people are talking about the need for a PID system but it is not there yet. You can see a report on this on our web page. Then we moved on to identifying needs and requirements for new PIDs, and this is where the use cases come into the picture, because the method we chose for identifying the
needs and requirements for new PIDs was to ask the community, the research community concerned with PIDs and PID assignment, for use cases; I will show you in a little bit what a use case is in this regard. We also got a lot of input at conferences, where we asked the audience to give us use cases. These use cases enabled us to identify new PIDs in high demand and the requirements for their progress. So what is a use case in the FREYA context? A use case in FREYA describes a scenario where a PID is needed, identifying a user, a goal and its benefits, and it basically follows the template you can see here on the right-hand side: as a user (a researcher, say) I want a PID for instruments (that is my goal) so that I can track all the publications that come out of this instrument (the benefit I, as a user, gain from having this new PID). We collected use cases in a lot of different areas. To give you a few of the headlines: cross-linking literature and data, PIDs for instruments, linking published metadata with instrument metadata, tracking reuse of software, and tracing the outcome of research cruises. So we have a very diverse collection of use cases, and I will give you some examples in just a little bit. First, a little about the numbers: altogether we collected 72 use cases in total, and 30 of them revolved around what we would call a new PID, emerging or immature. To give you an example of the use cases we collected, one revolves around research cruises. It says: as a funding agency, I would like to trace the outcome of my financial contribution to a marine research cruise by tracking the data, articles and physical samples it generated, and I would also like to track the future data and publications generated from my cruise and the samples it collected. If you boil this down, it is a funding agency that would like a research cruise ID so that it can track the future outcome of its investments. We have another use case that revolves around software. It says: as a software author, I want to be able to see the citations of my software aggregated across all versions, so that I see a complete picture of reuse. Boiled down, the user is a software author who wants a software PID in order to track reuse, which is the benefit for this particular user. We have a use case that revolves around policy, from a research manager: as a research manager, I want to have policy IDs so that I can easily identify relevant policies and assess the compatibility between different policies. Here the research manager is the user, a policy PID is the goal, and the benefit is being able to assess compatibility between different policies. From all of these 30 use cases we made what we call a popularity index: how many times were different PIDs, and needs for PIDs, mentioned in the use cases? Most use cases mentioned more than one PID, which is why the total adds up to more than 30, but what we can see is that there is a really big demand for instrument PIDs; there is a lot of talk about, and a lot of need for, the development of a PID for instruments. Many other entities were mentioned in these use cases as well, so what we are analyzing further in FREYA is not only instruments but also repositories, organizations, physical samples, grants and software, and we added a few that partners within FREYA have a particular interest in: research campaigns, data management plans and facilities. These are the nine PID systems we are taking on for further analysis. What we want to do with them is identify the needs based on the user stories, so why do the users want this particular PID, and we want to
validate the current status. Here we can draw on the experience from our landscape analysis, but we want to analyze what it takes to expand each PID type to a level where it can be used more frequently, and this validation also includes a cross-disciplinary approach. Here I would like to dwell a little on physical samples again, because I find them really interesting. We had user stories, all labelled physical samples, that revolved around information about sediment cores in a repository, tracing and relocating misplaced cultural artifacts, and, on top of that, identifying samples of bacterial, viral and fungal strains. As I told you before, we have PID systems that work to some extent in these areas, but they do not fully cover all of them. So our goal is to analyze what is out there and what it takes to find a system that will work, maybe not for all of these different kinds of physical samples together, but that in itself would also be a result, if we can say: we need a split, or we need to try to merge them all together. This will end up in a report that will be available by the end of February, and we can then try to match the requirements and needs with the expertise we have in FREYA, to see how we can move forward with prototyping some of these PIDs. I am not going to go too much into prototyping now, but to give you a little example of how we want to improve the infrastructure for these new and emerging PIDs, here is a use case from PANGAEA on how we have implemented IGSN numbers. I have set it up as a scenario: I am a geologist interested in sediment cores, and it has come to my knowledge through my search of the available literature that there is something going on in the French Alps; they are taking some very interesting samples there, and I want to find out what data are available. So I search in PANGAEA and I find that Banyat et al. 2015 published data, which is very interesting, and I would actually like to know more about it. So what kind of information do I get from the PIDs implemented in the dataset in PANGAEA? First of all, we have a PID for the authors: most of the authors here have an ORCID, and the ORCID is actionable, so when we click on it, as you can see here, we are directed straight to the ORCID web page, and we also have additional information about the author in the dataset. There is also a PID for the data; this is the one we are actually looking at, and it is what we would use if we needed to cite the dataset in another publication. On top of that we have the article PID, which is also actionable and takes us directly to the journal web page where you will find the actual article in which these data were published. These are the mature PIDs I talked about. What we have added further in PANGAEA, and in particular to this dataset here, is the IGSN numbers for the cores that were taken in the French Alps to generate this particular dataset. These cores are stored in a core repository and have been assigned a PID, which is now also available in the dataset from PANGAEA. This means it is actionable and will take you to the IGSN web page, where you can get information about this particular core that the data come from: what kind of material, where it was collected, and where it is potentially stored at the moment. If you wanted to, you could actually trace it down, go get it, take a sample, and do your own analysis. This is a glimpse of the kind of information we would like the new PIDs to add. To finish up, in conclusion: our use-case-oriented method gave practical orientation about the users' demand for new PIDs, and we could see
that PIDs for instruments, organizations, physical samples, grants and software in particular were in high demand by the community. The implementation of some of these new PIDs will definitely improve users' access to additional information, and that is what we are trying to pave the way for here in FREYA. With that, my presentation ends. Thank you very much.

Thanks a lot, Ketil. Are there any burning clarification questions for Ketil right now? If there are, please type them in the chat or the Q&A functionality. I can see raised hands, so let me see if I can allow Peter to speak. Peter, if you want to ask a question, maybe it is easiest if you type it in the chat; you still seem to be muted. Okay, Peter says he does not really have a question right now. Shall we continue with Amir? Please think about your questions for Ketil and also for Amir.

Okay, thanks Irina, let me start. In this second talk I am going to take a somewhat more philosophical perspective on PIDs and connecting scholarly works, and I think it will be complementary to what Ketil said about creating the PID graph: I am going to look at the PID graph more as a user, as an infrastructure that uses the PID infrastructure, rather than as a PID provider. I have been fortunate enough to work with the PID infrastructure for almost six years, and in all of this time we have actually used this platform for a large part of our work connecting scholarly works. So in this presentation, in the short time that we have, I will talk about a bit of background on where and how the working group in the Research Data Alliance got to the point of leveraging the PID infrastructure; I will talk about the current technology in this space that uses PIDs and the PID graph underneath; and I also have a couple of points about the PID graph itself, which will be almost aligned with the things Ketil mentioned. This slide set is an abridged version of a one-hour talk, so there are topics I will mention but jump over without looking at the details; if there is a need for further discussion, we can always have an offline conversation about any of the concepts and jargon I mention as part of the presentation. Now, on the background side: about six years ago we had a problem, which a lot of repositories still have, and that is finding related datasets. If you are running a data infrastructure, if you are a researcher searching a data infrastructure for a dataset, or if you just randomly come across a paper that cites a dataset, when you look at that dataset you think: what are the other datasets out there that are linked to this? For papers we have paper citation, and there are discovery tools, Microsoft Academic Graph and Google Scholar and lots of different services, that provide you with the overall context around publications and the scholarship developed in that space. But when it comes to data, our options are very limited. Google recently launched a dataset search platform, but even that does not directly address the problem of looking at an individual dataset and finding the work related to it. This was the issue for a group of data infrastructures and many people in this space, and in the very early days, in 2013, it led to the creation of a working group called Data Description Registry Interoperability. The idea was finding better ways of connecting research data across multiple infrastructures.
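The idea of connecting datasets across infrastructures can be made concrete with a small sketch: treat shared links (same paper, same grant, same researcher) as evidence that two datasets are related, without claiming any direct citation between them. The records below are invented for illustration; a real system would draw these edges from PID metadata.

```python
from itertools import combinations

# Invented example records: (dataset, hub) pairs, where a hub is a
# researcher, publication, or grant PID that the dataset links to.
edges = [
    ("dataset:A", "doi:paper-1"),
    ("dataset:B", "doi:paper-1"),    # A and B are cited by the same paper
    ("dataset:C", "grant:G-1"),
    ("dataset:D", "grant:G-1"),      # C and D are funded by the same grant
    ("dataset:A", "orcid:author-1"),
]

def related_pairs(edges):
    """Dataset pairs two hops apart, i.e. sharing at least one hub."""
    by_hub = {}
    for dataset, hub in edges:
        by_hub.setdefault(hub, set()).add(dataset)
    pairs = set()
    for datasets in by_hub.values():
        pairs.update(combinations(sorted(datasets), 2))
    return pairs
```

Note that, as in the talk, the output only says the two datasets might be related because they share a neighbour; it implies no causality and no direct relationship.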
One of the earlier partners in this project was Dryad, together with CERN and ANDS, and I am going to use some of that data to show you the sort of work we did in the beginning to find a way to address this problem. This is a screenshot from Research Data Australia, a service hosted by the Australian National Data Service. You are looking at one page, which is the description of a dataset; on the right side you can see some similar datasets linked to it. But assume that panel is not there and you only have the information in the middle. If you search for related information, you have multiple options; in the old days we used to grab the title or a related keyword and search the DataCite REST services or similar to find related datasets. Now the working group said: we can do better, let's try a different method. Let's look at the name of the author. We can google the name and find the researcher's page, in this case Professor Katherine Belov at the University of Sydney. We can go to that page, look at all the articles, read the individual articles, and in one of them we will find a link to a dataset in the Dryad repository, which in this case has a persistent identifier. We go to that repository and find another dataset by the same author, which in this case actually carries a different abbreviation of the name: "Belov, K". So what happened here is that we went from a dataset to a researcher, from the researcher to a publication, and from the publication to another dataset. We technically connected two different datasets by surfacing the concept that they might be related. We do not know whether they are directly related, but they are conceptually related: they were produced by the same author, or they are referenced in the same publication. So there are links in this space that connect the information together. That was the point of the DDRI working group, and the work we did was about developing this model into a production-level service that a repository can use for connecting the datasets in its own infrastructure. The work led to a piece of software called the Research Data Switchboard, which basically automated the same process I just explained: it used the Google API to link information across the internet, and, importantly, it resolved the identifiers and built a model that later got the name Research Graph. The core of this activity comes back to connecting different scholarly works and scholarly assets using, technically, two degrees of separation. The examples on the screen are two degrees of separation: two datasets cited by the same paper, two datasets funded by the same grant, and two datasets created by the same researcher. The difference between this and the concept of data citation is that we do not claim any causality: there is no direct relationship between the two datasets, we just have some evidence that they might be related. This work was underpinned by some other studies that, as I said, we do not have enough time to go through, but it is quite strong evidence for finding related information across the web. We published a paper in Scientific Data with a cluster of our records, curated for the purpose of public use, and one of the things we found in that curated dataset was that 70 percent of our publications had a DOI and 46 percent of our datasets had a DOI. The significance of this is that in curating this graph for the article we were trying to find clusters that are highly connected, without lots of scattered information, so it was more
frequent to find interconnected information, and that was an interesting observation. Again, this was quite a small case study with very limited information, and most of the DOIs obviously came from DataCite and Crossref, so this cannot be applied generally; please do not take it as an overall statistic across the internet. So this is what we did at the time around connecting repositories. Later on we started developing something called the Augment API. It is a similar concept, but it very much derives the information from persistent identifiers, because of the trust we have in the relationships between those identifiers and because of the availability of services that enable building relationships on the fly. Let me give you an example. This is a dataset from NCI, a data infrastructure facility in Australia. They have a dataset called Bluelink, one of the major collections in their data repository. If we use the online APIs of the PID providers, we can traverse the graph and build up information. As I will show you, the researcher who worked on that dataset at NCI is Peter Oke, and we know Peter Oke has an ORCID iD, so we can go from the researcher in NCI to the ORCID record; that is our first-level connection. Once we are in the ORCID record we get all the publications, and once we have those publications we can look for the datasets cited in those articles. I am oversimplifying here; in the more comprehensive version there is a bit of technology involved in finding these relationships between datasets and papers, but let's assume for the moment that it just magically happens and you have the datasets. Then there are other data sources, like Scholix, another initiative coming out of the Research Data Alliance, that give you links between datasets and publications. So you start building this graph, and the beauty of it is that you can build it dynamically, because there are trusted REST APIs for every single one of these PID providers to find the related information. Now, what does this mean for NCI? Well, we can transform their GeoNetwork repository into a graph database, send it to the Research Graph infrastructure, and augment it with the information coming from Research Graph, which gives you a bigger graph with lots of blue points. In the Research Graph colour coding, blue is publications, orange is datasets and green is researchers, and most of the blue and green nodes come from the Research Graph Augment API, filling the gaps. The process, as I said, is oversimplified in this presentation: it does not just blow the graph out through degrees of graph crawling, it actually fills the gaps using a shortest-path algorithm. Regardless, this is the difference between before and after the service: on the left is the NCI graph before augmentation, where all the orange nodes are datasets, and using the Augment API and the power of these persistent identifier services we complete the graph on the right. That gives them a much better picture, and it was possible entirely thanks to the PIDs and the PID services in this space. And this graph would be different tomorrow compared with today, as the publications, datasets and links keep evolving. Now, with all of these technologies, what did we learn? The first thing that was quite apparent to us was that persistent identifiers in a way save money for a lot of infrastructures. This whole journey I described in the last ten minutes is actually five years' worth of investigation into different
technologies and we found there are so many ways that you can run complicated disambiguation algorithm to disambiguate authors disambiguate uh titles of publications and so forth and all of those processes are actually very expensive if you do it at live but when you use the pits is actually are encapsulated into your ecosystem as much much easier to implement and for repositories and for universities and for research groups it's actually affordable to build at added value services top of this pit infrastructure the second thing is that not all connections have the same value so during this time we tested so many different methods we use a graph clustering methods we use topic modeling and different data mining techniques we found yes with different level of accuracy we can guess if these two papers are the same if these two authors are the same but just that one example that you go to the conference and present something and job log is not the same job log as you thought he's that's just one false positive it's uh it just puts a big um a dent into the accuracy of the data or using the pits provided with a trusted connections because the provenance of the data is quite clear so we know actually where that information comes from and the other thing is that the using the even if there is a mistake in the problem what in the pit metadata records so you know where is the source so it's more kind of like fixable so you can go to fix it and then that would propagate through the system the second lesson that we learned is that in the beginning when we were doing this was six years ago uh it was quite tempting to just create a big graph we just started from literally from 500 nodes to 5000 nodes to the 5 million nodes to the 50 million nodes as today there are about 250 million records of a scholarly uh communication records and what you find very quickly is that collecting all of these information they're not dynamic data as we collect them they are actually expired 
The information in this space is constantly changing, and you have to navigate that. If you want to build a service that really adds value, like the use cases Ketil mentioned, it is not about just creating the big graph; it is about creating fast, trusted and sustainable services that enable innovation, so that other people can build discovery methods that traverse this big ocean of data and actually find the right answers for their users.

Lesson three, which is the most important one: we need to invest more in the business of connecting rather than collecting. As I said, when we started about six years ago it was quite exciting to create a big graph, and I have seen a lot of copies of these big graphs since; it became quite fashionable to have yet another curated version, in different shapes and forms, with startup companies and projects rising around them. But the problem statement is not to create a very large cohort of scholarly connections. That cohort already exists: it is the network of scholarly communication that is already available across the web. The question is how we connect this information. Connecting is the number-one priority, and at the least the most valuable use case, for any new service or platform that wants to leverage this ocean of data.

Now, about the PID Graph. Given everything I have mentioned, the Research Graph Foundation has a strong interest in the success and development of PID infrastructures, and the PID Graph, which is a graph of identifiers underneath, is one of the key enablers for everything we do in that space. There is a lot of collaboration between our work and FREYA, and based on that collaboration I added a couple of slides. They partly repeat what Ketil said, but they give my understanding of where FREYA is going, which might also open up the conversation. The aim of the PID Graph is to provide at least a federated JSON API, with a few requirements: these have to be application-ready services for the disciplines, so they can be implemented in different repositories, platforms and infrastructures; they need to support European Open Science Cloud applications; and the JSON API needs to expose enough of the dependencies between identifiers to enable some level of graph visualisation for the end user.

Here is another version of the use cases, this time more in the language of Research Graph. One example concerns versions: if you have a data set spread across multiple infrastructures, in multiple versions, and this happens a lot, in many fields the data really is stored in multiple versions in different places, we want to collect the citations for all of them and be able to report them all at once. Say I have a data set and I deposit it, for example, in figshare; for simplicity, say it lives in only one place, but I have updated it ten times and there are different citations for the different versions. Given that each version has a different DOI, we want to be able to accumulate all of those citations into one metric. Something similar holds for funding: if you have a grant, and that grant is mentioned in a data set, a publication and various other activities and research projects, and, as we just learned, FREYA is working on creating PIDs for projects and grants, then we want that whole list of connections, a grant that funded smaller grants or projects, which in turn produced papers and data sets.
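The version-citation use case can be sketched roughly as follows, with invented DOIs and citation lists standing in for what a citation service would return:

```python
# Hypothetical version records for one data set: each version has its own DOI
# and its own citing works; the data set should report them all as one metric.
versions = {
    "10.5555/ds.v1": ["10.5555/paper-a"],
    "10.5555/ds.v2": ["10.5555/paper-b", "10.5555/paper-c"],
    "10.5555/ds.v3": ["10.5555/paper-c", "10.5555/paper-d"],
}

def aggregate_citations(versions):
    """Union the citing DOIs over all versions, so a work that cites several
    versions of the same data set is counted only once."""
    cited_by = set()
    for citing in versions.values():
        cited_by.update(citing)
    return sorted(cited_by)

all_citations = aggregate_citations(versions)
print(len(all_citations))  # 4 distinct citing works, not 5 raw links
```

The design choice here is a set union keyed on the citing PID; without a persistent identifier per citing work, the same paper counted against two versions could not be deduplicated reliably.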
There should be one place for a new service or app to go and search for all of this information, rather than hitting 50 different APIs and trying to consolidate the data itself. And for the last use case I don't have a model, because it is one of those things where you don't know the answer until you start working on it: what would be the most effective linking method between data and publications? This is the text from one of the user stories Ketil mentioned: as a researcher, I want an easy way to more effectively link my data to my publications; and as a reader, I want to be able to easily find all the data related to a publication. There is a repository on GitHub where you can see all the FREYA work around these user stories, and some of it may lead to answers about which API we want in the end and who its potential users are. The other thing, listening to Ketil's presentation and from my understanding of work like this, is finding a model for prioritising which API needs to be built first.

So, what next from here? There is a BoF session planned for RDA Plenary 13 in Philadelphia; the proposal is up, but we don't yet have a date in the conference program. That would be a good place to follow this conversation. If you need more information, there is the FREYA project website, the Research Graph website, and my contact details. I hope this didn't leave too many open-ended questions, but feel free to send them to me. Thank you.

Thanks a lot, Amir. Are there any burning questions right now? We don't see any, so perhaps let me show you the OpenAIRE use case for PIDs, and then we can have the Q&A session. One second while I open my slides; I hope you can all see them.

OpenAIRE is a European Commission funded project facilitating open science in Europe; our current phase, OpenAIRE Advance, was launched at around the same time as FREYA. What we do is collect metadata about publications, data sets, software and other research outputs from publication repositories, open access journals, data repositories, software repositories, CRIS systems and other aggregators. We do some validation, cleaning, deduplication, inference and linking, and then we present the connections between projects, sources, data sets and other research outputs, with different services built on top of our portal; one of them is Scholix, which was already mentioned. We hope this aggregated information is useful for measuring the impact of research and for spotting research trends. Naturally, we collect different types of publications from different repositories: from institutional repositories we collect information about publications, articles, preprints and reports, but in some cases also data sets; from data repositories we collect data sets; and from journals and publishers we also collect information about publications. In OpenAIRE Advance we are looking into software and other research products to see how we can better link them to projects and publications.

We have a content acquisition policy, and based on it we build the OpenAIRE information space. The policy says that OpenAIRE accepts metadata records of all scientific products whose structure respects the model and semantics expressed by the OpenAIRE guidelines. This means that both open access and non-open access material is included, and links to other products are resolved where possible. This is where we actually use PIDs, because for us PIDs are the resolvers of those links. PIDs play several roles in OpenAIRE: they help to identify and register content providers; they help to identify researchers; they help in metadata enrichment and notification, since we have a service through which repository managers can be notified when richer metadata is available to be added to their repositories; and we also use PIDs for resource linking, for resource metrics, and for tracking aggregated usage statistics.

We have guidelines for data providers. We started with guidelines for compatible literature repositories, then added guidelines for data repositories and for CRIS systems, and we are also working on guidelines for software repositories; so we started from publications and are expanding across open science areas. Our guidelines help to improve the interoperability of metadata exchange, and we hope they also support the FAIR data principles. Of course we reuse existing standards, and we extend vocabularies where necessary, for example for different PID types. These slides show some examples of the PIDs OpenAIRE uses for different entities: for the resource identifier, we extend DataCite's list of identifier types; we also have PIDs for alternative identifiers of the resource, listed on this slide, and for related identifiers, that is, identifiers related to the registered resource; then of course there are PIDs for authors and contributors, which are the standard ones, and PIDs for funders and project grants, such as Crossref funder IDs and grant IDs. And here is an example of the information displayed on the OpenAIRE search portal: you can see a data set that is linked to a data set deposited in the DANS data repository, and also linked to an article in which the data set was used.

Some issues and next steps: OpenAIRE is working with the relevant RDA working groups and collaborating with the FREYA project. We want to improve the discovery and integration of community-specific identifiers and metadata, and we also want to improve the integration of DOI registration supported by Zenodo, the repository that OpenAIRE and CERN manage.
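To illustrate the "PIDs as link resolvers" point, here is a toy sketch of deduplicating aggregated records and carrying related-identifier links on a normalized DOI key. The records and field names are invented; OpenAIRE's actual deduplication is far more sophisticated.

```python
# Invented harvested records from two providers; DOIs arrive in varying forms.
harvested = [
    {"source": "repo-a", "doi": "https://doi.org/10.5555/xyz", "type": "dataset"},
    {"source": "journal-b", "doi": "10.5555/XYZ", "type": "dataset"},  # duplicate
    {"source": "journal-b", "doi": "10.5555/article-1", "type": "publication",
     "related": ["10.5555/xyz"]},  # article linked to the dataset
]

def normalize_doi(doi):
    """Strip common resolver prefixes and lowercase (DOIs are case-insensitive)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def deduplicate(records):
    """Merge records that share a normalized DOI, keeping all source names
    and accumulating related-identifier links on the merged entry."""
    merged = {}
    for rec in records:
        key = normalize_doi(rec["doi"])
        entry = merged.setdefault(key, {"type": rec["type"], "sources": set(),
                                        "related": set()})
        entry["sources"].add(rec["source"])
        entry["related"].update(normalize_doi(r) for r in rec.get("related", []))
    return merged

merged = deduplicate(harvested)
print(len(merged))                               # 2 distinct resources
print(sorted(merged["10.5555/xyz"]["sources"]))  # ['journal-b', 'repo-a']
```

The PID does the heavy lifting: two records are the same resource exactly when their normalized identifiers match, with no fuzzy title or author matching needed.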
So that was a quick introduction. I'll stop sharing my screen, and it's time for your questions to Ketil, Amir, or to Eliane and Ellen from DANS, who are here with us from the FREYA and OpenAIRE projects as well. I've stopped screen sharing; any questions? I don't see any coming in, which either means we have all been very clear, or, I don't know. Ah, thank you, here is the first one: how can we follow your progress in the future?

I can answer that for FREYA, at least. We have our web page, where all the reports we produce are available, along with a list of activities. If you want more direct, not day-to-day but week-to-week, interaction with us, Twitter is a good way to connect with FREYA: you will get tweets about our upcoming activities, where we will be presenting and who you can meet, and information like that. For Research Graph, I would say the best place to meet everyone and get involved is the Research Data Alliance; beyond that there are also the website and the other contact points, but as I said, at pretty much every RDA plenary we have something new or something interesting happening, so that is probably the best place to get together.

I also just noticed we have a couple of other questions on the Q&A board. There is one for me about the links between PIDs. I don't know exactly what you're asking: is it about statistics on how many PIDs are connected, or whether people actually use the connections? Ah, OK, around citation and funding. So there are two things; we use PID links in two different contexts. One is when the relationship is in the metadata of the PIDs themselves. DataCite, for example, has a very good corpus of these kinds of connections and APIs and is very good in this space; ORCID is in the same category; and Crossref has started to explore the concept of Event Data, which provides something similar. Here the data provider tells you that these two PIDs are linked, and that is where you get things like cited-by and funded-by. The second scenario is when a repository, a university research management system or a library system asserts the link: a librarian states that this DOI is linked to this ORCID iD, so you have another authority source linking two different PIDs together. The accuracy, or rather the type of provenance, is different between the two, but either way it is a much more trusted connection than something derived purely by AI or machine learning. So there are two different types of connections between PIDs. I don't know if that was exactly the question, but I tried my best.

There was also a question from Biliana: could you explain a little more about the current status of PIDs for organisations? I think Ketil is probably the best person to answer that. I'm sorry, I must pass on that one; this is not an area I specialise in, and I won't say anything I don't know exactly about, but contact me and I can put you in touch with the right people. Then there is a question from Quentin: is there someone specifically to discuss historical personae? Yes, within FREYA that is something we have discussed, and the discussions are ongoing, so I also suggest you contact me and I will put you in touch with the people concerned with personae. This is, I must say, a very interesting entity to assign a PID to; it is challenging and difficult, but definitely needed and interesting. And there are two more questions.
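The two kinds of connection described in that answer, provider-asserted versus third-party-asserted, could be modelled as provenance-tagged edges; all names and values here are invented for illustration:

```python
# Each connection between two PIDs carries its provenance: either the PID
# provider's own metadata asserted it, or a third party (for example a
# librarian in a university system) did. Field names are invented.
links = [
    {"from": "10.5555/paper-1", "to": "10.5555/dataset-2",
     "relation": "cites", "asserted_by": "provider-metadata"},
    {"from": "10.5555/paper-1", "to": "0000-0000-0000-0001",
     "relation": "authored-by", "asserted_by": "third-party"},
]

def by_provenance(links, kind):
    """Filter connections by who asserted them, so a consumer can decide
    how much trust to place in each class of link."""
    return [link for link in links if link["asserted_by"] == kind]

print(len(by_provenance(links, "provider-metadata")))  # 1
```

Keeping the asserting party on every edge is what makes mistakes "fixable" in the sense described above: a wrong link can be traced back to its source and corrected there.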
From Muriel: is anyone working with Wikidata, which also has lots of PIDs for people, artefacts, locations and so on that are useful and relevant to research and researchers? I know that OpenAIRE collaborates a little with Wikidata. I can also comment on that: Research Graph has just started exploring some of that information, so it is one of the engagements we are looking into. For FREYA it is on the radar, but I am not familiar with the current status of our involvement. Muriel also asked whether there is anything ambassadors can do to help us share progress and knowledge. I think Eliane might be the right one to answer that, if she would unmute. Yes, hello everybody, this is Eliane; I'm from the communications team of FREYA and was also co-organising this webinar today. We are definitely looking for new ambassadors; we have an ambassador programme, and if you go on our website you can find all the information there. We involve our ambassadors in webinars regularly, so we are happy to get information from you; if you are not an ambassador yet, please go on the website and have a look. Thank you.

Then there was a question: would you recommend implementing these new PIDs in an institutional repository now, or in the near future? What is the time frame until we could use these new PIDs effectively? That is a very broad and very good question, and of course the answer is individual for each of the PIDs. If you want an idea of the time frame, I suggest you look out for the report coming out of our analysis of the nine PIDs I presented earlier today. There you will get an idea of the current status of these emerging, and in some cases still somewhat immature, PIDs, and of when something could potentially come out of the work. It is hard to say, because it is not as if there is a deadline three years down the line and we can declare that for instrument PIDs, say, everything will be up and running in three years; it is development in progress, and that is always how it is, but we are definitely talking years in this case. In FREYA we can work on different levels, though. You heard Amir speak from a technical point of view about general technological developments in progress that will help implement PID services at a larger scale, and you saw in my presentation, in the last little scenario from PANGAEA, how we implemented IGSNs, so implementation can proceed at different levels. What we want to see, of course, is something happening more generally, especially in terms of technological developments, that can be taken up and used by the wider community. And I can comment on PIDs for instruments: there is, I don't remember whether it is a working group or an interest group, a group in RDA on this topic, so if you are interested in instruments that is a place to check and perhaps join the community discussions.

Muriel says he is already an ambassador, and asks whether there is anything he could do apart from sharing knowledge with his teams and colleagues; maybe FREYA is interested in more user studies at Sydney or similar? Sorry, could you take this one, please? Is this a question for the University of Sydney? I can add something here: Muriel, I remember that the University of Sydney was one of the participants in the DDI working group way back, around 2014 and 2015, and there is some ongoing work in the research office in this space. I don't know the exact user stories or use cases, but there are connections there to explore, and some capability in place to connect library systems to this kind of graph-modelling technology at the University of Sydney. And in regard to the user stories, I would say we are definitely interested in hearing about good user stories that illustrate exactly what we are doing in connecting and paving the way for new PID services. So let us know if you have interesting user stories that we can put on our agenda and move forward with; I think that would be very useful for FREYA.

Thank you. I think we have now covered the questions received so far. Thanks a lot for joining this webinar, and apologies that we went a bit over time, but I think it was worth it. Thanks a lot, Ketil, Amir, Eliane and Ellen, for organising this webinar. As I said, the presentation is already on the webinar page; we will also make the recording available there and upload it to the OpenAIRE YouTube channel. Thank you all, have a good rest of the day and a good weekend, and happy 2019. Thank you, bye. Thanks a lot, bye. Thank you very much.