Welcome to this webinar on the OpenAIRE open calls for tenders. My name is Irini Karachristou. Today we will go through the agenda within our timeline: we will start with a presentation by Natalia Manola of OpenAIRE on the scope and activities of the calls, then the presenters will walk us through the three topics, and after the presentations we will collect your questions and discuss them together. If there are no questions at this point, I will give the floor to Natalia Manola. Thank you. — Thank you for the introduction. I will not go into too much strategic detail; I think most of you know who we are and where we want to go, so I will make just two or three points. As you all know, OpenAIRE is a programme, a project of the European Commission; I think it was in 2009 that we started. And our strategy rests on three strategic priorities.
I believe the first is to strengthen the National Open Access Desks network that we have, which is a network with a representative organization in every European country, fostering open access to publications, data and other research works, and open science in general. The second strategic priority is training: how to build a training infrastructure in Europe to foster open science. And training is about many issues: it is training about policies, about research data management, about open access to publications, all of it. And the third is the services, and this is where we come in. So from the start, from the beginning of OpenAIRE, we have had a service-oriented infrastructure. Let me explain. We have a list of services that you can browse through in catalogue.openaire.eu; you can reach it through our site, and we have different services. There are services addressing content providers: for example, we have services for repositories and open access journals on how to register and validate their metadata schema in OpenAIRE and be visible to the world. We have services for content providers to exchange metadata through the broker system. We have services for researchers, and services for research communities. And of course the service that we will mostly talk about today is the research graph, as we are trying to build the scholarly communication graph with participants around the world, with collaborators around the world.
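The scholarly communication graph mentioned here can be pictured as entities (publications, datasets, software, projects, organizations) connected by typed links. As a purely illustrative sketch, assuming a simplified model (the class names and fields below are not the actual OpenAIRE data model), it might look like this in Python:

```python
from dataclasses import dataclass

# Illustrative sketch of a scholarly-communication graph:
# entities connected by typed, provenance-tagged links.
# All names here are assumptions for illustration only.

@dataclass(frozen=True)
class Entity:
    id: str
    kind: str      # "publication", "dataset", "software", ...
    title: str

@dataclass(frozen=True)
class Link:
    source: str    # entity id
    target: str    # entity id
    relation: str  # e.g. "isSupplementedBy", "isProducedBy"
    provenance: str  # which data source asserted the link

class ScholarlyGraph:
    def __init__(self):
        self.entities = {}
        self.links = []

    def add_entity(self, e):
        self.entities[e.id] = e

    def add_link(self, link):
        self.links.append(link)

    def neighbours(self, entity_id, relation=None):
        """Entities linked from entity_id, optionally filtered by relation."""
        return [self.entities[l.target]
                for l in self.links
                if l.source == entity_id
                and (relation is None or l.relation == relation)]

g = ScholarlyGraph()
g.add_entity(Entity("pub1", "publication", "An article"))
g.add_entity(Entity("data1", "dataset", "Its supplementary data"))
g.add_link(Link("pub1", "data1", "isSupplementedBy", "DataCite"))
print([e.id for e in g.neighbours("pub1", "isSupplementedBy")])  # → ['data1']
```

Carrying a provenance field on every link is the key point: the graph can then be filtered or trusted per source, which is what the services described later build on.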
So, as you have seen, maybe for the second time, we aim, as an infrastructure, to cover the layers from the universities up to the value-added services for research communities, for funders, for everyone who wants to do something in scholarly communication. So we have three challenges. Challenge number one is next generation repositories: how can we empower repository platforms like EPrints, DSpace and other repository platforms, with services that we cannot build ourselves, so that they can come to the next level, what we call the next generation repositories. There is a core list of new functionalities that we would like to delve into. The second topic is value-added products for open science, and I think this is based mainly on the graph that we are producing: how can somebody take the graph and produce value-added services for various stakeholders? This could range from researchers, for example, how can I get my CV based on the graph, to funders, for proper monitoring, to research communities, for how to publish. And the third challenge is the enhancement of the current services and technology of OpenAIRE, because OpenAIRE is a data infrastructure, a big data infrastructure. We take millions of content items from around the world, metadata and also data or text; we text and data mine, we deduplicate, we clean, we offer APIs. So there is a lot of machinery at the back of OpenAIRE, and services that would make it more robust or more performant would be very welcome. All of this, I have to say, is in the remit of the European Open Science Cloud that many of you have heard of; OpenAIRE is a pillar of the European Open Science Cloud. The word I am trying to find is strengthening open science.
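The deduplication mentioned above, finding records that describe the same object across sources and blending them while keeping every source, can be sketched roughly as follows. The record fields and the keying strategy (DOI if present, otherwise a normalised title) are illustrative assumptions, not OpenAIRE's actual algorithm:

```python
import re
from collections import defaultdict

# Hedged sketch of a deduplication step: group metadata records by a
# normalised key and merge duplicates while preserving all sources.
# Field names are illustrative, not OpenAIRE's actual schema.

def dedup_key(record):
    if record.get("doi"):
        return "doi:" + record["doi"].lower().strip()
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return "title:" + title

def deduplicate(records):
    groups = defaultdict(list)
    for r in records:
        groups[dedup_key(r)].append(r)
    merged = []
    for group in groups.values():
        out = {}
        sources = []
        for r in group:
            sources.extend(r.get("sources", []))
            for k, v in r.items():
                if k != "sources" and v and not out.get(k):
                    out[k] = v  # keep the first non-empty value seen
        out["sources"] = sorted(set(sources))
        merged.append(out)
    return merged

records = [
    {"title": "Open Science!", "doi": "10.1/X", "sources": ["repoA"]},
    {"title": "open science",  "doi": "10.1/x", "sources": ["repoB"],
     "abstract": "mined text"},
]
print(deduplicate(records))
```

Here the two records share a DOI, so they collapse into one merged record that carries both sources and the abstract contributed by the second one; a real system would of course use far richer similarity matching than exact keys.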
This is the word that I was looking for: strengthening open science, and working with research infrastructures, big data or small research infrastructures, on how to make this publishing workflow start from day one, and then how to enable data discovery for them, or how to make data more visible for them. And because this is a European infrastructure, in the course of the European projects we are not able to fund particular tasks in a way that would mean giving preference to particular SMEs or particular solutions. This is why we have the open calls: because we want all of you to come and work with us and see what you can offer. And this is, I think, our aim. — Thank you, Natalia, for this fruitful presentation of the OpenAIRE-Advance long-term strategy, scope and activities. Are there any questions at this point? OK, if not, then I think we can proceed with topic one, Next Generation Repositories. Jochen, if you are ready, I will share the presentation that you sent me earlier with the participants. Is it clear? You can tell me when you want me to proceed to the next slide. — Thank you. Good morning. My name is Jochen Schirrwagen, from Bielefeld University, part of the technical team in OpenAIRE and, among other tasks, responsible for the work package called Towards the Scholarly Commons; part of this work package is on Next Generation Repositories. However, the idea of Next Generation Repositories was born in the context of COAR, the Confederation of Open Access Repositories. A working group was initiated in 2016, and this was done for a couple of reasons, on the next slide. For instance, the crisis of the institutional repositories. Some aspects are of course also valid for catch-all or disciplinary repositories, but sometimes some aspects are also different.
So here are two examples, already published three years ago, that criticize institutional repositories, for instance regarding poor usability: there is a very heterogeneous landscape and a lack of agreement on common standards, and this often makes it a challenge to aggregate not only the metadata but also the digital objects in a high-quality way. Another aspect often criticized is the protocol OAI-PMH, used to collect metadata records. While this was a promising approach at the beginning of the 2000s as a common-denominator interchange protocol for repositories, it is now no longer up to date, due to other, more modern web standards, and to issues regarding the lack of support for formats other than XML, performance issues, content replication, and so on. Moreover, repositories in general play a difficult role. On the one side they are the backbone of the open access community and of open access infrastructures like OpenAIRE; they are used to fulfil several open access mandates by funders, most recently Plan S for instance. However, they are not well acknowledged by researchers, especially institutional repositories, because they do not have the same priority as the paper in the scientific journal, and usually a repository is not a virtual place where researchers would meet. Going to the next slide, there are some aspects that need to be improved. One of them is the limitation in functionality: repositories often act as data silos; there are only a few global aggregators, like OpenAIRE, like CORE, like LA Referencia. But mainly the issues are a lack of social interaction functionality and issues regarding technical interoperability. On the next slide we will see that now the COAR Next Generation Repositories vision comes into place, which asks to position repositories as a foundation for a distributed, globally networked infrastructure for scholarly communication, and here it is important to build value-added services on top of these repository infrastructures. Another aspect is that in the past repositories were seen as the object where the interaction would take place; however, it is more important to make the resource that is deposited in the repository the priority entity on the web, to make it interactive for machines on the one side, but also for researchers and the public. On the next slide some challenges are mentioned. One set is to build value-added services, but also to better integrate repositories in the research life cycle and to build innovative scholarly services on top of repositories. The operational aspect is also important: to make sure that repositories are collectively managed by scholarly communities and not by a few monopolies in the world. Improving repository functionality means making the resources, meaning the digital objects, an integral part of the web. Very important here is also to standardize the data exchange on a global scale, by adoption of modern web standards and protocols. And in the end it is important to support not only literature or research data or software, but any kind of digital research result in repositories. The next slide identifies several behaviours of next generation repositories which are recommended by this working group in COAR. These behaviours are the result of collecting several use cases and also of collecting feedback from the repository community. I don't want to go through all these eleven behaviours here. — I'm sorry, I had a problem with my connection, I couldn't hear you. — OK, I'm sorry, I think we may have solved it. I'm still on this slide. — So you could hear everything? Sorry, it was maybe the internet. — No problem. So, in a project like OpenAIRE-Advance we cannot of course cover all the mentioned behaviours, so we selected a couple of them, concentrating on resource discovery and better content transfer, open metrics, and annotation of content, on the next slide. And with regard to these calls for tenders, could you move to the next slide please, we propose several
activities. Of course this list is not complete, just a few ideas, meaning that we would suggest more software platforms that support modern standards like ResourceSync and Signposting, which of course makes sense to implement in platforms that are mainly used in the repository world, for instance DSpace 5 and DSpace 6, also EPrints, and also several derivatives of Fedora, for instance. We also suggest building on the current OpenAIRE usage statistics service, with regard to more innovative visualization techniques which would allow a better analysis of the usage of resources and of the data sources. On the next slide I will suggest some use cases in a bit more detail. Starting with the issue that, for instance, OpenAIRE can collect bibliographic metadata records from more than a thousand five hundred data sources; on the other side, an issue since the beginning has been how to uniquely identify the full text from the metadata records, and here ResourceSync could come into play. A scenario is visualized in the figure, implemented by CORE in the UK using ResourceSync, which collects metadata and the full text from publishers and makes these links available for synchronization with other services; and OpenAIRE is already using it, by implementing a ResourceSync client and collecting metadata and full text from open access publishers via CORE. The next slide suggests implementations and scenarios based on Signposting, to discover and navigate content on landing pages by help of typed links in HTTP Link headers. So this is a rather, I would say, low-level approach, where for instance the HTTP header of a web resource is analyzed and the links contained in the header are evaluated by help of their relation and their type. So, for instance, I immediately get information on how to cite this particular resource, I get information on citation formats like BibTeX, and information on the formats in which I could access the resource, let it be a ZIP file, or let it be a specific format like PDF
or HTML, and I get information on identifiers related to the authors who created the resource. Another example concerns the usage statistics. OpenAIRE already collects usage events and COUNTER reports from over 150 data sources and provides simple visualizations at the repository level and at the item level in the OpenAIRE portal. However, it would be interesting to build more kinds of visualization based on usage statistics, and to provide filters built on different aggregation levels, using different parameters: at the repository level, at the item level, but also usage statistics that can be compared by topic, discipline or geographical region. Also interesting would be to analyze usage statistics in relation to other kinds of scientometric indicators. The last example is the annotation of content, where for two years now a web standard has been defined, the W3C Web Annotation standard, and this is an interesting idea that could improve implementations for open peer review, where a reviewer is able to annotate the content that was created by another person and to make these annotations visible using well-established standards. Coming to the selection criteria regarding this topic: it is very important to align with the existing OpenAIRE interoperability standards, namely the OpenAIRE guidelines for content providers, which exist for different types of sources, like repositories, journals and CRIS systems, but also for different kinds of resource types, like textual publications, research data, software and other research products. Another selection criterion is the level of innovation: it would be interesting to see not only the implementation of one of the suggested protocols, but also concrete demonstrations or prototypes that showcase how these new standards can be used to improve the behaviour of repositories and, finally, improve scholarly communication. And of course the proposals should also have an impact on the existing OpenAIRE services and, last but not least, on its
stakeholders. OK, thank you. Thank you very much, Jochen, for this presentation, and I'm sorry for the earlier inconvenience due to the internet connection. There is one question: will the presentations be available after the webinar? I believe that our presenters will be willing to share their presentations. Would that be feasible, Jochen? — Yes, I think this is the idea. — OK, thank you. And let's proceed with the presentations of Paolo for topics 2 and 3, and then the participants may gather their questions in total, and maybe we can discuss the different issues. Paolo, you will have 30 minutes to present topics 2 and 3, and I can make you a presenter if you want to share your screen with us. — Yes, please. Just give me one minute. Can you see it? — Yes, I can see it. — So, good morning everybody. In this presentation I have to go through a long list of services that we are building in OpenAIRE. Since we don't have much time, I opted for an in-depth description of what we call the OpenAIRE research graph, and then a more lightweight, let's say, level of detail with respect to the other services that we build on top of the graph. First of all, let me start with a general introduction of what we are doing today in OpenAIRE. Technically speaking, we are building a large collection of metadata records describing all, let's say, kinds of entities you can find in the scholarly communication record: ranging from metadata about publications, data, software, etc., to metadata about entities that you would rather use to monitor research impact or provide scientific reward to your institutions and scientists, such as affiliation information, so organizations, funder information, project information linked to funders, and also the data sources from which we collect the information. So we basically build a graph, what we call the OpenAIRE research graph. The first thing that we do in that direction is to collect from external data sources. We strongly rely on the repositories that are today providing storage
and deposition and persistence to all the results of science, as Jochen was suggesting. Those repositories are out there today; they already contain metadata and files regarding these objects, and other sources out there are already improving content from repositories by aggregating it, harmonizing it, providing links where these were not available, enriching it with information not originally available, and therefore providing use cases and applications that are specific to certain contexts. For example, Unpaywall collects Crossref and identifies the links to the open access versions of the articles in Crossref, and that is a very useful operation to do. So what we do in OpenAIRE is to collect all such sources. We also provide guidelines, as Jochen mentioned. These guidelines are basically instructions for data source managers on how to export metadata in an as-homogeneous-as-possible way, to simplify the process of collection and aggregation. These guidelines are not available for all kinds of sources; in several cases we need to take the harmonization process on our own shoulders. So, to give you an example, we collect from DataCite, we collect from open access publishers and from aggregators of open access publishers like DOAJ, and so on. As a specific use case we also collect from research infrastructure sources, and this is quite new with respect to the general context of scholarly communication. That is because we believe there are many data sources that live today in the context of specific scientific research infrastructures, for example PARTHENOS, EPOS, etc., that contain objects that are useful for science, both in terms of reproducibility of science itself, for example the workflows used to perform a given experiment, and in terms of attribution to the scientists who have been producing them, because of course we need to attribute these products to the scientists who have contributed their effort and their skills. So when we
collect these metadata records, we build a graph whose high-level data model is depicted here. You see the product is at the centre, and we have different classes of products: publications, intended as literature, so digital objects that are intended to be read by humans; data, which is basically collections of objects that are intended to be processed by software, or must be observed by humans, to represent facts and observations; and software, which is instead the code that we either compile or interpret to execute a process over the data. And then we have ORPs; ORP stands for "other research product", basically any kind of research product that doesn't fall into the first three categories. Those three can be easily recognized and identified across different disciplines, while ORPs tend to be very specific to the different disciplines, and this is where we ask our scientists to, let's say, classify their objects. For each product, in fact for each object in the graph, we track the original source from which we collected the metadata, and we can link it to projects, funding, organizations and funders when the information exists within the original metadata. As you know, if you are part of this domain, this is the case, and it is increasingly the case, that links between publications and research data are provided; but it is not necessarily the case in other scenarios: for example, the links to the projects are often not available, the links to the organizations are generally not provided, and links between publications and software, and between data and software, are not generally available. This is why we manipulate this metadata in several ways. On the one side we clean and validate: we validate the original sources to check whether the guidelines are respected, and we clean the metadata to compensate for lack of compliance with our guidelines, so as to harmonize our content where needed. We perform deduplication, which means that if we find records describing the same
digital object but coming from different sources, we identify these records and blend them together, keeping track of course of the original sources, two or more of them; we have products whose metadata can be collected from tens of sources. We also perform inference: when possible we collect the original file, where "when possible" means when the URL is correctly provided or we can infer it, and the files we collect are PDFs, so we have around 12 million PDFs today, on top of which we run mining algorithms to detect missing links, most of the time missing links between these objects, especially between projects and products, or products and software, etc. On top of this graph we build and offer a number of applications. These applications should be interpreted as different views over the graph. In some cases we want to use the graph to monitor how open science is being performed in one community or another, or from the point of view of a funder, or of an organization. In other cases, instead, we want to offer discovery mechanisms to the scientists of a specific context, so they can view this graph from the perspective of their own science: the projects related to their science, the products related to their science, the links that concern and relate to these products, and so on. So basically the system is divided into five parts. Bottom left you see the aggregation, the whole aggregation system through which we collect the metadata records and the files when these are available. These metadata records are cleaned here, and then they are used to generate what we call a native graph: basically these records are unwrapped and a native graph is built out of them. We are collecting today roughly 500 million records and around a billion directed links; counted bidirectionally, it is about 480 million. So we build a graph out of that and, as you can see at the top, we can slice the graph: we take it from the different
entities' point of view, so we can take for example the publications, or the datasets, or the organizations, and we perform the deduplication. As the outcome of this action we merge the equivalent objects and build a deduplicated graph, and this deduplicated graph can finally be enriched with the results of mining, which are bottom right. The mining takes a copy of the deduplicated graph, populates the full-text cache where we run the algorithms, and as a result we obtain a number of updates to the graph, which we store into action sets. Action sets allow us to structure the way the inferred information is produced: we have an action set for links between publications, an action set for the abstracts that we have found, etc., and so we enrich the graph accordingly. Once we finalize the graph, so we build the enriched version, then we publish it, and we publish it via different backends. These backends range from linked open data, to a full-text index, to dumps, to an OAI-PMH endpoint. The index is then of course serving our applications, which are Connect, Explore, Monitor, etc., and can be used by others to build their own applications and services. All this is performed using several kinds of open source technology. We use Spark and Hadoop to produce this graph and manipulate it, but also standard technologies like MongoDB to store the metadata, and especially D-Net. D-Net is a framework, not mentioned here, that we have built internally to manage workflows. So all the workflows you have seen are about 80% performed automatically by the system; then of course you need humans to verify that in the end everything went fine, for example that the indexing procedure completed properly; it has to be double-checked by human operators. We run all this in a beta and a production installation, and in production we have what we call a public system, which is the place where we store and provide
the portals; the mining system, where, as the name suggests, we run our mining algorithms; and the data provision, which is where we keep the graph and build it, so we enrich it, we deduplicate it, and so on. I'm sorry about the double numbers here; you will have the slides. These were just to highlight the increase in resources over the last year, and the layout got somewhat garbled. We have around 40 people working in different locations on this system and on different aspects of it, ranging from the UIs, to the mining algorithms, to the cluster and the organization of the mining algorithms themselves, which is separate, so for example how to execute mining algorithms and plug my mining algorithms into our system, to the operation and data curation aspects, etc. So you can imagine how complex it is. So, the OpenAIRE research graph, the way we build it, I mentioned before: we collect through what we call PROVIDE, a system that allows us to validate and register the sources, collect the content, provide statistics about them, and so on. We enrich the graph thanks to end-user feedback: end users can come through our portals and add links to the graph to make it better, for example they can link products to projects, products to communities, etc. We do the deduplication and the mining as mentioned before, and in the end we publish via the applications. The OpenAIRE research graph is therefore at the centre. So we want to provide this open metadata collection; we want it to be open, that is key, very important. In some cases it can also be CC0; in some cases it cannot, because we collect from sources that are not CC0, like the Microsoft Academic Graph for example. We want it to be as complete as possible, meaning that we want all trusted data sources to be in it, trusted by the community of course, so sources that they consider reliable for them, for their science. We want it to be deduplicated; we want it to be transparent,
which means we want to track down where each piece of information comes from. So if we mine, for example, a subject, and we infer a subject out of a full text, we want to know that this subject was inferred and wasn't collected from a repository. And we also want to know the level of trust of every single piece of information, so we defined an internal framework that classifies trust of information from 0 to 1, and we require all our algorithms to produce a level of intended trust, and we assign it to every bit. So you can actually view the graph from different trust perspectives: you can see, for example, all objects and interlinked objects at a given level of trust, or below, or beyond, etc. We want it to be decentralized, and the idea here is that this graph today lives in OpenAIRE, but we want the information enriching the data sources out there to be stored in the data sources out there, so we provide tools for that to happen. We call it the broker service: through the broker service, the data sources that provide us content can actually collect back all the content that we have found that enriches their original records, and they can subscribe and be notified of different kinds of enrichments. This is actually very important for us, because we hope OpenAIRE will exist for ages and keep providing this service, but in any case we want this precious information that we found thanks to this participatory effort to be stored forever; that is very important. OK, this is just a glimpse of some of the 10,000 sources that we are collecting from. Consider that we go from GitHub and Software Heritage, to all the registries like GRID, re3data, OpenDOAR and ORCID, to the thematic repositories, to the catch-all repositories like Zenodo, B2SHARE, Dryad, Figshare and so on, to the research infrastructures, which are top right on this screen. OK, so, the harvesting and transformation workflows: the idea is that we have services that are organized into workflows; workflows are made of
sub-workflows; that is the framework that we built, and we split it into collection phases and transformation phases. For us it is actually very important to track down every single step in this process: to track down when a collection was performed, when it was done, whether it failed, which records were collected; and the same on the transformation side: we want to know which are the cleaned records, how many of them failed to clean, and so on. So for us it is very important to track down this process. It is key, for example, to defining the transformation mappings: we need to have the native sources under control to check all the vocabularies they use, which are highly heterogeneous, and to find and map them into our common model through the transformation, for example. So there are several things that we are doing here. We are moving from XML to a JSON framework; XML is an unfortunate heritage that we have, because it was the core language of OAI-PMH, and that is where we stood for a long time, so we are moving from XSLT, and from XML to JSON, at the same time. We need to monitor the data and quality expectations across sources and within sources, so again it is important for us to track provenance and to do something useful with it, for example deriving statistics, verifying that over time a data source is improving in quality or increasing in its number of records; because if at some point we realize that half of the records are missing, some errors may have occurred, and that is a problem for us, who are building an aggregation, so we need to keep things under control. We do a fine-grained classification of the resource products: when we collect from a repository, or a data source in general, we don't consider it to be a container of objects of a single class, like all publications, all datasets or all software; we consider them hybrid. So we do these classifications into our common model with a fine-grained methodology: we check the
original resource type, and we find each time a mapping into one of the meta-classes we have in OpenAIRE, publication, data, software and other research products, and into a possible subclass if it exists. Important for us is that we are collecting from 10,000 sources, but some of them we pre-aggregate, especially the largest ones, so we have two pre-processed sources on which we strongly rely: one is ScholExplorer and the other one is DOIBoost. ScholExplorer is a collection of article-dataset links that we collect from DataCite, Crossref and EMBL-EBI. This means that we are basically collecting everything that is a trusted link between datasets and articles out there; we are collecting today overall 480 million links, bidirectional, as I mentioned before. We do the same with respect to Crossref. Crossref is a key source, and we want to have its richest version, so we build what is called DOIBoost, which is basically an intersection of Crossref, Unpaywall, Microsoft Academic Graph and ORCID. As a result of this, we have 85 million publication records, complete with all possible ORCID iDs, all possible open access versions, all possible links to affiliations, abstracts, subjects, and so on. Both collections are published as dumps on Zenodo, so you can download them; you can find them there. For us it is actually important to build tools to generate and maintain these dumps over time, out of the graph or out of our aggregation system. This is something we are working on, and maybe a valuable source of ideas for you: versions should be kept and maintained over time, should be produced as incrementally as possible, and so on. For the mining we use HDFS and Spark; we have around 13 million full texts, 12 to 13 million depending on the collections that are included, and we use a Java and Python framework here, so you are very welcome to write these kinds of algorithms, taking into account that you will access in parallel the full texts related to the articles, and you will be able to apply algorithms on top of that
In these two programming languages, here you have a list of ideas that you can take inspiration from. Every time you find one of those grey boxes, basically, you have a few hints — and then, of course, try to build your own concept on top of them.

Important to know is that we apply context propagation in the graph: we exploit the links to propagate information from object to object. For example — if you take a look at the top — if I know that a product was funded by a project, and I know that the product was supplemented by another product (say, a publication supplemented by a dataset), then I can easily propagate the link to the project to the second product. This means that when I do my research impact analysis from the project point of view, I will visualize a product I wasn't aware of — the research data, in this case — and this will give me extra content. Of course, I would like the data source that contains this second product to include the link to the project itself; that, as I mentioned before, is another important aspect of our mission. These are examples of propagation: country and community, for instance, can be propagated across different products.

For the deduplication we have two articles — you can find the references here — and there are several actions we may take, like improving results by adding context, context that can be inferred from the web, for example; so any ideas in this direction are welcome.

Today, if you go to explore.openaire.eu, you will find the current production system, which includes only the open access subset of the graph that I mentioned. If you instead go to beta.
explore.openaire.eu, you will find the full graph, which will be open soon for public consultation — these are the numbers inside — so any feedback will be welcome.

How to access the services: we have developed develop.openaire.eu, and from there you will be able to download the data or query the graph according to different protocols. We are in the process of publishing the whole dump of the beta graph, so in the next month it should be available. These are quite standard ways of accessing services, so you should find it easy; please refer to us. DOIBoost is what I mentioned before — you can find it in Zenodo online — and the same for ScholeXplorer. These slides are here for your benefit. The graph will be opened for consultation in November and released around the end of December. You will find Trello as the tool to provide us with feedback; we will open it probably next week — this week, next week.

The problem of identifying errors and inconsistencies in the graph is actually key. With Trello, of course, we are collecting feedback from users, but one of our ideas would also be that of analyzing the graph and automatically identifying issues, and then prompting the users with questions like "is this really an issue or not?". So if you have in mind strategies to do that in an automated way — any strategy or technique that can be used in this context is more than welcome.

Now, Natalia already mentioned our standalone services, and I'll go quickly through them. ScholeXplorer is certainly one of those. We are collecting links using a standard called Scholix, which we have devised together with other partners worldwide, including DataCite, Elsevier and so on. Scholix is basically a simple data model and exchange format that allows expressing links between publications and datasets, datasets and datasets, and so on. So today we are collecting a huge amount of links from the sources that you see below.
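Back to the context propagation mentioned a little earlier — a product funded by a project and supplemented by a second product lets the second product inherit the project link. As a toy sketch (the real processing runs on Spark over the full graph, not on Python dicts):

```python
def propagate_funding(funded_by, supplemented_by):
    """Infer project links along supplement relations: if product P is
    funded by project J and P is supplemented by product Q, then Q
    inherits an (inferred) link to J."""
    inferred = {}
    for product, projects in funded_by.items():
        for supplement in supplemented_by.get(product, []):
            # only propagate links the supplement does not already carry
            new_links = set(projects) - set(funded_by.get(supplement, []))
            if new_links:
                inferred.setdefault(supplement, set()).update(new_links)
    return inferred
```

The same shape works for the country and community propagation the speaker mentions: any attribute attached to one side of a trusted relation can be pushed to the other side as an inferred, provenance-tracked statement.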
Others, such as data centers that are not included in DataCite but are Scholix-compliant, also contribute. And we're offering the links to a number of consumers via APIs — the APIs are open; our main consumer is in fact Scopus — and of course there is also the web user interface, but that's really less useful; it's more for checking that something exists in there. We go up to an average of 14 million hits per month, so it's pretty big.

Zenodo was mentioned already. It's actually a catch-all repository, so you can store any kind of product in there; but the cool thing about Zenodo is its APIs. Its APIs are open, so as long as you have a token — you have a user — you can make sure that your service is bridged to Zenodo and publishes products on demand: they get a DOI, you can provide the metadata, and so on. Clever applications that do actions such as publishing on behalf of the scientist can be built thanks to Zenodo, so it's very important for you to know it.

Argos I will skip — you can find information on the internet — but anyway, the machine-actionable data management planning there is powered by an open source software called OpenDMP that we developed, and the concept is that of machine-actionable data management plans; if you want more information you can contact me. Amnesia is a service for the anonymization of sensitive data, which we are using in several contexts; you can download it and use it on your site.

Now let's go to the high-level view. We have the OpenAIRE research graph and, as I mentioned before, we provide what we call dashboards on top: Monitor, Connect, and then Explore. Monitor is really about using the graph to monitor research impact trends — for example open science trends, funding trends, institutional trends and so on. Connect is instead more about sharing: this is where a scientific community finds the tools to discover and access all the scientific outcomes in a specific discipline or domain. So here, on the Monitor side, what is very welcome are tools.
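Since Zenodo's open REST API came up: a deposition is created by POSTing JSON metadata with an access token. The sketch below builds such a payload and posts it with the standard library. The endpoint and metadata fields (`title`, `upload_type`, `description`, `creators`) follow Zenodo's published API documentation, but treat the details as illustrative and check the current docs before relying on them:

```python
import json
from urllib import request

API = "https://zenodo.org/api/deposit/depositions"  # use sandbox.zenodo.org for testing

def deposit_payload(title, creators, description, upload_type="dataset"):
    """Build the JSON metadata body Zenodo expects for a new deposition."""
    return {"metadata": {
        "title": title,
        "upload_type": upload_type,
        "description": description,
        "creators": [{"name": n} for n in creators],
    }}

def create_deposition(token, payload):
    """POST a new deposition; returns the parsed JSON response (including
    the deposition id and the DOI Zenodo reserves for it)."""
    req = request.Request(
        f"{API}?access_token={token}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

This is the bridge pattern the speaker describes: a service holding a user's token can publish products on the scientist's behalf.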
Tools to better perform monitoring operations, or to use the graph, for example, to recommend something to the research administrators — project coordinators and so on. On the Connect side, it's actually very important to use the graph, for example, to improve the quality — the precision and recall — of searches, or to devise tools to search in a way that is, let's say, intent-driven or discipline-specific: semantic search concepts and so on.

Provide, the first of these dashboards, is the one we use to register services, allow their validation, and send notifications to the original sources — like "we found an enrichment for your data", and so on. One of the most important services we provide here is the broker. As I mentioned before, the idea is simple: since we are collecting metadata records from sources and putting them in a graph, in the context of the graph we can interlink these original records, clean them, or enrich them thanks to our tooling. Therefore it's pretty easy for us — since we keep provenance for every single record — to track down what's new with respect to the record, and who provided it. These new bits can be sent back to the original sources. So original sources can subscribe to the OpenAIRE broker and say: please give me all the new open access URLs you find for my records; or give me the ORCID iDs of the authors that I have; or give me the DOIs of my records, if you find one; and so on.

The usage statistics service, instead — if you're familiar with these kinds of aspects — is the implementation of a service that aggregates usage statistics about individual objects, publications in this case, across repositories. Repositories participating in this framework and scheme are capable of tracking events regarding, for example, the download or the view of a given publication's record, and they send this information — these events — to a central aggregator of events, the usage statistics service, which can aggregate this information.
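The aggregation the usage statistics service performs — summing view and download events for the same publication across repositories, while keeping the per-source breakdown — is essentially this (a toy model; the real service follows COUNTER-style rules and the event schema here is invented for the example):

```python
from collections import defaultdict

def aggregate_usage(events):
    """Sum usage events per (publication, metric) across repositories,
    and also keep the per-source breakdown."""
    totals = defaultdict(int)      # (pid, metric) -> total count
    by_source = defaultdict(int)   # (pid, metric, repo) -> count
    for e in events:
        key = (e["pid"], e["metric"])
        totals[key] += e["count"]
        by_source[key + (e["repo"],)] += e["count"]
    return dict(totals), dict(by_source)
```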
It provides a unified view, so you can actually see how many times an object has been accessed across different sources — in one source, or the sum over all of them. These numbers are typically used as indicators of quality of given articles or, for example, of the different sources: how much they have been used compared to others.

Let's go back to the broker. Topics are produced for specific data sources: whenever OpenAIRE produces an event, it's about a target repository and a specific object of that repository. The event is assigned a topic, it gives a message — for example, "I have enriched this record with an abstract" — and it includes a level of trust. So the original source can subscribe to a kind of topic and say: give me all events that have a given level of trust, for example, and whose message respects certain criteria. These are the kinds of configurations we can provide. We have three kinds of events: enrichments, additions and alerts. We are only producing enrichments today. Enrichments, as I mentioned before, are ways to complete the original records with information that wasn't present in the beginning. Additions are not yet produced, but of course ways of doing it are welcome: we want to tell a data source when we have found a record that should be in that data source but is not present today. There are ways of doing this — deducing it from the authors, if you can identify a strong relationship between an author and a repository (like its repository of reference), if you can infer it from the graph itself, or from a statement you can find somewhere online, for example from the affiliations, etc. Alerts are more of the kind "this record has a mistake", and there are several ways of producing them; we have some on our roadmap. For example, through the OAI-PMH data collection, if we find an XML error we can point the original repository to that with a notification; or a user may come and tell us.
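The subscription side just described — a source asks for events on a topic above a trust threshold — reduces to a simple predicate. The topic names and event shape below are invented for the example, not the broker's real schema:

```python
def matches(subscription, event):
    """A subscriber asks for a topic prefix and a minimum trust level."""
    return (event["topic"].startswith(subscription["topic_prefix"])
            and event["trust"] >= subscription["min_trust"])

def deliver(subscriptions, events):
    """Route each event to every subscription whose criteria it satisfies."""
    inbox = {s["id"]: [] for s in subscriptions}
    for e in events:
        for s in subscriptions:
            if matches(s, e):
                inbox[s["id"]].append(e)
    return inbox
```

A repository manager subscribing to, say, all `ENRICH/MISSING/` topics with trust at least 0.8 would then only ever see high-confidence enrichment suggestions for their own records.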
If a given link is wrong, or a given field is wrong, then we can of course alert the original source. This is on the roadmap; we haven't produced it yet.

This is just a page that exemplifies how an original record that we collected — from La Referencia, in this case, which is the Latin American aggregator — is enriched via several pieces of information, ranging from where it can be downloaded, to the funding, the subjects, related research results and so on. These are just examples.

For metrics we use SUSHI as a standard — open source software, in fact — to collect the information; we had to change a few aspects there, so more information can be found on the OpenAIRE website, and you can ask us directly. But anyway, these are the key elements, and you can find information on the website on how to hook a data source into our usage statistics framework; here are the links.

OK, finally, just a few words on what we have built as Connect — sorry, I'm a bit out of time, but it's going to take another four minutes. Out of the graph, we can view the graph through the eyes of a given community, with its scientific knowledge. So we built this service, called the Research Community Dashboard, through which, on demand, we can produce what we call a community gateway. A community gateway looks like the application that you see top left: it is basically a view over the graph, and this view can be configured and fine-tuned by the community manager, who is an authorized user. She can configure the criteria that bring the different objects of the graph into the gateway. Several community gateways have been generated to date, and others may of course be generated in the future. The idea is that you can specify the subjects of pertinence, so objects with a given subject can belong to the gateway; the provenance, so whether the object comes from a given source, or whether the object comes from a given
community — that is another criterion, like the collections often defined by scientists and communities; the projects relative to a community, so that all objects related to those projects will be included in the gateway. And we also propagate the notion of belonging to a community via the relationships, as I mentioned before. Here are two examples. This is, again, to highlight the fact that you can enhance the graph, or suggest ways to make these criteria better — let's say more adaptable to the communities; your criteria could work via ORCIDs, or others.

Monitor, instead, is a way to look at the graph from the perspective I mentioned: the funding impact point of view, the ability to attract funds, forms of open science impact — which can range over FAIRness: FAIRness of software, FAIRness of data, FAIRness of whatever you want to be there. So any suggestions in this direction are very welcome: new measures, new indicators that can be calculated, or new ways to enrich the monitoring tools we have, to better capture the quality, the trends, the ability to stick to open access mandates, for example, in the context of OpenAIRE. Here are just examples, as I mentioned before, and you can find more suggestions here.

Finally, this is the discovery portal. As you can see — as I mentioned before — this is the production version, but you can find the beta service as well, with the extra content that I just mentioned, which will be open for consultation. So thank you; I'm happy to reply to your questions now and in the next days. If you want to send me an email, if you want more information on any of the services I presented, you're welcome — as you can see there are so many that it's really hard to give an overview in 40 minutes.

OK, thank you very much, Paolo, for this detailed presentation. There are already some questions from the participants, so I can transfer them to you. The first one: have you got registries for all kinds of objects — product, funder, etc.? Yes, we include
— well, of course — the registries that we believe are interesting for us: in this case ORCID, for sure; grid.ac for organizations; and the registries of the data sources, which for us are very important: OpenDOAR and re3data — because we don't want to build yet another one unless it's really necessary, and this is key. We have an idea of building a new registry for data sources if the current registries, for example, won't be able to stand up to our requirements. But yes, in general, these are the registries we are considering today — ORCID, of course.

OK, thank you. The second one: how do you support consistency of the data in the registries? Are we relying on the registries, so that they should be in charge of consistency? When we collect from different registries — and this is the case in several situations — we deduplicate, so we build internal, let's say, bridges between the different registries. This is of course an issue, so in some cases we are taking specific actions. It's an issue because every time we run a deduplication process we introduce a level of doubt about what we are doing, and this should itself be curated — it's a sort of extra registry on top — so we can't afford that for everything. We are in the process today of developing one for organizations, because we really need to have our organization information stable and stabilized in the context of OpenAIRE, if we want to provide, for example, statistics about them. Every funder has its own identifiers for the different organizations, and in some cases there is no identifier at all — OpenDOAR and re3data, for instance, have organizations but don't provide specific IDs. So whenever we find a result of the deduplication that is useful to us, we notify it internally, and we have curators who make sure that this is really the case; if it is, it is stabilized.
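A toy version of the organization-bridging problem described above — different registries naming the same organization differently — using naive name normalization for blocking. Real deduplication in OpenAIRE is far more sophisticated, and the stop-word list here is purely illustrative:

```python
import re

def norm(name):
    """Normalize an organization name for blocking/matching."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    drop = {"university", "of", "the", "institute"}  # toy stop-word list
    return " ".join(t for t in name.split() if t not in drop)

def dedup(orgs):
    """Group organization records whose normalized names coincide;
    these candidate groups would then go to human curators."""
    groups = {}
    for org in orgs:
        groups.setdefault(norm(org["name"]), []).append(org["id"])
    return {k: v for k, v in groups.items() if len(v) > 1}
```

The point of the sketch is the workflow the speaker describes: the machine only proposes candidate merges; a curator confirms them before a link becomes authoritative.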
It becomes authoritative, so to speak: the way these different entities are linked together creates a more stable environment. This is under implementation — under implementation now — and will be released hopefully at the beginning of next year. What about the registry of organizations, they are asking — that's what I said, OK; we consider this.

And from Emily: is there more information on the trust level? We make sure that every time we collect information, or produce information, we assign a level of trust. This is available in the records: if you download our records, every record will have this information inside, where needed. How we assign it depends on the quality of the original data source, on whether it's a user providing it or an algorithm providing it, and so on. I'm not sure what you mean by "more information", but that's what we have. If you go, for example, to the website and open any of the products — if a product has a link to another one that was inferred, you will find on the right a bar saying how trusted this information is, like 89%, 90%, etc.

OK, so you can also find the visualization of the trust. Thank you very much. I would like to ask the participants if they have other questions, and also the technical experts if they have additional comments on the presentations. OK, I guess that this is a no. Can you see my screen? The agenda is on it. Yes? OK, thank you. So maybe I will proceed with some points related to the OpenAIRE-Advance open call for tenders, and after the end of this small presentation, if there are other questions, we can provide the participants with the respective answers.

OK, so regarding the framework of the updated OpenAIRE-Advance open call for tenders: it is still split into three phases. Phase one is about solution design, phase two is about prototyping, and phase three is about the original development and testing of a limited set of first products. We decided to keep this phased approach
because it allows successful contractors to improve their offers for the next phase, based on lessons learned and on feedback from the procurer in the previous phases; and using a phased approach, with gradually growing contract sizes per phase, also makes it easier for smaller companies to participate in the open call, and enables SMEs to grow their business step by step with each phase. So we believe it is better to keep this scheme.

Second point: the maximum budget per tender will be 60k, and only one topic can be chosen. That means that, among the three topics that were presented, each participant will have to choose the one that best suits their activities. This time there is a threshold of 15 points, out of a highest score of 30; that means that an applicant with a score of less than 15 — meaning 14, 13, 12, etc. — is going to be rejected. There is also a flat rate for each phase: 10% for phase 1, 40% for phase 2. English is the language of the tender, both for communication among the procurer, the participants and the mentors, and for all submissions — deliverables and milestones must be in English.

Of course, as I said before, the open call is expected to be published mid-November on the official website of the project. There you will also find, during the next days, the webinar — the open market consultation that we delivered today — and the presentations of our technical experts. And of course, after the publication of the open call, you will have some time to address your questions to openair@coralia.org if you need clarifications or have any other remarks; we will be happy to provide you with answers. Since the open call is expected to be online for 30 days, you will have time to send your questions, and we will provide you with clarifications.

So these are the points from my side. I would like to ask you once more if there are other questions, or if you need something, either from me or from the presenters. There is one question: can multiple independent
proposals be sent by the same company? Well, this is an issue that we have already experienced. And: is a contractor from a non-EU country eligible for the tender? By saying "EU country" you mean the European Union, I believe. So, regarding the second question: we accept proposals from EU member countries and also from associated countries, according to the Horizon 2020 list of associated countries. And regarding the first question: this is an issue we are still trying to decide, so I believe you will have our official answer upon the publication of the open call.

OK, thank you very much for your participation. If there is no other comment, then I will proceed to the end of the meeting. There is one more: is there a possibility to have joint proposals — say, by teaming up with OpenAIRE partners? I believe you're referring to the conflict of interest; maybe this is the issue. So, to answer your question: we don't want a conflict of interest. That means that you are able to submit joint proposals; however, we still think that we don't want participants that have already received funding from previous OpenAIRE calls. We have to say, though, that the official rules are still under, let's say, finalization, so upon the publication of the official call you will see the final decision of the OpenAIRE-Advance team; however, we are proceeding along this path at the moment.

OK, thank you. Thank you very much. You can still address your questions to openair@coralia.org, and thank you again for your participation today. Thank you.