And thank you, Pauli and Kimo, for your kind invitation. So my talk today is about aligning European repositories to the OpenAIRE guidelines. I wonder if there are any repository managers here, either of institutional, thematic, or data repositories. OK, so we are not alone.

So why repositories? Here's a statement from a working group in COAR, Next Generation Repositories, saying that repositories are the foundation for a distributed, globally networked infrastructure. And most importantly, they need to be collectively managed by the scholarly community, so following the principles of open science.

I outlined my presentation so that I give a brief introduction to OpenAIRE, then come to the OpenAIRE interoperability guidelines. And last but not least, I will talk about the OpenAIRE services and the benefits they should provide to repositories, for instance.

So OpenAIRE wouldn't be possible without the open access and open science policies and mandates of the European Commission. The different phases of OpenAIRE, which started in 2009, are strongly related to measures of the European Commission, starting with the Open Access Pilot for publications in Framework Programme 7. And this was the moment when the OpenAIRE project also started, with the mission to collect publications with an acknowledgment statement saying "this is funded in FP7" and to report them to the European Commission. And you can also see that the different stages always started with a pilot, and then it was made mandatory for all publications. The same goes for research data: starting in Horizon 2020, it is now the default to make research data of funded H2020 projects available. And also, the launch of the European Open Science Cloud is a clear consequence of these measures of the European Commission. And despite all of the challenges, we look quite optimistically to the future with the next research programme, Horizon Europe, where open science is declared the modus operandi.

So what is OpenAIRE?
It's quite a large project with about 50 partners across Europe. And we are focused mainly on three aspects which are connected, where one area guides and drives the other. So there is the technical infrastructure, where we have an aggregation service with which we collect different kinds of research output, mainly but not only from repositories. Then we make it visible via the portal. We provide dashboards for different stakeholders, like funders and institutions, but also content providers. And we are offering a machine interface so that third-party services or innovators can build on the content that OpenAIRE provides.

Of course, and it was also mentioned by the first speaker today, the policy landscape in Europe with regard to open access and open science is quite complex and challenging. And one of the aims of OpenAIRE was and is to contribute to the alignment of those policies, but also to work in the area of studies and to provide recommendations on overcoming legal issues, intellectual property rights for instance, especially with regard to research data. But also in relation to our aggregation service, we need some kind of agreement with content providers on how they should provide their metadata to OpenAIRE. Therefore, we issued guidelines.

And most important is the strong human network in OpenAIRE, which consists of the 34 National Open Access Desks, among them the NOAD from Finland. And they are working on a national level on fostering open science in their communities, and also providing support and training measures like workshops or webinars, and publishing fact sheets and guides on different aspects of open science.

So in the past, there were sometimes questions about who is using OpenAIRE services. So who are our stakeholders? And here we identified four groups. So we have the funders and institutions which are interested in monitoring their research output.
We have the content providers and research infrastructures, whose content we want to collect and make visible. But we also want to provide value-added services for this type of stakeholder. We have innovators that can access the scholarly information graph from OpenAIRE to build more sophisticated services. And of course, we have the very important group of researchers and research communities, to whom we offer our discovery services.

On the right side, I've chosen just two collaboration projects and initiatives. Of course, OpenAIRE is connected to many more projects and initiatives. But I think these two are also related to repositories. There's a collaboration with EOSC-hub which addresses three pillars: service integration, to provide a seamless information flow of research results between OpenAIRE and the federated infrastructures of EOSC-hub; working together on communication activities and producing training materials; and, of course, aligning the different strategies of these projects. And I think this is an opportunity to make all the content which we have in European repositories an entry point to the EOSC portal.

But also, OpenAIRE collaborates with the Confederation of Open Access Repositories, which is a global lobbying initiative for open access repositories. And here it is important to overcome the issue of data silos, of seeing repositories as data silos. It's important to make the content of repositories visible globally and connected with other scholarly output. And this is also an activity within COAR, in the Next Generation Repositories working group, which analyzed several user behaviors and promotes some technical solutions to upgrade the technology of repositories to a more web-compliant standard.

Just a few OpenAIRE numbers. In our production environment, you can discover 25 million publications, of which the majority is open access.
We have collected and indexed 1 million research data sets. This content is available from about 14,000 content providers. We work together with 18 funders in Europe and worldwide in order to identify the research output from the projects funded by these funders. And so we identify, at the moment, 20.7 million grants.

This is a view of the quite complex infrastructure of OpenAIRE. So we collect all our information from authoritative databases or registries, like funder databases. We make use of directories of open access journals, directories of open access repositories, or re3data for data repositories to identify the relevant data sources. We gather content in the form of metadata and, if possible, also the full-text files, and we would also like to track usage data or usage events from these sources. The content then goes through a pipeline of different services that normalize it, and it is stored in the information database of OpenAIRE as an information graph of different entities. This provides the main features: enriching the aggregated content, making it discoverable, but also offering a feedback chain to the users.

The content is available for monitoring, for instance so that funders can see the research outputs of their projects, but also for project beneficiaries to report their publications to their funders. The APIs are available for developers, and a couple of text and data mining services provide more sophisticated analyses that support evaluations and allow assessing impact or research trends.

There are a few recently released services, like the Open Access Broker Service, which I will present at the end of my presentation, and ScholeXplorer, which allows exploring links between publications and data sets and which is actually also an outcome of an RDA working group.

So I've shown you that we collect content from various types of data sources.
And this has challenges, of course, with regard to interoperability. We collect different groups of research results, which are publications, data sets, software, and so-called other research products. And we identified that all these groups of research results can exist at the same time in each of these kinds of data sources, like institutional repositories, journals, data repositories, software repositories, and so on. So we need some kind of agreement. And so we started issuing guidelines. The first addressed textual publications in institutional and thematic repositories, extended then by guidelines for data archives, for software repositories, and for other research products. And most recently, we also updated the guidelines for CRIS platforms, which now allow us to aggregate content from current research information systems, so we can deal with rich research information in OpenAIRE.

A complement to these guidelines is the OpenAIRE content acquisition policy. And this is needed for different reasons. One reason is that we extended our scope to collect not only open access publications or publications linked to projects, but all kinds of research output regardless of their access status. And the content acquisition policy addresses different aspects. One is that we need an agreement with content providers that allows OpenAIRE to reuse metadata and other files, for instance for text and data mining, but also an agreement on a certain level of content quality, which is then refined in the OpenAIRE guidelines. We also want to serve different research communities, so that we can link the content in OpenAIRE with domain-specific research results. We also want to enable reproducibility, and for this aim it is important that we can link publications with other research output. And we want to provide monitoring, for instance for funders.
In December last year, we issued the most recent version of these guidelines for literature repository managers. It's still based on formats that are aimed at descriptive metadata, like Dublin Core and the DataCite metadata schema. We consider here that in many repositories we find both kinds of output, textual and data publications. So we need one format which we can use to describe different kinds of outputs. We make use of controlled vocabularies, which are also aligned with the other OpenAIRE guidelines.

The goals of these metadata guidelines are the following. We want the content to be discoverable and citable. We want the content to be accessible and available for reuse. We want to contextualize the research output with related projects and other research artifacts, for instance data sets. We want to support interoperability, which requires stable, persistent identifiers for the entities and the use of controlled vocabularies. We want to allow reporting, which requires references to funding information in the metadata. And we want to support text and data mining, which requires certain license conditions and also information on where such files, for instance the full text, are located.

I don't want to talk much about the guidelines for CRIS managers, because we will have another talk in the afternoon about this topic and some earlier results. But I want to mention that from the OpenAIRE perspective, we see that a CRIS can be used in different cases: as an institutional CRIS, which, like a repository, exposes the research output; as a funder CRIS, which OpenAIRE could use as an authoritative database about funding programmes and projects; and as a national CRIS, which acts as the national aggregator of research information.

On this slide I show the current compatibility of the data sources in OpenAIRE. On the left side, it's the compatibility level for repository data sources, which in sum are about 980.
And on the right side, it's the compatibility level for journal data sources. What we can see here is that the majority of data sources still comply only at the very basic level, so they expose descriptive metadata, where we often miss some more information, like links to other research output or links to funding information.

Here's a list of a few challenges we have faced in our daily work when we aggregate content from repositories: the problem of missing values, and the problem that we are missing links and identifiers. Still quite often, controlled vocabularies are not used, or not in the way we define them in the OpenAIRE guidelines, or the metadata descriptions are not comprehensive enough, so they provide just the mandatory values, which limits the possibilities for discovery and reuse.

So what are the OpenAIRE services which repositories could benefit from? I present here a few of the OpenAIRE services, but I want to limit myself to the last three, which are the registration and validation service, the Open Access Broker Service for content enrichment, and the usage statistics service. You can also find all these services in the EOSC service catalogue, so they are already integrated there.

For repository managers, OpenAIRE offers a content provider dashboard, where a repository manager can register with an account and register and validate their data source. They can subscribe to the Open Access Broker Service to receive notifications on specific metadata records, or they can also join the usage statistics service. The registration is quite straightforward: you choose the type of data source and then go through a few steps to register the interface of the repository. We always try to reuse information from existing directories like OpenDOAR or re3data.

Features for repository managers are that they can view the validation history.
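The aggregation challenges named above (missing values, missing links and identifiers, terms outside a controlled vocabulary) are the kind of thing a validation step can flag. The following is a minimal Python sketch under stated assumptions: a record is taken to be a flat dictionary, and the field names, the mandatory-field list, and the function `check_record` are hypothetical. It does not reproduce the actual OpenAIRE validator or the guideline schema. The access-rights terms follow the labels of the COAR access rights vocabulary.

```python
from typing import Dict, List

# Hypothetical field names, chosen for illustration only.
MANDATORY_FIELDS = ("title", "creator", "date", "identifier", "access_rights")

# Labels of the COAR access rights vocabulary.
ACCESS_RIGHTS_TERMS = {
    "open access",
    "embargoed access",
    "restricted access",
    "metadata only access",
}


def check_record(record: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems found in one metadata record."""
    problems = []

    # Missing values: a mandatory field is absent or empty.
    for name in MANDATORY_FIELDS:
        if not record.get(name):
            problems.append(f"missing value: {name}")

    # Controlled vocabulary: the term is present but not one of the agreed labels.
    rights = record.get("access_rights")
    if rights and rights not in ACCESS_RIGHTS_TERMS:
        problems.append(f"uncontrolled term: access_rights={rights!r}")

    # Missing links: no funding reference, no related research output.
    if not record.get("funding_reference"):
        problems.append("no link to funding information")
    if not record.get("related_identifiers"):
        problems.append("no links to other research output")

    return problems


# Example record (invented): empty identifier, free-text access rights.
example = {
    "title": "A paper",
    "creator": "Doe, J.",
    "date": "2019",
    "identifier": "",
    "access_rights": "free",
}

for problem in check_record(example):
    print(problem)
```

A record that passes all checks returns an empty list, so the same function could be run at registration time and again on every harvest.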
We are also preparing to provide continuous validation, so not only at the time of registration of the data source, but also each time the repository is harvested or aggregated, which can then be seen in the collection monitor.

The broker service is an opportunity to complete or provide additional information about metadata records in a repository, about open access for instance. And this addresses different aspects, like completing information about persistent identifiers. Then there is the problem, for instance, that a publication in a repository is not available as open access, but an open access version might be available in another repository or journal, so this information can then be added. There is also enrichment of project information, of classifications or subjects, or of abstracts.

The usage statistics service is an opportunity where OpenAIRE tracks usage events from the repositories, views and downloads, and then calculates COUNTER-compliant usage statistics. The advantage in OpenAIRE here is that, thanks to our deduplication service, we can accumulate the usage of a publication which is stored in different sources, and in this way provide a more comprehensive view of the usage of a publication, for instance. And this is how it looks in the dashboard and in the portal, so we can provide statistics at the level of the whole repository or as a view of a specific item, of a specific publication, for instance. And that's it.
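The idea of accumulating usage across deduplicated copies of a publication can be illustrated with a small Python sketch. It assumes that deduplication has already produced a mapping from each source record to one representative publication; the function `accumulate_usage`, the record identifiers, and the event format are all hypothetical. It only sums raw events and does not implement the filtering rules that COUNTER-compliant statistics require.

```python
from collections import defaultdict

# Maps an event kind to the counter it increments.
COUNTER_KEYS = {"view": "views", "download": "downloads"}


def accumulate_usage(events, duplicate_of):
    """Sum views and downloads per publication.

    events: iterable of (record_id, kind) pairs, kind being "view" or "download".
    duplicate_of: dict mapping a source record id to the id of the
        representative publication it was merged into (assumed input).
    """
    totals = defaultdict(lambda: {"views": 0, "downloads": 0})
    for record_id, kind in events:
        key = COUNTER_KEYS.get(kind)
        if key is None:
            continue  # ignore event kinds this sketch does not know
        # Records without a duplicate group count under their own id.
        publication = duplicate_of.get(record_id, record_id)
        totals[publication][key] += 1
    return dict(totals)


# Invented example: the same publication held in two repositories.
events = [
    ("repoA:1", "view"),
    ("repoA:1", "download"),
    ("repoB:9", "view"),
    ("journal:5", "view"),
]
duplicate_of = {"repoA:1": "pub-1", "repoB:9": "pub-1"}

usage = accumulate_usage(events, duplicate_of)
print(usage["pub-1"])
```

Keeping the per-record events and the duplicate mapping separate means the same events can still be reported per repository, while the merged view gives the publication-level total.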