Okay, my name is Natalia Manola, for those of you who don't know me. I'm from the University of Athens, Department of Informatics, and I'm the project manager of OpenAIRE. I was supposed to give this presentation with Paolo Manghi from CNR-ISTI, who was not able to attend. So what I'm going to do is present OpenAIRE. We've heard a lot about policies, we've heard a lot about behaviors, and we've heard a lot about open science. OpenAIRE is a data infrastructure, a research infrastructure, that is trying to implement and develop many of the things that have been talked about today.

Just to give a brief overview for those who don't know: OpenAIRE is a policy-driven project. It's a horizontal infrastructure, so it's not discipline-specific; it's trying to cover many disciplines. The first phase started in 2009 and ended last December. It was a project to implement the FP7 pilot and the ERC guidelines as we all know them. And now we're in OpenAIREplus, and the first year has already gone. What we're doing is supporting open access in all of Europe.
That means going beyond FP7 to include all European funders, and we provide links to data. What we have seen so far, in the three and a half years of operation of OpenAIRE, is that it's becoming a point of reference for open access in Europe. And as we have worked on this, we have realized that OpenAIRE has the potential to be the European scientific information infrastructure, meaning that we can link everything together, all things related to scholarly communication and research.

We've heard about models today, about MedOANet saying that there were some results from having a central model, or that in Spain and in Portugal (oops, sorry, I still can't see) there were different kinds of models. OpenAIRE is a European infrastructure, and we rely on a participatory, distributed design, which means that we are building on existing efforts, be they a central repository in a country, a system of repositories, individual repositories, or thematic repositories.

This participatory design for OpenAIRE has two branches, the human and the technical. I'll go very briefly over the human, because this presentation was about the services that we provide and the internal architecture. As we all know, research is a global effort, but researchers act locally. So to support this open access pilot and beyond, we need to act at the local level. What we have done in OpenAIRE is that we have 32 National Open Access Desks, and many of you are here, which deal with the diverse national research environments, mainly in the publication domain. Now we're moving into the data domain and how institutional repositories can move into this. We have open access advocacy through targeted activities, many webinars, and a lot of presence in European forums. So what actually is this human
infrastructure, which is a very important part of OpenAIRE? It is a network of people disseminating, advocating, implementing, and monitoring.

The technical infrastructure, which I'm going to go over in the next slides, is an infrastructure that is building on existing efforts, meaning repositories, publication repositories, thematic repositories, other registries, open access journals, funders, whatever you can name that is linked to the research world. We have had a system operational since 2010. We are proud to say that we have had zero unscheduled downtimes, and we are involved in collaboration with many providers.

So let me go briefly over the architecture and data management. This is at a very high level. OpenAIRE and OpenAIREplus are there to support the new publication. And what is the new publication? Alitha called it the enhanced publication; others call it the enriched publication. The publication as we know it has changed. What we call it is information in context, because there is a research question, somebody needs to get some results and output related to that research question, and there are many things related to it. So, I don't have a pointer.
I do have a pointer. So the publication is the traditional one, and then you have many things hanging around it, like provenance, license, any kind of metadata that defines the publication and its quality. These are linked to the research data, which again has many branches. This could be linked to software; this could be linked to any kind of digital object that is related to the publication. And one of the main interests in OpenAIRE is that it's linked to funding, because researchers get funds from a funding organization, and somehow this whole process needs to be evaluated.

So, the OpenAIRE data model. We've gone from the DRIVER projects, where we had the flat DC model, to a CERIF-compliant model, in which we have many entities. We have research results and licenses, related to publications and research data. We are looking to expand in the future, maybe to learning objects, to patents, to software; we'll see. It's about people, it's about research organizations, funders, and data sources. These data sources could be ones that give us the publications or the data, or they could be registries that give us authoritative data related to them.

Just to give you a very brief overview of what we're doing. As I said, OpenAIRE is building on existing efforts, and these are the systems that are out there, and people are putting effort into building them. We have about six and a half million open access publications from validated repositories through our DRIVER projects. We get from them the metadata plus the usage data. We have the EC funding: we collaborate with the European Commission to get the FP7 programmes. We are expanding to national funding, so other funders that want to participate in this effort. We have CRIS systems, in universities or national CRIS systems, which give us information about the research that is being
carried out. We are allied with many registries, like researcher IDs, OpenDOAR, or ROAR. And then on the other side we have the data repositories and data journals. We collaborate with DataCite and now with EUDAT. So we are trying to harvest all this metadata, and sometimes the data itself. And then we have the processes there: we link, classify, deduplicate, and clean the data. One of the things that was mentioned before was the quality of the data. In order for OpenAIRE to be able to provide services that are meaningful to researchers and other users, quality is one of our flags, and we try our best to do it.

On top, we have many services: depositing into the Orphan Repository, which I'm going to talk about in a few minutes and which is changing name and face; visualizing and managing enhanced publications, that is, how people can link publications, data, and funding together; and the usual search and browse. And then we are moving to advanced services, where people can come and tell us about the metadata, to improve the quality of what we have harvested. And of course statistics: since this is moving towards a scientific information system, statistics is a very important thing.

All of this you can find in our portal. But one of the main aspects of OpenAIRE is to provide all this data to third-party providers, so they can build applications on top. How do we do that? Michael is going to be talking after me about guidelines. What we try to do with guidelines for different data providers is to enforce, or promote, standards. How do we make that work?
With standards. In the coming few months (many of you know of DRIVER, and OpenAIRE was the project to implement the pilot) our technical teams are working overtime to try to merge OpenAIRE and DRIVER together. So at the end you're going to have a system which will have mainly open access publications, and which promotes open access and works for open access, from around 400 repositories and open access journals. We have about 6.6 million publications of different types, not only articles; articles are less than a million in this infrastructure. Everything is going to be open access, except for the cases where we want to monitor open access policies, where we gather non-open-access material and do our statistics.

The persons we've seen from these six million publications are around 60 million, which needs a lot of cleaning and a lot of deduplication effort. Our projects are 26 from FP7, and we are working on a pilot with the Wellcome Trust. Then we will be advocating to get more funders from member states because, as somebody almost said, FP7, the European Commission, is 10%; a lot of the research grants come from the member states. So we're trying to get the European landscape. For organizations, we have a lot of organizations from OpenDOAR and CORDA.

So just to give you an overview of the complexity of the system, because we were talking about the data management services: this is the collection, where we collect the data. Then there is a lot of work here, and as Paolo has written, this is the big data part, because we are using HBase and Hadoop, which are Google-like technologies. We are cleaning. We're linking.
We're doing a lot of stuff with the data, and then we provide it to the users, the users being end users on our portal or third-party providers.

Just to give you an idea of the data flow: we have repositories, and we get the collection. We do a lot of data inference, and this data inference is offline, with algorithms that come from the institutions, because we have five technological institutions in our consortium. The algorithms are tested, and their results are returned into production with a trust level. So we take quality into account, because we know, as somebody said, that these models and algorithms change. We deduplicate, because we're gathering from so many sources that there's a lot of duplication, and the statistics we want to provide at the end should be meaningful. And then we provide the information space to the world. Where we are moving, and I'm going to go over this briefly at the end, is that we're going to have a service with expert validation: communities will be using OpenAIRE for their views of what they think they have gathered, and we'll give them tools to validate them.

So, services. There are three types of end users: researchers, data providers, and research administrators. Research administration could be institutional, national, funders, anything you can imagine, and we need to break those down more. For researchers, we have a one-stop shop for deposition. And I have to say that OpenAIRE is not a repository. Although we provide the repository from CERN, OpenAIRE is not a repository; there is this confusion, and some people call it the repository. It could be seen as a database, but the actual publications and the actual data reside in the place where the researchers deposit them. But they can come to OpenAIRE to see: where do I deposit my publication? Where do I deposit the data? How do I link them together?
So this is what we are providing to the researchers. And this list of services will start to grow as we have a more stable infrastructure, with alerts and notifications, offline curation, expert validation, and so on.

For data providers, what do we do? Mika is going to be talking about it. As I said, we have DRIVER and OpenAIRE merged, which means that there will be publications that are not linked to funding, there will be publications that are linked to funding, there will be data, or publications linked to data, and so forth. So for data providers we have all the flavors, with different levels of compatibility, and they can come and participate in this infrastructure. What do they get back from us? They can get back enriched information. So all the back-end services that we use in our big data, scalable infrastructure, what we plan to do in the next six months is to offer all these services with web interfaces, as web services or maybe through the portal, so the data providers can come and check them and then maybe use them. For example, if we have a service that text-mines a paper and retrieves funding information from it, we do it all on the back end, but we'll provide it in the front end. Those are the kinds of services that we want to provide to the data providers, in addition of course to services like community building, guidelines, and visibility.

Research managers, what do we do? We provide statistics, since OpenAIRE plans to gather all this information. One of the research managers that we work with now is the European Commission. These are the people who fund us, and these are the people who want to get results. One of the major and primary questions that they have is: how is the open access pilot going? So we are able to provide some results for them, and now we're moving beyond that. What we do is
statistics on projects, scientific areas, countries, and institutions. If we have all this information in the model, everything linked together, we can say how many publications came from this institution and in which areas they were produced. I'll show you some pictures later of advanced tools for science trends.

So now a few shots of OpenAIRE in practice. This will be our new face in the coming months, so it's a preview. This is a publication view. What you have here is not only the publication. You have different kinds of content classification. This content classification comes from the repository, or it can come from what we do on the back end. For content classification and clustering we're using supervised and unsupervised algorithms, so those will be there, and we will show visually which were put in by the researchers and which were inferred. Then tagging, and several IDs, because what we do is deduplicate, and one publication can have many IDs. Then where you can find it, in which repositories; funded by which projects; and then all this related data, related publications, and metrics. We're working on them, and we hope that they will be gradually appearing as we go along in the next few months. What you see here for a publication view, you will be able to see for a data object too, for an author, for all the main entities that we have in OpenAIRE. We're planning to release them gradually, one by one.

What we have done, and at least the technical infrastructure is ready for this, is that we've gone beyond FP7 and the European Commission. We have worked with the Wellcome Trust: we have all their grants and we have identified publications, and now, in the test portal that we have, you can search for FP7 or for Wellcome Trust.
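As a rough illustration of the point that deduplication leaves one publication carrying several identifiers, the following is a minimal sketch and not OpenAIRE's actual algorithm. It assumes records are plain dictionaries and treats two records as the same publication when their normalized title and year match; the field names, the identifiers, and the matching rule are all hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def merge_duplicates(records):
    """Group records by (normalized title, year) and merge each group.

    The merged record keeps every identifier, repository, and project link
    found in the group, so one publication can end up with many IDs.
    """
    groups = defaultdict(list)
    for record in records:
        key = (normalize_title(record["title"]), record["year"])
        groups[key].append(record)

    merged = []
    for (_, year), group in groups.items():
        merged.append({
            "title": group[0]["title"],
            "year": year,
            "ids": sorted({pid for r in group for pid in r["ids"]}),
            "repositories": sorted({r["repository"] for r in group}),
            "projects": sorted({p for r in group for p in r.get("projects", [])}),
        })
    return merged


# Hypothetical records harvested from two repositories.
records = [
    {"title": "Open Access in Europe", "year": 2012,
     "ids": ["oai:repo-a:1"], "repository": "repo-a",
     "projects": ["FP7:000001"]},
    {"title": "Open access in Europe.", "year": 2012,
     "ids": ["doi:10.0000/example"], "repository": "repo-b"},
    {"title": "A Different Paper", "year": 2011,
     "ids": ["oai:repo-a:2"], "repository": "repo-a"},
]

merged = merge_duplicates(records)
for publication in merged:
    print(publication["title"], publication["ids"], publication["repositories"])
```

A real system harvesting from hundreds of repositories would need fuzzier matching (authors, DOIs, near-identical titles) than this exact-key grouping; the sketch only shows how the merged record keeps the link back to every source.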
So you see this is a kind of merge of DRIVER and OpenAIRE, so you have 1.2 million publications out of the 6.6 million: this many FP7 and this many Wellcome Trust. When you come to deposit a publication or data with us, you will be able to choose which funders they were funded by. I'm going fast because my time is running out. You will be able to identify data sources based on the type and on the country, and to see some very basic statistics.

What we are doing is allowing researchers to work with other services. So when they come to link a publication, for now to funding and later to data, they're able to identify these publications through the open access search, through DOIs from Crossref, through DOIs from DataCite, or from ORCID. For example, when I click on ORCID, this is my list of papers. I can select them and very easily add them to the funding.

The Orphan Repository is changing face and name. It's going to be a repository kind of like figshare, but it's going to be sustainable and supported by CERN, curated as much as possible, and it will provide links.
So when somebody deposits a publication into Zenodo, and this is the name, they will be able to link it to projects and to link it to data. So the functionality that OpenAIRE has will also be in Zenodo.

Some statistics. This is what we calculated for FP7, monitoring scientific outcome. Because the identification of grants was not there yet from the repositories, we had to go and do some extra work. Those are graphs that are shown in the private space for monitoring the open access policies, FP7 and SC39. What we plan to do in the next two or three months is that, since we have all this data, we're going to have aggregated visual statistics with various indicators. As we go along, we hope to have a lengthy list of indicators.

And then we have statistics based on advanced classification. This is now on the back end. We have the arXiv classification, we work with other classifications, and we're going to merge them together. So when you have a paper, or a publication, or maybe data from a data journal or a data repository like Dryad, you can identify in bulk mode where it belongs, which scientific area. This is a very useful tool for data providers, so we can run it on one repository: this is the arXiv distribution. We can run it on a funder: this is the ARC distribution. We can run it on a department: this is the Department of Informatics in Athens. And this is for a person. All these are services that we'll be able to provide on the portal, but we'll provide them with different levels of agreements.

Usage statistics: this is one area.
We started it in OpenAIRE, and it's not in OpenAIREplus, but we're continuing. We work with PIRUS, SURF SURE, Open Access Statistics, and Knowledge Exchange, and we have usage statistics on a pilot basis. In OpenAIRE we have implemented the services so that we can show them in the portal, but the data is not enough to produce meaningful statistics. This is an area that we would like to work on, so we can have usage statistics for a publication, for a repository, and then, because we have the knowledge, we can aggregate for projects, research institutions, national interest. So what are the usage statistics of Greece, let's say, of the publications that come from Greece? We also work with Mendeley and with PLOS, and these are the kinds of statistics that we get from them.

And last, I'm finishing, okay: we have a new service with which we assess community impact. We work with EGI. For those of you who don't know, EGI is the European Grid Initiative. It's a big community from before the cloud, or they're changing to cloud, I'm not sure what the model is now. They've asked us: okay, let's see, what is the impact of the work that we are doing and have done? This is what they gave us, and what OpenAIRE will do is collect all their publications, produce the statistics that they want, and give them to them. So we'll mine publications to identify EGI references. We'll provide services so that when a researcher comes into the portal, they will be able to link a publication not only to a funder but also to a community. Then the EGI curators, and this is what I was telling you before, that we're going to have a service so that communities can curate their own content, will validate all the links.
We'll calculate statistics, and they will be able to show the results in their portal. This is a pilot case, and it is very good for us to do, because we need to change some of the underlying technologies. Beyond EGI, we have been asked by EMI, which is the European Middleware Initiative, and they have software that they produce. And we will be moving this as a service to the national infrastructures, and this may be of interest to you.
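The mining step described in this talk, identifying funding or community references in publication text and attaching a trust level to each inferred link before computing statistics, could be sketched roughly as follows. This is an illustrative assumption, not OpenAIRE's implementation: the patterns, the trust values, the threshold, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical rules: each label has patterns paired with a trust level.
# An exact acronym match is trusted more than a loose phrase match.
RULES = {
    "EGI": [
        (re.compile(r"\bEGI\b"), 0.9),
        (re.compile(r"grid infrastructure", re.IGNORECASE), 0.5),
    ],
    "FP7": [
        (re.compile(r"\bFP7\b"), 0.9),
        (re.compile(r"Seventh Framework Programme", re.IGNORECASE), 0.8),
    ],
}


def infer_links(text):
    """Return {label: trust} for every label whose patterns match the text.

    When several patterns of one label match, the highest trust is kept.
    """
    links = {}
    for label, rules in RULES.items():
        scores = [trust for pattern, trust in rules if pattern.search(text)]
        if scores:
            links[label] = max(scores)
    return links


def count_links(texts, min_trust=0.7):
    """Count, per label, the texts whose inferred link reaches min_trust."""
    counts = Counter()
    for text in texts:
        for label, trust in infer_links(text).items():
            if trust >= min_trust:
                counts[label] += 1
    return counts


# Hypothetical acknowledgement sections from three publications.
acknowledgements = [
    "This work used resources provided by EGI and was funded under FP7.",
    "Computations ran on a national grid infrastructure.",
    "Supported by the Seventh Framework Programme of the European Commission.",
]

stats = count_links(acknowledgements)
print(dict(stats))
```

Low-trust links, like the second acknowledgement here, are left out of the counts; in the workflow described above these are the kind of links that community curators would be asked to validate.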