The conference has had amazing attendance, so it's really nice that people are showing up, even if virtually rather than physically. It's really nice to have you coming to these talks; it gives us an opportunity to reach out and speak with you. We're here today to talk about the common services, and we say opportunities for usage and integration. That's mainly because we have common services which are actually in the EOSC catalogue, and you can gain access to them here and now; they're off the shelf and usable. That's the usage part: you can really pick them up and start using them, and you can contact us to do that. But there's also a lot of integration work, and that's one of the themes we want to show you today as well: as services join, EOSC gets an ecosystem of services, and there's added benefit from joining. We link the services together, we look for common usage patterns, we listen to what the community tells us it needs, and we pursue these integration activities to provide added value. We go behind the scenes and link these services together to cover the common use cases. As for the overview of the session: we will have three distinct thematic areas with two talks each. First we will talk about distributed computing and the orchestration services that give access to this distributed computing infrastructure, followed by a discussion round, maybe a couple of questions depending on the time. Then we move on to the foundational data services: services that deal with data and metadata, making data findable, and so on, and what we've done in this area. There'll be two talks there again, followed by a set of questions, and then the advanced data services. These are two fairly distinct services.
So, long-term data preservation and sensitive data services. These are both things which are in the catalogue now, in the marketplace, and we use this session to show you how they're evolving. And then a little summary and outlook. It's worth pointing out that if you look at the agenda on the web page, you will see a link to Slido. That's where we'd like to capture your questions. As I said, at the end of each of these double-talk sessions, these thematic areas, we'd like to look at the top questions. If you click on the Slido link on the agenda, you can post your questions there and you can also vote for questions. So if you see something where you feel, ah, that's exactly what I wanted to ask, feel free to click the thumbs up and vote for that question, and that's what we'll try to address at the end of the talks. Okay, with that I think we're covered for the brief introduction, and now we should move on to distributed computing and the orchestration services. Enol can now take over and talk about the distributed computing, and then Miguel will come on and talk about the orchestration services. As I said, they're thematically linked, so we'll have some questions based on both at the end of the talks. Enol, if you can share your screen, you should be free to go.

Okay, so I hope you can hear me well and see the slides. Thank you, John, for the introduction. I'm Enol, working for the EGI Foundation, and I will talk a bit about the federated compute services that we have included in the EOSC-hub project.
The objective of these federated compute services is mainly to provide EOSC users with a distributed computing infrastructure to execute their workloads. We do that at different levels of abstraction, so you can run your workloads on different kinds of computing resources and computing types, and for that we have a set of services involved in this big task of federated compute, which are listed here: EGI Cloud Compute, EGI Cloud Container Compute, the INDIGO advanced infrastructure services (mostly the udocker tool), EGI High-Throughput Compute and the Workload Manager. I have a table here that tries to summarise the characteristics of the main services delivering compute capacity to the EOSC-hub; these are EGI Cloud Compute, Cloud Container Compute and High-Throughput Compute, and basically you run different things on each of these services. EGI Cloud Compute is a distributed infrastructure-as-a-service where you run VMs. With a virtual machine you decide what operating system and what software to run, and how to run it; you are completely in control of the resources and you decide how to deal with them. It's really powerful and allows very custom setups, although it's also a bit complex, because you become the administrator of a virtual machine, so you really need to take care of all the details of running one. Then we have EGI Cloud Container Compute, which is about running Docker containers on top of this cloud infrastructure. You can run a Docker container as a standalone thing, as a single binary, but the usual approach is to run several of those containers together in a synchronised, orchestrated way, and for that you need a container orchestration system. In the case of EGI Cloud Container Compute we use Kubernetes, which is an industry standard that basically everyone in the cloud arena supports.
Kubernetes is quite complex to set up, but in EGI Cloud Container Compute what we provide is a ready-to-use deployment of Kubernetes where you can just run your applications. The complexity of Kubernetes is still there, and it has a steep learning curve, so sometimes people are a bit afraid of it, but we can help you to get started with all this kind of setup. And then we have EGI High-Throughput Compute, which is about executing jobs. A job is basically a clearly defined executable with some inputs and some outputs, limited in time, that is executed on a batch system; you get the results when it is over. This is ideal for problems that you can divide into independent chunks: you can submit a job for each chunk of the problem. The good thing is that you don't need to manage any resources; it's not like the VMs or the Docker containers, where you are aware of the underlying resources. You just submit jobs and let the system come back to you when they are over. But you do need to adapt your application: if your application is not written for this kind of job model, you have to port it, and we have some legacy interfaces that may not be that easy to use from some systems. In any case, it removes all the burden of managing the infrastructure, which is quite nice for most users. In this slide and the next one I have the same services put together, showing how they relate to each other. We start with EGI Cloud Compute, the cloud where you can run virtual machines. On top of EGI Cloud Compute you can start Kubernetes, which is EGI Cloud Container Compute, so that service runs on top of EGI Cloud Compute. Then we have here on the right EGI High-Throughput Compute, a big batch system where your jobs are executed.
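The high-throughput job model described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the real submission interface: the `run_job` function stands in for a real batch job submitted through the EGI interfaces, and the chunking and merge helpers are invented for the example.

```python
# Sketch of the high-throughput job pattern: split the work into
# independent chunks, run one self-contained "job" per chunk, and
# merge the outputs. `run_job` stands in for a real batch submission
# (in reality a job goes to a batch system and returns later).

def split_into_chunks(data, n_chunks):
    """Divide the input into roughly equal, independent pieces."""
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_job(chunk):
    """A job: a clearly defined executable with inputs and outputs,
    limited in time. Here it just sums its chunk."""
    return sum(chunk)

def merge(results):
    """Combine the per-job outputs into the final result."""
    return sum(results)

if __name__ == "__main__":
    data = list(range(1000))
    jobs = split_into_chunks(data, n_chunks=10)
    results = [run_job(c) for c in jobs]   # each job is independent
    print(merge(results))                  # same answer as sum(data)
```

The point is that each chunk is processed with no knowledge of the others, which is exactly what lets the batch system schedule the jobs anywhere in the federation.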
There you submit the jobs, and in particular you can have Docker jobs that allow you to run any Docker container inside one of these jobs. Then we have a set of tools that allow you to better use these three services. The main one included in this task of federated compute is the EGI Workload Manager, which is able to distribute jobs over EGI Cloud Compute and EGI High-Throughput Compute, depending on your needs and the resources you have available. But there are also other services able to interact with the ones included in this task, such as the EOSC-hub cloud orchestration services that you will hear about in the next presentation; those services are able to manage resources provided by EGI Cloud Compute. And of course you can access data hosted in external services like EGI DataHub, Onedata and others that you will also hear about later in this session. All of this integrates with the federation services of the EOSC-hub: the AAI, the accounting and monitoring, and so on. Moving on to the distributed computing infrastructure that powers this EGI cloud: here I have a map of the providers across Europe. There are 22 of them offering infrastructure-as-a-service, mostly to support running your workload near the data; so if the data is located in Poland, the normal thing would be to go to the Polish provider to run your application. On top of this infrastructure-as-a-service we have federated identity, we have a common virtual machine image catalogue, we offer both graphical user interface and command-line access, we support orchestration, and we have central accounting and monitoring. This infrastructure has been supporting the EOSC-hub computing needs since the very beginning of the project. Here you can see a screenshot of the accounting portal for the last year.
So, from January 2019 to January 2020. Over that period we have delivered 7.8 million CPU hours, and 4,200 VMs have been created. We are doing this for several communities. The main ones we are supporting in the EOSC-hub are the thematic services, you have a list there: OPENCoastS, the EO Pillar and LifeWatch. We are also supporting some of the competence centres, like Fusion, and we are also involved in the digital innovation hub and business pilot activities, so we have a list there of different pilots that have been supported on this infrastructure. And now with the early adopter pilots, there are some projects coming to the EOSC-hub that started in late 2019, and some others this year in 2020; they are also consumers of this cloud infrastructure. This is not just for the big communities: any user, no matter how small their community, even individual people, can request access. So I invite everyone interested in testing these services, or using them in production, to go to the EOSC Marketplace and order them, and we will guide you through the whole process of getting access and starting to use them. For 2019, the last year, I have two slides that are a bit dense with the achievements. We'll start with what I believe is the main one, which is the complete integration with the federated AAI. All of the services I have been mentioning are now accessible through OpenID Connect or some form of token translation, namely RCauth, and that's available via graphical user interface and via command-line or API access. So every service listed here can work with the federated AAI; I would say that's the main achievement of the last year. Then, for the individual services, I have a list of the main points delivered. For Cloud Container Compute we have updated the Kubernetes support to one of the latest versions, 1.16 and beyond.
We have adopted more and more Kubernetes features: GPUs, the container network, ingress, NFS storage. All these features that are coming to Kubernetes, or are contributed by external people, we try to bring in as soon as possible, and we have done that over the last year. For Cloud Compute there has been a lot of work on AppDB, which is our graphical user interface, where we have improved the support for providers running OpenStack. We have improved security with the Secant integration, a tool that detects vulnerabilities in the images that are published in the cloud. We have reviewed the federation model of the EGI cloud to make it more attractive for new providers, making it lighter and avoiding privileged access at the providers. We have improved accounting and monitoring, with new types of accounting records and better monitoring to detect failures before the users suffer them. And we have improved information discovery using the GLUE schema, which is an OGF standard, and using the ARGO messaging system for delivering this information. On the udocker side, we have improved OCI compatibility, so now it's easier to run containers that follow that standard, which is the main standard for Docker images. We now have support for GPUs, support for ARM, and other minor general improvements like Python 3 support and compatibility with the different systems that can be found on the computing resources of the EOSC-hub. And in the Workload Manager we have better multi-core job support, enhanced packaging of the DIRAC binary dependencies, so it's easier to use the Workload Manager, and better support for complex workflow management. That's my last slide, and now I think it's time for Miguel to go on with the orchestration.

Okay, thanks, Enol. Let me share my screen. Okay, so I think that you can see my screen.
This presentation will summarise the developments carried out in the task on common services for cloud orchestration. We are focusing mainly on services that can be directly exploited by the user, not on internal components. We will show three services: the PaaS Orchestrator, the Infrastructure Manager and the FutureGateway. For each component I will give a brief summary of the service, and I will focus on the integration effort we have made in this task to integrate with other EOSC-hub components. This task, as I said, is focused on orchestration services on top of the cloud compute and cloud container services. These services allow you to build complex virtual computing infrastructures. It is based on TOSCA, the OASIS TOSCA standard. It offers high-level graphical interfaces. It enables the automatic selection of the cloud provider, based on information about the provider: static information, monitoring information, and the available SLAs between the user and the provider. It supports a wide range of cloud providers. We are focused on the EGI Cloud Compute service, but we also support other cloud providers, such as any other OpenStack, OpenNebula or CloudStack site, and we are also able to access public cloud providers such as Amazon, Google Cloud, Microsoft Azure, etc. We have been working on optimisations to speed up deployment on these cloud providers. This is the main architecture of the orchestration layer. The entry point is the FutureGateway, which has been developed to build science gateways. Then the main component is the Orchestrator, which chooses the best cloud provider. Internally we have four services. Two of them gather information about the system: the CMDB gathers the static information, and the SLAM component manages the SLA information and provides the SLAs agreed between user and provider.
Then Zabbix is used for monitoring purposes. The Cloud Provider Ranker component takes all this information and produces a ranking of the providers, on which the Orchestrator will deploy the infrastructure. And finally the Infrastructure Manager is the component that takes the request from the Orchestrator to actually deploy and configure the virtual machines on the different cloud providers. In the last year our main effort has been the integration with the other services, mainly with EGI services. We have been integrating with EGI Check-in. We have been integrating with EGI Cloud Compute, to enable deploying virtual machines on top of that infrastructure, but also with the EGI information systems, AppDB and the cloud information provider; this gives the user access to EGI information to select the available sites and the images available to them. All these improvements have been requested by the communities. The first component is the FutureGateway. As I said, it is one of the possible entry points to the orchestration layer. It has a front-end that is accessible by means of a REST API. It has an extensible back-end that supports different distributed computing infrastructures, such as grid, cloud or high-performance computing. And it has been successfully used with different end-user applications: a web portal, for example, but also a mobile application or a scientific workflow. As an example, you can access this web page, the EGI science software on demand, which has been deployed on top of a FutureGateway. The second component is the PaaS Orchestrator, the core component of the orchestration layer. It has two main functionalities. One of them is to deploy virtualised compute and storage resources; in this case it interacts with the Infrastructure Manager, which finally deploys this infrastructure. But it also enables deploying containerised services and jobs on top of Mesos clusters.
In this case it interacts directly with the Mesos framework, in particular with Marathon and Chronos. In both cases the requirements are described using the TOSCA YAML standard, and the PaaS Orchestrator provides orchestration and scheduling capabilities. It enables transparent access to different cloud environments, and it selects the provider based on different criteria like the available SLAs, service availability or data location. The PaaS Orchestrator has different client tools. It implements a RESTful API, it has a command-line tool called orchent, and it also has its own web interface, the orchestrator dashboard. It has been included in the EOSC Portal, so you can follow the link shown in the slides to get more information and to request access to the component. Finally, the Infrastructure Manager is the service that actually deploys the virtual infrastructure on top of the cloud resources. It can use the RADL or TOSCA languages; RADL is the native language of the Infrastructure Manager, but it also supports the TOSCA standard. It follows the infrastructure-as-code paradigm to deploy infrastructure. The IM is in charge of automating the deployment, configuration, monitoring and updating of the virtual infrastructure. It also supports a wide range of cloud providers: not only EGI Cloud Compute, the main focus of this task, but also other OpenStack sites and other public cloud providers. It features DevOps capabilities: it is based on Ansible, enabling the users to specify a set of recipes to configure their infrastructure. The IM works as a service and offers several interfaces: XML-RPC and REST APIs, a command-line application, and two different web-based graphical interfaces. In particular, we have deployed two publicly available web interfaces.
The first one is an evolution of the orchestrator dashboard that has been adapted for the Infrastructure Manager. It is aimed at non-advanced users and is focused only on deploying on top of EGI Cloud Compute resources; it is an easy way to deploy virtual machines without any knowledge of the physical infrastructure. The second interface is the IM portal, which exposes all the functionality of the Infrastructure Manager: it enables deployment on different cloud providers, including public cloud providers, and it lets you deploy your own TOSCA or RADL documents and so on. It has also been included in the EOSC Portal, and you have here the link to access it and to get information or request access. Finally, a short summary of the main achievements of the last year. In the case of the IM, there has been integration with EGI Check-in and with the EGI information system, to get information about sites and images and make this process easier for the user. The Orchestrator has added the functionality to dynamically load the TOSCA custom types, as requested by the DODAS community. It has been integrated with HashiCorp Vault for secret management, as requested by ELIXIR. The interaction with the Infrastructure Manager has been enhanced, to retrieve the deployment logs and get more details. The retry mechanism in the deployment phase has also been improved: in case one site fails to launch the VMs, the Orchestrator will select the next available cloud provider and retry the launch there. The client tools, in particular orchent, have been improved, and new dashboards have been created, as requested by the user communities. Finally, the work on the FutureGateway has focused on running the service as a set of Docker containers, so it can run on top of Docker Compose or a Kubernetes platform. And this is my last slide, so, John, over to you.
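The rank-then-retry behaviour of the Orchestrator can be sketched as follows. This is only an illustration of the idea: the provider attributes, the scoring weights and the `try_deploy` callback are all invented for the example; in the real system the ranking comes from the Cloud Provider Ranker, fed by the CMDB, SLAM and the monitoring.

```python
# Sketch of the Orchestrator's provider selection: rank the candidate
# cloud providers on collected information (SLAs, monitoring status,
# data location), then try them in order, falling back to the next
# provider when a deployment fails.

def rank_providers(providers):
    """Highest score first: prefer healthy sites with an SLA and,
    secondarily, sites holding a local copy of the input data."""
    def score(p):
        return 3 * p["healthy"] + 2 * p["sla"] + p["has_data"]
    return sorted(providers, key=score, reverse=True)

def deploy(template, providers, try_deploy):
    """Attempt the deployment on each ranked provider in turn."""
    for p in rank_providers(providers):
        try:
            return p["name"], try_deploy(p, template)
        except RuntimeError:
            continue  # e.g. VM launch failed: retry on the next site
    raise RuntimeError("all providers failed")
```

With this shape, a site that fails to launch the VMs simply raises, and the loop moves on to the next provider in the ranking, which is the retry mechanism described above.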
Yeah, okay. Super. So we have two questions. I don't know if we can let the person ask the question themselves; can they be unmuted? Yes, yes, they can be unmuted. Okay, I will do that, since we have a few minutes; we're doing quite okay with the schedule. Sean, you can speak, I think.

Yeah, a very quick question. Sorry, can you hear me? Yes. Okay. I know you showed a graph with an increasing uptake of the services. Have you done any user satisfaction surveys that you'd be willing to share with the community? And how do you use the feedback you get to improve your services?

Yeah, so we are following the FitSM standard, and that kind of forces us to do satisfaction surveys for all the services that we have in EGI. So every year we do an interview with the communities to try to understand what's been going on, whether they like what they get or not. Those are internal, they are not public. If there are any complaints, we capture those complaints, or suggestions for improvement, and we need to tackle them somehow within EGI. So yes, we are doing that, but right now I cannot share anything, because I would need to check in the internal documentation what we have. Maybe we can do the exercise of checking what can be shared and what can be useful for communities or other infrastructure providers as well.

I think it's a nice question to actually chase up in the back end and see what we can open up. We do talk with people within the project itself, but it would be nice if we can expose the feedback from other communities as well; good point, John. Okay, the next question, if we can unmute. Yes, you can speak too.

Okay, so my question is related to the Cloud Container Compute service.
I'd like to know if it is a multi-tenant service, and if so, how you manage the authorisation, if you have namespaces created on demand; basically, how this works. Thanks.

This is not a multi-tenant deployment of Kubernetes; we do an independent deployment for every user or community. Kubernetes is not really designed for multi-tenancy, so we are not trying to do anything fancy there.

Okay, thank you. Well, there are some possibilities; we have tried using EGI Check-in with some plugins to automatically create namespaces and service accounts on the fly when the user connects, and we have a degree of isolation. I was just asking to see whether other people have tried similar experiences. Thanks.

Okay, so we're doing mainly okay on the timeframe. Now we'll shift slightly and move towards the more data-oriented subjects. Again, thematically linked, we'll have two talks, and after these talks we'll have a set of questions; we should be able to spare a few minutes. First up is Heinrich Widmann, for the foundational data services. Heinrich, can you share your screen?

Yes, can you see my screen? Not your presentation yet; I see only a small fraction of your presentation, as if a zoom function is activated. Do you want me to show your slides? Just a moment. Oh, now, yes. Okay, super, good.

Hello everybody. My name is Heinrich Widmann, from the German Climate Computing Centre in Hamburg, Germany, and I present the task area data discovery and access. In this presentation I focus on the development and integration activities and efforts carried out within and beyond this task. More or less the main goal is to establish a common data discovery and access layer, to allow FAIR data management.
I will briefly present the services involved here: the services in this context are for searching, accessing and sharing data. As said, to achieve the objectives of the task, the integration follows the FAIR principles and open data policy guidelines, in order to allow end users to discover distributed data resources, not only in the EOSC-hub scope but beyond as well, and to provide adapted solutions for transparent access, storage, staging and transfer of data. These integration activities include a lot of work: of course some enhancements and extensions of the services themselves, and then we developed user-friendly interfaces to enable common usage. And for the main goal, at the end you want to apply this to use cases and thematic services, and for that we often have to adapt the services to the particular requirements. Okay, here is the service portfolio of the task, the services available from the beginning. Very shortly: B2FIND is the discovery service. EGI DataHub is a federated storage service for large amounts of data on a global scale. B2STAGE is a suite of services aimed at simplifying the transfer of data between data nodes. And B2DROP is a sync-and-share service for synchronisation and sharing of data across multiple devices. Additionally to these common services of the task, we have integration activities with other tasks, which I will show in a minute, and of course we use a lot of the support and federation services; for example, the AAI tools are very important for data access. For more details I refer again to the marketplace of the EOSC Portal, where the services are described in detail. In this diagram I show a typical FAIR data discovery and access flow. Typically a researcher searches for data interesting for her research or study project.
After she has hopefully found some data, the access to the data should of course be transparent and seamless. Then you often want to share the found data with other people, so interoperable sharing opportunities should be there. And finally the data is used, processed and analysed, and at the end you publish your results again; then, more or less, the data cycle begins from the beginning, as you make your results discoverable and so on. In this figure you already see the icons of the services placed in the flow, but to really build an architecture where you can apply these common services, we have to integrate the services. And this is a more technical figure, where the pairwise integrations are shown; I list here only a few examples. We managed to index data resources stored in the EGI DataHub federated storage in B2FIND, to make them discoverable. Further, we have developed web interfaces between B2DROP and B2STAGE, which optimises data exchange and sharing. The integration of B2STAGE and EGI DataHub is done; this makes the retrieval of data easier, and it is a first step towards data transfer between the storage services. And, as said, a lot of AAI tools are used for data access; for example, the INDIGO IAM tool is used, as well as B2ACCESS, to authorise and authenticate people for data access. We also have integration activities with the services from other tasks, especially from the task on data and metadata management, which will be presented in detail in some minutes. Again, two examples of these integrations: we established the possibility of data transfer between the two big preservation storage services of EGI and EUDAT, that is, between EGI DataHub and B2SAFE.
And we have established an interface between B2STAGE and B2SHARE, which makes the staging of documents and data stored in B2SHARE to, for example, processing platforms via B2STAGE possible. The other point is that we worked together with other work packages, for example with the work package for the federation services, because, as said, we use the AAI tools here to ensure transparent and seamless data access. Okay, and as said, the final goal is to apply these common services in thematic services. I list here just three examples of thematic services which use services of this task. First, DODAS: this is a suite of services for on-demand analysis, and here, among other services, the INDIGO IAM identity and access management tool is used for data access, so that you can use the federated authentication and authorisation mechanisms coming with this service. HerbaDrop is a huge archive of digitised herbaria, and they migrated their preservation infrastructure to EUDAT, so they use the EUDAT services. One thing here is that B2FIND harvests metadata from all the data resources which are stored with the B2SAFE storage service, and makes them discoverable. And then we have the thematic service ECAS. This is a service for the analysis of climate data, especially for the ENES climate modelling community. We have built a bridge to B2DROP, so that end users can use B2DROP storage as shared storage, to share and exchange data and results from the processing, for example with colleagues, but also as a private storage space if they don't want to publish the results yet. Okay, this brings me to the last slide, the main efforts in 2019, last year. In this task, from the beginning, we had to bring together the services of EUDAT and of EGI.
EGI, the European Grid Infrastructure, and the other big data infrastructure project, INDIGO-DataCloud. First we had to consider the different backgrounds and bring the people together, and then integrate and use all those services, which was a challenge. But meanwhile we have had a lot of achievements; the next two bullets give two of the major ones. We enhanced the discoverability of distributed data via B2FIND, with a lot of projects and integrations of data providers and communities, and a big achievement was the establishment of the data transfer between B2SAFE and EGI DataHub, as already mentioned. As well, as described, the implemented integrations of services are used in several use cases and thematic services. What I didn't mention until now: EOSC-hub has a cooperation with OpenAIRE, and within this cooperation we built a harvesting bridge to B2FIND, which means that OpenAIRE can in turn harvest the metadata, and OpenAIRE can apply their metadata-checking services and so on to the B2FIND metadata, which would improve the quality of the metadata catalogue. And there is a concept for the integration of the annotation service B2NOTE with B2FIND; it would enable users to annotate data sets in B2FIND, for example with ontologies. B2NOTE is already integrated with B2SHARE, and we want to go ahead and create this integration as well. Okay, this brings me to the end.

Okay, yep. Then, if Micaela is ready, we can move straight on to the second part of the thematic talks in the data session. It's okay for me, I can share the screen. Okay, and then we can, as I said, cover questions later; any questions, feel free to just ask them via Slido, and we'll cover them as we reach the end of this double pack of talks. Okay, are you able to see the slides? Yep. Okay, now I'm going to introduce the services for data and metadata management.
Okay, basically, what we aim to do with this task that I'm leading is to provide users with the ability to manage end-user data in a FAIR way. So, following the FAIR principles, the user will be able to search for distributed data in EOSC and beyond, and to have seamless access to distributed data resources, where by seamless access I mean that users are able to access resources and services within the federation providing their own credentials, previously obtained from some identity provider, their home IdP or site, without registering twice. So they are able to access all the services in the federation seamlessly. Interoperability, in the sense that users are able to share and publish research outputs following open-standard tools and interfaces, and to reuse them, and also regarding ingesting and staging data. So in EOSC in general we support the end user regarding seamless access: finding, localising, transferring and using data resources for scientific purposes. Let me briefly give you a panorama of the main services for data management. We currently have B2SAFE, whose development I lead in particular, and we work together with some other partners and software engineers in order to provide the functionalities. The B2SAFE service basically offers the functionality to replicate datasets across different data centres in a safe and efficient way, providing data replication, metadata and rule management, as well as PID management, where a PID is a persistent identifier. So users are able to replicate their own data and find it again using the PID and the metadata associated with it. B2SHARE, instead, is a web-based service for storing and publishing datasets for European scientists. The service utilises other EUDAT services for availability and data retention; basically it is deputed to store and publish datasets, also relying on trusted repositories.
This provides a managed and supported environment. B2STAGE, instead, is a suite of services used to transfer data and stage data out onto the user's resources and services, and it basically relies on GridFTP and an HTTP API. B2DROP is a sync-and-share service that offers a collaborative way to work on documents and synchronise them across multiple devices. B2HANDLE is the EUDAT persistent identifier service: basically a service designed to provide PIDs, working coupled with the B2SAFE service that I described before, and it is used to provide the PIDs associated with data objects for a long period of time. And of course this is a very important part regarding long-term data management. B2NOTE is a service that allows you to easily create annotations on research data hosted by the EUDAT Collaborative Data Infrastructure. Three types of annotation are provided: semantic annotations, coming from identified ontology repositories; free-text keywords, to be used when a specific term is not found; and free-text comments. B2NOTE can be used coupled with other services like B2SHARE, and in general can be added as a component and included in other, for example web-based, services. This slide is just a panoramic, graphical view of the main EUDAT components and EOSC data services interacting: B2NOTE, for example, can be used for annotation interacting with B2SHARE, and B2SHARE, B2SAFE and B2HANDLE interact together and are integrated with EGI data services. During the last year, 2019, the main effort was focused on solving the most important issues, like the integration of the main B2 services as well as the EGI services, including AAI.
Metadata management and service improvements, and of course the maintenance of existing services; the improvement of transfer and access among the different B2 services and the EGI services; and of course improving documentation. In general, the use cases we rely on when we think of data management services can be collected mainly from five sources: generally from thematic services, competence centres, communities, and all the new communities entering EOSC. This is very important for us in order to define the use cases and requirements for the services that we are going to implement and provide. Some of these services and communities have already been described before, but anyway, just to give a brief panorama: we have, for example, services for climate data, but also beyond that for computational methods for biomedical applications, the area of arts and humanities, and so on. This is a short summary of the major achievements in 2019, last year. In particular, we provided a new version of B2FIND with an extended schema and a new graphical user interface, as well as the integration with the EGI DataHub. The B2STAGE and B2DROP services have been upgraded, including integration with the EGI Check-in service and B2ACCESS, of course relying on standard authentication protocols like OAuth2 and OpenID Connect. We moved ahead with the collaboration with OpenAIRE and with the B2SAFE integration with B2SHARE, including also the migration to Python 3 for the client tools. And there was the improvement of the B2NOTE integration with B2SHARE and OpenAIRE, the integration of B2FIND with the EGI DataHub, and B2SHARE improvements, with general improvements regarding metadata and record management and the introduction of support for Python 3. Thank you for your attention. If you have any questions, please ask. Thank you.
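The PID management both talks lean on, B2HANDLE minting Handle-style prefix/suffix identifiers that keep resolving while replicas move, can be illustrated with a toy in-memory registry. This is a conceptual sketch only; the class, the prefix `11111`, and the record fields are invented for the example and are not the real B2HANDLE client API.

```python
import itertools

class ToyPIDRegistry:
    """In-memory stand-in for a Handle service: mints prefix/suffix PIDs
    that resolve to a current location plus a checksum, much like the
    records attached to B2SAFE-replicated objects. Not the real API."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._records = {}
        self._seq = itertools.count(1)

    def register(self, url, checksum):
        pid = f"{self.prefix}/{next(self._seq):04d}"
        self._records[pid] = {"URL": url, "CHECKSUM": checksum}
        return pid

    def resolve(self, pid):
        return self._records[pid]

    def move(self, pid, new_url):
        # The point of a PID: the identifier stays stable while the
        # location behind it changes (e.g. after replication).
        self._records[pid]["URL"] = new_url

reg = ToyPIDRegistry("11111")               # "11111" is a made-up prefix
pid = reg.register("https://repo-a.example/data.nc", "sha256:abc...")
reg.move(pid, "https://repo-b.example/data.nc")  # replica becomes primary
print(pid, reg.resolve(pid)["URL"])
# → 11111/0001 https://repo-b.example/data.nc
```

The design point is that consumers keep citing `11111/0001` and always resolve to the current copy; that is what makes data "findable again" after replication.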
Do we have questions in Slido, sorry, that you could highlight for us please? Just one question at the moment, by Ali, if you want. Yes, I have only one question: can you tell me more about how many data sets are in B2DROP currently? Currently, this is a point where I don't have the number here now, because, regarding the metrics, we provided the information mostly for the B2SAFE and B2SHARE components, and we are still trying to define which metrics to use in order to measure the data: new accounts, and data that has been staged out. So for B2DROP I have no specific information on how many documents have been staged out; it also depends on the installation and the site that we are referring to. I don't know if John has more information than me in this regard. No, I don't. I would have to ask whether anybody has a number that they know off-hand, but it's also sometimes a little bit tricky. Maybe someone in the audience can answer, but B2DROP is based on cloud, so it's for exchanging documents; there's a lot of exchange and deletion and upload, so I don't really know at the moment. Maybe it's more a question of capacity. Or, if you're available, maybe you could put some comment in the chat. What's the underlying reason why you'd like to know this? Are you interested in how diverse it is, or what communities are using it, or just the general usage? What was your underlying reason for asking? It's not only B2DROP but also B2SHARE I would like to know about, because DANS, but Olivier will tell you more about it later, DANS made a bridge to submit data from B2SHARE into our archive, and I'm just wondering how many objects of data I have to think of. Yeah, the point is that... yes, Eric, please.
Yes, I was just saying that, in general, the point is that this is a distributed ecosystem, so you can have multiple installations of the same service. In order to provide such information, like statistics, this is something that we have already discussed, also with John in the past, mainly for B2SAFE and so on: we should first define what we would like to measure. And in general this is not completely an easy task, because, for example, we had a discussion for B2SAFE and iRODS regarding data and metadata, and the point was that some metrics were not so easy, because we still need to decide what we would like to measure. For example, you can measure new data that has been staged out, or new accounts for users, but you need to understand over which period of time and following which policy, and this may vary from site to site, because, as I said previously, you can have data that is staged in and staged out, and the situation may change from one month to the other. This is valid also for user accounts: for example, in CINECA we have a special policy where accounts that are not used for a certain amount of time may be removed. So for this reason you get a different picture depending on whether you measure new data or new accounts each month, or just take a snapshot, a photo, of the actual situation; you get different numbers. And you can also have different installations of the same service on multiple sites; for example, we have no B2DROP in CINECA, but this is just one of the points. So it depends. In CINECA, for example... We actually look at using the DANS service as well; it's been part of one of the proofs of concept for the long-term storage services.
So maybe not all data would go into that. Yeah, exactly. For example, for B2SAFE in CINECA, and I refer only to the data, we can say we have 50-60 terabytes of data from different communities. We have different data resources that are managed with our policies, our data structures and our user base, and we have metrics provided using, for example, Elasticsearch or Kibana. And we provide the data resources, for example, for LENS in Florence or other communities, but this is just something for us. So we are still defining how to measure this kind of information. I'd say if we can take this up offline, that would be nice, so we understand exactly what the usage case behind your question is; it is a question where I understand where you're coming from. We'd have to talk to you and address this "it depends" aspect so we can understand the terms. Okay, you're in the project; let's talk, and let's see about how we use that. Is that okay for you? Yes, yes, we can talk about it another time. Yes, it's okay. Thank you very much. No worries. And regarding Sean's question: I guess we have a suite of sites which do have certification, etc. If you're talking about the deeper certification, then maybe you want to wait until after the talk from Olivier, about the ETDR, the sites which really do have ISO-standard certification, the stronger ones. Okay, that's fine. Yeah, I think Olivier can really address this as well, especially when we get into, as I said, the deeper certification, where we really have ISO-standard sites which are certified long term and will hold your data for decades. Olivier is the guy to talk to, probably after his talk, which comes next anyway. So, moving on to the last part of the session. Again we're grouping thematic talks; these are two data services specifically.
Olivier will first talk to us about long-term data preservation, and then we had Abdul Rahman scheduled to talk about the services for sensitive data. It seems that Abdul Rahman can't actually make his talk, so I will do my best to walk us through his slides. There's a lot of detail in there for you to go and look at afterwards; I can give you the flavour of it, and for the more detailed things you're going to have to ask offline, I'm sorry about that. I suppose this is to give you a broader understanding. So let's first start with Olivier and the preservation services. Can you see my screen? Yes, it's fine, thank you, and you can begin. Okay, great. All right, so you've heard about services bound to data just before my presentation; this will not be really different. I will now describe a long-term preservation service which is provided to ensure that digital assets remain FAIR. So you've heard of FAIR; the main difference with long-term preservation is the time dimension. It's a service that's intended to be provided over years, and when I'm saying years, that's decades; decades means 40, 50, 60 years. There's no limitation in duration. Because of this time dimension, these services are rather special, and they have to include capacity and resource planning and long-term preservation techniques and technologies, and that's what I'm going to describe right now. But they also combine policies and processes around quality assurance, essentially to ensure that this natively digital or converted data remains accessible, findable and reusable, regardless of the technological changes that are to come. Of course, because we cannot say what storage is going to look like in 50 years, and we cannot say what file formats are going to look like in 50 years, there are risks, and we have put together some action plans to mitigate those risks when they occur.
So, those long-term preservation services are provided by two data centres as part of EOSC-hub: CNES, which I work for, is one of them, and the other one is DANS, which you heard about just before I started my presentation. Long-term preservation is not that straightforward, especially for scientists, and we thought it would be a good idea to integrate some already existing tools from the EUDAT scope in order to facilitate the work for researchers depositing their data into long-term preservation platforms. The acronym for the long-term preservation services or platforms in EOSC is ETDR, standing for European Trusted Digital Repository. So, in order to facilitate that work, we thought it would be smart to integrate B2FIND, B2SAFE and B2SHARE in order to ease the ingest process. This is what we've done at CNES. Basically we rely on B2SAFE and B2FIND: B2SAFE in order to allow data transfer onto our facilities. Once the data is transferred, we perform some quality checks: basically, we validate the formats, generate and compare the checksums, and run antivirus checks. And then we do some potential post-processing and enrichment. Earlier you heard about the Herbadrop pilot; that pilot was done at CNES, and in addition to the B2SAFE and B2FIND services, which were used, we also used some HPC services in order to extract information and metadata out of the packages that were sent by the museums, and we used our HPC cluster to do that. Then we index all the information and make it available through B2FIND, so it will be findable through the B2FIND portal or the internal portal that we have developed. In addition to that, we send the packages to our long-term preservation platform, where they will be stored for years and eventually even converted if there are any risks of file format obsolescence or metadata obsolescence and things like that.
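The ingest quality checks just described, format validation and checksum comparison, can be sketched in a few lines. This is a minimal, self-contained illustration; the magic-byte table and the manifest are toy examples, not the actual CNES tooling.

```python
import hashlib

# Leading "magic bytes" for a couple of well-known formats
# (an illustrative subset, not a full format registry).
MAGIC = {b"%PDF-": "application/pdf", b"\x89PNG\r\n\x1a\n": "image/png"}

def identify(data: bytes):
    """Guess a format from leading magic bytes, or None if unknown."""
    for sig, mime in MAGIC.items():
        if data.startswith(sig):
            return mime
    return None

def fixity_ok(data: bytes, expected_sha256: str) -> bool:
    """Compare the computed digest against the manifest entry."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

payload = b"%PDF-1.7 minimal example"
manifest_digest = hashlib.sha256(payload).hexdigest()

assert identify(payload) == "application/pdf"
assert fixity_ok(payload, manifest_digest)
assert not fixity_ok(payload + b"x", manifest_digest)  # tampering detected
print("ingest checks passed")
```

Real preservation workflows periodically re-run the fixity check over the decades the data is held, which is exactly why the checksum is generated at ingest time.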
So this is something that has been running for a year or so, and on which we did the Herbadrop pilot. In the meantime, DANS implemented a proof of concept in order to provide another interface, in addition to the B2SAFE one that we provide in the CNES ETDR: they chose B2SHARE and implemented an interface between B2SHARE and their Data Vault, which is their long-term preservation platform. That was part of the PoC, which was successfully executed last summer. The changes made to the B2SHARE code are available on GitHub, and you've got the link here. So potentially that would be a second instance of the ETDR. The major things we have achieved over the last year are that the ETDR instance at CNES is now part of the EOSC-hub catalogue, so it made it to the marketplace; I put the link here. And we also completed the B2SAFE integration at CNES and the B2SHARE integration at DANS. So what other enhancements could we make to the existing platforms? At CNES, one of our objectives is to add B2SHARE to the panel of tools available to deposit data. That could be for the long term, but it could also be for the medium term, so that would be a second kind of service, a little bit different from the long-term one that we provide at the moment. At DANS, the potential things to come are, well, about strategy: B2SHARE is about to be upgraded to Invenio 3, at least the B2SHARE instance that is run at SURFsara; SURFsara is the partner of DANS in the PoC that we did last summer. And this upgrade to Invenio 3 has a side effect, because the SWORD2 service that they implemented should be upgraded to SWORD3. This of course has an impact on a possible deployment in production, and that's being discussed at the moment. And the other thing that we've got on the radar is here; sorry, I hope you can see the CoreTrustSeal logo.
So one of the things that we plan is to test the new version of the CoreTrustSeal at CNES. We have been accredited with the Data Seal of Approval, which is the predecessor of the CoreTrustSeal. The CoreTrustSeal is being revamped as we speak, because it will try to map the 16 requirements of the CoreTrustSeal onto the 15 recommendations of the FAIR principles, and this will lead to a revised certification process. We will try to test that as part of the memorandum of understanding between EOSC and CoreTrustSeal. Right, that's me done, so I suppose that you will have a few questions after the talk. Okay, thank you, Olivier. I'm going to do my best to give the presentation from Abdul Rahman here and now, and then we can move on with the questions. I can share my screen; let's see if this is the right one. No, I'm sharing the wrong one. Sorry, people. Okay, can you currently see what I'm sharing? Is it the top slide from the sensitive data services? No, we see a black screen. Sorry, let me try again to get this to come up. Okay, stop sharing. Right, I'm going to give it another try and then we'll see where we're at. Can you see this? Yes, but it's zoomed in. Okay. Can you see it now? Yes, it's not in presentation mode, but we can see it. I can move to presentation mode. New slide. Is it still good? No, it's black now. Wait a second. I don't have Windows here; I'm struggling to make it work because I think it's because I'm sharing through the browser, so I'll see if I have to download it. Let's show it like this. So, okay, we can go through it anyway. This is the best I can do at the moment; I really apologise. So, as I said, Abdul Rahman can't make this session, but what I want to talk about is the sensitive data services. As I said, this is a specific set of services that is provided through EOSC-hub; it's in the marketplace, and you can go and use it. There are a few use cases which are covered here, and this is what I'd like to focus on mainly. First there is ePouta, which is a secure cloud service from CSC; in general this is infrastructure as a service. So, as I said, we've got different use cases covered here. ePouta provides infrastructure as a service, so you can create your infrastructure within a secure environment, you can access that infrastructure externally, and you can manage the services within it, your VMs, etc. It's good to highlight that there are web user interfaces and clients to access and manage this. If you can see the screen now: ePouta data access is via a secure tunnel from your site across to the ePouta infrastructure when you create your infrastructure, and at the bottom here you can also see that you can administer this infrastructure externally, so you can gain access to the services and the VMs you create and administer them. So this is one of the core use cases: creating the infrastructure, managing your data within this infrastructure, and gaining access to it.
The service itself, as I've said, is provided by CSC. You can apply through the EOSC-hub portal, through the marketplace, to gain access to this service. You can set up your own networks, establish the connection to these networks, and gain access to the services you have set up inside this secure cloud infrastructure, and there's a link to more information here, so this is where you should turn if you have this as a use case. Alternatively, the University of Oslo offers TSD, which is also a sensitive data service, but this is more software as a service. TSD in a nutshell: this is a nice slide which shows that you have two-factor authentication to the services which are internal within TSD; you can build upon these and do analysis using the data which is stored inside the TSD service, so the data stays internal. The next slide shows the building out: you get two-factor authentication into the service, but then you can have multiple machines, etc., sitting on the back end. It's nice to see that there is rapid growth and uptake in this service; you can really see these jumps through the years, you can see there are lots of researchers and communities using it, which is very nice to see, and a lot of data; two petabytes of sensitive data is a lot of data. Value added on top of the TSD service: the web forms for online questionnaires and secure data collection, which for many communities is obviously very important. There are also APIs that allow you to gain access from your smartphones, etc., when you're building apps, which again many communities really appreciate. So it's not just providing the end service and the infrastructure, but really providing a suite of services which makes people's lives easier when managing and providing these secure data services.
There's also work done on providing secure container infrastructure, as a cluster of containers; this slide addresses that. And then one thing I did want to highlight here is the TSD consent system. When people make this data available, we need a system which handles consent, so people can say: you can gain access to this data, you can use this data. But it's very important as well that this can be revoked and managed, and the TSD system offers this too. So the suite of services takes a lot of the weight off the researchers when they're managing their secure data, and it's nice to see that these services are thought out and effectively end-to-end, providing the systems and the services which are needed. This is an overview of the design of the consent portal: the major thing is that there's a portal which faces the outside world, and then underneath, in the TSD world, the actual management of this data and of the access to this data.
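The consent handling described above, granting access, checking it, and, crucially, being able to revoke it, can be modelled with a toy registry. This is a conceptual sketch only; the class and method names are invented for the example and are not the TSD consent API.

```python
class ConsentRegistry:
    """Toy model of consent management for sensitive datasets: access is
    only allowed while an explicit, unrevoked grant exists."""

    def __init__(self):
        self._grants = set()   # (subject, dataset) pairs currently consented

    def grant(self, subject, dataset):
        self._grants.add((subject, dataset))

    def revoke(self, subject, dataset):
        # Withdrawal of consent must take effect immediately.
        self._grants.discard((subject, dataset))

    def may_access(self, subject, dataset) -> bool:
        return (subject, dataset) in self._grants

reg = ConsentRegistry()
reg.grant("researcher-42", "survey-2019")       # hypothetical names
assert reg.may_access("researcher-42", "survey-2019")
reg.revoke("researcher-42", "survey-2019")      # consent withdrawn
assert not reg.may_access("researcher-42", "survey-2019")
print("revocation enforced")
```

The essential design point is that access decisions are always evaluated against the current grant set, so revocation is enforced on the next check rather than requiring data to be clawed back.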
You can query for consent, so you can request consent for certain data sets, which obviously is also a very nice feature. So, to summarise briefly: platform and software as a service, and, as I said, they provide a suite of services which takes care of your needs. It takes a lot of the weight off the people doing the research and says: we'll manage the secure stuff for you, we'll provide the services you need, not just the infrastructure. I think that's very nice. It also gives access on the back end to the large HPC systems, etc., which I think is very nice as well. We've mentioned the consent system; the anonymisation of data structures is there as well, and we'll mention that a little more later on. You can gain access to this via the portal as usual; go for the marketplace, always, people, and you can apply for projects. There's a nice slide here with an overview of how to apply for projects and what you need to do; please, if you have questions, take a look and go through this. The last slide is about gaining access through the portal directly, and there's something about prices. One thing I didn't mention at the start: these are services on the back end which have grown up and are quite multifaceted, but there's also the integration with EOSC-hub, and there are a few aspects here which are well worth mentioning. One is the work with the B2SHARE service. B2SHARE is a generic service within EOSC-hub; it's well used, well established, and well understood by the researchers, and it's a nice thing that they can use this service. There's been work put in place to ensure that the sensitive data services are available via the B2SHARE service. What this means, if we move down to the next slide, is that we need a secure B2SHARE service, and so there's been work put in place to ensure we have a secure B2SHARE service that you can then use, and that we can use the B2FIND service to look for sensitive data. Now, obviously this is more difficult than the standard way of finding data, because even exposing the names of the data sets could reveal some information about the internal data. So this, and assigning PIDs to the data, is something we're looking at and evaluating and making progress on, but it's obviously something we have to be extremely careful about, to ensure there's no leakage of any information from the secure data into the outside world, even in the form of URLs, names of datasets, and PIDs. This is all work in progress. One of the main aspects around this, as you see on the left-hand side here, are the users and the standard services which they are used to seeing exposed within EOSC-hub: B2FIND and B2SHARE. These are the services that the users know and trust and can use, and it's nice that we can expose the back-end services, no matter how varied they are, via these services. This is where we're putting effort now, to see how far we can evolve this; as I said, it's quite a lot harder than with the standard services, as we have to be more careful here. There's also a lot of work being done on how we manage the secure containers: obviously, with the container services themselves we have to be extremely careful about how these containers are formed and how secure they are, and each time there are updates we need to go through the security aspects again. So there's been a lot of work done in this area, and progress is being made; we're doing very well, I think. One nice thing to mention as well is the Amnesia link to the OpenAIRE project; that's the anonymisation tool. This is again because you can't give out data as-is; you need to anonymise it before it's passed out to the communities or before it's given out to
the projects. So this is a nice cross-link to other projects: we can work with OpenAIRE to expose this data, but also look at their tools for exposing it. If we have the data in-house, or within the project, we can work with other projects, and this cross-project work is a nice thing as well; it's something we've worked on quite a lot, to make sure we are focusing on the understanding that we're not a data silo, we're not a standalone project. So, in summary: using personal data for research purposes poses a challenge for infrastructures. This is very obvious, but we can work on solving it. CSC and UiO, through the ePouta and TSD services, are taking two slightly different approaches, and it's nice to be able to cover these two different use cases. It's also very important for us to say that these are long-lived and solid services, where a lot of effort has gone into ensuring that they are secure, and we're now increasingly exposing them via other services and via EOSC-hub mechanisms as well. So we're looking at the integration of that: taking these long-lived services which we trust and exposing them; they are in the catalogue, they are available. The Nordic activities I would rather not go into, because I'm not the person with the real know-how here. I apologise that we skimmed over this; I hope you've got a flavour of what's happening here, and of the fact that different use cases are being addressed by different services, that we are looking at cross-project work as well, looking at other projects within Europe to see how we expose this data, and that we are ensuring that these services can be brought out and made part of the EOSC infrastructure. And one thing I've stressed before about these integration activities, and we see it again and again, is that once these services are made available within EOSC, we can work on integrating them with other services, and
we can gain added value through that. This has been one of the core aspects that we focused on throughout this project: ensuring that we have off-the-shelf and stable services, but also ensuring that we have this data integration and these services which can be exposed and linked together. Okay, that is, I think, the best I can do; there's a rich set of slides, please go through them, and if you have any questions then we're willing to take them up now. Okay, so it seems that, at least in Slido, the only question was already covered. So, Sean, are you happy with the response from Olivier? Do you understand that we are well plugged into the CoreTrustSeal and we do have certified sites? Yeah, yeah, no problem. Okay, are there other questions? Okay, just to wrap up... Okay, so I can steal the screen sharing, I hope. John, are you still there? Sorry, is it just me, or... I can't hear John either; his microphone is on. Maybe let me write this: we cannot hear you well. So in the meantime I'll do some housekeeping for tomorrow. Tomorrow I'm going to paste in the chat the link to the agenda; on the agenda you find the new links for tomorrow, to join the last day of the EOSC-hub Week, and like today, tomorrow we start again at 10 a.m.
and we start right away with the breakout sessions, so make sure, as usual, that you join 10 minutes before the start of the meeting. In any case, I will send you an email, like I did yesterday, with the details about the links. John, give us a sign... he went out and is now admitted again, so maybe... Okay, he's trying to reconnect; yes, I see that he's joining. Does anybody else from the speakers want to say something in the meantime? Yes, maybe. Yeah, I want only to stress, as already said, the challenge is that we can integrate a lot of... Actually, sorry, you have a lot of background noise, so it's difficult to understand what you're saying. John, can you hear us well? I suggest that we close the session, because I don't think that John is able to come back. Yeah, I'm here, if you can hear me. Oh, great. Right, fighting against Zoom, but it's all my fault, I think. I don't know if I'll try to share my screen again and see if I can manage this, just to wrap up, but the main thing is, I think, that I just want to say thank you to people for attending. It's really important to us that we get in contact with the communities and the people out there that can use these services, and it's good to have the feedback and get the questions; always contact us, always come with questions. I think two of the major aspects that we've come across are, first, that we're providing off-the-shelf services: you've seen services today, you can get them through the marketplace, so please go to the marketplace. Everything is available for you there: information about these services, how to gain access to them, how to gain support for them, and also the documentation, etc., and then you can use them. But secondary to this: also come to us and talk to us about integration activities. We've done that throughout the project, and even after the EOSC-hub project ends this year, there's still going to be this focus on wanting to address community needs and these integration activities. As services are built into EOSC, we
really look at this, and we look at these added-value situations, where we understand that certain services are obviously linked together and we can provide much more value if we start doing that. We're doing it, and we want to build on it, so please come and talk to us about the use cases that you have. I think that wraps it up for now. Also thank you to the speakers, and thank you to Trust-IT for supporting us through this, because it's been difficult at times, but it's been very, very nice, and I think it's been a very nice conference experience to be able to do this online; it's changed so much. So thank you. Thanks, John. Thanks to you. So, while you were away, I was just saying that I pasted in the chat the link to the agenda for tomorrow; we start at 10 a.m. again, and there are the new links to connect to the sessions. So thanks, everyone, have a nice evening. Okay, thank you, people. Thank you.