We'll record this session; in fact we're already recording, and we'll also share the slides with you. We have two excellent speakers today: Elli Papadopoulou from OpenAIRE, talking about data privacy, and Abdulrahman Azab from EOSC-hub and the University of Oslo, who will walk us through sensitive data services. We'll start with brief presentations about the projects co-organizing this webinar, EOSC-hub and OpenAIRE, and I invite Isabel Campos to start with a short EOSC-hub presentation. By the way, if you have any questions, please type them in the chat. Thank you.

Okay, thank you very much. In the next four or five minutes I will present the EOSC-hub project. As you see on the first slide, it is a large coordination effort with around 100 partners, which will run until December 2020, with the aim of integrating and managing services in the hub. Next slide. This is more or less the layout of the project. We are mobilizing providers from the EGI Federation, from the EUDAT storage platform, and from the INDIGO-DataCloud service providers to offer advanced data-driven research and innovation services. This includes privacy- and security-oriented services for data. These resources are offered via the hub, which I will describe now. The high-level goals of the project are to simplify access to resources and services by providing an open and integrated service catalogue, and to reduce the fragmentation of service access and provisioning; the way to do this is through interoperability, standards, and good practices in service implementation and development. A more long-term aim is to consolidate research infrastructures by improving service quality, with the final goal of widening access to services to all user groups. This is a very ambitious project in terms of objectives. To that end, the project has created the EOSC Hub, a federated integration and management system in which you can find services. There are federation services, which are meant to keep the whole system operating; these are oriented not to end users but to infrastructure managers. Then there are services such as data and application tools, training, consultancy, et cetera, which are more oriented to end users. We also have quite some activity in processes and policies, and the implementation of guidelines, et cetera. What can you do with EOSC-hub? Many things, of course. To begin with, you can do computing with your data: you can run computational jobs at large scale on the EGI infrastructure, much in the same way as the LHC experiments do. This we know; we have known how to do it for many years. If you think you need a service for distributed computing or distributed analysis of data, just ring the bell. We have centres like CSC and the University of Oslo, which will be presented later, that also provide tools and services related to data privacy. What else? We have federated computing, plus Infrastructure-as-a-Service and Platform-as-a-Service offerings to support cloud computing and data-intensive workloads. I put the links in the presentation so that you can check them when we are done, or if you have questions. We can also host long-running services there, such as web services, databases, et cetera, and we have infrastructure available for testing and development.
There we have very nice features like single sign-on, Docker containers, and technical support during the project lifetime. We have very fancy things like Jupyter notebooks, which are so much in fashion and which are perhaps a nice way to move towards the famous intelligent publications, intelligent articles. Okay, this has basically consumed my time. I put here the web page of the project, eosc-hub.eu, and I have added some links there, including the marketplace, which has now been brought into the EOSC Portal and will evolve from there. I hope you find something interesting there. Thank you.

Thanks a lot, Isabel. If you have any questions, please type them in the chat. I will now briefly talk about the OpenAIRE project, which supports open-science scholarly communication and also offers monitoring services on open science. We do this by building a scholarly communication graph. We harvest metadata from open access repositories with publications, from data repositories, from research information systems, and from other providers of scholarly content; we deduplicate this harvested metadata and provide links between publications, datasets, funding information, and other interlinked scientific products and outputs, and on that basis we offer different services. For example, part of our portal is called Explore, and that is a place where you can find publications and datasets, most of which are available in open access. We also offer Zenodo, a shared repository where any researcher can deposit any research output. We provide monitoring services for funders and research analytics services; we have information about open science developments in different countries, and we help funders and institutions monitor the uptake of open access and other open-science developments. We also have services for content providers (repositories, open access journals, data archives): we enrich their usage analytics, and we enrich their metadata with additional information that we collect. And we have services for researchers, such as a helpdesk where you can ask any questions about open science, and training events like this one. I already mentioned Zenodo, which is a repository for publications. We also have a new portal, a virtual research environment where different research communities can store and share their research outputs; we have a data anonymization tool called Amnesia; and we are launching a machine-readable data management planning service. Everything we do is available as open APIs.
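As a small illustration of those open APIs, here is a minimal sketch of querying OpenAIRE's public search API from Python; the keyword is an arbitrary example.

```python
import requests

# Query OpenAIRE's public search API for publications matching a keyword;
# format=json requests a JSON rendering of the response.
resp = requests.get(
    "https://api.openaire.eu/search/publications",
    params={"keywords": "data privacy", "format": "json"},
    timeout=30,
)
resp.raise_for_status()
header = resp.json()["response"]["header"]
print(header)  # includes the total number of matching records
```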
One more slide with a list of specific services for researchers: the Zenodo repository; the Amnesia tool to anonymize your datasets; more information about open science training and support. One of our products is also Scholexplorer, which helps to interlink data and literature. That was a quick introduction, and now I hand over to Elli for the main part of the webinar.

Hi everyone, I'm Elli Papadopoulou. I'm a librarian, I work for the Athena Research Center in Greece, and my role is to act as the National Open Access Desk for Greece. For this webinar I have prepared a brief introduction to data processing, which is part of the data management life cycle, focusing on data privacy and sensitive data, so that it ties in with the services that we will present later. We'll see what data processing is, what its position in the data management life cycle is and what its components are, and then move on to data privacy, its elements, and how to handle sensitive data. Data processing is the operational phase in which raw data is manipulated to produce meaningful information; basically, it covers all processes taking place from immediately after the collection or creation of data until the deposit of the data. What are these data? Textual files, images, audio, and of course their metadata; they can be electronic, digital, analogue, and physical files and materials. Here I have some basic examples: you can see a table with information from a survey that was imported into a spreadsheet, which is the raw data, and below it, after analysis, a visualization of this information; on the right-hand side you can see some photos that were edited and filtered after processing. The data management life cycle has many stages, but data processing takes place between data collection and data preservation, and it has to do mainly with the handling and curation of data; it is something we have to keep in mind when writing a data management plan. These processes involve ingestion or aggregation, analysis, classification, data enrichment, organization, validation, and storage; basically the things we do in research: combining pieces of data, ensuring that the supplied data is correct and relevant, and preparing data for deposit in a proper format supported by the repository or the secure place where it will be deposited. There may also be reprocessing of data, for migration to new formats and software, so that long-term preservation of the data is ensured; data processing involves data disposal as well. Regarding documenting data processing in a data management plan, in a FAIR manner, following the specifications and requirements coming from the European Commission: we have to make sure that the data are deposited in a format supported by the repository, that the metadata are created using standard protocols and schemas, and that persistent identifiers are applied to the datasets. These are all part of data processing and play a role in the compliance of the datasets. This information can also be found in the data management or repository policies of organizations and projects; they contain retention information and permissions for how and when to withdraw data, what the exact format of the data should be, which metadata standards are supported by the information systems, and where to store the data and how to back it up. All this information about how an organization processes data is highlighted in those organizational policies.
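To make the deposit step concrete: here is a minimal, hypothetical sketch against Zenodo's REST deposit API (Zenodo was mentioned earlier as one of the available repositories). The access token, file name, and metadata values are placeholders; Zenodo mints a DOI, a persistent identifier, when the deposition is published.

```python
import requests

TOKEN = "..."  # personal access token (placeholder)
BASE = "https://zenodo.org/api/deposit/depositions"

# 1. Create an empty deposition.
r = requests.post(BASE, params={"access_token": TOKEN}, json={})
r.raise_for_status()
dep = r.json()

# 2. Attach standard metadata (illustrative values).
metadata = {"metadata": {
    "title": "Survey results (processed)",
    "upload_type": "dataset",
    "description": "Cleaned and anonymized survey data.",
    "creators": [{"name": "Doe, Jane"}],
}}
r = requests.put(f"{BASE}/{dep['id']}", params={"access_token": TOKEN}, json=metadata)
r.raise_for_status()

# 3. Upload the data file itself.
with open("survey_clean.csv", "rb") as fh:
    r = requests.post(f"{BASE}/{dep['id']}/files",
                      params={"access_token": TOKEN},
                      data={"name": "survey_clean.csv"},
                      files={"file": fh})
r.raise_for_status()
```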
Zooming in on data privacy and sensitive data: there are currently two major bodies of law and directives in the EU. One is the General Data Protection Regulation, which deals mainly with personal data, and with sensitive data as well, so that the user has better control over the data acquired by commercial providers; the other is intellectual property rights, with all the directives from copyright to trade secrets and the patent system. Sensitive data can be of multiple types. Personal data can be sensitive data. We should not forget that confidential data are also perceived as sensitive, because trademark and investigation data may be involved, as well as security data such as passwords, financial information, and national safety and military data. Sensitive data also include data protected by intellectual property rights; location data, geodata and data coming from mobile phones; and data on endangered species and other data coming from the biodiversity community. A combination of different datasets can also be sensitive, and this is why organization is crucial when we handle sensitive data. More specifically, for personal data, the special categories are: racial or ethnic origin, political opinions, religious and social beliefs, trade union membership, genetic and biometric data, physical or mental health, sex life or sexual orientation, and criminal offences. But what are the best practices for securing your data, and sensitive data in particular? I have listed six here. You can secure your data by applying access controls: having passwords for the datasets and a firewall to avoid malicious attacks by hacking or computer viruses. You can anonymize your data by removing or aggregating variables, or by reducing the precision of detailed textual variables, so that even when anonymized datasets are combined, no one is able to identify the subject or the specific sensitive data. You can encrypt data, encoding the digital information; you keep control, and you give the key for decryption only to the specific subjects and users that you know. You can store your data in a secure place; of course this does not include cloud drives like Google Drive or OneDrive, since these come from commercial providers and are connected to the internet, so they are not that safe, and it is bad practice to store such data there. Preferably, store data on an isolated machine, where the server is not connected to the internet, so that malicious attacks are avoided. Secure disposal is also a key issue when we want to delete data: we have to make sure that no data recovery is possible, and deleting alone does not ensure this, so we have to properly delete from the root of the source we are working on and uninstall the related tools and software.
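As a concrete illustration of the anonymization and encryption practices just listed, here is a minimal sketch in Python; the file name and column names are hypothetical, and it uses pandas plus the cryptography library.

```python
import pandas as pd
from cryptography.fernet import Fernet

# Hypothetical raw survey file with direct and quasi-identifiers.
df = pd.read_csv("survey_raw.csv")

# Anonymize: drop direct identifiers and reduce the precision of
# quasi-identifiers so combined datasets cannot single out a subject.
df = df.drop(columns=["name", "email"])
df["age"] = (df["age"] // 10) * 10                    # 37 -> 30 (age band)
df["postcode"] = df["postcode"].astype(str).str[:2]   # region level only
df.to_csv("survey_anon.csv", index=False)

# Encrypt the anonymized file; keep the key separate from the data.
key = Fernet.generate_key()
with open("survey_anon.csv", "rb") as fh:
    ciphertext = Fernet(key).encrypt(fh.read())
with open("survey_anon.csv.enc", "wb") as fh:
    fh.write(ciphertext)
```

The key should be stored away from the encrypted file, for example in a key vault, and shared only with the users who are allowed to decrypt.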
These are some useful resources where you can find information about data security, data privacy, and sensitive data. I have also included Amnesia, the anonymization tool provided by OpenAIRE, which Irina mentioned as well. And just so you know, OpenAIRE has a task force at the moment producing guidelines for sensitive data, with an emphasis on long-term preservation; this is forthcoming, and we will let you know when it becomes available. Questions, I think, come at the end, but thank you very much for your attention.

Thanks a lot, Elli; that was just what we needed. Now Abdulrahman will talk about the sensitive data activities in EOSC-hub.

Yes, hello, my name is Abdulrahman Azab. I work at the University of Oslo, on the Services for Sensitive Data there, and I'm co-leading task 6.6 in EOSC-hub, which covers the sensitive data activities. The sensitive data services in EOSC-hub include two services: one is the Services for Sensitive Data (TSD) at the University of Oslo, where I work, and the other is the sensitive data cloud at CSC in Finland. I'm going to present both of them briefly. First, the ePouta secure cloud, the sensitive data cloud at CSC in Finland. There is a video about it; I will share the link with you in the chat. Let's go through the description. At CSC ("pouta" means cloud in Finnish) they have two cloud services, cPouta and ePouta: cPouta is for non-sensitive data and ePouta is for sensitive data. Both are based on OpenStack, and both are Infrastructure-as-a-Service. Infrastructure-as-a-Service means that they provide you the CPU, the memory, and the disk, and you create your own virtual machines and manage them. That means that those who are going to use ePouta, the sensitive data service at CSC, need their own engineers, their own software managers, who will install the operating system and the software on top of it. The disadvantage is that you need to manage your own things; the advantage is that it is flexible: you can install your own queuing systems and your own software, with no limitation. What it provides is a collection of virtual machines, located at CSC, that become part of your network, and I will describe how this happens. So this is your site; assume that you are a hospital, for example, and you have a machine here from which you want to administer the resources in the cloud. You connect to CSC in Finland through the internet over an encrypted, secure connection; these are your end users, the researchers, and the virtual machines are on the other side. Whether you use ePouta for storage, for high performance computing, or for both, those will be your resources, and their IP addresses will be part of your network's IP address range. This means you will not feel that these virtual machines are somewhere else: they look like part of your local hospital network, and your researchers access them in the same way they access any local machine in the hospital. But as I said, this is Infrastructure-as-a-Service, so it is your responsibility to install things there. On the more technical side, there is an MPLS connection, which is a secure connection, with your switch here and another switch on the other side, and there is an admin mode and a user mode. The admin mode is a web interface that accesses CSC through a firewall, where you can create new virtual machines and give access to new users; the user interface is how you access those virtual machines through the hospital network. As for the design choices: there is no internet connectivity from the virtual machines, so you will not be able to open a browser on a virtual machine, and you need to provide your own network and the list of IP addresses. Also, broken physical disks are not sent back to the vendor: we will not send disks that contain sensitive data to Dell, Maxtor, or Seagate for repair, because they contain sensitive data. If a disk is damaged, that's it; we will not have it repaired by the vendor. Those are the design choices.
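Since ePouta is OpenStack-based IaaS where you create and manage your own virtual machines, a minimal sketch with the openstacksdk library might look like the following; the cloud name, image, flavor, and network names are hypothetical placeholders.

```python
import openstack

# Connect using credentials from clouds.yaml (cloud name is a placeholder).
conn = openstack.connect(cloud="epouta")

# Create a VM attached to the project's dedicated, VLAN-isolated network.
image = conn.compute.find_image("CentOS-7")
flavor = conn.compute.find_flavor("standard.medium")
network = conn.network.find_network("project-vlan")

server = conn.compute.create_server(
    name="analysis-node-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)  # ACTIVE once the VM is up
```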
This is how to get access to ePouta: apply for a CSC project. You can apply through EOSC-hub; as I just answered in the chat, all EOSC-hub services have at least a quota that can be accessed for free. Through the EOSC-hub marketplace you will see, for ePouta at CSC for example, what the free package is composed of: how many virtual machines, how many CPUs, the storage description, everything about the free quota. This you can get for free through EOSC-hub. If your project is big and you want to do advanced research and need more and more resources, that is paid for; you can find the ePouta pricing page just by typing "ePouta pricing" into Google. And this is the link with details about how to get connected to ePouta. So that is the secure cloud, ePouta. This slide shows the sensitive data platform; it is a bit complicated for people who don't have much background in IT, so I will describe it briefly: this is the connection from outside, this is the CSC network, there is secure storage for storing sensitive data, and there will also be the possibility of a desktop connection. This is a new feature in ePouta: you will be able to have a remote desktop connection to a virtual machine.

Next is the Services for Sensitive Data at the University of Oslo. There is a video about TSD as well, which I will paste in the chat. TSD stands for "Tjenester for Sensitive Data", services for sensitive data in Norwegian. The difference between TSD and ePouta: I just described that ePouta is Infrastructure-as-a-Service, so they give you a secure connection to a collection of resources, but you need to be able to manage these resources and install your own software. TSD is different: it is not Infrastructure-as-a-Service, it is Platform-as-a-Service and Software-as-a-Service. This means that if you are a researcher, or a group of researchers, without a background in IT or programming, who are just doing research, using tools to analyze your data, not building tools or installing software, then TSD is the right place for you, because we install the software for you and manage the resources. You ask us to install a given piece of software; we support both Windows and Linux; you only need to do your research and analyze your data. This is how TSD looks: there is a firewall, and you access TSD through two-factor authentication. Two-factor authentication means that you have a username and a password, but not only those: you also need a one-time code. You install an app called Google Authenticator on your mobile, and it generates a one-time code that expires every minute; every TSD user has this on their smartphone, and you access TSD through it. Through the firewall you can access the project VMs over remote desktop: if you are a Linux user we provide a Linux VM for you, and if you are a Windows user we provide a Windows virtual machine. Those virtual machines are connected to the secure storage, and each project in TSD has its own secure storage, totally isolated from other TSD projects. There is a shared area in TSD, and this shared area does not include sensitive data: if, for example, there is a reference genome, which can be used by different projects, we put it there, because it is not sensitive; if there are pieces of software that can be used by many projects, we put them there, because they are not sensitive.
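The one-time codes described above are standard time-based one-time passwords (TOTP); a minimal sketch with the pyotp library shows the mechanism. The secret here is freshly generated for illustration (in practice it is provisioned when the user enrols a device), and note that the library's default time step is 30 seconds.

```python
import pyotp

# Shared secret, normally provisioned at device enrolment (placeholder here).
secret = pyotp.random_base32()

totp = pyotp.TOTP(secret)    # RFC 6238 TOTP; 30-second time step by default
code = totp.now()            # the 6-digit code the authenticator app shows
print(code)
print(totp.verify(code))     # server-side verification -> True
```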
In addition to that, we support high performance computing. This HPC resource is a Slurm cluster: a collection of nodes connected through InfiniBand, a high-speed interconnect, where you can do high performance computing, for example genomic analysis for variant calling, or sequence alignment if you want to run it in parallel, or anything else that requires a lot of CPUs or a lot of memory; we have huge-memory nodes for memory-intensive work. We also bought a machine called the Edico DRAGEN (I'm not sure if any of you have heard of it), an FPGA processor for genomic pipelines; we have one in TSD, and it can reduce the time of a variant-calling pipeline, for example, from three days to 20 minutes. I got a question here: is TSD storage located in Europe? Yes, it is located in Norway; physically, the storage is at the University of Oslo. This slide shows the structure of TSD in more detail: user access from outside goes through two-factor authentication, each project has its own isolated area, and you have access to a collection of virtual machines; this is Colossus, our HPC cluster. Each project has a storage area, the main storage here, and also some smaller storage on the HPC cluster. The cluster storage is faster than the main storage, but it is more expensive: one terabyte in the main storage is half the price of one terabyte in the cluster storage, because the cluster storage is expensive for the very reason that it is so fast. Accessing data on the cluster storage from the compute nodes is much faster than accessing the main storage from the virtual machines. Is it 100% secure? To make sure that it is secure, we periodically perform penetration testing, which means we engage security experts, experts who are known worldwide, and ask them to attack our systems and find vulnerabilities, to see what the problems are. We did this for ePouta and for TSD; they generate reports for us on what is fine and what needs to be considered, and we take that into account; after some time, we hire other security experts to try to hack the system again and see where the security holes are. So we take care of that. About the isolation, that's a good question. Every single project has its own VLAN; to make it simpler, it looks as if each project has its own physical switch, because a VLAN is a strict separation of networks. This storage is not seen by any other project's virtual machines, and that storage is not seen by any of these virtual machines: each project has its own VLAN, and this part of the disk is accessible only by those virtual machines; no one else can get access to it. This means that if a malicious user hacks one of these virtual machines and gets root or administrator rights, he can do nothing. We also have root squash, which means that even if you are root or administrator on a virtual machine, you cannot access the project data; you have to be a valid project user to access the data. By default, every single user has import rights and can bring data from outside to inside, but only the project administrator has export rights to take data from inside to outside.
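For users who do run their own jobs, submitting work to a Slurm cluster such as Colossus typically goes through a batch script; here is a minimal, hypothetical sketch driven from Python, where the account name, resource requests, and pipeline command are placeholders.

```python
import subprocess
import textwrap

# Hypothetical batch script for a Slurm cluster; adjust account,
# resources, and the pipeline command to the real project setup.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=variant-calling
    #SBATCH --account=p123
    #SBATCH --cpus-per-task=16
    #SBATCH --mem=64G
    #SBATCH --time=24:00:00
    ./call_variants.sh sample1.bam
""")
with open("job.sbatch", "w") as fh:
    fh.write(script)

# sbatch queues the job and prints "Submitted batch job <id>".
out = subprocess.run(["sbatch", "job.sbatch"],
                     capture_output=True, text=True, check=True)
print(out.stdout.strip())
```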
The project administrator, or the project owner, can request export rights for other users if needed, but that is the project administrator's responsibility. We have support for new services, and this is ongoing work. At this stage, you see that there is an HPC resource, and this cluster is shared by everyone; the data of every project processed on this cluster is of course totally isolated, but still everyone is accessing the same cluster, which means that if someone is using a lot of resources right now, you have to wait until those resources become available. So what we are doing here is creating a fully containerized cluster: inside the project VLAN in TSD you will have your own cluster, containerized and Docker-based, you can have as many or as few compute nodes as you want, and we will support running your jobs in both Singularity and Docker containers. By now we have support for Singularity on the HPC cluster, so if you have a Singularity container, you can run your Singularity application. So far we have more than 500 projects, about 3,000 users, more than 800 virtual machines, and more than two petabytes of data. We also support web forms and APIs for sending data in, so that patients can fill in forms; in psychology cases, for example, if you want to record the history of a patient, the patient can fill in web forms to collect information, and this is transferred securely into TSD. There is also API support for connecting e-health devices and smart devices to collect clinical data, so we support collecting clinical data instantly. And we have a consent system in TSD; I will show you the interface. The GDPR is largely about consent, and the consent system is for patients, I mean the data subjects, to provide consent for data processing for a specific purpose. When they fill in the form, they define the dataset and the purpose: "I provide consent for this type of processing, on this dataset, for this project." From inside TSD there is verification of the consent: before you as a researcher start processing or analyzing the data, you can verify whether the patient has given consent or not yet. So these, in brief, are the features of TSD: Platform-as-a-Service and Software-as-a-Service, support for both Windows and Linux, high performance computing on sensitive data, web forms, data collection using smart devices, a consent system, and anonymization of structured data, which I will describe at the end. These are the steps to get a TSD project, and you can apply through EOSC-hub as well, in the same way I mentioned for ePouta. If you are a clinical researcher doing research with clinical data, you need to get approval from REK, the regional research ethics committee, that your research is valid, and you need to submit some documents about the research. The two most important ones are these: a data processor agreement between your institution and TSD, so that when you hand over data you assign TSD as the data processor while you remain the data controller; and a commercial contract between your project and TSD. If you go for the basic package, which is provided for free, you will not need to pay anything; if you want advanced resources, you will need to sign this contract. Then you fill in the electronic application.
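As an illustration of pushing collected data into a project over HTTPS with token authentication, in the spirit of the web-form and device APIs just described: the endpoint, path, and token below are hypothetical placeholders, not TSD's actual API.

```python
import requests

# Hypothetical project intake endpoint and token; real URLs and the
# authentication flow come from the service documentation.
API = "https://api.example.org/v1/projects/p123/files"
TOKEN = "..."  # obtained via the project's authentication flow

with open("patient_form.json", "rb") as fh:
    resp = requests.put(
        f"{API}/patient_form.json",
        data=fh,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=60,
    )
resp.raise_for_status()
print(resp.status_code)  # e.g. 201 when the file is accepted
```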
And that's it: apply for a TSD project, upload your data, and start working. Here is the user guide, and if you want more about pricing, search for "TSD pricing". One EOSC-hub activity we are currently working on is using B2SHARE to publish metadata about sensitive data. This is how it will work: B2FIND, the search engine for B2SHARE, will allow researchers from outside to search for the metadata about your research once you publish it; they can then request access to the data behind that metadata, the request will be passed to you, and you can grant access to the data for their research. So, in brief, you will be able to publish metadata about your datasets on B2SHARE; this metadata will be publicly findable by other researchers; if someone finds it interesting, they will contact you, and you will be able to grant access. The mechanism is a bit complex and I will not be able to describe it fully here, but you will get the slides; this diagram shows the structure of how it takes place, and if you have deeper questions you can ask me later, when you have the slides. We currently have support for uDocker in TSD, so we support both Singularity and uDocker. The advantage of uDocker is that you work with Docker containers directly: you don't need to convert the Docker container to a Singularity container, as you do with Singularity; you use the Docker container as it is, and it has support for MPI parallel applications. ePouta is also working with it. For data anonymization, we currently have a pilot of Amnesia, a data anonymization tool for structured data. It is not for unstructured data: if you have, say, a FASTQ file that you want to anonymize, this is not the tool for you; but if you have your data in a structured format, like an Excel sheet, this is the tool for you. Once the pilot is tested, we plan to have an Amnesia instance on every single TSD project; we will support it as a service. This is the link for the Services for Sensitive Data in EOSC-hub. There are also other Nordic activities. We are planning a connection between ePouta and TSD, so that if your TSD resources are not sufficient, you can get resources inside ePouta to use within your TSD network. And we have Tryggve, a Nordic project that includes both TSD and ePouta, plus a site in Denmark, Computerome, and a site in Sweden, Mosler, both for sensitive data. So if you have a project with some data in Denmark and some data in Norway, and you need to do distributed processing on this data, you can contact Tryggve. What is this project about? Just google "NeIC Tryggve" and you will find the website and all the information. Those are the contacts: Maria Francesca is the leader, I'm the co-leader, and there is Antti Pursula from CSC, in addition to Kriss. That's it, so please ask questions; I'm trying to find the chat.
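For the metadata-publishing flow just described, a minimal sketch of creating a draft, metadata-only record through B2SHARE's HTTP API might look like this; the access token and community UUID are placeholders, and the exact fields depend on the metadata schema of the community you publish under.

```python
import requests

# Create a draft record on B2SHARE carrying metadata only, so the dataset
# is findable while the sensitive data itself stays inside TSD.
TOKEN = "..."  # B2SHARE access token (placeholder)
record = {
    "titles": [{"title": "Clinical imaging study X (metadata only)"}],
    "community": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "open_access": True,
}
resp = requests.post(
    "https://b2share.eudat.eu/api/records/",
    params={"access_token": TOKEN},
    json=record,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["id"])  # id of the draft record
```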
Thanks a lot. Let me also try to locate the chat; can you see the chat now? Yes, I can see it. So maybe let's start from the beginning, or from the end. I think I answered the question about security, right, about what kind of guarantees are provided for security; I answered that. There is also a comment saying "that sounded more like huckaboo" when you were describing security, but I don't know whether it was just a comment or a question. Yeah, so what I described is that, to ensure security, we are doing penetration testing; this is essentially trying to hack our own system in order to see where the security holes are, and we do this periodically: we hire experts to try to hack the system. The penetration testing is divided into two parts. Part number one: if you are not a TSD user, can you log into TSD or not? Part number two: if you are a TSD user on project X, can you access the data of project Y or not? Those are the tests we normally do; we performed this penetration testing, and the results were fine. And mainly, we have a firewall for TSD and VLAN separation between TSD projects. I hope that answers the question. How long will the data be stored in TSD, and what is the prediction for the lifespan of TSD itself? I will start by answering the last question: TSD began as a project infrastructure, but now it is a service, so there is no end time for TSD; it is permanently supported by the University of Oslo. How long the data will be stored in TSD depends on how long your research takes: when you sign the agreement with TSD, you define the start time and the end time of the project. After the end of the project, the data can be archived in TSD and we can keep it for some time, but you need to define ways to export your data, and you are responsible for your data after you end your research. We will not keep it for you forever after you finish your analysis; otherwise we would not have enough storage for everyone. So you define how long your project will take, and of course you can extend this if it takes longer. What about the access of admins who have root access to the system; can they potentially access sensitive data? This is a very good question, because we have, let's say, different layers of administration: not every administrator in TSD can do everything. We have, for example, the normal TSD administrators, and then we have another group, TSD core, and those are the ones who can do anything; we have to have this group of people because we need to be able to solve problems. But we have a very strict rule: whenever we need access to sensitive data in order to troubleshoot something, we have to ask the project administrator ("to solve this problem I need to access this; may I do so or not?"), and every administrator is obliged to ask before taking any action. Do you follow any organizational standard, and is there evidence and argument that this service is safe? We have a risk analysis document; there is a TSD white paper, and you can send us an email to get it, and we also have a document about the TSD risk analysis that we can give you. As for the security standards, I will not be able to describe everything right now, but you can ask me specific questions about what you are interested in and I can give you the answers: we have security rules for data storage, for data processing, for access, and for user separation, so it depends on what you are asking about, but we can provide you the details. Our users are social scientists, not IT professionals; is TSD for them? Yes, of course, then TSD is for you.
We do not assume our users are IT professionals. Some of our users are very competent in IT and can do things; they actually help us sometimes to verify things and to test new features. But most of our users are of the other type: we expect our users to be researchers, not IT specialists. Next: you are using Singularity; is it because of security concerns over Docker? Well, that is part of the answer, yes, because Singularity by default runs containers as the user, while Docker by default runs containers as root. But there is a solution we provide right now in TSD for running Docker containers, just not on the main cluster. The concern there is not about security, but that our cluster has an older kernel; it is CentOS 6, and it is not ideal for Docker. We will have a new version of the cluster by the beginning of the year, with a newer operating system, and by then we will be able to run Docker on the main cluster. So, do we support Singularity? Yes. Do we support Docker? Yes. Our support for Singularity is currently wider than our support for Docker, but that is going to change soon; we have a degree of support for both platforms, and of course we don't use Docker as it is: we sandbox the Docker container runs so that they are secure. Do you use specific software to create the secure environment for a project on the HPC cluster? I'm not sure what "secure environment" means here, but our cluster runs Slurm, and the storage on the HPC cluster is separated by Linux permissions. Each project, I mean each project that is using HPC (not every project needs HPC; some people need TSD only for storage), has a temporary storage area on the HPC cluster where they can store sensitive data, and access to that storage is allowed only for the group of members of that project; no one has access to that directory other than this group. Security-wise, Singularity is also quite full of holes? Yes, we know that. To be honest, before version 2.6 we were getting an alarm almost every two weeks: delete all previous versions of Singularity and install this new version, because a security hole was found; and then, after one month, another email: we discovered another security hole, delete all previous versions and install the new version. Now it is quite a bit better. In addition, even if you got root inside a Singularity container, we have root squash. Root squash is a level of security that cannot prevent malicious access outright, but it stops you from making mistakes: if you are root you can impersonate anyone, but if you run something as root you will not be able to access the user data, because you need to be a user. With what we support in Singularity, we try to test all the Singularity versions against security risks, and if you use Singularity you will not get an interactive shell in the container. This means that even if there were a problem that let you become root, you would not be able to access the data, because you would need to have a shell.
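In practice, that policy means containers are used to execute a single tool rather than to open a shell; a minimal sketch of such a run, where the image name and the command are hypothetical:

```python
import subprocess

# Execute one tool inside a Singularity image, as an ordinary
# (non-root) user and without an interactive shell.
subprocess.run(
    ["singularity", "exec", "pipeline.simg",
     "bwa", "mem", "ref.fa", "reads.fq"],
    check=True,
)
```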
Yeah, and you have all these problems because Singularity is a tool for system administrators: in the end, what it does is expose features to regular users that belong to the root user, and then it tries to wall that off so that the user does not escape from there. And that is very difficult; that is bound to lead to trouble, as we know. That is why I was pointing out another service here. It is a philosophy problem of Singularity that it normally uses a setuid binary, which is very, very dangerous: it means that it depends entirely on the developer not to allow any security threat, and if there is one security hole, the next one will follow. uDocker, on the other hand, runs in user space and is more secure, so I think it is to be recommended; but Singularity will still be important to us, and we will be careful about that. Is it better now, without an echo? I guess it is. I just wanted to say that there was also a question from Christina: TSD may also support complying with the rights of the data subjects; did you provide a system for withdrawal of consent? Is this question about the consent system? Yes. Did we answer that already? The answer is yes: we have this system to provide consent about the processing of specific datasets. If you receive clinical data, you definitely need consent to analyze that data, and the web form connected to TSD allows the data subject to submit the consent; and yes, also withdrawal of the consent, because in the GDPR the right to be forgotten is an essential part. Then there was another question, from Narquez: in other words, how do you create the sandbox? Which sandbox is it referring to? I just need clarification about what "sandbox" means here. He is referring to the secure environment on the HPC cluster. A sandbox: are you referring to the solution of one HPC cluster for all, or to the solution where each project has its own HPC cluster? Each one. Okay, if you are referring to the second solution, one HPC cluster per project: each project will have a collection of virtual machines, and on those virtual machines you can install Slurm as a queuing system, or HTCondor, or also Docker Swarm if you are familiar with Docker, and you will use those virtual machines as compute nodes. Virtual machines in the TSD system can be used in three categories. Either only as login virtual machines; those will be very small, with something like two CPUs, because you use them only to log in and access your data. Or you can use a virtual machine to run services; for example, if you would like to have OpenClinica, RStudio, or Shiny Server, you get a virtual machine for that. Or you can get a compute virtual machine, a big one with 32 or 64 cores, and you can have a collection of those to build your own HPC cluster within your own project. That can be the better solution for you, because you will not be in contention with everyone else on the same HPC cluster, and you will not need to wait in the queue for your jobs to be allocated resources; and this will all be within the VLAN of your project.
So those virtual machines are project virtual machines: no one other than your project members will get access to them. Okay, Narquez is also asking about the secure environment on the HPC cluster, Colossus. For Colossus, the jobs of a specific project are isolated from the jobs of all other projects. For example, when you list the jobs in the queue, your jobs will be listed under your name, and the other jobs will be listed as "nobody": you will not be able to see the names of the other jobs, and you will not be able to access their files. There are certain configurations on the queuing system that we apply for this purpose. And the new policy will be that every project has its own compute nodes: when you want to use the HPC cluster, you book a compute node, or a collection of compute nodes, for the project. This gives more isolation: while your jobs are running, these compute nodes are yours and access only your data, while other compute nodes are used by other people, so no jobs from different projects will run on the same compute nodes. Thank you. And there was another question, from Garret; I'm trying to scroll up to find it, just a minute, yeah, I'm reading it now. "The suite of EOSC services is very exciting; however, there is a perception that it is designed to accommodate big open science projects. I'd like to get an idea of how it would scale for small-size projects: for example, would it be possible to deploy an access and preservation suite for a community archive project, say with AtoM and Archivematica, that could utilize EOSC Docker containers as storage?" EOSC Docker containers as storage: are you referring to using containers as storage elements? I guess; Garret is typing, so let's see. Yes. Okay. Well, our use of containers, and even Docker's own policy, is not to use containers to store data. You can use containers for tool portability, but not for data portability. So our recommendation for users is always: make one tool per container, or a few more, but don't store data inside the container. We don't like to see a container of 100 gigabytes, because containers are not really designed for that. You can use volumes with Docker, for example, but do not store the data inside the container unless it is really necessary. If you are going to store data inside the container, then you may as well use a virtual machine: you can have a bunch of tools and a lot of data in one virtual machine. Containers are not replacing virtual machines; virtual machines are good for some things and containers are good for others. If you want a huge object with a lot of tools and a lot of data, it is better to have a virtual machine; if you want nice, small, lightweight virtualization for some tools, and you want it to be portable, then you use containers. If we opened up for huge containers, we might get into resource problems that we don't want the complexity of solving. So I don't say that big containers are not allowed, but there are some limitations.
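A minimal sketch of that recommendation, keeping data on the host and mounting it into a tools-only container as a volume; the image name and paths are hypothetical.

```python
import subprocess

# Run a tool from a tools-only image against data kept on the host,
# mounted as a volume, instead of baking the data into the image.
subprocess.run(
    ["docker", "run", "--rm",
     "-v", "/data/project:/data",   # host directory -> container directory
     "genomics-tools:latest",       # image holds tools only, no data
     "samtools", "stats", "/data/sample1.bam"],
    check=True,
)
```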
Thank you very much, Abdulrahman; that was a very lively discussion, and thank you all for your questions. So once again, thank you Abdulrahman, Isabel, Elli, and Ellen, who organized this webinar. I would like to echo Ellen's request: if you feel a need for more webinars like this, please drop topics in the chat, and we will organize more joint webinars with EOSC-hub and OpenAIRE. I will send you the slides and the recording. I wish you a good rest of the day, and thanks again to our excellent speakers and to all of you for your questions; I really enjoyed it. Thank you, bye.