So good morning, everyone, and welcome to this course on OpenBIS. I put on my slide that it's Katrina and Henry giving the course, but in fact I think there are several other people giving the course, so they will introduce themselves. We're taping the first part of this course, so please mute your audio and use the chat during this part. The rest of the time, please feel free to turn on your camera, ask questions, and interact. This is meant to be a hands-on course. So, without further ado, I'll hand over to Katrina.
Hi, thank you very much, Nick, and thank you for organizing the course and giving us the possibility to present today. I think that if you stop sharing your screen, then I can share mine. I usually start with an overview of the research workflow in an experimental or a computational lab. If you are in an experimental lab, you always have some samples, which can be of different types, that you want to characterize by taking some measurements. If you are in a computational lab, you gather data from different sources, and from there the process becomes very similar. You have data, which we call raw data, and metadata associated with them. These data are very often processed and then analyzed, and in all these stages lots of data are generated, and metadata as well. And this of course is an iterative process. In the end, when something interesting is found, this will eventually lead to a publication where you can share your results with the scientific community. Nowadays, the funding agencies and the journals require that the data be shared and published according to the FAIR data principles. I hope you are familiar with them by now, but FAIR stands for findable, accessible, interoperable, and reusable, and here is the publication where these principles were first introduced.
The thing is that during the research process lots of data are usually generated, so in order to avoid drowning in data, you shouldn't leave it to the last moment, when you reach the publication, to think "oh, I need to publish my data in a FAIR way". If you start treating your data in a FAIR way from the very beginning, then the process becomes much, much easier. So this is what I call the data spread, which we see very often at ETH, but it's not really peculiar to ETH. We see this in many other places. We now have collaborations with many institutes outside ETH as well, and I can tell you that the situation is very similar pretty much everywhere. What I have here are the different components of what is produced during a research process. You have the protocols, which are the standard operating procedures, the samples, the materials, the code for analyzing data, and then the data, which are the raw data, the processed data, and the analysis. And still, very often, everything is described in paper notebooks. What I have in the middle are the storages available. This is what is available at ETH, but you probably have something very similar in your institution. We have the local hard disk, the network attached storage. This is cost defined storage. These are the tapes for long term storage, clusters, and cloud. Very often we see a situation like this: we have bits and pieces a little bit everywhere, no one knows where things are, and the description is still in paper notebooks, completely disconnected from the rest of the data, which is now very often in digital format. The ideal scenario that we want to see would be this: I have everything in one place. This can be achieved by adopting a solution that is at the same time an ELN, where ELN stands for electronic laboratory notebook, and a LIMS, where LIMS stands for Laboratory Information Management System. And this is essentially what OpenBIS is. So OpenBIS has three components.
The first one is the inventory management, so you can use OpenBIS to manage samples of any type. They can be biological samples, chemical samples, reagents, and also all the procedures and so on. Then there is the electronic lab notebook part, where you can describe your experiments and also all the processes, the data analysis, and all the steps in the processing of the data. And finally there is the data management part, where you can store data connected to the experimental description, so basically you have everything in the same place. OpenBIS has been developed at ETH since 2007, so it has been around for quite some time now. It's open source software, so anyone is free to download it and use it on their own if they are able to maintain the software. It's a platform for managing scientific information from the very beginning until the end, so from when you start producing data from your samples until the publication, basically. It can be used in many scientific fields. We started with life sciences, but in the past couple of years we started working with more scientific communities, so at ETH we serve basically the whole ETH community. As you heard before, we also have some users from EMPA, for example, so we serve the physics community, material sciences, environmental sciences, and so on, because in the end OpenBIS is fairly generic software and can easily be adapted to different use cases. It is used by research groups and facilities at ETH, where we provide it as a service, but also in many other Swiss institutions and European universities. Here are a few examples of the places that are currently using OpenBIS. So in a nutshell, OpenBIS is a solution for research labs that is meant to foster and facilitate collaborative work. It's not software that someone would install on their laptop. It's a client-server application, so the software is installed on a server and the users access it via a web interface.
You can use it for managing your samples. This is what OpenBIS looks like. Of course, during the training we will see these things in more detail, but this is just an overview. There is an inventory in the ELN-LIMS component of OpenBIS where we have a materials folder, and in this materials folder you have several collections. What I'm showing here is the default life science version, but of course this can be adapted, so it can be collections of anything. It could be solar cells, solar panels, batteries, whatever, so it doesn't necessarily have to be life science specific. If we have a chemical collection, then here I see my list of chemicals, so I have a table that I can also export and so on. OpenBIS can also be used to manage protocols, so these are the standard operating procedures. Again, in the inventory part of the system we have folders for the methods, and again here we have the default for life sciences, but this can be customized. If I go to my collection of general protocols, this is an example of one protocol. I have the name of the protocol, then I have the procedure and the description here. And what I see listed here as parents are basically the things that I have used in this protocol. I can create links in the system between different entities, and we will see this in detail later in the training. Then there is the ELN component. By default, in the laboratory notebook part of OpenBIS there is a folder for each person, and inside this folder each person can create their own projects, experiments, and what we call experimental steps. Here we have the description of an experimental step, so you have the goals and the results, you can put pictures in here, and here we see again this section of the so-called parents. These are basically the things that I have used in this particular experiment or experimental step, so the samples and the protocols, and I can also visualize these as a tree.
So this tree is automatically generated by OpenBIS, and I see that this experimental step is linked to this media, this protocol, these buffers, and then this yeast, and I also see where this yeast in turn comes from. Basically, I see the history of all the things. This is very useful for tracing back the history of what has been done. Then we have the data management component. This is very important in OpenBIS. I would say that this is one of the strengths of the system, because originally it was born primarily as a data management platform. In OpenBIS the data can be ingested in three ways. The first way is via the web interface: there is an upload button in your experiments or experimental steps, or also in the samples, and you can upload data in this way. The second way is via PyBIS. This is the Python API for OpenBIS, so if you are more of a command line person you can also interact with the system using this Python API, without necessarily using the user interface. These two ways are suitable for data that is not very large, so in the range of a few gigabytes. If you have very large data, which could be for example microscopy data or other large data of any sort, this is not the way to upload the data. For that we have what we call the Dropbox mechanism, which has nothing to do with Dropbox, the commercial program. Basically, how this works is that the data to be uploaded are put in a Dropbox folder. This can be a manual process, so the users just drop the data in this folder manually, or it can be an automatic process. Of course the infrastructure has to be set up, but the data can be moved directly from a measuring instrument of any type, whether it is a spectrometer, a microscope, a sequencer, or whatever. The data are moved to this Dropbox folder, and from there they go to OpenBIS. And the data in OpenBIS are always attached to something, to an experimental description.
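As a rough illustration of the PyBIS route mentioned above, here is a minimal sketch of uploading a small file and attaching it to an experiment. The server URL, the experiment identifier, and the `RAW_DATA` data set type are placeholders that depend on how a given OpenBIS instance is configured. `Openbis`, `login`, `new_dataset`, and `save` are PyBIS calls; the two helper functions `build_dataset_request` and `upload` are hypothetical names used only for this example.

```python
from pathlib import Path


def build_dataset_request(experiment, files, dataset_type="RAW_DATA", props=None):
    """Collect the keyword arguments passed to PyBIS' new_dataset().

    Data in OpenBIS are always attached to something, so this sketch
    refuses to build a request without an experiment identifier.
    """
    if not experiment:
        raise ValueError("a data set must be attached to an experiment")
    if not files:
        raise ValueError("at least one file is required")
    return {
        "type": dataset_type,          # assumed type; instance specific
        "experiment": experiment,      # e.g. "/SPACE/PROJECT/EXPERIMENT"
        "files": [str(Path(f)) for f in files],
        "props": dict(props or {}),
    }


def upload(url, user, password, request):
    """Register the files as a new data set and return its permanent ID."""
    # Imported here so the helper above can be used without PyBIS installed.
    from pybis import Openbis  # third-party: pip install pybis

    o = Openbis(url)
    o.login(user, password)
    try:
        ds = o.new_dataset(**request)
        ds.save()
        return ds.permId
    finally:
        o.logout()
```

A call could look like `upload("https://openbis.example.org", user, password, build_dataset_request("/MY_SPACE/MY_PROJECT/MY_EXPERIMENT", ["results.csv"]))`, where every value is a placeholder. This route is meant for the small uploads described above; very large data would go through the Dropbox mechanism instead.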
So here I have my experiment, in this case a flow cytometry experiment, but it could be any experiment, and here I have the data that are related to this experiment. This is what I mentioned before: you have everything in one place. We also have a tool called Big Data Link, which is essentially a command line tool that allows you to use OpenBIS as a metadata repository. This tool should be used in cases where you have very large amounts of data that, for whatever reason, you don't want to move to the OpenBIS managed storage, but you still want to keep track of. In this case you can install this tool in the place where the data are stored and then tell OpenBIS where the data are. Basically, what happens is that a folder is created in OpenBIS, which is called a data set, and it holds the information about what data are in the data set and where the data are. I then also have the SSH command that I would use to access the data. I cannot access the data directly from OpenBIS in this case; I would have to go to the server or the cluster where the data are stored. With OpenBIS we also offer integration with a couple of tools that you can use for data analysis. OpenBIS itself is a data management platform and ELN, so it doesn't do data analysis, but we connect with two tools, Jupyter and MATLAB, and this is what you will see in the second part of the training today with Henry and Jarunan. For those of you who are not familiar with Jupyter notebooks, this is what a Jupyter notebook looks like: you have essentially some text, then your code, and the output of your code, all together. The integration with OpenBIS works like this: we provide a JupyterHub server, and if you use that you can launch notebooks from within the OpenBIS interface.
The idea is that you have your raw data stored in OpenBIS, you launch a notebook to analyze these data, and then the notebook can be saved back to OpenBIS. We also provide a tool that allows users to use their own Jupyter installation. This is essentially a JupyterLab OpenBIS extension that allows you to use your own installation rather than the JupyterHub server. What this extension does is essentially create three additional buttons, which I realize are actually not shown here in this notebook: one button to connect to OpenBIS, one button to download data from OpenBIS, and another button to upload the notebook back to OpenBIS. We also have a tool for data analysis with MATLAB, and this will be presented by Henry this afternoon. And finally, I want to mention the OpenBIS APIs. API stands for application programming interface, and OpenBIS offers several of these APIs, so it is possible, as I already mentioned, to interact with OpenBIS in a programmatic way. If you don't want to use the user interface, you can script everything and interact with the program in this way. Some additional features of OpenBIS are the relationships I already mentioned, so you can create relationships between different entities of the system, and this helps a lot in reconstructing the history of things. You can see all the experiments where a sample has been used, for example, or all the measurements that have been taken on a sample, and so on. There are import and export functionalities in OpenBIS, so you can export everything from the system. The tables that I showed you in some of the screenshots before can be exported, you can also export a complete notebook, for example, and of course you can import different types of data.
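To picture how the parent-child relationships mentioned above let you reconstruct the history of things, here is a small self-contained sketch. It is a toy model in plain Python and not how OpenBIS stores relationships; the `Lineage` class and the entity codes such as `EXP-STEP-1` are made up for illustration.

```python
from collections import defaultdict


class Lineage:
    """Toy graph of parent-child links between entities."""

    def __init__(self):
        self._parents = defaultdict(set)   # child -> direct parents
        self._children = defaultdict(set)  # parent -> direct children

    def link(self, parent, child):
        """Record that `child` was made from, or made use of, `parent`."""
        self._parents[child].add(parent)
        self._children[parent].add(child)

    def ancestors(self, entity):
        """Everything `entity` depends on, directly or indirectly."""
        return self._walk(entity, self._parents)

    def descendants(self, entity):
        """Everything that used `entity`, directly or indirectly."""
        return self._walk(entity, self._children)

    @staticmethod
    def _walk(start, edges):
        # Depth-first traversal that visits each entity at most once.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


graph = Lineage()
graph.link("PROTOCOL-1", "EXP-STEP-1")
graph.link("BUFFER-A", "EXP-STEP-1")
graph.link("SAMPLE-2", "EXP-STEP-1")
graph.link("SAMPLE-1", "SAMPLE-2")   # SAMPLE-2 was derived from SAMPLE-1

print(sorted(graph.ancestors("EXP-STEP-1")))
print(sorted(graph.descendants("SAMPLE-1")))
```

Walking towards the parents answers "what was used in this experimental step", and walking towards the children answers "where has this sample been used".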
Regarding user rights management, different users can have different rights and also different access to different parts of the system, so it is possible to control this. We have observer rights, which means read-only rights, or you can have rights to edit things but not delete them, or you can have admin rights, so you can also delete things. This is modular, and you can control who accesses what in the system. The system also provides an audit trail, which means that all the changes that are made to any entity are stored in the database and are also retrievable from the user interface, so you can see who has changed what and when. Then we have this concept of data immutability, which means that the data files that you upload to OpenBIS cannot be changed. They become read-only when they go into what we call data sets. Once stored, the data sets are immutable, so you cannot add new files to a data set and you cannot modify existing files in a data set. If you want to modify files, you have to download them, modify them, and upload them as a separate data set. For those of you who are working with samples in a lab, we have the option to track the positions of the samples in a fridge or a freezer, so you can have an overview of all the storages that are available in the lab. Connected to this, we also have a relatively new feature, which is the barcode reader. You can either generate barcodes with OpenBIS and assign them to the samples, or you can read existing barcodes and assign them to existing samples. OpenBIS is also integrated with data repositories. The ones we have at the moment are Zenodo, which is widely available to everyone, and the ETH Research Collection, which is only for users at ETH Zurich.
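The data immutability and audit trail described above can be pictured with a small sketch. This is a toy model in plain Python under simplifying assumptions, not the OpenBIS implementation; the `DataSet` and `Store` classes and the `DS-1` style codes are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType


@dataclass(frozen=True)
class DataSet:
    """A registered data set: its code and a read-only view of its files."""
    code: str
    files: MappingProxyType  # file name -> content


class Store:
    """Toy store that only ever adds data sets and logs who did what, when."""

    def __init__(self):
        self._datasets = {}
        self._counter = 0
        self.audit_trail = []  # (timestamp, user, action, data set code)

    def register(self, user, files):
        """Store a copy of `files` as a new, immutable data set."""
        self._counter += 1
        code = f"DS-{self._counter}"
        dataset = DataSet(code, MappingProxyType(dict(files)))
        self._datasets[code] = dataset
        self.audit_trail.append(
            (datetime.now(timezone.utc), user, "REGISTER", code)
        )
        return dataset

    def download(self, code):
        """Return a mutable copy; the stored data set stays untouched."""
        return dict(self._datasets[code].files)


store = Store()
raw = store.register("alice", {"measurement.csv": "1,2,3"})

# To change a file: download, modify the copy, register a separate data set.
copy = store.download(raw.code)
copy["measurement.csv"] = "1,2,3,4"
revised = store.register("alice", copy)
```

Trying `raw.files["measurement.csv"] = "x"` raises a `TypeError`, which mirrors the rule that existing files cannot be modified and new files cannot be added once a data set is stored.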
But basically the idea is that you use OpenBIS in your daily work, so your data are ideally all in OpenBIS, and when the moment comes that you want to publish the data using a platform such as Zenodo, or the ETH Research Collection for ETH users, you have an easy way to retrieve the data from OpenBIS and send them to these data repositories. Now I just want to present a couple of use cases very quickly. The first one is one of the first labs that started using OpenBIS at ETH as an ELN; before that it was primarily used as a data management platform. This is the Weis lab in the biology department of ETH. They do different types of analysis on cells, they study protein biochemistry, and they use different technologies like RNA technologies, sequencing, microscopy, and so on. They started using it in 2016, and since the year after it has been mandatory, so the lab members are forced, let's say, to use OpenBIS, and all their data have to be in OpenBIS. They use the inventory part of the lab notebook, where they have all their materials and their protocols, and actually these were imported from a previous database that they were using. They also use the parent-child relationships that I mentioned before a lot, to link experiments to reagents but also to link one experiment to another. Another use case that I want to show you is the Bedretto project. This is again at ETH, and it is a different use case, so this is not life science. I told you that OpenBIS can also be used in other scientific fields. In the Bedretto project they are taking measurements from the Bedretto tunnel, which is a tunnel in Switzerland where they have several boreholes, and basically they are extracting energy from this tunnel and studying whether geothermal energy can be used as a source of renewable energy.
Basically, they have sensors, and they take data from these sensors that have to be analyzed, and everything is stored in OpenBIS. In this case they have organized everything on a project level. When I showed you the ELN at the beginning, I said that every person has a personal folder where they can create projects and so on. That is the default, but you can also structure OpenBIS in a different way, and in this case they have organized the ELN by project rather than by person. They also use PyBIS a lot for interacting with OpenBIS. And then the last use case. We have a couple of people from EMPA here, and the last use case is actually from EMPA. This is the construction and concrete chemistry laboratory, where they study cement, so they do studies on cement and concrete and their resistance. This is the scheme that they presented me with when we started working with this particular group: they have a specimen on which they do different kinds of measurements. We then configured OpenBIS so that they could use it for their use case. We have again an inventory where they store their materials and their methods, and in this case the notebook is organized by person. Each person has one project, and in the project they have sets of measurements for the different samples that they analyze. Now I just want to say a couple of words about the services that we provide based on OpenBIS. The first one is the openRDM.swiss service, which is provided to the Swiss academic community. This was established as part of a P5 project funded by swissuniversities. The project ended in 2020 and helped us to establish the service, so now we are running the service independently from swissuniversities.
What we do is provide OpenBIS in the cloud, so we can set up an OpenBIS server for a research group, or for several groups of one particular university or institution, and we use SWITCHengines as the cloud provider. The second option is to provide support for institutions that prefer to have a self-hosted OpenBIS. We currently have both situations, so we have some customers that are using the cloud service and some others that have their self-hosted OpenBIS. There was someone here from the University of Bern, for example, and this is the solution that they opted for, so they manage it by themselves but we provide support for that. We provide training and support staff in all these situations, and the current customers that we have are EMPA, the University of Zurich, ZFA, and the University of Bern, as I mentioned. Then we have a very similar service which started this year. This is on the European level. It's called openRDM.eu, and my colleague Priyasma is responsible for this project. This is a project funded by the European community, and it's very similar, so we provide either cloud-hosted OpenBIS or support for self-hosted OpenBIS. In this case the customers are different, because they are academics from European universities. Currently we are working with BAM, which is the equivalent of EMPA in Germany, so it's the material science institution in Germany, and also with the Helmholtz Zentrum in Munich, and we have a couple of other institutes that are lined up and interested in this. The last thing I want to mention before I end is the user group meeting that we plan to have at the end of September, so save the date if you are still interested in this after the end of this training, of course. And if you're interested, you can also ask us to add you to the mailing list so you receive all the information and notifications.
And this is the OpenBIS team. We have quite a few developers working on developing the system. Then we have the research data management team, which, apart from myself, is made up of Henry and Priyasma. Here, Richard and Arthur are the people responsible for the operation of the system, so installations, upgrades, and so on. And this is our management team. Here I have some contacts and useful information. I don't think that these slides are already in the Moodle or in the training material, but I will upload them there. And I think I am at the end of my slides.