So good morning everyone. We're very happy to have our course online on the introduction to openBIS, and we'll start off with a lecture from Dr. Caterina Barillari. Just a few housekeeping points before we begin: please mute your microphone, since we're recording the course this morning. There will also be a question and answer session following the presentation, so you can put your questions in the chat to everyone and we'll take them in order. Thank you very much, and over to you, Caterina.

Okay, thank you, Monique. First of all, hi everyone, and thanks for joining today's introduction to openBIS in such numbers. I would like to thank Monique for organizing the course, and also Monique and Patricia Pallagi for convincing us to do this online training session, because I have to admit we were a little bit skeptical about it. We've never done this before, so I guess we will see. They gave us some very useful advice that convinced us to try, and we hope it will be a successful day; we will see at the end of the day, of course.

Just a couple of words about myself. I am Caterina Barillari and I have worked at ETH for eight years. I first joined the group that was developing openBIS, and then in 2013, when the Scientific IT Services section was created, I moved there, and the group developing openBIS was incorporated into this section. I am currently responsible for the research data management services that we offer based on openBIS, both at ETH and in Switzerland. In the first part of today's training I will give you an overview of openBIS and how it can be used to manage research data. In the second part of the training I will be joined by my colleagues, Henry Lütcke and Priyazma Bommick.
The second part will be a hands-on session with exercises in openBIS, followed by a session on the analysis of data stored in openBIS, which will be given by Henry.

So now let's start with the overview of how you can manage your data using openBIS. I want to begin with a very brief introduction to the Scientific IT Services of ETH; then we will move to the bulk of the presentation, research data management with openBIS, where I will also show you a couple of use cases. Finally, I will present the services that we offer based on openBIS.

The Scientific IT Services have been a section of the IT Services of ETH since 2013, so we are fairly new. We are about 40 people, and we are constantly growing because there is a lot of demand for the services we offer. Most of us are scientists rather than classic IT people: we have backgrounds in different areas of science. I myself am a chemist by background.

So what do we do? There are four groups in this section. One is the High Performance Computing group, which maintains the clusters we have at ETH; we have several of them. Then there is the Scientific Software and Databases group, a group of software engineers, part of which is focused on openBIS development. But that is not the only thing they do: they also develop software in collaboration with the different groups that come and ask us for this kind of service. Then there is the Research IT Platforms group, which is the group I belong to, together with Priyazma, and we provide services and platforms. One of these is openBIS as a service, but we also have Leonhard Med, which is a platform for health data, and other platforms as well. Finally, there is the Computational and Data Science Support group, to which Henry belongs.
This group is more focused on data analysis, and they work on projects with different research groups to analyze data. So that is just to explain a little bit who we are and what we do. Now we move on to the openBIS part.

I usually start by giving an overview of the research workflow in an experimental or computational lab, which you will know better than me, for sure. If you work in an experimental lab, you usually have to prepare some samples on which you want to take measurements; in a more computational environment, you gather data from somewhere. At this point you have what we call raw data. These data then have to be processed and analyzed, and of course this is done several times under different conditions. If you find something interesting, one of the goals of academic research (not the main purpose, but one of the goals) is to publish these results and make them available to the scientific community. A lot of data is generated, and only part of it will end up in the publication; you don't know upfront which data will actually be valid and useful for the publication and which will not.

Nowadays there is a requirement, both from funding agencies and from journals, to publish data according to the FAIR data principles. I'm sure most of you will have heard of these principles: FAIR stands for Findable, Accessible, Interoperable and Reusable. The published data, in my opinion, is really just the tip of the iceberg, because the bulk of the iceberg is the data that you generate that is never published and that we do not see. My argument is that you can publish data in a FAIR way only if you treat your data FAIRly from the very beginning, from the moment you generate it.

This is a situation that we often encounter at ETH.
In fact, now that we also work with groups outside ETH, I can say that this is not peculiar to ETH; we see it pretty much everywhere. Here I show the different data types and information that you generate during the research process: you may have protocols, materials, your code for data analysis, and the data itself, which can be the raw data, models, processed data, your results, and your notes, that is, the description of what you are doing. In the middle I have the storage systems we offer at ETH: the local hard disk, which can be your laptop, for example, or the machine you are using to measure your data; the NAS, the network attached storage, which is another type of storage we offer at ETH; the tapes for long-term storage; the cluster; and the cloud.

This is what we often see: your data is very often spread across all of these different places, and notes are still very often taken in paper notebooks, disconnected from the data. This is a very common situation, as I said, and it is not ideal. It is very often hard to keep track of what has been done, for yourself, but especially for other people. So this is where we aim to go: our ideal scenario is to have all the information stored in one central place, where things are more easily accessible and it is easier to reconstruct the history of what has been done. This is easily achievable using a solution such as a combined ELN and LIMS, where ELN stands for Electronic Laboratory Notebook and LIMS stands for Laboratory Information Management System. And this is exactly what openBIS is: such a solution. So now, a couple of facts about openBIS.
First of all, openBIS is open source software, distributed under the Apache 2.0 license. It has been developed at ETH since 2007, so it is a software that has been around for over 10 years now. It is a platform for managing the information produced during the research process, from the very beginning until publication: you can store all the information you generate during the research process in openBIS. It can be used in most quantitative science fields. We started with the life sciences, but in the last couple of years we have expanded to other quantitative fields such as physics, environmental sciences, materials sciences, and so on. It is used, of course, at ETH, where we offer it as a service, but also at other universities in Switzerland and at some European and other universities, because, as I said, it is open source software: it is freely downloadable and has no license fees associated with it.

In a nutshell, openBIS is a solution for a research lab, for collaborative work. It is not a software that you just download onto your laptop and use like that; it is a client-server application. You have a database installed on a server, and the users access openBIS via a web interface.

It can be used to store information about materials and samples, whatever you use in your lab. Here we see the default life science version: some things are provided by default, but these can be customized to your needs, and if you do not use any of these defaults, you can create the folders you need. Here you see we have an inventory with materials and different collections; if you want to add collections that do not exist here, that is absolutely possible. Here I have an example of a chemicals collection: a table where I see all my chemicals. I can filter this table, I can export it, I can import tables, and so on.
It can also be used to store protocols. Protocols are standard procedures for things you do in the lab. Again, the inventory has a Methods section where protocols can be stored; this is the default for life sciences, but it can be customized to your needs. And this is what a protocol might look like: you have the name, the description, and the procedure of the protocol. What is interesting here is the Parents section, which contains the connections to things that you have stored in the inventory. Here I have a connection to a chemical and some buffers, for example; these are the things that are used in my protocol.

Then it can be used as an electronic laboratory notebook, to describe your experiments as you do them. In this case there is a section which is the Lab Notebook part, and in the lab notebook each person has a personal folder where they can create projects, experiments, and what we call experimental steps. Here you see a description of an experimental step, where I have the goals and the results. And again, what is very interesting here are the parents, which are the links to the materials and methods that I have stored in my inventory. This connection can be visualized as a tree, so it is easy to reconstruct the history of what has been done. You can click on an experiment and see what has been generated from it, and vice versa: you can click on a sample and see where the sample has been used, or on a protocol, and so on.

Then we have, of course, the data management part, which is where openBIS is particularly strong, because this is actually how it was born: as a data management platform, before being an ELN or a LIMS. Data can be uploaded to openBIS in two ways, essentially. The first way is via the web interface.
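To make the parent-child idea concrete, here is a minimal sketch of the concept (this is an illustration of the data model, not openBIS code or its API): every entity keeps references to its parents, so the history can be reconstructed by walking those links backwards.

```python
# Illustration of parent/child provenance links (not openBIS internals):
# samples, protocols and experimental steps reference their parents,
# so ancestry is recovered by a depth-first walk over the links.

class Entity:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

    def ancestry(self):
        """Return the names of all entities this one was derived from."""
        seen = []
        for p in self.parents:
            seen.append(p.name)
            seen.extend(p.ancestry())
        return seen

# Hypothetical example: a chemical used in a buffer, used in a protocol,
# used in an experimental step.
chemical = Entity("CHEM:NaCl")
buffer_ = Entity("BUFFER:PBS", parents=[chemical])
protocol = Entity("PROTOCOL:staining", parents=[buffer_])
step = Entity("EXPERIMENTAL_STEP:EXP1", parents=[protocol])

print(step.ancestry())
# -> ['PROTOCOL:staining', 'BUFFER:PBS', 'CHEM:NaCl']
```

The same walk in the other direction (children instead of parents) answers "where was this sample used?", which is what clicking through the tree in the interface does.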
Once you register an experiment in your ELN, you have an upload button and you can simply upload your data from there. This is fine if the data is not too large, say up to a few gigabytes. The second mechanism is what we call the dropbox mechanism, which has nothing to do with the commercial Dropbox program. How it works is that the data are moved, either manually by the users or automatically, directly from the measuring instruments, to what we call a dropbox folder, and from there the data are moved into openBIS and connected to an experiment. So here I have my experiment, in this case a flow cytometry experiment, and the data that were generated in this experiment are linked to it. Here I clearly have this connection between the two things, which is missing if you use a paper notebook.

Then we also have a tool called Big Data Link. This is to be used when you have large amounts of data, maybe hundreds of terabytes, which you already have sitting somewhere, for example on a cluster, so it is difficult to move them to a different storage such as the openBIS storage. In this case you can use this tool, which is a command-line tool based on Git and git-annex. What you do here is use openBIS as a metadata repository: you describe your experiments in openBIS, and then you just link your data to openBIS. This is how it looks: you have this kind of sign, which is a link, and here is my data. When I open this folder in openBIS, what I see is the information about where my data is. So you cannot access the data directly via openBIS; if you have access rights to the data, you have to go to the server, or to the cluster, where it is sitting.
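The essence of the dropbox mechanism can be sketched in a few lines of standard-library Python. This is only an illustration of the idea (files dropped into an incoming folder get moved into a per-experiment area, so the file-to-experiment link is explicit); the folder names and the `ingest` function are made up for this sketch, not part of openBIS.

```python
# Sketch of a dropbox-style ingest (illustrative only, not the real
# openBIS implementation): files placed in an incoming folder are moved
# into a per-experiment area of the store.
import shutil
import tempfile
from pathlib import Path

def ingest(dropbox, store, experiment):
    """Move every file from `dropbox` into `store/<experiment>/`."""
    target = Path(store) / experiment
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(Path(dropbox).iterdir()):
        if f.is_file():
            shutil.move(str(f), str(target / f.name))
            moved.append(f.name)
    return moved

# Example run in a temporary directory (instrument drops two files,
# which get attached to a hypothetical flow cytometry experiment):
with tempfile.TemporaryDirectory() as tmp:
    dropbox = Path(tmp) / "incoming"
    store = Path(tmp) / "store"
    dropbox.mkdir()
    (dropbox / "plate1.fcs").write_text("...")
    (dropbox / "plate2.fcs").write_text("...")
    print(ingest(dropbox, store, "FLOW_CYTOMETRY_EXP_1"))
    # -> ['plate1.fcs', 'plate2.fcs']
```

In the real system this step also registers the files in the database and links them to the experiment entity, which is what makes them findable later.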
Then we have connections to Jupyter and MATLAB for data analysis, and these will be shown at the end of the second part of the training by my colleague Henry. We have integrated Jupyter notebooks in openBIS, so you can launch a Jupyter notebook from different places in the openBIS interface, write your analysis in it, and store the notebook, with its results, back in openBIS. For those of you who are not familiar with Jupyter notebooks, this is what one looks like: it is a combination of text, so you can describe what you are doing, your code, and the results, which are readily available. It is all in one place, which makes it much easier to follow what you are doing. You can describe your analysis, share your notebooks with your colleagues, and also publish them, so they are becoming a very powerful tool for data analysis. They support over 40 programming languages, among which R and Python are usually the most used.

Finally, openBIS also offers several APIs, which stands for application programming interfaces. This means that you can work with openBIS programmatically. We have some advanced users who do not do anything via the user interface: they do everything via the command line. You can create things, extract data from openBIS, and register data into openBIS, all from the command line, without interacting with the user interface. Here is just one example: the APIs were used to build a workflow for genomics data analysis using the workflow manager Snakemake. I am not going to go into the details of the workflow; that is not the point here. What I want to show is that this workflow was used to analyze data stored in openBIS.
The data were taken out of openBIS, sent to the cluster for analysis using this workflow, and once the analysis was finished the results were stored back in openBIS. This is possible thanks to the openBIS APIs.

Here are some additional features of openBIS. I already mentioned the relationships that you can establish between different entities, which allow you to keep track of the history of what you are doing. This is a very important functionality; I believe our users like openBIS especially for it. We also have import and export functionality: as I mentioned before, you can import and export files, tables, and so on. Then we have user rights management, so you can control who has access to what and which kind of access people have: read-only access, user access, or admin access. There is an audit trail, which means that everything done in the system is logged in the database, so we can see who has changed what and when, and trace it back. By data immutability, I mean that when you upload data files to openBIS, these files become read-only, so you cannot modify them on the fly: if you want to modify something that you have uploaded to openBIS, you have to download it, modify it, and upload it again.

There is also the sample storage manager. This gives you an overview of the storage units in your lab, the freezers and fridges: the physical storage where you keep your samples, so you can see where your samples are. Then we have a fairly new functionality, the barcode reader, which was only recently introduced in openBIS, so you can also track your samples using barcodes. And another new functionality is the integration with data repositories.
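The immutability guarantee can be illustrated with a small standard-library sketch (this is the general checksum idea, not openBIS internals): record a checksum at upload time, and any later in-place modification becomes detectable, forcing updates to go through a fresh upload.

```python
# Illustration of the data-immutability idea (not openBIS internals):
# a checksum recorded at upload time makes later tampering detectable.
import hashlib
import tempfile
from pathlib import Path

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

class Store:
    def __init__(self):
        self.checksums = {}  # file name -> checksum at upload time

    def upload(self, path):
        self.checksums[Path(path).name] = sha256(path)

    def verify(self, path):
        """True if the file still matches its upload-time checksum."""
        return self.checksums[Path(path).name] == sha256(path)

# Hypothetical example: a result file is uploaded, then modified in place.
with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "result.csv"
    f.write_text("a,b\n1,2\n")
    store = Store()
    store.upload(f)
    print(store.verify(f))  # -> True
    f.write_text("tampered")
    print(store.verify(f))  # -> False
```

Together with the audit trail (who changed what, and when), this is what makes the stored record trustworthy.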
We have integrated Zenodo, which is a generic repository for sharing data when you publish something, and the ETH Research Collection, which is specific to ETH.

Now I want to show you a couple of examples of how openBIS is used. The first use case is the Cellular Dynamics lab at ETH, the Weis lab. These are some of the techniques they use: light microscopy, RNA biology, protein biochemistry, and electron microscopy. And how is openBIS used in the Weis lab? They were among the early adopters of the ELN functionality of openBIS: it was introduced in the lab in 2016 and has been mandatory for all lab members since 2017, so everyone who works in this lab has to use openBIS and upload their data to it. They have an inventory for their samples (they already had some databases, which we imported into openBIS) and for their protocols. They make heavy use of the parent-child relationships I mentioned before: when they use reagents to create a new sample, to link experiments to protocols, and to link experiments together.

Then we have a second use case, which is something completely different: the Bedretto project at ETH. This is a more geological use case, let's say. In this project they are studying whether it is possible to use geothermal heat as an alternative energy source. They have drilled boreholes in the Bedretto tunnel, a tunnel connecting Ticino and the Furka tunnel, sensors have been placed in these boreholes, and they are analyzing the data retrieved by these sensors. These data are stored in openBIS and analyzed using Python via pyBIS, which is the Python API of openBIS.
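The round-trip pattern used in projects like this (pull data out of openBIS, analyze it, push the results back) can be sketched with pyBIS roughly as follows. Treat this as a hedged sketch rather than a tested recipe: the server URL, credentials, dataset permId, the `ANALYZED_DATA` dataset type, and the `analyze` callback are all placeholders, and method names may differ between pyBIS versions.

```python
# Hedged sketch of a pyBIS round-trip (pull -> analyze -> push).
# All identifiers here are illustrative placeholders; check the pyBIS
# documentation of your installed version for the exact method names.

def analysis_roundtrip(openbis_url, user, password, dataset_permid, analyze):
    from pybis import Openbis  # pip install pybis

    o = Openbis(openbis_url)
    o.login(user, password)

    # 1. Pull: fetch a dataset that was registered for an experiment
    #    and download its files to a local folder.
    ds = o.get_dataset(dataset_permid)
    ds.download(destination="data")

    # 2. Analyze: run your own analysis on the downloaded files
    #    (`analyze` is a placeholder returning a list of result files).
    result_files = analyze("data")

    # 3. Push: register the results as a new dataset attached to the
    #    same experiment, so provenance stays inside openBIS.
    new_ds = o.new_dataset(
        type="ANALYZED_DATA",          # assumed dataset type
        experiment=ds.experiment,
        files=result_files,
    )
    new_ds.save()
```

On a cluster, step 2 would typically be a workflow manager run (such as the Snakemake example mentioned earlier), with the pull and push steps at either end of the pipeline.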
Here is an example of how they have organized their openBIS. In this case they use only the lab notebook, not the inventory, and they have organized it by project rather than by person. By default, each person has a folder in the lab notebook; in this case, they decided they wanted it project-based. This shows that you can customize the system to your needs, however you want.

The last use case comes from Empa. We are currently running a pilot project with Empa, and a few labs there are currently using openBIS. This is one of them, the Concrete and Construction Chemistry lab, where they study concrete: they measure different properties of concrete samples, such as shrinkage, and here you see an example of what shrinkage is. This is the map of what they do and what they want to track in openBIS: they have a specimen on which they perform different measurements, using different samples, and this was put into openBIS. They use both the lab notebook and the inventory. This was completely customized to their needs; we had no defaults for this, so we worked together to adapt the system to how they needed to use it. You can see that it is really a flexible solution that can cover a wide range of use cases.

Now we come to the services that we provide based on openBIS. At ETH we offer three different types of service. The first one is what we call the Research Data Hub. This is a multi-group openBIS instance: one openBIS that can be accessed by several groups at the same time, although each group only sees what belongs to them; they do not see the other groups' data. It is centrally managed by us, so it is a shared resource.
We only allow limited customization here, and the service is offered for free to all ETH research groups that want to use it; only the storage costs have to be covered by the groups. Then we have the Departmental Data Hub, which is very similar to the Research Data Hub, with the difference that it is dedicated to the groups of a particular department. At the moment we have one such instance, for a department that decided to have its own hub for its groups. The differences are that there are service fees associated with it, and it can be customized for the needs of the department's groups. And then we have the Research Data Nodes, which are dedicated openBIS instances: one openBIS per group, essentially. This is a more customizable solution, dedicated to the group and managed by the group themselves; in this case, too, there are service fees that have to be covered by the group, plus the infrastructure costs. In addition to this, of course, we provide training and consulting to all the groups that use openBIS.

Then we also have a national service, for Switzerland, called openRDM.swiss. At the moment this is a project funded by swissuniversities, and our project partners are the University of Zurich and ZDV. The project started in 2018 and will finish at the end of this year; from next year we move to service mode. What we do here is offer openBIS as a cloud solution. We use SWITCHengines for this, so we can set up virtual servers per group, per institute, or per institution. Optionally, we can also provide JupyterHub servers for data analysis with Jupyter. We can also offer support for self-hosted openBIS.
So if someone would like to use openBIS on their own premises rather than in the cloud, we can provide support for this, both to the users and to the IT experts, to set up openBIS, install it, and maintain it in the future. We also provide training and best-effort user support. It is also possible to establish a contract with us, of course, if you want more than the basic service. Our current users are from the University of Bern, and, as I already mentioned, ZDV and the University of Zurich. Here you have the email address if you want to contact us about services in the context of openRDM.swiss; I will share the slides at the end of the talk.

And this is us: a picture that was taken fairly recently, I think just after Christmas, of the people who are primarily involved with openBIS. Here I have some contacts and useful information. We have some video tutorials and documentation available on our website, you can also see the other things we do at SIS on the SIS website, and we have a Twitter account, so you can follow us there. These are the two contacts that you may want to use if you want more information.

So I think I am now done with the overview of openBIS, and I am happy to take questions if there are any.