It's a great pleasure to welcome you all to this Defragmentation Training School in bioimage analysis. The organization included many other people: you will see bios from Mafalda Sousa from the Advanced Light Microscopy facility, EBMC, in Porto, Clara Prats, associate professor in Copenhagen, and Marium Lume. We are also guided by a very good scientific advisory board, including Julian Colombelli, Paulo Sampaio, Julia Fernandez-Rodriguez, and Gaby Martins. Regarding today, this is the program. We will start with a few talks about what bioimage analysis is and what an image has to do with life science. We will have an example of bioimage analysis as a service from Euro-BioImaging, and we will have a lesson about Jupyter notebooks by Guillaume Witz. Before we start, I would just like to ask Anna in the audience to introduce herself and her company. For this training school, the Sheevee's company will help us archive the training material and do the video editing. Thank you.

Thank you, Rocco, and hello to everyone. Rocco asked me to introduce myself because we will also discuss career paths in the bioimage analysis world. My career path hasn't been traditional in any sense; I define it more as an adventurous journey through working life. But five years ago there was a life-changing event in my career: participation in a training school organized by NEUBIAS. Five years ago I fell in love with bioimage analysis and with the bioimage analysis community, and today I am co-founder of a company, Sheevee's, that wants to support researchers in their bioimage analysis journey, but also in their data journey in general. If you want to talk a little bit about the challenges of freelancing life as a bioimage analyst, please contact me; I will put my email in the chat. Thank you, Rocco.

So now we can start with Julia. Julia, you are ready to present.

I will maybe be the one who will not talk about how to do image analysis or introduce image analysis, but I can introduce the problem we have when running a facility that produces many images, and what we have to do with that. Let me share my screen and start. Yes, that's the title. I will base our discussion on the experience I have. I have been running this core facility, the Centre for Cellular Imaging, which sits at the Sahlgrenska Academy, the medical faculty of the University of Gothenburg, for the last 20 years, so I think I have quite a lot of experience. My background is in cell and molecular biology, and I would say I am a hardcore microscopist. At some moments during the time I have been helping and training our users, I have had to deal a little bit with image analysis, but I never had the luck, as any of you, to have NEUBIAS and to be trained by them; maybe then I would not have become a hardcore microscopist but a bioimage analyst. In any case, let's go. The first thing I want to show: we have a multimodal facility. What do I mean by that? It is a facility with different types of imaging modalities. In our case that starts with light microscopy, and you will probably base a lot of this course on the analysis of images coming from light microscopes in whatever flavor: confocal, conventional, multiphoton, or super-resolution.
We also have electron microscopy: if we want to really increase resolution, we need to go to an electron microscope. And in between them, we are now dealing with something else as well. Thanks to Jörg Hanrieder, an associate professor here at the University of Gothenburg, sitting in the same faculty, we can also offer our users MALDI imaging mass spectrometry. Obviously, when you have all these types of modalities, you need to do something with them: you need to analyze them, process them, and integrate all the modalities. That's my dream; this is what I want. In the summary at the end I will say that my dream, as the head of a multimodal facility, is to have a platform where I can put images coming from different instruments and fuse them, or align them, or register them to each other, and make sense of the whole set of images that we get in the facility. The mission I will cover only very briefly, so you know what we are doing here and what that means in a facility; I recognize some faces that work in facilities, but many of you probably work in research labs. Of course we service our users, and we call them users because we actually give them access to the microscopes. We train people how to use these microscopes, and they keep our support all the way from the day we design the experiment to the day they need to prepare the samples and the images for image analysis. Training is a very strong activity and an important service in the facility: we have to train our users on the microscopes, but also in sample preparation and in image analysis. Obviously we also organize events like the one we are at right now, but also on-site events; we just finished an event where some people here, like Kota, who will be the next speaker, were training in our introduction to image analysis course that we ran here in Gothenburg. And of course we have to innovate and do method development, either on the microscopy side, by doing correlation in different flavors, or in image analysis. We also do some microfabrication. You need this innovation to get new methods or new technologies that can be offered to your users. And this is just a collage of images showing what I meant with all the multimodality and all the work that we do in the facility: training our users, teaching our users online, sometimes even training them remotely, training users on-site, and sample preparation in both flavors, for the light microscope and for the electron microscope. But now, once we have all these modalities, this is a schematic way of showing how we try to manage and support the analysis of all these types of images. I will go very fast; it is only to give you a glimpse of how we do it here. We acquire the images on our microscopes, users or staff. If the users have their own solutions on their side, their images are just transferred to a NAS server, to their solution, whatever it is, and they do the analysis there.
We also have high-performance workstations inside the facility where people can work directly, with different programs they can use for analysis, commercial and open source, and from there they can of course also connect to image repositories, if that's what they want, or go to their data solution. But we also have the possibility that we acquire the images and they go to our processing server, in this case a HIVE; actually it's bigger than that, right now we have five blocks similar to that one, with different programs, where the analyst in our facility, in this case Rafael, helps the users with the analysis and the processing of the images when that is necessary, for example when we do super-resolution, structured illumination, or single-molecule localization. The beautiful part here is that the work is not done only from one side, by the analyst; there is also remote access for our users through this system, so analyst and user can discuss on the fly, one in their lab or at home or wherever they are at that moment, and the other, the operator or analyst, here in the facility. Of course this is a work in progress, especially for the multimodal part; we are also partnering with Cytomine and with OMERO on top of that. This is just very briefly how we do it. But let's go to the images: what do we get, and how do we get these images? I want you to see here the power of having a multimodal facility, where you may correlate or not correlate. When I say correlate, I mean that you use the same sample, and that same sample goes through every imaging modality. You can also mix modalities without using exactly the same sample, but the same model organism or the same cells. Here is just an illustrative display of different technologies. We don't have MRI systems, but we do partner with groups that do MRI, and I will show you in some of the examples during the presentation where we have such partners in Europe. This is a brain, in this case actually a human brain: what you see here goes all the way from MRI, then to light microscopy, in this case polarization microscopy, on to two-photon and confocal microscopy, and finally to the transmission or scanning electron microscopes. As you see, we have different scales, in the hyperspectral space but also in the spatial space, and we have a combination of all of them. Each modality is going to answer part of the questions, and the beauty of multimodality is to put all of that together to answer a particular biological question. Now I want to use an analogy; of course we are not working with Lego, but it actually helps quite a lot, and I will reuse what Rafael has done in his talks. He has two kids, I don't, so he plays a lot with Lego. When we do multimodality, we have several images that come from the different modalities, and that means we have different formats and sizes. The first thing we need to do is sort them and arrange them; we have to register them, we have to see which one is coming from where.
The problem we end up with as well is that, once you generate this data, it can sometimes be several terabytes; it's not four Lego bricks but thousands of bricks together. That means we have to sort it, arrange it, and put it into some kind of visualization mode, which might not be the full story; the idea is to get to the full display, but in most cases we end up just presenting our images visually. Why is it a problem to get there? Because once you have different images with different sizes and you have to do many operations, this becomes challenging with big data: many of the programs actually work in RAM and are limited by it, which means you run into the topic of big image data, and we will have speakers who will introduce that in the image analysis part for you. One of the good things is that when we process the data, the post-processed data is sometimes orders of magnitude smaller than the raw data, which makes it much easier to handle. Unfortunately, even though I tell the story in this way, as if it were perfect and we get to the plane and the full story and the house with a garden and the grass cut, this is not the reality. Now I'm showing you the reality, with me in the middle, or Rafael, or any of the people helping us with image analysis of the multimodal data, with all these pieces lying around, and we have to try to make one story out of all these pieces. When I say we have different formats and sizes, I mean that the images coming from an electron microscope and from a light microscope are not the same, and it's not only the format itself; it goes all the way to the biological interpretation. Now let's go back to images and leave the Lego. These are some examples of what we have done in the facility, mixing modalities; when I say multimodality, you mix at least two, three, or four different imaging modalities. In this case, for example, we have worked with the transmission electron microscope and the NanoSIMS. In this case it's light microscopy and electron microscopy. In this case it's a multiphoton microscope that allows us to generate second harmonic generation and autofluorescence: this is human skin, these are the cells of the epidermis, and this is the collagen we have in the skin. Or we can do the usual correlative light and electron microscopy that people have worked with. And, as I told you before, we also have the possibility to work with people doing imaging mass spectrometry; of course we have combined imaging mass spectrometry with our confocal microscopy. I will now go through a few longer examples of how we have been dealing with this and with which programs, and you will see why I showed you the allegory with me in the middle of all these Lego pieces. What happens, and this will be more Rafael, our analyst, is that when you have images coming from the light microscope, the electron microscope, and imaging mass spectrometry, and you don't have one defined tool or platform that can deal with all the images at once, you have to deal with different programs for the different things you want to do with each of these images. This is an example of a workflow where we started from
the light microscope and went to the electron microscope, doing different light microscopy methods and going to high-resolution images. We used this in a project where we were mapping lipid droplet-associated proteins in the context of non-alcoholic fatty liver disease. This is a cell line, and in a couple of seconds I will show you how we actually got there. This was the original question: we have a particular protein that we were interested in (the data are not published, that's why you don't get the name of the protein, sorry). We have a protein that moves, in this particular case once you use oleic acid to increase the amount of lipid droplets in the cells. These are HeLa cells, nothing extraordinary or fancy as a cell line, and once you add oleic acid for four hours, the protein moves from the Golgi apparatus to the lipid droplets, around the lipid droplets; we see that a bit more clearly here. We want to know why that is happening and whether it is really around the lipid droplets, because this is the resolution of the light microscope: it tells me it is around them, but not necessarily that it is part of the membrane of the lipid droplets. To answer that, I really need to go to the electron microscope. What the group did was establish an inducible cell line expressing GFP fused to this lipid droplet-associated protein, which is how we are going to call it from now on. This is how the cell line looks, and it is an inducible system: when I induce it, this lipid droplet-associated protein is in the cytosol, and once I add the oleic acid, as you see, it goes around the lipid droplets. So we now have a tool, our cell line with GFP, which we can easily follow on the light microscope. But how to go to the electron microscope? As I said, we want to do correlation; we want to check at much higher resolution. Well, we got a tool, thanks to really good friends in Belgium, and probably some of you know them: Saskia Lippens and Sebastian Munck, in Leuven and Ghent. They have a fantastic tool, a nanobody against GFP that carries mCherry on one side and the APEX tag on the other. Since it carries mCherry, I can see which of my cells are positively transfected once I transfect the plasmid encoding this nanobody. The nanobody recognizes the GFP: this is my GFP, this is my nanobody, and as you see they match quite nicely. And there is one extra molecule here that is very important for me: the APEX, the one that will show up in the electron microscope. As you know, one of the problems in electron microscopy is specificity, which is what we normally get in light microscopy when we use antibodies. Well, APEX2, which is an ascorbate peroxidase, allows us to visualize molecules inside the cells at high resolution, as we see here. What happens is that you fix your cells, you put diaminobenzidine (DAB) in the medium with the fixed cells, and the peroxidase oxidizes the DAB, giving an electron-dense precipitate, a polymer, that we can easily see in the electron microscope. So now I have carried the specificity from the light microscope over to the electron microscope, and I can check correlatively: these are the control cells, and these are my positive cells with our fantastic, funny, interesting protein, and we can
see that it is around the lipid droplets, and also a little bit outside them; that is not diffusion of the APEX, it's actually the molecule we are looking at. Around the lipid droplets it is also in the cytosol, but it is quite accumulated in this area very close to the lipid droplets. Okay, the first question, whether that protein is around the lipid droplets, is answered: yes, we have it around the lipid droplets. And I show you here that we have used different programs. This is just a CLEM where we have on one side the fluorescence, on the other side the APEX, and this is the combination of both, the APEX2 and the fluorescence, where we have taken images from the GFP. On top, if you look here, you can see the different programs we had to use: to image the samples, to get a proper image from the light microscope, to get an image from the electron microscope, and to organize, align, and register those images to give you the picture where we can practically overlay the green fluorescent protein and the APEX2. And you say, oh, but that's very simple. No, because the resolutions of these two images are very different, and that's why we have to use different programs that allow us to match, register, and correct the deformations. Unfortunately, when we prepare for electron microscopy and cut thin sections, there can be deformations. Also keep in mind that when we take images here, these are hydrated samples: they have only been fixed and you look at them in the microscope. But to take pictures here, I have to dehydrate, because I have to embed these samples in plastic, in resins, and that means there is a shrinkage from this situation to this situation that we also need to take into consideration when we do the analysis. I have actually written down, at the same time as I'm lecturing, the dimensions of the different images we have here: the individual image, this image, and this image. The camera is 4K by 4K on the transmission electron microscope, and this is the number of pixels we have for our light microscope in this particular case. And then, of course, we can go a bit further. I just showed you a CLEM where we find the lipid droplets directly in both light and electron microscopes, but we have also worked, and in this case you have the name here, with Arhat-1, an enzyme that sits around the Golgi apparatus, as you see here, and this is the Golgi. This is a different cell line, one that also expresses Arhat-1-GFP, which is why we see the green color here; the red color is the mCherry from the nanobody. We have used the same technology, the nanobody with APEX2 and mCherry: we recognize the positive cell by the mCherry, we see a yellow color here where the two combine, and now we can go to our electron microscope and prepare the sample. Of course, between this picture and this picture there has been a long preparation time. And we get to this, where we can really, with high resolution, pinpoint where this enzyme is sitting: it's not just everywhere on the Golgi, but only in the areas where the vesicles are forming.
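To make that registration step a bit more concrete, here is a minimal sketch, not the facility's actual pipeline, of how a handful of matched landmarks picked on corresponding structures in the light and electron images could drive an affine overlay. The coordinates and commented-out image names are purely illustrative.

```python
# Landmark-based LM-to-EM registration sketch (illustrative, not the real pipeline)
import numpy as np
from skimage import transform

# (row, col) coordinates of the same structures picked in both modalities
lm_points = np.array([[12.0, 30.0], [85.0, 40.0], [50.0, 90.0], [20.0, 75.0]])
em_points = np.array([[120.0, 310.0], [860.0, 390.0], [510.0, 900.0], [210.0, 760.0]])

# An affine transform absorbs the large scale difference between modalities;
# a polynomial or thin-plate model could additionally absorb section deformations.
tform = transform.AffineTransform()
tform.estimate(lm_points, em_points)

# To overlay, the LM image would be warped into EM pixel coordinates, e.g.:
# lm_warped = transform.warp(lm_image, tform.inverse, output_shape=em_image.shape)
print(tform.scale, tform.rotation)
```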
And why is it sitting there? Because Arhat-1 is the enzyme that removes the coat that vesicles have, and removing this coat matters: once a vesicle has formed, it can only fuse with the next membrane if the coat is removed; if you don't remove the coat, there will be no fusion, it is impossible. Arhat-1 is doing that job, and now we can see clearly where it is actually sitting. Another example is what is called correlative array tomography, where we try to manage things end to end, and when I say end to end, I include the sample preparation. I will show you two examples: on one side the kidney, from human samples, and the human skin equivalent samples. Let's start here. A very busy slide, but it shows you the full workflow we need to acquire the images. On one side we have to prepare the samples. You can see here what I was explaining before: in EM we have to embed in resins, and we call these bullets, because they really look very much like a bullet, and here we have our tiny, tiny sample. Once the sample is prepared in this way, this is a piece of it, in this particular case a piece of the kidney. Once we have it here, and we want to do array tomography, meaning that we want to take a ribbon, as you see here, to be able to do 3D volume EM as well, not only light microscopy, we have a special knife and a special device called the ATUM: as you see, it is like a tape, and the sections are cut and caught by the tape, so you can roll them up; you can see the samples here. Then we can stick this tape onto special cassettes that we can put inside either the light microscope or the electron microscope, and in this particular case, the section you see here is the same section we use for light microscopy and for EM. We can use special resins, in this case acrylates, that allow us to do labeling with antibodies, as you see here. This is just HRP (sorry, not alkaline phosphatase), quite similar to the APEX: it is also a peroxidase, and we get this brownish color. That is the particular cell we were interested in, and afterwards we can go to the electron microscope and find exactly that particular cell. That's the idea of array tomography: not just to look at a single cell, but to follow this single cell through a ribbon like this, take images of the region of interest around this particular cell, and do the 3D reconstruction. Once we have acquired these images, most of the time we stream them from the light and electron microscopes into the HIVE to do the analysis already there; we have the programs needed for the analysis and the processing inside the HIVE. Here is an example of how that looks: we have the light microscope, now we go into the cell, now this is our electron microscope, and we zoom in more and more until we get to a cell. As you see, we have used different programs: ZEN Blue for acquisition of the images and some basic handling, and ZEN Connect for the visualization that you see right now. It is mainly visualization, where we can put images on top of each other, align, organize, and overlay them a little bit; it is not a fine, fine alignment. If we want more alignment, we have to work further, and that's why Fiji is in the middle.
ZEN Connect will connect the images for us, but to do the fine tuning we have to use Fiji. And of course this works over remote access, which means that if our users are sitting somewhere else, they can work with us and we can process and analyze these images together on the same machines. I'll just show you one more example, because here we had to use an extra program as well: we use Atlas to acquire on the SEM and to do the 3D reconstruction you will see in the video now, we use ZEN Connect to connect all these images, we have to use Fiji to overlay, and Amira to do the 3D rendering. This is just an overview of what array tomography means: you see here the different sections, and now the machine goes from one section to another, taking an image of each. This is the Golgi; we have taken one image, the second image, the third image, and now we reconstruct, and we get our Golgi and the chloroplasts. As you see, we have to involve a lot of tools: Atlas is the program you see in the background, which allows us to easily navigate the different sections and acquire the images in each region of interest. In this particular case we don't have a correlative approach, just an array approach, but as I said, we had to use several pieces of software to actually answer the question. The other project I was going to talk about is the skin project that we do with groups around Europe. In this case it's an equivalent: this is not a piece of skin, it's what they call a skin equivalent. The lab is growing the skin: this is collagen, bovine collagen, and they grow the cells, mimicking the different layers of the skin, the epidermis and the dermis. Our job here was to prepare samples coming from that lab in Vienna, Austria, and try to preserve the fluorescence, because we wanted to correlate between light and electron microscopy. In this case, the interesting cells were the labeled, positive ones you see here. You see a very strong signal here, but this is the stratum corneum, the surface of the skin, which is mainly dead cells; that's why it is so bright. It's not that all the positive cells are there, it's just that when cells start dying they become really autofluorescent. And here we can see our positive cells clearly after the preparation for electron microscopy, which was very nice and surprising for us, because it was the first time we tried it with this fluorophore and we managed to make it work. As you see here, this is the light microscope, and we took images, in this particular case with transmitted light, but I will show you straight away that we have also used reflected light, which allows us to see the images very well too. As you see, you start to accumulate file formats; these are the file formats we have been using, and the resolution we get. Obviously, now that we are on the fluorescence microscope, the resolution is a bit lower, and we start to get quite big images, because the sample is actually quite big: this is just 100 microns, and we have several millimeters in this case. This is array tomography on the light microscope. On one side you have the skin fluorescence; this is reflected light, which we use because these samples sit on a silicon wafer, needed so that they are conductive for the
electron microscope. But we have seen that using the reflected-light mode on the light microscope allows us to clearly register and see the cells very well, and to pinpoint the positive cells with the fluorescence. As you see, again we had to use different programs to look at and analyze the data, and we had to do the big work in Fiji to take into account the tensions, transformations, and deformations that happen in the samples: even though it is the same sample, when we go to the electron microscope, the cutting introduces certain deformations that we need to account for. Here you see that things start to fly and move because we have done a 3D reconstruction; these are the sections we have as an array. And of course the files start to get bigger and bigger, 12 gigabytes per section, and this is a tiny area, not even all the millimeters we have here. Now it's the same sample I just showed you with the light microscope, but in the electron microscope, where we can do the 3D reconstruction on these sections. Again we have to use several programs to organize, register, align, and overlay. And of course the number of programs and tools keeps increasing; my colleagues will talk about them during this course and explain to you how they work. But we have to work this way to organize everything and get to the end, which is to organize, register, align, and overlay the images. Once you go correlative, in this case, you see that we actually do small 3D reconstructions; oops, sorry, yes, it's running again. It goes so fast because it is actually only five sections, but we have followed the cells, because what we want to see is the difference between these positive, fluorescent cells and the cells that are non-fluorescent. Okay, let's go to the next one. This is just a summary of it all: we have our samples sitting on a silicon wafer as an array, we take the images on the light microscope, we check that we have positive samples, we align those samples, and we finally go to the electron microscope. The next project I want to show is a bit more complex, and it is definitely still ongoing, but I would like to bring it up because so far I have talked about multimodality within our facility, and we can also bring this multimodality to the European level. Let's put it this way: I have one particular technique that I'm very good at, my colleagues in Vienna have another technique in which they are extreme experts, others in Germany, others in Madrid. So we have actually moved the sample around Europe, taking images in each of these centers with the best expertise for each step. The idea of this multimodal, cross-scale approach is to do the same study on the human brain. The images I'm going to show you today are of mouse brain, because we use that as the proof of concept, to check the experimental setup and the logistics, because there is real logistics in sending a sample around Europe. If everything works as it should, once we have the full workflow done, we will go on to the human sample.
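As a toy illustration of the align-and-register step for serial sections described above (in reality this was done in Fiji and the commercial tools named in the talk), consecutive sections can be coarsely registered by translation using phase cross-correlation. Everything below is synthetic and the names are mine; real array-tomography data would also need rotation and deformation correction.

```python
# Translation-only alignment of serial sections (illustrative sketch)
import numpy as np
from scipy import ndimage as ndi
from skimage.registration import phase_cross_correlation

rng = np.random.default_rng(0)
sections = [rng.random((256, 256)) for _ in range(5)]  # placeholder section images

aligned = [sections[0]]
for sec in sections[1:]:
    # Estimated (row, col) shift that registers `sec` onto the previous section
    shift, error, _ = phase_cross_correlation(aligned[-1], sec)
    aligned.append(ndi.shift(sec, shift))

volume = np.stack(aligned)  # crude 3D volume ready for inspection or rendering
print(volume.shape)
```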
What we call this project is the big multimodal high-resolution atlas data management, because there will be enormous work on the data management once we start with the real story, and I hope that maybe next year, if we have a course and I have the possibility to meet you again, I can show you data from the human sample. This is the brain where we started, and as you see, we are partnering with groups that are part of the Human Brain Project and the human connectome effort. What we are trying to do is make a comprehensive description of how the neurons and the brain regions are interconnected, how all these fibers and all these neural networks are interconnected. That would be the end point of the project, if we get there, because the human brain is extremely big, especially for some technologies like electron microscopy, but in any case we are trying to be one small part of this human brain and human connectome work. This is the team behind the project. As I told you, the sample goes to different places in Europe. Carl Zeiss is also here, and they have been the ones connecting the scientists, of course because some of their technologies are involved; many of us have their instruments, and some groups have technologies the others don't. There is our group here in Gothenburg, the group in Vienna, the group in Madrid with Javier and Angel, and the group in Jülich in Germany, who are the ones that actually belong to the human connectome work; that is done in Jülich, where Markus and Katrin are the main people, she as the director and he as the scientific leader, and Frederik is part of Markus's team. As you see, we really need expertise; we don't have the multimodality inside one facility, but we have it distributed around Europe. This is the beginning of the whole story: here are Markus and Frederik, who is a student, actually a physicist by background. They do the microscopy after the MRI; what you see here is polarized light microscopy, where you can see the fibers, and this is where we want to look at how these fibers are defined and how they are characterized. This is a little bit of what Markus likes to call the nested human connectome: we need to nest it, meaning there has to be a hierarchical organization, we have multi-scale objects, at some point polarized microscopy will not give us the resolution, and we want to take all of this into 3D space; we definitely want a multimodal imaging approach. This is what we have to do: this space here is exactly this area here, and we start with an area of the brain where we try to understand the composition of the neural tissue, to rebuild the local connectivity between the neurons, and to cross-validate and interpret the data coming from the different modalities. In this project, data management and data processing and analysis are going to be a very big part, and of course we want to inspire simulation and modeling. To do that, we need imaging down to the cellular and even subcellular level, and we want to do it on the same brain tissue, going all the way through all the labs. Of course we know that we will have to work with a subset, a sub-sample; it will not be the full brain for the moment.
But we will align everything on the big picture, because we will have the MRI images of the brain, and of course this will require big data management and handling. And it's not only an experimental problem, we also have a logistics problem. I just showed you where the sample goes: from Jülich in Germany to Gothenburg, where we are, where we prepare the samples for electron microscopy and X-rays (the polarized microscopy has been done there). Once the sample is prepared for electrons and X-rays, it goes to Vienna for X-ray imaging; once they finish imaging, it comes back to us in Sweden, and we cut it in an array fashion, array tomography, for our single-beam microscope and look at the same areas that they have seen on the X-rays, trying to see where things are in 3D. These sections that we cut here then go to Germany again, and that's where Zeiss Oberkochen enters, because they have a microscope called the MultiSEM: a scanning electron microscope with 91 electron beams, which means it goes about 100 times faster than my microscope here in Gothenburg. That is why it will be very good for doing big areas of the brain, which is what we want, not just a tiny area, and the MultiSEM will allow that. Once they finish, the samples go to Madrid for finer, higher-resolution imaging with a focused ion beam, and in the end the data management will be done between Jülich and us here. I will discuss a little what the plan is: we manage to image the same tissue with several different modalities, as I told you, polarized light, X-rays, scanning electron microscopy, focused ion beam, and MultiSEM; we will rely on infrastructures that help us with the big data management; and, entering the sample preparation, it is important to have workflows that prepare the samples while compromising each of the different modalities as little as possible. This is how the sample went: this is the polarized light, this is the area that Markus and the team are interested in, we prepared it in resin, we have a tiny area here, which is this tiny area, and these are the X-rays that our colleagues in Vienna have acquired. We can see that the preparation is going really well, meaning that they send us the samples, we section them, and we start to see the area of the brain. Of course we are very limited with a single beam, so we will send it to the MultiSEM. This is the way the MultiSEM array works, like hexagons; they can do this in literally minutes. They told me that what would take us 20 years will take them three months, which is roughly consistent with the hundredfold speed-up, and that tells you the speed; we would not be able to do the connectome without a MultiSEM. And here is how it looks: this is one picture, this is where we get to, and as you see, this is what the MultiSEM captures in one shot. And this is just going to high resolution on the focused ion beam. I want to show you some high-resolution images that Javier and Angel have taken; you see that we can just navigate in the brain and see all the fibers we are interested in, and this will help us with the modeling. They have already designed a program called EspINA, which actually helps us describe easily where the synapses are. And we have the other project, where the aim is to really put together all the technologies.
Here we have the polarized light and the X-rays: this is the polarized light, now we go to the X-rays, low-resolution X-ray, high-resolution X-ray, X-rays with even more resolution, and this is the electron microscope with the highest resolution. And you see, again we have used Atlas to navigate and ZEN Connect to put the images together, organize them, and overlay the samples. But the idea, of course, is not to do this for only one section, not in 2D; the idea is to go to the full volume, and probably to use infrastructures such as EBRAINS and Fenix. Probably some of you have heard about EBRAINS and Fenix, especially those of you who work in brain research, as EBRAINS is very much related to brain research. If you have not heard about Fenix, and you actually need high-performance computing, data, or virtual machine services, you can use it for free: it is a research infrastructure with real supercomputers that can help researchers in need of high-performance computing, and EBRAINS is actually built on top of Fenix. We will probably work on this part, or here as well, once we have access to Fenix. This is just to show you the different supercomputers around Europe in the infrastructure. I am almost finishing. I am just showing you here other multimodalities: this is a paper that our collaborator Jörg got out this year, where we go multimodal not only with the light and electron microscopes, but actually adding MALDI imaging mass spectrometry and the NanoSIMS, and also the scanning electron microscope. You see, we keep adding more and more. For the moment, in this project, we have not done a full fusion between the different modalities, but the idea is to actually fuse them; as you see, it gets more and more complex. You will get lectures about this, where people do transcriptomics and omics and want to fuse them with microscopy and the imaging samples. We have also worked successfully with groups at AstraZeneca, directly with the company; they have a NanoSIMS system, nano secondary ion mass spectrometry, where you get very high resolution, but the sample preparation has to be like for electron microscopy. That's what you see here: a transmission electron microscopy picture, this is our system, and these are the areas where they found the particular vesicles containing the compounds, in this case antisense oligonucleotides. This is very important now for drug discovery and drug development; with oligonucleotides you have a perfect example in the RNA of our COVID vaccines. I just want to finish with a summary and throw some things, in a way, at my colleagues who are the analysts and the developers. As we have seen, CMI, correlative multimodal imaging, is a field under construction, and we rely on quite broad expertise: sample preparation, image acquisition, data processing and analysis, and all of us have to work together; we cannot work separately. We really need to manage this, and the image processing and analysis need to be incorporated from the initial reflections on a project: from the first moment you design a project, you have to start thinking about how it's going to be processed and analyzed.
This dialogue between us, the microscopists, the scientists who have the research question, and the analysts has to be constant and continuous, and we have to make the communities understand each other and understand the requirements and needs of each other: not only that the analysts understand what I want to do, but also that I understand what the analysts need from me. And of course we would like methods to be as automated as possible. The majority of the methods I showed you today are ad hoc; we had to develop them on the fly. There are not many cases where I can just go and find a perfect method, a perfect program that will help me analyze and do my correlation, my overlay, and my registration. So we definitely need to work towards programs that can really take care of this and help us process this data, and machine learning and deep learning are definitely going to be very important in this field to push it forward, or at least to combine the ad hoc methods that we wrote together with machine learning and deep learning. Where I would ideally like CMI to go is towards a multimodal platform that allows functional, structural, and morphological characterization of an entire sample, in vivo, like the brain we just discussed, or model organisms, cells, or tissues; a platform that lets us get as much information as possible in a single place, where we can fuse our modalities and have the machine learning, all the algorithms and products, in one particular platform where all the different tools developed by the developers and the analysts actually sit. If we do that, and we have these multimodal platforms, we will increase the accuracy of the correlated modalities, and maybe we will not need as much post-processing afterwards. What I'm telling you now is what I, as a person running and managing a facility, would ideally like to have: a perfect platform that holds all my modalities in one place, which in the long term would probably reduce the number of modalities required to answer a particular question. Of course, to achieve that goal there are people working, as you see: there are different efforts, different repositories such as EMPIAR and IDR, and we also have companies like Cytomine or scalable minds that are trying to deal with big data, and not just big but multimodal data. With that, I would like to stop here and thank you for your attention; hopefully you have some questions.

Okay, good, you can start. Thank you very much, and thank you for the invitation, Rocco and all the organizers. I will talk about the topic of what bioimage analysis is. We often talk about how we do bioimage analysis, but we usually don't really talk about what we are doing, so as an introduction to this whole Defragmentation Training School, I will try to talk about general aspects today. This is an image from Robert Hooke in 1665, the first drawing based on microscope observation; imaging started from there. Then for a long time there were drawings, and it is only in roughly the last 20 years that we started to capture things digitally and then do computations. So, image analysis in imaging: many people say that image analysis is pretty difficult. You do experiments, you get images, and then you try to get something out of them and obtain results.
But this part is pretty much a bottleneck. This was a survey we did in 2015 together with other colleagues. One of the questions was: which step in imaging-based research projects is the most difficult for you? There were three options to choose from: experiment, image analysis, and microscopy, and many people said image analysis. Based on this survey we argued that we need a network; we got funded and started doing this networking. So let's think about why it's difficult, because thinking about why it's difficult actually gives us a lot of understanding about what we are doing. This is my opinion, right? Difficulty number one: we probably rely on images too much, because they are too instinctive, I mean intuitive, in a sense. We see images and we feel like we understand everything; that's the way our life goes. We see such an image and feel like we see many things, and the first thing is the beauty of the patterns and structures that you find. You are overwhelmed by this visual recognition, and you forget about the fact that we are doing science. From today's point of view, this is a drawing from the 19th century, but we can still see it as the classic way of doing image segmentation: we try to draw cells. Of course, there is no boundary; in fact everything is connected, but we try to see a boundary, a boundary that more or less interferes with the diffusion of molecules, which is what eventually gives rise to the system. These boundaries that we try to draw are in fact segmentation, but we now do this in a more digital way, using numbers that are actually measured with microscopes. What we are doing, in a sense, is this: there is an image, which is in fact a table, a matrix of numbers, and we are trying to reduce it to a certain representation of the situation that we are seeing. So there is a process of reducing the dimensionality, or the complexity, of the data, and of course what we are trying to do is a reduction of the dimensionality or complexity of nature, in a sense. But we tend to stick to the images, because we feel they tell us more. Science is always like this: you have to filter and reduce the complexity, trying to understand what is going on. That is the basic process of bioimage analysis.
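To illustrate the point in the plainest possible way, here is a tiny sketch of that reduction: a digital image is just an array of numbers, and analysis reduces it to a few measurements per object. The data is synthetic and the names are illustrative; this is not a method from the talk.

```python
# An image is a matrix of numbers; analysis reduces it to a small table
import numpy as np
from skimage import filters, measure

rng = np.random.default_rng(1)
image = rng.poisson(5, size=(128, 128)).astype(float)
image[30:50, 30:50] += 30   # a bright "cell"
image[80:100, 60:90] += 25  # another one

mask = image > filters.threshold_otsu(image)   # drawing the "boundary"
labels = measure.label(mask)                   # segmentation
props = measure.regionprops_table(labels, image,
                                  properties=("area", "mean_intensity"))
print(props)  # thousands of pixels reduced to a few numbers per object
```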
Difficulty number two: bioimage analysis using computers is a newcomer, and we can see that in the difference in understanding between image analysis and bioimage analysis. Image analysis in computational science has a much longer history than the image analysis we use in the life sciences, and you can see that if you pull some statistics from PubMed, for example by counting the number of publications mentioning ImageJ: there is a large increase from 2005. By the way, ImageJ started to be developed in 1997, and the one implemented just for the Mac, NIH Image, dates from 1987. GFP came into use as a marker for gene expression from 1994, and then cooled CCDs arrived for microscopy. This is just my personal experience, but it was in the early 90s that we started using them in biology. The cooled CCD was actually initially invented for astronomy, and biologists immediately went: wow, this is really, really great, with so much less noise. This is where digital images started to become more popular in biology. Before that, of course, we had CCD cameras, normal cameras, and photomultipliers that we used for imaging, but this was not really convenient, because the connection between the computer and the capture device was not really digital; there was always an A/D conversion in between. So it's only recently that we started; we are the newcomers, and because we are still new, there are a lot of problems with the handling of image data and also with analysis. I wrote a paper about this, published with Simon in 2021, on how we deal with this old problem of image data handling and analysis; if you are interested, just look it up. But you know, for the problem of the scientific use of image data there is not really much of a solution yet; we are mostly saying this is the ethics, or something like that. In science there are always many problems, including frauds, manipulations, fake data, and so on, and this is one category of those, so just trying to make image data and analysis reproducible is probably the best we can do in this situation. Now, the definition of image analysis; let's come back to the difference between image analysis and bioimage analysis. The definition of image analysis in computational science you can find in the textbook Digital Image Processing by Gonzalez and Woods, which is really like a bible of computational image processing. In one part you find the definition quoted here: image analysis is a process of discovering, identifying, and understanding patterns that are relevant to the performance of an image-based task; one of the principal goals of image analysis by computer is to endow a machine with the capability to approximate, in some sense, a similar capability in human beings. What this is saying is that image analysis tries to let the computer recognize things just like a human being: you want to mimic human recognition. That's image analysis. But I was always feeling that something was different, because of what I wanted to do. By the way, my background is completely biology: I was heavily using microscopes, even constructing microscopes, and also doing image processing, but on the other hand I was culturing cells. What I felt is that this understanding of image analysis doesn't overlap with what I want to do with image analysis, because of this: in biology, image analysis is a process of identifying the spatiotemporal distribution of biological components in images and measuring their characteristics, in order to study the underlying mechanisms in an unbiased way. So while image analysis in computer science tries to mimic human recognition, what I have been wanting to do is to get rid of human recognition: I want to capture nature as objectively as possible; I mean, that's science, right? So there is a gap between the goal of image analysis in computer science, which is to mimic human recognition, and the goal of image analysis in biology, which is to avoid being interfered with by human recognition. There are two different goals, and we have to be clear about what we are trying to do, because we use a lot of resources that were developed in computational science; we use a lot of tools and
image analysis algorithms from computational science, but we have to be careful that these two fields have different goals, and we have to make clear what we are trying to do with bioimage analysis, not image analysis in computational science, while using those tools. We do not have to bother with similarity to human recognition; we put more emphasis on the objectivity of quantitative measurements, rather than on how well the computer-based recognition agrees with human recognition. So, as a newcomer: bioimage analysis is a new field in the life sciences, concerned with quantitative measurements of biological systems, and the way to teach and learn it is barely established yet. People feel like it can be learned through image analysis from computational science, and that's confusing, and it creates a lot of these problems. In particular, when a professional computational scientist without knowledge of biology and a professional biologist without much knowledge of computational science work together, they don't feel this difference in the goals of image analysis; they just collaborate and finish their project without noticing that the computer scientist respects the eye of the biologist, while the biologist in fact thinks they are doing very objective measurements. What bioimage analysis experts should do is stand in between and keep the scientific integrity of those two fields. Okay, so that's difficulty number two. Then difficulty number three: one reason it's difficult is that there is a lot of software, and it's a bit chaotic. I'm using the Lego metaphor here again, like who we are. Everyone knows how Lego goes: if you buy a box of Lego, especially a very complex one like the Millennium Falcon or an X-wing, and there is no little booklet that teaches you how to construct it, it's difficult, right? If it's just the pieces, all these blocks, without the booklet, the instruction manual, it's quite difficult to reach the final, complete Millennium Falcon. This is the same situation we are facing, especially when you start bioimage analysis: you search on Google, you download software, and ImageJ is on your desktop, and you feel like, okay, now I can do image analysis, but you immediately face difficulties. One of the reasons, which I found out through talking with many people, is that there are categories of software in bioimage analysis. I make three different categories of software: one is the collection, one is the component, and the other is the workflow. A collection is something like ImageJ, or Python's scikit-image: a library or software that you download as a package, and inside there is a collection of components. By components I mean, for example, a Gaussian blur: that's one filter that does something, so that's a component; or connected-component analysis, that's another component, and these different components are packaged together. I think ImageJ has around 500 different menu items, and Fiji about 1000, and those are the components you can use. If you download just this bag of components, what you then have to do is create a workflow that
actually combines these components in different steps, like this. This one is very linear, so it should be a quite simple workflow, but you start with the image data that you captured, and step by step you apply those components, and eventually you come up with some numbers, plots, statistics, or visualizations. This process of assembling and designing a workflow is the difficult part, because when you download software, ImageJ or anything else, it's just a bag of Lego blocks. In the case of Lego you have the instruction manual, so you can follow it step by step, but in the case of biological image data it's not like that: you really have to think about what goes here and what goes there. So you need to know the details of what each component is doing, especially if you want to do this in a scientifically sound manner; you have to really know what a component is doing in terms of numbers, right? This is the difficult part, and this is what gives you the challenges in bioimage analysis. What I wanted to tell you here is that generally everything is called software, but there are subcategories, and "don't mix them up" is kind of the take-home message; it has been quite helpful when teaching people to explain this first. Okay, so these are the definitions. I have already explained component and workflow, so I won't repeat them, but there is also the workflow template. A workflow is components assembled in a specific order to process image data and give some numerical parameters relevant to the study of the system. In many cases these workflows exist only in papers; they are not published as tools, which means a workflow is tightly associated with a specific biological problem, so you don't really reuse it. This is essentially what bioimage analysts are doing: helping to construct workflows, together with biologists and microscopy people, to solve some problem. But there are also workflow templates, a somewhat more general form of workflow that lets us tune numbers and parameters or swap some of the components, like TrackMate; these days TrackMate is getting even more flexible, and it's a very good example of a workflow template. And collections: a collection is a package of building components, with an interface or API to use them or to construct workflows; software libraries, ImageJ, Python, right?
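As a concrete, deliberately simple illustration of these three categories, here is a sketch in Python: scikit-image plays the role of the collection, each function call is a component, and the particular sequence is one small workflow. The task itself (counting objects in a sample image) is invented for illustration.

```python
# Collection / component / workflow, made concrete (illustrative sketch)
from skimage import data, filters, measure, morphology

image = data.coins()                                # sample image shipped with the collection
smoothed = filters.gaussian(image, sigma=2)         # component 1: denoise
mask = smoothed > filters.threshold_otsu(smoothed)  # component 2: threshold
mask = morphology.remove_small_objects(mask, 64)    # component 3: clean up
labels = measure.label(mask)                        # component 4: connected components
table = measure.regionprops_table(labels, properties=("label", "area"))
print(len(table["label"]), "objects")               # workflow output: numbers, not pixels
```

Swapping the threshold or the cleanup step for a different component, while keeping the overall order, is roughly what a workflow template such as TrackMate lets you do through its interface.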
Let me take an example. The details of this biological phenomenon are important, but for today's talk what I want to show is how this problem was solved by different groups of people, so that we can compare the different types of solutions. The biological problem is the localization of genes and its relation to their expression level. This is a nucleus; this is the nuclear envelope, with the nuclear pore complexes (NPCs) embedded in it; there is the nuclear lamina lining the envelope; and inside there are the chromosomes carrying a lot of genes. Single chromosomes are arranged so that some of them are located towards the core, the center of the 3D structure of the nucleus, and some are attached to the periphery. It is known that if a chromosome is close to the lamina, gene expression is suppressed, but there is also evidence that nuclear pore complexes activate gene expression. So there is both activation and suppression when a chromosome sits near the periphery of the nucleus, and that is why looking at where genes are located in 3D within the nucleus was very important. For this, of course, segmentation of the nucleus matters. Here is a very casual, easy segmentation done with ImageJ, but there are many great tools and many different algorithms for nuclear segmentation. So let's take that part as solved: we are now equipped with excellent nuclear segmentation tools, and the question is how we do the bio-image analysis from there. That is the post-processing analysis, because we now have a good structure.

Method one. FISH reveals the location of gene foci, so within the nucleus in 3D we try to find those foci and locate them. The initial attempt was to create three different zones, periphery, middle and core, chosen so that, volume-wise, a gene has equal probability of falling in each zone: the volume is the same in all three. Then you locate the gene foci; these are different genes, and this one sits very much in the periphery, and so on. Then you argue that this gene is suppressed or activated because it lies towards the periphery. This was quite a striking approach, because you can do statistics and mark the result with a Student's t value. When I started on this, around 2009, this was the established method, so I studied it. What they did was define the nuclear shape by the nuclear pore complexes, and from there they measured the distances almost manually.

Method two is, in a sense, a bit more elegant: they defined a specific distance from the nuclear periphery and measured whether a signal is located within it or not, a binary category of where the gene sits, and made statistics from that. Signals were considered to be present at the nuclear envelope if localized within 0.25 micrometers. But if you look at the details, they detected the nuclear edge using the DAPI image, segmented it with manual thresholding, made an intensity profile, and defined the edge as the position where a certain percentage of the total intensity is reached.
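To make these two methods concrete: both reduce to operations on a distance transform of the nucleus mask. Here is a minimal sketch of the equal-volume zones, my own illustration under that assumption, not either group's code:

```python
# Split a 3D nucleus mask into n zones of equal volume (periphery -> core),
# using the distance of every voxel to the nuclear surface.
# nucleus_mask: boolean 3D array.
import numpy as np
from scipy import ndimage as ndi

def equal_volume_zones(nucleus_mask, n_zones=3):
    dist = ndi.distance_transform_edt(nucleus_mask)   # 0 at the surface
    # Quantiles of the inside distances split the voxels into equal-count,
    # i.e. equal-volume, shells.
    edges = np.quantile(dist[nucleus_mask],
                        np.linspace(0, 1, n_zones + 1)[1:-1])
    zones = np.digitize(dist, edges) + 1              # 1 = periphery shell
    zones[~nucleus_mask] = 0                          # outside the nucleus
    return zones

# Method two's binary call is just a threshold on the same distance map:
# "at the envelope" if dist <= 0.25 / voxel_size_in_um.
```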
Whatever that percentage was, you can see that these two groups are trying to evaluate whether genes are suppressed or activated based on such measurements, but in terms of image analysis, if you look in detail, they are doing completely different things. And scoring gene locations by arbitrary distances or zones means you cannot even compare one group's results with the other's, because you are using different categories.

So when I started on this problem, I looked at those two methods and thought there should be a better measurement. What I used was a 3D distance map. If you multiply the 3D distance map by the location of the gene, you get the distance, because the gene image is zeros and ones, and one multiplied by the distance is the distance. In that way I could process thousands of nuclei and get statistics, and the good thing is that the result is a probability distribution of gene positions, so now we can do proper statistics to compare the distributions of different genes. I thought this was the way, the best method as of 2010. But within three years, the group of my friend Christophe Zimmer published a paper that works without segmentation: he used cylindrical coordinates to locate the positions of the genes in this space. Christophe is now a very famous image analysis expert in biology, but he came from astronomy, so he knew how to use different kinds of coordinate systems in such a space, and he immediately brought that knowledge into an excellent analysis of this gene distribution. I really felt I had lost.

What I wanted to tell you with this experience is this: you see a progression from three-zone categories all the way to measurement without segmentation. Segmentation is, in any case, biased by human recognition: you say with your gut feeling that a segmentation is okay. So avoiding mimicking human recognition is best, where possible. That is bio-image analysis. From these experiences: be critical; don't just reuse the other guy's method; be careful and study other methods; and then, starting from the biological point of view and using advanced knowledge of handling digital image data, combine them, come to a better answer step by step, and eventually arrive at a solution like Christophe's.
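The distance-map trick I described is only a few lines; a minimal sketch, assuming you already have binary masks for the nucleus and for the detected FISH foci:

```python
# Distance of each gene focus to the nuclear periphery, via a 3D distance map.
from scipy import ndimage as ndi

def gene_distances(nucleus_mask, foci_mask):
    """nucleus_mask, foci_mask: boolean 3D arrays of the same shape."""
    dist_map = ndi.distance_transform_edt(nucleus_mask)  # 0 at the surface
    # foci_mask is 0/1, so the product keeps the distance exactly at the
    # foci and is zero everywhere else -- the multiplication described above.
    return (dist_map * foci_mask)[foci_mask]

# Pooled over thousands of nuclei, these values form a distance distribution
# per gene that can be compared properly, e.g. with scipy.stats.ks_2samp.
```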
Okay, so those are the basics. But how do we actually do these things? One thing we concluded is that we have to have a network of these experts in bio-image analysis, and the reason comes from the short history of the field. In the 90s we had a lot of biological problems but only very few options in image processing and analysis software, so we just tried to use them as much as possible and get to a solution. Eventually, with the developers' effort, there were more tools, and then you have to choose; the image analysis itself became multi-step and more complex, so you have to make several steps of decisions. The expert who, starting from a biological problem, uses those software resources and comes to an analysis result: that is the bio-image analyst, and that is what we are in NEUBIAS. The backgrounds of those people are quite diverse.

To make the distinction between development and analysis: this figure shows the workflow I already explained, which is simply components. What I do is use the resources offered by developers to construct workflows, working with biologists and microscopists, starting from the image data and getting numbers out. I often show this slide about the difference between a software or algorithm developer and an image analyst. If you go to a knife shop at Tokyo's fish market (I actually went there), you want to buy a good knife, and you get really amazed that there are so many different types of knives, and you don't know which one to buy. But a professional sushi master knows exactly which knife is needed at which step; he can immediately say, I want this one for that purpose. The sushi master uses different knives at different steps, starting from the whole tuna down to small pieces of fish. And then there is the bladesmith, the professional who makes each knife as sharp as possible. The developers of software packages are like the bladesmiths, creating all those knives; the image analysts are like the ones who use them at the different steps and combine them into a workflow to create the sushi, to come to a great result.

Another way of viewing this difference between development and analysis is the following: collections and components are handled, developed and maintained by developers, while the workflow is in the hands of the image analyst who uses those resources. One important aspect of this difference is the values each side holds. Developers tend to go for speed, quantity and genericity: the more generic a component is, the greater it is. Analysts, on the other hand, care about accuracy and scientific adequacy, whether this is scientifically okay; speed is, in a sense, less important, and often, if a method is more accurate, we can accept double the computation time. Analysts also care about specificity: every problem has its specific issues, and we solve each specific issue using generic tools. So these are two different mindsets. NEUBIAS is the network of those image analysts, and that is why we are running this school today.

The time is almost up, so I will go quicker. We are building an organization of these tools, because there are a lot of different types of tools, in large numbers: if you search for a certain tool you always get some results in Google, but the structure is not there, so we wanted to maintain such a structure. Also, there are a lot of hidden workflows within papers; for example, one paper has source code for image data analysis linked as a zip file, which never really surfaces in Google. The same happens with very good workflows, for example for image segmentation, that don't really appear in a Google search because they are not ranked or evaluated. So we want direct access to workflows and components related to a specific problem.
A plain list of software libraries is not usable, a Google search returns too much, and reading papers might not be helpful either, so we had better have a nice search engine. So we are also developing an organized index of those different tools and workflow components: you see there is collection, collection, workflow, workflow, collection and so on; we categorize them into different types. After I started making it, what we recognized is that there are a lot of good things in it. This is a schematic view of the web tool itself: there are workflows and there are components, and each workflow and component is tagged with certain keywords by analysts. The workflows are linked to published papers, useful code and high-level functions, while the components are associated with computational concepts: image processing papers, documentation, APIs and so forth. So those are two different kinds of entities: the workflow side is for biology, the component side is for computational science, and both live within this web tool. The nice thing is that a component is described mostly with computational terms, while workflows are described mostly with biological terms, and inside we can link a component to the workflows that use it; we can even make that link automatic. Each component entity is mainly tagged with computational-science terms, while each workflow is tagged with biological concepts. What this allows is that biologists can start with biological terms and flow into the components, which carry the computational terms, so that without knowing much about computational concepts they can reach computational resources. Computational scientists, on the other hand, can start from the components, from the computational side, and move towards the biological terms.

In a way, we are trying to solve the Tower of Babel problem: people who spoke one language started to build a tower, the god got angry and made them speak different languages, and the building collapsed. In this field of bio-image analysis we are interdisciplinary, purely interdisciplinary: there are computational vocabularies, there are biological vocabularies, and we are working together while bridging the difficulty of understanding each other. The reason for this web platform, biii.eu, is the following problem. We both see an apple; the English speaker says "this is an apple", and a Japanese speaker (you probably cannot read this) says "ringo". Because we are pointing at the same object, each can understand: okay, in Japanese it is like this, and in English like this. But the real problem is this: here is apple cider, and the English speaker says "this is apple cider", but we don't have apple cider in Japanese culture, so what is this? To understand each other beyond those language barriers, the first thing you have to overcome is this conceptual difference: if you don't have the concept, you cannot name it. But now, with this architecture of components and workflows, a biologist can start thinking with "centromere" and then reach the FFT band-pass filter, maybe, and a computational scientist can start with the FFT band-pass filter and reach some problem in biology.
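As a toy illustration of that cross-vocabulary linking (my own sketch, not the actual biii.eu data model), imagine workflows tagged with biological terms and components tagged with computational terms, joined by a "uses" relation:

```python
# Toy index: biological tags -> workflows -> the components they use.
components = {
    "fft_bandpass_filter": ["frequency domain", "filtering"],
    "otsu_threshold": ["segmentation", "histogram"],
}
workflows = {
    "gene_localization_3d": {
        "tags": ["centromere", "nucleus", "FISH"],
        "uses": ["fft_bandpass_filter", "otsu_threshold"],
    },
}

def components_for(bio_term):
    """A biologist's entry point: biological term in, computational tools out."""
    return sorted({comp
                   for wf in workflows.values() if bio_term in wf["tags"]
                   for comp in wf["uses"]})

def workflows_for(comp_name):
    """A computational scientist's entry point: component in, biology out."""
    return [name for name, wf in workflows.items() if comp_name in wf["uses"]]

print(components_for("centromere"))          # ['fft_bandpass_filter', 'otsu_threshold']
print(workflows_for("fft_bandpass_filter"))  # ['gene_localization_3d']
```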
With this organization, we found it turning out to be more than just an organization of tools: people with different vocabularies are working together and creating new value. That is what we are doing, and we also introduced an ontology, controlled vocabularies in informatics terms, to organize the terms we use. I will be quick, because I only have fifteen minutes left.

Learning is important: the more components you know, the better your workflows. There are textbooks; we published in 2016 and 2020, and another one is coming in October. These are mostly about workflows, because there are a lot of books about components and algorithms but not many about workflows, so we are trying to make textbooks about that.

This whole effort we named "defragmentation". The reason is that bio-image analysis resources are getting more and more complex and advanced. Bio-image analysis in fact needs three different domains: one is biology, meaning imaging, microscopy, and biological concepts and problems; another is computational science; and another is mathematics. Bio-image analysis exists at the intersection of these three domains, and that is what we are doing. At the crossings between pairs of domains there are further fields: between biology and imaging on one side and computer science on the other there are data-processing issues; between mathematics and computer science there are machine learning and models; and between biology and mathematics there are models, meaning hypotheses, and statistics. Bio-image analysis has to deal with all of it, which is why it is so busy and why so many different types of expertise are involved. Each of these domains is getting more and more advanced, so we are becoming fragmented across different skills. Defragmentation asks: within this complex, advanced landscape of skills, how can we defragment them and put them back together into bio-image analysis, whose goal is to measure biological systems?

When I look at this program, with cloud computing, GPU computing, workflow design, workflow evaluation with benchmarking, machine learning, the intention of Rocco and the organizers looks very much centered in one direction, and somehow the other half is not there. From my point of view the defragmentation is only halfway: bio-image analysis itself has become heavily weighted towards these parts these days and somehow forgets the other things. I am trying to push to reintroduce them, because we have a lot of knowledge here from biochemistry and from traditional imaging without segmentation; we used to analyze a lot of temporal dynamics and spatial dynamics in different ways, which we tend to forget as we create more and more tools.

So, my conclusion: in bio-image analysis, your scientific and computational creativity matters. Knowing how to use computer software is only the beginning, and biology is a rich source of inspiration for analysts. The resources are there, and how you disc-jockey those resources is the key to driving your creativity. Thank you. I also tried to list all the people I have interacted with in bio-image analysis, but I didn't
have time to include everyone, so I stopped there; I think another page would be required. I'm sorry if I forgot someone.

Well, after those encouraging words from Kota, let's take a little bit of a turn. My name is Aastha Mathur, I'm the coordinator for image data services at Euro-BioImaging, and I want to talk to you about the data services that we offer at Euro-BioImaging, about what Euro-BioImaging is, and about some of the community efforts we are working on towards providing image analysis as a service.

All right, so what is Euro-BioImaging? It is a research infrastructure. What do I mean by that? I don't know if many of you are very familiar with it, so I'll just say that it's basically a concept the European Commission came up with. Research infrastructures are varied in topic: they give access to instruments that can range from telescopes or particle accelerators to biological facilities, including imaging. Euro-BioImaging is the European research infrastructure consortium for open access to biological and biomedical imaging as well as analysis technologies; I'll talk more about that later. We are very distributed in nature: all the countries you see in green are the member states that form Euro-BioImaging, and we currently have, I think it's already around 149, so let's say in that ballpark, individual facilities that are part of Euro-BioImaging. These are facilities that may already be part of a university, a research infrastructure or some institute, but within Euro-BioImaging they come together to provide access to external users; we call them nodes. The pins denote that, apart from the nodes, the facilities that provide access to imaging instruments, we also have what's called the coordinating hub: a small team of people managing access for these users, for both instrumentation and analysis. We sit in three different places: in Turku in Finland, our statutory seat; at EMBL in Heidelberg, where we have the biology hub and also the data services, and where I sit; and in Torino, where our medical hub sits.

So that's a little introduction to Euro-BioImaging. As a user, as a scientist, you can access more than 50 different kinds of imaging technologies (I think it's even more by now), ranging from microscopy techniques to medical imaging technologies. That's the basic idea of Euro-BioImaging: to offer open access. Open access to instruments, yes, but I think the really key point is: with the support of the expert technical staff, because those are the people behind these instruments who know the real depths of how things work and, most importantly, how things don't work. So when we provide users access, it's not just the instrument; it comes together with the experts who know how to use those technologies, and we are very happy to work with all the staff at our different facilities. We also provide image data services, which I'm going to focus on a bit today, and we provide training opportunities in different forms, including, to a certain degree, this course. When it comes to the technologies, the range is huge, as I alluded to earlier: it goes from very small, molecular imaging technologies all the way to human imaging, preclinical imaging and so on.
How do you actually access this? We have a very simple route: as a user, you apply through our web portal. If your project has not yet gone through a process of scientific review, that's also a possibility; if it has already gone through scientific review, you can fast-track into the technical feasibility check at our nodes, that is, whether it's possible for them to carry out your project or not, and voilà, we put you in touch with the nodes. You can actually travel to these places, and sometimes there are even grants to support the travel as well as the carrying out of your experiments. I will also point out here that we are now able to offer image analysis as a service too.

Okay, now I want to get a little more into the data aspect of it. I guess it's not very surprising for this particular audience, but there are certain peculiarities when it comes to imaging data, especially when we talk about cloud computing and using image data in the cloud. One thing is simply the sheer size of the images: it's pretty large. You won't be surprised if I tell you that single experiments go from multiple gigabytes to terabytes of space. Here is a little figure I pulled out of the image data repositories, one of the open repositories specialized in keeping image data: they contain 104 studies which make up 307 terabytes. That already gives a bit of an idea of how much data this can be. In addition, we have a diversity of formats: with the diversity of instruments comes a lot of formats that people need to deal with when it comes to analysis. And beyond the format itself, there is the metadata, the information you need to understand and look at before you are even able to do any downstream analysis; those things are also a challenge to get right. On top of that, with the latest technologies the complexity of the imaging data keeps increasing: multimodal technologies, and specifically correlative data sets, where you work with different imaging modalities at the same time, make the analysis methods more and more challenging.

I don't know if this crowd has already been introduced to the FAIR concept, but I put it here because what we are trying to do is make image data a resource, a global resource that people can actually use, trust and reuse. We want to make sure that the image data our users produce is findable, accessible, interoperable and reusable, and this goes both for the data and for the analysis methods. At Euro-BioImaging we try to support people, whether directly the users or the staff working at the facilities, to work in such a way that both the data and the analysis that are produced are FAIR and open.

There are two things I will tell you a bit more about. The first is how we help a user directly; this is where our expert image analysts come in. As of now, Euro-BioImaging is able to offer image analysis as a service: you as a user can actually approach us and request image analysis as a service on your own data set. You could have acquired it at a completely different place, but our expert image analysts at any of the facilities currently
taking part in the proof-of-concept study for this service will be able to help you. As for the kinds of technologies on offer: they are, of course, experts in a lot of open-source software, so they can help you with image analysis in Fiji or with Python, and a lot of them are specialist image analysts working right at the nodes to help people with their image analysis needs. Some of our nodes offer these services for medical imaging as well. So it really depends on your question, and if we are able to help you, we will be more than happy to.

The reason I bring this up in this crowd, which was also why Rocco proposed that it might be a good idea to talk to you people about it, is that this is one way we want to push forward the importance of image analysis: by providing external projects to our expert image analysts. This will hopefully help them gain visibility as well as some funding opportunities, because Euro-BioImaging can also support specific topics of analysis through funding. So this is what we are looking at right now, and we are very excited to start supporting people with image analysis, providing image analysis as a service.

These services are mostly directed at individuals. Apart from the fact that our node personnel are also involved in maintaining some of these open tools and libraries, which is quite important in general, they are happy to help a single user go ahead with their analysis needs. But at the coordinating hub we are also trying to do certain things that are helpful for the community in general. First of all, we participate in a lot of European projects: EOSC-Life, which supports this particular course, is one of them, but there are many other projects we are part of, whether specific to COVID-19 or to health data sets, as in the case of HealthyCloud. We are trying to get visibility for image data in the European research landscape, which we find quite important, because not all data is sequencing and genomics data, right? We also support open image data repositories, specifically the BioImage Archive, the IDR and EMPIAR, the latter specifically for electron microscopy images. And we help our users with this: in our team we have Dale Stewart, who works on helping people put their data into a suitable format so that it's easy for them to eventually submit to one of these open data repositories. So we are happy to help users with that process as well.

What's also interesting: I talked about some challenges with image data, but everybody is also working towards solving, or let's say overcoming, these challenges, and one important part of that is coordinating the efforts going on in the community. That's another thing we do: we work with a lot of community-driven initiatives, whether people are talking about how to get the metadata right or how to work on a particular file format. For these kinds of things we also provide some sort of soft support, and we have expert groups where people can drop in and discuss the problems they face when it
really comes to offering image analysis as a service, because it's not particularly streamlined right now, but it's an exciting place we all find ourselves in. These are exciting times to be working on image analysis, I must admit.

Okay, there are a couple of examples I pulled out, and a few things I want to remark on. First, the importance of the open image data repository. This is the BioImage Archive: an archive specifically for image data, held at EMBL-EBI. The basic idea is that it welcomes all kinds of images related to a publication, and this helps everything from reproducibility of the results to having reference data sets you can always look up. Say you wonder what a classical HeLa cell looks like: those kinds of data sets should also be there. But keep in mind the potential too: once all this data is openly available, there can be new research based on this open data, and reuse. And this is probably the audience that understands this best: if the data is openly available, it opens doors for further research, whether that is developing new image analysis algorithms or something else; the possibilities are potentially endless if we have enough of this data. The BioImage Archive is built with a particular architecture: it's made in such a way that added-value databases can sit on top of it. Two examples are the IDR and EMPIAR, which could still have the BioImage Archive at the base to store the data, but add an extra informative layer, whether of metadata or of displaying the data in a particular fashion, and serve a particular subset of interest, for example EMPIAR for electron microscopy data sets.

The other thing we are trying to do, to encourage people to use this (and this work is again supported by a couple of the EC projects we have at Euro-BioImaging), is to really go through the pipeline of taking image data, putting it into a standardized file format, being able to do some compatible image analysis, and then being able to submit it, in our case to the BioImage Archive. The current work in our team also supports the Open Microscopy Environment team and the further development of the Zarr-based file formats, which are very useful for computing in the cloud; I guess you will hear more about that in the next days. I don't know if Boora is going to be able to join, but he has definitely helped prepare some materials for this course, along with work that was done previously as part of the EOSC-Life project, where there was a proof-of-concept image analysis pipeline; I guess you'll hear about that from Beatrice and others in the next sessions of this course. The idea for us is to make a very modular pipeline, sponsored by these different projects we're involved in, and it will be made available to users as a Galaxy pipeline. In a way, what we are trying to do is build a tool that is potentially useful for the whole community: we are not targeting a single user, we are really trying to address what we currently understand to be the need of the community.
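To give one hedged illustration of why Zarr suits the cloud: a Zarr image is stored in chunks that can be read individually over HTTP, so you fetch only the region you need. A minimal sketch, assuming the zarr and fsspec packages are installed, and using a placeholder URL rather than a real dataset:

```python
# Read a small region of a remote OME-Zarr image without downloading it all.
import zarr

url = "https://example.org/experiment/image.zarr"  # placeholder, not a real dataset
root = zarr.open_group(url, mode="r")              # lazy: nothing is fetched yet
level0 = root["0"]                                 # full-resolution pyramid level
print(level0.shape, level0.chunks)                 # e.g. a (t, c, z, y, x) array
crop = level0[0, 0, 10, :512, :512]                # only these chunks are fetched
```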
I will also make a brief mention of another project we have recently started, called AI4Life. The idea there is to provide artificial-intelligence services for life-science imaging. It is built on a resource that already exists, the BioImage Model Zoo; you can have a look at it. It's a place where pre-trained machine learning models are stored: you can go there, access a model, put your data in, run it and get a result, and even compare different models. During the lifetime of the AI4Life project we will try to make the BioImage Model Zoo more and more user friendly, and try to put more compute resources behind the functioning of the website. So please keep an eye out, use it, and if you run into any problems let us know, because we try to rectify those too.

So, as a bit of a wrap-up: what we are trying to do at Euro-BioImaging is to help users make image data FAIR and open, again supported by our EC projects, so that once the data is in the open domain it can actually be used to generate new methods and new knowledge. Some of these proof-of-concept things are being done within the different EC projects, and what's very important is our nodes' contribution, which is basically bringing users directly to these new methods and knowledge, maintaining a lot of the open image analysis tools, and helping people with their image analysis needs. With that: thank you very much. If you want to contact me, please feel free; if you have any questions whatsoever, our doors are always open and we are very happy to have a chat.

Thanks for the invitation; I'm happy to bring the little Swiss contribution here. You didn't see the Swiss flag often in the last presentation, but we still have access to some of these services offered by Europe, like Galaxy, which we will also see today. I work at the Microscopy Imaging Center at the University of Bern, and essentially I develop a lot of software for people and use Jupyter almost every day. Mostly I use it on my own laptop, but sometimes I also use some of the services I will talk about today, for demonstrations, for courses, and also to share workflows with other people. When I was asked to give this course, I thought about what to present. I could have focused on one specific tool and shown you all its details, but given that the course is not so long, I think it made more sense to give you an overview of all the possibilities there are to run Jupyter in the cloud. You will see there are many very different types; all of them have advantages and also negative sides, and I think it's good for you to have an overview. I saw that most of you have experience with Fiji, but probably not so much with Python or Jupyter, so I will also briefly explain what Jupyter is. The practical part of the exercises is going to be more about accessing these services and seeing how they work than about running things in a notebook or writing code, so you are not expected to know Python to follow.

Okay, I pasted the link to this presentation in the chat, so you can access it there; otherwise you can just type the bit.ly address, "defrag Jupyter", and it will bring you to this presentation. There are a few links in it that might be helpful later on. To summarize what notebooks allow you to do: classically, when you do image processing or any kind of data
analysis, you write a script in the language you want (it could also be in Fiji), then you open a terminal or some command-line tool, you execute your script, and at the end you get an output folder full of results: tables like CSV files, segmentation images, plots and analyses. When you are developing your workflow, this kind of approach is not very efficient: if you now want to change some parameter in your workflow or adjust something, you have to rerun everything, wait until it's done, check the output, and do this over and over. This is why, in data-intensive fields, people have started using these notebooks more and more, and in particular Jupyter notebooks.

A Jupyter notebook looks like this, the page you see here. It is essentially just a simple text file formatted as JSON, a particular format, but you can open it in a text editor if you want, and it is rendered in the browser by Jupyter. It consists of different parts. You can write and execute code: these grey boxes contain code that you can run. You can have rich displays, like the images you see at the top, which can sometimes even be interactive. And you can comment your code: not just comments like you would put in a regular script with a hash, but actually formatted text. You can create an actual document with titles, links, lists, whatever you want, using a language called Markdown. Markdown is a markup language, a very simple one, that allows you to format your text: to create a title you just write a hash and then your title, and to write in bold you use stars. It's kind of like LaTeX, but a thousand times simpler.

These notebooks execute step by step, and this is a really important part: when you design code, when you design a workflow, you want to think step by step. For example, you apply a threshold and you have to pick which thresholding method to use, so you can try a few, look at the output, and go on from there. This is one of the main advantages of these notebooks: you can do things step by step, come back and restart, without executing the whole script all the time.

These days notebooks are mostly used with Python. They were designed from the beginning to work with other languages (the name Jupyter stands for Julia, Python and R), and you can still use other languages, but I would say the majority of people use Python. You can also run other software in there via different mechanisms; one of them is PyImageJ, a little package that allows you to use ImageJ or Fiji functions directly in your notebook, so there is a kind of interaction you can create between these two worlds. The same is true across languages: you can also mix, for example, R and Python in a single notebook. So it's a very rich environment; there are many other details and things I'm not mentioning today for the sake of time, but it's a very interesting tool.

So, what are the main advantages? As I mentioned, when you design a workflow you are much closer to your code than if you just execute a script, and you can do these things in a very iterative way.
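As a small illustration of that PyImageJ bridge, here is a sketch that assumes pyimagej and a Java runtime are installed, and uses a placeholder file name:

```python
# Use ImageJ/Fiji functions from a Python notebook cell via PyImageJ.
import imagej

ij = imagej.init("sc.fiji:fiji")   # start a Fiji instance inside Python
img = ij.io().open("nuclei.tif")   # placeholder file name
ij.py.show(img)                    # display the image inline in the notebook
arr = ij.py.from_java(img)         # convert to a NumPy-style array for Python
print(arr.shape)
```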
And this is valid in other fields too. These notebooks are used in companies like Bloomberg to do financial data analysis, and they are used in other research communities: for example in the geosciences there is the Pangeo project, because they deal with issues similar to ours, large data sizes and computationally intensive tasks, and you can see that JupyterHub and Jupyter sit at the center of that project. So it's used in many other places, and even if you stop doing bio-image analysis or research, learning how to use these kinds of tools will be useful.

It's also very useful for code documentation, for workflows, for papers, for anything where you use code, and for reproducibility. Thanks to this mix of output, code and commentary, you can be very efficient at explaining what you are doing: instead of publishing a paper, putting a script on GitHub and saying "this is how we did it", you can actually explain to people, step by step, how you did it. People can then reuse it, maybe fix errors, and at least it will be a bit more reproducible than a single file you're supposed to execute. As an example, I use these notebooks to document packages I write. I wrote a small package called microfilm for making figures of multi-channel data, and I wrote the tutorials and some documentation as notebooks; there is a tool called Jupyter Book that transforms a series of notebooks into a little website, and you can then use GitHub to publish it. So you can use these notebooks for documentation in a very efficient way. The big advantage, compared to writing documentation in the abstract where you're never sure your code actually executes, is that the notebook has to execute properly for the output to be there. You are sure that what you tell users to do at least works here, and when they copy-paste your commands they will actually work.

And finally, and this is the topic of today, it's a very efficient way of exploiting cloud resources, and the reason lies in the mechanism of how Jupyter works. Your notebook displays in the browser: you only need a browser, no application to install, so any time you have access to a browser, including on your phone or anywhere else, you can access Jupyter. The actual calculation is done somewhere else, on a server. Jupyter has a server, and on the server runs a kernel: the kernel is essentially the thing that runs your code, and it can be Python, R, or other languages. The important point is that it runs somewhere else. Sometimes it runs on your own computer, but it can also run elsewhere: from your browser you essentially send the computation you want done, the computation runs, and the result is sent back to your browser as output if you want it; the data is not necessarily stored on your side. The only limitation of this is that if you really want to look at a gigantic image, it might take some time to send it back to the browser. So essentially you can run this server and this kernel anywhere you want, as long as you can connect to it over the internet via your browser. It can be on your laptop, which is how you use it most of the time, I would say. It can run in the cloud, which is what we are going to look at today. And you can also run it on a cluster: if your university has a cluster, you can in principle install Jupyter there and then exploit all the resources of your cluster via Jupyter, which makes it much easier than the Slurm scripts or whatever you normally have to write. You just need a bit of collaboration from the people managing these clusters, which is not always easy, but if you find some people ready to help you, it's a great way to use a cluster.
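One trivial way to see this client/server split for yourself: run a cell like the following in any session, and the names it prints belong to wherever the kernel lives, not to your laptop.

```python
# Where does this kernel actually run?
import os
import platform
import sys

print("host:", platform.node())   # on a cloud service: the remote machine's name
print("cwd:", os.getcwd())        # the remote file system, not your local disk
print("python:", sys.executable)  # the interpreter of the remote environment
```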
Okay, so in the end, for the user, where it runs doesn't affect the interface. You will see that we'll run Jupyter in many different places and it will always look the same way; this is also kind of an advantage, because you are never lost because the interface is suddenly different or you have to relearn something new.

So why run Jupyter in the cloud at all, and not just on your laptop? One of the main reasons is that it allows you to try new software without affecting your computer at all, even more isolation than with things like conda. Maybe you have heard about these environments you can create on your computer: essentially little spaces where you install specific packages for each of your projects. Here you can really start a Jupyter instance in the cloud, install a bunch of software, and whenever it breaks for any kind of reason, you can just delete the project and start from scratch. If you start having the kind of issues where you cannot install things anymore on your computer, it can become really complicated to untangle, so if you just want to try things out, this is really a good solution.

You will also have access to your project from anywhere, especially in these days when we do a lot of remote work, and I think in the future we will do more and more. You can start some work on some computer, go home, connect from home to the same resource, and continue your work there: your notebooks and your data will still be in place, so you will be able to continue working.

This also gives you access to a wide range of different hardware, especially if you need GPUs, for the infamous deep learning apparently, or if you need a lot of RAM because you have a huge data set. It's an easy way to get access to these kinds of resources: you don't need a desktop with a really good GPU from a facility somewhere; in principle, if you get the right access, you can do that over the cloud.

It's also a great way to share things. You can create demonstrations, for papers for example, and people can then run them interactively: depending on the mechanism, people can copy your project and rerun it, or get access to whatever you did. It's not just a static thing that people read, like on GitHub; it's really a thing that people can then run themselves, check, modify, reuse, give you feedback on, et cetera. This is valid for the code of articles, and there are more and more venues where you can actually publish these kinds of things, eLife for example. It's a great way to do courses too: I sometimes use some of these resources to create courses, to avoid having people install anything on their own computers. And it's great for software documentation, as you have seen just before.
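Speaking of getting GPUs and RAM this way: in practice, a first cell I would run on a new cloud service is a quick hardware check. A sketch, assuming PyTorch is installed, as it is on most GPU-oriented services:

```python
# What hardware did the cloud actually give us?
import multiprocessing
import torch  # preinstalled on most GPU-oriented notebook services

print("CPUs:", multiprocessing.cpu_count())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("no GPU visible")
```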
To use these resources, you will need the following things. You will need a running instance of Jupyter (we'll see different ways you can get one), running on the cluster or cloud you want to use. You will need the necessary software to run your script: maybe you do some segmentation with StarDist or Cellpose, so you need to be able to install those packages wherever you are connecting. You need the right hardware: depending on the service there may be some limitations, and we will see that. And in the end, and I think this is quite a crucial point, you need access to the data.

This is probably still a bit the limiting factor of all these kinds of online tools for day-to-day work: the necessity to have the data close to where the computation happens. And when I say close, it's not physically close; it's close in terms of how fast you can access the data. If every time you need to do an analysis you have to upload 200 gigabytes of data from the day's acquisition, people tend to never do it, because it's slow and because you may have limited space to upload to. For me this is probably still one of the main limitations, and we saw it a bit in the previous presentation too: there are difficulties in creating these full workflows where everything is included, because handling data, as we know in imaging, can be complicated when you have very large data sets. So this is a bit of a limitation.

I didn't say this at the beginning of the presentation, but I will present these tools first, and then we will explore a few of them practically, together and during the exercises. So, there are many flavors of Jupyter in the cloud. I will just make a list and really focus on a few, but I still want to go quickly through all the possibilities. One that you probably know, or have heard of, is Google Colab: Google's own version of Jupyter, which gives you access to GPUs, for example. Then you have the actual clouds, what I would say IT people call the real cloud: big companies like Amazon and Google that provide computing resources in a very bare way; you get access to computers, and then you have to do everything yourself. There are big players, and there are also smaller players: in Switzerland, for example, we have SWITCHengines, which works with academia and provides the same kind of services, cheaper and probably a bit more friendly and accessible. And then you have services like Amazon SageMaker, which I'll mention just after. Next, you have dedicated resources: the main cloud resources are meant to do anything, any company can access them and run anything on them, and they are not dedicated to Jupyter, so you have to make some effort to use them. But there are some companies running services that specifically run Jupyter notebooks, and I will show you an example with Paperspace. And finally, there are public resources, mostly for free, that you can access with an account. This is what we will focus on today, in particular on the Galaxy project that was mentioned before and on the Binder project. They fulfill very different tasks, so they are a good pair of examples to test in the exercises, and I will talk in a bit more detail about these three towards the end of the presentation.

Okay, so first let's remember what Colab is. You see here a typical Colab notebook: it looks very much like JupyterLab or a Jupyter notebook; they just changed the UI a bit.
The main advantage is that it runs on Google's infrastructure and you get access to GPUs for free. If you don't have a big GPU on a computer where you're working, this is a kind of good solution, and it's really terrific for making demos and courses, especially courses about deep learning and machine learning: getting access to a GPU for, I don't know, 30 students at a university is almost an impossible task, at least at a smaller university like Bern, so having access to Google Colab is really nice. Data storage is provided by Google Drive, so if you pay a bit for storage it's quite unlimited, I would say.

The disadvantages of Google Colab: there is no real way to adjust resources, so you cannot set how much RAM or how many CPUs you want access to; you get something assigned, and depending on what you're doing this can be a problem. I've seen people online complaining that their software wouldn't run because it ran out of RAM. There is also no guarantee of service; this is one thing I keep repeating to people who tell me how fantastic Google Colab is: nothing forbids Google from saying next year "we shut down the service", or from suddenly making it incredibly expensive. And Microsoft did exactly that with their own version of notebooks: one day they said "we shut it down", and if you were using that service you had to find something else. So I would recommend not putting all your eggs in the same basket and still using other services, to make sure you always have access to something. They also have their own version of the Jupyter interface, as you see here, so there is a risk that some features break at some point: some interactive widgets stopped working in Colab, and if your whole notebook was based on them, it no longer worked. And it's not an open-source community, so you cannot file an issue and say "hey, please fix this"; you just have to hope they fix it. The last thing, which is quite a problem, is that the environment provided by Google is somehow frozen: they pick the Python version that is installed and you really cannot change these things, so depending on what you need to install on Google Colab to run your experiments and your analysis, this might really be an issue. There is no way to create your own environment.

If you want to explore Colab, I really highly recommend doing so, especially for microscopy, via the ZeroCostDL4Mic project from the Henriques lab: they provide a series of notebooks that walk you step by step through deep learning for microscopy for different tasks. It's a really long series of notebooks that you can run, so even if you don't know Python or any coding, it will be easy to just go through the whole routine. I really encourage you to look at this if you want to explore.

Okay, so the next one is the actual cloud, and this will be quite brief. When you rent a machine on the cloud, it comes as an empty machine. It comes with an OS, usually Linux, because this is how the cloud mostly works these days, and then you have to install everything on it yourself: you have to install Jupyter, install conda, add storage to your machine, make sure that everything runs together, and make sure you can access it remotely. So this can be quite complicated. The big advantage of this solution, if you have some people to help you, maybe,
is that it's very, very flexible in terms of the resources you can get access to. This is an example from the SWITCH solution that I sometimes use, and these are the different machines you can run: you can have tiny machines with 500 megabytes of memory, or another machine with 64 gigabytes of memory, and you can use them for a day or two days and then shut them down. So it's a very flexible way of doing things, but it requires some IT knowledge. The real cloud is probably not the most recommended option: it can be really complex to set up and maintain, and you are also responsible for the safety and the maintenance of the instance that is running. To tell you an anecdote: I was once giving a course and running my own JupyterHub on Google to give the students access to Jupyter, so I had rented the computer at Google, and suddenly everything shut down in the middle of the course. I received an email from Google telling me that they had shut it down because somebody was mining Bitcoin on my instance. Somehow somebody had gotten access to my instance and was using it to mine Bitcoin. This is the kind of thing that, when you're not an IT person (and I'm not an IT person), can be really complicated to deal with. And the last thing: it can be difficult to project costs. The machine runs and you pay the bill by the minute or whatever, and you have to pay for very different things: for storage, for RAM, for usage, even for communication and the IP address. So it can be quite complex to handle.

Okay, so now the last set of services. All these services use a technology called containers to basically allow you to run whatever you want, and to run it in a reproducible way. I'll quickly explain what this container technology is, without much detail, and then we'll see how it's used in these different services with a few examples. These containers are provided by the Docker software. So imagine this is your cloud, with computational resources inside it, and somebody comes and says: I want to use your cloud. The easiest thing you could imagine is to give that entire cloud to that person and treat the cloud as one big computer. Then that person starts installing things in there; they use scikit-image at a specific version, they are happy. But then comes the next person, who says: oh, I also want to use the cloud, and I want another version of scikit-image. And now you have two problems: these people are going to fight for the resources (how do you share all these CPUs when they are using the same machine?), and you also have to deal with the conflicts between the things installed on this machine. The solution is to give a small computer, inside that big computer, to each person: the first person gets a computer with two CPUs and can install whatever she wants, same thing for the second one, same thing for the third. You just have to make sure you have enough resources in your cloud, or put some limits on how many people can access it. This is the way these clouds are usually used in the kind of services we're going to see: you create a small container where each person can run their things. And they are actually called containers, I guess for two reasons. The first one is that they are completely isolated from each other, like shipping containers: what
happens in a container stays in the container, so you're not affecting anything of your neighbor's when you're doing something in your container. And the second is that the container contains everything that is needed to run whatever you want to run.

So these containers, again, are run by this Docker software, and Docker provides an isolated space where software can run without affecting the rest of the computer. It runs its own OS: you can install Docker on your own computer if you want, so I can install Docker on my Mac and run Linux inside a container, completely independently. You can install any software; in our case we would run Jupyter and packages, maybe conda. You can access data outside the container: you can make a link to the outside, so you can still access the data on your own computer, or in the cloud, or anywhere else. You can also communicate with the inside of the container from outside: if you have a Jupyter server running inside a container, you can access it; you just have to open the right port to communicate with it. And the last thing is that once you have a container, you can essentially make a snapshot of it, upload it to some repository, and reuse it later. There are some basic containers that people reuse all the time, and you can adjust these containers: I can take somebody else's container, add whatever I think is missing, create a new container, and upload it. So there are multiple layers inside these containers to create whatever you need to run your analysis.

All of this is usually summarized in these Dockerfiles: a Dockerfile gives all the commands necessary to install the necessary software in your container, and you use what is called a base image, which is just an image that contains all the basic software, and then on top of that you install what you want. You just have to be a bit familiar with Linux to write these things. Usually we don't want to put our hands into these files, and this is just to show you how it works, because the services we're going to look at are essentially an interface between us and the rather complex Docker software. They facilitate our life in using these services, but in the end they all run on Docker, and so for all these services there is always a way to tweak the Docker image itself if you are familiar with the technology.
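Under the hood, launching such a per-user Jupyter container is only a few lines with Docker's Python SDK. A minimal sketch, assuming Docker and the docker package are installed; the image name, port mapping and data path are illustrative:

```python
# Start an isolated Jupyter container: one small "computer" per user.
import docker

client = docker.from_env()
container = client.containers.run(
    "jupyter/scipy-notebook",   # public base image with Jupyter preinstalled
    ports={"8888/tcp": 8888},   # open the right port to reach the server inside
    volumes={"/data": {"bind": "/home/jovyan/data", "mode": "ro"}},  # link data in
    detach=True,
)
print(container.logs().decode()[:300])  # the startup log carries the access URL/token
```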
Now let's look at the first example, which is Paperspace. We'll go through it live, which is a bit more interesting, but the slides also contain the details. Paperspace is a private company: you need to create an account and sign in, which is free, and then you have all your projects there. You can create a new project, give it a name, and it will ask whether you really want to create a notebook (yes, we want to create a notebook). Then you have the opportunity to choose a pre-made environment. These services are usually made for people doing machine learning, so they have all the main machine-learning environments set up for you: if you want to run something in PyTorch, you just say, okay, let's run it in PyTorch. Below that you choose what kind of computing resources you want; here we can get a free GPU, or use a bigger machine, which would be much more expensive. You can see this machine costs six dollars an hour, for example, so if you have to run it for a week, it's going to be expensive. There are different flavors of these machines, but we can stay with the default free one. Then you can set how much time it should run; this is also limited, so you cannot run things for a day if you need to do some heavy deep-learning training. There are advanced options where you can specify a Docker image and other things, but we won't go through those. Then you just say start notebook, it takes a bit of time to start a Docker instance and your Jupyter notebook, and in the end you can run your notebook as if it were running on your own computer. I'll just show a few of these slides: you get a notebook running inside this Paperspace service, and you can see the address is not the local address you usually have, so it runs remotely, but it looks exactly like the Jupyter we have seen before, running on a GPU provided by Paperspace. You saw that it's incredibly easy to start, and that is really one of the advantages of these pre-made services; you also get a free GPU, like on Google Cloud. You can upload some data too, though there are limits, and you have to pay if you need more. It's unclear where these data are stored, so if you have sensitive data, be careful what you do. The same goes for Google Drive: if you have medical data, you are not supposed to just upload them anywhere. If you need to customize these environments, it can be a bit complicated, because you have to know a bit about Docker; if the plain environments they provide are enough for you, that's fine, and you can add more software directly in the notebook. The main limitation, I think, is that it can become expensive: again, it's difficult to manage the costs, so you have to be a bit careful about this.

Now let's look at the three last services, which run on public instances. The first one is Binder. If I go to the mybinder.org website, they say it's a system that turns a git repository into a collection of interactive notebooks. What this service does is take a repository that contains notebooks, and maybe also a file that says how the software should be installed. For example, in the repository of my microfilm package there is a little file called environment.yml; for those who are familiar with Python, you know these files, it's the list of packages that should be installed. What Binder does is read this information, copy over all the notebooks, use a little tool called repo2docker to turn all of this into a small Docker image, and then run that image in a cloud paid for by universities and some foundations, so that you can run your notebooks interactively.
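For reference, such an environment.yml could look like the sketch below; the package list is hypothetical, and microfilm appears only because it is the example repository mentioned above (assuming the package is installable from PyPI):

```yaml
# environment.yml (sketch): repo2docker reads this to build the Binder image
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - scikit-image
  - pip
  - pip:
      - microfilm   # assumption: the package is published on PyPI
```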
We'll try this afterwards: we'll create a small repository and run it on Binder. What you can also do is put a little launch button on your own repository and start Binder from there (I'll show what that button looks like at the end of this Binder part), but for the sake of the example I'll just do it live here. You copy the address of your GitHub repository, and you don't need any login, it's free for everyone; you paste it here and say launch. If you have run it already, the image is already built, so you don't have to create a new one and it's fast; if you never did it before, it's going to take some time to build the image and push it to the right place. Here it launches my server. This is entirely running in the cloud: I had nothing to do except copy-paste a GitHub address, and now I have a Jupyter instance starting on a remote cloud. Just to convince you, I can create a notebook and import NumPy as np, and it all runs: it's interactive, not a static file. Then I can open the notebooks I made in this repository and run them. Again, you see that the interface is exactly the same as if you were running on your own laptop. So this is Binder, a really cool service for demonstrations. The limitation is the amount of computing resources you get: it's very limited, since the service is entirely free and open to anyone, and the sessions are temporary. So it's not for doing actual real work; it's really for sharing workflows, showing people how they work, letting people try things out, even for very short courses, but not for real work, and there is no way to save your data. You can also do other things there, like using RStudio, or even running a desktop with Napari, so there are lots of possibilities.
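The launch button I mentioned is just a badge you add to your repository's README; below is the standard mybinder.org pattern, where <user> and <repo> are placeholders for your own GitHub account and repository:

```markdown
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/<user>/<repo>/HEAD)
```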
The next example is Renku. Renku is a service provided by the Swiss Data Science Center, but it is open to everyone, and the goal is to do reproducible and collaborative data analysis. It's based on a series of technologies: it is built on GitLab, which is very similar to GitHub and hosts your code, your notebooks, and your data, and access to the cloud goes via the Renku software, which you can also install on a cluster at your own university; via Docker, again, this runs your notebooks. The advantage here is that you can keep your changes: since you're using git, you can commit the changes you make in your repository to GitLab, so you keep your modifications. That means you can actually use this resource to do real work; you work on a project and keep track of your changes. You can log in via GitHub (this is why I said you could make a GitHub account, for example to access this), and although you can run Renku on your own infrastructure, they have a public instance called renkulab.io where you can run these things. This is my account, with a few projects. Again, you can create a new project, which works very similarly in all these services: give it a name, say whether you want to keep it public or private, and then, a bit like before, choose a pre-made image that contains some basics, so a Python project, an R project, a Bioconductor project, or even a remote desktop if you want. Then you say create project, and this starts building an image and runs your notebook. For the sake of time I don't want to do it now, but it will again run a session, this time at renkulab.io, and again you get your Jupyter interface. The interesting thing is that you again have this Dockerfile that you can modify, so you can add more software (that is, if you're a bit more advanced and can install software via Linux), but you can also simply update the requirements and environment files to install packages. And there is an interesting mechanism: if you modify one of these files and commit the change to GitLab, Renku detects that automatically and says, ah, he now wants a new Docker image, a new environment where this is installed, so I'm going to rebuild it. There is an automated system on GitLab that takes this into account for you, without you even having to know about it; we never have to look at it, I only mention it so you understand the technology behind this. It reruns the creation of the image: you can see I made a commit here at some point and it's running the image-build pipeline, so it's making a new image, and eventually I will have a new Jupyter instance running with the new package I wanted to install. Everything is done behind the scenes when you do the commit, but it allows you to have an environment exactly as you want, and it's not too difficult: you basically just edit these files as you would in a regular project (there is a small sketch of this at the end of this Renku part). You can also upload data; they use a git extension called Git LFS for that. It's limited, so if you have terabytes of data you cannot use this system, but for demos and courses it's really optimal. So the advantages: Renku combines data and computing, unlike Binder, where you have no data storage; you can keep your changes; just like Binder you can also run other things, such as RStudio and Napari; and you can edit the Dockerfile for maximum freedom. Another interesting thing is that you can create shareable, runnable versions of your project. If I copy-paste this link somewhere where I'm not logged in (you can also try this if you want), I get a button that anybody can start, and when you click on it, it starts a session. So if you want to share your project with people, you just give them the link to this button and it starts an interactive session for them. They won't be able to save anything, but they can edit and run your notebooks; if they want to actually keep their changes, they can make a login and copy or reuse your project. Again, as with Binder, depending on how many people are using the service it takes a varying amount of time to start, but eventually it will, so we'll just leave it running and come back to it a bit later. It sometimes takes two or three minutes before a session starts, and that is one of the negative sides: it can take time to update when you are rebuilding. You also have limited resources on this public instance, I think up to four or eight cores, and no access to a GPU, so it's also more for teaching and demonstrations. But you can install a private instance: there are universities in Switzerland that have installed Renku on their clusters, and their people get this kind of easy access to HPC via this system.
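As a sketch of that rebuild mechanism: adding a package to one of the environment files and committing is all it takes. The file name requirements.txt follows the Renku Python template; your project may use environment.yml instead:

```bash
# Sketch: add a package to the project's requirements and commit the change.
# RenkuLab's GitLab pipeline detects the commit and rebuilds the Docker image.
echo "scikit-image" >> requirements.txt
git add requirements.txt
git commit -m "Add scikit-image to the environment"
git push
```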
The last tool I want to talk about is Galaxy, which was mentioned already. I always get a bit lost about who pays for what in the EU, but this is run by the Freiburg Galaxy team at the University of Freiburg in Germany, and it runs on their cluster, if I understand correctly. Galaxy started as a bioinformatics tool, mostly used for genomics I would say, and it's a web-based platform that allows you to build workflows; I think in the following courses you are actually going to learn how to make such workflows. We use a somewhat special part of Galaxy that gives us access to their cloud (they also have a big cloud) via Jupyter notebooks: a service called interactive environments, which runs Jupyter or RStudio and is a bit separate from the classical way of using Galaxy via workflows that people in bioinformatics use. So I won't talk about workflows, because you will hear about them in later sessions; I will just explain how you can use Galaxy for Jupyter. This is a bit like Renku in that you can keep your changes, so it's a service you can actually use for real work. Their statement, and this is the goal of all these kinds of services, is to make things accessible and reproducible; you have heard these words a few times already. Once you have made a login, you can access the usegalaxy service, and you have access to pre-installed tools, listed in the left panel. These are mostly bioinformatics tools, but if you search you will also find image-processing tools, and I think Robert Haase is giving the lecture on how to use these workflows from there. This entire thing already runs in Docker, so when we use Jupyter inside this service, we run Docker inside Docker; it's a bit like Inception, and it's kind of amazing that this works. The equivalent of the projects we had in Paperspace, or the projects we saw in Renku, is here called a history. You can have multiple histories, and they contain your data, your notebooks, and even the running instances of Jupyter; a history is basically what a project is in this interface. You can upload data as well: from your computer, from a web service like Zenodo or GitHub, or from known data banks. From what I could see there is no imaging data bank, only genomics data banks, but I might be wrong; if somebody knows better, please let me know. You then get interfaces for uploading data, which we can test afterwards. For Jupyter, which as I was saying is a special class of interactive tools, there is a thing called the interactive JupyTool notebook. If you go to Galaxy and start typing Jupyter in the tool search, it narrows down to the tools containing "Jupy", and you will see this interactive JupyTool notebook. This is the tool you have to select if you want to run a notebook. If you click on it, it opens a window and asks whether you want to start with a fresh notebook; we do. You can also automatically add inputs that you have stored in your history (you will probably learn about this in later sessions too), but we'll do it the simple way and just execute. Saying execute is the step that actually starts your Jupyter server and gives you access to a Jupyter instance in Docker; we will do that in the exercise afterwards. To access this resource you then go to your user account at the top and open active interactive tools: clicking there shows a list of the interactive tools you have running. I started playing around with this eleven days ago, as you can see, and it's still running. This is, I think, one of the really great advantages of this service: there is a limit, but I forgot exactly how long it is, and it's not measured in days; sessions are very long, so you can keep working on a project for a very long time and it stays active. Your notebook is somewhere, active, and you can access it whenever you want. If you select one of your sessions, it opens a Jupyter session again, and you have the same interface as before. There are a few more things installed here, but in principle we can just create a notebook, and it is again interactive and fairly easy to use. One thing I want to show live: since these things are a bit separate (it's Docker inside Docker), you need a way to communicate between the two, especially for data, so you can push and pull data and notebooks from one side to the other. If you create a notebook in your Jupyter session and don't actively copy it over to Galaxy, it is lost when you stop your session; you have to actually push it to Galaxy, a bit like on Renku, where you do a commit and also have to push. The same holds in the other direction: if you have data in your history and want to use them in your notebook, you have to push them over to Jupyter. We won't go into too much detail, but there are two commands. One is get: you write get with the ID number that your dataset has in your history, and the dataset gets imported at a known location, so if your dataset is number 18, you can import it and open it from that location inside your Jupyter notebook. The other direction is put, for saving a notebook: when you execute the cell that says put my notebook, the notebook appears in your Galaxy environment. I would say this communication is probably a bit of a pain point of this solution, but otherwise it's a great tool.
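In a notebook cell, this exchange looks roughly like the sketch below; the helpers are the get and put commands described above, but the exact call signatures and file locations come from Galaxy's Jupyter integration and may differ between instances:

```python
# Sketch of the Galaxy <-> Jupyter data exchange (helper functions are
# provided inside the JupyTool notebook environment).

get(18)                  # pull history dataset number 18 into the container;
                         # it then appears under a local path in the notebook

put("analysis.ipynb")    # push the notebook back into the Galaxy history,
                         # so it survives after the session is stopped
```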
You can use the default environment that is provided, but you can also create new environments with Conda. I put this in as information for those who are familiar with Python, and you can test it afterwards (for those who are not, it's a chance to learn more about Python); there is a sketch of it just after this Galaxy summary. This allows you to create a new environment with the software you decide to install, and that environment then appears in the window where you create a new notebook. In my own case, for example, I created a Conda environment where I installed Cellpose, so when I create a notebook from it I get access to Cellpose, which does not come by default in this installation; they have installed many packages, but not highly specialized ones like this one. So, to summarize Galaxy: it's very easy to start, and you have large computational resources, which is very nice. I'm not entirely sure whether you can control this (we will probably see in the next courses), but from what I could find you have access to massive resources, and you have up to 250 gigabytes of storage that you can upload into the service, which is a good, reasonable start for doing actual research.
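A sketch of what creating such an environment could look like from a notebook cell; the environment name, Python version, and the kernel-registration step are my assumptions, not the exact recipe used in the talk:

```python
# Run in a notebook cell (sketch): create a Conda environment with Cellpose
# and register it as a Jupyter kernel so it shows up when creating notebooks.
!conda create -y -n cellpose-env python=3.9 pip ipykernel
!conda run -n cellpose-env pip install cellpose
!conda run -n cellpose-env python -m ipykernel install --user --name cellpose-env
```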
There is, however, no simple way that I found to keep environments: if you create an environment, just let it run, because if your session stops, you have to recreate the environment. If you do it the proper way, this should not be too much of a problem. And the data import and export I find a bit cumbersome, though maybe there is a better solution than what I'm doing here. You will see limitations like these in all the software. So this is the overview I wanted to give you, and you see that there is no perfect solution. Some solutions are extremely easy to run, like Binder, but limited in terms of computational power; some others are really great, like Galaxy, but a bit cumbersome in handling things like moving data around; and things like Paperspace are great but can get very expensive. There is no perfect solution at this time, so it depends on your application: a course, a demonstration, an article you want to publish, or, if you work in a company and have infinite money, you can go to Paperspace. It really depends on the situation. And with this, I thank you for your attention.