So, welcome to day number four of the Defragmentation Training School on bioimage analysis. Today we have the pleasure to have Robert Haase from the University of Dresden, who will talk about parallelization and GPU-accelerated image processing. We also have the pleasure to have with us the Galaxy team from the University of Freiburg, with Beatriz Serrano-Solano and Björn Grüning, and they will give us an introduction to the Galaxy workflow environment. The second part of the training school will be the work-on-your-own-data session: the participants will be divided into groups and will work on the projects they have selected. They will be helped by Anna, by Benjamin Pavie from the VIB Bioimaging Core, and by Finn Bacall from the University of Manchester. Thank you for joining, and I ask Robert if he wants to start sharing his screen.

Thank you for the introduction, and thank you also to the organizers for having me. I will talk about parallelization and heterogeneous computing, from pure CPU to GPU-accelerated image processing. Some of the first slides may be known to you already; they are just to get us all on the same page. When we were working with ImageJ, for example with these three-dimensional-plus-time image data of developing embryos, we would typically get one time point out, do some processing with it, apply a background subtraction to visualize the nuclei, and it takes some time. We were working with ImageJ like this for a long, long time: at some point we see the background-subtracted image, then we do a maximum projection, so it really takes a while until you first see your entire data set, the developing embryo for example. For me, back in the days when I was a postdoc in the Myers lab, this meant spending a lot of time sitting in front of my computer watching the ImageJ status bar. That is why I took over a project from Loïc Royer, who was working on OpenCL, and we were developing these kinds of kernel functions; this is an OpenCL kernel for doing a maximum projection. But accelerating image processing by making everybody learn a new programming language such as OpenCL or CUDA would waste a lot of time, and that is why I wrapped it all into user-friendly Fiji and napari plugins. You see here already the CLIJ assistant, where you can do the same operations but see the result instantly. You can also process time-lapse data and get immediate feedback, if you have a powerful graphics card that helps you with that. In the years after, I must say, I switched a bit to the Python ecosystem. I am now using napari more and more for my data analysis, and I also introduce collaborators to Python programming and napari for image processing. We are trying to bring the same user convenience we have with the ImageJ macro recorder to this ecosystem, so that we can have code representing our image analysis workflows and then, for example, deploy this code to the cloud; this is what I will go through today. I am just showing you a bit where this project comes from. One of the first challenges along this road is workflow management. You have just seen some workflows; I was clicking and building up those workflows. But what do I mean by management, and why do we actually have to think about the order of commands? Just as an example workflow, we see here a Tribolium embryo.
This is again a 3D light-sheet data set, with time points of about one or two hundred megabytes each. I do some background subtraction as pre-processing, then I do a transformation: I basically unwrap this embryo, which lives, more or less, on an ellipsoidal surface. Then I do some segmentation, I visualize the segmentation, and I want to save the result; just as an example workflow. Here it is visualized in a different way, from top to bottom. If I want to apply, for example, a Gaussian filter, it takes quite long, as you see here; this is a time axis, and it takes this long to apply a Gaussian blur. If I do the Gaussian blur on a graphics processing unit, on a graphics card, it is faster, but I have to push my data there and pull the result back, and this of course costs time. So in total, going from here to here actually takes longer, even though we use a fast graphics card, because we only process a single step there and we lose additional time on the push and pull. The idea of workflow management is that you put multiple operations together: everything you can do on a GPU, you do on the GPU. Then the push and pull actually pay off, because all together you finish the workflow earlier than if you did the same thing on a CPU. That is what I mean here.

So this is about time, but there is also, especially if you think about big data, a data problem we have to consider when we organize our workflows. Assume you have again this workflow where we load data. The ImageJ way of doing it would be pretty much this: you load the entire time-lapse data set, and when you want to do the pre-processing, a window pops up asking whether you want to process all time points, slices, whatever, in one shot. Back in the days we clicked 'yes' in ImageJ and waited for some time, but when you see what data modern microscopes produce, this renders projects impossible: we cannot click 'yes' here anymore, because it would take ages. Furthermore, we may not even have the memory to store all the intermediate results. A better strategy is to load the data of the first time point, process it, save the result, then load the next time point, process it, save the result, and organize our workflows like that. Compared to the slide before, we go from a vertical approach to a horizontal approach, and we really have to do that when we deal with big data. This strategy works nicely, for example, for processing the time points of a long time lapse of a developing embryo, but it also works when you think about large 3D data that you want to process tile by tile. Again, we have to rethink our workflows so that our scripts, or whatever we program, implement this new strategy.
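As a minimal sketch of this horizontal, one-time-point-at-a-time strategy in Python (the file layout and the `process` placeholder are my assumptions for illustration, not code shown in the talk):

```python
from pathlib import Path
from skimage.io import imread, imsave

def process(image):
    # placeholder for the actual workflow steps
    # (background subtraction, segmentation, ...)
    return image

input_files = sorted(Path("timelapse").glob("t*.tif"))  # hypothetical file layout
Path("results").mkdir(exist_ok=True)
for f in input_files:
    timepoint = imread(f)                     # load ONE time point only
    result = process(timepoint)               # process it
    imsave(Path("results") / f.name, result)  # save before loading the next
```

Memory usage stays bounded by a single time point, no matter how long the time lapse is.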
When we do that, and this is now the video from the CLIJ paper we published three years ago, we can quite easily speed up workflows, for example on a laptop. You can easily achieve a speed-up factor of 15 on a random Intel laptop or, if you go to a fancy workstation with a proper graphics card in it, speed-up factors of about thirty; you do things 30 times faster than you would with classical ImageJ. There is a lot to read on this slide, but you also see here, a bit more in detail, the plots we did back in the days when we published that, and there is a link at the bottom to the YouTube video where I explain it in more detail. In general, orange and red are the processing-time curves of the graphics cards, and green and blue are the CPUs in the computers we tested. In the very most cases, GPUs outperform CPUs, naturally, because they have faster memory access. You can also go some steps further from this strategy: you can, for example, implement a scheme like this one, where you decide dynamically, depending on whether you can load data at the moment, whether the hard drive is busy, whether the network is busy, whether the memory is busy, how to process the data. For that you really need advanced programming skills, though. I think I know ten groups who are working on this, so I am looking forward to seeing this in the hands of biologists; at some point, I am absolutely sure, it will happen. So this is workflow management: we have to rethink a bit the order in which we do things, in order to spare memory and to spare time.

Then there is another challenge in a similar context: tiled image processing. I typically say that tiling is the last resort against big data. If you can downscale your data set, or crop the area where the sample sits to make your data set smaller, and then only process the cropped and downsampled area or volume, that is the better strategy; only if these things are not possible do you have to go for tiling. Tiling means the following: I have, for example, this (not exactly biological) example image, and I want to apply a Gaussian blur to it; theoretically, the result would look like this. Now let us assume, theoretically, that this image does not fit into computer memory. So I have to tile it into many small pieces and process these small pieces, these tiles or blocks, independently. If I do that, if I apply a Gaussian blur to the small blocks, then my example image looks like that: cutting it into tiles, processing them as before, and putting the tiles together again gives you this kind of tiling artifact. To prevent that, there is a strategy: you put margins around the tiles, so you do not just process the tile, you also process some additional pixels around it. Depending on how large you make this margin, you get the correct, or at least an approximately correct, result out. But the tiling strategy obviously has the disadvantage that you have to take additional pixels into account. When you look, for example, at this tile of 32 by 32 pixels, with the margins from the slide before, an additional 10 or 20 pixels around, then the region processed for the tile with a 20-pixel margin is approximately five times as large as the tile itself. So we lose a factor of five in processing time by tiling. Again, tiled image processing is possibly the only way of processing the data, but we lose that factor of five, so maybe GPU acceleration makes sense to win this loss in processing time back. And again, the computation time depends on tile size and margin size.
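As a hedged sketch of the tile-plus-margin idea in Python: dask splits an image into chunks, and `map_overlap` adds exactly such a margin (called `depth`) around each tile before filtering, then trims it off again. Chunk size, sigma and depth are illustrative choices:

```python
import numpy as np
import dask.array as da
from skimage.filters import gaussian

# stand-in for an image too large to process in one piece
large_image = np.random.random((4096, 4096))

# split into tiles ("chunks") of 1024 x 1024 pixels
tiles = da.from_array(large_image, chunks=(1024, 1024))

# each tile is processed with a 20-pixel margin so no seams appear;
# the margin is trimmed from the result automatically
blurred = tiles.map_overlap(gaussian, depth=20, boundary="reflect", sigma=5)

result = blurred.compute()  # triggers the actual tile-by-tile computation
```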
Furthermore, there are some algorithmic challenges related to tiles. Let us assume we have this binary image and we want to apply connected components labeling to it: we want to label all these individual objects with a different number, object one, object two, object three. If we do that in tiles, our result may look like this: at the tile borders, objects are not assembled correctly. Here, too, margins can help, but it is actually a bit more tricky, because we cannot just stitch these objects together: you have to know that in this tile objects one to fifteen have been labeled, so in the next tile you have to start counting at sixteen, and then maybe go until thirty. So it is a bit more challenging, let us say, but it is technically feasible; there are algorithms for that. And if you now come with a more challenging data set like this one, or, in more biological terms, think of neurons or think of vessels in an animal, there could be similarly elongated objects reaching over multiple tiles in a large 3D image. If you apply connected components labeling to it, the result should look like that. But if you apply tiling and then connected components labeling, you may get a result like this, because the object here in tile number four on the bottom right and the object at the top left are the same thing; they are connected, right? They should all get the same number, but in order to do that correctly you effectively have to visit all tiles. Again, there are algorithms for this; people wrote their PhD theses about it maybe twenty years ago, so it is technically solved. But I am not aware of any software where you can just do these kinds of things on big data, and if somebody here in the audience knows better than me, please let me know.
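Here is a small sketch of the counting-offset idea just described: each tile is labeled independently and the label values are shifted so they stay unique across tiles. Tile size and the use of scikit-image are my choices; the merging pass for objects touching tile borders is deliberately left out, since that is exactly the tricky part:

```python
import numpy as np
from skimage.measure import label

def label_tiles(binary, tile=256):
    result = np.zeros(binary.shape, dtype=np.uint32)
    offset = 0
    for y in range(0, binary.shape[0], tile):
        for x in range(0, binary.shape[1], tile):
            chunk = label(binary[y:y+tile, x:x+tile])
            n = chunk.max()             # number of objects found in this tile
            chunk[chunk > 0] += offset  # continue counting from offset + 1
            result[y:y+tile, x:x+tile] = chunk
            offset += n                 # the next tile starts counting here
    return result
```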
Okay, then there is one more challenge. I do not think it is only interesting for me, but it is very interesting for me and the research we do here in Dresden in image data science, in the context of cells and tissues. If you think about image processing, we apply filters to pictures and we consider neighboring pixels. For the center pixel here, you can define various neighborhood relationships: for example, the pixels which share an edge with it are shown in green, and the pixels which lie within a certain radius are shown in magenta. If you think about tissues and cells, pretty much the same neighborhood relationships can be formulated: you can say, cells which share a membrane, or, in the segmented case, which share an edge, or cells whose centroid distance lies within a given radius. So you can pretty much imagine that it is the same, just that tissues are typically less regularly structured than a pixel grid.

Let us look at a practical example. We see here an intensity image, maybe 40 by 40 pixels approximately. We do some thresholding; after the thresholding we do some erosion, we do some dilation, and we get a segmentation result out; this is image processing 101. Now we take a label image of cells, here in the bottom corner, and a corresponding parametric image, for example the gene expression of these cells, or the elongation of these cells, or any physical parameter we measured from them, expressed as an image. So this is no longer a pixel image in the usual sense; it is of course still a pixel image, but we see it as something different: as a parametric image and label image pair. We can threshold these cells: we can say, I would like to have all cells with an expression above a certain value, and we get this binary cell image out. We can also apply erosion and dilation. So pretty much the same operations can also be formulated for images showing segmented cells, as if we were working with pixels which no longer have a rectangular shape. These kinds of things are, for example, available in Fiji and also in napari. And then you can do additional, more advanced measurements. For example, I would like to measure the mean centroid distance between closest neighbors, because I want to investigate how densely cells lie next to each other, and I would like to have that average for each cell; so I take the neighborhood relationships of these cells into account and measure the average over the neighbors. These are operations which were published ten or twenty years ago; you find them here and there in publications, but you typically do not find them in software, so we cannot just apply them to image data. That is why we are working on making these things available in ImageJ and also in Python, and here you see again the example in napari.

Why are we doing this? Think of this Tribolium embryo you see developing; you have a segmentation, or an approximation of cells. You can measure, for example, the distance to the nearest neighbors; you can also measure the local standard deviation of the distance to the nearest neighbors; and then you can classify this image, or even use unsupervised machine learning, for example k-means clustering, to subdivide this entire embryo into multiple different regions, for example the serosa and the forming head or tail, these kinds of things. This is nice, you can do that, but you can also put additional intermediate processing steps in; that is what I was explaining on the slides before. You can say, okay, I would like to have the local average distance of the nearest neighbors, and then the standard deviation of the local average distance of the nearest neighbors, and then you get a cleaner clustering result out. When we used things like ilastik or the Trainable Weka Segmentation, we specified filters for processing the pixel image; here, we specify filters for processing the segmented image showing cells or nuclei. It is basically the same technology, just on a higher level, working with cells and no longer with pixels.
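As a small code sketch of such neighbor maps, using pyclesperanto-prototype: I believe functions along these lines exist in the library, but take the exact names and signatures as assumptions and check the current API documentation:

```python
import pyclesperanto_prototype as cle

# intensity_image: a 2D/3D image of nuclei loaded earlier
labels = cle.voronoi_otsu_labeling(cle.push(intensity_image), spot_sigma=4)

# parametric map: every cell is painted with the average distance to its
# n nearest neighbors, a measure of local cell density
# (function name as I recall it; verify against the pyclesperanto docs)
density_map = cle.average_distance_of_n_closest_neighbors_map(labels, n=6)

# such maps can then be thresholded or fed into k-means clustering
# to subdivide the tissue into regions, as shown on the slides
```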
And then comes the actually interesting challenge, at least for today's talk: how can I deploy that to the cloud? After I have defined my image analysis workflow, how can I make it run on high-performance computing, for example here at the ZIH, our local compute center, or in Google Colab? Again, there is a nice number of napari plugins we are working on for formulating these workflows. You basically load a data set with multiple time points, and you can go a bit through time. And Björn, if you could turn your microphone off, that would be super cool; hi Björn, by the way. So you go through the time lapse and check whether the parameters you have been defining for your workflow hold up: you should go back and forth in time to see if the parameters work on multiple time points of your time lapse, or you load multiple data sets and check whether the parameters work there, too.

And if you have some ground truth annotation, you can optimize the parameters of a workflow to get a good parameter setting of that specific workflow for processing your data. Here I am segmenting some cells and tuning the parameters a bit; I do that in two steps, and then I get a workflow out that produces results fitting my manual annotation well. Here I show it again on a different data set, a zebrafish eye: I would like to segment the nuclei, I tune the parameters a bit, and then at some point I can export code. In this case I am exporting a Jupyter notebook, and from this Jupyter notebook I can execute the workflow again and check in napari whether the result is the same. So I basically go click, click, click in napari, and then I generate a Jupyter notebook which reproduces the same workflow. That is the bare minimum documentation of my workflow: before I submit it to the cloud, I should first check whether the steps are really reproducible, really doing the same thing on the same data set.

Here I am showing it again with a bit more advanced stuff; you can also read the notebook I am showing, it is linked at the bottom of the slide. You can do additional things, for example change how things are visualized in napari: you can set lookup tables and these kinds of things, which might be useful if you want to document the workflow in a more advanced way; changing lookup tables, doing segmentation, these kinds of things. And maybe I am speaking too fast for this video... yeah, a Gaussian blur afterwards. What did you say? I think it is okay, at least. (I think it is okay, and we can eventually slow it down.) No, no, it is all fine; it is just that this video is a bit long. What I am saying is that napari has some nice visualization tools, which may not be super important for cloud computing, but I nevertheless wanted to show them today: drawing outlines around objects, these kinds of things, it is all possible with some basic Python scripting skills. And again, I recommend notebooks to do this in a reproducible fashion.

You can also export Jupyter notebooks which do not use napari anymore, and that can make a lot of sense. For this, we also implemented a quite important thing you see here. What I am doing is importing some libraries, for example pyclesperanto, the GPU acceleration library I am working on in the Python ecosystem, the CLIJ of napari you could say, and loading a data set. Typically in Python, if you load a data set and then print it, or put just the variable at the end of a Jupyter notebook cell, you would see 'array' and a couple of numbers; very user-unfriendly, I would say. That is why we implemented this strategy: when you put the variable in a Jupyter notebook cell, you see the result image (here it is a three-dimensional image, so we see a maximum projection of it) and you also get a little histogram. The idea is to bring the user experience we have from ImageJ to the Python ecosystem, so that you can quickly see the minimum, maximum and distribution of your data set in a Jupyter notebook. You can subtract the background and look at the background-subtracted image directly in your Jupyter notebook.
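A minimal notebook-style sketch of these steps (the file name and filter radii are placeholders):

```python
import pyclesperanto_prototype as cle
from skimage.io import imread

image = imread("embryo_t0.tif")  # hypothetical example file
input_gpu = cle.push(image)      # send the image to GPU memory

# top-hat filter as background subtraction
background_subtracted = cle.top_hat_box(input_gpu,
                                        radius_x=10, radius_y=10, radius_z=10)

# putting the variable at the end of a notebook cell displays the image
# (for 3D data: a maximum projection) together with a small histogram
background_subtracted
```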
You can also segment nuclei and see the result visualized the same way; you can then see, for example, that the maximum intensity in a label image corresponds to the number of labels in that image. So this is meant to be as user-friendly as we can make it. There is also a nice tool called stackview: it allows interactive scrolling through the slices of a data set, or here even blending the original data set and the segmentation on top of each other and visualizing both within the Jupyter notebook. That works very nicely on my local machine, as you can see here, and it also works nicely on the compute cluster of our university, so it is a cloud thing. Unfortunately it does not work in Google Colab, and I am not sure I will fix that; so it is of limited importance for me and my work and for the work of my collaborators, but anyway, some people may like it. Also, if you want to tune parameters, you can do that with the same interface; there are some nice tools for that. If, for example, you have developed a workflow but the parameters vary from data set to data set, and you want to do some checks or allow some user input, you can use that tool. Again, it works nicely in JupyterHub and JupyterLab, but not in Google Colab.

And then there is the question of hardware: I think many people do not have a graphics card in their local laptop or workstation, and they also cost money. On the one hand, Google Colab is free of charge, so you do not have to buy anything; you can use a graphics card there for free. And, for example, the compute center of the technical university here in Dresden is a national node of high-performance computing for the life sciences, so basically everybody from a university in Germany, and presumably also from other research institutes, can apply there, get compute time on the cluster, log in to a JupyterHub, and use this infrastructure. In Germany, in Europe, all over the place, there are institutes, research facilities and compute centers with these kinds of offers. So you do not have to buy a graphics card anymore; you can do these things in the cloud, typically provided via a JupyterHub. I will not log in to our cluster now, because I am not sure whether anybody from Dresden is in the call, but I will quickly guide you through how to do these kinds of things in Google Colab; when you download the slides later, you can click on the link at the bottom and follow these steps. The first thing you definitely need to do is install the libraries you are using, in this case pyclesperanto-prototype, so you have to pip-install it into your notebook. Afterwards you can again load a data set, process it, and visualize it like this; it is the example I showed earlier. There you see the small histogram and the minimum and maximum intensity, and it also tells you how large the image is that you just loaded. And you can execute segmentation workflows and visualize them the same way in the cloud: again, without the need to buy a graphics card and put a workstation in your office which mostly collects dust.
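The Colab setup boils down to two cells; a sketch, assuming the GPU runtime has been enabled via 'Runtime > Change runtime type' in the Colab menu:

```python
# the '!' runs a shell command inside the notebook
!pip install pyclesperanto-prototype

import pyclesperanto_prototype as cle

# pick a GPU; on Colab this typically reports an NVIDIA data-center card
print(cle.select_device())
```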
To summarize this advertisement for Jupyter notebooks and JupyterLab a bit: of course, you need some basic Python programming skills; there is no way around that. Interactivity is also limited: I have shown that you can scroll through slices with some tools and visualize images, but you typically have a hard time when it comes to turning your sample in 3D like in napari; napari clearly outperforms this infrastructure there. On the other hand, I would like to highlight the increased reproducibility. If I upload an ImageJ macro to GitHub, it may or may not work on somebody else's computer, and in particular, people cannot tell from the code on GitHub alone what the script is doing; they really would have to read it or execute it. But if you upload a Jupyter notebook to GitHub, for example as shown here on the right, you can see in the browser what the intermediate results of the workflow are. So it is highly reproducible, and it allows people to understand what is going on and do the same steps with their data.

This also goes towards knowledge exchange, right? You can keep such a notebook, and when you go to a group meeting and your PI asks, look, how was this image segmented, you can immediately show it in the Jupyter notebook and say: look, this is how we determined the parameters and this is how the result looked. You do not have to open ImageJ or napari and click a lot of buttons to get to the point where you can eventually see what was happening. And it works the other way around, too: if, for example, a PhD student leaves the institute and leaves behind some notebooks, the next student can pick these notebooks up and learn from the person who is no longer there how they did their image analysis. So I presume technologies like notebooks will make our lives easier in the future when it comes to knowledge transfer. And you can submit these notebooks, for example, to Google Colab, execute the code there, and run image analysis workflows in the cloud.

When it comes to batch processing, this question comes up again and again. We have some nice batch processing tools in ImageJ which allow you to process an entire folder and write the results into another folder. Either none, or only very limited functionality of that kind, will be available in napari, because if you want to do batch processing, Python is the right tool, not napari. So you should program a Jupyter notebook, or even better, a Python script which contains similar code to your Jupyter notebook and is executed to process the images in a folder. It is recommended to do that with command-line tools and Python scripts, and not by clicking through the same workflow again and again in a user interface; this is really supposed to run in the background, or even in the cloud.
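A minimal sketch of such a batch script (the folder handling and the segmentation call are illustrative assumptions, not Robert's exact code):

```python
# batch_process.py - run as: python batch_process.py input_folder output_folder
import sys
from pathlib import Path
from skimage.io import imread, imsave
import pyclesperanto_prototype as cle

def process(image):
    # the same steps you prototyped in the notebook, for example:
    return cle.pull(cle.voronoi_otsu_labeling(cle.push(image), spot_sigma=4))

if __name__ == "__main__":
    in_dir, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    out_dir.mkdir(exist_ok=True)
    for f in sorted(in_dir.glob("*.tif")):
        imsave(out_dir / f.name, process(imread(f)))
```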
It is not that we have dedicated exercise time for this session, but I thought one or the other of you may want to try things out; also, if you watch the video later on YouTube, you may be interested in replicating some of the things I just mentioned. That is why I have some exercises here; you can do them whenever you like, and if you come back to me, I am happy to help online. What I recommend: in napari, click some workflows together, generate notebooks from them, and run those, for example, in Google Colab or on your local institutional high-performance computing infrastructure. I recommend installing devbio-napari, a collection of napari plugins my team and I are developing. It basically avoids the problem of having to install a lot of plugins individually and spending half a day on installation; it is a one-shot installation, and the instructions are on the website. If the installation worked nicely, then when you enter naparia (yeah, it is not a typo) on the command line, it should look approximately like this. There is also a troubleshooting section, for example guiding you to graphics card drivers you can download in case they are missing on your computer. When the tool looks like this on your screen, you can download the example data set which is linked up here and open it again with napari from the command line. You can then click on 'remove background' and on 'label', and the result should look approximately like that. Then you can generate code, for example a Jupyter notebook, and execute this Jupyter notebook on your local machine with JupyterLab, or upload it to Google Colab. If your setup is limited, you can click on this link here, which brings you to my version of the notebook online, where you can modify the parameters a bit and play with the notebook to run this image analysis in the cloud. If you have never used Google Colab before, there is also a post on our blog where I explain how you log in, how you create a notebook, and how you activate access to the graphics card; then you have to install pyclesperanto as I mentioned earlier, and then you can do image processing. This is all linked in the blog post.

And if you are really eager to try everything out, I can also guide you through tiled image processing; there are some notebooks for that, too. Basically, you have a large image, in this case 2000 by 5000 pixels, and we count nuclei in this image; I think it is a histological slice, if I remember correctly. We define a Python function for processing this workflow (it is effectively copy-pasted from a generated notebook), and a smaller image comes out in which you see, for every tile, how many nuclei are in that tile. So it is a way of dealing with the big data problem: counting nuclei in tiles, where every tile becomes one pixel in a result image showing how many nuclei there are.
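This is not the exact notebook linked on the slide, just a minimal re-implementation of the idea: count nuclei per tile and write each count into one pixel of a small result image (tile size, file name and the labeling step are my assumptions):

```python
import numpy as np
from skimage.io import imread
import pyclesperanto_prototype as cle

image = imread("histology_slice.tif")  # hypothetical large 2D image
tile = 256                             # tile edge length in pixels

counts = np.zeros((image.shape[0] // tile, image.shape[1] // tile))
for ty in range(counts.shape[0]):
    for tx in range(counts.shape[1]):
        chunk = image[ty*tile:(ty+1)*tile, tx*tile:(tx+1)*tile]
        labels = cle.voronoi_otsu_labeling(cle.push(chunk), spot_sigma=4)
        # the highest label value equals the number of nuclei in the tile
        counts[ty, tx] = int(cle.maximum_of_all_pixels(labels))
```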
Just a reminder for those of you who do not know this platform yet: if you work with napari and wonder whether there is a plugin, for example, for cell segmentation, you can enter 'segmentation' on the napari hub and it will give you a list of napari plugins which allow you to do that. It is becoming more and more user-friendly: you enter, for example, biological terms and actually get technology back. The same goes for feature extraction; depending on what you want to do, enter a term from your research field and it will presumably give you something useful. And if you have any questions regarding the notebooks I was sharing, or very general image analysis questions, go to the image.sc forum and ask there. I am sure there are so many experts online on that platform that you typically have an answer within the same day, sometimes within 20 minutes, depending on what kind of question you ask. And if you have never heard about the image.sc community, go there and have a look; it is a very open, very friendly community where people just help each other, typically within the same day.

I showed you a couple of tools today, and they were also published: CLIJ we published in Nature Methods, and the cell neighborhood relationships work in Frontiers in Computer Science. One more hint: nowadays you can very often also cite GitHub repositories, source code, and open source packages. For example, there is this Zenodo badge on the GitHub repository, you see it here. When you then come to the Zenodo page, where the code of specific versions is archived, you scroll down a bit and there is a section which says 'Cite as'. So you can cite software now, which, for example in Germany, was recently a big topic of discussion: when applying to the German funding agency, so far you had to provide links to publications, and since recently you can also provide links to open source libraries. If those get cited properly, it makes life easier; so again, please cite the tools we are developing for the community, because it enables us to get funding for the long-term maintenance of these tools. Last but not least, there will be some workshops specifically about GPU-accelerated image processing, I think in May in Paris next year, and in September in Dresden or London, we do not know yet. They will be NEUBIAS-style training schools about GPU acceleration, but I think also about basic image analysis. If you want to know exactly when they will happen, reach out, drop me an email, and I will put you on the list of interested people. With that, I would like to thank you; I am finishing a bit early, so there is more time for questions. Thank you for listening, and again, Rocco and the organizers, thanks for having me.

Yes, so, my name is Beatriz Serrano and I am a community manager for the European Galaxy project. I will present here the different options that image analysts have in Galaxy. I would like to start by introducing Galaxy and the Galaxy Project. Galaxy as a project is a worldwide open source project that aims at making computational, data-driven research reproducible, transparent and accessible for everyone. There are three main continental servers, as you can see here: one in the US, one in Europe (the one in Freiburg, where we work), and one in Australia. There are also more than 130 public servers that we are aware of; it is a bit hard to track this precisely. And there are plenty of private servers in research institutes and in industry as well, but again, it is hard to give precise numbers here. The most important thing is that there is a worldwide community of users, developers, admins and trainers scattered all over the globe.

So, what is Galaxy? Galaxy is a web-based computational workbench with a graphical interface, like the one you can see here in the screenshot, and also with a programmatic API for scripting and automation. This is how it looks in the browser; it has three main panels. On the left side,
you have a set of tools: in each of these categories you will find different tools for different operations. There are more or less 8,000 tools available, so it is useful for way more than just imaging; data analysis, data engineering, visualization. Then there is the central panel, the working area, where the homepage is shown now and where the tool interfaces are displayed. In the right panel there is the history: the data sets and operations that are performed will appear there in order of execution. So it is kind of a live documentation of the processing steps, similar to what you would have in a lab notebook. And how it looks in real life is something like this, in this video. This is specifically the imaging subdomain, or flavor, of Galaxy, in which all the relevant tools for image analysts are gathered; you see there segmentation, data access, feature extraction, all the things that are currently integrated into Galaxy. We also have plenty of training materials with different options: you can see there the workflows, the examples, the different tools that are used, the data sets, everything that you need to reproduce any of these analyses.

Okay, so what I want to talk about is what is useful for image analysts in Galaxy and what Galaxy can offer to this community. First, data access. There are different plugin systems so that users can get their data from remote locations, like you see in the screenshot: you can get the data from your S3 bucket, via FTP, and through many other protocols. That is something Galaxy provides, and it can be extended; if some protocol is not covered, it should be extendable by adding such a plugin. One tool that we used in the past in this context serves to get data from the Image Data Resource (IDR). The point was to get into Galaxy only the data that we need to perform the analysis, and not to transfer the whole thing. You can select where you want to get the data from: if it is public, from the IDR; if it is private, you can also get it from a local instance. Then, of course, you need your credentials for that, but if you have a local installation, that is possible as well. Then one can select the image IDs, which images you want to download, and you can also define a region of interest. That was very useful, in particular for the screens we were working on, because transferring the data is time-consuming and resource-consuming; so taking only the channels that are relevant, only the planes, cropping the image, that was also something interesting for us.

In terms of imaging tools, there are plenty of things; I am sure this overview is not comprehensive, and that is something that I think should be better organized, like having all the sources together. I tried to collect all of them, but I am sure there are more that I am not aware of. There are different disciplines there: you have seen already the data access tools; for segmentation in particular, and feature extraction, we were working with CellProfiler, but there is also something based on ImageJ. All of that you can find in this Galaxy subdomain.
Please take a look at it. I guess the main message here is that you can integrate those tools: if you are a tool developer, you can integrate your tools into Galaxy, and you can put them together with other tools to create a workflow that performs a particular type of analysis. What is interesting about Galaxy in that case, and why it can be very useful also for cross-discipline analysis, is that there are tools from many different disciplines that already use Galaxy for their research. So this combination of different disciplines can also allow for new reproducible analyses that span several areas of science. And finally, I already mentioned it a bit, it is possible to have notebooks in Galaxy as well: there are interactive tools in the Live flavor of Galaxy. I wanted to highlight the ones that I think are most interesting for image analysts: you can use Jupyter notebooks and RStudio as interactive environments. These run within Galaxy, and they will also be tracked in your history; in the panel on the right that I showed before, this will just be an extra step. It is useful either for pre-processing something before you start your actual image analysis, or at the end, to analyze the features and outcomes of the analysis done in Galaxy.

In terms of workflows, I wanted to mention a project I was working on in a previous life, during my postdoc. Together with biologists, we wanted to analyze nucleoli. Basically, we had images like this one, which were all publicly deposited in the IDR. The nucleoli appear there as a lack of staining: basically, where there is no DNA signal in that particular channel. We gathered these kinds of images from different screens, in total gene silencing, overexpression and compound screens, and we had around 200,000 images that we wanted to analyze to get some general insight into how nucleoli work. Specifically, we wanted to pay attention to features like how many nucleoli one finds per nucleus, whether the shape changes when a specific gene is inhibited or expressed or a compound is acting on it, and also whether they distribute differently within the nucleus. All these kinds of things were interesting for us, and we wanted to use Galaxy for this.

We developed a pipeline like this one. First there is the IDR download tool that I showed before, which was already part of Galaxy; that was just the data access, and we first filtered only the images that were interesting for our particular problem. Then my colleagues at EMBL integrated several modules of CellProfiler as Galaxy tools, and these are the ones that we chained to create the automated workflow that we could run on these 200,000 images. Finally, we could use interactive notebooks to analyze the features derived from this analysis. And this is how a normal workflow looks in Galaxy: each of these boxes is a different tool, so you can see the starting modules of CellProfiler, it goes on to ColorToGray and other transformations on the image, and the whole workflow is assembled from there. We deposited these workflows in WorkflowHub, where they are available for everybody who wants to use them, together with a tutorial that I will talk about a bit more later.
So in the end, one only needs to put together all these different tools, create the workflow with the operations to perform, and click on run; that is it. And now, from WorkflowHub, you can also directly explore the different Galaxy workflows there and run them in Galaxy directly, so it is even easier.

All right, in terms of computation: there are the public servers that I already mentioned. They are completely free; you just need to create an account, log in, and then you can use the infrastructure. Everybody is welcome; it is not just for Europeans or Australians, everyone in the world can use them for free. The interface is user-friendly: you have seen that you only need to click and plug things together, and that is it; the computation behind it is completely transparent to the user, with no need to deal with any commands or anything more technical. However, if you do want to go more technical, because you want to automate something or run particular operations in parallel, there is also an API that lets you do the same operations as the graphical user interface, but from the command line; see the sketch after this passage. On the computational side, I would also like to highlight that interoperability and flexibility are great features that Galaxy provides to its users, because you can combine different software packages and have all those tools in the same workflow. Maybe something is best done with CellProfiler, but the outcome could be better processed with ImageJ, and then you need some machine learning on top; you can have all those tools in Galaxy and plug them together. And if, to analyze something at the end, you have some very particular code you want to run, there is an interactive notebook that you can also plug into the whole thing. The whole analysis is then reproducible and reusable by others: you can share it in a transparent way, share it publicly, attach it to your publication; all of that is possible. And it scales, which, in the workflow I showed before, was the main point, right? It is insane to run 200,000 images by hand; you cannot visually inspect them, you cannot do that much with such a huge data set, so scaling was also important for us. There is also the option of distributed computing: there is a feature in Galaxy where you can submit a job to a particular server, but it might not run on that one and instead run on another. This is particularly useful when, for instance, particular software, or rather hardware, is only available on a specific instance: if there are GPUs on one of them and that is what you need, the job can be forwarded to a different server as well.
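For completeness, here is a hedged sketch of using that API via BioBlend, the Python client for Galaxy's REST API. The server URL, file name and history name are placeholders; the API key comes from 'User > Preferences > Manage API Key' in the web interface:

```python
# pip install bioblend
from bioblend.galaxy import GalaxyInstance

# connect to a Galaxy server with your personal API key
gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")

# upload an image into a fresh history
history = gi.histories.create_history(name="image-analysis-run")
gi.tools.upload_file("cells.tif", history["id"])

# list available workflows; gi.workflows.invoke_workflow(...) would then
# run one of them on the uploaded data set
for wf in gi.workflows.get_workflows():
    print(wf["id"], wf["name"])
```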
On the user support side, Galaxy is also interesting, I think, for imaging facilities and institutes in general that want to run their own instance, because when users hit a problem, they can send a bug report that looks like this one. You input some information describing the problem, and the admin on the other side receives it together with all the details about how the problem happened: which tool was running, which job was running on the cluster, the standard output and standard error, everything that helps the admin reproduce, debug, and understand what the problem was. The good thing is that this is centralized, so you do not need several people helping in different places: everything that happens on a particular server comes in in a centralized way, and you can help people in a much more scalable fashion.

Finally, data deposition. This is something I came across recently: Bugra, from Euro-BioImaging, is working on the OME-Zarr format integration into Galaxy. It is a standard image format; you can see here an example of a visualization of a huge data set. One can view these huge images, inspect them, rotate them, perform some operations on them, and even when they are this big, it still works fine. That is because the format chunks the data and stores it at multiple scales in a pyramid model, so that you only load the data that you are currently visualizing. It is a cloud-native file format and, as I said, it is fast; it contains the binary data produced by the microscope and also the metadata associated with it. So what Bugra is doing is creating Galaxy tools that first convert proprietary file formats, data that you have in your local storage or in the BioImage Archive, into OME-Zarr files. This is the tool in Galaxy, on the right side: first you specify the input data, and then the parameters describe how the OME-Zarr file will look in the end, how many chunks you will have, and so on. There is another tool for data submission: once you have created this OME-Zarr file, maybe at the end of your analysis, you can submit it to the BioImage Archive via the FTP protocol. So the two tools form this kind of small workflow with the conversion and the submission of the data; but this can happen at any point of your pipeline, it does not have to be at the end, it could also be at the beginning, for the input files. The final pipeline then looks something like this: the image is acquired on the microscope, it gets converted into OME-Zarr so that everything is standard (it carries the metadata too, which is useful), then one runs the analysis, which could be in Galaxy, and finally one deposits the data in the BioImage Archive, or even earlier, if the raw data is what you want to submit.
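To give an idea of why this format is convenient in the cloud, here is a hedged sketch of reading an OME-Zarr with the ome-zarr-py library (the URL is a placeholder). The reader exposes the multi-scale pyramid as lazy dask arrays, so only the chunks you actually access get downloaded:

```python
# pip install ome-zarr
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

# open a (possibly remote) OME-Zarr store; placeholder location
reader = Reader(parse_url("https://example.org/data.ome.zarr"))
image_node = list(reader())[0]  # the first node holds the image data
pyramid = image_node.data       # list of dask arrays, one per scale

print([level.shape for level in pyramid])
lowres = pyramid[-1].compute()  # fetch only the smallest pyramid level
```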
I already mentioned some training on this, and that is something you can also do yourself if you like: in the Galaxy Training Network you have plenty of tutorials about plenty of different topics. As part of the project we were working on, we developed different tutorials: the one that uses CellProfiler for nucleoli segmentation was the original one, and we created more on segmentation. You have slides there in case you want to run a workshop, as well as the hands-on material and the workflow, so you can reproduce it, and you can do this with different use cases as well. Also, developing such a tutorial is really not complicated, so anyone can do it. Once we started with the first project, we realized that with just a few more tools we could also cover other examples, like the tracking one that you can see here; it is one of the CellProfiler examples from their website. We developed a tutorial for this that you can follow step by step: you can either assemble the whole workflow yourself, taking each of the tools, putting them together and trying to reproduce the whole thing, or you can download the workflow that we provide and try to reproduce the whole process on this data, or on any data that you think is useful for you.

Finally, I wanted to mention a few interesting resources that could be useful for image analysis and, in general, for anyone working with Galaxy. We have Training Infrastructure as a Service: trainers can apply and get a special queue on the cluster, so that when the trainees join the course, they get special access to those resources. That means they do not need to wait in the general queue, and everything runs smoothly because they have this kind of priority. We have plenty of events using this, and many people ask for these resources; more than 14,000 students have used this service already, so it is quite useful. The other thing I wanted to mention is the Galaxy mentor network, which is rather new. This is a program we have been running since not so long ago: people can apply there as a mentee or as a mentor, depending on their expertise and what they want to learn. The program lasts for eight weeks; first we make the matches between people with fitting interests, and then the program starts. You have the link there to join if you like, as a mentee or as a mentor, depending on what you would like to do. In general, it is meant to onboard people into Galaxy, or to help people who are already inside the Galaxy community learn something completely different; I don't know, maybe you want to learn how to develop a tutorial, and someone can help you with that. There are many resources like this.

I think that is more or less it from my side; I wanted to close with a kind of final advertisement. I see many resources and many people interested in working on imaging in Galaxy, but those efforts are often scattered and there is not much coordination. So I created this Google form to gauge interest and see whether people want to discuss these things and see what is available; I want to coordinate those efforts and avoid duplication of work, basically. If you are interested, or you know someone who could be interested, in having some discussions on this and contributing with whatever you can, maybe brainstorming, integrating tools, developing tutorials, anything you can think of, please click on that and join; I will try to coordinate and communicate the next steps with these people. You can also contact me on social networks and so on, or by email, and I will try to help you if possible. And I think that is it; I do not know if I took too long or too short, but anyway, I am happy to take questions, and Björn is online, so he can maybe also help with that. Thank you very much.

Thank you very much. So, I will start with a specific question. You mentioned that the tools are available.
However, a user may require a specific version of some software; I am thinking about Fiji or a specific package. Is it possible on your side to install packages, as if you were working in a conda environment, before you run your Jupyter notebook, or is this something that has to be done by an administrator upon request?

It has to be done through the admin, also for safety reasons, right? You cannot trust everything all the time, so you need to check that the tools are okay. So it needs to be requested first; the tool needs to be a conda package, and then it has to be integrated into Galaxy so that it is compatible with the ecosystem. Then it goes into the ToolShed, which is like the app store of all the Galaxy tools, and the admin is the one who needs to install it. You can ask the admin of a public server for that, and it can also be a private one: if it is in your institute, you just talk to the admin of your Galaxy instance. Is that correct?

Yes, that is the general answer for Galaxy tools. If you were asking whether you can install tools in a notebook, this you can obviously do: as soon as you are in a notebook, you are in an encapsulated environment and you can install whatever you like. You can use all the conda magic that Robert has explained and set up your environment as you like in the notebook.