Okay, welcome everyone to the second part of the Bioimage Analysis with Python seminar. It is my pleasure to introduce Guillaume Witz, who will lead you through the seminar, and also Cedric Winesch and Mikhail Obladimirov, his colleagues from Bern. And today we also have Anna Klemm from Uppsala, so a lot of bioimage analysis experts helping. And with this, over to you, Guillaume.

Thank you. So hi everyone, welcome to this second session. I called it season one, episode two; hopefully there are going to be other seasons of these courses, because I think they are great. So you have been introduced to the moderators again, so feel free to ask new questions. As has been said, the previous questions have already been posted on the image.sc forum; you can find the link here. I have edited them slightly, completed them in some cases, and grouped them by topic, so if you asked a question, you can probably find it in there somewhere. And the new ones are going to be posted in the same thread on the image.sc forum. Again, the course material is available on this GitHub repository; you will probably know it if you went through the content. And this presentation is available at the same link as last time. I have completed the presentation with some new slides, so you will find the new slides there as well.

Before I start, let me briefly remind you that if you're still interested in Python and more advanced Python, there are really interesting sessions coming up in May. There are two sessions by people from the CellProfiler team: one about much more advanced Python topics very soon, tomorrow, the 14th of May, and one about writing modules in CellProfiler. CellProfiler is based largely on scikit-image, so I hope this course will be helpful if you want to follow that one. And in June, there's going to be a session about napari. I mentioned napari before.
If you installed the course material on your own computer, you may already be exploring napari. So there will be a specific session on that software, and I won't cover it too much today; maybe I will show an example. And I won't answer too many questions about it, since you will have the chance to get explanations from one of the core developers, Nicholas Sofroneew. Just notice that the time is one hour later than usual, because he is in San Francisco, so don't get mixed up.

Okay. So what is today's program? As I said last time, the program would depend a bit on the feedback we got and on the questions that were asked, so I tried to put together something that would satisfy everyone. Most people said they would try to go through at least some of the material, so I didn't want to just walk through one of the pre-made notebooks; you can go through those yourselves, and there are lots of explanations in them. To avoid it being too boring, I will instead do a sort of live coding session, where I will show a very simple analysis pipeline that illustrates several topics I think are important. That will also allow me to answer questions that were asked last time and during last week on the Google form. So it will just be an illustration, but I think it's also good for you to see how these things are done live, and not only with pre-made notebooks that already work: in case I make a small mistake, you will see how to understand what the mistake was and how to correct it. So that will be slightly different.

Then there will be a few words about mixing languages in Jupyter, in particular with R and also with ImageJ. There were lots of questions last time about PyImageJ, so I will show you a fairly advanced example of how you can use PyImageJ together with CLIJ, the GPU computing package that was originally designed for Fiji but can also run in Python.
And so I made a notebook that runs on Colab, and on Colab you can use GPUs, so I will illustrate how this works. Then I will briefly mention some additional information about Jupyter and some options you have, like another interface and extensions. And then a large chunk of the course will be dedicated to installation. There were lots of questions about installation, and going back over the topic I realized that if people really want to use the material and then write their own code for their own work, they will install this on their machine. So it was a bit unfair to just leave it in the air how to do an installation. I will show, really step by step, how to do an installation using conda, because it's my preferred solution and a really great one. And I will explain what an environment is, because last time I just mentioned environments without really saying what they are. So it really goes step by step through all this information.

Okay, so for the moment I will get out of this and open a Jupyter session. I already have a Jupyter session open. It's running on my own computer, as you can see from the address, localhost, but it would run in the same way on Binder, for example; it's always the same story. And I will create a new notebook so that you can really see how it works from scratch. It's the same environment as we used before. I will try to make it a bit bigger so that you can read.

The first thing I wanted to briefly mention, because questions came up on this topic and I also see these questions come up very often in my interactions with people, is how you use NumPy and what the dimensions of a NumPy array are. For this I will use the AICSImageIO library for importing images that I mentioned last time. So I start typing the import, and notice that when you import packages, this is one feature of Jupyter:
when you type the first few letters, you can hit Tab and it will suggest packages. Okay, so you don't have to type everything every time. So now I want this package, and I want to import just a part of it, so I write "import", hit Tab again, and it offers me the different modules present in this package. This is really useful, so just use Tab for that. If I import this — this was also illustrated in several notebooks — I'm importing matplotlib's pyplot, just to show the images, and I will import NumPy, because you always need NumPy. And you see here I get an error: I forgot to write "as". There are several ways of importing these packages; there were also questions about this, which I will mention a bit later.

And now I import my image. This notebook is two levels below my data folder: I'm here in this supplementary folder, and the data is in a separate folder. Some people also struggled a bit with this. So, to come back to the level where my data really is, I have to go two steps up, and to do that you type two dots, slash, two dots: up two levels. Now I'm at the level where I have the data. And here again you can use Tab, so you can use auto-completion; it knows you're looking for a path, so it will suggest something — there is only one folder starting with "data". If I do that again, these are all the files present in that folder. So again, you don't have to type everything; I need the first one, and you can even click on it, and it is auto-completed too. So: Tab, Tab, Tab.

Okay, so now I load that image, and I can get its shape. You see the dimensions: there are x and y, there are five z-planes, two channels, and 72 time points.
Okay, so now I have the possibility to import it like this: I can say get — again, you can use Tab, and it will suggest the possible functions attached to this object. I want get_image_data, and here I have to say in what order I want the dimensions. This is how it's designed in this AICSImageIO package. So I can just say: I want Z first, then X and Y, then that this is for T equal to zero, time point zero, and this is how I want my output. If I do this — I have two channels, so I also have to specify one; you get an error message again, and it tells me in detail what is wrong or what is missing. And now I can ask — so this is a NumPy array — what its shape is. If I just show the output, it's a big matrix of numbers, a three-dimensional array, and you see that the Z dimension is indeed in the first place.

And now, if I want to do a projection, I can just use imshow, take my image, use the max method, and say along which axis I want to project. I want to do a Z projection, and my z axis is in the first dimension, so I say axis equal to zero, because we start counting from zero. This gives me a projection of the Z stack. But I could also have imported it with the Z axis as the last element. If I do that, I end up with an array where X and Y are in the first dimensions and Z is in the third. So now, if I want to do a Z projection, I have to project along the third dimension, which is axis 2. Same image; it didn't change anything at all.

I just show this example to illustrate the fact that dimensions in NumPy are not defined by names: you don't know whether the first axis is X or Y or Z. If you create an array like I did in some exercises — np.zeros((10, 10, 4)) — and say, okay, this is an image 10 pixels wide and 10 pixels high with four planes,
I could equally have defined it the other way around. It's up to you to know what your dimensions are — or you are lucky and your images are formatted properly, so that when you import them, your importer knows what the dimensions are. So — there was a command to do this that I forget now — okay, this tells you exactly in what order the dimensions of this image are originally stored, and so this reader knows how they are organized. Otherwise, it's really up to you, when you import your image, to know which dimension is on which axis: whether X and Y are in the first dimensions, whether Z is first or last, and so on. A NumPy array has no metadata. So this is just the first point I wanted to make.

Okay, so now we're going to look at another data set. I will close this, because I won't need it anymore, and create another notebook. The goal will be to use a data set from the Broad Bioimage Benchmark Collection, which is a really great resource from, roughly, the same people as CellProfiler. You find a lot of data sets in there to test things — real data sets — and they sometimes even come with information about the segmentation, so if you develop your own segmentation, you can compare it to what is available there. There are several data sets; I'm going to look at this one here. It's fluorescence microscopy, and you have cytoplasm and nuclei, so two channels. One channel is actually not really the cytoplasm: it appears here as cytoplasm in green, but it's actually a protein that can be either in the cytoplasm or in the nucleus. The cells were treated, so there are controls and treated cells, and in the treated cells the signal goes into the nucleus. The goal will be to find out whether the treatment had an effect and what effect it had. The data is from a 96-well plate.
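To recap the axis-ordering point before we move on: here is a minimal sketch with a synthetic array (the shapes are made up for illustration), showing that the axis you pass to max depends entirely on where you put Z — NumPy only knows positions, not names.

```python
import numpy as np

# A synthetic "z-stack": 5 planes of 10 x 10 pixels, with z as the first axis.
stack_zyx = np.zeros((5, 10, 10))
stack_zyx[2, 4, 7] = 1.0          # one bright voxel in plane 2

# Max projection along z: axis=0, because z happens to be first here.
proj = stack_zyx.max(axis=0)

# The same data with z moved to the last axis: now we must use axis=2.
stack_xyz = np.moveaxis(stack_zyx, 0, -1)
proj_last = stack_xyz.max(axis=2)

print(proj.shape)                        # (10, 10)
print(np.array_equal(proj, proj_last))   # True: same projection either way
```

The array itself carries no record of which axis is Z; only the reader's metadata (or your own bookkeeping) tells you that.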
So we're going to get a lot of data, a plate laid out like this. And for now we forget that we know what treatment was done — of course it's described here. We just know that for each row there is a given treatment: in blue and in yellow you have different types of treatments, and you have replicates. You can download this data yourself if you want to try it; I will put a notebook in the same repository at a later point so that you can see how this is done. You can just unzip it. It's again in my data folder, here. And these are just images; you can even open them in Jupyter and have a look at them. So this is one signal, the nuclei signal, which is always the same — it doesn't change with the treatment — and it is channel number two. It appears like this.

Okay, so we're going to do a very crude segmentation of all the images and see if we can get something out of it. First we have to import all the packages we're going to need. I usually import the ones I always use; there are even ways to automatically import scikit-image and its submodules. And now I can, for example, import one image and see what it looks like, using imread. Again it's two levels up, then data, and I can go to this file; I will open one of the channels, channel two. And okay, the image is now an array.

There was one question about this being annoying — it was asked about NumPy, but it's the same in scikit-image: it's a bit annoying to always have to write all these prefixes. You're not forced to do that; you can also import your packages like this: you can say from skimage.io import imread, as I did before, and then you can skip this entire part if you really want. You saw that in the notebooks I almost never do that. I never do that for two reasons. First, it doesn't take that much typing, since you can use Tab auto-completion.
And the second reason, the main reason, is that it makes the code much easier to read, because you immediately know where your functions are coming from. If by any chance, in your notebook or in another module that you wrote, you have a function also called imread and you suddenly call imread, you will not know if it's your function or the one from scikit-image. That's why I tend never to omit where these functions are coming from. It makes the code much easier to read; it's a bit more text to write, but as I said, with auto-completion it's not that bad.

Then we can look at the image, and remember that we can change the color map. Okay, so this would be the image. Now we're going to do a simple thing: thresholding. How can we threshold, and how can we choose the method? There is a great function for this in scikit-image. The thresholding functions are in filters. You see that here I didn't import filters, so even if I write "fi" and Tab, it doesn't suggest anything; I first have to import it, because this module is not imported by default. When you develop code, you will probably do exactly what I did here: you realize that you need an additional module and you put the import somewhere in the middle of the notebook. If you think you're really going to need that function or module, you should copy the import and put it at the top; otherwise you're going to have imports interspersed throughout your notebook, and it's not ideal for understanding what the code does.

Okay, so: filters. And there is a function in here called try_all_threshold. You remember that there are examples where I use one of these thresholding methods; you have a choice of lots of different ones. If you call try_all_threshold and pass your image — we just suppress the text output like this — it tries all these methods and shows you the results, and then you can pick the one you think is best.
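To make the import-style trade-off concrete, here is the same choice illustrated with a standard-library module (so the sketch runs without scikit-image installed):

```python
# Fully qualified: at the call site you can see where the function lives.
import os.path
p1 = os.path.join('data', 'image.tif')

# Bare import: shorter to type, but if you later define your own 'join'
# somewhere, the call site no longer tells you which one is being used.
from os.path import join
p2 = join('data', 'image.tif')

print(p1 == p2)  # True: both call exactly the same function
```

Both styles work; the fully qualified one just keeps the origin of every function visible in the code.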
You can even specify a figure size if you think this is a bit too small, to make the figures larger. This is quite handy; it's very similar to what you have available in Fiji. We are going to go for Otsu. This is not supposed to be a perfect segmentation — it's really just for illustration — so we are going to use Otsu. And this is the result of the thresholding with the Otsu algorithm; this is the value of the threshold.

So now, to actually create a mask, what I do is take whatever in my image is larger than this threshold. But first I have to define the threshold, so I assign this value to my threshold variable, and now I can do the comparison. You see that the result is a Boolean array, full of True and False values. If I plot this — it just uses this weird color map — you see the positive and negative regions. Now I want to assign this to a new variable, so I can just say this is going to be my mask.

There is a slightly odd way of writing things that you saw several times in the notebooks, which is not very mathematical: you just have to imagine that you first execute everything on the right side of the equals sign, and then assign whatever came out of the right side to the variable. So don't be confused by "mask = image > threshold": what it means is that you first do the comparison, and then assign the result to mask — you don't even need parentheses. If I show my mask, this works.

Now, what I will want to do is measure the properties of these regions, in particular the intensity. I will use this mask on the other channel, because I want to know whether, in the other channel, things moved into the nucleus or stayed in the cytoplasm. For that I have to create labels, and to create labels there is a labeling function, available in two places in scikit-image.
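The thresholding steps so far can be sketched with plain NumPy (the toy image and the hand-picked threshold value are made up; with scikit-image you would get the value from filters.threshold_otsu instead):

```python
import numpy as np

# A toy "image": dark background with one bright square.
image = np.zeros((8, 8))
image[2:5, 2:5] = 200.0

# With scikit-image you would compute the threshold automatically:
#   from skimage import filters
#   threshold = filters.threshold_otsu(image)
# Here we pick a value by hand to keep the sketch dependency-free.
threshold = 100.0

# The right-hand side is evaluated first, then assigned to 'mask'.
mask = image > threshold

print(mask.dtype)   # bool
print(mask.sum())   # 9: the pixels of the 3x3 bright square
```

The mask is a Boolean array of the same shape as the image, which is exactly what the labeling step takes as input next.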
So I pass the mask, and I will call the result "labels". If I show this, it shows me a labeled image. The coloring is not optimal, so I prefer a random color map. This is why I created the random color map that you can import, which is in this little module that you saw up here. Oops, sorry. So this is a small module where I defined a random color map. Let me import it, and you will see a problem immediately. I say "from course_functions", and when I hit Tab, nothing comes up. This is because my notebook doesn't know about this module: the module is two levels higher than where the notebook is, so the notebook doesn't have access to it.

There are multiple ways of solving this. You could write a complete package and put it on PyPI or on a conda channel and then install it, but for this kind of thing that is really overkill. Or you just copy the file into the same folder where your notebook is — this is not really recommended, because you are going to end up with lots of copies of this file, and then you're going to modify one but not the others. A better solution is to put it on GitHub and then download it every time, making sure that you only use and update that version. Or you put it somewhere on your computer where you always have access, and then you add the path to that module inside your notebook.

To do that, we can import a built-in module called sys, for system, and then ask what the path is. These are all the paths that are already included for this notebook, and you see that most of them are specific to my environment — we're going to see that afterwards — a specific location on my computer. But it doesn't include my current path. So what you can do is take sys.path, which is just a list, and append a path to it.
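That append step, in a minimal runnable form (the absolute path in the comment is of course a hypothetical example):

```python
import sys

# Make modules one level above the notebook importable.
# '..' is relative to wherever the notebook or script is running from.
sys.path.append('..')

# An absolute path works the same way and does not depend on where
# the notebook lives:
#   sys.path.append('/home/me/my_python_modules')

print('..' in sys.path)  # True: Python will now also search one level up
```

After this, an `import` statement will also look one level above the notebook's folder when resolving module names.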
And so here I'm just going to append two dots: go one level higher, because my module is just one level higher. So now, if I ask what the path is, I have these two dots in there, which lead me to the right place. Now if I say "from course_functions", it appears, and I can import my function, called random_cmap. And I can say cmap equals, and call that function like this. So now this course_functions module has appeared, because my notebook is going to look in all these places, plus one level higher. And this can also be a more explicit path, like a specific place on your computer — you can put the full path. This is how you can access specific modules that you are writing yourself; these were questions that came up too.

And now we can use this color map. This is meant to be superimposed on the original image, and we use a gray color map for the image itself to avoid mixing all the colors together. So now, you see — I can make it a bit bigger; you can also specify these settings when you create a figure, there are different ways of doing that — now you have your segmented image.

So now we are happy; let's say this is the best we could achieve. Of course, in real life you would use something much more advanced than this — you could, for example, use something like Cellpose to achieve this — but the end result is the same: you get a mask with labels. And now the last thing we want to do is to get information about these regions. So: this was our first image, the second channel, and now we will import the first channel. We're going to call this image one; we need exactly the corresponding image, and they are named the same way — the name is just the position in the 96-well plate. So we load channel one, and we can have a look at image one. This looks more like cytoplasm.
And what we're going to use is the regionprops function — it's in the measure module: regionprops_table. What we have to pass here is the labels, the image with labels, and we can pass an intensity image. You can either measure just the geometrical properties of all your labels, like the area, the roundness, the convexity, all these things, or, if you pass a second image here — an intensity image — it will also measure, for example, the average intensity or the maximum intensity in each of these regions. So here I'm passing my image from channel one. Then I have to specify what properties I want: I want the label, and I want the mean intensity, just that. You have to look up the keywords; you can just Google "scikit-image regionprops" and you get the whole list of available properties. And I will call the result "regions". So this is what I have: it's a dictionary, and it has labels and values for the mean intensity; each of these corresponds to one of the regions up here. What I can do now — what I will do later — is transform this into a pandas data frame. Maybe you saw that I was using pandas a bit, in a very rudimentary way. If you are familiar with R, this is very similar: it becomes a nice table with labels and rows that you can easily use.

Okay, so now we think this is working pretty well, and we want to do it for all the images, right? So we have to go through all the images in the folder. There are different ways of doing that. You could, for example, say that you have a main path, copy it, have some way of finding what the contents of the folder are, and build something like main path plus this image name, and you would have a list that you go through. And now you would have to put a slash in between.
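Before the discussion of paths continues: what regionprops_table computes for the mean intensity can be sketched by hand with NumPy on a tiny labeled image (the label and intensity values below are made up for illustration):

```python
import numpy as np

# A tiny label image: 0 = background, two labeled regions (1 and 2).
labels = np.array([[0, 1, 1],
                   [0, 1, 0],
                   [2, 2, 0]])

# The matching intensity image from the other channel.
intensity = np.array([[0., 10., 20.],
                      [0., 30., 0.],
                      [5., 15., 0.]])

# Mean intensity per region, as measure.regionprops_table(labels,
# intensity_image=intensity, properties=('label', 'mean_intensity'))
# would report it:
means = {int(lab): float(intensity[labels == lab].mean())
         for lab in np.unique(labels) if lab != 0}
print(means)  # {1: 20.0, 2: 10.0}
```

Region 1 covers the pixels with values 10, 20 and 30, giving a mean of 20.0; region 2 covers 5 and 15, giving 10.0.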
And so this would put your slash here, and you would end up with this path, which is okay — but it's okay on my computer, right? This path with these forward slashes is fine on Linux or Mac, but not on a PC; on a PC you would have backslashes, for example. So this is going to be difficult to reuse on another computer. The better way of defining these paths is to use a more general way of constructing them. There are two ways; for the sake of time, I will show the easier one. There is a module called os. We can import os, and os has all the functions to deal with paths and folders. For example, it has a function called join. If you do join, you pass this first part — this you would always have to set differently depending on your computer — and then I can just pass this name, oops, in quotes, and it creates a path for me. You see it added the slash for me, and it would add it differently depending on my computer: if I was working on Windows, I would actually get the right formatting for the path. So this is what you should be using. It's quite an important point, because you are going to deal with paths all the time.

Now there is another module we're going to use, called glob. glob allows us to parse the contents of a folder. Again, these are all things you of course have to know or have to Google. So you have a path, and you complete it with a pattern as you want: you can say "take whatever", and "whatever ends with this extension". Ah, I'm just missing a quote here. So this creates a list of all the files in my folder that conform to this pattern. Now I want two of these lists, one for channel one and one for channel two. So I can say channel one, and just restrict the pattern accordingly.
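The path-joining and pattern-matching steps can be sketched like this; the file names are hypothetical, and a throw-away temporary folder stands in for the real data folder:

```python
import glob
import os
import tempfile

# A throw-away folder with hypothetical file names for two channels.
folder = tempfile.mkdtemp()
for name in ['pos_A-01_w1.tif', 'pos_A-01_w2.tif', 'pos_A-02_w1.tif']:
    open(os.path.join(folder, name), 'w').close()

# os.path.join inserts the right separator for the current OS
# ('/' on Linux/Mac, '\\' on Windows).
pattern = os.path.join(folder, '*w1.tif')

# glob lists every file matching the pattern; sorted() gives a stable
# order so the two channel lists stay aligned.
files1 = sorted(glob.glob(pattern))
print(len(files1))  # 2: only the channel-one files
```

Because the separator comes from os.path.join, the same code produces valid paths on Linux, Mac, and Windows.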
Okay, so now I have a list of only the channel one files. It's a long list, and you see that it's not really ordered. To order it, I can use NumPy and say sort, for example — there are multiple ways of doing this; this one works well. So now I have 1, 2, 3, 4, 5: the names have been sorted in a sensible way. That's not always guaranteed, but here it works fine. So these are my files for channel one, and I will just copy this and do the same thing for the channel two files.

And so now — I will just delete this to avoid confusing people — I can use imread; I don't need those earlier paths anymore. I can open, for example, the first element. And this is channel two. So I will call these image one and image two, and I can have a look: image one is the cytoplasm, and this is the nuclei. And we can do that for all the indices. Since we sorted them, and we have the same numbers in the two channels, we are sure that they correspond.

And now we can apply the pipeline we had before to all these images. We are going to create an empty list, into which we will put all the results, and then we do a for loop: for index in range, where the range is just how many elements we have in the channel one file list. There are more advanced ways of doing this, but let's keep it simple. For each element in my loop, I'm just going to import the two images, so I need the index here and here. Now I will remove this display, and I will measure the threshold on channel two, because that's where I have the nuclei; I will create the mask; I will create the labels. And this is really typically how I work, and I think how many people work: you try something out on one image, and once you have something that works, you put it in a loop where you go through all the data in your data set. And the final thing that we need is this:
this is going to measure the regions, and we transform the result into a data frame. Finally, if we don't save this at every pass through the loop, we're going to overwrite the result each time — and we need to correct this to image two. So we append all these results as we go: we're making a list of data frames, appending the regions table here. I don't really need this display anymore. My results variable is a list of data frames, so if I take the first one, it's a data frame; and the next one, it's a data frame.

Now, I have no way of knowing which data frame corresponds to which file, except by going back through my original list. So I can include that information in my data frame directly: I will just add a column, call it "file", and say that it contains this name here. I will do it slightly differently: if I take my file list and take the first element, this is the entire path. You can split it, for example using the slashes, and take the last element; this gives you the name of the file. There are many ways of doing this, and there are more elegant ones, especially using a module called pathlib. You could import pathlib — you should really go read about pathlib. It's a bit too complicated for today, but you should try to use it in your own work.

So we are going to add this to our data frame — we need the index here. If we do this and look at one of our results, we see that we now have the labels, the mean intensity, and the name of the file; it automatically adds the name to each row. The last thing we would need is to know which row and which column of the plate this is. And to do that, you can use regular expressions. So I'm going to really briefly show you regular expressions.
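The collect-into-a-list pattern just described can be sketched with pandas; the file names and intensity values below are hypothetical stand-ins for what regionprops_table would produce per image:

```python
import pandas as pd

# Stand-ins for the per-image measurements (values are made up); in the
# real loop these come from regionprops_table on each pair of images.
measured = {
    'data/plate/A-01.tif': [20.0, 35.0],
    'data/plate/A-02.tif': [12.0],
}

results = []
for fname, intensities in measured.items():
    regions = pd.DataFrame({'mean_intensity': intensities})
    # Keep only the file name, not the whole path.
    regions['file'] = fname.split('/')[-1]
    results.append(regions)          # one data frame per image

all_results = pd.concat(results, ignore_index=True)
print(all_results['file'].tolist())  # ['A-01.tif', 'A-01.tif', 'A-02.tif']
```

Every row now carries the name of the file it came from, so the table stays interpretable after concatenation.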
And I'm completely aware that if you've never heard about these, it's going to be difficult to follow, but if you don't know what they are, I really encourage you to discover them. Regular expressions allow you to find patterns of text in strings. There is a built-in module in Python that does this, called re, for regular expressions. And now you can create the pattern. My pattern here will be: something — written as a dot and a star, "whatever" — then a dash, which you see here, then the channel part, then a dash, then the row letter, "A" here, and then something else. And now if you take, for example, just a string here, you're going to look for this pattern. You have to say what you want to keep: we are going to try to keep this first element, so we put parentheses around it — and hope that it works. And this gives us "A". So this is the way to recover that element. I think CellProfiler has the same kind of thing, where you can also define these patterns; this is how you recover information from the file name itself. And if we now recover the other one, I just add this, and it recovers the second element. So you can recover it like this, so that you have just one element, and you can even convert it: this will give you an integer, whereas this gives you the actual list of matches.

So now you can add these elements to your large data frame as well. You can say: what is my plate row, and what is my plate column? My plate column is going to be this, for a specific name, and the row is going to be the same thing — except that it's just a letter, so we don't convert it to an integer.
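A condensed version of the extraction just shown, on a hypothetical file name (the exact pattern depends on how your files are actually named):

```python
import re

# A hypothetical file name encoding the plate row 'A' and column '03'.
fname = 'Channel2-05-A-03.tif'

# Two capture groups: the row letter and the column digits.
match = re.search(r'-([A-H])-(\d+)', fname)

plate_row = match.group(1)           # 'A', kept as a letter
plate_column = int(match.group(2))   # 3, converted from the string '03'
print(plate_row, plate_column)       # A 3
```

The parentheses define what is kept; everything else in the pattern only anchors where the groups are found.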
So now, if I ask what the result is, I see this. And if I go to the next position, I see that it's still plate row A, but the column is two, et cetera. And if I go further, I see seven, et cetera. So this gives me the reference to the position in the plate. You probably don't know regular expressions and were completely confused by what I just did, but you see what they can be used for — so just go and try to learn about them.

So now we can finally plot our information. Sorry — first we can concatenate everything, put all these tables together. Now all_results is just one huge data frame with all the information: the mean intensity for each label, for all the elements in each plate position, and we still have the reference to the position inside the plate. And this is where you see the strength of data frames: you can do data science directly in your notebook. For example, you can do a group-by, grouping by a certain element: we can group by the file name and take an average, and see what comes out. And now we have the average mean intensity for each file in our plate. You can get all the statistics you want this way, and you can do much more complicated combinations or groupings. This is where the data science strength really comes in. And then you can plot your results. For example — I just copied this to avoid too much typing — I can create a new data frame here and call it "grouped". You can group, use the index for the x axis, and plot the mean intensity. The index is the file name, and on the y axis you have the average fluorescence intensity within the nucleus — but in the channel that you're actually interested in, not the nucleus channel.
You rarely measure intensity in the same channel as the one you segment in. And you see a sort of increasing pattern here, but with all the files listed like this it is still difficult to read. What you can do next is something a bit fancier: a second grouping, this time by rows and by columns. You group all these together and calculate the mean. If I look at what this contains, it is a big table with references to the plate row and plate column — the labels we don't need anymore — and an average mean intensity over each position in the plate. Okay. Now, for each row I have multiple points corresponding to the different columns, which were treated at different levels, and I can look at my results. For example, I can do a small for loop and plot each of these groups. You see that I get the groups for a certain row — this is how you do indexing in pandas; it is a bit similar to NumPy — and plot each curve separately. Each of these lines corresponds to one row of the plate, and each point corresponds to one column. In the columns, treatments were done at different concentrations of a certain drug that changes how much of this protein is found in the nucleus. And you see that the result actually fits the expectation, if you go and check: first we have negative controls, and then increasing amounts of the drug, so we should see more and more fluorescence across the columns — and that is exactly what we see. These first points here are controls; they differ depending on the row, and you can check that they match as well. So you saw that without that much effort — a small loop to analyze all the files — you can create a data frame and then have access to all the data science capabilities of Python and Jupyter to make this kind of plot.
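The two-level grouping and the per-row curves can be sketched the same way. Again the column names and values are illustrative assumptions, and the actual plotting call is left as a comment:

```python
import pandas as pd

# Toy table with the plate-position columns recovered from the file names
df = pd.DataFrame({
    "plate_row":      ["A", "A", "B", "B"],
    "plate_column":   [1, 2, 1, 2],
    "mean_intensity": [5.0, 9.0, 6.0, 10.0],
})

# One average value per well (row/column combination)
per_well = df.groupby(["plate_row", "plate_column"]).mean().reset_index()

# One curve per plate row: column number on x, intensity on y
for row_label, grp in per_well.groupby("plate_row"):
    print(row_label, list(zip(grp["plate_column"], grp["mean_intensity"])))
    # with matplotlib: plt.plot(grp["plate_column"], grp["mean_intensity"])
```

Iterating over a `groupby` object yields `(key, sub_frame)` pairs, which is what makes the "one line per row of the plate" loop so short.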
And this is already quite a complicated thing to do, but it becomes really easy using pandas. One last note, which brings us to the next topic: you can mix languages in your notebook. For example, here — and this import is just to guard against a bug that I discovered — I load an extension. You will see these "magic" commands; I will talk about them in a moment. And then you can use R. Here I actually use ggplot2, which is very popular: I import ggplot2 and then use regular R syntax to make this plot. Okay. And so this is still in my Jupyter notebook, and you see that it produces the regular default plot that you get from ggplot2. For this, I just had to install R on my computer, plus the package that lets me push and pull data between the two worlds. So in addition to all the data science you can do by default with Python tools, if you are really familiar with R and like ggplot2, you can use it. There are also ports of ggplot2 to Python — one of them is called plotnine, which is really good — so if you want to stay in Python, you can, and have almost the same syntax as in ggplot2. Okay. So you have seen here things that are very similar to what was in the notebooks, like the region-properties and labeling functions; it all stays at a very simple level. But I wanted to give you this additional information: how to handle paths, how to import a module that you wrote, how to deal with multiple images, how to parse the contents of a folder, and how to recover information from the file name. This was really fast, I am completely aware of it, but really try to understand what these regular expressions are doing — they are extremely helpful. And then you can create a data frame as output and use pandas to do the data science part of the analysis. And you see that you end up with the plot.
Of course, you would have to improve it — add labels; this is just the default plot — but you could actually use it in a publication, and then everything is in your one notebook. Okay. So this is the illustration I wanted to do. I will put a clean version of it on GitHub so that you can have a look if you are interested. But I wanted to show you not just the content of the code, but also how I write it — for example, that you can use tab completion, which is very, very helpful. Now, to use R: you need a regular installation of R, downloaded from the usual place. Then you create a conda environment — we are going to see that in a moment — a specific place to install things, where you install Jupyter plus the other packages you need. Then you have to run R from a terminal, a regular R session, install packages, and run these two commands — this is from a package called IRkernel. And then you need one additional package in your environment, called rpy2. That is all you need to be able to mix languages like I just did. And you saw that what you have to do is say, in this first cell, that this cell speaks R. You do that with the double percent sign: %%R means that this entire cell is written in R. And these additional elements here say that I am pushing a variable called grouped into that cell so that I have access to it there. Okay. Of course, you will probably run into trouble at some point — these ways of mixing code are very powerful, but power always comes with complexity. You will find a lot of information on the internet, for example on Stack Overflow, and you can post on the different forums, but in principle it works without too much trouble. And I will come back to this part afterwards; for now I am just going to take a few questions, if there are questions that have not been answered. Yeah. So there is one question.
Could you suggest a good book, manual, or web page where we can start approaching Python specifically as bioimage analysts? Unfortunately, not really. There are a lot of books about data science — I think I put the reference to my favorite one in the material — but there is no reference book for bioimage analysis in Python, I would say. You find courses like this one, and other courses on GitHub using notebooks, but there is no reference the way there is for data science, or for Java, where there is a reference book with code. To my knowledge, for Python there isn't one. So if anybody has a good book on this, I am happy to hear about it. There are of course books, but none that I am really familiar with or that I would really recommend. — There are no other questions so far. — Okay, very good, then I will keep going. [A book suggestion comes in from the audience.] Very good. So I will try to find it and post it as a reference on the image.sc forum — I will just add it to the question thread. And if any other references come up, I will add them there as well. Okay. So you saw very briefly that you can mix R and Python, and the installation is not too complicated. You can of course also use command lines, and we saw that a little bit already last time, but just to remind you, there are two ways of doing this. You can use the exclamation point: knowing what your current path is, or what the current contents of your folder are, you can do this. Some commands are actually available even without the exclamation point, so you can use these very basic ones either way. But you can also, just like we did with R, use the double percent sign.
We can use the percent signs for bash, for example — bash being one of the main languages of the command line. So if I write %%bash, it means that my entire cell is bash, and that means I can do things like cd: go one level up and ask for the contents one level up. Okay. So this gives me the contents one level up, which is where my notebooks are. You see that it really executes all of this as if you had opened a terminal. You didn't actually move in your notebook between these locations, but within this cell you went one level up and asked what the contents were. And you can even show the current path, et cetera. I use this in some notebooks — including the one you are going to see now — to write some bash code. This is also interesting if you need to download large datasets, for example on Colab: you can use the wget tool, and you can then unzip things. So a lot of things you would otherwise do in the command line separately, you can do in the notebook, and that way, whenever you run your notebook, all these commands are executed. You don't need to first open a command line, execute a bunch of things, and then open your notebook and run it — you can do everything from that notebook. Okay. And this is what I want to illustrate now. I made an additional notebook because I was a bit frustrated that it was not possible to run Fiji properly — or that it was difficult to run Fiji — on Google Colab. If you tried it, maybe you saw that it wasn't working very smoothly. On Binder it worked well; I tried it just before. But on Colab, no. And on Colab you have access to GPUs, right? So if you want GPU access for Fiji, or if you want to mix Fiji and Python, it would be nice to be able to use Colab if you don't have access to a good GPU yourself.
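Outside Jupyter, the same shell steps have plain-Python equivalents, which is useful when you move a notebook into a script. This is a sketch, not the course's code: `subprocess` stands in for `!ls`-style commands, and the standard-library `zipfile` module stands in for `unzip` (a small archive is built in memory here purely for illustration, instead of one downloaded with wget):

```python
import io
import os
import subprocess
import tempfile
import zipfile

# `!ls` in a notebook is roughly this outside Jupyter:
listing = subprocess.run(["ls"], capture_output=True, text=True, check=True)

# `!wget ... && unzip ...` can be replaced by urllib + zipfile.
# Build a toy archive in memory just to demonstrate the extraction step:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/readme.txt", "toy dataset")

target = tempfile.mkdtemp()
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    zf.extractall(target)

print(os.listdir(os.path.join(target, "data")))  # -> ['readme.txt']
```

In a notebook, `!wget` and `!unzip` are usually shorter; the point is only that nothing here fundamentally requires a terminal.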
And in particular if you want to use CLIJ. CLIJ is the plugin for GPU computing in Fiji (and other platforms), and it would be great if we could use it on Colab, in a notebook. So I tried this, inspired by different notebooks posted on the image.sc forum, and I came up with this solution. You can use this link to open the notebook yourself, and I put in some explanations if you want to go through it later. You see that here, in this first cell — well, the very first thing you have to do is change your runtime: you want to use a GPU, that is the goal of this whole exercise. And here I am executing the installation of everything you need to run Fiji on Colab: you install Java, different libraries, and pyimagej, the library I mentioned last time. I will run it now — I am not sure how fast it will go, so maybe I will come back to it later. You see that this tells you it is running. Then I download a Fiji version to install, and then I install CLIJ directly on Colab, into that Fiji version that I downloaded. So here I download it and unzip it; here I move into Fiji.app; and here I install — I discovered you can actually add update sites directly from the command line — the CLIJ and CLIJ2 update sites. And we execute this too. You see that you get a lot of messages; it is downloading all the things that are needed. And then you arrive at the part where you actually want to use it — and of course you have to wait until everything is installed to be able to run that. So let's see how fast this goes. It is still stuck... Okay, so it has finished installing this part; now it is downloading Fiji. I think I will just come back to this in a minute and go a bit further in the presentation first. So, there are a few more things I wanted to mention about Jupyter.
So there is this other version of Jupyter called JupyterLab. You can install it with conda, for example, very easily, and you can switch between the two worlds. If I go back here: if you just remove whatever comes after "tree" in the URL and write "lab" instead — you can do that on Binder, where JupyterLab is installed — it opens the JupyterLab interface. This is a bit more advanced. I use the classic Jupyter notebooks for courses because they are easier to understand, but if you want, you can try JupyterLab. You see that you have these tabs opening not directly in the browser but in a sub-window, and if you have multiple of them open, you also have a browser of your data directly here. You can open multiple of these tabs, and you can even split your workspace in two, so you can have two notebooks side by side in case you want to copy-paste things. And there are lots of other tools: you have these tabs where you can stop notebooks, and a lot of extensions that you can install. I am not demonstrating this here, but know that it exists — this is probably the future version of Jupyter. The feeling is closer to what you have in Matlab, for example, where you have multiple windows to work with. So just know that this exists, and you can go back and forth: if you install both Jupyter and JupyterLab, you can switch between the two. If I write "tree" back here, I am back in the classic interface, and my notebooks keep running across the switch — it is purely an interface difference. So you can discover JupyterLab on your own if you think it is interesting. And then there was the point about extensions. Last time somebody asked, for example, whether it was possible to see your variables, and there are extensions for that which you can use with the classic Jupyter notebook.
You have to install a package called jupyter_contrib_nbextensions — also a conda install, as you see here, very easy — and then you get access to this menu here, with a lot of extension options. One I was using here is the table of contents, which is somewhere... it has disappeared behind my window... yes, table of contents is here. This is what gives you — when you open a notebook; you can see what is running here; now I have too many things open, you see that my computer is struggling a bit — this table of contents. Some of these extensions come with additional icons here: the table of contents, for example, is this one, and you can turn it on and off. This other one is a spell checker, so if you want to spell-check what you wrote, you can use it. And then there is one for seeing variables — the variable inspector. So the variable inspector is available here; let's see what we have. If I make this smaller again, we see the variable inspector here, with all the variables I have defined. Okay. So these are all the variables I defined, and you see that it tells you: this is a data frame, its size, its shape — and for NumPy arrays it does this too, it gives you the shape of things. So this can be quite useful. And there is an equivalent in JupyterLab, so this exists there too. Okay. So these are the two additional things I wanted to mention here. And now let's finally go back to the Colab notebook. It is done installing everything; I just have to restart the runtime for some reason — this is like restarting your kernel. Everything you installed here stays valid for as long as your session is on. Then I import some packages to run Fiji from Python: this is imagej, corresponding to the pyimagej package, plus some additional Java classes to be able to recover data.
So then you start this up. And you will see that these different commands take variable amounts of time — I don't know exactly what happens in the background on Colab — but once it is running, the code itself runs fast; it is just the loading and setup that can sometimes be a bit tedious. I don't know if I should wait... I hope this is not going to take too long. Anyway, I can already explain what happens here. If you looked at the notebook, you saw that we can actually use ImageJ macros in notebooks. Here I define a macro, and you see that it contains the same kind of commands you get when you do macro recording in Fiji. It is just defined as text: you have these triple quotes, and everything inside is just macro commands. So you can really record your macro in Fiji and then copy-paste the text in here. You can pass in some variables, if you are familiar with that, and then you just run your macro using pyimagej: you say run_macro, and here I am loading the blobs picture. And then this part, for example, uses CLIJ: these are the commands you get when you run CLIJ in Fiji. Here I am doing a 2D Gaussian blur of my input image, and I get the result out by getting the current image. This is also more or less how you would do it in Fiji itself — there are different ways of doing it, but you can get the last available image. So that is one way of running this; I will show it properly once everything is loaded. The other way is to... done! So let's run this. I used a really large sigma, so it is going to do a lot of blurring, and you see that you get a very blurry image. So this really used CLIJ from Fiji. The other way is to use clijpy: there is a version of CLIJ directly for Python, so you need an installation of Fiji and CLIJ, and then you can load this specific package. And then I create a new array — this is just a NumPy array.
Then you can go through the code, but essentially you do some pushing and pulling of images to and from the GPU, and you do the blur here — I used five for sigma. And then you can show the image. So you see this is pure Python code, and it gives you this result, and you can put a larger value for one axis to make sure that it really does what you think. So this really did the blurring using the GPU that is present on Colab. If you want to be sure of this, you can print — you see that you also have some autocomplete, and I think there is a command somewhere to get the device information; right now I forget what it is — but if you know the command, you can really confirm that you are using the GPU, which is a Tesla GPU. Anyway, you can explore this: you can use CLIJ either directly from Python or via a macro, directly on Colab. So I think that's pretty cool — at the cost of an installation that is quite tricky to do, but which exploits the fact that you can install things directly from the notebook. Okay, so you can follow the link and see how it is done. I don't pretend to understand everything that happens here — there is a reference to where I found this information — but as long as it works, everybody is happy. Okay. So we will take questions at the very end if there is still time, but I really want to go through the installation part, which I think is really important for everyone, and we will take questions on everything at the very end. I think we will overrun slightly, but not too much. So, remember that I said people often get stuck here. I got many questions about installation, because I guess many people wanted to run the course on their own laptop, and some got stuck at different places, but essentially they all experienced exactly this. So I just want to go into a bit of detail on how you can do it. You have a computer, and naively you might say: okay, I install Python, or I already have Python installed, and I have packages.
So all of this lives on my computer and is available to me when I use Jupyter, for example — globally available, all the time. Okay, and there is going to be one problem with this, and the problem is the following: you are going to have version issues. This is a problem that several people had when running the course: in one of the notebooks I use the regionprops_table function, which we also saw in the demo, and they had an old version of scikit-image — this function appeared in version 0.16 — so they couldn't use it. So I told them to update, and they ran into other kinds of problems. And you see that I recreated the situation here: I installed an old version of scikit-image just to show you what happens. (This, by the way, is also how you can check the version.) Okay, so what you do then is say: okay, I am just going to upgrade my package, and you can do that with pip, for example. You say pip install scikit-image and select a new version. What happens is this: when you install a new version, it installs not only scikit-image but also a lot of other packages that scikit-image depends on, like NumPy, and one of them is imageio, used for imports. And you will see that scikit-image has a requirement that imageio be newer than a certain version. It finds the one on my laptop, which was already there but too old, so it uninstalls it and installs a new one, alongside the new 0.16 version of scikit-image. Okay. So now you are happy: you can use that function because you have a new version of scikit-image and its region-properties code. But the collateral effect is that you also updated other packages, and what happens a lot of the time is the following. So now you are happy, you have scikit-image, and it required this particular package.
So you have this version of scikit-image, and now you start another project, and in that other project you need another package — and that package also uses imageio, but it requires the version to be exactly some specific version. This is quite rare, but it can happen. Okay, so now you install this new package and it demands exactly that version. Since everything lives in the same space, how is that going to work? The problem you are going to have is this one: when you try to install that old version of imageio to satisfy your other package, you will get an error saying that scikit-image 0.16 requires a certain imageio version, but you will have imageio 2.0.1. So now your new package works, but your scikit-image doesn't work anymore, because it relied on another version of imageio. This is the typical situation called dependency hell that you might experience when you install things in Python. And this is why we use environments: to somehow isolate the things we do on the computer, okay? What we want is one environment per project — really closed off on your computer — so that each can live its own life with its own versions of packages. And it is not only these two packages: you can have different versions of Python, different Jupyters; everything you need is enclosed in this little box, and this little box is called an environment. Then, whenever you want to work on project 2, you use environment 2; project 1, environment 1, okay? And these things are going to be independent. The best way of creating these environments, in my opinion, is to use conda.
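Before reaching for environments, it helps to know exactly which versions you have. The talk checks this via the package itself (e.g. `skimage.__version__`); a generic standard-library way, sketched here with `pip` as a stand-in package because it is almost always present, is `importlib.metadata`:

```python
import sys
from importlib import metadata

# The interpreter version matters too: newer package releases often drop
# support for old Pythons, which silently blocks upgrades.
print(sys.version_info[:3])

# Installed version of any distribution; replace "pip" with
# "scikit-image" or "imageio" in practice.
print(metadata.version("pip"))
```

`importlib.metadata` is in the standard library from Python 3.8 on; for older interpreters the backport is the `importlib_metadata` package.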
So conda is really about creating these environments — they are called conda environments, then. The first thing you have to do is install conda on your computer, and there are mainly two ways of doing this: you can use Anaconda, which installs a lot of packages plus the conda software itself, or you can use Miniconda. Miniconda has far fewer packages — just basic conda — and since you will install new packages anyway, I recommend Miniconda, because you don't need the several gigabytes of software that come with Anaconda. So you go to the Miniconda website, download the installer, double-click, and really install it like a regular piece of software. If everything went all right and you agreed to all the defaults, then the next time you open a terminal — on Linux or Mac you open the terminal; on Windows you have to open what is called the Anaconda Prompt, a program that gets installed with Miniconda or Anaconda — you will see an additional label in front of your usual prompt, and it says base. Base is the base environment of conda. So the picture is slightly different now: on your computer you have the conda world and the rest of your computer, and these are completely separate. You might already have things installed on your computer — Python, scikit-image, whatever — and you can reinstall everything in the conda world. The conda world is rooted in base. Base is also an environment — an environment that, I don't know exactly how to put it, is available every time you start the terminal; it is the first environment you have access to, and the one that contains conda itself, so conda is run from the base environment. And it can have packages: if you installed Anaconda, it will have all those packages. And then there are the environments that you create yourself. Each of these things is separate, so these two worlds don't talk to each other,
and the base environment and environments one and two don't talk to each other either — they are all very separate. Now, there is one rule with conda: you should not use the base environment. Just leave it; don't use it for your own coding. I was reminded of this by a post on the image.sc forum by Juan Núñez-Iglesias, who said we should amend some of the answers, and it is exactly this. There are three rules: don't use the base environment, don't use the base environment, don't use the base environment. It is really important. The reason is that eventually you will break the environment: as you install things, you will have a mix of packages that eventually have unsolvable conflicts, you will end up with an error message that says "could not install these things", and you are going to really struggle to solve those issues, uninstalling parts of software. You really don't want to do that in the base environment, which you cannot delete. So create dedicated environments: once one of them runs into this kind of issue, you can simply destroy it — just remove it and start from scratch. This actually happened to me when I wanted to demo what happens in a bad case: I installed different versions of different things and suddenly got this message. It takes some effort to break an environment, I have to say, but when you install heavier software — machine-learning software like TensorFlow, especially TensorFlow — you will run into these kinds of issues. Okay. Another problem you might have, especially if you work in the base environment: you have a lot of things installed, notably a Python version. If you try to upgrade scikit-image, say, and you have an old Python version — for example because of when you installed Anaconda or Miniconda — then when you say "upgrade scikit-image", it won't upgrade, because more recent scikit-image versions cannot work with that early Python version. Okay, but this is what some people experienced when they tried to upgrade: they told me "but I upgraded and nothing happened", and this is why — you have an old Python version. You would first need to upgrade Python, which is going to break all the other packages, just so that scikit-image works, and your environment would be broken. So really, do these things in specific environments. (My slide doesn't want to move anymore... okay.) So how do we create an environment — these closed spaces? You use the conda create command, and then you give it a name: call it myenv, for example, or whatever you like; for the course I used a specific name. And then you usually install at least one package: if you don't install anything, the environment is going to be empty — not even Python, not even pip, there is nothing in there. So you install, for example, scikit-image. You go to their website and they tell you: okay, this is the command to run, conda install, with one additional piece of information, which is this. These are called flags — options, sort of. This one specifies a channel: the packages are available from different sources, and one of the most popular is conda-forge, so you can add it as a specification. Usually, if you don't, there are default sources that also work; as long as you don't do really advanced things, you should not have trouble with those versions. And then you give the package name. So this is going to create an environment called myenv and install scikit-image into it from the beginning. Okay, so you can execute this in your command line, and when you execute it, it installs a lot of other packages. I just highlighted a few: numpy, because scikit-image depends on numpy; pip, so now you can also install other packages with pip; by default a version of Python (if you don't specify anything, it takes the latest one, which is this one); and scikit-image itself, of course. So a lot of
things get installed at the same time. Now, to use the environment you created — this little box inside your conda space — you need to activate it. You use conda activate myenv; on Windows you might have to use source activate myenv, and if that doesn't work, you will google it and the first Stack Overflow answer will be your solution. Once you have activated your environment, whatever you do will happen in that environment: if you install new things, they go into that environment — not into the base environment, and not onto your whole computer; only into that environment. Okay. So for example, we install pandas. Here I typed conda install pandas, but you see that I have my environment activated — you see it here, in the parentheses: it no longer says base, it says myenv. So I created an environment called myenv, and now it uses this environment for the installation. Okay, and so everything gets installed, and the same goes for Jupyter and for a lot of other packages. You can of course also use pip. For example, the aicsimageio package can only be installed with pip; if you go to their website, they tell you: use pip install. And since pip was installed earlier — remember, it is available in your environment called myenv — you can now execute pip install aicsimageio, and it installs everything that is needed in there. So you can use both pip and conda. I recommend using conda whenever you can, because conda really makes sure that package versions are compatible, but you can use both. Now, if you want to automate this entire process, you can write a little environment file that looks like this. It is in YAML: you give a name to your environment; you can specify a channel — you don't have to — for example conda-forge; and then you list what you want to install. So instead of writing conda install with all these packages, you just make a list here, and you can even say what should be installed by pip — aicsimageio, for example, is installed by pip. So this is all contained in the environment file, and then the only thing you have to do is run this one command: you save the file on your computer, move to the right place in your terminal (or import it in the Anaconda interface), and run conda env create with this file. This is going to run through the whole installation and take care of everything for you, and at the very end you get a message that tells you to activate the environment with conda activate, followed by whatever name you put in the file. So here you would say conda activate with the name from this file. This file is available in the GitHub repository — I put it in the installation folder — so you can download it and install from it. It is a minimal installation in the sense that there is no Cellpose and no StarDist, which are slightly more complicated to install; if you need those, just go to their GitHub pages and they will tell you exactly what to do. And you see that you can pin versions — for example Python 3.7 here, because Cellpose needs exactly that version of Python, and I just left it in this example. Okay, so to summarize: if you write all these lines in your terminal, you will have a functioning conda environment with scikit-image. You create a conda environment called myenv, you activate it — never forget to activate — you install Jupyter, then you install aicsimageio with pip, and finally you can run Jupyter by typing jupyter notebook. And don't create completely empty environments, because you are going to run into trouble — install at least one package from the start, and then the other ones afterwards. Beyond that, I really recommend using these environment files rather than typing the commands one by one. You will end up with really closed environments, no version conflicts, and I think it is pretty easy to do, right? So if you want to install Cellpose and you had trouble
So if you want to install Cellpose and you had trouble before, you can download their environment.yml file and install it using this same command, and you should not run into any trouble. Conda will really simplify your life.

And this is the end of this presentation. I took quite a lot of time for this installation part because I really think it's important: if you want to continue using Python, I really encourage you to use conda. So maybe there was not that much time for you to ask live questions, but I really wanted to go through this. The material will stay online and interactive beyond the course, on Colab or on Binder, and all the questions are or will be posted on the image.sc forum. With that, I think we will overrun a few minutes, but we can take questions if they are about any of these topics. Thanks for your attention.

Yes, so we have a couple of questions here. Can you hear me? Yeah, yeah. Okay: what are the major differences between Python and R, or what are the benefits of using R instead of Python?

I think in large part it is really a question of taste. They have very different syntaxes, and some people really love the R syntax while some people love the Python syntax. Then there is a question of domains of application. R is really good at statistics; if you have to do statistical analysis, R, I think, has a sort of an edge because lots of people use it, although there are great packages in Python, like statsmodels. For the data science part I think it's pretty equivalent: you have the tidyverse in R and the Python ecosystem around pandas, and these are quite similar, I would say. For image processing, Python is clearly ahead, especially because of all the deep learning advances that have been made in Python: both TensorFlow and PyTorch, the main deep learning libraries, are written in Python (and in lower-level languages, but mainly in Python). So if you do image processing, I don't think you should use R. You can use it
as I showed for the data analysis part, but not for the image processing, at the moment.

There is a question regarding StarDist and Cellpose: whether the training datasets are available somewhere. I used essentially what is provided by the developers. For StarDist they have two or three pretrained examples; one was trained on nuclei, also from this Broad repository, something like 500 or 600 different images of different types, and this is available in their GitHub repository; I think they also have synthetic data. So essentially I used that. If you have fluorescent nuclei, you can try it and see how it performs; if you have completely different data, you will have to retrain it, and they have some instructions on their website on how to do that.

Maybe I can just try to show this. If we go here, on the repository, they have examples like 2D and 3D (for 3D I don't think they have the data), and they also have example notebooks, so you will probably find quite a lot of examples and a lot more information there. So here, yes, this is a demo; this is basically a model: in here you have the weights, for example, and other parameters you can set in StarDist, and all of this is loaded when you load a StarDist model. What I did for the material available on Binder and Colab is clone this repository and then use those data to run the segmentation.

As for Cellpose, Cellpose is really designed to be usable out of the box: basically, when you run it, it automatically downloads weights, because they trained on a really massive amount of data. You can retrain it specifically on your data, but I think their point is really that you would have one solution to fit all kinds of images, and from the feedback it really seems to be the case: it manages to segment an amazing variety of data. But again, every time you use those solutions and don't retrain, you should really do a benchmark
and segment things manually, for example, and check how good your segmentation is.

The question, I think, was rather about the availability of the training data. For Cellpose, if you go and read the bioRxiv paper, they tell you exactly where the data are coming from. I think most of them come from a competition; I can't remember what it was called now, but people competed in proposing deep learning approaches for segmentation of nuclei, and it was also led by the CellProfiler people. So they used this, I think, and added other sources of data, and everything is public: if you read the paper, you will find all the sources of those data. It was called something with "bowl" in it, and I forget exactly what the name was, but I can post it on the repository or on image.sc in case people are interested; it will come back to me. In any case, everything is public.

Okay, a general question: what do I have to consider when installing Jupyter to access computational resources on a remote server to which I can connect with SSH? There is nothing really fancy to be aware of. The only thing that might be a problem is access to your server: some IT departments don't really like people accessing their infrastructure via Jupyter, and so there are firewalls that make it difficult. These are really specific questions that you should ask your IT people, and there are always workarounds. I will post explanations on how you can use SSH in any case to access a cluster, for example; it requires some effort, but I will post a solution as an answer to the question.

Okay: can you use MATLAB from Jupyter? Not that I know of. Of course, you can run MATLAB from a command-line bash cell in your notebook.
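As a small aside on what such a bash cell does: a "!" or %%bash cell in Jupyter just shells out, which you can also do from plain Python with subprocess. The MATLAB call below is hypothetical (it assumes a MATLAB installation on your PATH), so it is left commented out:

```python
import subprocess

# Run any command-line program and capture its output,
# just like a "!" cell in a notebook would.
result = subprocess.run(["echo", "hello from the shell"],
                        capture_output=True, text=True)
print(result.stdout.strip())

# Hypothetical: batch-run a MATLAB script the same way
# (requires a MATLAB installation on the PATH):
# subprocess.run(["matlab", "-batch", "myscript"])
```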
Anything you can run from the command line, you can also run in the notebook, but this would not be equivalent to what I showed with R, for example, where you can push objects back and forth. So I don't think there really is a solution for this. There is an Octave kernel that you can run in Jupyter, and Octave is the open-source version of MATLAB, but I don't think there is a purely MATLAB solution. Okay, I think that's it.

Okay, very well. Since we have already overrun, I think it's good to stop here. Thanks again for your attention, and thanks again to the NEUBIAS organizers for giving me the chance to give this course.