Welcome everybody. Today we have Carsen Stringer, who is a group leader at HHMI Janelia Research Campus, and she will tell us today about Cellpose, which is a deep learning network for cell segmentation.

Thanks so much. I'll share my slides now. We're going to go through some slides explaining Cellpose, and then we'll go through an example notebook and also run Cellpose in the GUI. I'm happy to answer questions at any time; I can see the Q&A. Let me know if it's blocking the screen for people or not.

Okay. So Cellpose is an algorithm that my colleagues and I developed to segment cells across a wide variety of imaging types and cell types. So what is cellular segmentation? You're trying to get outlines of single cells; it's an instance segmentation problem where every instance is the same type, and hopefully you get these outlines accurately so that you can do a variety of other analyses on your data. For instance, you might need to count the cells in your sample, monitor cell shape changes, quantify gene expression, or even record neural activity, and in each of these cases you need to segment single cells.

Here are some examples of the diversity of cells that people record from. This is an example of data that we used in our paper, which Tim Wang collected; the next step in his pipeline after segmentation is figuring out where the dots are in his RNA in situ experiments, so this is a gene expression example. We developed Cellpose to try to cover all of these use cases, so that you don't have to make a new model every time you come across a new cell type.

There's one other case I'm not going to talk about today: recording neural activity. This is an example of calcium imaging in vivo, where each of these neurons expresses a protein that lights up whenever calcium comes into the cell as it spikes. You can do segmentation on this type of volume (it's a volume because it's actually X, Y, and T in time), but we also have another algorithm called Suite2p, which I won't talk about today, that also takes into account temporal information and correlations in time to segment cells.

So how are we going to solve this problem? We want a generalist model, and a good place to start is probably a deep neural network; that's the state of the art in segmentation. People have been using things like Mask R-CNN to do segmentation. In particular, they train on a really large dataset called COCO, but unfortunately at the time there was no COCO for cells, so the first problem was that we didn't have the data required to train the model. Another potential problem is that the state of the art in machine learning for segmentation is trying to solve two problems, semantic segmentation and the mask problem, and it does this in multiple steps, where it first finds bounding boxes and then, on top of those, finds the masks. So we thought maybe we could come up with a loss function that's more appropriate for cellular segmentation. To solve the problem, then, we acquired enough training data and built a deep neural network with an appropriate loss function for cellular segmentation.
We were optimistic that this approach would work because of work from Anne Carpenter's lab, where they collected and segmented data from a variety of different nucleus types. They held a challenge, and they found that the top-performing model was trained on all the images; there was no downside to training on everything. So we thought, okay, let's do this for cells, not just nuclei. The exciting thing there is that having a single model takes much less time than, for instance, having to train a specific model on your own data, which can be complicated (maybe the code is complicated), or having to use a pipeline without deep learning, which might also take time.

So we collected and labeled many cells, mainly through Google image searches, and Michalis Michaelos in the lab segmented all of these cells in our GUI. All of this data is shared on our website at cellpose.org/dataset, so you can get access to all of our segmentations if you want to train a model on our data too. Sorry, I just thought I saw a chat message.

Okay. So now we need to come up with a loss function that's appropriate for cellular segmentation. To do that, we take these cells and turn them into little dynamical systems, where each cell is represented by these flow fields. For people not familiar with this, this is the notation that's often used in the optic flow literature: the HSV color map corresponds to arrows pointing in different directions, and all of these arrows ultimately point towards the center of the cell. So if you ran a dynamical system using these flows as your dx and dy, the pixels in these cells would converge to the center. We're going to try to predict these flows using a deep neural network; these are the targets the network is trained to predict.

Once the flows are predicted, we have to do some post-processing on them, because the flows by themselves are not the final solution. We run the dynamics: like I was saying, if you run the dynamical system, you see where these particles, the pixels in the image, end up, and each of the peaks where they end up is classified as a single cell, as you can see here. We think the flow representation allows us to find shapes that are non-convex, and additionally it helps us avoid a common problem: if you're using U-Nets, for instance, for people who are familiar with them, you often have a problem with merging, where cells close to each other get merged together because the boundary is ambiguous. With these flows, if you look at a boundary between two adjacent cells, like these two here, there are clear differences in the flow that the network has to learn, which segregate them. That, we think, helps avoid the merging problem.

The next thing we did was to test whether this algorithm works in a specialist case, on specialized data. We took a hundred fluorescent images of neurons from the Cell Image Library that were already labeled, and we trained Cellpose, StarDist, and Mask R-CNN on this data. What we found, compared to StarDist for instance, is that we're able to more easily find these non-convex shapes.
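To make the flow idea concrete, here is a toy sketch (not the actual Cellpose implementation; the function and names here are illustrative) of pixels being advected along a predicted flow field until they converge on cell centers:

```python
import numpy as np

def follow_flows(dy, dx, n_iter=200, step=1.0):
    """Toy sketch: advect every pixel along a flow field (dy, dx).

    dy, dx: float arrays of shape (H, W) holding the predicted flows.
    Returns the final (y, x) position of each pixel; pixels belonging
    to the same cell should converge to roughly the same fixed point.
    """
    H, W = dy.shape
    ys, xs = np.meshgrid(np.arange(H, dtype=float),
                         np.arange(W, dtype=float), indexing="ij")
    for _ in range(n_iter):
        # look up the flow at each particle's current (rounded) position
        yi = np.clip(np.round(ys).astype(int), 0, H - 1)
        xi = np.clip(np.round(xs).astype(int), 0, W - 1)
        ys = np.clip(ys + step * dy[yi, xi], 0, H - 1)
        xs = np.clip(xs + step * dx[yi, xi], 0, W - 1)
    return ys, xs
```

Clustering the converged positions (one basin of attraction per cell) then yields the masks.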
And then with Mask R-CNN you often see over-segmentation. I mean, it depends, but sometimes you'll see this over-segmentation with the model trained with Mask R-CNN; you can see errors here too. Sorry, I should have said: the ground truth is in yellow and the prediction is in red, so errors are basically wherever the yellow and the red don't line up.

You need some metrics to quantify how well you're actually doing. To do that, we use the accuracy, defined as the number of true positives divided by the number of true positives plus false positives plus false negatives. You have to choose how you define a true positive, and that depends on how stringent you are about the overlap between your ground-truth cell and your predicted cell. So we use various levels of IoU, the intersection over union: the area of overlap over the area of the union. For instance, you can use a threshold of 0.5, which means the predicted mask has to overlap by at least about half with the ground-truth mask; if the overlap is that large you call it a true positive, and masks without a sufficient match count as false positives and false negatives. We match each predicted cell with a ground-truth mask if it's above that IoU, and we compute these accuracy scores. That allows us to create a curve as a function of IoU threshold. Across these IoU thresholds, even when we only count true positives at really stringent amounts of overlap, Cellpose, in green, outperforms StarDist and Mask R-CNN on this specialized data.

All right, so at least on the specialized data, Cellpose outperforms these other state-of-the-art models. Next we asked: what about if we train the model on all of this data? So not just the Cell Image Library: we're going to train it on fluorescent cells, non-fluorescent cells, cell membrane images, other microscopy images that we weren't totally sure how to categorize, and then non-microscopy images, which were things like rocks and other repeating types of objects in the world.

Okay, I see a question: is there some limitation in the number of cells that can be predicted per image? The answer is no. But I will say that with this flow representation you do need a minimum number of pixels per cell in order to segment it; I'd imagine something like at least eight to ten pixels. So if you have a really densely packed sample, then in theory that limits the number of cells you can find. But there's no hard limit; it's more a limit on the size of the cells you can segment, because if they're too small the flows break down: you won't have enough pixels with flows pointing in each direction for the dynamics to converge. Good question.

Okay, so then we trained our model on all of these different images and looked at the performance. There are errors that Cellpose makes (for instance, it didn't find this cell), but it does a better job than StarDist and Mask R-CNN; you can see some of their errors here.
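For reference, here is a hedged sketch of that matching metric. Cellpose's released code pairs masks with an optimal assignment; this version matches greedily by IoU, which is enough to illustrate the definition:

```python
import numpy as np

def average_precision(iou_matrix, thresholds=(0.5, 0.75, 0.9)):
    """Sketch: accuracy TP / (TP + FP + FN) at several IoU thresholds.

    iou_matrix[i, j] = IoU between ground-truth mask i and predicted
    mask j (assumed non-empty). A one-to-one match above the threshold
    counts as a true positive; unmatched predictions are false
    positives, and unmatched ground-truth masks are false negatives.
    """
    n_true, n_pred = iou_matrix.shape
    scores = {}
    for t in thresholds:
        iou = iou_matrix.copy()
        tp = 0
        while iou.size and iou.max() >= t:
            i, j = np.unravel_index(np.argmax(iou), iou.shape)
            tp += 1
            iou[i, :] = 0  # each mask can be matched at most once
            iou[:, j] = 0
        fp, fn = n_pred - tp, n_true - tp
        scores[t] = tp / (tp + fp + fn)
    return scores
```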
For instance, StarDist did some over-merging: there should be three cells segmented here, and it found only one. Similarly, Mask R-CNN makes that error too. I'm just going to show test performance on these four relevant classes, but in the supplement of the paper you can see the test performance on all of the classes. What we found was that on the generalized data, on these four classes, Cellpose outperformed StarDist and Mask R-CNN. And this model trained on all of these images also didn't have reduced performance relative to the specialized model: we checked the performance of this model on the specialized data and saw no loss in performance. Here I'm overlaying the curves on the previous slide. So basically the network has the capacity to learn a variety of different image types.

There was a question: for the comparison to StarDist, was StarDist run with the standard model for fluorescent nuclei on the nuclei channel, or was it compared on segmentation of the entire cell, i.e. the cytoplasm? I'm not sure what the standard nucleus model is. I ran StarDist with two channels: I gave it the cytoplasm channel and the nucleus channel and had it predict the cytoplasm segmentation. We were always looking at segmentation in the cytoplasm channel for all of these models. Same thing with Mask R-CNN; in Mask R-CNN's case it actually takes RGB images, because it's been pre-trained on ImageNet, so I gave it two channels and set the third, blue channel to zeros. I hope that answers the question.

[Host: Sorry, go ahead. I was just saying you don't have to watch all the questions; we will also answer them in the chat and in the Q&A session, and if one is interesting we will ask you, or you can read it. Then you can focus on the talk. It's also fine if you want to answer all the questions, but maybe it takes too long.] Yeah, that's a good point. That was maybe a bit of a technical question, but it's useful as an FYI on how I trained the different models.

And this question iterates on the previous one: is the segmentation based on nuclei or on whole-cell fluorescence? It's based on the whole-cell fluorescence and the nuclei together, but our ground-truth segmentation is of the cytoplasm. For images that do have two channels, we give the model both the nucleus channel and the cytoplasm channel, because knowing where the nuclei are can help aid the segmentation. So when you run Cellpose yourself, you can give it two channels: one is your cytoplasm channel and one is your nucleus channel.

I think these are relevant questions, so I'll answer this one too: is it possible to perform the segmentation with more than one marker for the cell? What we recommend is that you run Cellpose for the cytoplasm on the two channels, and then, if you want to segment the nuclei afterwards, you run nuclear segmentation on the nucleus channel. So if you have different channels for different things, you can run Cellpose separately with those different channels. And note that Cellpose was only trained on 2D data; it was never trained on 3D data.
There's a question about how problematic it can be for Cellpose to segment cells in a sample with high background. A lot of our images are the case of light objects on a dark background, but there's an invert option in Cellpose, in the GUI and also on the command line; you can see all the options if you run Cellpose with the help flag. So if you have high background and dark cells, you can invert the image, and that will potentially make it run better for your case.

This is another good question: you mentioned that it works well on two channels, nucleus and cyto. What if you have two channels, nucleus and membrane? We did consider splitting those cases up, but we found it works fine to throw them in together. We have training data that looks like membrane staining and we just call that cytoplasm, and the model can learn how to segment cells that look like that versus cells that have more filled-in cytoplasm. So we didn't separate cytoplasm versus membrane into different classes, because the model did fine with them in the same class.

The summary of this figure (again, ground truth in yellow, prediction in red) is that Cellpose can work well across a wide variety of these different images, even fun things like cloves of garlic. These are all from our test images.

Someone is asking what kinds of color images it accepts. I don't know what H&E means, but it can accept, say, float32; we convert images to float32 after you give them to us. [Host: H&E is hematoxylin and eosin, so immunohistochemistry or just brightfield images. I think it would work if you first convert it to grayscale.] Ah, okay, that's a good point. You can run Cellpose with grayscale; a lot of these images are run grayscale, and the ones in color, you can see, are run with two channels: a cytoplasm channel shown in green and a nucleus channel shown in blue.

Okay, and then someone asked: can we do 3D or 4D images? The answer is yes. For 3D segmentation we've written an extension of Cellpose. Again, it's only trained on 2D images, but we use information in 3D to create a 3D segmentation. How we do this: we take our 3D stack and take XY slices, which is the conventional thing to do, but we also take XZ slices and YZ slices. From each of these slices we get flows: in XY we get flows in X and Y, in XZ we get flows in X and Z, and in YZ we get flows in Y and Z. So we can average the X flows from the XY and XZ fields, average the Y flows from XY and YZ, and average the Z flows from XZ and YZ. That gives us average flow fields that now take into account information in 3D, not just 2D, and then we run our dynamics in this 3D space.
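As a sketch of that averaging step (the array shapes and names here are illustrative, not Cellpose's internals):

```python
import numpy as np

def combine_3d_flows(flow_xy, flow_xz, flow_yz):
    """Average 2D flows from orthogonal slices into one 3D flow field.

    Assumed shapes, all defined over the same (Z, Y, X) volume:
      flow_xy: (2, Z, Y, X) holding (dY, dX) from the XY slices
      flow_xz: (2, Z, Y, X) holding (dZ, dX) from the XZ slices
      flow_yz: (2, Z, Y, X) holding (dZ, dY) from the YZ slices
    """
    dZ = (flow_xz[0] + flow_yz[0]) / 2  # Z appears in XZ and YZ
    dY = (flow_xy[0] + flow_yz[1]) / 2  # Y appears in XY and YZ
    dX = (flow_xy[1] + flow_xz[1]) / 2  # X appears in XY and XZ
    return np.stack([dZ, dY, dX])
```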
And that gives us an output that looks like this; here's just a GIF I made using ImageJ of what these cells look like. I don't really like looking at cells this way; I think it's easier in our GUI, or in other GUIs, where you load in the stack and scroll through it in Z. You can then clearly see that Cellpose is able to avoid merging cells that are close to each other. This is really important in a problem like in situ RNA sequencing, because you really want to identify dots in these images and assign them to a specific cell, and not have errors that might make you think you've found a new cell type or something; that would be bad, obviously.

Someone asked, continuing on this point (this is data that was next used for RNA sequencing), about detecting the dots themselves. We don't have an algorithm for detecting dots, and like I was saying before, Cellpose is good for bigger objects; dots are generally only a few pixels or less, depending on the resolution. So I would recommend using another algorithm for dot finding, and I think there are several of them available as open-source code on GitHub.

So we've trained Cellpose on 2D data and then done this extension to 3D, and this is our performance. Again, this is average precision, our accuracy, and we want it as high as possible. You can see Cellpose, in green, outperforms these other models. It specifically outperforms a 2D stitched version we made, where we run only XY slices, find masks, and then stitch them together: that does almost as well, but not as well as using the 3D context for the segmentation. A nice thing about the stitching, though, is that you can do it for any model, like Mask R-CNN and StarDist; we did that and showed that Cellpose indeed outperforms those models too. Again, these are all trained on the same data.

On the stitching: someone asked about 4D data, i.e. time. The stitching can also be used on time data, and there are also other tools for working with time data, like TrackMate and other algorithms people have developed. The stitching is good if the cells' positions change relatively little over time; if they're making really large jumps, you'd need a different method, because the masks need to have some overlap across frames in time for it to work.

We can also compare to models that run in 2D or 3D, like a U-Net where you segment the cell probability (we outperform that), and we also outperform ilastik trained on 3D data, which Tim Wang trained.

Are there any other questions about this? Another question about movies: yes, you can try using the stitch threshold, which is an option in our napari plugin and also from the command line or in a notebook; I'll show how to run that in a bit. Okay, I'll save the rest of the questions for later. One more question about the RNA data: yes, if you find objects, we save the masks, and you can then assign dots to those objects.

Okay, so this is what the GUI looks like, and I'll go through it afterwards so you can see. You can drag and drop images and use the left and right arrow keys to go through the directory.
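A minimal sketch of overlap-based stitching of this kind, assuming integer label images and a greedy IoU match (Cellpose's own stitching may differ in its details):

```python
import numpy as np

def stitch_planes(prev, curr, stitch_threshold=0.5):
    """Carry label IDs from one plane (or time frame) to the next.

    prev, curr: integer label images, 0 = background. A label in curr
    keeps the ID of the prev label it best overlaps if their IoU is
    above the threshold; otherwise it gets a fresh ID.
    """
    out = np.zeros_like(curr)
    next_id = prev.max() + 1
    for c in np.unique(curr)[1:]:  # skip background
        pixels = curr == c
        overlap = prev[pixels]
        ids, inter = np.unique(overlap[overlap > 0], return_counts=True)
        best_iou, best_id = 0.0, 0
        for p, n in zip(ids, inter):
            union = pixels.sum() + (prev == p).sum() - n
            if n / union > best_iou:
                best_iou, best_id = n / union, p
        if best_iou >= stitch_threshold:
            out[pixels] = best_id
        else:
            out[pixels] = next_id
            next_id += 1
    return out
```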
And yeah, okay, I see there are also chat questions, but please put them in the Q&A box, sorry. Okay, so the GUI allows you to do manual curation, and one thing you can do if you manually curate your cells is send them to our server. We have a model released last May called cyto2, which is trained on data that people submitted in addition to our images, so that's a way to add data to our dataset as well.

So, in summary, we found that Cellpose has good out-of-the-box performance for many types of images, and it's easy for us to integrate labeled images from users; you can send them to our server. We also have integrations with other platforms such as ImagePy, ImJoy, Napari, and CellProfiler. Other people wrote all of those except the Napari one, and I'm grateful to all of them for making Cellpose more accessible. And so, big thanks to my colleagues: this is joint work with Marius Pachitariu, and also Tim Wang and Michalis Michaelos. And thanks for my funding at Janelia.

I guess there are still questions rolling in in the Q&A, so I can start answering more of those before we go to the demo. [Host: Sure, I can ask some of them; I was trying to answer this one in the chat too. Does Cellpose run better with a nucleus and a cytoplasm channel together? And if you only have a nucleus channel, how might you improve performance?]

That's a good question. Using the nucleus channel in addition to the cytoplasm channel definitely helps for cytoplasm segmentation, especially if boundaries between cells are ambiguous: the nuclei will generally be more separated than the cytoplasm and have distinct features. You'll see two adjacent balls whose curvature the algorithm can use to distinguish the cells. If you're struggling with nuclear segmentation, in some cases it might help to have a model trained with the cytoplasm too, but our nucleus model has not been trained with cytoplasm data; all of the big nuclear datasets released at the time have nuclear segmentation without cytoplasm. To improve nuclear segmentation, maybe there's some pre-processing or filtering you can do beforehand if your image has a lot of inhomogeneity. You can also try the cytoplasm model on your nuclei and see if it works better; particularly if the nuclei have lots of intricate structure it might actually work better, because the nuclei in our training set are relatively homogeneous.

[Host: There's a related question from the same person: if you have artifacts like bright spots or antibody clumps in your images, does it affect the segmentation, and how would you improve that? I guess some pre-processing maybe helps.] Yeah, it really depends on the image; it's hard to know. If it's low-frequency changes in intensity, that's pretty easy to filter out; if there are high-frequency dots all over the place, that's a little trickier. But if it's differences in illumination across the image, you can normalize that, or high-pass filter your image, and that will help before you run Cellpose.
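A minimal sketch of that kind of high-pass filter, assuming a single-channel image; the sigma value is an assumption and should be much larger than a typical cell diameter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def highpass(img, sigma=50):
    """Remove slow illumination gradients before segmentation by
    subtracting a heavily smoothed copy of the image."""
    img = img.astype(np.float32)
    return img - gaussian_filter(img, sigma)
```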
Cool, lots of questions streaming in; I didn't really get the chance to look at them all. [Host: Would you want to do the demo first and then answer some more questions?] Maybe I'll finish the questions, then we take a short break, and then we do the demo. How does that sound? [Host: Also fine; maybe you need a break.] No, I'm fine doing the questions now, and then we can do the demo for people who want to stay and see it.

Yes, a TissueNet model is coming; we're planning on doing that, which answers one of these questions. Can it distinguish individual cells, or a determined number of cells, when cells are in a large clump? It really depends: if you can see the differences between the cells by eye, then it should be able to find those boundaries. But if it's just a clump and by eye you can't tell that there are different cells there, then it's unlikely the algorithm will be able to tell either.

Someone asked: what is the file format for the output of 3D segmentation? The default format is an npy file, but you can pass --save_tif on the command line, or in the GUI choose to save the masks as TIFF, and that gives you the masks in TIFF format. There was another part of that question: do you output the outlines as surface meshes of the 3D cells? No, the outlines are computed in 2D, so they're not output as a surface; they're across 2D slices, saved at the same size as the image itself, like the masks.

There's a question about Omnipose as well; I wasn't going to talk about that today. How well can Cellpose handle filamentous organisms? If you have things that are really long that aren't working well with Cellpose, you can definitely try Omnipose. We've recently separated the two repos, so you can ask specific questions of Kevin Cutler, who developed that algorithm. I'm going to make a new release later today that makes sure there's a clean separation, so put issues on his GitHub for Omnipose, and he's happy to help if you're having trouble. That's exactly the sort of thing Omnipose is good for: finding these really long objects.

You're asking if it's possible to interact with Cellpose through a script, in a language like Python, for large-scale automation. Yes: Cellpose is fully written in Python, and that's what we'll see next. There's also a question that says: looks great, how do I get the segmentation into other programs?
So you can pass a flag that saves the outlines, or choose save outlines from the GUI, and that saves them in a text file format that lets you load them into the ROI manager in ImageJ. And then I'll answer this question about low-SNR images: it really depends on how low the SNR is. Some of the segmentations that were submitted by users are lower SNR than our original training set, so cyto2 might work better for you, because it's been trained on more of that kind of data. You would have to train your own model if you wanted to do subcellular organelles, though; you'd need a training set for that, which we don't have.

[Host: Carsen, the questions keep streaming in, obviously because it's such a wonderful tool. We propose you do the demo first, and then if there's time you can answer more questions. Questions that aren't answered live will be answered later and assembled in an image.sc forum post, so everybody can read them.] Cool, that's awesome, thank you; I underestimated how many questions there would be. [Host: It's always like this: nobody asks questions in the beginning, and then suddenly they come.] Okay, so let's take a five-minute break, then do the demo, and then answer more questions.

[Host, during the break: In the meantime I can answer some of the questions I see. Somebody was asking about QuPath and Fiji integration. There are wrappers (sorry, I have to remove a cat here because it's stepping on my computer) that let you integrate Cellpose in Fiji and in QuPath. I've done it for Fiji and it works quite well; for QuPath I haven't tried it yet, but apparently it also works. I will answer that question and put the links in the chat. Someone is asking whether Cellpose was trained on tissue; I'm pretty sure it's not, specifically. They trained it on a lot of different cells and also images that are not cells at all, like shells and rocks, and this is why the pre-trained model works so well on very general data. It also works on tissue, but I don't think it was trained on that specifically.]

Hi. [Host: I tried to answer two of the questions in the meantime.] Thank you. So I'm going to get started with the demo. I'm not sure if I can zoom in here, but hopefully things will be bigger in the GUI. Once I've installed Cellpose (you can install it in its own environment, or just pip install cellpose), I can run python -m cellpose, and this opens the GUI. This is actually the current GitHub version of Cellpose, which you can pip install with pip install git+ and the repository URL, so it's the latest version; I'm going to make a release later today, sorry I didn't manage to beforehand, so it might look a little different from the released one. I'll try to compress things here.
Okay, so once I've opened the GUI, I can drag and drop an image. This image has actually already been segmented, so some masks came in from the previously saved _seg.npy file. You can also have it auto-load masks and other things like that. Now we can run the segmentation. I'm going to choose a few things: I'll run it on just one network so it's faster, and I'll choose the cyto model. I'll choose the channel to segment, which is gray, because (here, if I show just the outlines) this is actually a calcium imaging experiment, so these are neurons in the brain. It's just grayscale and there's no nucleus channel, so I'll segment just the gray, and I can click calibrate. It's kind of slow on the CPU.

Oh yes, how does calibrate work? Basically, if you don't know your cell size, you can run the network to try to figure it out, and the result is printed here: estimated cell diameter of 30.6 pixels. In practice, you usually know your cell size. We built this because when we train the model we rescale all the images to a diameter of 30, since that works the best; so as a user you need to tell Cellpose what your diameter is, either by putting it into this box or by trying our pre-trained size model, which might not work perfectly, in which case you just put it in yourself. On the command line it's --diameter.

Okay, so once we've run the size model, we have the size we're going to use, and now we can run the segmentation. We can see our segmentation now from the model; you can see what it says here: it found 213 cells. And then I can change this, which is our QC step, the model match threshold; in the code it's called the flow threshold, and we call it model match here just to make it easier to understand. It measures how similar the flows of the masks that were found are to the flows you'd simulate from those masks: the model predicted flows that look like this, we created these masks, and then we can go back and ask, supposing the mask looks like this, what would the flows look like, and are they similar to the actual flows we found? If they aren't, we throw out the cell, because we think it's maybe a bad cell. There are other things we can change too: this is the mask threshold, i.e. the cell probability threshold. If you're finding too few cells, decreasing these thresholds will increase the number of cells you find. So I move these bars all the way: you can see that my mask threshold, the cell probability threshold, is minus six, and the flow error threshold is effectively off (there's no flow error QC), and I found 225 cells, compared to 213 before changing those bars.
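For reference, the GUI settings just described map onto the Python API roughly like this (a sketch; `img` is a placeholder array, and argument names can vary slightly between Cellpose versions):

```python
import numpy as np
from cellpose import models

img = np.random.rand(256, 256).astype(np.float32)  # placeholder image
model = models.Cellpose(gpu=False, model_type='cyto')
masks, flows, styles, diams = model.eval(
    img,
    channels=[0, 0],         # grayscale, no nucleus channel
    diameter=30,             # or None to run the size calibration first
    flow_threshold=0.4,      # "model match" QC; raise it or set None to keep more masks
    cellprob_threshold=0.0,  # mask threshold; lower it (e.g. -6) to find more cells
)
```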
Someone's asking: is Cellpose Mac-friendly? Before the M1 processors it was; now, with the M1 processors, I think the problem is not Cellpose, it's PyQt. So the GUI might have issues, but you can still run things on the command line, or in a Google Colab notebook. Unfortunately, that's a pro and a con of open-source software: you're reliant on the underlying packages being up to date for everyone's processor. That happens a lot; we try not to pin versions because we want the latest version of everything, so sometimes there will be bugs as other people update software that we use in our own GUI. Generally speaking, if you have a Mac you can't use the GPU, so you probably want to run in a Google Colab notebook anyway, because then you actually have GPU access.

Another thing you can do (I'll come back to the image) is that if I right-click, I can start drawing an ROI. That ROI gets drawn, the GUI says it now has 226 ROIs, and that's saved to my _seg.npy file. So I can go in and manually curate what Cellpose found. Maybe I want to throw out this thing, I don't really like it, so I Ctrl-click and that deletes the mask, and now I can draw a new one if I want. What I like about our drawing is that you right-click to start, and then (it does depend on the brush size) you see this little red circle at the endpoint, and once you get back to it you don't have to click anything; it automatically finishes the ROI for you. So it makes it really easy to quickly draw ROIs.

[So how much time do we have exactly, Bram, an hour or an hour and a half? Host: Plenty of time, actually.] Okay, cool, so I'll show running it in the notebook too. Just one more thing first: you can see the cell probability here, and if you have a Z stack you can view it in Z. Let me load in a Z stack, a 3D TIFF. There you go, it's loaded, and it found nine masks, because I've actually already run Cellpose on this. These are tiny little TIFFs from our test data, which we use for our automated tests on GitHub. This is what the segmentation looks like in 3D, and we can scroll through with the left and right arrow keys to see how things look.

I can run the network in 3D as well. It looks like things are set up correctly: green is my cytoplasm channel, red is my nuclear channel. So I run this, and you can see it writing here; this is the default 3D mode that runs in the GUI without the stitch threshold, so it runs in YX, in YZ, and in XZ, and you'll see it doing each of these. It just ran in YX, now it's running in YZ, and now XZ; there you go. And now we have our 3D segmentation, with each of these masks, as you can see.

All right, so I'll also show how to run it in a notebook; it helps to know a little bit of Python to do that. If you go to our repo and click the Colab notebook link, it opens this notebook. I still need to make a new release later today, sorry I didn't get the chance before the talk, so for now I'm going to install a previous version of Cellpose.
This is the first thing you do in the Colab notebook. So now we're running in Google Colab; Colab is an online resource that Google provides for free, which comes with a GPU, so you can run Cellpose on the GPU even if you don't have your own. This is good if, say, you've done some things in the GUI and know what things look like: you can run on the GPU in Colab, save the segmentations to your Google Drive if you link it in Colab, and then load them back into the GUI if you want to go back for manual curation.

Okay, I'll say run anyway. One tricky thing, I believe because of our PyTorch dependency, is that we have to do a restart; it will prompt you, you click restart, and then everything will be installed correctly, because Colab comes with a bunch of packages pre-installed and we install our own packages on top.

[Host: Carsen, sorry, can you please magnify a bit so we see the text? This is for the final video.] Oh yes, sorry, I didn't realize; thank you, my bad.

All right. For people who aren't super familiar with using GPUs: nvidia-smi is your best friend. On Linux you're always checking how much GPU memory you have, and if I run nvidia-smi here, it tells me my CUDA version, what GPU I have (a Tesla K80), how much memory I have, and so on.

There's a bunch of libraries I've imported here to make this notebook run. NumPy is the standard thing you probably know about if you know Python. We're using scikit-image to load in images, because it's installed by default in Colab, but you can use another image loader as well. Matplotlib is how we're going to make the plots. And then from Cellpose we import core and check that we have a GPU available; Cellpose runs a check, and if a GPU is available it will use it.
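Put together, those setup cells look roughly like this (a sketch following the notebook flow described above):

```python
# In a Colab cell you can first run `!nvidia-smi` to see the CUDA
# version, the GPU model (a Tesla K80 here), and the memory usage.
import numpy as np                # standard numerics
import matplotlib.pyplot as plt   # for making the plots
from skimage import io            # image loading, preinstalled in Colab
from cellpose import core

use_gpu = core.use_gpu()          # True if Cellpose finds a usable GPU
print('>>> GPU activated?', use_gpu)
```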
There's also a urllib block here, just so we can download some example data from the Cellpose website; that's where all the training data lives, like I was saying, and also some of the test data we use. So we download the images into the notebook, and this is what they look like. Then we're going to run Cellpose on these images. Here, let me go all the way up: I got four images from the website, and the fourth one is actually a 3D image, so we're not going to run that yet; we'll just run the 2D images. This first line takes just the 2D images: I take all of the values except the last one in my list.

Then I initialize my Cellpose model. This is the main Cellpose class, called Cellpose, and it's for the built-in models; it combines the size model and the Cellpose model. The built-in models have size models, like I was saying, to predict the size of objects, but the core class that does the Cellpose predictions is the CellposeModel class, which this also initializes. So if you're training your own models and using them, you'll probably interact with the CellposeModel class directly.

This class has some inputs: whether or not you're going to run on the GPU, and the model type; we're going to use the cyto model. Those are the two main ones. There are other options, like specifying a particular GPU device, although most of the time you don't need to, and the default is to use PyTorch (the original Cellpose code was in MXNet). So we initialize the model; that's this line here.

Then one of the most important things is figuring out what channels to run on. This is a little confusing in Cellpose, and I apologize for that, but basically we have four options: grayscale, which is zero; red, which is one; green, which is two; and blue, which is three. And we have to choose two channels: the cytoplasm channel and the nucleus channel. For example, if you don't have a nucleus channel, you set the second channel to zero no matter what; if you just have a grayscale image, you give it zero, zero. If you have a green cytoplasm and a blue nucleus, you give it 2 for green and then 3 for blue, the nucleus channel; if you have green cytoplasm and a red nucleus, you give it 2 and 1. So if we go back up to the images I loaded: this one has green and blue, and then this one is grayscale, and this one's grayscale. So we're going to say [2, 3], [0, 0], and [0, 0], a list of channels the same length as our number of images. Often all your images will have the same types of channels, so you can just set a single channels value rather than a list, and it will be used for all the images, broadcast like that.
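In code, the channel assignment just described looks like this (a sketch for the three 2D images in this example):

```python
# 0 = grayscale, 1 = red, 2 = green, 3 = blue;
# one [cytoplasm, nucleus] pair per image.
channels = [[2, 3],   # image 1: green cytoplasm, blue nucleus
            [0, 0],   # image 2: grayscale, no nucleus channel
            [0, 0]]   # image 3: grayscale, no nucleus channel
# If every image has the same layout, a single pair works for all:
# channels = [2, 3]
```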
Okay, and now we're going to run our model. In Python, classes have these things called methods, which are functions of the class. So I've made this model, and I call model.eval, which runs the eval method of the Cellpose class I've made. It's going to run the cyto model we set, with various inputs. If we mouse over this, there are lots of parameters, and they're all specified in the docstring. The important ones: diameter, which is where you should specify your cells' diameter; if you don't know it and want Cellpose to estimate it for you, set it to None. The flow threshold: if it's set to None, you skip that QC step I described, throwing out cells with mismatched flows; the default is 0.4, which you can see in the docstring along with the other default values. And then the really important argument is channels; you want to set the channels here. Now we run this.

There are four outputs that come out of Cellpose's eval. Only three come out of CellposeModel's eval, because that class doesn't include the size model; the fourth output is the estimated diameters. So we have: the masks, a list of masks estimated for each of your images; the flows, which include an RGB picture of your flows and a few other things; the styles, which are a compressed representation from the bottleneck at the center of the network (we actually use them to estimate the size of the objects in the image, but you don't need to worry about that); and the diameters. A fun thing here, because Colab is pretty cool, is that I can mouse over this and see what it estimated: cell diameters of 29, 33, and 30. And in my masks I can see this is a list, and the masks are different sizes because the images were different sizes.

Okay, let's see what happened. The model downloaded and ran on each of the images: first it estimated the cell diameters, and then it did the mask-finding step. Now we display the results with this built-in function, and you can see the predicted masks and the outlines for each of these images. Then you can save these outputs to a _seg.npy file. If you connect your Google Drive, or even just from here (hold on, where is it?), I can see my seg files and download them directly. If you don't want to connect your Google Drive, you can also just upload the image you want to run; it won't be saved, since this is a temporary session and Google will log you out occasionally based on idleness and runtime constraints, but you can upload things and then download the results after segmenting them, which is nice. So I definitely recommend this if you don't have a local GPU, or if you don't need to do a lot of manual curation.

Okay, and the next thing is running Cellpose in 3D. There are two ways, as I described before. We can find the flows in XY, YZ, and XZ and combine them all together: that's the default, and the best-performing version in the paper, which is good if your samples are not very anisotropic, i.e. the XY image looks the same as YZ; and that is in fact what this data looks like. So we take our 3D image, the last image in that list, and run the 3D model. I've specified the channels as [2, 1] because it's green cytoplasm and red nucleus, and then the diameter, and do_3D is True. And this has now run the whole thing; you might have noticed it's much faster than running on my computer, because this has a GPU. The other way to run it is with the stitch threshold: now I specify do_3D as False, because instead of the 3D flow version I'm doing the stitching, and I turn on the stitch threshold.
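Roughly, those two calls look like this, reusing the model object initialized above (a hedged sketch; the stack variable name and the diameter value are assumptions):

```python
# img3d: a (Z, Y, X, channels) stack with green cytoplasm, red nucleus.
# Mode 1: run on XY, XZ, and YZ slices and average the flows in 3D.
masks3d, flows3d, styles3d, diams3d = model.eval(
    img3d, channels=[2, 1], diameter=25, do_3D=True)

# Mode 2: segment each XY plane in 2D, then stitch masks across planes.
masks_st, flows_st, styles_st, diams_st = model.eval(
    img3d, channels=[2, 1], diameter=25,
    do_3D=False, stitch_threshold=0.5)
```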
And so now what's going to happen, as you can see in the output, is that it's read in a TIFF with 75 planes and two channels, and it's going to run through these 75 planes. Another thing: if your channels axis and Z axis are in an unusual order, there are channel axis and Z axis options in eval (the tooltip isn't coming up here) that you can specify so Cellpose understands which dimensions are your planes and your channels. This is almost done; we can see our results and compare them: this is the 3D flows version, and here are the results from the stitching. If your cells are pretty far apart, you don't really need the 3D context, so it might make sense to use the stitching. But if things are tricky in 2D, then it's definitely helpful to use 3D information to do the segmentation. And that's it, so I guess we can get back to the questions.

[Host: Great, thank you very much for this wonderful demo. There are a few questions about file sizes: can you run large data, and do you need a lot of GPU memory for that?]

That's a really good question. If you have a really big 3D volume, it actually ends up being CPU-memory limited in some cases; the flows are that big. There's an example in the distributed folder in Cellpose which allows you to run your images in batches, in patches across the image, and it does a stitching afterwards; it uses a library called Dask. In that way you can run really large things. That's a nice extension that Chris contributed to Cellpose that people can use. And if it's unclear how to use it, you can tag him in the GitHub issues; he's pretty active on our GitHub, which is helpful.

[Host: Somebody was also asking about the cell diameter: if you have cells of multiple diameters, which one do you choose? And related to that, how does the calibrate button work?]

That's a really good question. For the calibrate button, we run the network once and get the styles out, and we have a regression model, trained on our training data, that takes those styles and predicts the diameter. Then we do a second step, which is why this is quite slow (so if you know the diameter, maybe you shouldn't run it): it takes the diameter estimated from the styles, runs the network again with that estimate, gets the masks out, and uses the diameters of those masks as the final estimate. It does this double loop, and you can see in the paper supplement that the diameter prediction from this approach is quite good, at least on our data, where the test images are quite similar to the training data; it might not work in all cases, for sure. So if you know the diameter, you should specify it. And if you have images with lots of different diameters in them, Cellpose might not do as well; it might be that you need to train it a bit, but it really depends.
Yeah, unfortunately in our dataset there aren't so many images with a diversity of cell sizes within one image. There are some, but maybe not enough to overcome the bias that the model looks for objects of similar sizes.

[Host: Browsing through the questions, many are also asking: can you run it with an AMD GPU, so not from NVIDIA?] That is a question for PyTorch, not for me; if PyTorch enables it, then we'll certainly try to support it. Unfortunately, if you have a choice: Python, and the MKL libraries that a lot of these packages rely on, are still faster on Intel CPUs. There was even a backend hack you could use on AMD processors, which was discontinued in recent versions. So generally, all the code I write is faster on Intel processors than on AMD, and all the PyTorch code on the CPU is faster on Intel CPUs than on AMD CPUs, unfortunately. If you have a GPU, it doesn't matter so much; you can use the GPU version and things will still be fast. But it's something to keep in mind if you're thinking about getting a computer: Python is still pretty tied to Intel at the moment.

[Host: Okay, thanks. Another question: can you refine or retrain the model locally on your machine using corrected masks?] Yes. From the GUI you can save the masks as TIFF, and then if you have a folder with your images and those masks (which are named as your image plus _masks appended), the model can train on that: you give it that directory with the train flag. [Host: One thing you didn't show: you then need to point it at your training data, right?] Exactly. If you've done the annotation, you have a folder with your images and your _masks files; you set that as your directory and run python -m cellpose --train --dir with your directory, and it will run the training on that.

Another thing you can do (hold on, let me share this; it's going to be small on my screen, but someone nicely organized these for us in a pull request, which is pretty cool) is run python -m cellpose --help, and you'll see tons of arguments and all the information on how they work. So there are lots of options when you're running from the command line.

And someone was asking about user-friendliness: we have made EXE versions of Cellpose in the past, but we haven't updated them recently. So it is a bit unfortunate, but on Windows you do need to open this Anaconda prompt, activate an environment if you have one (or maybe you don't, if you set it up for someone who isn't so Python-friendly), and from there run python -m cellpose. That's currently the recommended way to do it. I know it's not super user-friendly, but it's hopefully a little friendlier than the bare command line.
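For reference, the command-line calls just described look roughly like this, run in a terminal or Anaconda prompt (the folder path and the channel and diameter values are placeholders; the authoritative list of flags comes from the help command itself):

```
python -m cellpose --train --dir /path/to/images_and_masks \
       --pretrained_model cyto --chan 2 --chan2 3 --diameter 30 --use_gpu
python -m cellpose --help
```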
Someone is also asking: does the 3D segmentation work well with only a cell membrane channel and no cytoplasmic stain? [Host: It probably depends on how it looks, right?] Yeah, I don't see any reason why it wouldn't work, because you should still see the cell membrane on many of your slices in XY and YZ. If it's not working so well, you can try the stitch threshold method instead of do_3D with the flows in all directions.

[Host: People are also asking: my microscope makes 50-gigabyte datasets; can Cellpose handle them?] Well, if you run things on the command line like I was showing, pointing python -m cellpose at a directory, it will run each of those images one at a time, so if you have 50 gigs' worth of images it only processes one at a time and shouldn't run out of memory. In this case, apparently, it's a single 3D volume of 50 gigabytes including channels; that will just take a very, very long time, I think. If the volumes are separate, Cellpose will run on each volume in a folder separately, and then you can do the stitching between those volumes afterwards. Again, Chris is the person to ask about things that big, because I fortunately don't have volumes that large at the moment.

[Host: There were two questions I already mentioned: what are the options for integration with Fiji and also with QuPath?] I don't know very much about QuPath; I've never used it. With Fiji, again, you can run Cellpose on the command line like I was showing, and one of the options saves your outlines in a text format for ImageJ (--save_txt). So I can run python -m cellpose --dir with my directory, pass the save-outlines option along with my channel and diameter options, and my folder will fill up with .txt files that you can then load into Fiji. Unfortunately I don't know Java at all, so there's no current plan to integrate with ImageJ more fluidly, but there are wrappers for Fiji and QuPath which work relatively well. [Host: Okay, cool, I'll paste in the links as well.] That would be great; I think that's useful.

People are asking whether you can calculate parameters like volume, area, et cetera. Yes: if you load the npy files, or load the TIFFs into Python, you can compute things like that. If there were a pull request and someone wanted to contribute it, we could also add an option to save some of those stats in the folder you're working in, but right now that's the sort of post-processing we expect the user to do, depending on their use case.
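As a sketch of that kind of post-processing (the filename here is a placeholder; a _seg.npy file stores a dictionary whose 'masks' entry is the label image):

```python
import numpy as np
from skimage.measure import regionprops_table

seg = np.load('my_image_seg.npy', allow_pickle=True).item()
props = regionprops_table(seg['masks'],
                          properties=('label', 'area', 'centroid'))
# For a 3D label volume, 'area' counts voxels, i.e. it is a volume.
```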
Yeah, so about 3D, somebody is asking: does the sampling need to be isotropic? It doesn't really, right? Yeah, we have an anisotropy parameter, the --anisotropy flag, that you can set if you have a different sampling rate in Z. It will then run the flows with the volume scaled so that the cells are the same size in all planes, in XY, XZ, and YZ. So that will help, but if it's super anisotropic, if it doesn't even look like cells when you look at the YZ slices, then it probably won't be great, and you probably just want to use the stitch threshold.

Okay, so you've also touched on this a little bit: what is the best approach for 2D-plus-time images? Assuming cells don't move too far between consecutive frames, can one use the stitching in 3D? Yes, definitely. You should be able to run it in the GUI, but you can also run Cellpose in the command line, give it a directory of images, set the stitch threshold, and it will run on that directory of images and stitch them together (see the sketch below). Keep in mind that to do the stitching it loads the whole thing into memory, so it can't be super huge, or you need a decent amount of RAM, because that part isn't parallelized the way it maybe should be. Yeah, I think this can work well; probably an alternative would be to run TrackMate. The newest version of TrackMate can just load the label maps that come out of Cellpose, and then you get proper tracking, actually. Oh, that's news to me. That's a good point, yeah. This will give you something decent, but it's not guaranteed to be as good as what you'll get out of TrackMate.

Somebody is asking what information the segmentation files include, and whether there is a pipeline to structure the analysis of these files. I mean, it depends on what you're doing post hoc, I guess. Yeah, the information about what's saved in the _seg.npy files is on the ReadTheDocs; maybe I should have shown that, it's linked on the... We'll put it in the chat. Yeah, someone will put it in the chat; here, I'll also put it there. Then you can see exactly what Cellpose is outputting and decide what you want to do with those things.

We officially have two minutes still, so let's answer one or two more questions, and the rest can be answered offline and included in a nice image.sc post. Do you see a nice one, Phil? I see one about bright-field images: when segmenting cytoplasm, if the cell edges aren't very clear, is there a way to emphasize boundaries? Again, I would say this high-pass filtering thing: you could run it on one of those images, and that will emphasize the boundaries better; it's something you can try. But it might be the case that Cellpose still needs to be trained on some of your images if they're not well represented in our training set. That's a current to-do for the near future: releasing a model that's been trained on more of those images as well.
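As one way to try that high-pass filtering in Python (just a sketch, not necessarily the exact filter used inside Cellpose; the smoothing scale sigma is a placeholder you would tune to roughly your cell diameter or larger):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def highpass(img, sigma=50.0):
        # subtract a heavily blurred copy of the image to remove
        # low-frequency background and emphasize cell boundaries
        img = img.astype(np.float32)
        return img - gaussian_filter(img, sigma)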
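And the 2D-plus-time stitching from a few questions back would look something like this (path and threshold are placeholders; depending on the version, the frames may need to be saved together as multi-plane TIFF stacks rather than as separate files):

    # segment each frame in 2D, then stitch masks between consecutive
    # frames whose overlap exceeds the threshold
    python -m cellpose --dir /path/to/timelapse --diameter 30 --stitch_threshold 0.5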
Cool. And among the latest questions, two people are asking about CellProfiler. CellProfiler is of course written in Python, so it should work together well. Yeah, I did not write that plug-in, so that's a question for the CellProfiler people, but it was very cool that they wrote a plug-in for Cellpose. So I would ask them; sorry, I actually have not run it myself. But the answer is yes, you can use it, right? There's a CellProfiler plug-in that you can just download. Yes, it should work, but I haven't used it myself. All right, thanks for linking it.

Yeah, I think in view of the time, we will stop, unless there are any burning questions. Maybe the last question that streamed in: how are Cellpose and Omnipose related? So, still a little bit about the future: Omnipose. Oh yeah, Omnipose has a different license than Cellpose, so we actually separated them into a separate repo, which makes a little more sense. You can pip install cellpose and then pip install omnipose, or just pip install omnipose, and you'll get both of them. Because of the license, any issues you have with Omnipose you can now ask about on their GitHub directly, and Kevin will probably be faster to answer than I am.

Okay, thank you very much, Carsen, for this excellent webinar. I think it was great. Thank you so much for hosting me and organizing. Yeah, you're welcome. So again, this webinar will be posted on YouTube, and there will be a link on the image.sc forum to a big post answering all the questions. You will have to work a little bit on that, Carsen, and I can assist you with it if you want. Thanks. Or we can assist you. No, no, this is great; that's a great idea. And I think we will say goodbye to you all. Thank you very much for attending. And if you could stick around a little bit. Sure. We will just ask everybody to leave.