Okay, great. So, hello. Thanks for joining us at this webinar today. Hopefully you registered for it, so you know what to expect: this is going to be about analysis for single molecule localization microscopy. My name's Siân, and I'm going to take you through the first half, which is the basic process of how we go from raw microscopy data to localized molecule positions. It's going to be a bit of a whistle-stop tour, but these are all the things I'm hoping to cover during my part of this webinar. We're going to recap the basic principles of analysis in single molecule localization microscopy, SMLM. We're going to talk about how we might need to change the analysis depending on what our raw data set looks like. We're going to talk about how you can tell if the analysis is going well or going wrong, if there are artifacts, if there are errors. And we're going to talk a bit about how you can do localization in Z, so how you can get super resolution information in the axial direction. Some bits I might go through a bit fast just so we can stick to time, but please do, as Bram said at the beginning, use the Q&A window. We'll all be manning that and we'll try to get through as many of your questions as we can. Okay, so just a very brief recap of what single molecule localization microscopy, or SMLM, is. I'm using this term to cover all techniques like PALM, STORM and DNA-PAINT, so all the analysis we talk about today will be applicable to the whole family of those methods. Alright, so hopefully as you know, if you take your fluorescently labeled sample to a microscope, just a confocal or widefield microscope, you can get a diffraction limited image, which might look something like this. Now this is a simulation; it's actually the NEUBIAS logo simulated to see what it would look like under a fluorescence microscope. But it has the same basic properties: you have a resolution that's limited by diffraction to about 200 to 300 nanometers, so you have one image that's very, very complicated. It's got a lot of information in it. Okay, let's say we want to look at structures on a smaller length scale, in more detail than 200 to 300 nanometers. That's where single molecule localization microscopy comes in. We label our sample with a photoswitchable or photoactivatable fluorophore. By doing that, instead of acquiring one single image that's got all of the fluorescence information contained within it, we instead acquire a very long time series of lots of frames, where each frame has very simple information. So we're spreading out our fluorophore emission over time to create a large series of very simple images. If you have a look at what individual frames might look like, here's a couple of examples. You should be able to see in these individual frames, with your naked eye, single molecule fluorescence events. And because these simple images are much easier to analyze, we can very accurately locate the centers of these molecules with sub-pixel accuracy. The way we do this is with algorithms called detection and localization algorithms. So you can see we can convert each of our image frames into a very high detail map of where the molecules are. And then, for every single frame in our data set, we localize every molecule within that frame very accurately.
And what we can do is accumulate those localizations to make our super resolution image at the end; we can render our image. That typically has a resolution about 10 times higher than the original diffraction limit, typically tens of nanometers. Today I'm not going to be talking about anything on this top row, so not about how to label a sample or how to acquire data. We're just going to be talking about the analysis, starting from the point where you have your raw blinking data set and you want to get the localizations out of it. For those of you watching along, if you don't have any data of your own, don't worry, you can still do all the things we're doing in this webinar. All of the data sets I'm using are freely available, and you can download them from links we've provided, so you can try this out yourself on data that's already available. Okay, so we're going to be focusing on this process down at the bottom here: detection and localization. What do I mean by detection and localization? Let's imagine we've got a frame of data and we want to find, precisely, the locations of each individual fluorescent molecule. (This isn't actual experimental data, by the way, this is a simulation.) The first thing you want to do is get a rough estimate of what parts of the image look like a fluorescent molecule, and that's called detection. This is very much a rough guess of which bits look like a molecule, so the detection part of the algorithm will highlight areas that roughly look like a fluorescent blob, basically. Once we've delineated these regions, once we've identified candidate fluorophore regions, the algorithm then moves on to the second part, which is localization: the very, very accurate localization of the center of each of those molecules. If we zoom in on one of these detected regions, it might look something like this; this is just one of those individual fluorescent molecules. The most common method for finding the center of this molecule very accurately is a two-dimensional Gaussian fit. Now, it's difficult to show a two-dimensional fit on a slide, so just to illustrate this I'm going to plot a one-dimensional fit along each of the X and Y axes of this molecule. You can see that, even though we haven't got many pixels, we can fit this very nicely with Gaussian functions. And we can use the parameters of these fits to find the real peak of the Gaussian, its real center, which corresponds to the central location of that molecule. So for every frame, what you get out is a list of molecular coordinates. And this is a really important point that I want to make as clear as possible: in single molecule localization microscopy, you go from a stack of raw image frames to a list of molecule coordinates. You're not going from one image to another image; you're extracting a list of coordinates of every molecule. This is a really rich source of information, which of course you can render as an image as well. So there are loads of different software packages that do this detection and localization type analysis for you, and I'm going to use one particular one as an example for the webinar today.
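To make the localization step just described concrete, here is a minimal sketch of a two-dimensional Gaussian fit on a small pixel patch. This is not ThunderSTORM's actual code, just the principle, illustrated with Python and SciPy on a simulated spot; all names and numbers are made up for the example.

```python
# Minimal sketch of Gaussian localization: fit a 2D Gaussian to a small
# pixel patch to find the molecule center with sub-pixel precision.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(coords, x0, y0, sigma, amplitude, offset):
    """Symmetric 2D Gaussian evaluated on flattened pixel coordinates."""
    x, y = coords
    return (offset + amplitude *
            np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2))).ravel()

# simulate a 7x7 pixel patch containing one molecule (true center 3.3, 2.8)
yy, xx = np.mgrid[0:7, 0:7]
patch = gaussian_2d((xx, yy), 3.3, 2.8, 1.2, 500, 100).reshape(7, 7)
patch = np.random.poisson(patch)  # shot noise

# fit, initializing at the brightest pixel (that pixel is the "detection")
y_max, x_max = np.unravel_index(np.argmax(patch), patch.shape)
p0 = [x_max, y_max, 1.3, patch.max() - patch.min(), patch.min()]
params, _ = curve_fit(gaussian_2d, (xx, yy), patch.ravel(), p0=p0)
print(f"fitted center: x = {params[0]:.2f}, y = {params[1]:.2f} pixels")
```

The fitted x0 and y0 land between pixel centers, which is exactly where the sub-pixel accuracy comes from.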
There are many, many packages available, but the one I'm going to use today is called ThunderSTORM. Don't let the name put you off: it's not just for STORM data, it can handle any single molecule localization data. Here's the reference for the paper accompanying the software release. The paper itself hasn't got a ton of detail, but what's really brilliant is the supplementary information. If you want to find out the maths behind anything the algorithm is doing, then dive into that; it's really, really good. It's also got a lovely user manual. And, full disclosure, I have nothing to do with ThunderSTORM; I just really like it. So, some of the reasons I really recommend ThunderSTORM, especially if you're new to single molecule localization microscopy analysis. It's all graphical user interface based, which means you don't have to worry about importing your data sets in any complicated way, beyond how you would normally import data into Fiji, and you don't need to know how to script or code in order to use it and do the analysis. The algorithm has the functionality to do both two-dimensional and three-dimensional localization, and it has options for high density localization. Don't worry about what that means, I'll come on to it in the next slide; suffice to say it can handle all different types of data, basically. The output, which is this list of coordinates, is really nice to manipulate in a GUI and really easy to export. It generates protocol files that keep a record of what parameters you used for a given analysis, which is really important for reproducibility, and also when you're presenting your research and writing up your methods. Now, as I said at the top of the slide, this is one of a great many different software packages out there. Daniel, one of our moderators (he's waving back), and his team have published a couple of super resolution localization microscopy challenges, which basically assess the performance of all these different algorithms on standardized data sets. The most recent version of this challenge, "Super-resolution fight club" (don't talk about it), is available at this reference here, and it's also linked in the resources. The whole website for that paper and the associated challenge is really cool and really interesting for open source data, and also for seeing what algorithms are available other than ThunderSTORM. But as I said, today we're going to use ThunderSTORM, and if you're interested in how to get it, this is the easiest way: you go to the Help menu, Update, Manage update sites, and ThunderSTORM is available from the Hohlbein lab update site. So if you tick the box next to Hohlbein lab (apologies if I butchered the pronunciation there), apply the update site and restart Fiji, it should be there when you open it. Okay, so that's the software I'm going to be mainly working with today. Before I start analyzing some data, I'm just going to tell you about something called data density. This is something I alluded to briefly on the previous slide, but it's something we need to understand in order to know what analysis to do. And again, these data sets I'm showing you are available from the single molecule localization microscopy challenge website I spoke about on the last slide, all downloadable with all the metadata, etc.
So, the first type of data set we're going to look at is a low density data set, also known as sparse data. Low density data sets look like this: you can very clearly see individual fluorescent molecules turning on and off, you don't see many overlapping molecules, and there's a lot of space between your molecules. So that's one type of data we'll look at. The other type is high density data. As you can see, in this type of data there are still blinking events, you can still see that molecules are clearly turning on and off, but there are many more fluorescent molecules present per frame, and you can see that they're definitely beginning to overlap a lot more. We don't have time today to talk about how you get these different types of data, why they might occur, or when each is beneficial. Suffice to say: low density data looks like this, high density data looks like this, and there are different ways we need to approach the analysis. Okay, so now is when it's going to get fun, because I'm going to go rogue, leave PowerPoint and enter a demonstration. So this is where anything can happen. Okay. And my face is currently covering my data. So I'm going to start off doing some low density data analysis. Here we go. This is that low density data set; I'm going to drag and drop the TIFF file into Fiji, and if I scroll through, you can see it's indeed got this low density blinking behavior. I'm going to talk you through how I would analyze this data set using ThunderSTORM, and this is exactly how I would analyze any of my own data; there are no tips or tricks I'm going to miss out or skim over, I'm going to try and be as complete as possible. So I go to my Fiji Plugins menu, where I've already installed ThunderSTORM, and I go to Run analysis. Before I do anything else, just to make sure I'm starting fresh, I click on Defaults, which reverts everything to the default settings. So this is the main analysis graphical user interface for ThunderSTORM, and you can see it's broken up into several panels, which I'll quickly go through one by one. Even though I'm using ThunderSTORM as my example software, any software you use should have very similar features, so understanding what each of these things is doing should be directly translatable to a lot of other algorithms if you want to use those. The first thing I want to do is tell the algorithm about the properties of my camera. ThunderSTORM in particular works in real physical units, so nanometers and photons, rather than pixels and arbitrary intensity units. Hopefully, if you've acquired data, you know what your camera pixel size was; it's 100 nanometers for this data set. And if you're using an EMCCD camera, you should also know whether you had EM gain on and what gain you applied. The middle parameters are a bit more annoying: these are related to the specific camera you used to acquire the data. These are properties you should be able to find for your microscope, for the camera you used, so you can try asking your facility manager or whoever is in charge of your microscope, or looking at the data sheets yourself. They do often go by different names and can be a little tricky to find, which is quite annoying.
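For reference, here is roughly what those camera parameters get used for: converting raw camera counts into photons. This is a minimal sketch of the standard conversion for an EMCCD-style camera; the function name, defaults and example numbers are mine, not ThunderSTORM's.

```python
# Sketch of the camera count -> photon conversion that the camera setup
# panel enables (variable names and example values are made up).
def counts_to_photons(adu, offset=100.0, electrons_per_adu=3.6, em_gain=300.0):
    """Convert raw camera counts (ADU) to detected photons.

    offset            : base level the camera adds to every pixel (ADU)
    electrons_per_adu : electrons represented by one A/D count
    em_gain           : EM gain setting (use 1.0 for a non-EM camera)
    """
    return (adu - offset) * electrons_per_adu / em_gain

print(counts_to_photons(1500.0))  # ~16.8 photons with these example settings
```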
If there are any microscope companies or camera manufacturers watching who specifically advertise your products for single molecule localization microscopy: I'd really plead with you to make this kind of information front and center on your data sheets, in the form used by common analysis algorithms, to help out your users a bit more. If you don't know these parameters, it's not the end of the world; it will slightly limit how we do the analysis, but I'll come on to that in a moment, so I'm just going to leave this as it is. Okay, the next step is image filtering. This is basically smoothing our image and enhancing the bright parts to make detection easier, to make it easier to find the molecules. There are lots of options, and you can explore them using these nice friendly blue question mark buttons, which are really helpful. I'm going to leave this as default. Not because I'm being lazy, I promise, but because I've never had a reason to change it; this filter has worked with pretty much every data set I've ever analyzed with ThunderSTORM. The next panel is approximate localization of molecules, which is the detection phase. Again, I'm leaving that at the default for exactly the same reason: not because I'm being lazy, but because it's very robust and it does tend to work. If you're slightly skeptical of my default-loving habits, you can actually check how these two panels are performing by pressing the Preview button at the bottom. That will give you a preview of the currently selected frame after it's been filtered, in this image here, and it will also give you a preview of where it thinks the molecules are, the detections: these little red crosses appearing on the frame here. It's doing a pretty nice job, so I'm confident in my faith in default parameters in this case. I'm just going to close all of these windows that have popped up. Okay, so the next panel is localization: finding the center of the molecule from each of those detections. Again, there are different methods; I'm going to stick to the default, which is, like we were talking about, fitting a two-dimensional Gaussian function. The only thing I am going to change in the whole of the defaults is the fitting method. The default fitting method is something called maximum likelihood estimation, and this needs to know what the camera parameters are because it is based on photon statistics: it models the photon statistics in order to fit the point spread function successfully. If you don't know your camera parameters, this won't work properly, in which case you should use weighted least squares, and that's what I'm going to use here. Weighted least squares is very slightly less accurate than maximum likelihood estimation, but in my opinion the penalty in accuracy is very, very marginal; you're not going to lose tens of nanometers, or even single nanometers really, of accuracy from doing this. Weighted least squares is also quicker. If you do know your camera parameters and you're certain they're right, please do use maximum likelihood estimation, but weighted least squares is a bit more robust if you're not sure of your parameters. Okay, the fitting radius and the initial sigma: the fitting radius just tells the fit how far out to go from each detection center, so a little patch that'll be seven by seven pixels, and the initial sigma is just to initialize the fitter.
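As an aside, the filtering and detection panels we just previewed boil down to something like the sketch below: band-pass the frame so spot-sized features stand out, then keep local maxima above a threshold. ThunderSTORM's default filter is a B-spline wavelet; the difference of Gaussians used here is purely a simple stand-in for the same idea, and all names are made up.

```python
# Simplified sketch of the filtering + detection phases.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def detect(frame, sigma_spot=1.0, sigma_bg=2.5, n_std=2.0):
    frame = np.asarray(frame, dtype=float)
    # band-pass: keep spot-sized features, suppress noise and background
    filtered = gaussian_filter(frame, sigma_spot) - gaussian_filter(frame, sigma_bg)
    # a pixel is a candidate if it is a local maximum and above threshold
    is_peak = filtered == maximum_filter(filtered, size=3)
    ys, xs = np.nonzero(is_peak & (filtered > n_std * filtered.std()))
    return list(zip(xs, ys))  # rough molecule positions, in whole pixels

# each candidate then gets a small patch (e.g. 7x7 px) cut out around it
# and passed to the Gaussian fitter for sub-pixel localization
```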
And again, for data like this you shouldn't really need to change the fitting radius or initial sigma from the defaults. This box, multi-emitter fitting analysis, I'm going to leave until later as a little bit of a mystery. And that's all the localization settings, so we're basically set up to run the analysis. It's going to plot the localizations, as it finds them, into an image for us, and it's going to put them on a five times upscaled grid, because remember we're gaining resolution, so we're going to need to plot our localizations with a smaller pixel size. Our original pixel size is 100 nanometers, and we're going to be plotting our found localizations on a five times upsampled image with 20 nanometer pixels. And it's going to update our preview every 50 frames. So fingers crossed, let's go. Okay, so you can see it's going through these 10,000 frames that I've got here and plotting out the localizations as it finds them. You can see we've got the larger image from the five times magnification, 600 by 600 pixels, whereas before we had 120 by 120 pixels. If you start off your ThunderSTORM analysis and you think, oh no, this is the wrong data set, why am I analyzing it, you can press Escape, which will quit the analysis partway through and just give you the results up to the point at which you stopped. So this is the awkward moment where I just wait for the last couple of hundred frames to finish off. I'm not sure what to say in this period of time, other than that this image is going to look really great by the time we're finished with it. You're in for a treat, I promise. This isn't awkward, not at all. We're nearly there, I promise. I'm not going to stop it prematurely because that would clearly do terrible things downstream, and then we'd just get more of my terrible improvisation, which is something I don't think any of us are physically or mentally prepared for in the middle of a virus pandemic. Okay, it's nearly done. As this image is forming, you might also notice that some things about it are a bit suspicious. Don't worry, they're suspicious on purpose. And if anything, it feels like it's going slower; it feels like it's trolling me. In every rehearsal I've done of this, it's zoomed through at a rate of knots. We've got a hundred frames left. This is excruciating... and we're done. Okay, thank goodness that's over, we can all relax. So you can see we've popped up with our particles table, which is basically a list of all the molecules it's found within our data set. What I'm going to do now is nip back to the PowerPoint presentation and show you what each of these columns means, because this is quite important for understanding our data. Okay, so here's our particles table, and here are the really important columns. We've got our X and Y coordinates for every single molecule in the image; each row is a different individual molecule that's been localized, and this is the center of that molecule, the peak of that Gaussian fit. The next column is sigma, which is proportional to the width of the Gaussian that was fitted to that molecule, so it kind of encapsulates the area covered by your molecule. Then we have intensity, offset and background standard deviation, which are basically the intensity within that detection region and the amplitude offset of the fit.
Chi-squared is just a goodness-of-fit measure that, to be honest, you don't need to worry too much about, but the last column is really important. The last column is the uncertainty in XY, and this is basically the error bar on your localization: how confident you can be that the molecule is positioned where the table says it is. A small uncertainty means you've got high confidence; a large uncertainty means you're less sure where the center is. The uncertainty is calculated by this slightly clunky equation at the bottom (written out below). The important insight is that lots of different parameters from the fitting are used to calculate the uncertainty, and a really important one is this N on the bottom here, which is the number of photons that were present during that blinking event, within that detection window. So if you have a very high signal-to-noise ratio, if you've got lots of photons, then you'll have a smaller uncertainty and more confidence in your fitting. So that's an important column to be aware of. Okay, now that we know what our particles table is, let's have a bit more of a look at this window here. You'll see that this is an image of microtubules, and you'll see that something's gone a bit funny: everything looks a bit fuzzy. That's because I was a little bit sneaky: I downloaded a low density data set and then synthetically applied some drift, to show you what would happen if there had been some microscope drift during acquisition. This is really common for any data set with lots of frames. If you look in the tabs down here at the bottom of ThunderSTORM, we have options for correcting that drift. If you have fiducial markers, for example fluorescent beads in your sample (we have one here), you can use those. But I'm going to show you how it works if you have no fiducial markers, because you can still correct the drift, with a process called cross correlation. What it does is split up your localizations into several chunks in time, and then calculate how far you need to shift them relative to each other for them to realign nicely, and it can then apply that correction afterwards. Just to be aware, there is a slight bug here; I'm not a developer of the software, so I'm not sure where it comes from, but sometimes it objects to the number of bins you put in. For example, the default here is five, and if I press Apply, I get this weird exception. Fear not, it can be fixed: the way I fix it is by increasing the bins, for example to eight, and pressing Apply. And you can see it's estimated, over time, the amount of drift in X and Y, and it's rigidly reshifted all the localizations based on that calibration curve. You can see that's got rid of all the yucky drift in the image. And you can see down at the bottom here, in the post-processing history, drift correction is ticked, so it keeps track of what you've done. I'm not going to go over these other tabs down here, the merging, remove duplicates and density filter, partly because there's not enough time, but also because it's a more advanced level of ThunderSTORM use. These are for things where, for example, you want to correct for the fact that a molecule has stayed on for several frames in a row, and you want to collapse that back to one single localization.
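For reference, that "slightly clunky equation" is essentially the lateral precision formula of Thompson et al. (2002). ThunderSTORM actually uses a corrected variant of this (see its supplementary note for the exact form it implements), but the structure is the same:

```latex
\sigma_{xy}^2 \;\approx\; \frac{s^2 + a^2/12}{N} \;+\; \frac{8 \pi s^4 b^2}{a^2 N^2}
```

where s is the fitted Gaussian sigma, a the pixel size, b the background standard deviation, and N the photon count; you can see directly why more photons give a smaller uncertainty.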
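And on the drift correction: the cross correlation idea can be sketched in a few lines. This is only the principle, not ThunderSTORM's implementation, and the helper names are made up.

```python
# Sketch of cross-correlation drift estimation: render two time chunks of
# localizations as images, cross-correlate them, and read the drift off the
# position of the correlation peak.
import numpy as np

def render(xs, ys, size, px=20.0):
    """Bin localizations (nm) into a size x size histogram with px-nm pixels."""
    img, _, _ = np.histogram2d(ys, xs, bins=size,
                               range=[[0, size * px], [0, size * px]])
    return img

def estimate_shift(img_a, img_b):
    """Shift of img_b relative to img_a via FFT cross-correlation (pixels)."""
    corr = np.fft.ifft2(np.fft.fft2(img_a) * np.conj(np.fft.fft2(img_b))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image back to negative values
    return [p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)]

# In practice you split the data into several chunks (ThunderSTORM's "bins"),
# estimate each chunk's shift, fit a smooth drift curve over time, and
# subtract the interpolated drift from every localization.
```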
If you need to do that kind of merging, then please, please, please read the manual and the supplementary information so you know exactly what's going on there. Because if you then want to do quantitative analysis down the line, which Florian will talk about, it's important that you know whether you might have overcounting of molecules in your sample, for example. So there's nothing wrong with doing things down here, but you need to know what you're doing in order for it to make sense later down the line. I am going to talk about the filter though, because this can be really useful. So, one very cool feature of ThunderSTORM, and this is something Bram introduced me to a couple of years ago; despite the fact that I'd used it for years, I'd never noticed this button before. And this is Plot histogram. Histograms are fun. What this does is show you the distribution of any of these columns in your data. That sounds a bit abstract, so what do I mean? Let's say we select uncertainty. This will plot a histogram of the uncertainties of all of our localizations, which looks a bit like this. You'll see that the mean uncertainty of our localizations is about 12 nanometers, so that's quite a small error bar; that's good. But you can see there are quite a few molecules with large uncertainties, where the fit maybe isn't as good and we're not as sure those molecules have been localized correctly. If you have the rectangle tool selected in Fiji, what you can do is say: okay, I only want to build my final image from, or only want to analyze, the localizations with small uncertainty, where we've got high confidence. So you draw a selection around here, click Apply ROI filter, and it appears in this text field down here; I can then press Apply and it will filter out those higher uncertainty detections. You can also type straight into this field. For example, if you had lots of saturation in your first frames, you could add "& frame > 10", so only the frames after frame 10, and it would combine that into the filtering. Okay, so we've got our beautiful image, we're really happy with it, we're ready to publish our Nature paper. How do we export our data? The first thing you need to export is the particles table itself; this is your raw data, as it were. In ThunderSTORM we go to Export, and you can export it as a CSV file with all of the columns. Always tick this box, the protocol button: this is what generates a text file that tells you what parameters you used for your analysis. So if I go to my desktop, I've got my CSV file and this protocol file, and you can see it's got all the camera settings I applied originally, it tells me how I did my detection and my localization, and it also tells me what post-processing I've applied, so what drift correction, what filtering. That's really, really important; save that any time you do any kind of localization analysis. Also, because microscopists do like images, you'll want to visualize the data, which you can do with this Visualization tab down here. The method here basically determines how each localization is rendered onto this upsampled grid, which is five times magnified.
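Because the exported CSV is just a table, the same filtering can also be reproduced outside ThunderSTORM, for example with pandas. A minimal sketch (the file name is made up; the column names follow ThunderSTORM's CSV export, but check the header of your own file, as they vary slightly between versions):

```python
import pandas as pd

locs = pd.read_csv("thunderstorm_results.csv")

# keep high-confidence localizations and drop the first 10 frames,
# mirroring a ThunderSTORM filter like: uncertainty < 25 & frame > 10
filtered = locs[(locs["uncertainty [nm]"] < 25) & (locs["frame"] > 10)]

print(f"kept {len(filtered)} of {len(locs)} localizations")
```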
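On the rendering side, what the visualization method does is conceptually simple: each localization is splatted onto the upsampled grid, either as a count in a histogram or as a little Gaussian. A rough sketch of the Gaussian-rendering idea, my own simplification rather than ThunderSTORM's renderer:

```python
# Rough sketch of Gaussian rendering: splat each localization onto an
# upsampled grid as a small Gaussian whose sigma is its uncertainty.
# (A plain 2D histogram of the coordinates is the simplest alternative.)
import numpy as np

def render_gaussian(xs, ys, uncertainties, size_px, render_px=20.0):
    """xs, ys, uncertainties in nm; returns a size_px x size_px image."""
    img = np.zeros((size_px, size_px))
    yy, xx = np.mgrid[0:size_px, 0:size_px]
    for x, y, u in zip(xs, ys, uncertainties):
        cx, cy, s = x / render_px, y / render_px, max(u / render_px, 0.5)
        r2 = (xx - cx)**2 + (yy - cy)**2
        img += np.exp(-r2 / (2 * s**2)) / (2 * np.pi * s**2)  # unit volume
    return img

# slow but clear: real renderers only touch the pixels near each point
```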
The best ones are normalized Gaussian and averaged shifted histograms. Normalized Gaussian will plot a tiny little Gaussian centered on each localization, and the sigma of that Gaussian will be the uncertainty of the fit. That's good, but it takes a bit of time. Averaged shifted histograms does something very similar but is computationally much faster, so I'm going to use that, and it's basically what's already been displayed. Something that I personally am really guilty of (apologies for the noise, my cat is scratching something), something I'm guilty of, is applying jazzy lookup tables to my data. This is something we all do; we all like things to look fancy, right? If you do it, make sure you're really careful with your choice of lookup table. I know grayscale looks boring, but it's perceptually uniform: you're not enhancing or suppressing regions of your image in a nonlinear way. We've all been there, we've all said, oh, you know what would make this look really cool? Red hot. This is not perceptually uniform: the bright bits look brighter and the dark bits look darker, even though we haven't changed the contrast of our image in the brightness and contrast window. If you insist on using a lookup table, there are perceptually uniform ones available; I've linked some in the resources. This one here, for example, is a perceptually uniform version of red hot. So if you really want a jazzy lookup table, at least use a perceptually uniform jazzy lookup table. Okay, cool. So that's our low density data; next up is high density data. What happens if I just try to use this exact same analysis pathway to analyze some high density data? Let me just let the cat out; I'm so sorry about this. That cat has been comatose for at least two hours and now she's like, high density data? I am out of here. Oh dear. Okay. Apologies. So here are two raw data sets, our low density and our high density data sets, and what I'm going to do is show you how things look at the different stages of reconstruction in the two cases, if I analyze these two data sets in exactly the same way I just showed you. If we analyze our low density data set with ThunderSTORM the way I just described, as we saw when I did the demo, the detections are pretty high fidelity: you get one nice little cross in the middle of each little fluorescent blob. Lovely. However, if you have a look at the detections for the higher density data set, you'll see it's starting to run into problems. For example, there are some molecules that have been missed, and there are some places where lots of fluorescent molecules overlap but you've just got one detection in the middle of quite a big blob. And that's a bit, you know, suspicious. Okay, you might think, well, what's the worst that can happen? Let's go through, do the localizations and render the image. So this is what the rendered images look like for these two data sets. Our low density rendered image has quite uniform intensity; you can see complete structures, and you can see closely separated structures. However, the higher density data set has run into some problems. You can see that some parts are quite bright and some parts are quite dim, and you can also see that the structures themselves have become slightly corrupted; for example, it looks like some structures have merged together here.
And it looks like some structures just haven't appeared properly in the image here. So those are two signs that your analysis has gone a bit wrong and that you've got higher density data than your algorithm can deal with. Another way to check is to use that Plot histogram function. If you plot the widths, the sigmas, of the fitted Gaussians for low density data, you'll see you get quite a narrow distribution; that's because all the Gaussians being fitted are about the size of the point spread function of the microscope. However, if you do exactly the same thing for the high density data, you'll see you get this long tail: you're fitting some really big, fat Gaussians. The algorithm has detected, for example, the middle of a group of molecules and tried to fit one big chunky Gaussian to all of those. So those are little indicators that maybe something is going a bit wrong with your analysis, especially for higher density data. So how do we deal with this in ThunderSTORM? Option number one is to filter out bad detections, just chuck them out. Let's say we've got a region that looks like this. This would be a successful detection: there's one molecule there, and it's got a nice thin Gaussian fit. However, ThunderSTORM would also detect this as one single molecule, even though it's quite clearly more than one, and the Gaussian fit here would be this big fat Gaussian. One thing we could do is just chuck those localizations in the bin and say, look, that's not right, let's not include them in the final image. The other option is something called multi-emitter fitting. What I showed you just now was, for each detected region, ThunderSTORM trying to fit one single two-dimensional Gaussian; multi-emitter fitting allows it to try and fit several Gaussians to that one region, for example a couple like this. So let's try that out. I'm going to close all of these, and we're going to open up our high density data set. So here we have a high density data set, and to begin with I'm going to go to ThunderSTORM, Run analysis, and not change anything at all; I'm going to analyze it in exactly the same way as our low density data. You can see the image building up, and this is what I just showed you on the slide: you can see immediately we've got real problems in this part of the image, we've got problems over here; every time there are quite a few crossing structures, we're getting sad times. So let's see what option one would do: let's just try to get rid of our fat Gaussians. I'm going to clear this out, go to Plot histogram, and plot my sigmas. Again, this is the histogram I just showed you. Let's say we only want to keep this half of the histogram; we'll say these are probably the cases where it fitted one Gaussian to one molecule. So I go back to the image and press Apply. Okay, so it's cleaned up quite a bit; you can see there's less fuzz in this part of the image. But you can also see we've lost quite a lot of information, and if we zoom in, we've got holes in these microtubules. So that's not ideal. The other option, multi-emitter fitting, is good but computationally intensive, so to avoid any more awkward waiting around for things to run, I'm just going to duplicate a really small bit of the image, which corresponds to this area here.
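Conceptually, multi-emitter fitting is a model comparison: fit the region with one Gaussian and with several, and keep the extra emitters only if they improve the fit enough. ThunderSTORM does this in 2D with an F-test and a user-set maximum number of emitters; the toy 1D sketch below just compares residuals, and everything in it is invented for illustration.

```python
# Toy sketch of the idea behind multi-emitter fitting in 1D.
import numpy as np
from scipy.optimize import curve_fit

def one_gauss(x, x0, s, a, b):
    return b + a * np.exp(-(x - x0)**2 / (2 * s**2))

def two_gauss(x, x0, x1, s, a0, a1, b):
    return one_gauss(x, x0, s, a0, 0) + one_gauss(x, x1, s, a1, b)

# two overlapping molecules in one detected region, plus noise
x = np.arange(0, 15.0)
profile = one_gauss(x, 5.0, 1.3, 400, 50) + one_gauss(x, 9.0, 1.3, 300, 0)
profile += np.random.normal(0, 10, x.size)

p1, _ = curve_fit(one_gauss, x, profile, p0=[7, 2, 400, 50])
p2, _ = curve_fit(two_gauss, x, profile, p0=[6, 8, 1.3, 300, 300, 50])

rss1 = np.sum((profile - one_gauss(x, *p1))**2)   # single-emitter residual
rss2 = np.sum((profile - two_gauss(x, *p2))**2)   # multi-emitter residual
print(f"single-emitter residual {rss1:.0f} vs multi-emitter {rss2:.0f}")
```

The single fat Gaussian leaves a much larger residual than the two-emitter model, which is what justifies accepting the extra emitter.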
The way I do multi-emitter fitting is the same as before, but I just tick this box here, and it's now going to try to fit up to five Gaussians for each detection. So it's just starting now, and you can see this is already quite slow, even for a really small part of the image; it's running pretty slowly. But we are actually getting better results: you can see these finer structures appearing without us having to filter out bad detections afterwards. Whoa, that was loud and unnecessary. I pressed Escape there to stop the analysis; apologies if I deafened everyone with my Windows error noise. I pressed Escape because otherwise it was going to take a really long time, but just to show you the full results of multi-emitter fitting compared to normal fitting: this here is the same as the low density analysis with no filtering, this is with the fat Gaussians filtered out, and this is multi-emitter fitting. If I switch between them, you can see what a difference multi-emitter fitting makes for high density data. That's the last thing I'm going to show you in ThunderSTORM itself. Before I leave it, I just want to point out that our lovely moderator Bram and his colleague Rolf have gone to a lot of effort, especially to get things ready for today, to make some really nice macros and plugins that allow you to do batch processing with ThunderSTORM. Again, these are linked in the resources, and they're really useful tools. So hopefully, now that you feel confident in using ThunderSTORM, you can race through all your data sets efficiently and confidently with these tools. Okay, so high density data in general is quite a problem, and, spoiler alert, ThunderSTORM isn't actually that great at dealing with it. This is a plot from the single molecule localization microscopy challenge, showing different algorithms run on one of the benchmarking high density data sets, and you can see that poor ThunderSTORM is quite near the bottom here; it's not particularly great at high density data in general. If you have high density data, you might want to move towards another algorithm. A word of caution: I tried to download at least five of these, completely unsuccessfully, and couldn't install them or get them running. Lots of these are very brilliant, but the user-friendliness can be quite a bit further down the list. Of the better performing ones, PeakFit is available in Fiji, and again, if you go to the single molecule localization microscopy challenge website, there are download links to all of these algorithms if you want to try them. And pSMLM, which is phasor-based SMLM, is actually available within ThunderSTORM, in the localization methods drop-down, as part of the GUI. I haven't demonstrated it because it's not something I've personally used regularly in the past, but if you're going to be doing a lot of high density analysis, then I strongly suggest reading through the paper and giving it a go. Another option for high density data sets is to actually pre-process the data. One new method which is really promising for this is HAWK, which is a Haar wavelet kernel method. It was published a couple of years ago in Nature Methods by Richard Marsh and Susan Cox over at King's College London, and it's available as a Fiji plugin. Hooray, we love Fiji plugins. What this does is take a high density data set and make it look more like a sparse data set.
So for example, your original data set has frames like this; you run it through HAWK and you get frames like this, and then you can run a lower density algorithm on top of that. What's quite nice about HAWK is that if it goes wrong, it goes wrong in such a way that it collapses to a low resolution version of the image, rather than producing fake structures or weird artificial sharpening. If it fails, as Susan says in her talks, it fails noisily: you know when it's gone wrong. Very briefly, another option if you've got very high density data, and this is more for fluorophores that don't really blink at all: there are non-particles-table approaches that convert your data stack straight into an image. These are things like SOFI and SRRF; full disclosure, I'm one of the authors on SRRF. This is how SRRF performs on that high density data set; again, it's a Fiji plugin. I'm not going to go into too much detail on these here, or any detail actually, but just so you're aware: if you don't need a particles table, if you don't need to know the locations of each of your molecules, and you've got very high density data, you might want to give one of these a shot. Okay, so we've talked a lot about how to reconstruct our images. How do you know if you've done a bad reconstruction? How do you know if your image is very sad and should be thrown straight in the bin? Single molecule localization analysis can produce some weird results. This isn't an exhaustive list, but here are some examples. Firstly, closely separated structures can collapse into one structure; this is called merging or artificial sharpening. Here's an example: I simulated these hairpin structures at high density and ran them through ThunderSTORM, and you can see it's overall quite a nice result. But what's happening is that as these two arms of the hairpin converge, they're actually collapsing into one structure instead of staying two separate entities. So you start to lose this separation before you'd expect to; this is not just limited by the resolution, by the uncertainty of the localizations. It's actually an artifact where, instead of picking up two molecules, the algorithm is detecting one in the middle. So that's bad. You can also lose bits of your structures. For example, this is the low density data set where I did some filtering to get rid of bad detections around a fiducial marker, and you can see I've got holes in the microtubules; that's not good. And you can also get intensity non-linearities: for example, even though these structures are meant to have uniform intensity, you can see that these parts are much brighter than these parts. So these are some artifacts you can get in your images. I was kind of worried about these, and so over the last couple of years I've been working on an algorithm called SQUIRREL. This is a method for assessing the quality of super resolution data, not just single molecule localization data; the reference is down here. Essentially, and again I'm not going to have time to go into this in detail, you provide a reference image, which is a high quality widefield or confocal image of the same region of interest you've imaged in super resolution, and you use that as a gold standard for your super resolution reconstruction. You compare the two, and the places where the images don't match up are probably due to errors in your super resolution procedure. It's available for download via the Fiji update sites, same as ThunderSTORM.
You need to check both the NanoJ-Core and the NanoJ-SQUIRREL boxes. It uses the GPU, and I've found that some graphics cards get really, really angry and crash Fiji if they're not compatible. So what I'm actually going to be demoing is this little mini release, which is GPU-independent SQUIRREL; this feels safer on my laptop during a webinar than trying to make the GPU do something mad. So, a very quick demonstration of SQUIRREL. What I'm going to show you is SQUIRREL running on these two reconstructions, and we might ask the question: okay, I've got a data set, I've reconstructed it twice with slightly different parameters; which image is better, and where is each image better? So I've got my super resolution reconstructions, and I've also got my reference image; in this case, this is just an average of the raw data frames for this data set, to make a kind of quasi-widefield image. I'm going to go to Plugins, SQUIRREL (no GPU), Calculate error map, and all you need to do is point it to these images. My reference image is this one, and my super resolution reconstructions are these. Press OK. There we go; it's always a relief when your own software works in a live demo. So what you get out of SQUIRREL are images like this, plus a quality metric. This is the RSP, which is basically similar to the Pearson correlation coefficient, where a value close to one indicates a better image. So this is saying that the second of our two super resolution images is higher quality than the first one, and that makes sense; you can kind of see that visually: poor quality, better quality. The other thing you get with SQUIRREL is an error map, which highlights areas in your super resolution image which might be, for want of a better word, dodgy. In both cases you can see there's this very bright region here being highlighted, and if I go back, you can see that, yeah, there's something weird happening in this part of the image; very much so here, slightly less so here, but something untoward has happened in the localization process. SQUIRREL has loads more capabilities, like mapping resolution, etc. I don't have time to go through them, as I said, but please do have a look at the paper and the documentation if you're interested in using SQUIRREL, because it'd be really nice if people validated their reconstructions. And very finally, what happens if you're trying to localize in 3D, in three dimensions? Basically, you can't really get axial localizations unless your microscope has specific optics to encode additional information into the point spread function. What do I mean by this? Well, normally your point spread function is symmetrical above and below the focal plane, which makes it really difficult for an algorithm to tell whether an out-of-focus molecule is above or below the plane. If you use a microscope that has a cylindrical lens, or two cameras that are slightly defocused relative to each other, or a phase mask that puts in a double helix point spread function, you can encode the Z position into the point spread function of your microscope. For example, with astigmatism, the degree of ellipticity, the ovalness, changes with the defocus, with Z. You need calibration data for your own microscope, which is normally, for example, an image stack of beads at different Z intervals. Lots of different algorithms can do this; again, see Daniel's paper comparing them all.
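Before moving on to 3D, a quick note on that RSP score: the intuition behind it can be sketched quite simply. Blur the super resolution image down to the reference's resolution, resample it onto the reference pixel grid, and take a Pearson correlation. SQUIRREL itself optimizes the blur and the intensity rescaling rather than using a hand-picked sigma, so treat this only as the idea:

```python
# Sketch of the intuition behind an RSP-style score (sigma_px is hand-picked
# here; SQUIRREL optimizes the resolution scaling itself).
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def rsp(reference, super_res, upsampling=5, sigma_px=12.0):
    """Assumes super_res is `upsampling` times larger than reference."""
    blurred = gaussian_filter(super_res.astype(float), sigma_px)
    resampled = zoom(blurred, 1.0 / upsampling)   # back to reference grid
    r = np.corrcoef(reference.ravel(), resampled.ravel())[0, 1]
    return r  # close to 1 = good agreement with the reference
```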
I'm going to very quickly show you ThunderSTORM running on astigmatism data, so with a cylindrical lens, because this is pretty much what you'll need to do for any 3D data: you first calibrate, and then apply that calibration to your localizations. What we have here is a simulated data set of some microtubules, simulated as if imaged with a cylindrical lens, so you've got some elliptical point spread functions. And this is real data: a Z stack of beads imaged with that same cylindrical lens. The first thing you need to do is make sure ThunderSTORM knows how to associate the shape of the point spread function, i.e. how elliptical it is, with where it is in Z. So again, you have the camera setup; it's all very similar to our standard ThunderSTORM analysis, except this time we're going to fit an elliptical Gaussian function. I'm also telling it that in this calibration data the images are every 10 nanometers, covering plus or minus 750 nanometers from the focal plane, and I'm telling it to save a calibration file to the desktop so I can use it later; I'm just going to call that astig calibration. Okay. So this is with my calibration bead data set selected as the image, and that should run through nicely. A bit more awkward waiting, even more awkward waiting; the suspense is killing me as well, don't worry. So this is doing a fit to each of these beads and working out which fit parameters are associated with which position in Z. Then, whenever it finds a point spread function of that same shape in the real data, it can work out, from the shape of the point spread function, where the molecule is in Z. So you get this kind of calibration curve out, and I've got my calibration file saved here. To apply that to my data, I now have my data set selected, and I go to ThunderSTORM, Run analysis. What I change is that I now say elliptical Gaussian (3D astigmatism); I'm not going to do multi-emitter analysis, because that would break my computer and this feed; and I point it to the calibration file I just made. It's 3D, so I'm going to tell it to render accordingly. What I get now is the localizations binned according to their Z location: these are the localizations that were within the range minus 500 to minus 400 nanometers, then 100 nanometers up, and another 100 nanometers up, and you can change this binning however you want in ThunderSTORM. Our particles table now has a column for the Z location, and also the uncertainty in Z. So it's basically split the localizations up according to Z position every 100 nanometers. You can display that as a montage in your paper, etc., and you can also, for example, use the temporal color code function on the hyperstack to make a kind of merged image that's color coded by position in Z. Okay, so in terms of 3D analysis advice, I don't have much more; much like my personality, my experience is kind of limited to 2D. One algorithm that performs really well across the board for 3D analysis is SMAP, and this is one I managed to successfully install, which is good news: it's an installable piece of software. Here's the graphical user interface; it's MATLAB based, but you don't need to download MATLAB, as it also runs as a standalone.
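The calibrate-then-look-up logic for astigmatism from that demo can be sketched as follows. ThunderSTORM fits smooth defocus curves to the bead data; here, invented calibration curves and a simple nearest-neighbour lookup stand in for that, just to show how an elongated spot maps to a Z position.

```python
# Sketch of the astigmatism principle: calibrate how sigma_x and sigma_y of
# the fitted elliptical Gaussian vary with Z, then invert that relation.
import numpy as np

z_cal = np.arange(-750.0, 751.0, 10.0)                  # bead stack Z (nm)
sx_cal = 1.3 * np.sqrt(1 + ((z_cal - 200) / 400) ** 2)  # the two axes focus
sy_cal = 1.3 * np.sqrt(1 + ((z_cal + 200) / 400) ** 2)  # at different depths

def z_from_sigmas(sx, sy):
    """Return the calibration Z whose widths best match a fitted spot."""
    cost = (np.sqrt(sx_cal) - np.sqrt(sx))**2 + (np.sqrt(sy_cal) - np.sqrt(sy))**2
    return z_cal[np.argmin(cost)]

# an elongated spot (sigma_y > sigma_x) maps to one side of the focal plane
print(z_from_sigmas(1.35, 2.0))
```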
An interesting side note on this is that if you have calibration data for a microscope that hasn't got additional 3D optics, you can still extract 3D information from a standard microscope. I'm not going to go into that now, but I think it'll be something really interesting in the future. The fitting in SMAP is so good it can actually work out asymmetries in the point spread function even if you don't have, for example, a cylindrical lens. That's pretty much all I've got, apart from some very general advice. Get familiar with one piece of software and test it on small amounts of data first; make sure you know what the parameters are doing, make sure you understand what's going on. And really importantly, if you're putting this in a paper, a report or a thesis, however you're presenting your data, write up what you did really carefully. What algorithm did you use? Cite that algorithm, say what version you used, what parameters you used, how you rendered your images, whether you applied any filtering, etc. You really can't put too much detail in this part; it's just really good practice, and I think it's something that is quite under-reported. That's all I've got for my section. I forgot to put a timer on, so I have no idea quite how badly I ran over time. So I don't know if there's time for some verbal Q&A now or not.

There are a few questions. Two questions, actually, are about low density and high density data. Is there a cut-off between low and high density, and is there a problem at some point with using the multi-emitter fitter on those data sets? In other words, is there a downside to using the multi-emitter fitter besides the time it takes, because it's much slower?

Yeah, so with regards to high and low density, there's really no clear cut-off. I can't look at a data set and say, that's a high density data set, or that's a low density data set. In reality, unfortunately, your data sets will often contain some regions which are high density and some which are low density; you often get a mixture within the same image. My advice with analysis: unless the data is very clearly heavily overlapping, always start with the low density analysis and then go through some of the checks we looked at, looking at the distribution of the sigmas and at how well the detections are performing, to see how wrong you're really going. And then, if you'd see some benefit from using multi-emitter fitting, try that. There will be a point where multi-emitter fitting stops working. If we think about the limit: you can't just feed a time series of solid GFP data into ThunderSTORM, say "do multi-emitter fitting" and expect it to work. It just won't. Again, the cut-off is really difficult to put a number on; there's no magic formula for density, unfortunately. There are some papers, for example the SRRF paper, which I referenced in the non-particles-table methods section, where we show different density data sets and what happens with different algorithms. If you have a quick look at the supplementary of that, it'll give you a bit of an idea of the kinds of densities where, for example, multi-emitter fitting really begins to struggle. But it's not really something where you can say, oh, this data set definitely won't work and this data set definitely will. It's always a continuum, unfortunately.
The denser you get, the more you're going to have to try other techniques beyond multi-emitter fitting. HAWK I really recommend, and then as it gets really dense, things like SOFI and SRRF.

Thanks. Another two questions, actually, about SQUIRREL. Can you use, or maybe misuse, the SQUIRREL plugin for comparing widefield images and deconvolved images? And can you use it for colocalization analysis?

So, I've never tried to hijack SQUIRREL for colocalization analysis before, so I'm going to just say no to that one; I'd not like to speculate on how to do that. But in general, SQUIRREL will work for any method where you have one high confidence but lower resolution image, and one processed, more uncertain image at a higher resolution; those are the two assumptions it makes. For example, we've used it on SIM images, so there's no reason it shouldn't work on deconvolved data; you should be able to use deconvolved data as your super resolution image and non-deconvolved as your reference. But the key thing to remember with SQUIRREL is that any artifacts or errors that are already in your reference image will be perpetuated into that final error map. For example, it gets dangerous if you use a deconvolved image as your reference image for then looking at, say, localization microscopy data. The best way to use SQUIRREL is really with a high quality widefield or confocal image as your reference, and your processed image as the super resolution; otherwise you start accumulating errors from all sorts of unknown sources, for example if you were to put a deconvolved image as your reference.

All right. Thanks. So the last question, maybe: people are also wondering about multicolor imaging and chromatic aberrations. Can you say something about that?

Yes. Again, my experience is pretty limited to single color imaging, but I know that you have macros available for chromatic aberration correction on ThunderSTORM-analyzed particles tables, if I'm correct?

Yeah, exactly. If you go to the lab's GitHub site, which is linked in the resources, there are plugins that can take care of it, but it's actually not so easy to do chromatic aberration correction. I mean, it's doable for sure.

Yes. But yeah, again, I don't really have anything to add to that; I'm not a massive expert in multicolor imaging. One way you can actually avoid it altogether is by using techniques such as DNA-PAINT or Exchange-PAINT, where instead of using different colors you use sequential rounds of labeling, to avoid things like chromatic aberration. On the actual analysis side, the plugins that you've provided are probably as good as anything I know for this kind of problem.

Well, I hope so. That is about it with the questions. Thank you very much for your fantastic webinar. And we immediately go over to Florian.

Okay, so we'll go directly to my part. I'm Florian Levet, a researcher in the Sibarita team at the Interdisciplinary Institute for Neuroscience in Bordeaux, and here I will speak about the quantification of the localizations that Siân just showed you how to obtain. In my part I will mostly present the techniques and explain how they work, and then at the end I will do a demo of the software that I'm developing, which is SR-Tesseler and Coloc-Tesseler, though I will not speak about colocalization in this webinar. You can find SR-Tesseler here.
You have a one-click installer for Windows, and the data sets and slides are available here. And then you have the speaker and moderator team of today. So, localization: here you can see a diffraction limited image of a fibroblast expressing integrin. Now that we have the localizations, we want to find the organization of the adhesion sites, which we couldn't see in the diffraction limited images. Here we have the localizations, and we want to be able to quantify the organization. How can we do that? At first, back at the beginning of the technique in 2006, people went back to what they knew, which was images. Here you can see the diffraction limited image, zoomed in on three by three pixels; the pixel size is 160 nanometers. Now you have the localizations, and here you can see the same region with the localizations. How can you create a reconstructed super resolution image? What you do is say: okay, maybe now I want a pixel size of 20 nanometers, so the pixels are a lot smaller, and you project all the localizations into these new pixels. You get this kind of image, and it's quite sparse; here you can see, for instance, that you have a hole. So what people do when they want to use the reconstructed super resolution image is, for instance, to spread each point by its localization uncertainty, in order to have something that is smoother and easier to segment. The problem is that, obviously, with very different biological models you will have very different organizations of the localizations, of the molecules, so it's very difficult to generalize to every kind of biological model. First, the pixel size that you choose will affect the quantification, because with a pixel size of 20 nanometers or 40 nanometers you will obviously not have exactly the same image. And the pixel size is fixed, which means that you're oversampling sparse regions and undersampling denser ones, where you may have thousands of localizations inside one pixel. And usually, when you are using intensity-based quantification, you need to combine different techniques, and then it becomes complex to reproduce and to generalize to every kind of biological model. But the bigger point is that you have localizations, you have coordinates, so why go back to a discrete space, an image? So how can we use these localizations directly? Well, even with the localizations there are quite some difficulties, coming from the experimental parameters: the fluorophore photophysics, the labeling density, or even the acquisition time. For instance, here you can see four different neurons, with the same protein labeled in each case. Here we have simply normalized the density of the image with respect to the number of localizations in each data set, and you can see that this data set has five times more localizations than this one. We want to be able to segment these in exactly the same way; we don't want the user to have to come back and change some threshold, for instance, in order to do a segmentation that is specific to each of these data sets. We want something that is more automatic, in order to avoid user bias. So now I'm going to speak about the two main families of techniques that have been developed for quantification of localizations: clustering and segmentation.
So now I'm going to speak about the two main sets of techniques developed for quantification of localizations, which are clustering and segmentation. When you are doing a clustering method, you are trying to statistically characterize small aggregates of molecules, usually by comparison with a random distribution of the molecules; you are really doing statistics to find these small aggregates. With segmentation, you classify the molecules into different classes with respect to some defined attributes, density for instance. You can see clustering as one application of segmentation, because with segmentation you can segment clusters, but it's still different, because clustering is really statistics and segmentation is not. The most used clustering technique in the field is Ripley's K function. It describes the average number of molecules found near another molecule within different radii. Basically, you compute the density around each molecule at different radii and you measure the deviation from a spatially random distribution; this is computed iteratively for various radii. This is the equation, and in the end it's very simple: K(r) = (1/n) Σᵢ Nᵢ(r)/λ. It just means that you count the number of molecules Nᵢ(r) inside your radius around molecule i, divide by the average density λ of the data set, sum over all of your localizations, and divide by the number of localizations n. Because of that normalization, you are very robust to the density of your data set. When you use Ripley's K function, you know that for a random distribution the expected value at radius r is πr²; basically it depends on the area of your circle. But the problem is that it's difficult to interpret when you just plot the K function, so people use Ripley's H function, which is just a normalization of the K function such that, for a random distribution, the function is zero: H(r) = √(K(r)/π) − r, so you take the square root of K divided by π and subtract the radius. Let's go through a few examples. Here you can see a clustered distribution, and here a completely random distribution; we zoom in on the clusters. For each of the localizations (here we show only three examples per data set, but you do exactly the same thing for all of the localizations) you take the radius and compute the H value, basically the density around your localization divided by the area of the circle, and you get different values. In this case it's a cluster, so we get a bigger value than for the random one. We take another, bigger radius and compute the values again for the two distributions. In the end, what you see is that for a random distribution your values stay very close to zero, while for a clustered distribution you get this kind of curve, where the maximum gives the radius of maximum aggregation of your data set, and hence the radius of the clusters that you have. That can be very interesting when you have different conditions and you want to know which one is the more clustered. You can see here that people have used this to analyze receptor clustering under different conditions, and when you look at the different curves you can see that in this case, for instance, you have the highest value for the curve, meaning those are the biggest clusters.
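To make that concrete, here is a minimal, unoptimized Python sketch of Ripley's K and H under those definitions; it assumes coordinates and radii in the same units and ignores edge correction, which real implementations apply at the borders:

```python
import numpy as np
from scipy.spatial import cKDTree

def ripley_k_h(points, radii, area):
    """Naive Ripley's K and H for 2-D points in a region of known area.
    K(r) = (1/n) * sum_i N_i(r) / lambda, with lambda the average density;
    H(r) = sqrt(K(r)/pi) - r, which stays near 0 for a random pattern."""
    n = len(points)
    lam = n / area
    tree = cKDTree(points)
    k = np.empty(len(radii))
    for j, r in enumerate(radii):
        neighbors = tree.query_ball_point(points, r)
        total = sum(len(nb) - 1 for nb in neighbors)  # minus self-counts
        k[j] = total / (n * lam)
    h = np.sqrt(k / np.pi) - radii
    return k, h

rng = np.random.default_rng(0)
locs = rng.uniform(0, 2000, size=(500, 2))  # hypothetical coordinates in nm
k, h = ripley_k_h(locs, np.linspace(10, 300, 30), area=2000.0 * 2000.0)
# for a clustered pattern, h peaks near the radius of maximum aggregation
```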
Even the shape of the curve gives you some indication about the density of the clusters and that kind of thing. Here I can go back to the fibroblast data set that I showed. Here we have some clusters in the background, and here we have an adhesion site, which you can see here. If we go to Ripley's function, we easily find the clusters in this case, because we only have background and clusters. But in this case we don't see anything, because there are different levels of organization in the data: background, clusters and the adhesion site. When you have this kind of thing, it begins to be difficult for Ripley's function to find what is a cluster, and that is expected. So, a very good point: it is normalized, so you really don't care about the density of your data set. But it gives you only one value, the radius of maximum aggregation; if you are interested in more information, like the size of individual clusters and that kind of thing, you cannot get it with this technique. And it's sensitive to multiple levels of organization, but that's expected, because you only want to find clusters. Another very close clustering technique is pair correlation, which is meant to protect against the overestimation of clustering that comes from multiple blinking of your fluorophore. Because you do not compute the density inside circles as for Ripley's K, the overcounting that comes from this multiple blinking does not propagate to larger length scales. If you take Ripley's K and the pair correlation at the first step, they are the same, but at the second step, here, you can see that you are using a ring: you only count the localizations inside this ring. You do that again and again and again, and in the end you have this kind of curve; if your distribution is random, the curve stays around one. They even managed to derive the mathematics of the pair correlation so that, from the experimental curve, you can extract the cluster radius, the number of molecules per cluster and the cluster density. Another very interesting couple of techniques was developed by the lab of Dylan Owen. The first is Bayesian clustering. In this case you have model-based clustering, meaning a prior distribution is supplied by the user, like the distribution of cluster radii you would expect in your data. The posterior is the probability of any given assignment of localizations to clusters, basically a cluster proposal. You have two parameters: a radius r, because for each localization they use Ripley's H function to get a value, and a threshold applied to that value. If the value is below the threshold, it's background; if it's higher, it's a cluster. Then you go through the whole parameter space, changing the radius and the threshold, and you get this kind of map. Here you have a line saying that if you are below it you are dispersed, and if you are above it you are clustered. The Bayesian model then finds the proposal that best fits the model given at the beginning. They used that, for instance, for finding clustering in these kinds of cells. This year they also did some machine learning, a very trendy approach now in biology, using neural networks.
But what is interesting in their technique is that neural networks usually work on images, and now we have coordinates. There are a couple of architectures that can take points directly, but most neural networks cannot. So they wanted to go back to something image-like, but not simply the original image. What they did is use nearest neighbors: for instance, you take the 100 nearest neighbors of each of your points, and then compute the differences in distance between consecutive neighbors. You get this distribution for a point that is clustered, and this distribution for a point that is not clustered, just randomly distributed. When you do that for all of your localizations, you get this kind of result, where you can really see there is a difference in the distances when a point is clustered. Then you feed that to the neural network, which turns out to be very good at identifying that these points are clustered, and you can extract your clusters. So, I have now talked about a few of the clustering methods used in the field; I will move on to segmentation techniques. Certainly the most used in the domain is DBSCAN, density-based spatial clustering of applications with noise. It's a segmentation technique, so you can segment objects or clusters as you wish. The way it works is that you organize the localizations into three classes: core points, density-reachable points, and outlier points; I will explain how you find them just after. It uses two parameters: a radius, which is the neighborhood size used to compute a local density, and MinPts, the minimum number of points within that radius for a localization to be a core point. Note that MinPts is different from the minimum number of points for a cluster. So, take this small example where MinPts is four. If you go to this point here, you can see that you have five points within the radius, so this is a core point. If you go to this point, there are only two points, but one of them is a core point, so we call it a density-reachable point. And in this case, where a point has no core point nearby, it is an outlier point. You do that for all of your points and you end up with this classification. Usually, in single-molecule localization microscopy analysis, we use both core points and density-reachable points as the points belonging to clusters. Some people have used this to quantify clusters in bacteria and got really nice numbers with it. Again, going back to the fibroblast data I just showed: here I have just used two values, a radius of 35 nanometers and MinPts of 15, with a density here of 400 molecules per micron squared, and you find this nice segmentation. But then, if you do that on the adhesion site, where the density is a lot higher, you just get the entire adhesion site as one object. Obviously, you can change the parameters to find the clusters inside the adhesion site, but then you lose some of the clusters in the background. So in the end it's very nice, but you have two parameters, and fine-tuning two parameters is always difficult. And it's not normalized: these are hard, absolute values that you are giving, so if you have very different densities across your data sets you will need to change your parameters, and that's quite a problem.
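For reference, this is how that two-parameter segmentation looks with the DBSCAN implementation in scikit-learn, a sketch with the 35 nm radius and MinPts of 15 from the talk plugged in as eps and min_samples; the cluster-on-background test data is a made-up assumption for the example:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Hypothetical test data in nm: a dense cluster sitting on sparse background.
background = rng.uniform(0, 2000, size=(300, 2))
cluster = rng.normal(loc=1000, scale=20, size=(200, 2))
locs = np.vstack([background, cluster])

# eps: the 35 nm neighborhood radius; min_samples: MinPts for a core point.
labels = DBSCAN(eps=35.0, min_samples=15).fit_predict(locs)

# Label -1 marks outlier points; the other labels are cluster ids, covering
# core points and density-reachable points together.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} cluster(s), {np.sum(labels == -1)} outliers")
```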
Another widely used family is tessellation-based techniques, and the first one used in single-molecule localization microscopy was Delaunay triangulation. It's a space-subdividing technique that builds triangles from the molecule coordinates, so from this data set, these localizations, you end up with this kind of triangulation, and by construction no triangle overlaps any other triangle. What was used first was a global cut distance: if the cut distance is, say, 25 nanometers, any triangle with an edge longer than that is not used to reconstruct the clusters, and you end up with this kind of result. But again, as with DBSCAN, it's a hard value, so with denser data sets you end up merging everything. Besides, if you want to use the structure to attach relevant information to your localizations, it's difficult, because each triangle connects three localizations. So we decided to use the dual of the Delaunay triangulation, which is the Voronoi diagram, to try to quantify the localizations. Dual means that you can construct one from the other: if you have the Delaunay triangulation, you can see that for the Voronoi diagram you create polygons around the localizations, and the edges of the polygons are basically the perpendicular bisectors of the edges of the triangles. It's a little more complicated than that, but that's the idea. In the end, any point inside a polygon is closer to that localization than to any other localization, as you can see here. So the Voronoi diagram is also a space-subdividing technique, and it's very interesting because you have an objective representation: for one localization, you have one polygon. Two features of the Voronoi diagram are very interesting: connectivity, because you directly know the direct neighbors by construction, and scalability, because the denser the localizations are, the smaller the polygons will be. So what we did is use that to attach relevant information to the localizations and to compute a density. If you take this localization at what we call rank zero, the number of localizations is one, just the localization itself; the area is the area of the polygon in yellow; and the density is one divided by the area of the polygon. But then, thanks to the connectivity, you can go to rank one: now the number is five, this localization plus its direct neighbors, the area is the sum of the yellow and green polygons, and the density is five divided by that whole area. You can do that for rank two, and again and again. In the end we mostly use rank one, and we found it interesting because it smooths the density values a little, which can be very useful when you are doing segmentation. Here you can see that we have simply colored every polygon according to its density. What is interesting with the Voronoi density is that you can do a kind of normalization; it's not a mathematical normalization, but it still normalizes the density so that you can use it in exactly the same way when you have a similar organization but very different localization densities.
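Here is a minimal sketch of that per-localization Voronoi density with scipy, rank zero only (one polygon per point); the rank-one version from the talk would additionally sum each cell's area with its neighbors' areas via the Delaunay connectivity, and open border cells are simply skipped here rather than clipped to the field of view:

```python
import numpy as np
from scipy.spatial import Voronoi

def voronoi_densities(points):
    """Rank-zero Voronoi density per localization: 1 / polygon area.
    Border points with open (infinite) cells are left as NaN."""
    vor = Voronoi(points)
    dens = np.full(len(points), np.nan)
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if len(region) == 0 or -1 in region:
            continue  # open cell at the border of the field of view
        x, y = vor.vertices[region, 0], vor.vertices[region, 1]
        # shoelace formula for the polygon area
        area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        if area > 0:
            dens[i] = 1.0 / area
    return dens

rng = np.random.default_rng(0)
locs = rng.uniform(0, 2000, size=(2000, 2))  # hypothetical coordinates in nm
dens = voronoi_densities(locs)
```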
For instance, here you have the same cluster in two versions, and you can see that this one has five times more localizations. Obviously, if we compute the Voronoi diagram, compute the densities, and plot the distribution of densities on a logarithmic scale, we get shifted distributions. That means that if you just go to the distribution and put a threshold here in order to segment this cluster, you end up with this threshold here, and then you pretty much take everything in the background. But what is very nice with the Voronoi diagram is that you can very easily normalize by just dividing the density by the average density of the data set. In that case, you end up with this distribution for the two data sets, and then you can apply exactly the same threshold to both and get a nice segmentation of the cluster. Even though it's a segmentation technique, some people have used this to do clustering by comparing the density distribution to Monte Carlo simulations. The idea is that you have your experimental data, with n localizations, and you run a defined number of simulations, for instance 500, of a random distribution with the same number of localizations and the same area. You compute the Voronoi diagram and plot the polygon areas, and you see that this one has a lot of small polygon areas, obviously, because it's clustered, while this one always has roughly the same distribution. In the end, you can compare the two distributions, put a threshold, and segment the clusters beautifully and completely automatically. But it will be slow, because you need to run the simulations. And if you want to segment this kind of microtubule, then you're stuck, because it's not clusters, and you cannot just find it statistically with respect to the background. So what we propose is to compare the local density to the average density of the data set and simply threshold. If you take these two data sets again, the isolated clusters and the adhesion site, we compute the Voronoi diagram, compute the local densities, apply one times the average density as the threshold, and end up with this segmentation. You will say, okay, it's exactly like with DBSCAN: you just get the adhesion site directly segmented as one object, and you don't get the isolated clusters inside it. But now we can apply exactly the same technique inside the object: we know the average density inside the object, so we can say we will keep only the localizations with more than one times the average density of the adhesion site. In the end, we get this segmentation, where we have segmented the clusters in the adhesion site but still have the segmentation in the background. It's very easy because you have only one parameter, and the idea is that once you have set it on your data, you just fix your threshold and apply it to every data set you have to analyze. So now you have objects. How can you compute the size and the number of localizations of these objects? For size, we have basically three ways to compute it. Either you use the outline of the polygons and compute the area, or you compute a bounding ellipse, the smallest ellipse that contains every localization, although with this way of computing you usually overestimate the object size.
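A sketch of that one-parameter rule, reusing the hypothetical locs and dens arrays from the Voronoi sketch above; grouping the kept localizations into connected objects (for example via the Delaunay neighbors) is left out, and strictly the second pass would recompute densities inside the object:

```python
import numpy as np

def segment_by_normalized_density(points, dens, factor=1.0):
    """Keep localizations whose local density exceeds `factor` times the
    average density of the data set; `factor` is the single threshold
    parameter. NaN densities (open border cells) are dropped."""
    keep = dens > factor * np.nanmean(dens)
    return points[keep], keep

# first pass: objects versus background
obj_pts, mask = segment_by_normalized_density(locs, dens, factor=1.0)
# second pass inside a segmented object: the same rule, applied with the
# object's own average density, pulls out the clusters within it
inner_pts, _ = segment_by_normalized_density(obj_pts, dens[mask], factor=1.0)
```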
The third way is principal component analysis. What you do is a regression on the localizations of your cluster to find the main axis that describes the cluster, then the orthogonal axis through the centroid, and you project every localization onto the two axes, which gives you a distribution of distances along the first axis and the second. You compute the standard deviation of each distribution and multiply by 2.35 to get something similar to the full width at half maximum of a Gaussian fit. My advice is to use area or PCA depending on what people in your domain are using: if they use area to describe the objects they analyze, use area; if they use Gaussian fits, for instance, use PCA. In the end, you can get this kind of nice segmentation of all the clusters in the data set and quantify the size and the density inside the clusters. There are a few methods that allow 3D segmentation. In particular, there are the 3D Voronoi diagrams done by Andronov and colleagues a couple of years ago. The problem is that they use two tools: MATLAB for the quantification, and then a separate Python-based tool for visualization, so it's not very integrated. And it's very time-consuming: they report six hours to analyze a data set of 800,000 localizations, a size that is very, very common in single-molecule localization microscopy. They also report a 3D DBSCAN implementation in VividSTORM, in this paper here, but it's even worse: they report eight hours for 67,000 localizations. There is also a 3D version of the Bayesian analysis, which is in R and packaged as scripts. I never used it, but I expect that it's difficult to do everything inside, like visualizing your results and showing what you have, and I expect it also to be quite time-consuming, because it's already very time-consuming in 2D. So, now we have all these techniques for doing segmentation and clustering, and we could say, okay, perfect, we get good numbers. But all these techniques expect, let's say, a perfect data set, and what we know about single-molecule localization microscopy is that experimental parameters affect what you get. For instance, with a fusion protein, as in PALM, you would expect a one-to-one ratio: one fluorophore for one protein, so that if you manage to identify each fluorophore, you get all your proteins in your data set. In fact, that's not really the case. You activate one protein, it goes on like that, then it bleaches; you activate the second, third, fourth, fifth, sixth. But in the end you don't get just one peak per fluorophore: it blinks. That means you need to take that into account before doing stoichiometry analysis, for instance. Overcounting and false clustering can be addressed as follows: first, you need a low activation level, so that you can really separate all the peaks in time; then you need to use a dedicated analysis program to regroup all the blinks of one protein into one. Basically, what it does is run an analysis on all the fluorophores in your data set, trying to find the distributions of on-times and off-times; it fits those distributions, finds an average dark time for your fluorophore, and then uses that to do the temporal grouping.
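That grouping step can be sketched like this, as a deliberately naive, greedy merge; the 50 nm radius and the 5-frame maximum dark time are illustrative assumptions, whereas real tools derive the dark-time cutoff from the fitted off-time distribution:

```python
import numpy as np

def group_blinks(locs, frames, radius=50.0, max_dark=5):
    """Greedy temporal-grouping sketch: a localization is merged into an
    existing emitter track if it reappears within `radius` (nm) of the
    track's last position and within `max_dark` frames of its last
    detection; each resulting group is treated as one emitter."""
    order = np.argsort(frames)
    tracks = []  # each track: last position, last frame, member indices
    for i in order:
        placed = False
        for t in tracks:
            dx, dy = locs[i] - t["pos"]
            if frames[i] - t["last"] <= max_dark and np.hypot(dx, dy) <= radius:
                t["ids"].append(i)
                t["pos"], t["last"] = locs[i], frames[i]
                placed = True
                break
        if not placed:
            tracks.append({"pos": locs[i], "last": frames[i], "ids": [i]})
    return [t["ids"] for t in tracks]
```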
Even with antibodies, as in STORM, you can have the same kind of problem, but it's different. Before, your protein bleached: you knew that this fourth protein, once bleached, would not come back afterwards, so you could do this temporal grouping. With antibodies, that's not the case. Your fluorophore will go on, blink, go off, and can stay off for hundreds or thousands of frames before coming back. So you cannot just say, okay, this is one protein and this is another one, because it may be only one protein here. In the end, if you have several of them, you end up with this kind of thing, and again you have this problem of apparent clusters that are not real clusters, just one protein. One of the solutions used is ratiometric quantification, which I will come back to on the next slides. You also have the problem of the labeling density. When you are using antibodies, usually it's the secondary antibody that carries the fluorophore: the primary antibody binds first, the secondary sticks to the primary, and you can have several secondaries on one primary. So it's very difficult, and one solution is to use short linkers and direct labeling so as not to have this kind of problem. So, about ratiometric quantification. Here, for instance, is some data from a neuron where we were interested in AMPA receptor domains; you can see them in purple here. We have these AMPA receptor domains and we wanted to know how many AMPA receptors there are inside. But how can we zoom in on them and extract that information? Because this is STORM, we only see the fluorophores going on and off. First we need to find what an isolated AMPA receptor is; but then, how many fluorophores are there on one isolated AMPA receptor, so how many localizations per AMPA receptor, and how can we separate these two populations? What we did first was find an internal control in the data, and it's here: all these small things here, isolated fluorophores in the background, are our internal control. We zoom in on them and measure the size of the structures, so we can build this distribution of object sizes for the isolated fluorophores in the background. We fit a Gaussian and find an experimental resolution of 33 nanometers for our experiment. We also have the blinking statistics, basically the number of localizations for each of these objects; we plot this distribution and find a median of 22 localizations per isolated fluorophore. Now we segment everything inside this area, inside the neuron, and we get this distribution containing both isolated receptors and AMPA receptor domains. How can we separate the two? What we know is that AMPA receptors are complexes of four proteins, something like 10 nanometers, and our antibodies are about 10 nanometers too, so we have something around 30 nanometers here and around 20 nanometers here, with an experimental resolution of 33 nanometers: they are basically the same size in the data. So we took this distribution, took the 5th to 95th percentiles, between 8 and 59 nanometers, and said that isolated AMPA receptors have this size. Now we are able to separate the two populations: we find the number of localizations of these isolated AMPA receptors, and we can weight it by the number of localizations of the isolated fluorophores.
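One plausible reading of that weighting step, sketched with made-up numbers; only the median of 22 localizations per isolated fluorophore comes from the talk, and everything else here (the Poisson draws, the counts, the population sizes) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-ins for the three segmented populations:
locs_per_fluorophore = rng.poisson(22, 500)  # isolated fluorophores (internal control)
locs_per_receptor = rng.poisson(60, 200)     # isolated receptors (8-59 nm objects)
locs_per_domain = rng.poisson(600, 80)       # receptor domains

calib = np.median(locs_per_fluorophore)      # ~22 localizations per fluorophore
fluors_per_receptor = np.median(locs_per_receptor) / calib
# weight each domain's localization count by the per-fluorophore and
# per-receptor calibrations to estimate receptors per domain
receptors_per_domain = locs_per_domain / (calib * fluors_per_receptor)
print(fluors_per_receptor, np.median(receptors_per_domain))
```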
In the end, we found that each AMPA receptor has a median of 2.72 fluorophores, and then, from the distribution of all our AMPA receptor domains, we can find the distribution of sizes and plot the number of AMPA receptors inside each domain. That's how you do ratiometric quantification. Finally, a very promising experimental technique for doing stoichiometry is qPAINT, which builds on DNA-PAINT. You have a docking strand, a DNA docking strand, on your antibody here, and in solution you have imager strands. When an imager binds to the docking strand, the dye goes on; when it unbinds, the dye goes off. What is very interesting here is that your blinking does not depend on the dye photophysics, but on the predictable binding kinetics of the strands. If you have a complex with several proteins, you will have a higher binding frequency, and basically your dark times will be shorter, because there are more binding sites here. What you do is take the cumulative distribution of all the dark times you measure and find the mean dark time. You can see here that for this one the mean dark time is smaller than for the one that is alone here, and after calibration you can compute the number of binding sites. And here I come to a specific example that is very interesting, I think, for really showing how important it is to understand what you are doing when you interpret your data. This comes from the Soeller lab. In 2009, they used STORM to study the ryanodine receptor, and with the resolution of 30 nanometers they had at the time, they thought this receptor was organized as a regular lattice, 30 nanometers by 30 nanometers, like this. But in 2014, a study using electron tomography challenged that assumption, so they came back to single-molecule localization microscopy two years ago, but with the advances in technology: they used DNA-PAINT. In this case they managed to reach a resolution of 5 nanometers, a much better resolution thanks to DNA-PAINT, and they used qPAINT, as I just described, to find the number of proteins inside the different clusters. Here you can really see the puncta, the individual ryanodine receptors. They compute the cumulative distribution to find the mean dark time, then use an internal control here to weight the dark times and find the real number of ryanodine receptors inside the clusters. In the end, they found that what they had thought was the organization of the ryanodine receptor was not really that: the receptor is organized more like this, with gaps in the clusters, and they even performed colocalization with another protein and showed it was quite different from what they expected. So it's very important to be really careful, to try to get the best resolution, and to really understand what you are doing. I obviously want to thank my team and my boss, Jean-Baptiste. The source code for SR-Tesseler is here; you have a Windows installer; if you have a problem you can raise an issue on the GitHub, and you can also go on forum.image.sc with the SR-Tesseler tag if you have questions. Now I'm going to do a very fast demo of SR-Tesseler. You have the GitHub here.
As I said, you have the releases here, where you can download the installer; I have already done that. So here is how SR-Tesseler looks. You can open CSV files generated by ThunderSTORM with it, and here I just opened the data set that I showed before. What is interesting with SR-Tesseler, I think, is that we don't use images for the display: we really have a vectorial display, so you have the localizations and you can really see them. Here you have a set of quantifications: an implementation of Ripley's function, and DBSCAN. In this case, for instance, I show you Ripley's K; as I said, it's very difficult to interpret. Here is the H function (even though the label says L here, it's the H function), and you can see that we have a radius of maximum aggregation of 100 nanometers in this case. You can also try DBSCAN with this tab: here you directly do a segmentation of the data, and then you can change this distance, and the minimum number of localizations used to define what a core point is. Here you also have the minimum number of localizations in a cluster, to say, okay, this is a cluster. Then here you have the tab for the Voronoi diagram: you can create the polygons, you get this density map, and then you can just say, I want a factor of maybe one or two times the average density, and create objects from your data. That is the released version of SR-Tesseler. You also have a detection cleaner here, for the PALM case, to do the temporal grouping. Now I will do a fast demo of what is coming in SR-Tesseler, which is still not public, because it's a mess and I need to clean it up before making it available. Basically, it's going 3D with the Voronoi diagram. For instance here, you can see that you have a few localizations; I will construct the Voronoi diagram. What you can see is that what takes time is creating the Voronoi cells; in fact it takes long enough that it is not a mandatory step in the software and the algorithm. But if you do it, you can see the different cells, and then you can obviously do a segmentation like that; if you segment your objects and show only the objects, not the Voronoi cells, you end up with this kind of result, and you can very easily go to one object to see what it looks like. I have done quite some work to make it fast, I would say. Now if I go here, I will open this data set. It takes some time, because there are 1.6 million localizations; it's quite a nice data set of mitochondria. There it comes. I will close that and hope it doesn't crash. So you have this kind of result. I will create the Voronoi diagram; it will take some time, but even for 1.6 million localizations it won't take hours. In the end it should take less than a minute to build the Voronoi diagram. The idea is that it first creates a 3D Delaunay triangulation and then uses that to build the Voronoi diagram, and part of the Voronoi computation is done on the GPU, which is why it's very fast.
There is a portion now, something that just rearranges the data to feed it to the GPU, that I would love to speed up at some point, because it's a bottleneck; I know it's mine, and for now it's like that. The construction of the Voronoi cells themselves is done very fast on the GPU. And if you go here now, the density of our data set has been computed, and then you can do a segmentation (I should have used a larger minimum number of localizations, but it's still okay), and then you can really go to some part and see what you have, and then do another segmentation inside your object to get clusters, for instance. So that's what's coming. It's taking a little longer than expected, because I don't have much time right now and I really need to clean up the code, but at some point it will be released and available to everyone. And that's the end of my talk. Wow, that was super impressive. Let's go over some questions; there are not so many, but we have a few. One of the questions is: do the clustering techniques shown, in particular the Voronoi one, work with non-circular shapes? So, the segmentation techniques will work with non-circular shapes, for sure. For the clustering, like Ripley's function and that kind of thing, it works less well, for sure, because of the way the radius is computed, basically with a circle; so it works well with circular shapes and less well with elongated shapes and such. That's not really true, it seems, for the machine learning approach, as they tested it on elongated data and still managed to do very nice clustering. Cool. Another question is: what happens to localizations that are spatially located near a cluster but have a large polygon area? Would they be counted as part of the cluster or not? So, with Voronoi, usually they won't end up as part of the cluster. Obviously it depends on the threshold that you used, but a large polygon area means a low local density, so if there is not a lot of background, for instance, the density inside the cluster will be a lot bigger than theirs and you won't take them. Yes. Okay, so some questions have already been answered in the Q&A. The last question is: when will this become available? Yeah, I hope in a couple of months. I won't promise that, because one of the things I did with the first version of SR-Tesseler was make it very robust; my understanding is that I don't have people saying it's not working or crashing, except for reading some files, and that's just a problem of line endings in the files. Right now I'm not at that level with this version, and I really want to make it as robust as possible before it gets released. So I would say I hope that in a few months, maybe summer or something like that, I will have the first version, but I will really wait until I have something quite robust. All right, so maybe the very last question that just came in is related, because people want to know what kind of hardware you need to run this 3D segmentation. For what I showed, you will need an NVIDIA card: this is CUDA code, so it's really tied to NVIDIA cards. But then I would expect any normal NVIDIA card to work. For instance, it doesn't work on my laptop, but that's because it has a lousy Intel HD graphics processor. If you have an NVIDIA card, and I would say any NVIDIA card, it should work. Okay, that's good to know. Maybe not if it's 15 years old, but something 5 years old or so. Yes, sure. Okay, I think that was it then, so normally I would say let's thank all the
speakers, Sean and Florian, so everybody's applauding now, and thank you for your participation. Thanks also, of course, to Daniel and Dele for helping me moderate this. The Q&A will be completed and posted on the image.sc website on the NEUBIAS page. So with that, I think we should end.