Hi everyone. Seems like people are still trickling in, but I can start with an overview and introductions, and hopefully people will come along. My name is Beth Cimini. I am the lead image assay developer in the Imaging Platform at the Broad Institute. Among other things, I am the co-maintainer of CellProfiler, and I am in charge of the CellProfiler plug-ins repository. Joining me here from the Broad today is Allen Goodman, our lead software engineer, and David Stirling, a postdoc who's joint between the software engineering team and the image assay development team, and who has been very involved in helping work towards the release of CellProfiler 4, which we hope to have for you imminently. I do just want to mention a little bit about the level at which the workshop will be pitched. In general, if you're already an experienced Python software engineer, there will be some CellProfiler-specific stuff that may be of interest to you, but you may find this a little bit simple. And if you're totally unfamiliar with programming and you've never done any before, you may find this a little bit challenging, although there are earlier Python sessions in the NEUBIAS Academy that you could definitely check out to help get up to speed on that, and then come back to this once it's recorded on YouTube. But hopefully everybody who is here is here for about that level of discussion. And it's 9:33, so let's just get into it. We're going to talk through a few different things. First off, just some very basic principles about CellProfiler: how things get added into CellProfiler, and how you edit your own copy of CellProfiler. We'll then talk about a CellProfiler module overview and CellProfiler module structures part one, some of the structures that are involved in literally every module. And then we'll go through and actually live-edit a CellProfiler module together in real time to fix a bug that had previously been reported.
I'll then go through and talk about some slightly more advanced CellProfiler module structures and walk you through the process of making your own CellProfiler module, using a module I put together this week. Please do use the Q&A to submit questions. If you have questions along the way, Allen and David will be interrupting from time to time to pass them along to me if it seems like it's something that we're not necessarily going to cover. But if your question doesn't get answered, it's probably because it's something that we plan to cover later in the talk. All right. So what is CellProfiler? Probably you are at least vaguely familiar with it if you're here in this workshop. But CellProfiler has actually been around for quite some time now, and I think that helps at least partially explain why it has so much code already and can do so much. It's been around for 15-and-change years. Its first publication was 14 years ago, and it's had nearly 10,000 citations since then, which I think demonstrates that there is, in fact, a lot of really good, functional code in this piece of software. As of this morning's count, the current program has 87 modules. In the CellProfiler context, a module is not necessarily what it would be in a computer science context; in general, it's a set of code that does what a biologist would consider an image analysis function. Some of those relate to image processing, some to object segmentation, and some to things like file processing. But it already has 87 modules as of today. It also has a plug-ins repository where people have submitted code; some people on the CellProfiler software engineering team, but also scientists from all over the world who use this software. And that has 41 modules as of this morning, not counting that some modules have CellProfiler-version-specific variants.
And I mention all of this not just to say, yay, we have a lot of code, but to say that CellProfiler can already do a lot. So for many of you, even if you use CellProfiler for years, you may never actually need to write your own CellProfiler module or edit an existing one. I think I had been using CellProfiler for six or seven years before I actually started doing this on my own. That being said, there are certainly always bugs that need to be fixed, and there are certainly always new ideas for new biology and new microscopy that our team has never worked on before. So we wanted to do this workshop to help people learn how they can do this on their own, so that we can make this code cover even more of what biology has to offer. I also just want to shout out the historical software engineering team for CellProfiler over time. Allen and David are on the call with us right now, but none of this, of course, would be possible without these people putting in years and decades of work making this program for everyone else to use. So the first question is just: how do I know what can and can't be done? I've just said that there are 128 different CellProfiler modules that cover a lot of different biology, and even I, who've been using this software for about 10 years now, still occasionally find things that I didn't know CellProfiler could do. So if you have a use case that you feel like CellProfiler isn't handling well, the absolute first thing I would do is go on forum.image.sc and search for what you want to do in CellProfiler, because possibly somebody's already talked about doing it there and has posted their solution, or posted that there's no way to do it. The other place I would check is the CellProfiler GitHub repository, where you can look at our open issues and at the closed issues, and see if possibly somebody talked about doing this and we decided to handle it in a different way, or to handle it as a plug-in.
But don't just rely on "well, I don't know how to do this, so therefore it must not be in CellProfiler," because there's a tremendous amount of code coverage there, and of coverage of biology in that code. Even if there's not something with the name of the exact assay you want to do on it, it doesn't mean that the existing modules can't be strung together to solve your image analysis problem, so that you don't have to write your own code. All that being said, of course we do still write code. We have software engineers who are actively on this project, and biologists and image analysts who contribute to it. So in general, there are three main ways that changes get made to the CellProfiler code. The first is that someone notices or is told about a bug, and then they fix it. If you think you have noticed a bug, please do search on image.sc and/or in our GitHub repo and see if somebody else has reported that bug. And if they haven't, please do file a bug report. If you click the new issue button... can everyone see my pointer? (Yes, we can see your pointer.) If you click on the new issue button, you'll be given the option to pick either a bug report or a feature request. And if you pick a bug report, it will give you a way to describe exactly what went on, what operating system and what CellProfiler version you're using, so that we can try to help you as best as possible. But of course, reporting a bug is only the first half; someone has to then go through and fix it. Likewise, feature requests go through a very similar process. We've experimented with different ways of tracking feature requests over the years, but typically the way they are handled now is that we have a feature request issue tag in the CellProfiler GitHub repository. You say what the feature is that you want, how you think we could do it, what alternatives you've considered, and then any additional context you have.
And so again, this is another great way to say, "I think CellProfiler doesn't do this; can it?" And the people who work on it and know it best can tell you, "oh, actually there's this other way to do it," or they'll say, "hey, this is something we can add to the project." And/or, if there's a piece of functionality that's really outside of what currently exists in CellProfiler, somebody makes a plugin and shares it with the community, either by attaching it to a paper on their lab's website or by contributing it to our CellProfiler-plugins repository. The one thing I do want to say is that I've said "someone" here, and the more someones there are who work on this, the faster changes will be made. In general, we would love to fix everybody's CellProfiler bugs the second they come in, but in practical terms, bugs tend to get fixed more quickly if the person can at least try to narrow down what code (for example, what lines in a piece of code) a bug is coming from, because it helps us solve the issue faster, makes it easier for us to solve, and makes it more likely to rise to the top of a priority list. So when should you edit a module versus making a plugin? Certainly if you find a bug, you should edit the module, first of all in your own copy, and then go through and submit a pull request to the main CellProfiler repository. You may also want to edit a module if your desired behavior is pretty close to an existing functionality, and with 128 different existing functionalities, "something that's pretty close to something that already exists" actually covers a wide range of things. And in general, as a personal piece of advice from me, I'm of the coding school that says if you can reuse somebody else's program or somebody else's code, with attribution of course, you should generally always do that. The less time the community spends reinventing wheels, the more time the community gets to spend doing other biology.
But if there's nothing that's really close to what you want to do, if you think that what you want to do is maybe a little bit niche and there are maybe only a dozen labs in the world that care about this function, or if your module will add new dependencies to CellProfiler, that's when you want to make a new module as a plugin and submit it that way. And again, I think this will become a little clearer as we go through the process of editing one and making one, but that's generally how we would make that decision internally. So before we get to CellProfiler module structures, I do want to talk a little bit about how one might actually go about making their own copy of CellProfiler and starting to look at CellProfiler issues. Here's the main CellProfiler GitHub window. If you just search "GitHub CellProfiler" on Google you can find this; it's also linked from our website. In general, if you've never made any changes to CellProfiler before, the first step you're always going to want to take is to make what's called a fork. In GitHub terminology, a fork just means essentially making your own personal copy of a repository. You can see I already have a fork of CellProfiler under my username, but with other accounts that I have access to, I could make other copies. In general, this is simply because, in theory, many dozens of people or many hundreds of people could be working on their own versions of CellProfiler at any given time. We want to make sure that everybody's work stays separate, and that only things that have been well vetted and well tested actually make it into the final code. So unless you are one of the three or so people in the lab who have write privileges at any given time, step one is always going to be to make a fork. Now, I closed that because I wanted to click through to here. So this is my fork.
I only made it a couple of days ago, but it's already 12 commits behind, because Allen and David have been working very hard on the upcoming CellProfiler 4 release. I could try to bring it up to date if I wanted; it's pretty easy to do that. But all of the work I'm actually gonna do today is going to be in the CellProfiler 3.1.9 branch. And that's just because we're at a point in CellProfiler's migration towards Python 3, for CellProfiler 4, where CellProfiler is a little bit unstable. So rather than submitting my code into the main branch of CellProfiler, which is always called the master branch, I'm gonna submit code into the version 3.1.9 branch, which was our final CellProfiler 3 release. It's not something you would commonly do; it's just that we happen to be at a very particular time in CellProfiler's development where there might be random bugs coming up if we try to merge into master that don't actually reflect what you would see if you were doing this at another time. If I hadn't previously cloned this, I could get the link to do so with this "clone or download" button, and this little copy thing right here. But in fact, I have already made my own version of CellProfiler under my username and cloned it on my machine. And if I type git status to see which of those different CellProfiler branches I'm on, as I said, I'm working on 3.1.9. One of the other things that I have open on my machine is Visual Studio Code. There are a lot of good code editors, and I've personally used several over the years; this is the one that I happen to like the most. But if you prefer something like PyCharm or Spyder, there are lots of other good code editors you can use. One of the reasons to consider using a code editor is the ability to go through and easily debug your code.
When you're working on something that's as complicated as CellProfiler, you don't necessarily want to have to make print statements to send everything you're working on to the terminal, especially if you're trying to see if an image was processed correctly; images are not gonna print nicely to a terminal. So you wanna actually be able to pause the code partway through the execution of a step, and take a look at what's going on inside your code. It's pretty easy to configure a debugger in most of these programs. Again, there's no particular reason to use one over another for the most part. And of course, my debugger configuration has gone away, but that's okay, so you'll get to see me make a new one. Because the individual CellProfiler module files are not gonna run like a program (they're meant to be used within the larger CellProfiler context), all I'm gonna do is tell it which entry point I want it to use. That should be all that I need to do. Not sure why it forgot my previous configuration from about an hour ago when I last looked at it, but that's the fun of live coding. So when I hit this run button here, if I've done this correctly, it should now just open up a copy of CellProfiler, and it does. But what's nice is that, since I have the threshold module open, if I put in a Threshold module right now, I could go through and put breakpoints at any step through the execution of the Threshold module, or any other module, just by clicking to the left of the code line. And my code will then run just to that point and stop. That's really helpful for debugging. You might not use this as much when you're making your own module, because your own module is likely gonna be smaller and less complicated than actual CellProfiler source code. But actual CellProfiler source code is pretty complicated, and that's why we tend to need a debugger.
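For reference, a VS Code debug configuration for this might look something like the sketch below. The exact fields depend on your own setup, and whether the entry script is `CellProfiler.py` at the repo root in your checkout is an assumption here, so treat the paths as illustrative:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CellProfiler (debug)",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/CellProfiler.py",
            "console": "integratedTerminal"
        }
    ]
}
```

With a configuration like this saved in `.vscode/launch.json`, hitting run launches the CellProfiler GUI under the debugger, so breakpoints set in any module file will pause execution there.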
You can see here a command that I've already typed, git status, which shows that I'm working in a particular part of my repo, but there are no changed files since the last time I saved. Just for the sake of making a change here, I'm gonna change the variable revision number on this module, and we'll talk about what that is in a bit. If I type git status again, you can see now that it says there's a modified file that I could use to go ahead and make changes and push changes to the internet and to the greater CellProfiler code base. I can also type git diff to get a sense of exactly what that change is. Depending on which code editor you use, you may actually find this easier to do inside your editor. I know, for example, friends I have who use PyCharm like to do it inside PyCharm. I learned how to do it on the command line, so I tend to like to do it on the command line. But again, the ability to track changes in your code, so that you can be very thorough in documenting what you changed, when, and why, is critical for any sort of coding success. I'm gonna type git stash here to remove that change, and you can see now this variable revision number has gone back to 10. In general: commit early, commit often, and document your changes as much as you can, because you may have a question about which part of a code change served which function a year or two from now, and you won't remember anymore. If you write down, as you change individual parts of the code, why you changed them, you're a lot more likely to be able to track down why certain changes were made. We have a couple of minutes; do we have any questions so far that seem good to start with? "No, we're all clear, no questions so far. Just so that people are aware, there is a Q&A button down the bottom that you can use to submit any questions you have." Cool. All right.
Then I will assume things are clear and keep going, and things may get less clear as we actually start getting into the nitty-gritty of CellProfiler itself. I'm gonna go ahead and close that. So once you have a version of CellProfiler on your computer, and you have an environment where you can make changes and track the changes you wanna make, what actually are the changes that you want to make? How is CellProfiler actually organized? The first thing to know is simply that there are four different classes of CellProfiler modules: a main class and then three subclasses of it. The particular subclasses are image processing, where you do something to an image; image segmentation, where you take an image and try to find objects in it somehow; object processing, where you take the output of an image segmentation and then change it in some way; and then just our general module class, which covers basically everything else. You don't need to follow this particular structure in your own modules, but it helps, because there are common things that are done by each of those kinds of modules. For example, image processing modules always have an image that goes in and an image that comes out. So in that case, it's helpful to know what kind of module you're working with, and particularly, once you start writing your own, to know which kind of module you should start writing in, so that you have as much as possible defined ahead of time and you have to do the least amount of work possible. These are found in the cellprofiler.module file. Once you actually have a CellProfiler module, it has four major behaviors. This is not comprehensive, but it describes most of what a CellProfiler module does. The first thing it does is it takes in some settings: it lets the user set up how they want to configure what they're doing in CellProfiler. Some modules have only one or two settings; some modules literally have dozens. It completely depends on what you want to do.
Then you run: you do the thing that you want to do. All of these things are very closely connected, but how much input the run needs is gonna depend on how your settings are set up. You then display: you show the user what you did. One of the benefits of CellProfiler is that you can do this on the fly, looking at how an image analysis step turned out, so that you can see if it was successful, or make changes if not. So the display is actually a really critical piece of functionality in CellProfiler. And then finally, measurements: you're gonna write out measurements of what you did. Most modules will have all four of these. Some modules don't make measurements, but everything has the first three, and most CellProfiler modules make some measurements, even things like object segmentation, where a module makes measurements of, say, how many objects it found. So when you're trying to figure out how to make a change in CellProfiler's code, the first thing you should think about is: which of these four aspects of the CellProfiler function do I need to change? Starting with settings: there are lots of types of settings. Is there a particular measurement that needs to come in? Is there a file that needs to be read in? Does a user need to make some choice about something that needs to be used? Does the user need to make a binary choice of whether a certain step is done or not? In general, we've predefined as many of these as we think is practical, so that you can just load up a setting that does exactly the thing you want to do. You say, "oh, I want the user to make a yes or no choice," and you can then just go ahead and load up the Binary class without having to code your own. If you look in the cellprofiler.setting file, you can see these and many more. This is actually, if you can believe it, not a comprehensive list of all of the different setting types. And when you're configuring your settings, you always have at least three functions.
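As a rough mental model, the four behaviors can be sketched with a toy stand-in. This is not CellProfiler's real API; the `ToyModule` class and all of its names are invented for illustration, and real modules take workspaces and setting objects rather than bare arrays:

```python
import numpy as np

class ToyModule:
    """Illustrative stand-in for a CellProfiler module's four behaviors."""

    def create_settings(self):
        # 1. Settings: let the user configure the operation
        self.threshold = 0.5  # in CellProfiler this would be a Float setting

    def run(self, image):
        # 2. Run: do the actual work
        self.result = image > self.threshold
        # 4. Measurements: record something about what was done
        self.measurements = {"Count_ForegroundPixels": int(self.result.sum())}
        return self.result

    def display(self):
        # 3. Display: show the user what happened
        # (real modules draw figures; here we just print)
        print("Foreground pixels:", self.measurements["Count_ForegroundPixels"])

module = ToyModule()
module.create_settings()
mask = module.run(np.array([0.2, 0.7, 0.9]))
module.display()
```

The point is just the division of labor: settings record what the user asked for, run does it, display shows it, and measurements write out what happened.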
create_settings, where you define new settings for this module; again, sometimes this is as few as one, sometimes this is many. You have a piece of code that's just called settings, and that is the total list of module settings that should exist. Some will come predefined from those module classes that I mentioned, and some will come from create_settings, and those combine together to create your total list of settings. Some settings you also may not want the user ever to see, or you might only want them to see under certain circumstances. You might have a hidden setting that counts, for example, how many images have been loaded in so far; the user doesn't need to see that, because they don't need to make any choices based on it, but we wanna be able to count it. You also might, for example, have something where a user is asked, "do you want to perform this part of the function, yes or no?" And if they say no, ideally you would like to hide the settings for that function so they're not visible anymore, so that people don't get confused about why they should be configuring something that they don't plan to do. So visible_settings can just be a copy of settings, but it can also have some pretty complicated logic to allow you to decide what a user can see at any given time. Modules often also contain a few other kinds of settings functions. You'll see prepare_settings and validate_settings; some modules have both, some have one or the other. In general, once settings start getting complicated, we've written in a lot of functionality to check and make sure that the settings seem good. So for example, if you're supposed to give CellProfiler a long list of images that you might want to all have the same thing done to them, you wanna check to make sure there are no duplicates, for example.
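To make the relationship between settings and visible_settings concrete, here's a minimal self-contained sketch. The `Setting` and `Binary` classes below are tiny stand-ins for CellProfiler's real setting classes, and the setting names are made up:

```python
class Setting:
    """Tiny stand-in for a generic CellProfiler setting (illustration only)."""
    def __init__(self, text, value):
        self.text, self.value = text, value

class Binary(Setting):
    """Stand-in for a yes/no setting like cellprofiler.setting.Binary."""

class SketchModule:
    def create_settings(self):
        # define this module's settings
        self.do_smoothing = Binary("Smooth the image?", True)
        self.smoothing_size = Setting("Smoothing filter size", 5)

    def settings(self):
        # the complete, fixed-order list: used for saving and loading pipelines
        return [self.do_smoothing, self.smoothing_size]

    def visible_settings(self):
        # only show the filter size if the user asked for smoothing at all
        result = [self.do_smoothing]
        if self.do_smoothing.value:
            result += [self.smoothing_size]
        return result

m = SketchModule()
m.create_settings()
```

Note that settings() always returns everything, whether shown or not; only visible_settings() changes with the user's choices, which is why hidden counters and conditionally shown options both work.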
So prepare_settings and validate_settings are not present in every module, but they're present in a lot of them, to try to make sure that the user doesn't run into bugs, because these things have been checked before the module ever got run. You'll also see upgrade_settings. CellProfiler, as I mentioned, has been around since 2004, and we constantly update and change the code. So we have functions to help you migrate old versions of pipelines and old versions of modules to their new versions, to whatever new functionality we've added, or to work around functionality we may have removed and moved somewhere else. The second major thing is run. And again, this can really be anything, usually anything that's in Python, though you can use Python to, for example, call out to a bash script. You may wanna play in Jupyter or Python with your images and with your functions to try to get a sense of what your run function should be, because theoretically this is the part that's completely independent of CellProfiler; but you may also just wanna develop it live. I tend to develop it in the run function inside CellProfiler; some of my colleagues tend to make their run functions in Jupyter or somewhere else first, and then bring them into CellProfiler once they think they're doing what they want them to do. It's really a matter of what you're comfortable doing and what your personal organization is. With that, we're gonna actually go through and live-edit a CellProfiler module. Any major questions before we get to that? "Nope, we're all caught up." Great, okay. So what are we gonna actually do? This is a CellProfiler bug from about a year ago. It's not strictly speaking a bug, which is why it's tagged as an enhancement, but it's something that we noticed was causing some problems. CellProfiler has two modules for illumination correction, called CorrectIlluminationCalculate and CorrectIlluminationApply.
One of them calculates the illumination correction function for an image, and then Apply takes what's been calculated and applies it. In a lot of image processing programs you'll see this happen all as one step. In CellProfiler it's split, because sometimes we wanna do very complex, fancy, or time-consuming illumination calculations; so we wanna do that once on its own, save out the results, and load them in later with CorrectIlluminationApply. Very often you'll see them in the same pipeline, but sometimes they're split, and that's why they're two different modules. What we found last August was that, because of a bug that has since been fixed, occasionally illumination functions that were supposed to have a minimum of one had a minimum of less than one, which meant that when they were being divided out, they were producing CellProfiler results that shouldn't have been possible. It shouldn't have been possible for there to be an intensity in the illumination function that was less than one, but you can see the minimum intensity was 0.999, and this was causing issues in downstream work. That underlying bug has already been fixed by David. But what we then noticed was that, compared to other modules in CellProfiler, which say "we want all images to be within the range of zero to one; do you want to just automatically cap them that way?", CorrectIlluminationApply didn't have a setting to do automatic capping of values to within zero to one. For values that were greater than one, there was just no way to cap them at all. For values that were less than zero, it turns out CorrectIlluminationApply was capping them without telling the user. Which isn't necessarily bad, because again, it was avoiding other issues downstream, but in CellProfiler in general we err on the side of telling the user what we're doing and letting them decide if they wanna do it or not. So what we decided is that we wanted to implement two new settings in this module.
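To see why a minimum below one matters, here's a small numpy illustration. The numbers are made up for the example, not the actual data from the bug report:

```python
import numpy as np

image = np.array([0.4, 0.8, 1.0])       # pixel intensities, already within [0, 1]
illum_fn = np.array([1.0, 1.2, 0.999])  # illumination function; should have a minimum of exactly 1

# illumination correction divides the image by the illumination function
corrected = image / illum_fn

# because one value of the illumination function dipped just below 1, a pixel
# that was already at the maximum of 1.0 now comes out slightly greater than 1
print(corrected.max())
```

With a properly normalized illumination function (minimum exactly 1), dividing can only keep values the same or shrink them, so corrected pixels stay within [0, 1]; the 0.999 minimum broke that guarantee.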
One of them allows the user to decide whether or not they want to allow values of less than zero, and one of them allows the user to decide if they want to allow values of greater than one. And you can see that this work is in our CellProfiler 4 project; if you're looking for more ideas about how you can contribute to CellProfiler, I definitely recommend checking this out. So the module we're gonna work on here is CorrectIlluminationApply. In order to do work on this, I wanna keep all of my work organized, so I wanna make a new branch in which to do this work. A branch just means, essentially, a different version of the code in GitHub. I wanna start from 3.1.9, which is why I started from there before, and I can even more explicitly make sure that I'm starting from the right place by checking it out again. The way we tend to name branches is, if a branch is in response to a particular issue, to do git checkout -b, which means make a new branch, and because this is issue number 3829, I'm gonna call this "issues 3829". Not the most creative, but it helps us keep organized. So right now this branch is identical to the CellProfiler 3.1.9 branch, but not for long. Let me close my window here. So here's the CorrectIlluminationApply module. It's not one of the longest in CellProfiler, at only a little over 300 lines of code, but it's not the shortest either; we have some that are much shorter than this. The first thing is, we're making a new version of this module, so I'm gonna go ahead and increase the variable revision number. And again, going back to our ideas about which of the four parts of CellProfiler we're gonna change: we said we wanna implement two new settings, and so that means we now need to add two settings in the create_settings function and call them. You can see here that the other settings have something called a cps alias, or "setting", in them. That stands for cellprofiler.setting.
In some CellProfiler modules you'll see that written out, and in some you'll see it abbreviated; we're working to get it all consistent. So, what kind of setting are we making? We're making a setting where a user has a choice: do they want to do something or not? So this is a Binary setting. And in my text editor, when I invoke this function, it shows me what the arguments to this function need to be. It needs the text that is going to ask the user the question of what they wanna do, and it needs the default value: is it going to start as yes, or as no? We're starting with truncating low values, values that were less than zero. CellProfiler was already doing this, and in general, when we make changes to main CellProfiler modules, as much as possible we wanna maintain consistency with how CellProfiler used to behave. So when we make the change to this module, we wanna make sure that the default value is yes: the default is to get rid of values that are lower than zero, which is what CellProfiler was already doing. So my text is set, and my value is True. I can write a longer doc string, and in general, one of the things that I think makes CellProfiler very special is that it does have quite a lot of documentation; you can see that documentation strings about what a particular step does can actually get quite long. I don't want you guys to have to watch me write a long one, so I'm going to cheat and copy it from the one that I've already written. What I've written here is that this is a setting that's going to allow the user to set negative values to zero, which was previously done automatically; again, we wanna make sure users know why we're making the changes we're making. And in the same way, I'll make a truncate_high setting. And again, as much as I can, I wanna avoid you guys watching me type, so I hope you'll forgive me copying and pasting stuff in.
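Putting that together, the two new settings might look roughly like this. The `Binary` class below is a stand-in for cellprofiler.setting.Binary so the sketch is self-contained, and the question text and doc strings are paraphrased, not the module's real wording:

```python
class Binary:
    """Stand-in for cellprofiler.setting.Binary (illustration only)."""
    def __init__(self, text, value, doc=""):
        self.text, self.value, self.doc = text, value, doc

# Defaults chosen to match what CellProfiler already did: values below zero
# were already being clipped (so default yes), values above one were not (default no).
truncate_low = Binary(
    text="Set output image pixels less than 0 equal to 0?",
    value=True,
    doc="Negative values were previously set to zero automatically; "
        "this setting makes that behavior explicit and optional.",
)
truncate_high = Binary(
    text="Set output image pixels greater than 1 equal to 1?",
    value=False,
    doc="Values greater than 1 were previously passed through unchanged.",
)
```

The asymmetric defaults (True for low, False for high) are the whole point: a pipeline that upgrades to the new version behaves exactly as it did before, unless the user opts in to the new clipping.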
The major thing I want you to notice here, though, is that here we've actually set the default behavior to False. Again, this is because we wanna hew as closely as possible to what CellProfiler was doing before. I also didn't happen to put in the text= and value= keywords here, but I can add them now. I'm gonna go ahead and save. If I look at my git diff to see what is different, I can say that I've incremented the version and I've added some new settings. So you can see now that it says there's one file that has been modified, inside CellProfiler's modules folder. And again, I wanna do this as often as I can, so that I have a clear record down the road of which code changes do which function; that means if a code change is wrong later, it's easier to track it down. Okay, I've created the settings, but I mentioned that there are three places settings always need to go. So let's look for those other two places: settings and visible_settings. And result, which previously contained the list of all the settings, is a list, so I'm gonna just put these in a list and add it onto the previous list. And for visible_settings: I always want these settings to be visible. There's nothing about these where, depending on whether I say yes or no, some other behavior should be shown or hidden downstream. So I want these to always be visible settings too, and I can actually just take the same exact line of code from here. Again, the less code you can write, the better. I'm committing a little more often than I usually would, a little bit of overkill, but I'd like to show you guys what I'm actually doing. Now, here's that prepare_settings function that I mentioned. The way CorrectIlluminationApply usually works (and it might be helpful for this if I actually open up a copy of CellProfiler, so let me open up my debugger and let it give me a copy) is that images are added in sets.
They're added with an image that needs to be corrected, an image that's doing the correcting, and an output image, and those three always need to be together. In general, when I want to make a new pipeline, I pretty much always just load the example pipeline and work from there, because that sets up my images and my input modules; then I delete whatever the module was doing before, because I don't care about that. It's a quick way to get a new project set up that I can play around in. So in order to run CorrectIlluminationApply, I'll need a CorrectIlluminationCalculate first, and I'll say that I want to use the original blue image to make something called IllumBlue, using the Regular function. Then, when I want to do my correction, I'm going to take the original blue, divide it by IllumBlue, and correct it. You can see here that my new settings already show up, since I've relaunched CellProfiler. But all of my previous settings came in sets of three, because if I hit "add another image" here, I can keep hitting "add another image" for as long as I want; I always get the same block of settings, so the total should always be a particular length. That's important to know, because here in the prepare_settings function, what it's saying is that it expects the number of settings to divide evenly by the number of settings per image, which is three, and that if it doesn't divide evenly, something wasn't set correctly and there's a problem. But we've now added two new settings, and two is not divisible by three. So we just want to let this check know that it should now expect two extra settings. Now I've properly told my settings function how this works, and I can confirm that by adding my new module here.
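That length check can be sketched as follows. This is a hedged stand-in for the logic in prepare_settings, not the real module code; the constant names are illustrative. After subtracting the fixed settings, whatever remains must divide evenly by the number of settings per image:

```python
SETTINGS_PER_IMAGE = 3   # input image, illumination function, output image
FIXED_SETTINGS = 2       # the two new truncate settings we just added

def check_setting_count(setting_values):
    # Each image block contributes exactly three settings. Anything left
    # over after removing the fixed settings must divide evenly, or the
    # pipeline file is malformed.
    variable = len(setting_values) - FIXED_SETTINGS
    if variable < SETTINGS_PER_IMAGE or variable % SETTINGS_PER_IMAGE != 0:
        raise ValueError("The number of settings is invalid")
    return variable // SETTINGS_PER_IMAGE  # number of image blocks
```

So a pipeline with two image blocks plus the two truncate settings has eight setting values, and the check passes because eight minus two divides evenly by three.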
If CellProfiler couldn't read my settings because something was wrong, it would tell me here that it couldn't load CorrectIlluminationApply. So far we're on the right track. I keep exiting my debugger wrong; live demos. So we've incorporated a setting now, but all that setting is doing is recording what the user's preferences are. Is the image actually going to be clipped to within zero to one or not? We haven't actually done anything to make the user's wishes happen. That's where we get to the run function. In this case, run is only a two-line function: run calls run_image for each image, so what we'll actually need to edit is run_image. I mentioned before that the output pixels were previously being set to zero if they were below zero, but that this was being done automatically whether the user wanted it or not. That's where that was; that's going away now. So after this, but before I save out my code, I want to add in, first of all, some notes to myself, and to future people who come across this work, saying that we want to optionally clip high and low values. Then, if truncate low has a value of true, which in Python you can shorten to just "if" the setting's value, we want to take the output pixels and use numpy's where function. This is slightly different from how the other code implemented it, but it's doing the exact same thing: wherever output_pixels was less than zero, we set it to zero, and everywhere else we just keep the previous value of output_pixels. We want an identical block mirroring that for our high end, where, only if the user told us to do it, we check where output_pixels was more than one, set it to one, and otherwise keep the value it had. So that's definitely a very important change; we want to make sure to commit here.
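The clipping itself is the heart of the change, and with numpy it's a one-liner per direction. Here's a sketch of the logic just described; the function and argument names are illustrative rather than the module's real ones:

```python
import numpy as np

def clip_output(output_pixels, truncate_low=True, truncate_high=False):
    # Optionally clip values that fall outside the 0-1 range. np.where
    # keeps the original value wherever the condition is False, so only
    # out-of-range pixels change.
    if truncate_low:
        output_pixels = np.where(output_pixels < 0, 0, output_pixels)
    if truncate_high:
        output_pixels = np.where(output_pixels > 1, 1, output_pixels)
    return output_pixels
```

With the defaults chosen above, calling `clip_output(pixels)` zeroes negative values and leaves values above one alone, which is exactly what the module did automatically before.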
So now, if I ever need to go back to this code later, I can see exactly which commit actually did the work of changing the functions, and that'll help me find them more easily rather than having to scan through all the code later. I'll check again to make sure I haven't made any silly errors. Sometimes I do, but CellProfiler isn't giving us any warnings, and it does allow me to add CorrectIlluminationApply to my pipeline. We can see here that the defaults are set the way we wanted: "less than zero" is set to yes and "greater than one" is set to no. The very last thing we want to do is teach upgrade_settings that we have new settings that future users might want. Typically you're always going to edit upgrade_settings at the bottom. You can see here that the original versions of CellProfiler were actually written in MATLAB, so we have some things about the transition from MATLAB; this is one of the very original modules of CellProfiler. The previous revision number was three, so we can just tell CellProfiler that if the previous number was three, the setting values should be a little bit longer than before, with the defaults appended in the order our settings appear in the module, which is first truncate low and then truncate high. That order is important, and it's the order in settings, as opposed to visible_settings or create_settings, that actually determines it. We truncate low and then truncate high, which we said are default true and default false, and so we've added here default true and default false. And that's it: we've now implemented that functionality completely, beginning to end. I'm going to commit one more time. I accidentally reused the same commit message, but it's fine. So let's push this now to the internet, because right now this change only exists on my computer.
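In sketch form, the upgrade step looks like this. This is a hedged illustration of the pattern, not the module's actual upgrade_settings; I'm assuming boolean settings are stored as "Yes"/"No" strings in the pipeline file, and the revision numbers are the ones mentioned above:

```python
def upgrade_settings(setting_values, variable_revision_number):
    # Pipelines saved at revision 3 predate the truncate settings, so we
    # append the old defaults in the same order they appear in settings():
    # truncate_low first ("Yes", the previous automatic behavior), then
    # truncate_high ("No", the new optional behavior).
    if variable_revision_number == 3:
        setting_values = setting_values + ["Yes", "No"]
        variable_revision_number = 4
    return setting_values, variable_revision_number
```

The payoff is that an old pipeline loads into the new module with exactly its old behavior, because the appended defaults reproduce what used to happen automatically.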
It needs to exist somewhere besides my computer if we want it to eventually make it into CellProfiler. So if I look now at my personal GitHub repository, or at the main CellProfiler repository, it shows that I've made a fork of CellProfiler very recently and asks: do I want to compare and make what's called a pull request? A pull request is a request to make changes. You can see here that we started from CellProfiler 3.1.9. If I tried to merge into master, it says it can't do it automatically, because those branches are now pretty different. But if I tried to merge into CellProfiler's 3.1.9 branch, it would be allowed to do that automatically, because it's a pretty small set of changes. And you can see here my commit messages, where I described, as I went, what I was doing and all of the different changes I made along the way. Again, like everything else, we want to have this documented and organized, and I can use some special functions here in GitHub: if I type the word "resolves" and reference the issue number, which I have handy right here because it was in our branch name, then if and when this branch containing these changes gets merged, it would actually close that issue. I'm not going to go through and merge today. A, because when you're doing this yourselves at home, you won't have the ability to merge right away; you'll have to wait for one of us to check that this is a change we want to make and that it's been implemented the way we want. But also just because I don't want to change CellProfiler 3.1.9 anymore; CellProfiler 3.1.9 is done. So I will remake this change in master sometime later, but we've now seen the entire lifecycle of going from an exactly standard copy of CellProfiler to submitting new changes, which is a good place to stop and see if there are any questions.
So questions are starting to arrive, and there's one about CellProfiler 4, asking if I can give an update on whether new modules will be available. There won't be very many new modules in CellProfiler 4; there will certainly be plenty of bug fixes, but most of the changes will be things like improved threading and the change from Python 2 to Python 3, so most users probably won't see very many differences looking at it day to day. That being said, there's a tremendous amount of work that went into it; it just won't look that different. We're excited, once CellProfiler 4 is out, to actually be able to go through and implement new functions in things like CellProfiler 4.1. The last thing I briefly want to mention is one thing that I didn't do when I changed my code in CorrectIlluminationApply: check whether the tests that CellProfiler has for all of these modules still work. These are the tests for this module; every module has tests. In general, because I've added new functionality, I should go through and write new tests, and I will when I make the real version of this, but at a minimum I want to check that the tests that already exist didn't break because of my changes. They shouldn't, because theoretically all of my defaults replicate what was happening before, but it's good to make sure. So I check that with a tool called pytest, and the good news is I haven't broken any tests. I haven't made the new tests that I should, and in the interest of time I think we'll move on, but since I added new functionality I should definitely write new tests. Now, what happens if you don't want to modify old CellProfiler functionality? What happens if you want to make brand new CellProfiler functionality all your own? Well, now we're into the case of CellProfiler plugins.
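A new test for this change might look something like the sketch below: a minimal pytest-style check (hypothetical names, self-contained stand-ins rather than the real module) that the defaults really do replicate the old automatic behavior:

```python
import numpy as np

def old_behavior(pixels):
    # What the module always did before: zero out negative pixels.
    return np.where(pixels < 0, 0, pixels)

def new_behavior(pixels, truncate_low=True, truncate_high=False):
    # The new optional clipping, with defaults matching the old behavior.
    if truncate_low:
        pixels = np.where(pixels < 0, 0, pixels)
    if truncate_high:
        pixels = np.where(pixels > 1, 1, pixels)
    return pixels

def test_defaults_match_old_behavior():
    # With default settings, old and new code must agree pixel for pixel.
    pixels = np.array([-0.2, 0.4, 1.3])
    assert np.array_equal(new_behavior(pixels), old_behavior(pixels))
```

pytest discovers any function named `test_*` automatically, so dropping a test like this into the module's existing test file is all it takes to have it run with the rest of the suite.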
CellProfiler plugins, as I mentioned before, live in their own entirely separate GitHub repository, and they contain things that are either really different from anything else CellProfiler does, such as this module for calling barcodes in optical pooled screening data, or things that require extra fancy versions of Python, or Python packages that aren't ordinarily included with CellProfiler. One of the first examples we had of that was the ClassifyPixels-Unet plugin, which is not part of CellProfiler because it requires the user to have deep learning software installed on their machine, and most people don't. We don't want to require most people to have deep learning software installed, because their machines might be too small to handle it readily, or too old, and running these networks on older machines is going to be slow. It would also make it harder for us to build CellProfiler and get it out to you. So this is a very cool module that runs a deep learning network, but it is not included in CellProfiler itself; you'll have to go out and get it. We have two other new deep learning modules that have been made just in the last few months: DoGNet and NucleAIzer are both based on collaborations between our group and other groups, where they run deep learning networks on various images inside CellProfiler, making it easy, once you've trained a network, to apply that network to a lot of different images using CellProfiler's built-in scaling capabilities. The modules that are in CellProfiler-plugins are a little bit different from CellProfiler proper, in that with CellProfiler, we go through and do a lot of maintenance; we make sure everything always works, and if there's a bug, we fix it.
CellProfiler plugins are contributed by the community, which means we will try to fix them when we can, but we don't guarantee that they are always going to work, and we might need the original authors to come back, or someone to help us get all of the different plugins running. They also might have less documentation, or in some of them displays might be missing, because they're not meant to be ready for prime time; they're not meant to be a real part of CellProfiler. That being said, if you think you've written a really good plugin and you think it actually belongs in CellProfiler, put it in the plugins repository first, let us play with it, and then let us know in the main CellProfiler repository that that's what you want to do, and we will happily take a look to see if we think it belongs in the main CellProfiler itself. Now, there are some functions and parts of modules that I didn't talk about before, when we were just editing a module that already had all this functionality in place, but that we should talk about a little more if you're going to be writing a module from scratch. You'll remember we covered settings and we covered run, but we didn't really talk about display or measurements. In a plugin, display is technically optional; I can admit that I have written some plugins that did not have displays. But you almost always want one, and certainly if you want something to eventually be promoted to the main CellProfiler project, you definitely want a display built into your plugin. Like with everything else we've talked about so far, we pre-prepare a lot of the functionality for this, so that you shouldn't have to write your own for a lot of different things.
If you want to put in a table or a scatter plot, or show a color image or a black-and-white image, all of those have pre-built functions, so you can pretty quickly write a display function that makes a CellProfiler-compatible display that will look familiar to users of CellProfiler. The thing we didn't touch at all in the CorrectIlluminationApply module was measurements, and that's because it's one of the rare modules that doesn't actually make any measurements. But most modules do, like I said, even if it's just a count of how many objects were found. There are four general functions that, if you're in a module that makes measurements, you will pretty much always need. get_measurement_columns is a function that helps any measurements you make inside an individual module end up in the exported data structure, whether that's a spreadsheet or a database. If you have get_measurement_columns set up correctly, your measurements will flow nicely into the database or the spreadsheet without you having to do anything else. If you don't have that function set up correctly, you can be making all the most wonderful measurements in the world, but they'll never show up in your output. The other three have to do with the interactive behavior in CellProfiler, specifically with several modules that, on a per-image or per-object basis, let you go through dropdowns to say: I want to use a particular measurement that already exists, for filtering an image to decide if I want it or not, or for filtering an object to see if I think it's a real cell or a piece of debris. In order to have this dropdown structure, we have get_categories, get_measurements, and then either get_measurement_images or get_measurement_objects, depending on whether it's an image or an object measurement.
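The four functions fit together roughly like this. The sketch below is illustrative, with hypothetical category, feature, and image names, and plain Python stand-ins rather than CellProfiler's real signatures; what matters is how the return values nest to drive the export table and the three dropdown levels:

```python
# Hypothetical names for a module that reports one per-image measurement.
CATEGORY = "Threshold"
FEATURE = "FinalThreshold"

def get_measurement_columns(image_name):
    # Rows of (object, feature name, data type) describing the exported
    # table; "Image" means a per-image rather than per-object measurement.
    return [("Image", f"{CATEGORY}_{FEATURE}_{image_name}", "float")]

def get_categories(object_name):
    # First dropdown level: which categories this module contributes.
    return [CATEGORY] if object_name == "Image" else []

def get_measurements(object_name, category):
    # Second level: features available within the chosen category.
    return [FEATURE] if category == CATEGORY else []

def get_measurement_images(object_name, category, measurement):
    # Third level: which images the chosen feature was measured on.
    if category == CATEGORY and measurement == FEATURE:
        return ["DNA"]
    return []
```

Notice how the same category and feature strings appear in all four functions; that shared vocabulary is why they're usually filled in together.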
It's slightly annoying that you have to fill in all four of these, but you generally fill them in all together; they share a lot of values, and they allow us to have this really interactive user experience that lets users who've already made hundreds or thousands of measurements avoid scrolling through all of them to find the one they care about, filtering instead to the right category, then the right general measurement, and down to the specific one that's right for them. Because it's pretty routine to make thousands of measurements in a CellProfiler pipeline, we want to make it easy for people to find the one they need. So let's go through now and actually make a CellProfiler module. The biggest part for me, whenever I'm thinking about a new CellProfiler module or new CellProfiler functionality, is just brainstorming about what it is that I want to make. Sometimes this comes from a project we're working on in the lab that we don't really have a way to get working, even with anything else in CellProfiler or any other piece of open-source image analysis software. In general, if you can't do something in CellProfiler but you can do it easily in ilastik or Fiji or QuPath, we're happy to just work in those pieces of software instead, but sometimes we want to keep things inside CellProfiler for reasons of convenience. Because I tend to come at it from a project angle, I tend to think very specifically about the behavior I want in CellProfiler and build it from that direction: what do I want the user to see, and what questions will the user answer, to get the exact behavior I want to see in CellProfiler. And the behavior I've chosen to add to CellProfiler this week is this idea of a "threshold all". This exists in other pieces of software, and again, in general, we don't want to totally reinvent the wheel.
So for example, here's ImageJ's implementation of a try-all-thresholds function, and here's scikit-image's. All of these pieces of software have their own thresholding algorithms, though. So if we think it's helpful for the user to be able to see not just the output of one kind of segmentation at a time but many kinds at once, we want to make sure we're showing them the kinds of thresholding that exist in our software. What I decided to do is implement a version of this inside CellProfiler that uses CellProfiler's existing thresholding algorithms, so I don't have to write any new algorithms, and just creates a pretty display like this that allows the user to easily pick between them. And again, for me, the first part of doing this, and of making the coding process as painless as possible, is just thinking and planning about what I wish could happen. I tend to start by imagining the CellProfiler module I wish existed. I don't usually go all the way to actually drawing something out, but I did in this case because, fortunately for you, you don't have to see the crazy things inside my brain. So what I want is a module that looks something like this. Now, this isn't exactly how the final one turned out, but it's pretty close. I want to have an input image: an image that I want to test different thresholds on. There are 10 different thresholding algorithms inside CellProfiler, and six of those are three different kinds of Otsu, done either globally across the whole image or locally across different chunks of the image. Now, this is actually one of the few modules that may change a lot in CellProfiler 4.
So everything I'm saying applies most particularly to CellProfiler 3. In CellProfiler 3, we have 10 different thresholding methods, and I've decided that four of them, the four we use most commonly in the lab, which are minimum cross-entropy and the three different global kinds of Otsu, will be run no matter what. Now, that's a choice. I could have literally made 10 yes/no buttons here for the users to decide which ones they want, but that makes the module really long, and in general it just makes life a little harder for the user. Beyond those first four thresholding settings that will always be tried, the user can choose to say yes to have three more kinds of Otsu run, with an adaptive window size that they set. What these arrows mean is that if this is set to no, this box will not appear, but if it's set to yes, the box appears. The same thing happens if they choose to use a manual threshold, if they want to try a measured threshold based on a previous measurement, selected using that dropdown functionality I mentioned, or if they want to try the robust background thresholding method, which works really well in some cases but has a lot of configurable parameters. These options down here, the smoothing scale, correction factor, and lower and upper bounds, are ones I've copied exactly from the Threshold module itself. I want to recreate, as closely as possible, the functionality we already have: A, so I have to write the least amount of code possible, and B, so that it's familiar to users, so that if they understand the Threshold module, they automatically understand the settings here. What I've also chosen is that one output image will come out of this module. The user can pick the threshold they think they're most likely to want and then choose to have that passed forward. Now again, this is also a choice.
This could have literally just been a module that made displays and nothing else, but in general, the user probably doesn't want to go through the process of trying this several times and then deleting it to add a new Threshold module; you want to avoid making them duplicate that work. So I've chosen to have it produce an output image going forward, but again, that's something where you could easily have decided not to. In terms of what the display will look like, this was my initial prototype: we have our input image here, and in the minimum case, where we have four threshold settings, we have the four that always run, with the reported-back threshold for each. We also have a histogram of the pixel intensities from the image, with a red line indicating, for the thresholded image that's passed forward, what threshold was picked, as well as dotted lines showing the lower and upper bounds. With those settings, the user can set a gate that thresholds must always fall inside, and so it might be interesting for the user to see whether the threshold is actually falling inside the gate or being pushed in from the edge. I've chosen to add that because it's something I've wished existed before. In the maximum case, the major layout of the output window is the same, but in my idealized version, before I actually started coding, it looked like this on the right instead. And what we ended up with is not so different from the actual function that I wrote. So if I do git status in my CellProfiler-plugins repository, which I forked and cloned in the exact same way I did the CellProfiler one, I'm on branch threshold_all, which is the one I made this function on. And I put this inside a working directory just so that I could make changes to it little by little and only be checking in the changes to it.
I also made a CellProfiler project file that I probably won't keep in the final repository, but it made it easy for me, as I kept adding or changing functionality, to quickly load up a pipeline to see if my functionality worked or not. Now, where did I start? Where I actually started was inside CellProfiler, where inside the modules folder we have a plugins folder, and that plugins folder contains two template modules that you can follow to create your own. The measurement template gives an example of a module that takes images or objects in and puts measurements out. You can go through and look at these later, so I won't worry that I'm scrolling through a little quickly, but it gives an example of a measurement someone might want to make, and shows how to, for example, configure all of those get_measurement_columns and get_categories functions I mentioned previously. That's a really helpful template to have if you're creating your own measurement module. But what I started from was the image template, because what I want to do here is ultimately an image processing step: I want to start from an image that goes in, and eventually have an image that goes out, and what happens in the middle is up to my run function. That's exactly what an image processing module does, so I started from this. Now, in the interest of you not having to watch me spend eight hours coding this thing, which is about how long it took to make completely from start to finish, I'm going to take advantage of the fact that in git I can jump back to previous commits, because I was committing as I went. I can go back to previous points in the code and show you how this evolved along the way, and I can do that just by typing git checkout and the commit hash. You can see now my file has changed a lot. So what does my module look like to start? The way a CellProfiler module is actually set up is that we start by importing some things.
These are other CellProfiler modules I thought might be helpful, and specifically, since most of our functionality is coming from the Threshold module, I thought the Threshold module might be really helpful to have. I made a list of all of the different thresholds I might want to try, because I'm going to iterate through it a few times throughout the code, so I want to have it on hand and not have to redeclare it every time. And where I started, where I always start, is by thinking of the concept I wanted, and then I wrote the documentation. Some people choose to write the documentation last, but for me it helps get my thought process organized: what do I really want as input? What do I really want as output? Under technical notes, I've mentioned what it was inspired by; I always want to attribute where my ideas came from. I also say that all of the thresholding is actually implemented from the Threshold module, so if someone has questions about that code or about the things implemented here, they know where else to look. It also notes that this module supports 2D and respects masks, but doesn't support 3D. Thresholding in general supports 3D just fine, but the module display looks different in 3D versus 2D, so I chose for now to only support the 2D displays. I tend to write a module literally top to bottom, in that, again, I go through those functions I mentioned: make the documentation and then make the settings. For my settings, I started from the image template. So this is a CellProfiler ImageProcessing module, inheriting all the special things that come from ImageProcessing, and all of this is documentation that already existed in the original template module; I haven't added any of it, but it's very nicely documented to start. And what I started by doing was literally have my chart here side by side with my code.
I went through, and the first thing is the input image name, which in image processing modules is typically referred to as x. Then I went, sort of line by line, through the fake module that I diagrammed here, and wrote down what I wanted to do. It asks whether or not you want to try a particular setting, and it shows the setting that would pop into existence once you've decided you want to try it. Line by line, I just went through the module diagram and implemented each and every one of those settings. When I got down here, where some of the settings are very complicated and have a lot of extra technical information that I didn't want to rewrite from scratch, some of these I literally just copied directly from the Threshold module. So there are a lot of settings here, because we have a lot of potential things to try, but again, I put them in just by following the order they existed in the Threshold module and in my diagram. And here's where I stopped: I stopped after implementing just the settings choices, and decided that was a good place to commit, because there were already a lot of them. Let's fast-forward in time a bit; I'm now at the first version that loads. So if I open my CellProfiler project that I made to test this along the way, which of course is being cranky, because live demo, it looks like that might not have made it into this commit, so I'll go ahead and skip to the next one. What has changed in this commit since the last one, and this one loads correctly at least, is that my settings are now put into settings and visible_settings. My run function is literally just "pass"; my module doesn't do anything, but I could at least at this point check whether it loads in CellProfiler nicely.
I then went in and implemented just a couple of the different thresholding methods, with some print statements on them. They're still not writing anything permanent, but they at least show me whether or not I can get output. The display eye icons are closed here because I haven't written my display function yet; I'm starting just by seeing if I can get my run function working. Down here in my terminal, I can see that my print statement is actually printing something out, meaning it is actually processing the threshold. I could go in and add a Threshold module to see if it matches, but I'll go ahead and tell you that it does. In the interest of time, I'm not going to open each individual commit up in CellProfiler, but to show you what's changed now: my run function not only prints things out, though it does still have a print statement there, it now has all of my different thresholdings implemented, not just one or two, and it actually puts them all into a dictionary and runs each one when we want it, after checking whether or not each individual one should run. Fast forward a couple of hours later, and after all of my run functions, I've added in the ability to make all of my measurements. All of these commits are online; you can look at them yourself, or just check out the whole branch to see, beginning to end, what changed. And finally, I have a version of this that does more or less what I wanted. If I load my debugger one final time, she said, jinxing herself, you can see my test pipeline, which has gotten a little longer as things have gained functionality, because I wanted to test, for example, that my measurements could feed into one of these downstream modules and that they also properly went into my spreadsheet.
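The run-everything-from-a-dictionary pattern might look like the sketch below. The threshold functions here are simple stand-ins (mean, median, a fixed manual value), not CellProfiler's real algorithms, and all the names are hypothetical; the point is the shape of the logic, where required methods always run and optional ones run only if their setting is enabled:

```python
import numpy as np

# Stand-in threshold functions keyed by method name. The four methods that
# always run, and the optional ones, would each map to a real CellProfiler
# thresholding call in the actual module.
ALWAYS_RUN = {
    "mean": lambda img: float(img.mean()),
    "median": lambda img: float(np.median(img)),
}
OPTIONAL = {
    "manual": lambda img: 0.5,  # a user-supplied manual threshold
}

def run_all_thresholds(image, enabled_optional=()):
    results = {}
    for name, func in ALWAYS_RUN.items():
        results[name] = func(image)
    for name, func in OPTIONAL.items():
        if name in enabled_optional:  # respect the yes/no Binary settings
            results[name] = func(image)
    # A binary mask per computed threshold, ready to hand to the display.
    masks = {name: image > t for name, t in results.items()}
    return results, masks
```

Keeping the methods in dictionaries means the display code can simply loop over whatever was computed, which is what makes a 4-panel and a 10-panel layout come from the same code path.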
This isn't the most beautiful module display I've ever made, but for about a day's worth of work, it does what I hoped it would do, which is let me run 10 different thresholdings all at once and then look at the output. This looks best if we maximize the whole screen, but again, this is a module that's designed to be used interactively. We have our original and our thresholded image, and I can zoom in on a particular region of those. The one that's up here is the one we said we think we want to use as final; it's indicated by the red line here on the image's histogram, with the dotted lines representing the limits. My panels down here are a little smaller than I would like, but even with 10 of them they're big enough to get the basic idea, and while I can't zoom in on them in the same way I can up here, I can zoom in on them one at a time if I want a better sense of what's going on in each one. And that's it. That plugin has now also been committed online, and what I can do right now is create a pull request. Again, you can see the 15 steps along the way; this now exists in the CellProfiler-plugins repository, and you can go in and check out what looks different at each of those individual steps. The last thing I want to say is thank you to all of the Carpenter Lab members, especially Alan and David, for being on the call today. I really want to thank NEUBIAS Academy for inviting us to come and do this. And CZI and COBA are our major funding sources right now for the lab. COBA particularly is doing a lot of work with open image analysis; I definitely recommend you check us out. We're tagged on the forum, we're doing office hours this Friday for CellProfiler, and a lot of other cool stuff, so please do check out COBA. Thank you. We have time for a few last questions, so in the meantime participants can submit other questions.
I want to say that Alan and David have done a really good job answering everything, which is why there aren't many questions left. But I can ask a personal one, if you allow me. We underlined the importance of having a way to report on the image analysis we have done: which parameters we used, which values for thresholds, and so on. Is there any tool in CellProfiler that allows automatic reporting, or are you thinking of implementing one? Yeah, so in the ExportToSpreadsheet or ExportToDatabase module, if you're writing any measurements out at all, it creates an experiment spreadsheet or table, and it will tell you what CellProfiler version, what modification of the pipeline, and what time it was run, and it actually lists the whole pipeline: every module that you ran and the value of every setting in that module. So as long as you don't turn off exporting of that experiment spreadsheet or table, versioning is built right in. Okay, thank you. Somebody asked if you can share the link to this threshold plugin, because it looked really interesting to people, also to see the effect of different operations on the same image. I want to thank you again; we are receiving really positive feedback. So thanks for being here. Thanks Beth, thanks Alan and David, Anna Klenn and Julian, and thank you all for participating. Thank you to all the moderators. I've been calling out my lab mates, but again, without the folks from NEUBIAS, none of this would be possible, so we really appreciate it.