Welcome to day five of the Defragmentation Training School. Today we will have a lesson about benchmarking theory from Professor Michal Kozubek of Masaryk University. We will then see BIAFLOWS, a benchmarking platform, explained by Sébastien Tosi, head of the BioImage Analysis Facility of the Danish BioImaging Infrastructure, Volker Baecker, image analyst and developer at Montpellier BioCampus, and Benjamin Pavie from the VIB BioImaging Core in Leuven. Thank you. Could we please ask Professor Kozubek to share his screen?

Good afternoon everybody. It is a pleasure for me to participate in this training school of the NEUBIAS Academy, so thanks for the invitation. I will start with the more theoretical part and my colleagues will continue in a more practical way. I will introduce some theoretical background about metrics and benchmarking related to bioimage analysis. I am from Masaryk University, as was mentioned, and I am heading a unit in the Centre for Biomedical Image Analysis. First of all, I would like to talk about motivation: why do we do benchmarking? Then I will talk about the design of a benchmark or a challenge. Challenge is another name for a competition in the biomedical imaging or bioimaging field, so a challenge is actually also a benchmark, but with some sort of competition in the community. In this respect I will talk about data and dataset selection, about the evaluation of algorithms and about metrics, and then I will summarize.

So let's start with benchmarking. Why do we do benchmarking? Historically the tendency was to use very simple methods; it was not possible to compare methods to each other, and any reasonable solution was applied for a long time. Then, at the end of the last century, people started thinking about comparing methods to each other and realized the lack of standards, but it was very difficult because there was no internet. There were only some FTP sites with sets of standardized images, but there was no correct answer, which was later called ground truth. Only after the introduction of the internet and the web did we have standardized datasets along with the correct answer that we call ground truth, and such annotated images became the basis for proper, objective benchmarking. In biomedical imaging, the first benchmark was a registration benchmark, organized in the 1990s at Vanderbilt University. At the beginning of this century the medical imaging community started to produce benchmarks and competitions called challenges, and approximately one decade later the bioimaging community also started to produce such benchmarks and challenges. This also became supported by different organizations and bodies, so there was a real hunger for standardized protocols, quality control, validated tools and so on. These competitions called challenges started to appear, and they were especially associated with the two most important conferences in the biomedical imaging community, MICCAI and ISBI; if we look just at bioimage analysis challenges, they were mainly at ISBI. Concerning the task, the most common task that people had to solve while competing in a challenge or using a benchmark was segmentation. Segmentation was the most common task tackled by the community, in second place the classification task, in third place the detection task, and then tracking and some others.
Concerning modalities, in biomedical imaging in general the first place was magnetic resonance, and concerning bioimaging, the microscopy modality was in fourth place. So that was a bit of history and motivation, and now let's switch to the design of the benchmark itself. The first part of the design is the data, the data selection. Here we need to take care about the representativeness of the data, so cover the variability of the imaged objects; then we should decide on using real or synthetic data or both; we should annotate the data; and we should split the data into training and test data. Training data are used for training our methods and test data are used for testing the performance. Sometimes the test data are hidden from the participants and only codes are submitted, which are then evaluated on the data by the organizers of the challenge. Algorithm evaluation is usually done by computing one or more metrics, and these metrics are measures of the performance on the task, be it classification, segmentation or whatever. If we compute multiple metrics, then we have to merge them somehow and create rankings based on those metrics. Finally, it is ideal if the benchmark or challenge is kept for future use, is somehow maintained or even updated, and it remains possible to submit new methods afterwards.

If I now speak more closely about these individual aspects: concerning the representativeness of the data, we need to cover the variability of sizes, shapes, textures, densities, or speeds of objects in time-lapse. Then we should cover various events: if we are interested in mitotic or apoptotic events, we should have enough of those in the time-lapse data. We should also cover the artifacts, so we should not present participants with data that are clean, dust-free and noise-free; we should present the data as they appear under normal acquisition conditions, including those artifacts, because we need methods that can cope with these artifacts and with a certain level of noise, blur and so on. Then we should keep the natural proportions of different objects and phenomena, and we should think about real cases. Only if we have some very rare events should we somehow increase their frequency of occurrence, otherwise the methods will not work well for real events or real types of objects. Sometimes datasets or benchmarks already exist for certain applications, so it is of course worth having a look at existing benchmarks first; only if they are not comprehensive enough, or something is insufficient, or the task we need to solve is different, is it worth creating additional benchmarks or datasets. There are quite a lot of benchmarks already around, and a lot of benchmark datasets, so it is worth having a look at those.

The next aspect is real versus synthetic data. Of course both types have advantages and disadvantages. The advantage of real data is that they represent real data best of all, while synthetic data can have somewhat different properties, not necessarily the same statistical properties as the real data; on the other hand, high-quality synthetic data generators can simulate real data quite well and provide training data to develop methods. The advantage of synthetic data is that the ground truth is known precisely and inherently, while for real data we need annotators and we need to annotate those data. Typically, the synthetic data for bioimaging are generated in several phases.
Usually a digital phantom of the objects is generated first, then the microscopy process is simulated, so some blur is introduced, and then the detection by the camera or PMT is simulated and some noise is introduced. Alternatively, nowadays we can also use so-called GANs, generative adversarial networks, to produce similar images, or use augmentation of real datasets. We can also produce synthetic time-lapse sequences just by generating one frame after another using a moving digital phantom.

Now let's switch to annotations. We do not need to annotate synthetic data, but we need to annotate real data, and usually it is done by asking several experts, because one expert is not enough due to some level of uncertainty: each expert produces a slightly different result. So it is best to hire at least three experts, produce three annotations and then combine them, for example by majority voting. If two experts agree that there is a cell at a certain place and a certain time point, then we say that in the final ground truth there is a cell, and if for example two experts say that there is a division at a particular time point, then in the final ground truth there will be a division. Sometimes, if we need just some numbers, like the number of objects, it is enough to average the values entered by different annotators, or to exclude the minimum and maximum, so exclude the extreme values, and then average. You can also encounter the terms gold standard and silver standard instead of ground truth, because some people use the term ground truth only for synthetic data where the correct answer is known precisely, while human annotations are sometimes called gold standard annotations or a gold standard corpus. In addition, we can also create computer-generated annotations by taking the best methods that we have and somehow combining their output; this is called silver standard annotations, produced by computers. Of course silver is worse than gold, but we can generate many more silver annotations than gold annotations, so sometimes these silver annotations, generated by computers or by combining the best methods, are used as well, or they are used as a first step and then curated by humans.

And finally, concerning data, we need to split them between training and test. As I mentioned, test data are sometimes kept completely secret, or at least the ground truth for the test data must be kept secret if we organize a competition. Statisticians call this the hold-out method: we just take some part of the data and keep it for evaluation. Usually we take the majority of the data for training and the minority for testing; in practice sometimes even a 50-50 ratio is used, but it is better to take more for training and less for testing. And of course, when dividing between training and test, we should keep the properties, so both training and test data should cover the variability of events and of data properties.

Now, once we have the data, we can think about evaluation, about measures often called metrics. The goal of these measures or metrics is to evaluate the algorithm performance by quantitatively comparing the output that the algorithms produce with the correct answer, with the ground truth. So, for example, we compare output cell masks with ground truth masks.
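To make the simulation phases a bit more concrete, here is a minimal sketch of how a synthetic fluorescence-like image could be produced from a digital phantom: binary objects are blurred with a Gaussian approximation of the PSF, then Poisson shot noise and Gaussian read-out noise are added. This is not any particular published generator, and all parameter values (object radii, intensities, sigma, noise levels) are arbitrary illustrations.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# Phase 1: digital phantom -- a few disc-shaped "cells" on a dark background.
phantom = np.zeros((256, 256), dtype=float)
yy, xx = np.mgrid[0:256, 0:256]
for cx, cy, r in [(60, 80, 18), (150, 60, 22), (190, 180, 15), (90, 200, 20)]:
    phantom[(xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2] = 1.0

# The phantom itself is the inherently known ground-truth mask.
ground_truth = phantom > 0

# Phase 2: simulate the optics -- blur with a Gaussian approximation of the PSF.
blurred = gaussian_filter(phantom * 1000.0 + 50.0, sigma=2.0)  # 50 = background level

# Phase 3: simulate the detector -- Poisson shot noise plus Gaussian read-out noise.
noisy = rng.poisson(blurred).astype(float) + rng.normal(0.0, 5.0, blurred.shape)
synthetic_image = np.clip(noisy, 0, None).astype(np.uint16)
```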
There are basically three levels at which we can work: we can think at the pixel level, so we do pixel classification, which is also called semantic segmentation; we can work at the object level, which is the classical object detection or instance detection task, sometimes also accompanied by classification; and finally we can think at the image level and classify whole images into several classes. If we use just two classes for classification it is called binary classification; if we use more than two classes it is called general or multi-class classification. As you see, the most frequent task is classification, and even semantic segmentation is a kind of classification. Then there is a special category of tasks that is worth attention, and that is instance segmentation, which is actually a combination of two tasks: one task is to detect the objects or instances, for example cells, and the other task is to do a pixel-wise segmentation of each object. These are two things in one that ideally deserve separate evaluation, but I will come to that later.

Okay, so as I mentioned, everything or nearly everything is classification, so how do we approach classification? Let's start with the easy binary case. We have just two classes; let's say we have some infected and non-infected cells, the left part of this image are the infected cells and the right part are the non-infected cells, and the goal is to find the infected cells. Let's say the computer finds the objects within this circle. The green ones are called true positives; this is what the computer was expected to find. The cells that were not detected but should have been detected are called false negatives. Then we have objects that were detected but should not have been detected; those are false positives. And finally we have objects that were not detected and should not have been detected; those are true negatives. For detection, as opposed to classification, the difference is that we do not have true negatives, because for detection we just detect objects, we do not classify everything else, so we have only true positives, false positives and false negatives.

And now, people have defined different ratios with different names. There is precision, where we compare true positives to all detected positives, so true positives divided by true positives plus false positives; this says how precise the method is, how many of the reported positives are actually correct. Then we have sensitivity, also called recall, which is true positives divided by true positives plus false negatives; this says how many out of all the wanted objects we have found. Then there is specificity, which is analogously related to true negatives, and finally accuracy, where we compute the ratio of true positives plus true negatives over all objects, so everything that was classified correctly divided by the total number of objects. And finally, especially for detection where we do not have true negatives, the F1 score, also called F-score, is very popular; it is defined as the harmonic mean of precision and recall, so it can also be expressed in terms of true positives, false positives and false negatives. This is a very popular metric. So this was the binary case.
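As a small illustration of the binary ratios just described, here is a minimal sketch computing precision, recall, specificity, accuracy and F1 from the four counts; the counts in the example are toy numbers, not from any real benchmark.

```python
def precision(tp, fp):          # fraction of reported positives that are correct
    return tp / (tp + fp)

def recall(tp, fn):             # a.k.a. sensitivity: fraction of wanted objects found
    return tp / (tp + fn)

def specificity(tn, fp):        # fraction of negatives correctly left undetected
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):   # everything classified correctly over everything
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, fp, fn):             # harmonic mean of precision and recall
    return 2 * tp / (2 * tp + fp + fn)

# Toy example: 40 infected cells found, 5 missed, 10 healthy cells wrongly flagged, 45 left alone.
tp, fn, fp, tn = 40, 5, 10, 45
print(precision(tp, fp), recall(tp, fn), specificity(tn, fp),
      accuracy(tp, tn, fp, fn), f1(tp, fp, fn))
```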
Now, if we go to the general case where we have three or even more classes, again the easiest or most straightforward metric is accuracy, which means I count all the objects that have been classified correctly, the green cells in this table. This table, by the way, is called a confusion matrix: on one axis I have the actual class, on the other axis the predicted class, so the diagonal contains the correct answers and everything else is a wrong answer. Accuracy means I compute the number of correct answers divided by the number of all answers. If, in addition to accuracy, we want to compute precision, recall or F1 score for multiple classes, there are multiple approaches. We can do it per class, so compute precision and recall per class and then compute, for example, the average or a weighted average, where the weights are the frequencies of occurrence of the individual classes. But the most common way is to consider everything at once, to compute a sort of micro precision and recall: I take the correctly classified objects, the true positives, divided by the true positives plus all false negatives, which again means the green cells divided by all. For multi-class problems this micro precision and recall are actually the same as accuracy and the same as F1, so for multi-class we usually speak about just one metric, accuracy, and we do not compute precision, recall or F1 score except in some special cases.

Then there is the dependence of precision and recall: these two metrics are mutually dependent, and we can even plot the dependence if we are able to set different recall levels. For modern neural networks, for example, we have probabilities as outputs, so we can set different thresholds on those probabilities and compute different recall values, and if we plot this precision-recall curve, then we can take the area under the curve, which is a famous measure called average precision. By the way, all the values that I have spoken about so far range from 0 to 1, where 0 is the worst value and 1 is the best value, so on these axes recall and precision go from 0 to 1, and the area under the curve also goes from 0 to 1. If we have multiple classes, we additionally average over the classes, and this is usually called mean average precision. Values of, let's say, 0.9 to 1 are usually considered good results for all these metrics; 0.6 to 0.9 is somehow tolerable or acceptable; and below roughly 0.6 it is not very good. Speaking about precision and recall, sometimes we might want to favor one of them; we will talk about that in a moment. Alternatively, we can look at another curve instead of precision-recall: sometimes sensitivity versus one minus specificity is plotted. This is usually done in the case where we have true negatives as well, so not a detection but a classical classification problem, and then we again compute the area under the curve; it is called AUC-ROC, the area under the receiver operating characteristic curve. So this is another statistical tool that we can use.

Now, how to favor precision or recall? There is a special metric called the F-beta score, and here you see the definition. False negatives have weight beta squared and false positives have weight one, so beta regulates the weighting between the importance of false negatives and false positives, and thus the relative importance of precision and recall. For beta equal to one this is the classical F1 score, where false negatives and false positives are equally important; for beta less than one precision is favored, so there is a high penalty for false positives; and for beta greater than one recall is favored.
For example, in medicine we might prefer to find all cancer cells, even at the cost of reporting some additional cells that are not actually cancer, so false positives. In time-lapse imaging we might also prefer to detect all objects, even if some additional objects that are not really there are produced as false positives. So if one needs to favor recall or precision, one can use the F-beta score.

So this was classification in general. Now let's switch to segmentation, and by segmentation I mean here thinking at the pixel level. We compute some masks for objects and we have masks for the correct answers, the ground truth masks. Let's say set A is the ground truth and set B is what the algorithm has produced. The common approach to measure the performance of the algorithm is to look at the intersection of the detected object and the ground truth object, and there are two standardized ratios. One ratio is the intersection over union, IoU, sometimes called the Jaccard similarity index. The other ratio is two times the intersection divided by the sum of the sizes of the two objects, which is actually the intersection divided by the average of those two sizes; this is the pixel-level F1 score, alternatively called the Dice similarity coefficient. Both these ratios yield the same ranking of the methods, so there is actually not much sense in computing both of them; it is just a matter of preference. For example, the medical or biomedical imaging community usually prefers Dice, whereas the computer vision community usually prefers IoU. Both ratios are also normalized from zero to one, worst to best.

Now it becomes a bit tricky if we have multiple objects in the image. If we deal with, for example, multiple cells in the image, then the correct approach is to treat each cell separately when we measure these ratios, and we need to pair the found objects with the ground truth objects; for each pair we can compute the ratio. When pairing, we can match the objects based on an IoU criterion, IoU larger than 0.5, which is common in the computer vision community. Alternatively, in cell imaging it may be even better to use not intersection over union but intersection over the size of the reference object, so the criterion that the intersection is larger than 0.5 times the size of the reference object. This has the advantage that it also counts non-split objects, which is a common mistake of segmentation algorithms in cell imaging: the algorithm produces one object instead of two. The classical computer vision approach would simply ignore this result, while the alternative approach still counts it to some extent. And then we can compute the average of this ratio over all matched objects.
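As a concrete illustration of the two overlap ratios and the two pairing criteria just described, here is a minimal sketch assuming two binary NumPy masks (one ground truth object and one detected object); the 0.5 thresholds follow the conventions mentioned above.

```python
import numpy as np

def iou(a, b):
    """Intersection over union (Jaccard index) of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

def dice(a, b):
    """Dice similarity coefficient: 2 * intersection over the sum of the two sizes."""
    inter = np.logical_and(a, b).sum()
    return 2 * inter / (a.sum() + b.sum())

def is_match(reference, candidate, criterion="iou"):
    """The two pairing criteria mentioned above: IoU > 0.5 (computer vision convention)
    or intersection > 0.5 * size of the reference object (cell imaging alternative)."""
    if criterion == "iou":
        return iou(reference, candidate) > 0.5
    inter = np.logical_and(reference, candidate).sum()
    return inter > 0.5 * reference.sum()
```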
So this was pixel-wise segmentation evaluation. Now a few more words about instance segmentation, because as I mentioned it is a problematic thing: it is two in one, detection and pixel-wise segmentation together. One wrong approach to instance segmentation evaluation is to compute just Dice or just IoU over the whole image, because we have multiple objects there and we are interested in evaluating each of them separately. Another wrong approach is to compute just an object detection metric for one fixed IoU value, because then we evaluate only the detection part and not the segmentation part of the instance segmentation. The correct approach is to use one metric for object detection and one metric for pixel-wise segmentation per object, and then average over the objects, as I mentioned on the previous slide. Alternatively, what is sometimes used is a single metric that combines both: we basically use an object detection metric like the F1 score, AP or whatever, but we compute it for different IoU thresholds with a certain step, for example 0.95, 0.9, 0.85, down to 0.5, and then we average over these IoU thresholds. As an example, the famous 2018 Kaggle Data Science Bowl nuclei segmentation challenge computed the F1 score for these different IoU thresholds and then averaged. So instance segmentation is probably the most complicated task to evaluate of all of these.

In addition to overlap-based metrics we also have shape-based metrics. The classical representative of the shape-based metrics is the Hausdorff distance, which is, let's say, the longest of the shortest distances between two boundaries. We have the reference boundary and the boundary found by the algorithm, and we are interested in the distances between these boundaries: first we compute the largest distance, say from the green boundary to the blue boundary, and then vice versa. It is not the same, because the largest distance from one curve to the other is not the same as in the opposite direction, so we have to compute both directions and then usually the maximum is taken. This is basically the largest error in the boundary displacement with respect to the ground truth. The disadvantage of the Hausdorff distance is that it is very sensitive to outliers, so if there is just one protrusion somewhere along the boundary, it may spoil the whole result. The alternatives that are usually used and preferred over the classical Hausdorff distance are variants that suppress the extreme values of all the computed distances: we compute, for example, all those distances along the boundaries with a certain step, so we have many values, then we disregard, let's say, 5% of the extremes and compute the 95th percentile of all the distance values. This is less sensitive to outliers, to extreme distances that are very rare. Alternatively, we can compute the average of all those distances. So this is just to tell you that we also have shape-based metrics.

Concerning simple measurements like position, length, or size and area, what is usually done is to measure the root mean square error, which is just the square root of the sum of squared distances; it can be measured in 2D or in 3D. This is the classical metric for position or length measurements. And finally, for comparing signals, like 1D signals or even images, we can also compute correlations. If we need to compare one image to another pixel-wise, to see whether they are similar, then correlation is used. Usually it is normalized in the way you see here, but due to time requirements sometimes the denominator is omitted and only the numerator is computed; then it is called cross-correlation without normalization. Alternatively, if we have two 1D signals, one measured and one reference, we can plot a scatter plot of one signal against the other. If they are similar, the experimental points lie along a straight line; if they are not similar, the plot will be somewhat random, or there can even be a negative correlation, in which case the points go the other way around, not from bottom-left to top-right but from top-left to bottom-right. So this is standard correlation, the correlation coefficient, which ranges from one to minus one.
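Coming back to the combined instance-segmentation score described above (an object-detection score computed at several IoU thresholds and then averaged), here is a minimal sketch assuming two integer label masks as NumPy arrays. It uses a simple greedy matching for readability; the actual challenge or BIAFLOWS implementations may match objects and aggregate scores slightly differently, so this is illustrative only.

```python
import numpy as np

def f1_at_threshold(gt, pred, thr):
    """Greedily match predicted objects to ground-truth objects; a pair counts as a
    true positive only if its IoU reaches the given threshold."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    matched_gt, tp = set(), 0
    for j in pred_ids:
        pj = pred == j
        best_iou, best_i = 0.0, None
        for i in gt_ids:
            if i in matched_gt:
                continue
            gi = gt == i
            inter = np.logical_and(pj, gi).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(pj, gi).sum()
            if iou > best_iou:
                best_iou, best_i = iou, i
        if best_i is not None and best_iou >= thr:
            matched_gt.add(best_i)
            tp += 1
    fp = len(pred_ids) - tp          # detected objects with no good match
    fn = len(gt_ids) - tp            # ground-truth objects that were missed
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0

def mean_f1_over_thresholds(gt, pred, thresholds=np.arange(0.5, 1.0, 0.05)):
    """Average the per-threshold F1 scores, e.g. over IoU = 0.5, 0.55, ..., 0.95."""
    return float(np.mean([f1_at_threshold(gt, pred, t) for t in thresholds]))
```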
Okay, so this is the summary of what I have talked about concerning metrics: classification, segmentation, shape similarity and so on. In addition, we can also measure speed, and memory consumption might also be interesting, as additional information, perhaps not to rank the methods but as extra information. If we have multiple metrics, we have to merge them, but it is always interesting, for example in papers, to also publish them separately, not only after merging. If we want to merge them, for example to determine the winners of a competition, then we should assign weights to the individual metrics and compute, for example, a weighted average of the multiple metrics, ideally after normalizing all metrics to the interval zero to one. Many metrics are naturally normalized from zero to one; other metrics, like the Hausdorff distance, are not, and we have to normalize them. And if we merge multiple metrics, we have to pay special attention to whether they are mutually dependent: it is not good, for example, to take a weighted average of precision and recall; if we have precision and recall, or sensitivity versus specificity, it is better to use an area-under-the-curve measurement if possible.

Okay, so that was about metrics, and now just a few words as a summary and about existing benchmarks. This is once again what I have spoken about: the representativeness, the annotations, splitting the data. It is always nice to also provide the tools that compute the measures to the participants of the benchmark or challenge, because if they program them themselves, they might make an error in the metric computation and it would spoil their participation in the competition. And of course we should ideally do all of this in an open science way, so release the datasets, training data and tools somewhere on the web, except for the test data ground truth, or even the test data themselves.

Concerning benchmarks, there are multiple benchmarks around that are not associated with any competition or challenge. Probably the most famous one is the Broad Bioimage Benchmark Collection, BBBC, at the Broad Institute, but there are some others. The BBBC has been running since 2008 and they have quite a large collection of annotated images, both simulated and real, mostly real, and they can provide annotations for them. At our university we have a database of synthetic images of cells and tissues at different SNR levels, densities, clustering probabilities and so on. And probably the most important resource is the list of all challenges in biomedical imaging, including bioimaging, at grand-challenge.org, which is maintained by Bram van Ginneken from the Netherlands; there you can find several hundreds of challenges that have been run to date, with links to the web pages, datasets and so on. Concerning bioimaging, these are the earliest challenges up to 2015; you can see that already the initial challenges were quite diverse in their topics, different areas were treated, and many of them have been repeated multiple times. We ourselves have been running one of them for over 10 years, and that is the Cell Tracking Challenge. So finally a few words about it.
So, what we do here at Masaryk University: we have been taking care of the Cell Tracking Challenge since 2015, a challenge that provides datasets of time-lapse sequences of cells of different types. There are two benchmarks, one for segmentation and one for tracking. Here are examples of the images, and here is the challenge history: we have had fixed-date competitions at different ISBIs, already six of them, but we also evaluate submissions continuously, every month. Here are some examples of the datasets from the first, second and third editions; we also included a very difficult Drosophila embryogenesis dataset from light-sheet microscopy, and in the fourth and fifth editions an even more complicated embryogenesis dataset of a beetle called Tribolium castaneum. And finally some other modalities like DIC; we also have phase contrast, brightfield and so on. We had to develop one special metric, because for segmentation we used the Jaccard index, IoU, but for tracking there was no metric around, so we had to develop a new metric that compares the reference graph with the participant's graph, because the result of tracking is a lineage tree. So we had to compare trees, or more precisely forests, many trees, to each other; we invented a metric comparing these trees, published it, and use it in the Cell Tracking Challenge. Basically, we compute the number of errors, the operations needed to transform one tree into the other in the easiest way. And these are the organizers of our challenge.

So this was just an example of what we have been taking care of here for 10 years, and here are the final slides with some references. In a large international collaboration we have published two papers about metrics, focused on the pixel level, object level as well as image level, so on object detection, semantic segmentation and instance segmentation tasks; we analyzed their pitfalls and recommended which metrics to use, or not to use, in which situations. Then I wrote a book chapter on benchmarking and challenges some time ago, but the main principles are still valid. I already mentioned the very useful database of challenges at grand-challenge.org, and here is the reference for our Cell Tracking Challenge and for a paper about some peculiarities of challenges, which argues that the rankings in these competitions should not always be taken as a clear-cut thing. There are some problems with some challenges and with some rankings, so the interpretation should be done with care, because rankings are sometimes very sensitive: if you take a small part of the test data away and recompute, the rankings change. It is actually not easy to make a benchmark or challenge robust to, let's say, small changes in the test data or other small changes. It is quite a complicated field, and how to organize challenges, benchmarks and metrics appropriately is still being researched. So that is, shortly, all about these metrics and challenges, and you are welcome to ask questions. Thank you for your attention.

So, we are going to introduce you to BIAFLOWS, which is an application for benchmarking and reproducibly deploying bioimage analysis workflows. This is the outline: Sébastien and Benjamin are here with me and each of us will present a part of it. The first part is mine; I will talk about the reproducibility aspect of the software. So you heard about the metrics, and BIAFLOWS is for benchmarking, but it can also help with reproducibly deploying image analysis software.
Yeah, and after that I will give a short impression of the architecture of the application, the software architecture. Then Sébastien will give you a demo and show you the content that is currently there, and Benjamin will show you how you can add new content. For example, the problem classes that are already there are things like tracking, filament tracing or nuclei detection, but if you wanted to compare deconvolution algorithms, you would have to add a new problem class. Then I will show you how you can add your own workflows into BIAFLOWS, so that you can use them for benchmarking and also run them in the cloud or on your own machine. And finally, Sébastien will talk about future developments.

Okay, so let's start with reproducibility. Reproducibility in science usually means that a researcher can duplicate the result of a prior study using the same materials and procedures as were used by the original group. Now, it is not so clear-cut; there is a bit of confusion about the vocabulary, so let's have a little look at this. To repeat something would mean the same lab runs the same experiment with the same setup, just repeating it. To replicate means an independent lab runs the same experiment with the same setup, and to reproduce means another lab varies the experiment or the setup. There are different usages of these words, so if you read papers about this, you have to check what the authors actually mean by replicate or reproduce. And then there is reuse, which would be to transfer the setup or the experiment to another experiment, or use it as a building block of another one, for example.

Now, we are concerned here with bioimage analysis software, and of course you would think that since it is a computer program, it is very easy to run: each time you run it on the same data, you get the same results. That is true up to a point, but your software does not live on an isolated island somewhere. It usually has, or can have, a lot of dependencies; it lives, in a way, on the internet, and this environment changes over time. If there is no maintenance of your software, it will usually stop working after some time, which could be a short time or a longer time. This phenomenon is called software rot, and it is actually a research topic in software engineering.

So let's have a look at bioimage analysis in the context of a scientific project. The researcher has a hypothesis about biological objects and wants to test it using imaging. So he does his experiments with his cells or other objects, then goes to the microscope, takes the images and uses bioimage analysis to extract data from these images, then uses data analysis to extract information from these data, and then comes to a conclusion about the hypothesis. So why should it be reproducible? Science should of course be reproducible: if another group cannot reproduce what a group did, then we usually do not accept it as a true result, and if we built on unreproducible results, we would probably waste a lot of time drawing conclusions from an error somewhere in this chain. And concerning the software aspect, as bioimage analysts we also want to reuse bioimage analysis workflows, and if they are not even reproducible, there is little chance that they are reusable. So who would want to reproduce or reuse bioimage analysis workflows? The first would probably be the reviewers of a publication.
Then there would be other biologists or analysts who want to do a similar or the same analysis on their own data; software developers, people writing plugins or platforms for image analysis, who might want to build tools on top of it; and last but not least, the original author or his group might want to use it again later. And you all know the situation where the PhD student who wrote the analysis is not there anymore and no one knows how to run it.

So why is it difficult? What are the problems? The basic idea is: if you have a bioimage analysis workflow described in a publication, ideally you would have a link, you could go there and immediately execute it with the provided data and parameters, and then also try it on your own images. But it is often not like this, of course. Sometimes even the algorithm is not available; there is the case where the paper only says which platform was used, something like "we used ImageJ for the image analysis". I guess that is not really accepted by journals these days anymore, luckily. Then there is the case where the algorithm is described but not implemented. Then there are two questions: are there enough details for me to implement it, and how much effort would it take? It could be that it is just too difficult to be realistic; there might be easy cases where it can be done, but in other cases it might be very difficult. Then, if there is an implementation, I would probably also need documentation to be able to install it, set the parameters and run it. And let's say all of this is available, but I do not have the original data and parameters that were used in the study: then if I try to run it myself and it does not work, I do not know whether it is because it is not adapted to my data, because I am using the wrong parameters, or because it is simply not working. And even in the case where all this is available, I could still have trouble making it run in my own setup, because it was done on a different operating system, or the dependencies it wants to install conflict with my system libraries, and so on. And of course, users at the facility sometimes want to use older software, and then the problem can be that the platform or even the operating system is outdated and no longer in use.

So if the problem is that a lot of things are missing, then the solution is to make more or less everything available. Make the analysis workflow available as an algorithm and as an implementation, in the form of scripts, macros, plugins, whatever. Make the whole environment available: the operating system, the libraries used, the software platform, and all of these with the versions that were used. You need the documentation, and the documentation should of course tell how to do things, but in the long run it is also important to say why things are done the way they are done. And not to forget, of course, the data and the parameters used. Put all of this in a public place on the internet, and since things on the internet are not eternal, also keep a local backup. Is it realistic? It sounds pretty demanding, a lot of work, but actually at the time we are creating the analysis workflow, almost all of this is already there, except for the documentation, which is always done later. And luckily we now have tools and technologies that can help us with this. I guess everyone knows Git and GitHub by now and has probably heard about containerization; we will use Docker and Singularity, and BIAFLOWS.
I will talk a little bit more about these later. There are also Jupyter notebooks, which are a great way to showcase and document code, with the documentation right next to it. And for the data there are public repositories like the IDR; you can use an OMERO or Cytomine instance or another image database, or there is Zenodo where you can put your data, and of course now our BIAFLOWS.

So, I guess everyone knows Git and GitHub. Git is the distributed source code management and version control system, so everything that is textual, your macros, your code, your scripts or any other programming code, can very well go there. GitHub is the public hosting service for Git repositories, which is by now very popular and, you could say, the de facto standard for open source software projects; there are also a lot of tools for building software and doing many other things around GitHub. Similarly, Docker lets you create a container that contains the whole operating system, all the dependencies of your workflow and the workflow itself. Docker containers run isolated from one another, but in contrast to virtual machines they are lightweight, not as heavy as a full virtual machine, because they share the same underlying operating system, and they are created from a Dockerfile, which says which things to install. And, as in the case of Git and GitHub, there is Docker Hub, which hosts Docker images. We can build them in the cloud, in fact we will build them on GitHub and then push them to Docker Hub, and once they are on Docker Hub you can run them from anywhere, if you have Docker installed on the machine, just by typing docker and the name of the image; and BIAFLOWS, of course, calls them for you from the interface.

Okay, now let's have a look, and you will see this again in the demo, at which points BIAFLOWS can help with reproducibility. To have an image analysis workflow in BIAFLOWS, you basically need four files in your GitHub repository: a Dockerfile, which describes how to create the environment; a descriptor, which is about the parameters; the workflow itself, here a macro, but it could be a Python program, a CellProfiler pipeline or something from any other platform; and a wrapper that will download the images, run the analysis workflow and upload the results. So the first thing: your workflow is there, it is on GitHub, the source code is there and it is versioned, and you also make releases, so you have tags with which you can communicate the version you used in a publication, for example. And you have the same workflow executable in BIAFLOWS and linked to the code. The second thing is the Dockerfile, which builds the environment: it specifies the operating system and installs, for example, ImageJ and the plugins you need for your workflow, in this example a macro. And you have the executable image on Docker Hub, all connected with each other. Then, about the parameters: they are in the descriptor file, so there is a description, the names that will show up in the user interface, and the default values. And, as we will see next, when you run a workflow the parameters used will also be recorded in BIAFLOWS. So here you see that in BIAFLOWS you have the links to the source code and to the Docker image. Yeah, okay.
Here you see that when you run a workflow, you know which version of the workflow you ran; all of this is recorded and remains there, and you have the parameters with which it was run. Okay, that is the end of the reproducibility part, and here is just the general architecture. We have the four files that define everything about the BIAFLOWS image analysis workflow in the GitHub repository. When we make a release on GitHub, it automatically builds a Docker image that is pushed to Docker Hub, and from there it is available via Docker from any machine that wants to access it. We have the official BIAFLOWS web application, and since it is open source and based on Cytomine, you can also install your own local instance of BIAFLOWS on your machine; there is very good documentation for all of this, so it should not be too difficult. And you can also just run the Docker images directly on your local machine if you do not care about, for example, the benchmarking and having everything in the web interface.

Good, so thanks Volker for the overview of BIAFLOWS. In my part we will have a demo of the platform and I will show you, in a bit more detail, again what Volker has introduced. But before doing this, just a small comment on the workflows themselves. We try to be inclusive, meaning we want to enable people to add workflows coming from a very broad range of bioimage analysis platforms: Fiji, CellProfiler, ilastik, et cetera. It can also be standalone code, something you compile in any programming language, so it is really flexible. But of course, so that a workflow interfaces with BIAFLOWS, it has to fulfil some minimum requirements. It has to be callable from the command line of your operating system, and the functional parameters of the workflow have to be passed on the command line, including an input and an output folder, because the workflows should be built in a way that they process, or can process, multiple images stored in an input folder, and they should store the results in an output folder that is passed through the command line. So these are basically quite simple requirements. And, as I will introduce later, we also have what we call problem classes: depending on whether you do object segmentation, object detection, tracking and so on, the format that the workflow should output as a result is also predefined in BIAFLOWS and you should comply with it. We try to use simple formats that are broadly used and simple to implement, to limit the requirements on adapting existing workflows to make them compatible. As Volker said, you basically need four files to define a workflow, and luckily you do not have to do much rewriting most of the time, because you can reuse existing workflows as a template: if you use the same operating system and the same bioimage analysis platform, you can basically copy most of these files from existing workflows and just loosely adapt or customize them. The only part that has to be heavily edited, or even rewritten, is of course the workflow itself, which is specific, and also the descriptor, since the parameters are usually different from one workflow to the other, but this part is quite simple to edit.
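To make these requirements a bit more tangible, here is a minimal, purely illustrative sketch of a command-line workflow script of the kind that could be wrapped for BIAFLOWS: it accepts an input and an output folder plus functional parameters, processes every image in the input folder, and writes one label mask per image. The parameter names (--radius, --threshold), the *.tif pattern and the processing steps are assumptions for the example, not the actual interface of any existing BIAFLOWS workflow.

```python
# Hypothetical command-line workflow sketch; parameter names and processing are illustrative.
import argparse
from pathlib import Path
from skimage import io, filters, morphology, measure

parser = argparse.ArgumentParser(description="Toy nuclei segmentation workflow")
parser.add_argument("--infolder", required=True, help="folder with input images")
parser.add_argument("--outfolder", required=True, help="folder for result label masks")
parser.add_argument("--radius", type=int, default=3, help="median filter radius")
parser.add_argument("--threshold", type=float, default=0.0, help="offset added to Otsu threshold")
args = parser.parse_args()

out = Path(args.outfolder)
out.mkdir(parents=True, exist_ok=True)

# Process every image in the input folder and write one result per image,
# keeping the same file name so results can be paired with the inputs.
for path in sorted(Path(args.infolder).glob("*.tif")):
    img = io.imread(path)
    smoothed = filters.median(img, morphology.disk(args.radius))
    mask = smoothed > filters.threshold_otsu(smoothed) + args.threshold
    labels = measure.label(mask)  # instance label mask, one integer ID per object
    io.imsave(out / path.name, labels.astype("uint16"), check_contrast=False)
```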
So let's have a look at the platform itself. As Volker said, we are running an instance in the cloud on our own server; you can find it at the URL shown on the slide, so I will just connect to this web address and we will have a look. If you need some help, the best place is the image.sc forum: we are a community partner, so you can post questions there and we will answer them. Here I am already on the landing page you reach when you go to this URL. On the left we also have links to the documentation, which is quite extensive; we will cover most of it during the hands-on with Benjamin and Volker, but you have it here as a reference. You also have the code repository of BIAFLOWS, which is open source, so if you want to install it or look at the code, you can have a look there; the code of BIAFLOWS itself is sitting there, and also that of the many workflows that have already been introduced into the system. And finally, if you want to contribute some annotated datasets or workflows, you can click on this link and get some more instructions. Here you have a short presentation and a small video that basically covers what I will show in the demo, maybe in less detail. If you want to start using the system, you just press here, start online.

This is the flow we will follow during the demo; it is also shown here as a reminder, but basically you typically start with problems. In this upper header you have problems, workflows and storage. If I go here to problems, I can see all the problems that are currently available. You see the name, and we always try to add a short description of what it is about and, importantly, where the annotated images used for the benchmark come from. Basically it is a mixture of existing annotated datasets, coming from challenges or image repositories, and some synthetic images; we run generators like the ones Michal introduced to create some of the images, and in that case we also link to the simulator that was used. As you can see, the list is already quite extensive; I will have a slide later describing in more detail what is in there. But let's start with the demo. I will show you the nuclei segmentation problem, of the class object segmentation; you see it here, nuclei segmentation. If I click here, I can see some information: the class of this problem is object segmentation, or I should say instance segmentation to be more accurate. You can see a thumbnail of the images, and if you click on the problem itself, you see the list of images that will be used to run the workflows and perform the benchmark. If you further click on one of these images, you access this remote image viewer, so you can zoom in and out; I will move this panel because it is a bit in the way. You see that by default we have this blue annotation; this is the ground truth annotation for the image. If you click here on the right, you can toggle the overlay of the annotation to see just the raw image. You can also adjust the contrast, as in a typical image viewer, or clip the background, as I am doing here. And interestingly, you can also add several viewers. So let's add a new viewer with the same image; I will link the two viewers, so now when I zoom in and out I have a synchronized view. Here, of course, we are looking at the same thing on both sides, so it is not very interesting, but what I can do on the right side, for instance, is remove the ground truth annotation and instead show the result of one of the workflows that have already been run in the system.
For instance, here we have an ImageJ workflow for this problem, so you can add that layer, and now if you move around you can see, in a synchronized way, the ground truth and what the ImageJ workflow in this case has detected as objects. You can also have more than two viewers; let's, for instance, open four viewers and, just as easily, synchronize them to compare more than two workflows at the same time. So this is for the viewer part.

Now let's focus on the workflows and the workflow runs. Here, in this tab, we are in the images; if I go to the workflow runs, you can see the workflows that have been run in the system in the past. For instance, the ImageJ workflow I have just shown is here. If you click on it, you can see that it was run at this time, and the parameters that were used to run the workflow are also saved by the system. If I show them here, I can see that we have two parameters, radius and threshold, and these were the values that were used. You can also have a complete view of the execution log: when the workflow was called, all the output that was sent to the console was recorded, so it is kept for the record and you can have a look at it. To run a workflow, you just have to press this button here on top. For instance, if I run the same ImageJ workflow again, version 112.10, I have the two parameters; these are the default values, typically values that have been optimized for this set of images, but you can also play with different values if you want. There is also a way to document the parameters; it is a bit minimal here, but you can write it more extensively, and this is done in the parameter descriptor, one of the four files that are in the GitHub repository. Here I will run the workflow with the default values; as you can see, it appears here at the top with the date and the hour when I launched it, and it will take some time, so I will just leave it there.

If you go to the bottom part of the table, you see the benchmark results. Basically, for all these workflow runs, the results were created by the workflow and some benchmark metrics were computed by comparing the ground truth annotations to the results of the workflow. Michal already introduced some of these metrics; we are dealing with instance segmentation, so, as he explained, the most meaningful metric is the mean average precision that was introduced in the Data Science Bowl nuclei segmentation challenge in 2018. It is here in BIAFLOWS and it is computed automatically for you; we actually used the original code from the challenge, we did not reimplement it. Here you also have the Dice coefficient and the average Hausdorff distance, and this one is a metric that we designed ourselves, but essentially the mean average precision is quite close to it and probably a better option; it does illustrate, though, that you can have more than one metric, which is nice. Typically, in the first column we always have the metric we consider the most relevant, but we also leave the others as a reference. You can order the workflow runs by any of these metrics, from worst to best in this case, or the opposite. Here you see the average of the metric over all the images; in this case we had 10 images in this problem. You can also see some other statistics, like the median of the metric or other statistics of interest.
And also, and I think this is quite useful, here you see the results over all the images, but you can also inspect the results of the metric per image. If you click on this detailed result per image, you see the 10 images, and if I open one here, I now have the metric result for this specific image. In some cases the workflow can fail badly on one specific image that has some artifacts or maybe a higher or lower intensity, so this is a very simple and fast way to check for this kind of issue. You see here that we have these tick boxes; by clicking a tick box you can add or remove one of the workflow runs from the final table. And this starring system is something we also introduced: a starred workflow run should be a run with optimal parameters, so let's say that if you are the author of the workflow and you have optimized the parameters for the training or test data that is used, you can star it to mark it as the reference; then you can have other runs with different parameters and compare the results. So here, as you can see, the execution of the workflow I just launched has been successful; I can also add its results to the table, and as you can see, for the images they essentially match, demonstrating at least a basic level of reproducibility: we get the same numbers if we run it twice with the same parameters. You can also delete a run; in this case I will delete it because I already have one with the same parameters. Okay, so this is for the automatic computation of the benchmark metrics.

Regarding the workflows, if you click on one of them, let's do it with the ImageJ one, you jump to a page where you have some statistics on how many times the workflow has been run, how many times it was successful, et cetera. Importantly, you also have a link to the source code on GitHub: if you click on this button, you jump to the GitHub repository where you can see the files that were described by Volker. And you also have a link to Docker Hub, where the image of this workflow is sitting.

So I think that is essentially it. In terms of problems, I can show you another problem with more dimensions. These were only 2D images, but the viewer and the whole framework also support images with higher dimensions, for instance nuclei tracking here. In this case we only have a single image; BIAFLOWS is still a prototype, so in terms of content it is still rather sparse, we are still expanding and testing it. Let me show you a result from this workflow, which comes from the Cell Tracking Challenge that Michal introduced, and I will remove the ground truth. For this kind of problem you have the option to also see the overlay of the objects, color-coded by ID, so an object keeps the same color, or the same ID, across time, and you can scroll through the movie to see it. We also have 3D images, object segmentation, and also filament tracing; I actually have a slide on this that I will switch to now. Currently we have nine problem classes. We have object counting and detection, so basically just a landmark per object, and the annotations associated with it are binary masks with a single bright pixel where each object sits; an example would be vesicle detection. We also have landmark detection, for instance in the Drosophila wing.
In this case you also have a label mask with a single dot per landmark, but it can also have a class associated with it in case you have different kinds of landmarks. Then object segmentation, which we have seen; we have especially nuclei segmentation in 2D and 3D, and the annotations are label masks. Pixel classification: again label masks, but in this case it is semantic segmentation, so it is just a matter of classifying each pixel into a given class, for instance tumor or gland. Filament tree tracing, for instance for neurons; in this case we use the SWC annotation format for trees. Filament network tracing, such as blood vessels, where you can have loops in the network; here we use binary masks of skeletons. Particle tracking: in the case of non-dividing nuclei we have just label masks, and in the case of dividing objects, for instance dividing nuclei, we use the same format as the Cell Tracking Challenge, so label masks plus an extra text file with the divisions of the objects when they occur.

So I think that is it. Just as a summary, as Volker was saying, the idea of BIAFLOWS is to have everything in the same place and to record everything that is done, every workflow that is run. So we have, sitting in the same place: annotated image datasets; standard data formats and metrics for nine problem classes; versioned workflows with their full execution environments as Docker images (they are actually converted to Singularity images to be run, for instance, in an HPC environment, but that is just a technical detail); default parameters that are optimized for the datasets; a way to visualize the results, with the remote image viewer I have shown you and the possibility to synchronize different viewers; and the automatic computation of the benchmark metrics, with a way to visualize them in interactive tables, as statistics or per image. With this I am finished, and I will leave Benjamin to introduce the next part, how to add new problems and content to BIAFLOWS.

So I will show you how to create a new problem and how to upload images to this problem. If you go to the website, there is by default this try-it-online button here; if you click on it, you will be automatically connected as a sandbox user, which is what we do for the demo. You can go to problems, where there is this button here, new problem, that you can click on to create a new problem. You end up in this tab here, where you can set some parameters. By default I will leave them as they are, I will just change the problem class here: because these are 2D nuclei, this is more of an object segmentation problem, so we select this class and set it. Other than that, you can define which users can have access to the problem; by default it is only the person who created the new problem, so I will add a couple of people so that they will be able to use it, and for each of them you can choose whether they are a guest or a viewer. Okay, and then I think I am done. Then you can customize the graphical interface that displays the problem; here we just need the workflow runs, so that people will be able to run workflows on this problem, so I select that. I can now add the workflows that I would like to be runnable on this problem, and I can search for them, so I can reuse a similar one, and then maybe I have another one on this machine.
Here it is; I select it, so I have that there. And then finally there is the image filter, where you can display some filtered images, but we won't go into that. Then we are done. I can go to the images here, but I can first go to the information tab and add a quick description of the problem. Here is the summary of the problem: right now there are no images and six members. I can add a description by clicking here and typing a short text. I took the dataset from the BBBC website of the Broad Institute, so it is a portion of one of their datasets, and I will add those images to the list. I can check that the problem class is object segmentation, which is fine.

Okay, so now I can go to the storage and start to add the images that I want to add to this problem: the raw images and the ground truth. As mentioned earlier, those images have to be in OME-TIFF format, and the file names have to follow a convention. Here is my example: these are my raw images, unchanged, from the BBBC dataset, and the ground truth files have the same names but contain the suffix _lbl, for label. That suffix is important if you want BioFlows to recognize those images as the ground truth labels; otherwise they won't be recognized. So after that, I click on "add files" and select my files, first just the raw images, and then I add the ground truth. Then you start the upload to transfer the files, which takes a moment because there are several images. Okay, that's done.

Now, if I go to my problem, which is this one, I notice I forgot something; sorry, I made a mistake. When you are in the storage, I should have specified to which problem I want to send the images. That's not a big deal: here you can select the problem to add them to, and I should have selected my problem, which is the very last one, to put the images directly into the right new problem. Since I didn't, they are unassigned for now, but I can just move them. So now you can see the images, but within the problem the labels are not visible yet: if I click here I can see the images, but I don't see the ground truth, it is not available yet. One thing we still have to improve is automatically converting the labels into annotations on the images. Right now it is possible, but you need to do it via a command-line tool. This tool is available and everything is described here. So I clicked on the problem, I did all that and I uploaded the images; now I need to convert the ground truth images into annotations.

For that we created a tool packaged as a Docker image. To install and use it you need Git, to clone the tool, and Docker, to run it, because it is just a Docker image. The documentation describes how to install it; I will just show you how to use it and the result, and skip the installation, because building the Docker image takes some time. So here is the command line: basically you do docker run and give it the tag of your Docker image, and then you need several parameters: the tag, that is, the Docker image name you used when you built the image, the Cytomine host address, the public key, the private key and the project ID. So how do you find those parameters?
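Before going through where each of those values comes from, here is a minimal sketch of the expected file layout and of what such a command can look like. The option names of the conversion tool are placeholders invented for illustration (the real flags are in the tool's documentation), and the host, keys and project ID are dummy values; only the overall shape of the call is the point here.

    # Expected layout: each raw OME-TIFF image has a matching ground-truth
    # file with the _lbl suffix.
    #   img_001.ome.tif     img_001_lbl.ome.tif
    #   img_002.ome.tif     img_002_lbl.ome.tif

    # Hypothetical invocation of the label-to-annotation tool.
    # "my-upload-tool:latest" is the tag chosen when building the Docker image;
    # the option names below are illustrative placeholders, not documented flags.
    docker run --rm my-upload-tool:latest \
        --host https://biaflows.example.org \
        --public_key YOUR_PUBLIC_KEY \
        --private_key YOUR_PRIVATE_KEY \
        --project_id 12345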
The tag is the tag you define when you build the Docker image, so that is up to you. The Cytomine host is just this address here, the host name. The public key is available if you go to your user account: there you have the public and the private key, which are what allow you to talk to the BioFlows server, and you can copy and paste them from there. And the project ID is the ID of the new problem; it is this number here, which you can grab from the URL and copy and paste. So my new command will be this one, and I can run it; personally, I didn't install Docker in a way that lets me run it as a regular user. Then you press Enter. What the tool does is grab the images and the labels from the server, check that everything is fine, and then push the labels back converted into annotations. All those numbers scrolling here show that the labels are now being converted into coordinate annotations, and that's done. If I go back to the website, it refreshed immediately, so now you have the annotations displayed on the original images. That is the way to set up a new problem and upload its images. Depending on the category of the problem you have different types of ground truth; as Sebastian mentioned, you can for example have SWC files if your problem is related to filament tracing, and it is different again if you do cell tracking or object tracking. But all the supported formats can be re-uploaded as annotations with the tools, so it should work fine. That's it for this part.

So I will go a bit quickly, because the previous part was a bit longer than 15 minutes, but I guess it's okay. I will show you how to integrate your own workflow, whether you have written an analysis workflow yourself or you have found an open-source workflow that you would like to have in BioFlows. The prerequisites are that you need a GitHub account and a Docker Hub account, and then you need to configure BioFlows so that it scans your GitHub repositories and automatically imports the workflows, and you need to give GitHub the access rights so that it can build the image and push it to Docker Hub; I will show you how to do this. And, as Sebastian told you, your workflow needs to do a few special things: work on all the images in a folder, accept input parameters from the command line, and produce the results in the expected format. Then you adapt the four files we have talked about multiple times already, and you make a release on GitHub; that automatically builds the image and pushes it to Docker Hub, and since your BioFlows is configured to look for these workflows, it will automatically import it. Just like Benjamin showed, you still have to tell it in which problems the workflow should appear. So that is everything on one slide; now we will look at it in a bit more detail.

Here is once more the connection: we have our GitHub repository with the four files. When we build a release, an image is built and pushed to Docker Hub, and BioFlows polls GitHub for new workflows. These repositories should all begin with W_ so that BioFlows does not import repositories you don't want it to import, that is, repositories that are not BioFlows workflows. And then, when BioFlows executes a workflow from the web interface, it gets the image from Docker Hub.
Okay, so the little example I made up here does 3D spot detection, essentially a top-hat transform followed by finding the maxima, so it has parameters. First the workflow itself: this simple workflow clears the first and the last slices, which are the borders of the image; we apply a 3D median filter and a top-hat, we find the extended maxima, run a connected-component analysis, and then we analyze the regions and get the coordinates of the centroids of the detections. That is a workflow as you might write it or find it, and I will show you afterwards what you have to do to integrate it. Here is how it looks as an ImageJ macro, using MorphoLibJ basically, and here you see the results with these parameters. It is hard to see whether things are correct in these crowded 3D images, you almost see nothing, and in 2D it is also hard because the center might be on another slice, but of course we will have the metrics to see how well it performs afterwards.

Okay, first you need your account on Docker Hub, and it needs to match your account on GitHub. One thing to note is that Docker Hub only allows letters and digits in the account name, while on GitHub you can use other characters, so if you have minus signs in your GitHub name they will be removed: my GitHub name is volker-becker and this becomes volkerbecker on Docker Hub, and that will still be matched automatically. All this also works with organizations that are present on both GitHub and Docker Hub, which can be practical because then you have to do the connection only once for the organization and not for every repository. So, if you don't have an account, you should create one on Docker Hub.

Next step, we configure BioFlows to scan your GitHub repositories. For this you go to the workflows tab, and at the bottom of the page you find "add new trusted source"; there you just fill in your username on GitHub and your username on Docker Hub. Now we can create a repository for a workflow on GitHub, and you don't have to start from scratch: you can look for a similar one that already exists in the NEUBIAS organization and use the create-from-template possibility. In my case, for 3D spot detection there is already one using ImageJ, so I can just use the template and give the new repository a name. Actually, if you try this this weekend, we have a little problem with the GitHub action: you shouldn't use minus signs in the repository name; use underscores and everything will be fine. I will fix this very soon. So just enter a new name, and the first thing to do is to modify the README so that you don't confuse people who find the repository with the wrong README. All the files from the other repository are copied, but the two repositories are not connected as a fork. Now this repository needs to be able to push things to Docker Hub, so you need to create an access token on Docker Hub. For this you log into Docker Hub, click on your username in the upper right corner, go to security, and there you can create a new access token: you just enter a name for the token and you can set its permissions. Now you copy this token, and be careful, because you can only do this once, since the token has to remain secret.
Then you go back to your GitHub repository, and in the settings you find the secrets for Actions; there you create a secret for the Docker Hub token, into which you paste the token you acquired from Docker Hub, and you need to create a second secret with your Docker Hub username. This can also be done at the organization level instead of on each repository.

Okay, now everything is set up, the requirements are fulfilled, and we can start to adapt our macro for BioFlows. The result we need is a mask of the centroids, an image where there is a point at each centroid; the macro needs to accept parameters, and it needs to work in batch mode on all the images in a folder. If you don't know what the expected result for a problem class is, you find all of this in the BioFlows documentation at this link; in this case it is 16-bit images where the point pixels have the value 65535 and the background is zero. Our macro so far only computes the coordinates of the centroids, so we just add some code to draw these pixels into an image stack, which is then the expected result. To read parameters from the command line: here I have quite a lot of parameters, and I could probably have exposed all of them in the interface, but I fixed some of them; we also need the input and output folders here, as we already said. In an ImageJ macro this just looks like this: we have the getArgument function, and then we split and parse the string to get the values for the keywords. And to make it run on all the images in the folder, we just add a loop and do our work on each image, and then we need some cleanup so that we don't fill up the memory over time. Very importantly, at the end you need to stop the Java virtual machine with the run("Quit") command, otherwise BioFlows will just wait endlessly for the job to end.
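Here is a minimal sketch of what that structure can look like in the ImageJ macro language. The parameter names (infolder, outfolder, radius, threshold) and the exact argument string format are assumptions made for illustration; the actual BioFlows templates define the real convention, but the pattern of parsing getArgument, looping over the input folder, saving a centroid mask and quitting at the end is the same.

    // Minimal sketch of a batch ImageJ macro for BioFlows (illustrative only).
    // Parameter names and the argument format are assumptions, not the
    // convention used by the real templates.
    arg = getArgument();          // e.g. "infolder=/in,outfolder=/out,radius=2,threshold=50"
    parts = split(arg, ",");
    for (i = 0; i < parts.length; i++) {
        kv = split(parts[i], "=");
        if (kv[0] == "infolder")  infolder  = kv[1];
        if (kv[0] == "outfolder") outfolder = kv[1];
        if (kv[0] == "radius")    radius    = parseFloat(kv[1]);
        if (kv[0] == "threshold") threshold = parseFloat(kv[1]);
    }

    setBatchMode(true);
    files = getFileList(infolder);
    for (i = 0; i < files.length; i++) {
        open(infolder + File.separator + files[i]);
        // ... detection steps go here (filtering, maxima, centroids) ...
        // Draw a 16-bit mask with value 65535 at each detected centroid,
        // then save it under the same file name in the output folder.
        saveAs("Tiff", outfolder + File.separator + files[i]);
        close("*");               // clean up so memory does not fill up over time
    }

    // Stop the Java virtual machine so BioFlows knows the job has finished.
    run("Quit");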
We come now to the computational environment, which is defined in the Dockerfile. In this example I thought I would just have to install the plugin I need, MorphoLibJ: from the template I removed the plugins that were used there and added a line to install MorphoLibJ from its GitHub release. But then I realized that it wasn't running with the rather old Fiji version used in the template, so I also needed to get a more recent Fiji version, which is just a matter of replacing these three lines here; you can find those builds in the Fiji archive. Now we need to tell the wrapper how to run the workflow, and the only thing we need to change here is the parameters: in our case of an ImageJ macro, this is how it is run, here are the parameters, and they are replaced by the actual values here, so we need to change this. And then the descriptor, the JSON file: we put in a name under which the workflow will be displayed, we tell where the Docker image is, and we add a description that will appear in the interface. Here again we have to add the parameters; this is why we want to create a wizard, so that you don't have to enter the parameters and related information three times when it could be done automatically. For every parameter you put in a default value, a description and the name.

Okay, now we are ready; all you need to do is to create a release on GitHub. To do this you click on tags, there you find "create a new release", and you should put in a version number in this format: v for version, and then the major, minor and patch numbers. Then the GitHub action will fire automatically. If you want to check what it is doing, click on Actions and look for the last version that was built; if there are any errors in this process, you can see what they are in the log, which you can access here. Now, I think Benjamin already showed this before: to be able to finally access the workflow in BioFlows, you have to attach it to a problem. Here we look for the spot counting 3D problem, you click on the configuration of workflows, and then you enable it. One thing I forgot, because it has changed: the polling for new GitHub repositories was actually not working so well, so on the tab where you tell BioFlows where to find your GitHub and Docker Hub accounts you can also tell it to look for new workflows now, and that you should also do. Then we are done, and you already saw how it looks in the interface: now you can run the workflow in BioFlows, see how it benchmarks compared to the other solutions, and if you know a bit about how to use Docker you can also run it on your local machine for doing actual analysis work. And of course you could point to this from your publication; in addition you should go to Zenodo or somewhere similar and create an archived record, and then you could point from your publication to this workflow so that readers can directly go and use it. That was this part.

Before we switch to future development, I have a small question: you said we have to be careful with the token because you can only do this once. What do you typically do, delete the repository and start from scratch, or...? No, no, the thing is, if you lose your token you can just create a new one, that's no problem; but when you create the token on Docker Hub you must copy and paste it immediately, because you cannot go back to that particular token, you will not have access to it anymore after its creation. But you can always create another one.
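Going back to the Dockerfile change described at the start of this part, here is a minimal sketch of the kind of edit meant there: start from the template's base image, fetch a more recent Fiji, and drop the plugin jar into Fiji's plugins folder. The base image name, the download URLs, the version numbers and the wrapper entry point are all placeholders assumed for the example (and wget/unzip are assumed to be present in the base image); the actual NEUBIAS/BioFlows templates name the real ones.

    # Illustrative Dockerfile sketch; image name, URLs and entry point are
    # placeholders, not those of the actual BioFlows templates.
    FROM some-fiji-base-image:latest

    # Replace the old Fiji with a more recent version (exact URL assumed;
    # archived builds are available from the Fiji download site).
    RUN wget -q https://downloads.imagej.net/fiji/archive/fiji-linux64.zip -O /tmp/fiji.zip \
        && unzip -q /tmp/fiji.zip -d /opt/ \
        && rm /tmp/fiji.zip

    # Install the MorphoLibJ plugin from its GitHub release (URL illustrative).
    RUN wget -q https://github.com/ijpb/MorphoLibJ/releases/download/v1.4.1/MorphoLibJ_-1.4.1.jar \
        -P /opt/Fiji.app/plugins/

    # Add the workflow files (macro, wrapper, descriptor) from the repository
    # and run the assumed wrapper entry point.
    ADD . /app
    ENTRYPOINT ["python", "/app/wrapper.py"]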
I first wanted to make a quick comment on benchmarking as a whole in the field of computer vision, because it actually predates its use in biomedical imaging, and even more so in bio-image analysis, which is a newer field. In computer vision, benchmarking is in the DNA of researchers, I would say, and it is, I think, almost impossible to publish without doing benchmarking. The benchmarking is typically done on reference databases, image repositories in the case of computer vision. Here I'm showing two of the most well-known and largest databases. ImageNet is a really huge database built with collaborative annotation: it is only since the internet became widespread that people could gather such big databases, because for problems such as image classification into thousands of classes you really need tens of thousands of annotated images, which is very hard to do on your own or as a small group, so you really need to open it to the community. The COCO database is also very large, in this case with labeled objects in real images, so it is different from only classifying an image with a single word. As I was saying, the only way to publish in this field is to compare your new algorithms or workflows to other workflows that have been benchmarked on the same reference datasets. Still, in many cases, at least ten or fifteen years back when I was doing some research in the field, people were often re-implementing the workflows of other people, which was error-prone and also costly in time. So I think it is very good to encourage this dockerization that we, and many other projects, are trying to promote; it's not that we are the only ones, of course. The workflows themselves were also sometimes very difficult to reuse by people who were not so much into programming, because sometimes you had to compile them and you could spend days just managing to make them run, and that is also something that is solved with the Docker approach.

Mikal already mentioned Grand Challenge, which is a sort of database of most biomedical challenges. It is a very nice place, and there are quite many challenges and datasets available, but even as of today the microscopy and bio-image analysis part is tiny compared to the challenges that use medical imaging rather than microscopy or fluorescence microscopy. That is also why, when we started with BioFlows, in 2015 I think, we thought it would be good to have our own place for bio-image analysis, mostly light and EM microscopy images. There is another well-known place for annotated datasets and challenges, but it is much wider than just image analysis: Kaggle. You can also find some microscopy datasets there; there are five in the computer vision subclass, so ones really aimed at analyzing image features, so again a very low representation of microscopy images. I don't intend to put Grand Challenge and BioFlows face to face; they are meant for quite different things, even though there is some overlap in the benchmarking and possibly in using BioFlows for challenge organization. But basically, as I mentioned, there is a low representation of biological microscopy datasets in Grand Challenge as of today. Also, the metrics used in the challenges for similar problems are not always the same, so it might be difficult to compare from challenge to challenge. And even though there is a clear trend of
asking competitors to provide their workflows as Dockers, with their own API for this in Grand Challenge, still only about half of them are available in that way as of today, and it was far less a few years back when we started. Also, the algorithms are not directly linked to their source code, and most of the time you have to request access from the authors, so it is not as open as we intend to be, so to say. Finally, on image and annotation visualization, I think BioFlows is much more flexible for visualizing the results and also the metrics: you can see them per image or with different statistics, whereas Grand Challenge is more like a leaderboard with the results, so it is a bit more fixed. But of course, as I said, they are two different platforms: Grand Challenge was designed for challenges and nothing else, while BioFlows can be used for multiple use cases, as we have seen.

In terms of limitations and roadmap, things we would like to improve: BioFlows is still a prototype, we don't really have a user community yet, it is very small, and we still don't have that many datasets, and they are mostly synthetic. Essentially we did this to prototype the system and make sure everything works, but we need to fill it with more content, especially real images with human annotations. In terms of core features we would like to improve: we didn't go into these details, but as it is now we have a single Docker image for the workflow and also for the computation of the metrics and the data preparation, so everything is in the same container. When we started the project it was easier to do it this way, so we went for this solution, but we noticed that it was not the right way to do it, because you want to isolate the computation of the metrics from the workflow itself, so that if you update one part it doesn't affect the other and you don't have to rebuild everything. So we are working towards a more modular architecture where these would be isolated in different Dockers, and you could actually also chain workflows. This is really core development and not that straightforward; the Cytomine team is also working in this direction, so we hope to channel this into BioFlows and work together on this part in the coming months. We would also like to simplify the integration of workflows, possibly with a wizard in the user interface and, as Volker explained, by automating redundant information such as the parameters of the workflow so that they are automatically copied into the correct place in the files. And finally we also had the idea of a generic workflow, especially for object segmentation or detection, that would use deep learning models coming from the BioImage Model Zoo, a repository of deep learning models that you may know: a generic workflow that could directly fetch new models for the same class of problems from that repository, so that you don't even have to re-implement anything once it is done.

In terms of more explorative features: yes, this modular workflow architecture, to be able to chain different workflows and have flexible I/O formats; it is quite complex, but it is on the roadmap. We would also like to work on automating the tiling of large images: when you have a very large image, if you use for instance a simple ImageJ macro, you might quickly bump into memory limitations if your workflow has intermediate steps, so we would like to take this burden off the developer and have an option
to automatically break a large image into smaller images, batch process them the way it works now, as a folder of images, and then recombine the results of the individual images to get the result for the large image. And finally, another idea is to trigger multiple jobs with different parameters, to scan the benchmark results as a function of those parameters; it is a bit the same approach as the ROC analysis that Mikal showed, where you trade off, for instance, specificity against detection rate. So these are the main features we are looking to improve in BioFlows. It is important to mention the core developer team: as I already mentioned, it is mostly the Cytomine team and ourselves, and there is also Lassie working actively on the project. We also have many other contributors who contributed to some part of the code or simply gave us feedback; they are listed here, most of them are from NEUBIAS, and these are their respective institutions. With that I'm finished, thanks for listening.

You mentioned the problem of managing a very large tile scan, and your approach would be partitioning, so you are aware that you will probably miss some of the cells, some of the objects you are trying to segment. Did you mean that you expect to gain more from a very big dataset where you can count a lot of cells, and that it doesn't matter if you lose objects at the edges, right? Of course, it's not free. If you have thought about it, there are different strategies: you typically need some overlap if you don't want to lose information at the borders of the tiles, et cetera. Initially we didn't want to go into this, because it is basically also connected to the core development of the algorithm itself: if you do it wrong, you might actually introduce errors that are not really due to a weakness of the algorithm itself but just to the way you combine the tiles, so to say. That is also why it is only in the explorative part of what we would like to do, because it is not straightforward to have a generic and perfect solution for it; but if you design it correctly, you don't necessarily have to lose any information. This is not so much connected to the benchmarking part; we would like to have this feature because, if you want to use BioFlows for your own image analysis and image management, it is, I think, a very nice facility to have: you don't have to care so much about how you design your algorithm in terms of memory requirements, you can just make it work on small images and it should scale to large images. That is the intention. Thank you.