mine and that is already available on YouTube. To recap, Cytomine is web-based software for collaborative image analysis. We demonstrated how to use it to organize your images on the web platform and how to annotate these images with different kinds of metadata, like ontology terms, properties, tags, et cetera. We showed you how to configure your project to give access to your collaborators, and, very briefly, how to execute algorithms on this platform. So we mostly focused on the web application of Cytomine that you can use in your regular web browser. This application is written in JavaScript using the Vue.js library. And for those who are interested, Cytomine is of course open source, and that also includes the web viewer, whose code is also available on GitHub, as mentioned here. As mentioned, last week's session was recorded and is already available on YouTube, and we will communicate this link, and also the link to these slides, later today. So the first part was about the web UI of Cytomine. Today, we would like to focus on the data scientist and computer scientist side, because if you are a developer of algorithms, or if you want to apply algorithms, you can of course use the web platform, but it might be useful for you to know more about how to use external scripts to run these algorithms, how to develop new scripts and new algorithms, and how to integrate them into Cytomine. That's why today we will first give you an overview of existing apps in Cytomine. Then we will explain how Cytomine is structured, what its architecture is. We will also describe the main data models within Cytomine. Then we will describe the API that allows you to import and export all the data from and to Cytomine, and the clients that we have already developed to ease communication with this API: we have clients in Java, JavaScript, and Python. Then we will explain how to create reproducible applications yourself, using some technologies and software programming conventions such as software descriptors and containers. And then Sébastien will present BIAFLOWS, which could be seen as a specific version of Cytomine: it uses the same technologies, but it comes with more content and more standardization to allow benchmarking. Sébastien will discuss the motivation for this specific version and provide a demo of the BIAFLOWS servers. So last time, I quickly showed you the application of algorithms within the web UI. Something that we did not have time for last week was to let you know that, to be able to run these algorithms, you first have to configure your Cytomine server so that it will fetch these applications, these algorithms, from a given source. What we are mostly using now is GitHub. That means that people developing apps, and I will come back to this later, actually put their code on GitHub. The idea is that you configure in Cytomine what we call trusted sources, which are repositories or organizations on GitHub from which the server will fetch these algorithms. So very quickly, here I'm showing you this interface in Cytomine. If you are the administrator of your server, you can easily add new sources here, and you specify here the GitHub provider and the Docker Hub account; I will come back to this. In this demonstration server, we set up three trusted sources that correspond to three different GitHub organizations.
And so if I open this Cytomine trusted source, it in fact corresponds to this GitHub organization. The idea is that you specify a prefix so that Cytomine will regularly check this GitHub account for repositories that have this prefix. These repositories correspond to the different applications, the different algorithms, that you can automatically integrate into Cytomine. But I will come back to this. So that's the first thing you have to do, at the level of the server. Then, if you want to allow your users to apply some of these algorithms within a certain project of Cytomine, you have to activate them, to enable them, and this configuration is at the level of a project. If you remember correctly, Cytomine comes with an organization into projects, where you put images, users, et cetera. And so at the level of the project, you can enable or disable certain algorithms so that the end users can execute them in the analysis tab. They will only be able to execute the algorithms that you have activated for this specific project. So that's it for the configuration from the web platform. Now I will quickly show you a few examples of existing apps. The goal here is both to let you know what is already available, and also to give you ideas and to let you know where you can find the code that corresponds to these apps, so that you might use it to develop your own app. I will explain later how to develop such apps, but you already have the links to these different apps here. For example, here is a very basic one that I showed you last week that allows you to detect samples. For example, in histology-based research, people have to compute statistics about the area of tumors with respect to the area of the samples, et cetera. This very simple app, based on basic adaptive thresholding, allows you to detect the sample so that you can compute its area, and it allows you to work on very large images. What it takes as input is a list of images in your project and the ontology term that you want to associate to the samples that will be detected. The output of such an app is basically the contours of the samples. So just quickly, again, to show you, but I will not do it for all the apps: you can launch an application here, you select the ontology term that you want to detect, and you launch the analysis. It is launched in the background, and I will explain more about this later. What you see here is that it already detected two annotations here, and then four annotations. It was actually applied on all the images of this project, so there were two images, and you have access to the detections of the samples done by this specific app. So that's the idea that I already showed you last week. So we have this app, the simple one. Then we have the same kind of app but for hyperspectral images. In this case, the idea is also to detect samples, or more precisely objects. This app also allows you to select the images that you want to process and the terms that you want to associate to the detected objects, and you can specify some parameters of the algorithm, like the thresholding filter, the projection you want to apply across the different channels, the minimum size of objects, et cetera. And at the end, you get this kind of detections, which you can look at directly on the web platform.
This one uses a maximum intensity projection and also thresholding, but if you want to modify it, you can fork this code and develop your own version. This next one is very popular: it's based on the StarDist cell detection algorithm. Basically, what we did is take the code from Schmidt and Weigert and encapsulate it using our programming conventions, so that you can execute the algorithm directly from the web interface and visualize the results. Again, this algorithm takes a list of images, but here you can also specify a list of ontology terms that correspond to the regions of interest where you want to detect these cells. So instead of applying the StarDist algorithm on the whole image, you might be interested in applying it only on sub-areas of these images. The idea is that you pre-select some regions of interest, you associate some terms to these regions of interest, and then, in the StarDist app, you specify the terms of these regions of interest, apply the algorithm, and visualize the results in the web interface. And so you have the cell segmentations that are generated, and you can have a look at them. It's a bit slow now, maybe because of my wifi connection; anyway, I think I already showed you this last week. This one does not use any training: it uses a pre-trained StarDist model. We have not yet implemented fine-tuning of the StarDist models, et cetera; it uses the basic pre-trained H&E model. This next one is a quite old algorithm, a pre-deep-learning algorithm that was developed before deep learning was successfully applied to this kind of biomedical images. It is based on random forests, and it still works for images with quite standardized image acquisition and preparation protocols. The idea is that it's an application divided into two parts: the training part and the prediction part. The training part basically takes as input images and also the ontology terms of annotations. If you want to train your model, the first thing you have to do is manually annotate regions of interest in your images, as I showed you last week, so you have negative and positive examples. The idea then is to train a model. You can again specify, through the web user interface, the parameters of these random forest models. What happens in the background is that the ensemble of trees is built, and this ensemble of trees, this model, is saved in the Cytomine database as an object that we call an attached file: the resulting model is attached to the execution of the algorithm. So when you want to apply the model, to predict the tumor regions in new images, you have to specify as input not only the images, but also the identifier of the execution of the software that previously built the model. You select this so that the software will apply this model to the new images and, at the end, generate the segmentation masks that correspond to the predictions of the algorithm. We also very recently implemented a U-Net workflow, both for training and for prediction, but it's a very basic one: there is no data augmentation, there is no post-processing, et cetera. So if you want to apply it on your own real data, you will probably have to fork this code and modify it to fulfill your needs and get good results on your data. But you have the framework there, which could be helpful to start developing your own U-Net-based algorithm.
So we again have the training part and the prediction part, and it's the same idea: as input, you need to specify the ontology terms that correspond to the ground truth annotations, and you also have to specify some U-Net parameters, like the number of epochs, et cetera. It will then build a U-Net model and save it in the Cytomine database as an attached file. And for prediction, you will be able to select this job execution, the model, and apply it on new images. That's the kind of thing we did here in a specific application for zebrafish head and operculum segmentation. Here the goal is to detect the head of the zebrafish, but also the operculum area here. So we pre-trained U-Net models, and we deliver the Cytomine app with these pre-trained models, to allow you to apply it on new images. Of course, it will mostly work on images that look similar to these ones, but again, it could be a source of inspiration for you to develop your own code. We also published, two years ago, a random-forest-based algorithm for landmark detection. In this kind of application, for example, the goal is to detect these dots that correspond to specific anatomical landmarks. There are many different possible applications of this type of algorithm; we actually got quite a lot of requests to apply this kind of algorithm to multiple types of images, in Drosophila research, zebrafish research, et cetera. Again, the idea is that you specify the ontology terms of the landmarks that have been previously manually annotated. So you see here, all these landmarks were first manually annotated, and you provide this as input to the algorithm. It will train a random-forest-based detection model for each of these landmarks, using multi-resolution windows, and it will save this model in the Cytomine database. Then, when you have new images, you select this model and apply it to the images to predict the landmarks within your new images. In this code there are actually several methods, not only ours but also other methods from the literature, so you might be interested in evaluating these different methods on your data. And that's possible because, again, you can execute this algorithm, select the parameter values, and visualize the results on the web platform. One quite useful small piece of code is this statistics exporter. It basically allows you to select some results generated, for example, by an algorithm: you select the images, you select the ontology terms, and it will output CSV files that contain, for example, the areas, which could be useful when you have to quantify tumor sizes, or the counts, which could be useful when you apply StarDist and want to count the number of cells, et cetera. So you select images, you select an ontology term, and you select a specific execution of a software, so that it will generate statistics corresponding to this specific execution. And if you are not happy with the kind of statistics that we generate, you can of course fork this code and modify it to generate the appropriate statistics for your study. Also, if you are not satisfied with the results provided by these kinds of algorithms, we have developed some reviewing user interfaces that allow you to correct the predictions of the algorithms, with the hope that combining an algorithm with manual correction will provide better results, and will of course also be faster than fully manual annotation.
These tools allow you, for example, when you have this kind of detection, to fill the holes here with a single click, or to complete an annotation when the algorithm missed some parts of your region of interest. Both the original annotations generated by the algorithm and the validated ones after human correction are stored in the Cytomine database and are accessible, so that you can later, maybe, retrain your algorithm with the corrections. But we are realistic, and we know that the set of tools we provide is of course not enough for all the different biomedical applications that you might have. That's why we provide ways to develop and integrate your own algorithms. So we have generic algorithms that have been applied to different tasks, but we know that, although they are based on machine learning or deep learning, they are often not sufficient: you might have to apply a post-processing step, modify the architecture of the U-Net, or use other algorithms. And so we want to provide you with ways to develop and integrate your own algorithms, and the main part of this presentation will be about how to integrate such algorithms. But I guess that, at this point, you might be a bit lost, because you don't know a lot about the architecture of Cytomine and how it works. We have almost only seen the web interface, but behind the scenes, Cytomine has its own architecture and its own data models. So we want to spend some time describing this architecture, these APIs, et cetera, so that you can integrate your own algorithms. So here is the Cytomine architecture, very simplified. You have the browser, the one we showed you previously, and the idea is that you can also have your own external scripts or applications, and these scripts or applications will be able to communicate with Cytomine. You have here the Cytomine server with its own databases, data models, files, et cetera. In between, we have what we call the API, which allows you to get all the data that are stored in the Cytomine databases, or to put new data into them. All the data stored and managed by a Cytomine server can be obtained through specific URLs, and I will come back to this. A bit less simplified: if you look here at the server, you actually have two big components, I would say. The first is the core, which contains, for example, the databases. That's where we store all the projects, the users, the image descriptions, the annotation descriptions, the permissions that allow you to give access to your collaborators, the metadata, et cetera. For this, we mostly use two types of database: a SQL database, PostgreSQL, which allows us to store spatial information like the contours of the annotations, and NoSQL databases, which are mostly used to store and track the activity of the users. The second big component of the Cytomine server is the image management system, which basically deals with the images: it accesses the files on the file system, exports the tiles, et cetera. Here I have to mention that, to access this API and get responses from the server, you have to be authenticated. So not just anyone can access the data: you have to be authenticated, and I will let you know how to do this later.
If we go into a little more detail, there are actually not just two components inside the Cytomine server; there are multiple components that communicate with each other, and we rely on multiple open source libraries. But you actually do not need to know all the details to develop your external application: when you are here, developing your application, you only have to communicate with the API. Of course, if you want to contribute to the core of Cytomine, it's better to know how it works, but today we will not go too deep into the details of these different components. We also want to mention again that we use Docker container technology, so that you can install all these libraries at once. We have scripts to install all this quite easily, either on a personal laptop or desktop, but then you are somewhat isolated, and if you do not expose your computer, you somewhat lose the collaborative features of Cytomine. More interestingly, you can install Cytomine on a server, which could be in your institute or in your lab, or even at a larger scale. We have done this, installing it at core facilities and universities, and we also installed Cytomine for a massive open online course with thousands of users. This is possible thanks to horizontal scaling of these components. For example, you can duplicate the image management server when you know that you will have many, many users, so that it will deliver the image content of your database faster. To install Cytomine, you can have a look at our documentation, which describes everything, and it should work quite easily, except if you have to connect Cytomine to a local user authentication system, et cetera, but we can provide help for that. We also have a demo server. This one is a very small one with only a little data, but if you want to test it, you can: you can connect to it with your browser using this username and password, and you can also authenticate as this user through the clients. That should be enough for basic testing, but if you want a specific demo server, it's also possible; you can contact us, or, of course, you can install Cytomine at your local site following the installation procedure. So we have seen the main architecture, but what about the data models? Again, you do not need to know everything about this, but some information is still required to understand what follows. In Cytomine, in our databases, you have, of course, different entities, such as users, which, together with the permission entities, basically allow you to connect different users and give them access to your projects. In your projects, you also have the images, which are stored in a virtual disk space that we call a storage; you can have multi-dimensional images. In images, you have the annotations, which are geometrical contours plus additional metadata, such as an ontology term, a textual description, an attached file, or key-value properties, et cetera. You have the ontology entity, which is the vocabulary of terms that you can associate to your images, and which can be hierarchical. And we have the software entity, which I will describe in more detail later. So these are the main concepts, the main data entities, in the Cytomine database. As we said, one of the main data entities is the annotation. We actually have three types of annotations in Cytomine. The annotations that are drawn by a human user are called user annotations.
The ones that are generated by a piece of software are called algo annotations. And we also have reviewed annotations, which can come from user or algo annotations that have been reviewed and validated by an expert. At the moment, you can only have one review layer per image. But you can have multiple users, so you have one annotation layer per user in an image, and the same for an execution of an algorithm: you have one annotation layer per execution in an image. In terms of geometrical information, the coordinates of these annotations, their contours, are stored using the Well-Known Text (WKT) representation; I will show you an example later. And as I said, these are stored in the spatial database. So this is a quite simplified illustration of the annotation model in Cytomine. For example, we also recently developed what we call annotation links, which allow you to group annotations, to link annotations that, for example, depict the same region of interest in different imaging modalities. That's still under development, but it's something that's possible. So we have seen the Cytomine architecture, the Cytomine data model for annotations, et cetera. But for you, data scientists and computer scientists, the most important part is probably the API. You don't have to explicitly know the data model, but you should know about the API. This API provides endpoints, or services. For example, you can get the list of projects that you have access to by simply using this kind of URL; I will show more examples later. So you can list the projects, or get the information of a specific project: here, when you type this URL in your browser, for a specific project with a specific identifier, you get a JSON response that contains information about this resource, for example the name of the project, the ontology associated to this project, the number of images in this project, the number of annotations, et cetera. This is an example for projects, but it's the same idea for other resources: images, annotations, ontologies, and users. Basically, anything that you can do from the Cytomine web UI, you can also do through the API, and you can even do more than that. Here are some examples. If you want to list annotations, you can list them based on some filters. Let's say you want access to all the annotations that were drawn in a given image, by a given user, and with a given ontology term: you can get this information. You get as a response a collection of annotations, where you again have some information about each annotation. For example, here you can get the crop of an annotation, which corresponds to a rectangular crop around the annotation that was drawn by a user. If you want a specific annotation's description, you can get it here, and you have the WKT description of the contour. Everything is available through this API: for example, you get the crop here at its original size. That's a big image, but you can ask for a maximum size for this crop. You can get the crop with the contour drawn around it using this draw argument, or you can get the binary mask corresponding to the annotation. In the same way, you can, for example, get a tile: you specify a specific area in your image, for example at position 3,000, 3,000, and you want a tile of 512 by 512, and you get it.
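To make those URL patterns concrete, here is a hedged sketch of what such requests look like; the paths and parameter names are indicative (the exact forms are in the API reference), and every request must be signed with your API keys:

```
GET /api/project.json                                    # projects you have access to
GET /api/project/{id}.json                               # one project, as JSON
GET /api/annotation.json?image={id}&user={id}&term={id}  # filtered annotation list
GET /api/annotation/{id}.json                            # one annotation, with its WKT contour
GET /api/annotation/{id}/crop.png?max_size=512&draw=true # crop, contour drawn on top
GET /api/imageinstance/{id}/window-3000-3000-512-512.png # a 512x512 tile at (3000, 3000)
```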
And if it's, for example, a hyperspectral image, you can also apply a projection algorithm to get this result from the server. We actually very recently released the first version of the API reference, and you can have access to it in our documentation. It's still under validation, so it's still in the experimental documentation, but it will soon be released on the official Cytomine documentation website. There you have access to all the API endpoints: you have the URLs, you have the outputs, and you also have, for example, for annotations here, the service that I used to filter annotations. You can base this search on many different arguments, like the project, the image, the ontology terms, whether the annotation was generated by a human or by an algorithm, et cetera. So you have multiple parameters that allow you to filter and extract exactly the annotations that you want. So this is the API. What we have developed to ease developers' work are the Python, Java, and JavaScript clients. These allow you to manipulate Cytomine resources as regular objects within your favorite language. So instead of using the API directly, you use functions that communicate with the API in the background. Here, this gets the collection of all the projects; here, it gets the description of a specific project, et cetera, in Python. We have the documentation available, where you will learn about all these models and how to fetch, modify, delete, update, et cetera, these resources. We have the same for the Java language. I will not detail this, but you have examples of Java code to manipulate these resources; this was actually used, for example, in the Icytomine plugin for Icy, which allows Icy to extract information from a Cytomine database. We also have the JavaScript client, which would allow you to develop your own web interface or to extend our current web interface. For the Python client, I can do a very quick demo. What you first have to do is install the Cytomine Python client; it's described in the documentation. You have to get your private and public keys, which are available on the account page of the web platform, and then you can start working with some code. For example, here I have a very basic example of Python code, where you import some classes in your Python code, then you load some credentials, basically a JSON file where you specify the host, public key, private key, et cetera. You load this, and then you instantiate a Cytomine object with these credentials. And then, here it's a bit dirty code, but you get the ID and can print the information about your user. You can get information about projects: for example, if I run this, you can get the projects, and you can get the ontology of a specific project. Then you can list the terms of this ontology: we have nucleus, region of interest, tumor, et cetera. And then, using the ImageInstanceCollection, you can also get the list of images that are in your project and display very basic information about these images. So basically, this kind of code allows you to get all the information out of Cytomine. A more interesting example is this code that filters annotations within Python: as I did with the API, you can filter annotations directly in Python using an AnnotationCollection with some filters.
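As a condensed, hedged sketch of the client calls just demonstrated (the host, keys, and project identifier are placeholders):

```python
from cytomine import Cytomine
from cytomine.models import (ProjectCollection, Project, TermCollection,
                             ImageInstanceCollection, AnnotationCollection)

HOST, PUB_KEY, PRIV_KEY = "https://demo.cytomine.be", "…", "…"  # from your account page

with Cytomine(host=HOST, public_key=PUB_KEY, private_key=PRIV_KEY):
    projects = ProjectCollection().fetch()            # all projects you can access
    for p in projects:
        print(p.id, p.name)

    project = Project().fetch(1234)                   # hypothetical project id
    terms = TermCollection().fetch_with_filter("project", project.id)
    images = ImageInstanceCollection().fetch_with_filter("project", project.id)

    annotations = AnnotationCollection()              # filtered annotation listing
    annotations.project = project.id
    annotations.image = images[0].id
    annotations.showWKT = True                        # include the contour geometry
    annotations.fetch()
```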
For example, you want to get the annotations within a specific image: you fetch them and then you display some information, and you can also dump, or download, crops of your annotations. So what I just did here is ask the Cytomine server for the annotations that were made in a specific image of the project, together with the alpha masks. These were annotations by a specific user, myself, but if I ask for another user who generated many more annotations, you see all the annotations coming from the Cytomine server to my machine, and you have both the crops and the masks. With this, you could imagine developing an algorithm to segment these objects: you have the images and the masks, so you can start developing your models. We have other examples, but maybe I will not show them all; you can basically also upload images. Very simply, you have to specify a storage, which corresponds to a kind of virtual disk space on Cytomine, and then you can upload an image directly to the platform using this upload image function. I can do it very quickly. Okay, if I go to my project, if I go to the activity logs on the web platform, I see here that one image was uploaded just a few seconds ago, and you also have all the activities that were previously done. And so you have this very small image that I uploaded to the server. Okay, I will not go further into the details of this, but we also have many Python examples on our GitHub showing how to get data with Python. For example, based on these examples, you could build code to apply a StarDist algorithm: you would get the annotations out of Cytomine, save them as local PNG files, then use the StarDist model functions to apply the model, convert the StarDist detections into Cytomine annotations, and upload these annotations using the append and save functions, so that you can upload multiple annotations at once. So basically, you would do this. But one issue with this approach is that the annotations detected by your algorithm would end up in your own annotation layer, because you previously authenticated using your user account's public and private keys. So in the web interface of Cytomine, you would see these annotations in your own layer, and if you apply the algorithm multiple times, you would accumulate all these annotations, and it would quickly become a mess. So what we have implemented is the concept of software, which allows you to have software executions, and for each of these software executions you have a separate layer in the interface, so that you can compare the results generated by different algorithms. The other issues with the way of doing things that I just showed you are that it is executed on my own computer with specific libraries, so it might not be reproducible; the code is not versioned; and the parameter values of your algorithm are not saved. So we have developed additional models in Cytomine, what we call Cytomine apps, which rely on three concepts: software, jobs, and parameters. A Cytomine app is a piece of code, like your Python code, but it is described by what we call a descriptor, where you specify the parameter types of your algorithm, and it also comes with a well-defined execution environment, where you describe all the libraries, and the versions of the libraries, that are needed by your code, so that you are sure it will be executed using the exact same libraries.
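Before looking at how such a script is packaged, here is a hedged, simplified sketch of that standalone StarDist loop (identifiers, the file name, and the pre-trained model are illustrative, and crop offsets are not handled):

```python
from shapely.geometry import Polygon
from skimage import io
from csbdeep.utils import normalize
from stardist.models import StarDist2D
from cytomine import Cytomine
from cytomine.models import Annotation, AnnotationCollection, ImageInstance

with Cytomine(host="https://demo.cytomine.be", public_key="…", private_key="…"):
    image = ImageInstance().fetch(5678)            # hypothetical image id
    img = io.imread("image.png")                   # image dumped beforehand
    model = StarDist2D.from_pretrained("2D_versatile_fluo")
    labels, details = model.predict_instances(normalize(img))

    new_annotations = AnnotationCollection()
    for poly in details["coord"]:                  # StarDist returns (y, x) polygons
        ys, xs = poly
        # Cytomine stores annotations with a bottom-left origin, so flip the y axis
        pts = [(float(x), image.height - float(y)) for x, y in zip(xs, ys)]
        new_annotations.append(Annotation(location=Polygon(pts).wkt,
                                          id_image=image.id))
    new_annotations.save()                         # bulk upload in one request
```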
And so with this concept of apps, you could have multiple users applying these apps; each such application is what we call a job. A job is a specific execution of a software, with job parameters that encode the parameter values of your software. These parameter values you can, for example, specify from the user interface; they are communicated to the container, which can be executed on a server, and this specific execution, with these parameter values, will generate results that you will be able to look at in the web interface, with a specific layer for each execution of this app. Now a bit more technical; you can have a look at the example for StarDist. Basically, we have this Dockerfile, this descriptor file, and your code. About the descriptor: it's a JSON file where you specify the input parameters. We have mandatory parameters that correspond to Cytomine concepts, like the public key, private key, et cetera, and then you describe the specific parameters related to your algorithm. For example, in StarDist, you have a threshold on probabilities and a threshold on the non-maximum suppression overlap, et cetera. You have to describe these in the JSON file; again, you can have a look at the documentation for examples. Once you have done this, the Cytomine server can parse the descriptor to create a software entity in the Cytomine database. That's one thing. But you also have the environment descriptor: it's a Dockerfile where, as I said, you describe: I want to use a Python 3 base environment with the Cytomine Python client included (we provide basic Docker images with our clients); then, I want to use this version of TensorFlow; I want to incorporate these pre-trained models into my execution environment; and I want to execute this specific code, which contains the StarDist code to download the annotations from Cytomine, apply the StarDist model, and upload the annotations. So that's the code I quickly showed you previously, but wrapped in this Dockerfile, so that we are sure we are using the right versions of the libraries, et cetera. Using this kind of programming convention, you can basically use any kind of library, and Sébastien will show you examples using very different libraries in BIAFLOWS. What we also do is versioning of these apps. That means that each time you make an important modification to your code, you can create a release on GitHub that corresponds exactly to your code version. On GitHub, you have the release tags that you can create, and then in Cytomine you will find the exact same version and will be able to access the exact code. So if I click here, at the bottom of the page for this algorithm, you can click on 'See source code on GitHub' and you have exactly the corresponding version with the code. When you execute the algorithm, we also save the software's execution log (the stack trace), so that afterwards you can go back and see what happened when you applied your algorithm. This is available in the user interface: when you go to the analysis page here and click on 'execution log', you have access to it. And it's the same for the parameter values of your algorithm. So we have traceability and reproducibility, as I just showed you, with the parameter values of your algorithm stored in the Cytomine database.
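Concretely, the two files described above might look roughly like this; the names, fields, and versions are illustrative, and the authoritative schema is in the Cytomine app documentation. First, the descriptor:

```json
{
  "name": "S_MyStarDist",
  "description": "Nuclei detection with a pre-trained StarDist model",
  "container-image": {"image": "myuser/s_mystardist", "type": "singularity"},
  "command-line": "python /app/run.py CYTOMINE_HOST CYTOMINE_PUBLIC_KEY CYTOMINE_PRIVATE_KEY CYTOMINE_ID_PROJECT STARDIST_PROB_T",
  "inputs": [
    {
      "id": "stardist_prob_t",
      "value-key": "@ID",
      "command-line-flag": "--@id",
      "name": "Probability threshold",
      "description": "Detections below this score are discarded",
      "type": "Number",
      "default-value": 0.5,
      "optional": true
    }
  ]
}
```

And a matching Dockerfile, assuming one of the provided base images:

```dockerfile
# Base image that already includes the Cytomine Python client
FROM cytomine/software-python3-base

# Pin the exact libraries the workflow needs (versions are illustrative)
RUN pip install tensorflow==2.4.1 stardist==0.6.2

# Ship the pre-trained model and the entry-point script inside the image
ADD 2D_versatile_fluo /models/2D_versatile_fluo
ADD run.py /app/run.py

ENTRYPOINT ["python", "/app/run.py"]
```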
So if you use these containers, you are then able to reproducibly execute your apps, either through the Cytomine web UI, as we showed last week; in this case, when you execute it from the web UI, all the parameter values are communicated in the background to the software environment of your application. But you can also execute it from your own computer, or from a server or a cluster. What you have to do is first get the container image of your application and then execute the code, using quite simple Docker commands. When you run the first command, you download the execution environment that contains your code, your exact libraries, et cetera, and when you run the image, you can specify the parameters. I can quickly show you. You first have this command to get the container image. You only have to do it once; I already did it, so it's already on my computer. Then I can run it. What it does is execute the code: there is some interaction with the Cytomine server to get the ROI annotations where we want to apply this algorithm, then it applies the model, it has detected a certain number of polygons, and now it's uploading the annotations to Cytomine. So if I go into the project, it's this one that is still running; it has ended now, and you can click here and have access to the detections of this StarDist algorithm. And if you execute it with a different value of the StarDist threshold parameter, you will get other results, but here you are sure that you are executing a very specific version of the code, with the very specific libraries that are described in the Dockerfile. And you see that by using a lower threshold, the algorithm detects more cells. Here it's still running: you see the number of annotations increasing, because the Python code is uploading these annotations now; it's still increasing here. Okay. So I'm almost done. To sum up, when you want to add a new application, a new algorithm, to Cytomine, you first have to follow some conventions. The idea is that you put your code on GitHub by creating a specific repository with a prefix. In this repository, you have to have a descriptor that describes the input parameters of your algorithm: the types of the values, the default values, and maybe a short description of each parameter, so that in the web user interface you see this description in the algorithm launch dialog box. Then you have the Dockerfile, where you specify the libraries that are needed, any files that are needed, like pre-trained model files, et cetera, and also the entry point, which is the code that will be executed. And this code could be Python or anything else; Sébastien will show you that workflows have been implemented using ImageJ, Icy, CellProfiler, ilastik, et cetera. When you do this, you also have to configure automatic builds between Docker Hub and GitHub; it's described in our documentation. That basically means that the Docker Hub website will generate an image, a software execution environment, that contains everything. And then the Cytomine server, if you have configured the trusted sources, as I said at the beginning, will regularly check for the availability of new applications, download them, and install them on the Cytomine server. And in your project, you configure your algorithms: you enable your application.
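For reference, the local execution shown in this demo boils down to two commands, roughly like this; the image name and flags are illustrative (the actual flags are generated from the descriptor):

```bash
# Fetch the frozen execution environment (only needed once)
docker pull myuser/s_mystardist:v1.0.0

# Run the app against a Cytomine server, passing the parameters declared in the descriptor
docker run --rm myuser/s_mystardist:v1.0.0 \
    --cytomine_host https://demo.cytomine.be \
    --cytomine_public_key "$PUB_KEY" \
    --cytomine_private_key "$PRIV_KEY" \
    --cytomine_id_project 1234 \
    --stardist_prob_t 0.5
```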
And then, in the analysis tab, you can execute this application using specific parameter values, and you will have the results of this algorithm, the parameter values, et cetera, accessible either through the web UI or, again, through the API, because each software execution is itself an entity in the Cytomine database. So you can fetch the annotations that were generated by this algorithm and run another workflow using these results, et cetera. So, just to end: this might look a little more complicated than popular desktop tools, but we think that this type of architecture, these concepts of APIs, containers, descriptors, et cetera, really allows collaborative and reproducible research, by allowing you to share almost everything over the web: the images, the annotations, the apps, the code, the results, et cetera. And we hope we convinced you that it allows both end users, such as biologists, to use these tools through the web UI, and computer scientists to develop new algorithms, plug them into Cytomine, and make them available to their end users. For example, core facility managers might be interested in developing their own workflows and making them available through the web interface. So again, we have this documentation, which we have significantly improved in the last months, and which you should check: you will find step-by-step documentation to build your own apps, et cetera. We are doing our best to answer issues as fast as possible on the image.sc forum and also in our GitHub repositories. If you want to contribute, of course, we would invite you to develop your own apps and share them, because Cytomine has a quite large user base around the world, so your apps might be useful to them. And you can also contribute in other ways, like translating the web interface. And now I will give the floor to Sébastien, who will talk to you about BIAFLOWS, which could maybe be seen as a specific instance of Cytomine: it uses the same technologies, but it comes with tens of workflows and datasets, using multi-dimensional images, with standardized input, output, and annotation formats, and it also comes with metrics computation for benchmarking. So Sébastien, it's up to you. I guess we will take questions at the very end, if they are not solved in between. I'll stop sharing my screen, okay. Is it okay for you, Sébastien? Yeah, can you hear me? Yeah. Okay, so let me share the screen. Okay, it works, no? Yeah. So hello everybody. I'm Sébastien Tosi from IRB Barcelona, and in this second part of the webinar I will introduce BIAFLOWS, a web tool enabling the reproducible deployment and benchmarking of bioimage analysis workflows. So yes, as was highlighted, BIAFLOWS is based on Cytomine: it extends its code base, especially for streamlining the automated benchmarking of workflows on bioimage datasets, so mostly microscopy images. The project was developed within a NEUBIAS working group during the past five years, and we had a tight collaboration with some of the core developers of Cytomine. So, in the first place, why did the project start? Well, as bioimage analysts, it was quite clear to us that there was no strong culture in the bioimage analysis community of systematically and quantitatively assessing the accuracy of bioimage analysis workflows on reference datasets.
This made comparing workflows that achieve the same task quite difficult, for instance to select the best solution for a given dataset. Also, workflows were often published as side results of biology articles, especially in the methods sections: they were not the main outputs of the articles, they were often only partially documented, and they were sometimes provided without test images and without working parameters for these images, making reusability and reproducibility quite difficult. In the same line, non-experts, such as life scientists with limited knowledge of programming, were mostly dependent on developers to integrate the latest algorithms into the major, popular bioimage analysis platforms, such as ImageJ, to be able to use them. These were all strong limitations in our eyes, and they motivated the inception of BIAFLOWS. Looking at other fields where image analysis is key, such as computer vision, which is a more mature field with a longer time span, the landscape is actually completely different. Articles are mostly dedicated to describing workflows and comparing them to other methods, and virtually all published workflows are benchmarked on reference datasets. There are very large databases of annotated images, such as ImageNet or COCO, so images with, most of the time, human annotations. These databases required heavy crowdsourcing; they are actually critical assets for these communities. But on the downside, developers mostly benchmark their workflows independently, meaning that they have to re-implement evaluation frameworks and sometimes also re-implement the methods described in other papers. So this is not fully optimal, and it's also error prone. Also, the workflows are sometimes difficult to reuse or adapt to slightly different applications, because the code is complex to compile, or they have many parameters that are difficult to finely adjust to slightly different images. The biomedical image analysis field is quite similar to computer vision in this sense: benchmarking is heavily used and promoted in published workflows. All these communities sometimes meet in so-called image analysis challenges, competitions that are open for research groups to submit their solutions to some image analysis task. Two important portals are Grand Challenge and Kaggle, which you can see on this slide. Grand Challenge is mostly dedicated to biomedical imaging challenges, and Kaggle is much more generic, but it also has images coming from medical and sometimes biological projects. There are many challenges listed in these portals, and you can also download the datasets and sometimes the workflows, but actually very few datasets are made of microscopy images, especially fluorescence microscopy images, which are key to biological research. So that is one weakness we could identify for the bioimage analysis community. Also, even when challenges address the same problems, for instance object segmentation, they most of the time use different data formats for the images and the annotations, and sometimes different metrics, making it difficult to compare workflows coming from different but very similar challenges. Also, during challenges, the benchmarking is most of the time not automated.
Actually, the participants submit the results of their workflows, and even though the workflows are often also provided, they are typically provided as compiled code for a specific operating system, so not always very reusable or customizable for other applications. And finally, by nature, challenges are discrete events: once a challenge is closed, it's not easy to add a new workflow and evaluate it with exactly the same framework. So we wanted to address all these limitations, shortcomings, and challenges, and come up with a web tool for the bioimage community, which we call BIAFLOWS, that would overcome these limitations. As I said before, it's based on Cytomine and extends its code base. It's a system that allows you to store annotated datasets, mostly coming from optical microscopy, that have been selected to illustrate some key problems of bioimage analysis. We specified a set of standard data formats for the images, the metrics, and the annotations, depending on the class of problem; object segmentation, for instance, would be a class. The system allows you to package versioned workflows with their complete execution environment; this is very similar to what Raphaël described, based on Docker and Singularity containers. The database also holds default parameters that are optimized for the datasets that the workflows are associated with, so you can easily run a workflow with the default parameters and see some meaningful results, and then start playing with the parameters to see the impact they have on the results. The results of the workflows can be visualized as annotation layers, the same type of annotations as in Cytomine. And, this is new, the benchmark metrics are computed for all the images of a given dataset and reported in a specific web page that can be browsed: the results can be browsed either per image or as statistics over a whole dataset of several images. So this is the flow, but I will go fast on this because we will have a short demo at the end of my presentation. Essentially, we start from bioimage analysis problems: we have a name describing the problem, some images, with thumbnails to quickly show representative images of these datasets, and also a link to the source they were collected from. The images can be browsed remotely, and then we have workflows associated to a problem. These workflows can be run on all the images of a dataset, the results can be visualized, and the metrics can be inspected, also for the whole dataset. It's mostly the same interface as in Cytomine, where you can also adjust the parameters and document them with this small information button, making it convenient to document a workflow. Maybe for those less familiar with benchmarking, this is the recipe for a typical benchmark of the accuracy of a workflow. You need to gather a dataset with a sufficient number of images, so that the results are significant. In our case, we decided to go for the OME-TIFF format for the images, since it's becoming quite standard and it's an easy way to retrieve the dimensionality and the calibration of the images. Then you need some ground truth annotations for these images; here, typically, we are detecting, tracking, or segmenting objects. These reference annotations can be human annotations, or synthetic annotations if you use an image simulator to create artificial images.
So yes, they come either from challenges, image simulators, or sometimes published datasets. And finally, you need one or several metrics that can capture the accuracy of the results of the workflow, essentially by comparing them to the ground truth annotations, the reference. To give you a more concrete idea of what a metric can be, in the context of object segmentation, a very widely used metric that we also integrated into BIAFLOWS is the Dice coefficient. In this case, you typically have two binary masks: one is the output of the workflow, and the other is the ground truth annotation. The Dice coefficient is the overlap between these two masks, normalized by the sum of the areas of the objects in the masks: Dice(A, B) = 2|A ∩ B| / (|A| + |B|). You can probably easily see that this will be one for a perfect match, if the results perfectly match the ground truth, and zero if there is no overlap between the masks. So it's a simple metric, and widely used, but it only captures some aspects of object segmentation: for instance, if objects are fragmented, or touching objects are merged, this does not show up well in the Dice coefficient, because the overlap area will not change much. So the message here is that, instead of having one metric, it's usually a good idea to have a set of metrics to fully capture a given problem, and that is what we implemented in BIAFLOWS. For every class of problem, we typically report several metrics, one of which is selected as the most significant one; we call it the primary metric. If you had to keep only one, that would be, let's say, the most informative one, but we still report the other metrics. Importantly, we have two public servers that are curated by our team, where we started to gather these annotated datasets and integrate workflows. It's an ongoing work, and our aim now is to open it to the community to get contributions from developers, and also from biology groups that have annotated images to share with us. I will give you some pointers at the end of the presentation; we also already organized a workshop on this topic, which I will link to. At the current stage, we have nine problem classes in the curated instance of BIAFLOWS, listed here on the left, and some sample problems for each of these classes. The annotation format depends on the problem class, but most of them are simple binary masks or label masks. We tried to keep the annotation formats as simple as possible, to make it simple to adapt existing workflows to our format, because of course this is a requirement for all the components of the framework to talk to each other, so to say. Still, virtually any kind of workflow can be integrated into BIAFLOWS, irrespective of its target platform; you might recognize the icon of your favorite platform here on the top right of the slide. The only requirement is that the workflow can be called from the command line and can parse some arguments, such as an input and an output folder, and also the functional parameters that will be used by the workflow. As I said before, the images will be in OME-TIFF format, so the workflow should be able to process them. And we also impose that the workflow processes a whole folder at once: if there are several images inside, they will be processed sequentially. The way to integrate a workflow into BIAFLOWS is pretty simple: you just have to define four files. Actually, Raphaël already described most of them.
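To make the Dice coefficient mentioned above concrete, here is a minimal NumPy sketch, assuming two binary masks of the same shape:

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|): 1.0 for a perfect match, 0.0 for no overlap."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as a perfect match by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```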
Also, these files can be reused from one workflow to another if they target the same platform, and we already have templates for many existing bioimage analysis platforms. So, essentially, what you need to define to integrate a new workflow is the descriptor of its parameters and, of course, the code of the workflow itself. But it's pretty simple. This is the whole architecture of the system. The files should be stored in a GitHub repository that is declared as a trusted source. Then Docker Hub, an online service, will automatically build a workflow image, a Docker container, from these files. And then any instance of BIAFLOWS, for instance the curated instance we manage, will automatically pull this new workflow, or a new version when one is triggered, and make it accessible through the UI, so people can launch the workflow, see the results, and benchmark them. You can also install BIAFLOWS as a local solution and benefit from the workflows that are already available, maybe to apply them to your own images. Also, importantly, the architecture is flexible, so that the workflow image can run on a different server than the BIAFLOWS server, for instance a computing cluster you have in your institution. Finally, for full flexibility, the workflows can also be used as standalone Docker images on any machine running Docker: without any BIAFLOWS server involved, you could process a folder of OME-TIFF images and get the results by just grabbing the Docker image. The main difference with Cytomine, and I would say the magic in BIAFLOWS, lies in what we call the wrapper script. It's essentially a Python script that sequences all the operations required to perform the benchmarking: downloading the images from BIAFLOWS, running the workflow, creating annotations from the results, then computing the metrics and uploading them to the BIAFLOWS server. So this is the main magic, but it's something that is already written and developed so that it can be reused; developers don't have to modify it. Here are the links to the two servers that we manage. The first one is our curated instance; it's read-only, you can only browse the content, and it's the server I will use now for the demo. The other one is a sandbox server that we use to test content integration, if you want to upload your own images or your own workflow and test its integration. So now I switch to the demo. I'm just using Chrome, but any regular browser is compatible, the same as for Cytomine. First of all, you have to log in to the page. Here I will not log in with the guest account, but with an account that has more execution rights for the workflows. Here you see a scheme of the steps you should follow, but basically everything starts here, in the problems tab. What is called a project in Cytomine terminology we here call a problem, because every project groups a dataset that is representative of a bioimage analysis problem, and you can actually browse them by keyword, like 'nuclei'. So here, for instance, for nuclei segmentation, you have the link to the source of the dataset and thumbnails of representative images, and if I click on this link, I can see the images that are inside this project. I can also click on an image to see it, okay? And by default, one layer is always activated: the ground truth reference, okay? You can deactivate it if you want.
In this case, it's a synthetic image created with a simulator, but it could be any kind of image. If I move to the workflow runs: what is called app or software in Cytomine terminology is here called a workflow, but it's essentially quite similar. Here you see all the past runs, and you can run a new workflow by clicking on this button. You have the list of all the workflows that are available for this problem, with their versions. As you can see, we have an extensive list of bioimage analysis platforms: CellProfiler, ImageJ, also some pure Python workflows, and we also have many deep learning workflows, all Python-based as well, but with some specific libraries. I won't run a workflow here because I have to keep the demo short, but you could do it from there. The workflow runs then just accumulate here in this list. You can also browse the parameters that were used for a specific workflow run from here, and you can select or deselect workflow runs; when you do so, the results of the benchmark are displayed in this table. Here we see the results for all the runs that were selected in the top list, and we see the different metrics that are computed for the whole dataset: for instance, here we have the Dice coefficient, and also some other ones. The table is completely interactive: here we see the average, but we could add the standard deviation of these metrics, and you can also explore the results per image, if you wish, instead of having the statistics. Everything is stored in the database, making it very easy to browse. You can also look at the results visually, to complement the metrics. For this, you just have to go to the annotations tab here, you select the run that you want to display, let's say, for instance, the ImageJ run, and you add it to the display. I could also leave the ground truth, but it's quite difficult to see both at once, so we have an option to have multiple viewers and display a different annotation layer on each viewer. The viewers can be linked, so it's very easy to see the differences between the workflow and the ground truth in this case, or other workflow outputs. All the workflows are also linked to their source code, as Raphaël explained: if I click on any of them, and then here on this 'on GitHub' button, I have a link to the repository where all the files are stored, so you can browse the source code, and also browse the different versions of the workflow to see what has changed. Okay, so that's it for the... well, maybe very quickly, I can show you that we also support multi-dimensional images, not only 2D images. Here, for instance, it's a tracking problem, and the images come from the Cell Tracking Challenge. Since it's a time lapse, I can display the objects by coloring them by ID, and as you can see, we have this multi-dimensional browser, so I can scroll through the image in time and see the tracks of the objects build up. Okay, so that's it for the demo. Just very quickly to finish, I'm almost done. The whole system is completely documented; you have the link here, and also from the UI of BIAFLOWS. You can also refer to the article we published in Patterns, a Cell Press journal. We are also on the image.sc forum. All these links are available from the documentation portal. And if you are a developer, or if you want to contribute some annotated datasets, how to add new images or new workflows to the system is described in detail in the documentation.
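For developers: the wrapper script mentioned earlier sequences the benchmarking steps roughly as outlined below. This is only a schematic of the steps, not the real helpers; the actual wrapper utilities ship with BIAFLOWS and can be reused as-is.

```python
import sys

def main(argv):
    # Schematic outline of a BIAFLOWS wrapper (step names only):
    # 1. Parse the descriptor parameters and connect to the BIAFLOWS server.
    # 2. Download the problem's OME-TIFF images (and ground truth) to a local folder.
    # 3. Run the workflow code on the whole input folder, writing masks to an output folder.
    # 4. Convert the result masks into annotations and upload them as a new layer.
    # 5. Compute the benchmark metrics against the ground truth and upload them.
    ...

if __name__ == "__main__":
    main(sys.argv[1:])
```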
And as I mentioned before, we also have a complete workshop dedicated to this that took place at I2K 2020; it's on YouTube and you have the link here. I also recommend this very interesting presentation from Volker Baecker, from our team, on software reproducibility and all the issues you have to solve in practice to ensure full reproducibility. With this, I would like to acknowledge the core developers, but also the many contributors to this project, both in terms of code, specifications, ideas, and content, as well as all the institutions that helped us by letting these people work on this project, and finally NEUBIAS, the COST-funded European project within which BIAFLOWS was developed. So with this, I'm finished.