Okay, so maybe we can start. Just to check: can you hear me, panelists and speaker? Okay. So, welcome to this first webinar of the NEUBIAS Academy. We're very happy to see that so many of you have joined: already more than 600 people, and we're expecting almost 1000, so hopefully everyone will be able to connect. I will make a very short introduction to this webinar. My name is Julien, I chair NEUBIAS, and you've probably been in contact with me for the registrations. Here is a short summary of what is possible with the Zoom webinar. Basically, all participants will be muted, without video. We do not recommend using the chat; instead, we would like people to use the Question & Answer panel. This black panel here shows what you should be able to see on your screen. You can also raise your hand, and the panelists who are currently visible in the video icons may eventually try to address your points, but questions and answers work much better, so please do use them. We don't recommend leaving the meeting, but if you do, you should be able to reconnect, unless it's completely full.

A couple of notes: hopefully the panelists will be able to answer all the questions live. If not, we will store the questions, answer them after the webinar, and post them on the image.sc forum in a thread dedicated to this webinar. Finally, this webinar is being recorded and will be uploaded to YouTube on the NEUBIAS channel; we'll send you the link, and hopefully it will be available within a couple of days.

For those who don't know, let me introduce very briefly what NEUBIAS is and what the NEUBIAS Academy is. NEUBIAS started several years ago, almost eight years ago, originally with the gathering of people organizing bioimage analysis courses, mainly at the EMBL, from 2012 to 2017. From there, a small group of people gathered and created EuBIAS, which was originally a symposium event held in Barcelona and Paris. From there we moved on to create NEUBIAS, which had, let's say, the luck of being funded by the EU under the COST programme. Over the last four years, NEUBIAS has organized conferences and short-term scientific missions, gathered members from over 42 countries, and developed several online resources about bioimage analysis tools, including links to tool repositories and the BIAFLOWS benchmarking application. Most importantly for today, NEUBIAS also organized 15 training schools, which gathered about 415 trainees out of 1000 applicants. We quickly realized that the demand for training in bioimage analysis is huge and that we could not satisfy everyone. So, looking at the future, we decided to create the NEUBIAS Academy, which aims to provide sustainable materials and activities focused on training in bioimage analysis in many ways. The first of these is this series of webinars and online lectures, which has been set up rather intensively because of the COVID-19 confinement we're all experiencing at the moment. We hope you will enjoy this webinar and the next ones.

Now I will introduce the speaker and panelists. Ignacio Arganda-Carreras is an Ikerbasque Research Fellow at the University of the Basque Country in San Sebastián, Spain.
He will be your main speaker today, and he will be helped by Estibaliz and Carlos from the Universidad Carlos III in Madrid, and by Daniel Sage from the EPFL in Lausanne. And that's it for me. Now I will ask Ignacio to share his screen on top of mine.

Okay, if I can... okay, now I can. Hello everybody, and thanks so much for the introduction, Julien. First of all, I would like to thank everybody for being here; it is such a pleasure, in such difficult times, to see all this interest in the work that we do. I would also like to thank the organizers, and their families as well, because to be able to work from home we need the support of the people around us; even to be here for an hour and a half, we probably need someone else supporting our actions. So, thanks a lot.

That said, since I've already been introduced, let me just tell you a little bit about my background. I'm a computer scientist, usually working on computer vision, machine learning, and image processing applied to bioimage analysis, and for many years now I've also been working a lot on developing Fiji and ImageJ plugins. Today, my idea is to give an introduction to the field of machine learning applied to bioimage analysis, and to a very nice plugin that came out a few months ago called deepImageJ. So please do not worry if you are not so familiar with the field of machine learning: we're going to go through the basics, and I will show the important and relevant concepts and definitions, so we are all on the same page. We're going to go from the beginning up to today, the deep learning era that we live in. For that we need an overview of artificial neural networks, which will take us until, hopefully, the second part of the presentation, which will be more or less practical. Usually we do this, as Julien said, in hands-on courses with people in front of us, where everyone can play with the computers. In this case I will do it alone, but I assume you all got in your emails the links to the notebook, the data, and the presentation I'm giving right now, so you can also follow it later, or even at the same time as I do it. As Julien already said, we have three very nice moderators with us today, so please ask all the questions you want in the Q&A panel, even if you don't get an answer right away. In principle, all the questions will be collected and answered, and then we will publish them, maybe on the forum or in some other medium. So feel free to ask all the questions you want.

Okay, so let's start. First things first: what is machine learning? Even many of the people who work on it every day have a rough time defining it sometimes. When I have to teach my computer vision students, what I usually show them in this very first slide is that, thanks to technology, in the last decades we have made it possible to automate many tasks that required a significant amount of human time, especially repetitive manual work. But now, thanks to technology and the use of big data, we can automate tasks that are not only mechanical, but that also require what we could call a certain degree of intelligence. So what kind of tasks am I talking about here? Well, let's say there are some tasks that are easy for humans but difficult for machines. For example, face recognition: we wake up every day
and we immediately recognize the people in our surroundings: our family, our friends, our acquaintances, our neighbors, especially now that we all live confined with very few people. For a machine, though, it's a very difficult problem, because the appearance of a face changes a lot with the pose, the illumination (if I turn the light on or off), if I change my haircut, or if I wear glasses, etc. Because of that, for many years it has been a very difficult problem for machines. Other tasks are actually hard for humans, especially those that involve working with tons of data, large amounts of data.

Do you have something scratching on your microphone? No... maybe I moved too much. Yeah, we can hear all your movements, it's like a scratching. Oh, okay. It should be better now, right? Thank you. Sorry about that; the joys of live webinars.

Okay, so I was saying that some tasks are hard even for humans, due to the large amount of data they involve. For example, even for face recognition, it is not the same thing to recognize the people we have around us as to recognize the millions of faces that are, for example, in the Google or Facebook databases. What we call data mining, machine learning, and pattern recognition are the techniques that have achieved very good results lately in this direction, making so-called intelligent systems a very important part of research, but also of business models. If we know about machine learning now, if we have it on our résumés, it's a big plus. But of course, don't get me wrong, this is nothing new. Machine learning was already defined in 1959 by Arthur Samuel, who said that it is the subfield of computer science that gives computers the ability to learn without being explicitly programmed to do so. Such a grand definition, but what does it mean in practice?

In practice, we have some data: some data points, samples, objects, call them what you like. For example, imagine we have made a biological experiment and we have two types of cells: wild-type cells and mutant cells. What we want machine learning to tell us, for each of those cells, is which type, which class, it belongs to. So the purpose of machine learning is actually very simple: to assign labels to the objects, to the points, indicating their class. And how do we represent our points in such a space? We usually represent our cells, for example, based on some measurements we take from them. We may have taken images in the microscope and then measured the area and the average intensity of each cell; then we can represent each cell by these two coordinates. This set of measurements is what we call features.

There are usually two types of frameworks in machine learning. We have the supervised framework, where we start with a set of points, a set of cells, samples, for which we already know the labels: we know which ones are mutant and which ones are wild type. And we have the unsupervised learning framework, where we only have the cells represented by their measurements, by their features, but we don't have the labels; that's why we call it unsupervised. But the target is the same: given a new point in this so-called feature space (because we represent the samples in this feature space), what is the label that we should assign to it?
So, summarizing: in supervised learning the data are labeled, and the target is to build what we call a model, or a classifier, to automatically label new data. We do so by a process we call training: we train on the data for which we know the labels, and we try to label the new data. If the labels are discrete, we have a classification problem. For example, in digit recognition I have images of digits, which can only be from 0 to 9, so we have 10 different classes; they are discrete, so this is classification. If the labels are not discrete, we have a regression problem: if I have a facial image from which I want to estimate the age, the age can be basically any real number, say 27.3, so that's regression. In the unsupervised learning framework, the data have no labels, and the target is to model the data: we discover the groups inside the data, what we call clusters. That's why we also call it clustering.

Now, a few definitions that you may find in any machine learning literature. We say that every sample, every point, every cell is represented by a feature vector: we take the set of measurements and put them in a vector, and this is what we call the feature vector. The features can be qualitative or quantitative. Usually we work with numerical features, but they could also be qualitative, like small, big, short, tall, etc. If we are in the supervised learning framework, the classes are usually predefined: we know which classes are available, and there are usually many fewer classes than samples. Usually every sample belongs to only one class; there are other configurations, but in general this is what we do. And each class, ideally, has plenty of samples that are similar to each other and different from the samples belonging to the other classes. A very typical problem is what we call a binary classification problem. Imagine we have images of biopsies, and for every image we have to say whether it belongs to the class tumorous or non-tumorous: two possibilities, two classes, binary classification. And then we define the dataset, which is simply the set of pairs of feature vector and class if we are in the supervised framework, or just the feature vectors if we are in the unsupervised framework.

Okay, so when we say that we are creating or building a classifier, what we are trying to find is a function that relates each feature vector to a specific class. You may find in the literature classifiers that take what we call a soft decision: they provide the probability of belonging to each class, telling you, okay, this is 70% wild type and 30% mutant. Others provide directly the final class, applying, say, a threshold to those probabilities and telling you directly: this is wild type, or this is mutant. We call this a hard decision. It depends on the classifier. And when we say that we are training the algorithm, what we are actually doing is adjusting the parameters of that function so as to minimize an error function. The error function, sometimes called cost function, or loss function (usually in neural networks), is basically the error between the expected output and the predicted output; if we are doing age estimation, between the expected age and the age predicted by our model.
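To make that concrete, here is a tiny illustration (mine, not from the slides) of a mean-squared-error loss computed in plain NumPy; the ages are made up:

```python
import numpy as np

# Expected ages (ground truth) and ages predicted by the model
y_true = np.array([27.3, 41.0, 35.5])
y_pred = np.array([29.0, 38.5, 36.0])

# Mean squared error: the quantity that training tries to minimize
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # ~3.13
```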
Okay, very good. So now you may have the question: how do I select a classifier? How do I evaluate the performance of a classifier and compare classifiers among themselves, so that I can say this one works better than that other one? Well, here is a very typical approach. Let's say we have a binary classification problem again, so we have a positive class and a negative class. At the end, we can compare the predictions with the real classes of the samples, what we call the ground truth. Of all the samples predicted as positive, how many were actually positive? Those are the true positives; the ones wrongly classified as positive are the false positives. Very simple, right? And we have the same for the negative class: false negatives and true negatives. We count all of them; of course, they have to sum up to the total number of samples. The real positive samples are the true positives plus the false negatives, and the real negative samples are the true negatives plus the false positives.

Out of these numbers we can compute different performance metrics. I put all of them here because you can find them in the literature, but the most common ones are the true positive rate, or recall, which is the proportion of real positive samples that were correctly classified (true positives out of the total number of real positives), and the precision, or positive predictive value (not to be confused with the recall), which is the proportion of the samples classified as positive, everything the system said was positive, that were actually positive. Both metrics take values between zero and one, zero being a disastrous prediction and one the best classification possible. Usually there is a trade-off between the two. To summarize them, you may find in the literature, instead of the two of them, the F-score, which is the harmonic mean of both. Again, it is between zero and one, one being the best possible performance, where you get everything right.

Okay, so the next question... Yes, there's a question of whether it is possible to evaluate the classification of an unsupervised machine learning method. Yes, of course, there are other types of metrics for unsupervised learning, because you don't know the labels. You have to measure, for example, how compact the clusters are, that type of thing. In general, either you do that, or you get some labels, run your unsupervised method, and see how well its output compares with the actual labels in the end, with metrics very similar to these.

Sorry, there's another question. The strategy you talked about, with the true positives and negatives and so on: they say it depends on the prevalence of the classes, so how is it possible to generalize these measures? You do it per class. If you have a multi-class problem rather than a binary classification problem, you usually calculate those values per class, and then you get the error metrics per class. Of course, you can always also report the global error rate: out of your total number of samples, how many were classified as the real class they had. So, I'll keep going. Thanks for the questions, by the way.
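As a small illustration of my own (not code from the slides), these metrics can be computed from predicted and true labels with a few lines of NumPy:

```python
import numpy as np

# Ground-truth and predicted labels for a binary problem (1 = positive)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives

recall = tp / (tp + fn)      # true positive rate
precision = tp / (tp + fp)   # positive predictive value
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
print(precision, recall, f_score)  # 0.75 0.75 0.75
```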
So the next question we may have is: okay, now I know how to compare my classifiers, my models, but how many samples do I choose in order to build my classifier? You could be tempted to use all of them, but that is usually a very bad decision; it's too risky, because if you take all the samples you have and find a very nice function that fits all of them, you may be optimizing too much for your training samples, and then a new sample, which in principle comes from the same distribution, may be considered an outlier when it is not. To prevent this type of situation, what we usually do is the typical thing of dividing our dataset into two sets: a training set and a test set. The training set is usually about two thirds, or 80%, of the data we have, and the test set is the remaining part. Again, we are going to train our algorithm, to optimize its parameters, only on the training set, but we are going to evaluate it on the test set. So when we apply the classifier to the test set, the classifier is seeing those samples for the first time; it is a way of knowing how well it generalizes beyond the training set.

Another option, depending on the number of samples we have, is cross-validation, usually called K-fold cross-validation (there is a short sketch of both strategies below). We divide our dataset into K different groups, with K usually five. The example I show you here is a five-fold cross-validation: I use one of the groups, one of the folds, for testing, and the other four for training. I build my model, calculate my metric, and store it. Then I repeat the process from zero: I train a new version of the model where the test set is now another fold and the rest are used for training. I repeat this so that each time the test fold changes, and the metric I report is the average over the five runs. This ensures that at some point every sample I have is used both for training and for testing. Otherwise, with a single training/test partition, maybe I am putting all my easy samples in the test set, or all my difficult samples, and that is not very representative of the real problem.

Okay, so in the classic way of doing things in bioimage analysis using machine learning (I say classic meaning before deep learning), we used to have this design cycle for our classifier. We have a scientific question from the real world; to answer it, we take our biological sample, put it in the microscope, and create a data collection, an image collection. Usually we preprocess the images to have them all at more or less the same intensity, size, etc. Then we perform what we call feature extraction: we take measurements out of the images to represent the images, or the pixels, depending on the level at which we are working, with a feature vector, the same thing we mentioned before. Once we have the vectors, we can jump into the supervised or the unsupervised learning framework. In this case I am showing the supervised one, where you usually have four classic steps. First you do feature selection.
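To make the two splitting strategies concrete, here is a minimal sketch using scikit-learn (the Python library mentioned a bit later in the talk); the data and sizes are made up for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.random.rand(100, 2)          # 100 samples, 2 features (e.g. area, intensity)
y = np.random.randint(0, 2, 100)    # binary labels

# Hold-out partition: 80% training, 20% test, keeping the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 5-fold cross-validation: every sample is used once for testing
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    pass  # train on X[train_idx], evaluate on X[test_idx], average the metric
```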
Ignacio, there is a question, and also the scratching noise on your sweatshirt has started again, so be careful. The question is the following: how do you decide the proportion between the test and the training set?

It depends on the number of samples you have. There are actually libraries now that create the proportions for you and try to maintain the distribution of the classes in the training and the test sets. That way, if you manage to get a good classifier on the training set, in principle it should also work on the test set, because the training set is then representative of the test set and of everything that is supposed to be out there.

So, as I was saying, when we do feature selection, we usually get rid of features that are redundant, repetitive, or maybe in contradiction with other features, so that they are not informative enough. Then we do model selection: we select which type of classifier we are going to use. Then, based on the partition we just mentioned, we do the training on the training set and the evaluation on the test set. If we get a satisfactory result, fantastic: we are done, and we can apply that classifier to any new image taken under the same conditions. This is very important: if we want to reuse the trained classifier, we have to take the images and do exactly the same preprocessing and feature extraction; then the performance should, in principle, be maintained. If we don't get a satisfactory result, which is typical, we have to go back and revisit any of these steps: see if we need a different feature selection, use another type of classifier or different parameters, or simply revisit how we did the partitioning between training and test.

What about the types of classifiers we can choose? Well, as I told you, this is something that has been going on since the '50s, so we have plenty of different classifiers in the literature, linear and non-linear. The most popular ones right before deep learning were support vector machines, or SVMs, and random forests. As we will see later, artificial neural networks were popular a bit earlier, but then they went a little bit into the darkness for a few years. In general, what these classifiers do is try to find separations between our data points. Here we have a very nice illustration that I took from scikit-learn, a machine learning library in Python, where you can see different types of classifiers and the decision boundaries they find for this training data in this two-dimensional space. You see that some of the solutions are linear: they find straight lines between the points. Some are non-linear. And they all try to generalize to the test data.

So, because we need to talk about artificial neural networks in order to get to deep learning, we have to go a little bit back in time. Sorry, there's one question: does a human select which features are relevant, or does the machine learning classifier select them? There are actually also methods to perform automatic feature selection. There are typical ones, like principal component analysis, where you can keep the most informative components; it is in itself another field of study, and you can apply these methods to get the most relevant features out of a big set of features.

So, back in time: we need to go back to 1956, to the very first step towards artificial neural networks. This is the perceptron, which is a linear classifier and, as you will see, a very simple one.
It is a function that takes a vector x, which would be the feature vector we just mentioned, and applies a simple formula: it multiplies it by a matrix of weights W and adds a scalar b, which is what we call a bias. If this operation is positive, it assigns the vector, the sample, to the positive class; if it is smaller than or equal to 0, it assigns it to the other class, 0 or minus 1. So we are trying to define this line over here. Why was this so important? Well, there was big excitement about it back in the '50s, because there was actually an algorithm that was able to find the weights of that function from examples. You effectively provide more and more samples to the algorithm, and it finds the most appropriate weights to linearly separate the set of points. At the time, it was the first proof that a machine could learn: just by being given more and more examples, it would have some memory and adjust its weights better and better. How does it do that? It's very simple. If the current weights correctly classify the sample we are looking at, we don't touch them, because they work; we take another sample. If they made a mistake, we correct the weights proportionally to the error committed. We keep doing this, and the algorithm stops once it finds a linear separation between our data points; it only converges if the data are linearly separable. Basically, we are finding the best line to separate my two classes.
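Here is a minimal NumPy sketch of that learning rule, my own illustration of what was just described, with made-up, linearly separable data:

```python
import numpy as np

# Toy 2-D samples with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

converged = False
while not converged:          # only terminates if the data are separable
    converged = True
    for xi, yi in zip(X, y):
        pred = 1 if w @ xi + b > 0 else -1
        if pred != yi:        # mistake: correct the weights
            w += lr * yi * xi # proportionally to the error committed
            b += lr * yi
            converged = False # keep iterating until no mistakes remain
```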
As I said, the perceptron is a linear machine. It can learn things like the OR predicate, as we call it in computer science, that can be separated with a single line. But when we see things like the XOR predicate, where the samples need two lines to be separated, it doesn't work. There was also another problem: it was mathematically shown that linearly separable problems become really unlikely when the number of samples is much larger than the number of features per sample. Okay, no problem, they figured: we use two perceptrons. Mathematically, we just stack one on top of the other (we say the perceptron is a linear layer), and then we can create these two lines. And it was actually proven that if we stack one more layer, a perceptron with two hidden layers can essentially solve any classification or regression problem. Any. Fantastic, right? There was just one tiny problem: there was no algorithm to learn it from samples, as there was for the single perceptron. So at the end of the '60s, this led to what is known as the first winter of neural networks, and the attention of the field of artificial intelligence turned towards symbolic systems.

When did they come back to business? We can summarize the early work on perceptrons by saying that the architecture was right (essentially, they could solve anything), but the training approaches were wrong, or rather, we didn't know how to train them. Things changed in 1986 with the book with the grand title "Parallel Distributed Processing: Explorations in the Microstructure of Cognition", with its volumes on psychological and biological models, from a group in which Geoff Hinton was already present. Hinton is one of the stars of deep learning now, right? Behind this grand title, the big step being taken was that it moved learning towards error minimization: basically, it moved artificial intelligence to optimization. Suddenly there was a way of training networks by minimizing an error function, and stacked perceptrons, what we call multilayer perceptrons, became highly flexible and efficient non-linear regression or classification machines.

How do they look? The general organization is as follows. You have one input layer, where you put the feature vector, then one or more hidden layers in between, and everything is fully connected. When you see all of these arrows, it means that on every arrow we have a weight. To calculate the value at a node, at a so-called neuron, we sum up all the values of the previous neurons multiplied by the weights on their connections. So it is fully connected, and it is also a feed-forward process: we start from the input layer, calculate all these operations, and pass the results towards the following layer. On each neuron there is usually also an activation function to make this smoother, which has to do with the way we train it. At the end, the whole network can be defined as a highly non-linear, weight-dependent transformation, because it depends on all the weights of these connections, and the way we learn the weights is by minimizing a suitable error function. What kind of function? The same as I mentioned before: if we are trying to learn the age out of a feature vector, we can use, for example, the mean squared error between the expected age and the predicted age. The error is then used to calculate the changes we have to make to the weights, by a very elegant algorithm called backpropagation, which propagates the error from the output layer towards the inner layers; with the chain rule, we can compute each weight's specific error contribution.

This went on very strongly until the late '90s, when relevant new contributions ceased and new competitors appeared, the ones I mentioned before: boosting, support vector machines, random forests, which were arguably more interpretable and faster. This led to what is known as the second winter of neural networks. There was also a very big problem when trying to go deeper in terms of number of layers (when we say a deep network, we mean one with three or more hidden layers): starting at about five layers, it was proven that there was the problem of the vanishing gradient. The gradients of the error, as they were propagated towards the input, became basically zero, so stacking up more layers wouldn't help at all, and we couldn't use deeper networks to solve harder problems.

Then things changed, because of course you know that we overcame these problems to get to where we are now. Around 2006-2007 came the first breakthrough, when two groups, one around Hinton and one around Bengio, created ways of pre-training these networks, which could then continue training by simply fine-tuning with standard backpropagation. Suddenly the floodgates opened: networks with huge numbers of weights could be trained, and new types of activations, layers, regularizations, etc. were created. There was a completely new mood; what was impossible before
was now much easier, and it actually led to major results on very significant problems, especially in natural language processing and computer vision. Basically, new techniques appeared, not so different from the previous ones, but now, thanks to technology and these new strategies, they could be applied and they worked very well.

Just to mention a few milestones: people claim that the deep learning revolution really started in 2012. That was the year when a convolutional network (we will see what that means in the next slide) from Hinton's group, AlexNet, won the ImageNet competition. ImageNet is a competition with millions of images, where the models have to categorize each image into one of thousands of categories. It was the first time that such a neural network won, and it did so by a large margin; after that, the field basically moved towards deep nets, and every year new models beat the previous ones in this competition. In 2013, Google hired Geoff Hinton; of course, this big company had a big interest in this type of model, since they have huge amounts of data in the form of images. In 2014, Facebook hired Yann LeCun, the father of the first convolutional network. And a couple of years ago this trio, Bengio, Hinton, and LeCun, won the Turing Award, the equivalent of the Nobel Prize in computer science.

Two things are important to know about deep convolutional networks; with these two layer types you can build most of the architectures. First, the convolutional layer, which is a specific operation applied to images. People realized that basic neural networks applied to images were not very efficient: with a fully connected layer you have too many weights, because you have one weight per pixel multiplied by the number of neurons in the following layer, and it doesn't take into account the structure of the image. Nearby pixels share information (if I have a picture of a cat, many nearby pixels belong to the same cat), and fully connected layers ignore that. So they replaced it with a filtering operation using convolutional filters. If this is the input image to the network, we pass a filter of, say, 3 by 3 (represented here in yellow) over it; we multiply that matrix element-wise with the pixels below and replace the center pixel in the output image with the sum of all the values. The output is what we call a feature map. Basically, we are learning filtered versions of the input image that enhance certain characteristics of the image. Second, we would also like to learn at different scales in the image. For that we have a subsampling operation, what is called a pooling transformation. We take an image like this and do a pooling of 2 by 2: out of every 2 by 2 block of pixels, I select one value to keep in the output. If I select the maximum value of the 2 by 2 pixels, I am doing max pooling; if I take the average, average pooling. Those are the two operations I want you to keep in mind, because most convolutional networks are actually a combination of these two.
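To make the two operations concrete, here is a small NumPy sketch of a single 3x3 convolution and a 2x2 max pooling; this is my own illustration, not code from the talk:

```python
import numpy as np

def convolve2d(img, kernel):
    """Slide a 3x3 kernel over the image; each output pixel is the
    sum of the element-wise product with the pixels underneath."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * kernel)
    return out

def max_pool2d(img):
    """2x2 max pooling: keep the maximum of every 2x2 block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.random.rand(8, 8)
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])  # a hand-made edge-enhancing filter
fmap = convolve2d(img, edge_kernel)     # one "feature map" (6x6)
pooled = max_pool2d(fmap)               # reduced to half the scale (3x3)
```

In a real convolutional layer the kernel values are not hand-made like this edge filter; they are the weights the network learns during training.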
So, in the typical architecture, we take the input image and calculate some filtered versions of it with the convolution I just described. Here I am showing an example with just four filters: out of the input layer we get a convolutional layer with four filters, which outputs what we call feature maps. Then we apply the pooling operation to reduce the scale, and now we have images that are much smaller. Then convolution, pooling, convolution, pooling, and that way we compute features at different scales, all combined, until we get a representation that is very, very small. It could even be a single vector, the same as we had before with our measurements, except that now it has been computed for the specific problem. Here we could use any classifier on top of this representation of the image, but one option is to use fully connected layers, a regular multilayer perceptron, so we can train the whole thing in an end-to-end manner and optimize the whole process for the specific problem we want to solve. This is the typical convolutional network, and it can have connections and weights in the millions; it can be very, very large.

As you may imagine, this has plenty of applications, especially in my field, computer vision. For example, if we are working not with whole images but with pixels, we have semantic segmentation: each pixel gets classified as belonging to one class or another, here for example grass, trees, sky. Or we can do regression to calculate the bounding boxes of the objects in the image and then classify them, in a multi-task way: we detect objects by regression and then classify them. Actually, modern architectures do everything at the same time: they find the different objects in the image via bounding boxes, classify them as belonging to one class or another, and also classify each pixel to get the segmentation of each object.

Okay, so this is for natural images; what about... Ignacio, there is a recurring question that the participants are asking: how are the weights initialized at first? A very good question. You could do it simply at random, but when we were talking about the problem of the vanishing gradient, there were many years of study on how to prevent it, and different initialization techniques came out of that. In modern libraries you can just choose how you want to initialize the weights based on one of these techniques, which can also make the training smoother later. But random values in a specific range, say between minus one and one, would be the first way to do it.

Okay, so what is the current situation of deep learning in bioimage analysis? Everything started a little late compared to the 2012 revolution, but we got there. Starting in 2015-2016 we began to see many applications using these types of architectures, for example for mitosis detection, or for 2D or 3D segmentation, where usually the network predicts the boundaries between objects (neurons, neurites, cells, etc.) and then we post-process the result. We have very nice networks from some members of the community from a couple of years ago, the CARE networks (content-aware image restoration), that allow us to increase the quality of our images, especially in the Z direction, which was a very, very big deal. We can even do artificial super-resolution: there is something called Deep-STORM that, thanks to deep nets, takes a low-resolution image and converts it into a higher-resolution one, something similar to what we are going to do in the practical part, although there we will use electron microscopy.
So when exactly did it start in the field of bioimage analysis? If we look at the number of publications related to deep learning in microscopy image analysis, you see that it doesn't move much until around 2015, when there is a big peak. That is when the first effective deep learning architecture dedicated to biomedical image segmentation was created: the U-Net, which is almost the standard in the field now, with its 2D version in 2015 and its 3D version in 2016. Also around 2015 came the appearance of user-friendly open-source libraries based on TensorFlow, and even some commercial ones, like the toolbox from MATLAB. That facilitated things for people who are not hardcore computer scientists but want to work with these types of models.

What is the U-Net? It is very similar to what I told you before. It has two paths: the contracting path and the expanding path. The contracting path is exactly a fully convolutional network: you have convolutions, which are these blue arrows, and max poolings, which are these red arrows, so you convolve, pass to a different scale, convolve, and go down in scale again, until you get a very narrow representation. But then they came up with the idea of increasing the size of the images again, by doing up-convolutions and more convolutions, so that in the end you come out with an image of exactly the same size as the input. They also have these skip connections in between, which ensure some stability between the feature maps of the same size. This is great, because then we can have an image as the input and another image as the output, so you can do pixel-to-pixel segmentation, super-resolution (as we will do later), or other types of problems with biological and medical images.

So what do you need, as a bioimage analyst, to start playing with machine learning? First, you need a problem to solve that is very well defined. If you want to classify cats and dogs, then your images really need to be cats and dogs, and not something in between, because otherwise your network is going to fail miserably. If you are doing segmentation, you need annotated data that is consistent, meaning that if the same person annotates the same data twice, the result should basically coincide, and different annotators shouldn't have many differences among them. And of course you have to have data of high enough quality. I cannot emphasize this enough: deep learning is not magic. You can do pretty amazing things, but if the quality of your data does not match the quality of the data the model was trained on, the results are not going to be the same. Don't expect miracles.
In terms of infrastructure, you may need to buy some hardware, although we will see that it is not as important as before, when we used to buy a lot of GPUs; these networks run much faster on graphics processing units, orders of magnitude faster than on a CPU. But now we can also do it even for free, as we will today, on cloud platforms such as Google Colab or Kaggle, or on paying ones like Amazon's. You need data, as I said, that has to be manually annotated by experts and has to represent the real scenario of the problem. In general we used to say the dataset has to be large enough to train the model and evaluate it, but depending on the problem you may not need so much data. If you just have, let's say, a few big images where you annotated maybe all the cells, what you can do is perform what we call data augmentation: you can crop each image into pieces and pass the network the small versions instead of the big image, and it learns to segment the small versions. You can also artificially create new versions of your images by applying transformations: rotations, noise, etc., anything that makes sense for your data (see the small code sketch below). Of course, if you are doing segmentation and you rotate your image to make a new version, you also have to rotate its segmentation.

In terms of resources for data annotation, I work in Fiji and ImageJ, so I would recommend those, but there are other options; I list them here. You also need data to start playing with: there are repositories on Kaggle, and you also have the Cell Tracking Challenge, which has been running for a few years now. Regarding deep learning software, if you want to get your hands dirty with the real stuff, most libraries now are based on Python; the most popular ones are TensorFlow, which comes from Google, and PyTorch, which is from Facebook. You also have the toolbox from MATLAB, and Caffe from Berkeley University, which has been used a lot; the first U-Net, for example, was implemented in Caffe. If, on the other hand, you are on the non-expert side of deep learning software, you may want to try user-friendly software like the one we are going to introduce today. We have some options in ImageJ, and today we are going to talk about deepImageJ, but there are other bridges between this kind of software and biologist-friendly platforms like CellProfiler or ilastik, which have also incorporated these types of solutions.
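Here is a minimal sketch of the augmentation ideas just mentioned (patch cropping, plus a rotation applied consistently to image and mask); this is my illustration with NumPy only, and the sizes are the ones used later in the practical part:

```python
import numpy as np

def crop_patches(img, size=256):
    """Crop a big image into non-overlapping size x size patches."""
    h, w = img.shape
    return [img[i:i+size, j:j+size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

img = np.random.rand(1024, 1024)      # stand-in for a microscopy image
mask = (img > 0.5).astype(np.uint8)   # stand-in for its segmentation

patches = crop_patches(img)           # 4 x 4 = 16 patches of 256 x 256

# Geometric augmentation: rotate image and mask together,
# so the labels stay aligned with the pixels
img_rot = np.rot90(img)
mask_rot = np.rot90(mask)
```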
So, let's jump into our plugin for today. You want to say something?

Yes, there's a question: how can we assess whether there is overfitting or not?

That's a very good question. If we get very good performance metrics on the training set, but then suddenly very bad performance on the test set, then of course we are overfitting. What we usually do is separate one part of the training set, apart from the test set, which is called the validation set. It acts as an inner test during training: while we adapt the network weights, at every iteration we also look at the error on the validation set, and if the validation error starts to get worse, we stop the training. That is a way of preventing overfitting; we will do exactly that in a few minutes.

Regarding user-friendly solutions, there's another one: if the manual annotation of data is too complicated, is it possible to train a segmentation network on partial annotations and eventually get good annotations, that is, to use some of the ground truth you get from a first round of training and then retrain?

It's not the standard way of doing things, but there are more and more solutions, architectures, and models proposed that use sparse annotations, and they can also improve from not-so-good annotations. The easiest way to start is to get data that can be annotated, or that has already been annotated by experts, start from there, and then adapt the model to your own data.

Okay, good, thanks.

So, regarding deepImageJ: deepImageJ, as I said, is part of this family of user-friendly plugins for using deep learning solutions in bioimage analysis. The motivation was clear: there was a big jump in the number of methods and applications of machine learning in the literature, with many powerful methods proposed every year that were better than the classical approaches. But there was a problem: you needed high-level programming skills in order to use them. So the creators, whom you have here as moderators, wanted user-friendly and general enough software for the field, and they came up with this solution. The idea is that developers can use open software solutions to create deep learning methods, such as TensorFlow, PyTorch, or Caffe, and then a Java API talks to ImageJ, which is the user-friendly part. Of course there was some previous work that deepImageJ is based on, and the idea was to create something functional, so you can integrate new models and process all types of new data, and very general, so it is compatible with all types of different network architectures. Those were the bases for the creation of this project.

Ignacio, sorry to interrupt: it seems that the noise from your pullover is starting again, so maybe you can hold the microphone a bit away from it. I should not move that much, sorry about this. Sorry, is it okay now?
Okay, thank you.

Okay, so in any case, in one single slide, this is what deepImageJ does: it connects, it bridges, the hardcore deep learning developers with the final users, the bioimage analysts, the people who usually work with Fiji or ImageJ but don't necessarily know all the details of these networks. From the developer side, it offers all sorts of things. The developer can implement and train his or her model using, for example, these Python libraries, and then, through the plugin, create the so-called bundled model, which is basically just a folder that contains all the information about the model, stored in an XML file; this happens transparently for the developer, as we will see. It also stores the weights, some example images, and pre-processing and post-processing macros to normalize the data, as we will see next. So the only thing the developer has to do is train the model, open it in deepImageJ, create that folder, and upload it, once it is done, to the repository on the web. And on the web you may already find solutions for problems similar to yours, such as denoising, segmentation of different types of cells, super-resolution, the CARE networks, etc.

From the user side it is also very simple, as we will see. You just go to the repository and select the network you like; it's a zip file, you unzip it and put it in a folder that has to be called "models" inside your ImageJ application. Once it is there, it is readable and usable from the plugin, and you can use that network like a regular, standard plugin: you call it from ImageJ, it needs an open image as input, it outputs another image, and it has a pre-processing step, which is an ImageJ macro, very simple. Usually the pre-processing step performs normalization, to make your image look as close as possible to the images the network was trained on. This is very, very important, because the networks are very specific to their training data; if you don't do that, you won't get results close to the expected ones. Then you click on run, the network runs, everything is transparent to the user, and you get the output, which is also post-processed if the developers provided a post-processing macro; there you could, for example, compute some measurements.

So the idea, the proposition, as I said, is to have an image as the input and an image as the output; this is the standard way of doing things right now, to keep things very simple. Everything works as an ImageJ plugin; it is recordable from macros, so you can also call the models from macros, and it acts as a unified interface for TensorFlow models. So far everything works with TensorFlow models, although more and more platforms are being supported. It works out of the box: you just install the plugin, a regular installation in ImageJ, putting it in the plugins folder and restarting ImageJ, and it runs; you don't need to have anything installed related to GPUs or Python. That has pros and cons: it doesn't run on the GPU, so it's slower than a regular GPU implementation, but it runs multi-threaded, so you still get reasonably fast results. And the repository has two sides: it exposes the models from the developer to the end user, so the end user gets something that is already working, and it offers the developer a tool to put the models in the format the end user is going to use.
Okay, so that's all for the theoretical part; let's try some real action. This is what I call the hands-on tutorial. I'm going to use Colab, which is Google's cloud system, and deepImageJ. If you really want to follow the tutorial, you can do it through my presentation. The data is on this link, and you have to add it to your own Google Drive. This is important: if you download it, it's going to take longer; you can just add it to your Drive, putting it in the standard folder, My Drive, and then everything will work out of the box. Then we click on this link from the presentation, which opens my notebook. If you want to run it yourself, you have to save a copy in your Drive; otherwise you can only watch what I do.

One question: on which data can we use the models of deepImageJ that we have in the models folder? Do they have to be related to the data on which the models were trained?

In principle you can run them on any type of data, but it has to be the same format: if the model was trained on RGB images, your images have to be RGB; the same for grayscale. The problem, as I said at the beginning, is that these are very specific networks. Remember that the weights of the filters are trained specifically for the data, so the model is going to work better if your data looks exactly like the training data. Of course this is not always the case; there are ways of normalizing, for example histogram matching, to artificially make your images as close as possible to the training ones. But in theory you can apply a model to any image that is open in ImageJ, as long as the format is the same.

Okay, so first things first: let's go to the notebook that I open here. Colab is a very nice environment if you've never worked with it: it's free, you only need a Gmail account, and once you open the notebook (see, now it's disconnected, but I can reconnect) it runs in the cloud, on a system provided by Google. If the demand is not too high, I will get a GPU. You see, it's initializing... it's connected, and if I click on Manage sessions, I see I have a GPU. This is great, because things are going to be much faster.

The main idea of this notebook, which I created for the last NEUBIAS training school a month ago, is to use a U-Net to perform super-resolution: we take a low-resolution electron microscopy image and convert it into a high-resolution image with much better definition. At the end of the notebook, I will show you how to download the trained model to reuse it in deepImageJ. First things first, we have to install a version of Keras that is compatible with deepImageJ; this works right now with the currently available version of deepImageJ, and since the default TensorFlow in Colab is now a much higher version, the first thing you have to do is run this cell to install the compatible version. I'm going to do it live, because it may take some time depending on the connection; once everything is correct, even if you see some error messages like the ones here, it will say "Successfully installed keras". The next important thing is that you want to have access to your data in the cloud. For that, with these two lines we can mount our own Drive directories into the Colab machine. This is a little bit sensitive, so it asks for permissions: you go to a URL, validate who you are, and it gives you a code that you copy and paste here; then it says mounted at /content/drive.
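Those two lines are the standard Colab idiom; I am reproducing them here from memory, so treat the surrounding cell as an assumption:

```python
# Mount your Google Drive into the Colab file system; this triggers
# the permission/authentication flow described above
from google.colab import drive
drive.mount('/content/drive')

# After mounting, a dataset folder added to "My Drive" is visible at:
# /content/drive/My Drive/<dataset folder>
```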
When this happens, you should see your Drive folder here as well, and if everything went fine, you have direct access to the directory I shared with you with the data. It's called NEUBIAS TS15, because it comes from the TS15 training school, and it has two folders: train and test. Each so-called cell of code can be run by clicking here, and then you usually see an output below; for example, here I read the list of images and I see that I have 100 images for training. Loading them into memory is what takes the most time in the whole process, apart from training the network, but usually it takes one or two minutes at most. I open one of those images here so you can see them: a 1024 by 1024, 8-bit pixel image of mouse cortex with neurons, at high resolution.

Since I only have 100 images, which may not be that many, and since they are very big for the standard networks (usually the inputs to the networks are 256 by 256 or 512 by 512), I can do what I mentioned before and create artificial versions of them: I just crop the images into pieces. Out of each image I am going to create 4 by 4 crops of 256 by 256. Here you have the code; you run it, and you get images that look like this. This is the first patch I created. So out of each of the 100 images I now have 16; in all, I have 1600 images to train from. Much better.

Then, based on a paper that came out a few months ago, we can simulate the low-resolution images, basically by downsampling and applying some Gaussian blurring. So if I don't have low-resolution images to train on, I can create them like this. In this case the input image is again of the same size, 256 by 256, but as you see, its resolution is much worse than that of the real image. I am going to train with this type of image as the input, and I will consider the original one my expected, desired output.

Then I define a network, again based on the U-Net, with some differences that I experimentally found useful for this notebook, because it works well and it is faster. I used again three levels; the input is 256 by 256; I have some convolutions and, instead of max poolings (which would be these red arrows), average poolings; these yellow arrows are convolutions with dropout, so we are doing regularization, which also helps; and then we go up again by doing up-convolutions, also called deconvolutions, to recover the original size. I use Keras, which, if you are familiar with Python, is a very simple, high-level way of coding, because you define the input with a specific size, and then, for each layer, you just write one line: my first convolution has 16 filters of size 3 by 3, with a specific activation, this initialization of the weights (this is the technique applied to initialize the weights), and I want the output of the convolution to be of the same size, so I set the padding to "same"; then I just link it to the previous layer. By doing this (I hope it's readable enough), you just keep creating all the blocks of the graph that I showed here: these four lines of code are equivalent to these first two arrows.
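As a rough reconstruction of what such lines look like in the Keras functional API (the parameter choices here are illustrative, not a copy of the notebook), one block of the contracting path could read:

```python
from tensorflow.keras.layers import Input, Conv2D, AveragePooling2D, Dropout

inputs = Input((256, 256, 1))                # one-channel 256x256 patches

# One block: two 3x3 convolutions with dropout, then a pooling
c1 = Conv2D(16, (3, 3), activation='elu',
            kernel_initializer='he_normal',  # weight-initialization technique
            padding='same')(inputs)          # 'same' keeps the spatial size
c1 = Dropout(0.1)(c1)                        # regularization
c1 = Conv2D(16, (3, 3), activation='elu',
            kernel_initializer='he_normal', padding='same')(c1)
p1 = AveragePooling2D((2, 2))(c1)            # go down one scale (to 128x128)
```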
I can also add a secondary metric, not for the training itself but for showing how it is doing in terms of another measure. Okay, so once I run this, I get some text telling me all the details of the sizes of the layers, and as you see, in the last layer I get an output that is 256 by 256 by 1: just one single channel. And I have the total number of parameters; as you see here, it's almost half a million, which, as I said, makes it a rather small network.

Then, for the training, we have a few important parameters to set. As I said, we have the validation split; this is what we mentioned before, how big the validation set is going to be. I created it with 10% of the training samples. I decide when to stop based on the error committed on validation: when it starts to go up, I stop. And that is defined with something called patience: how many iterations, or rather epochs, I'm going to wait once the validation error starts going up. Epoch is a very typical term from artificial neural networks; it's nothing but the number of times the training set is seen by the network. (When we are training, "iteration" is used for something else; the epoch is how many times the whole dataset is seen.) If we train for 10 epochs, the 1600 images are passed 10 times through the network. And then we have something called batch size, which is related both to how we train and to the GPU, because it is the number of images that are processed at the same time on the GPU, but also how many images we accumulate before we update the weights. The larger it is, the faster the training, but we have to play with that value. In general, the important values for training are the number of epochs and the batch size.

Okay, so after I normalize the data to have it ready for training, I can just click here: model.fit. I pass in my training set, my validation split as a parameter, and the batch size as well. Again, I'm not doing it interactively because we're running out of time, but if you run it at home you will see, epoch by epoch, the training loss and the metric that we selected, for the training set and also for the validation set; in this case they are called val_loss and, for the metric, val_mean_absolute_error. You see how, epoch by epoch, the loss gets smaller, while in principle the validation loss can fluctuate, because those are images that are not trained on. Once we have trained, we can plot it and see what it looks like: the blue line is the training loss, which always goes down, and the validation loss does fluctuate a bit, but here the validation is actually even better than the training results. This is for the loss, and this is for the metric that I chose.

Once we have the model trained on the training set, we can apply it to our test images and see how they look. Here I show you 10 of them: you run this cell, you load 10 images, and you see that they look pretty similar to the ones we had. We have to compute the same low-resolution versions, exactly the same way we did before, and since we converted the training images to values between 0 and 1, we have to do the same thing here. Then we can apply the whole network to them: we can evaluate them by calling this method named evaluate, which gives us the loss and the metric on the test set, or we can actually plot them, which is quite interesting, and show some of those images and see how well they do.
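Put together, the compile-and-fit step being described has roughly this shape; the optimizer choice and batch size here are assumptions for illustration, and X_train/Y_train are hypothetical names for the low-/high-resolution patch arrays:

    from tensorflow.keras.callbacks import EarlyStopping

    # pixel-wise mean squared error as the loss, mean absolute error as
    # the secondary metric mentioned above
    model.compile(optimizer='adam', loss='mean_squared_error',
                  metrics=['mean_absolute_error'])

    # stop when the validation loss has not improved for 'patience' epochs
    early_stop = EarlyStopping(monitor='val_loss', patience=5)

    history = model.fit(X_train, Y_train,
                        validation_split=0.1,    # 10% held out for validation
                        epochs=100, batch_size=16,
                        callbacks=[early_stop])

    # later, on the held-out test data:
    # loss, mae = model.evaluate(X_test, Y_test)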
You see how this is the input image to the network; again, this is test data, so the network has never seen this image before; this is the ground truth; and this is the output. So I would say that for the little training we did, and the small network, it actually did quite a good job. We have recovered the images in a rather decent way; the jump from here to here is actually quite impressive.

[Moderator] There are some questions related to the activations of the convolutional layers: why do you choose exponential linear units in the convolutions in general, and why did you choose a sigmoid for the last one?

Yes. For the final sigmoid: since the output values have to be between 0 and 1, that is very classic. But for ELU, or other kinds of activations, this is all trial and error. Most of the decisions that I took to create this network were based on trying to make it smaller, so I could run it fast. Although, depending on the GPU we are assigned for training, each epoch, in this case 21 seconds, can vary: sometimes I connected to Colab and it ran in 10 seconds, sometimes it was 1 minute; it really depends. But my idea was to show you something fast that gives reasonable results. So ELU worked better for me than ReLU, average pooling worked better than max pooling; these kinds of decisions I took by trial and error. This is the very experimental part of the network. If you want to start playing with something like this, I would recommend starting from my configuration, because you know it is something that works at least reasonably well; you replicate it, and then you start playing with your data to see if you can find something better.

So, once we have the model trained, if we are the developer, we can pass it to our collaborators. How do I import it into DeepImageJ? This is very important. I can just run this piece of code to save the model into a folder. It gets saved directly into the root folder of Colab as it is, but that is not a big deal; you can look for it later and make sure that it is there. For example, you see that when I did ls, it appeared here as a folder called saved_model. And then I zip it and download it: this is basically saving the model the way TensorFlow knows, creating a zip file, and then it gets downloaded through the browser. Once I have this, I am going to follow the presentation again, so I don't do anything wrong, hopefully. So we are already here; you follow all the steps.

[Moderator] When do you stop the training? When is it acceptable when comparing to ground truth, or is it because you used a specific callback? How did you stop the training?

Yes, I used this callback; I'll show you, it is called EarlyStopping. This is the importance of the patience: I set the patience to 5, so basically, and this is better seen in the graph, if the validation loss does not improve for more than that many epochs, it stops. Typical values for patience are 5, 10, or 20, depending on your data; again, this is something you have to fix yourself. What I usually do is leave it running for a certain number of epochs until it reaches a plateau, and once you are in that plateau and you don't see much improvement, then you can say, okay, maybe I should use a smaller patience to stop it earlier, because it's a waste of time, right? Or sometimes the validation loss starts to go up, up, up, because if your validation images are not that similar to the training set, then it really is a matter of stopping earlier. In this case both curves go down nicely at the same time, so it's not a big deal; you could also run it for 20 or 30 epochs and the results would be very good.
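The save-zip-download step mentioned a moment ago can be sketched like this; a minimal sketch assuming a TF2-style tf.keras model and the TensorFlow SavedModel format (saved_model.pb plus a variables/ folder); with older TensorFlow versions the export call differs:

    import shutil
    from google.colab import files

    model.save('saved_model')    # writes saved_model.pb and variables/ into a folder
    shutil.make_archive('saved_model', 'zip', root_dir='.', base_dir='saved_model')
    files.download('saved_model.zip')   # triggers the download through the browser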
So, we have already downloaded our folder. It's called saved_model. When I unzip what I downloaded, it basically contains a file with the same name, saved_model.pb, and a set of variables here, which are actually the weights. This is what I need, as a developer, in order to create the bundled model for DeepImageJ. I'm going to follow the steps so I don't do anything wrong. So: I unzip the folder, and whenever I want, I open Fiji and call DeepImageJ. If it is installed correctly, DeepImageJ should be under Plugins, so I call Plugins > DeepImageJ > DeepImageJ Build BundledModel. This is where we import the model that we trained in Google Colab.

So I call it, it opens, and now you have a bunch of descriptions here. You have to browse and select the folder where you have your model; in this case I think I put it in Documents, Presentations... that's going to be interesting... I think I can point to it directly. Come here, there you are, I choose it. You have to choose the folder: even though it says "drop TensorFlow model protobuf" here, when you click on Browse you have to select the folder. Then we click on Next, we see that the path appears here, we click on Next again, and it loads the information: it says it's a TensorFlow model with some specific characteristics. You can actually visualize it if you want, the graph, the size, etc., but we are not interested in that, so we keep going. We go Next, and here we see that it read the image size, 256 by 256, so everything is correct. I click on Next, and now I'm going to keep looking at the presentation so I don't skip anything.

Okay, everything is correct. Now we have to select the patch size. This is important, because you may have images that are larger than the input image of the network, so we're going to use a patch strategy. For the patches that lie at the borders, we need to figure out what to do with the pixels that the filters see beyond the border, so we have to include some padding here. Usually the patch size is 256, but I'm going to add 64, for example; this works well. Then I click on Next, and I have to fill in the information about the model as its developer. So I can say, okay, this is my U-Net for super-resolution. I put whatever information I want here: it's me, I can put a URL, if I have a publication I can put it there, the version, the date, the reference, etc., but a full name and the author should be fine.

Okay, now we jump into the pre-processing macro that I told you about before. If you remember, the images, in order to be processed, have to be converted to values between 0 and 1, so float; and since my images are 8-bit, their values can be between 0 and 255. So I have to do two things, and not to forget them, I put them here. First, I have to convert the images to 32-bit; you can type it, or you can actually select it here, which is very nice: I click on "convert to 32 bit" and it already adds the macro command that does that. Then I need to divide the image by 255 to have the values between 0 and 1. That I can simply copy from the presentation, or I can type it here: run("Divide...", "value=255"). If you have ever played with ImageJ macros, you know that what we're doing here is just converting the input image to 32-bit and then dividing it by 255, so all the values end up between 0 and 1. I click on Next, and then I have the post-processing macro option. I actually do not need any post-processing: the output is already my super-resolution image.
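For clarity, here is what those two macro commands amount to, written out in the notebook's Python terms; just an illustration of the normalization, not code from the notebook:

    import numpy as np

    def preprocess(img_8bit):
        # equivalent of "convert to 32-bit" followed by run("Divide...", "value=255"):
        # float values in [0, 1], matching what the network was trained on
        return img_8bit.astype(np.float32) / 255.0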
But before going to Next, I'm going to create a test image, because I know that if I click Next, it's going to ask me for one, and I don't have one yet. So I go back, and I'm going to create it. I downloaded the test images here, so I can open one of them; let me open it on my other screen, but you have it here. This is the high-resolution image; to simulate the low-resolution version of it, I need to blur it. For that I can reproduce what I did in Python, basically with a Gaussian blur filter of radius 3. So now I get a blurry image; if I zoom in, you see how it looks: like low resolution. So I think I have everything I need, and I can click on Next.

Now it says "run a test on an image"; here you get all the images that are open in ImageJ or Fiji at that moment. I can run the test and see what happens. You get a window with information about the processing, and you also see, in a separate window, how the different patches are being processed. You see the running time, the loading, the CPU, the number of patches created, 64 patches, and the memory that is being used. It's not running on a GPU, but for inference the time is very reasonable, so you can work with this. When it is done, you get the result; if we go through the pixels, you see the values are between 0 and 1, or if I plot the histogram, you see the minimum is 0 and the maximum 0.93. And if you zoom in, you see how the result actually looks like the one we got in Colab: a nice, cheap high-resolution version of it.

So now I can just click on Next, and I have to save the model in my models folder. If I'm not wrong, I have to go and save it; I run the test again just to make sure I did everything correctly. If the models folder doesn't exist, I have to create it. If I go here, I have to go to Applications; it doesn't show up, so I cannot create it directly. I can go to Fiji; here on a Mac it's a little bit complicated, because you have to go to Applications and open the package contents. See, I don't have a models folder, so I have to create it, simply with the name "models". That's it. And then I just go back and click on "save model". If everything was correct, you will see all the parts correctly saved; even the test image is saved as the sample image for everybody.

[Moderator] It is already seven past five, just so you know.

Okay, just finishing; also, I didn't start late! So we click on Finish, and we are done. And now, let's say I close this image; my earlier test input image doesn't work anymore, so I'm going to create another one. Let's say I take number 5; again I have to create it, so I go and process it with the Gaussian blur. So now this is my low-resolution image, and if I go to DeepImageJ, DeepImageJ Run... then I don't see the model there. Okay, I know what I did wrong: it was saved directly inside the models folder, but each model has to have its own sub-folder. So inside models I have to create a folder with a specific name; I'm going to call it "SR UNet", and then I put all the saved information in there. So inside models I have to give the model its own named folder; that's the only part that I forgot.
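So, to recap the layout that tripped me up there: DeepImageJ expects one named sub-folder per model inside Fiji's models folder; roughly like this (the exact bundled file names beyond saved_model.pb and variables/ may vary):

    Fiji.app/
      models/
        SR UNet/           <- one sub-folder per bundled model, any name you like
          saved_model.pb   <- the TensorFlow graph
          variables/       <- the trained weights
          ...              <- config, macros, and the sample test image saved by the wizard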
If I go now to Plugins > DeepImageJ > DeepImageJ Run, then it does run, and it recognizes the model with the name that I set in the configuration. You see, as soon as I select it, it gives me the information that I wrote: the size, everything. The preprocessing I have to select myself: I use the preprocessing macro that I set there, and for post-processing I don't do anything. So if I click OK, it is applied to the input image. The same way as before, we get this window telling us about the computation, and then the output image gets constructed on the fly. If I zoom in, you will see that the result is indeed the pseudo-super-resolution. So I'm going to leave it here, unless you have questions; otherwise, I really thank you for being here.

[Moderator] There are some, I think, interesting questions regarding the reuse of models. When you want to reuse a model, how do you know that you can use it with your data, or how can you quantify that it is doing the processing properly? You know, this whole matter of reusing trained models.

This is a very important question, because with most of the models, we get very excited about them and we want to reuse them, but you have to take into account that your data needs to be very similar to the data that was used to train the model. So either you have access to something like what I just showed, a Colab notebook, and then you can retrain it on your own data; or, if your data looks similar, you can try to open the model folder, open the test image, and see how it looks. For example, in my case I created this one: this is my example image, it's 8-bit and it has this aspect. If I look at, for example, the histogram, I can try to have images with the same type of content and the same type of histogram, in 8-bit, and then I would expect the model to perform more or less similarly. There are ways of normalizing your data so it looks like this histogram. Otherwise, the best approach would always be to have access to the way the model was trained; you can contact the authors. And of course this Colab notebook is all yours: you can play with it as much as you want, reuse it, and spread the word.

[Moderator] There was another one. In your example, what you did was to degrade the original image with Gaussian blurring, and then you trained the network to give you back the high-resolution, or super-resolution, image. So how do you make sure that the network didn't just learn how to undo the Gaussian blurring?

Actually, that is what it is doing, which is not that easy. But in this case, I have to say that I first planned another approach, where I actually downsampled the images as well; however, in the current version of DeepImageJ it is not allowed to have an input that is smaller than the output, so I had to redo it and play it this way. In super-resolution, what the networks are actually doing is learning how to do upsampling and how to undo the kind of blurring that is inherent to the low-resolution level. So yes, this is actually what it should learn. I think that's all I can say.

[Moderator] Ignacio, are you available for personal consulting?

For scientific reasons, always; you can contact me, I'm always happy to collaborate, and I guess the same goes for all the moderators that we have here.

Okay, I think that's all. Thank you so much. I'm sorry about the noisy microphone; I hope you enjoyed the talk as much as I did.

[Moderator] Thank you, Ignacio.
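(A footnote to the histogram-matching suggestion in the Q&A above: one possible way to pull your own images toward a model's training distribution is scikit-image's match_histograms; a sketch with hypothetical file names, not part of the original notebook.)

    from skimage import io
    from skimage.exposure import match_histograms

    my_image = io.imread('my_image.tif')              # hypothetical paths
    reference = io.imread('model_sample_image.tif')   # e.g. the bundled example image

    # reshape my_image's intensity histogram to resemble the reference's
    normalized = match_histograms(my_image, reference)
    io.imsave('my_image_matched.tif', normalized.astype('float32'))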