And we go a little bit further in the processing, analysis and interpretation. As Mathieu already mentioned, there is currently this big wave, initiated by companies like Google and so on, where a huge amount of data is available, and it turns out that techniques developed years and years ago, like neural networks, can give really nice results once you have both enough data and enough computing power. So this whole machine learning business is becoming extremely popular, to the point that, as Mathieu also mentioned, everything goes in the direction that you need to do machine learning, and understanding is not always considered the most important thing anymore. I'm really happy to introduce Gemma Piella, who is a professor here at the university and who works on these techniques, and luckily she works on them in the smarter way: trying to see what the real value of these tools is, compared to just using them everywhere. Please, Gemma.

Hello. I'm going to talk about machine learning, some basic concepts, and at the end of the talk I will give some cardiac examples of how we can apply it.

Machine learning aims at extracting knowledge directly from data, by learning from experience. This is very useful in medical applications, because we deal with tasks and data which are complex and high-dimensional, so the typical human-generated, rule-based heuristics are usually not appropriate. With machine learning we learn from features, so that we are able to extract knowledge (knowledge discovery and data mining) and also to make predictions about new data, which is very useful, for example, for computer-aided diagnosis or to support clinical decisions.

The term machine learning was already used in the fifties, but perhaps the most standard definition is the one given by Tom Mitchell at the end of the nineties, which says that machine learning concerns algorithms that automatically improve their performance at some task through experience. For example, for learning to play chess, the task could be to play chess, the learning experience could be the opportunity to gain practice by playing against oneself, and the performance measure could be the percentage of games that the program wins against opponents. So a learning task is well defined by this triplet: task, experience and performance. Let's look a little bit at each of them.

About the task: it is important to say that the learning is not the task itself; rather, it is a tool to attain the task. It is also important to understand that we are not going to program the computer to do the task, but to learn to do the task. There are many examples of tasks. Perhaps the most typical one is classification, where we have a fixed number of labels or categories and the program is asked to assign one of these to each input. For example, here we have some samples which represent tumours, which can be benign (in blue) or malignant (in red), and we see that just by looking at their size and the age of the patient we can distinguish them. Of course this is a toy example, and in reality it is much more complicated, but it illustrates what classification is.
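To make the classification task concrete, here is a minimal sketch in Python with scikit-learn. The tumour sizes and patient ages below are invented for illustration (not the data from the slide): a simple linear classifier is fitted on the two features and then used to label a new, unseen sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic (made-up) data: each sample is [tumour size in mm, patient age in years].
n = 100
benign = np.column_stack([rng.normal(15, 4, n), rng.normal(45, 10, n)])    # label 0
malignant = np.column_stack([rng.normal(30, 5, n), rng.normal(65, 10, n)]) # label 1
X = np.vstack([benign, malignant])
y = np.concatenate([np.zeros(n), np.ones(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A simple linear classifier: assign each new input to one of the two categories.
clf = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("predicted label for a 25 mm tumour in a 70-year-old:",
      clf.predict([[25.0, 70.0]])[0])
```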
Another popular task is regression, which tries to predict a continuous output given some input. For example, here we are given some samples for which we have the body mass index and the corresponding cholesterol level. What we would like to do is, given a new patient whose body mass index we know, predict the cholesterol. A very simple way to do it is just to fit a straight line to the samples. These are perhaps the two most typical tasks in machine learning, but there are many others, like for example detecting associations, detecting anomalies, clustering, or denoising.

The experience has to do with the data that we have and with prior knowledge. Basically, we can distinguish between supervised and unsupervised learning algorithms, depending on which experience the algorithm is allowed to have during learning. In supervised learning we are given some input features together with the corresponding labels, and we determine the model, the function which maps these features to the labels; once we have learned this model, we use it to map new features to the output. This is in contrast with the unsupervised learning approach, where we do not have any specific prior knowledge, so the algorithm has to explore the data and try to find some kind of structure in it. This can be done, for example, by clustering the data based on the relationships among the variables. Typical supervised tasks are, for example, classification and regression; a typical unsupervised learning task is, for example, clustering.

Then we have the performance, which is usually specific to the task: for example, if you want to classify, a possible performance measure is the classification error rate, or the accuracy. What is important to notice is that we want the learning algorithm to perform well on previously unseen samples; this ability to perform well on new samples is what is called generalization. So we are going to split our data set: we are going to have a training set, which we use to learn the model, and a test set, which we use to evaluate the model. During training we can compute an error measure, which is called the training error, and what we do is fit the model so that this training error is minimal, so this is a kind of optimization problem. But what makes it different from just an optimization problem is that we also want this test error, or generalization error, to be small. Of course, in practice we do not always have the means to evaluate this test error, either because we do not know which new samples we are going to get, or because we do not have the ground truth.
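As a hedged sketch of the training/test split idea, again with invented numbers rather than the slide's data: the model is fitted on the training set only, and the held-out test set gives an estimate of how well it generalizes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Made-up example: predict cholesterol (y) from body mass index (X).
X = rng.uniform(18, 40, size=(200, 1))             # BMI
y = 3.0 * X[:, 0] + 100 + rng.normal(0, 10, 200)   # synthetic "cholesterol"

# Hold out part of the data so we can estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)

train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))
print(f"training error: {train_error:.1f}   test (generalization) error: {test_error:.1f}")
```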
So what is done is to split this training set into one part that is used for the actual training, the learning of the model, and another part that is used to test the model during the training phase. This second set, which is called the validation set, provides us with some kind of estimate of the test error, of how well the model generalizes to an independent data set. It also helps to prevent problems like overfitting, which I will talk about afterwards, and it can be used to tune some of the parameters of our model. For example, if we want to use a regression model, we can use the training set to fit the weights of the regression model and the validation set to decide whether we are going to use a linear or a quadratic model; or, if we want to use a neural network, we can use the training set to find the weights of the neurons and the validation set to decide how many layers the network should have.

But this splitting into a training set and a validation set has a disadvantage: I am now training on a data set which is smaller, which could yield a model that does not perform so well, and also the error can vary a lot depending on how I do the split. So in practice what is done is what is called cross-validation, which consists in partitioning the data into complementary sets, so that each time I use one part of the data as the training set and the other as the validation set. For example, in a ten-fold cross-validation I would do ten iterations: in the first iteration I take the first ten percent as the validation set and the remaining ninety percent for training; in the second iteration the next ten percent for validation and the remaining ninety percent for training, and so on, up to ten. Once I have all these iterations, I compute the validation error as the average of the ten validation errors (a small sketch of this is shown below).

Okay, so we have seen how to define the learning problem: the task, the experience and the performance measure. Another important issue is the data that we are going to work with, and also how to represent it. Just to give some examples, depending on the target application we could work with images, either directly with pixel intensities or looking for more structure, like texture. We could work with shape, either with global parameters like volume, or more local ones like regional thickness or curvature, or even more complicated descriptors like meshes or fibres. Of course, we can also work with functional data; in the cardiac case we could work with motion, velocity or deformations. And, although I did not put it here, there is also other clinical data, like for example the age of the patient, the sex, whether they are a smoker or a non-smoker, all this kind of clinical information. It is important not only to know which data we are going to work with, but also how we are going to represent this data, because the performance of most machine learning algorithms depends a lot on how the data is represented.
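Going back to the ten-fold cross-validation just described, here is a minimal sketch; the data and the linear model are only placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(18, 40, size=(200, 1))             # made-up inputs
y = 3.0 * X[:, 0] + 100 + rng.normal(0, 10, 200)   # made-up outputs

# Ten-fold cross-validation: each fold serves once as the validation set.
errors = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# The cross-validation error is the average of the ten validation errors.
print("10-fold validation error:", np.mean(errors))
```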
Coming back to representation, this slide shows just a toy example: here we have the same data represented in two different ways, on the left in Cartesian coordinates and on the right in polar coordinates. If the task is to separate the two classes by a straight line, I cannot do it with the representation on the left, while it is very easy with the representation on the right.

So basically, when I have to decide how to represent my data, I have two basic approaches. Either I tell the computer what to look for, which is what is called hand-designed feature representation, or I let the program learn which is the best representation, which is what is called representation learning; in other words, applying machine learning to find the best representation for a given task. Another important concept is the concept of depth: the possibility of using several layers, either in the representation or in the learning, to go from simple to complex, and this is related to what is called deep learning.

In this next slide I summarize this a little bit. On the left you have what would be a non-machine-learning approach, where, given an input, I tell the computer program what to look for and how to do the task. Then I could have classical machine learning, where I give the computer the features, what to look at, but I let the program learn how to do the task. If I also let the computer learn how to represent the features, I have representation learning: now it is learning the features and also learning how to do the task. And if I put several layers, so that I can construct more abstract and complex representations and tasks, then I have what is called deep learning.

Okay, so let's look now at the basic steps when trying to use machine learning. The first, as we have seen, is to define the learning task and in particular the learning experience, that is, the data we are going to work with. Then we have to make some choices about the implementation. Specifically, we have to select which model we want to use, for example a regression model or a neural network. We also have to choose the learning algorithm, and this amounts to choosing the cost function I am going to minimize and the optimization procedure I am going to use. Then there is the training, which basically consists in fitting this model to the training data by minimizing this cost, and once I have learned the model, I am going to test it.

For example, let's assume that we want to predict cholesterol from the body mass index, and we are given some samples. I can decide that I want to use a linear regression model, so the predicted cholesterol is simply theta zero plus theta one times the body mass index of the patient, and what I have to do is just estimate these two parameters, theta zero and theta one, which are the intercept of the line and the slope. We can do that by minimizing, for example, the mean squared error over the training samples: I compute the mean squared error between the predicted cholesterol and my ground truth, and in order to minimize this cost function I could use, for example, a standard optimizer such as gradient descent.
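A minimal sketch of that last step, with synthetic numbers rather than real patient data: theta zero and theta one are fitted by plain gradient descent on the mean squared error, and the result is compared against the closed-form least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(18, 40, 200)                 # made-up body mass index values
y = 3.0 * x + 100 + rng.normal(0, 10, 200)   # made-up cholesterol values

# Model: y_hat = theta0 + theta1 * x, fitted by minimising the mean squared error.
# Centring x first makes plain gradient descent behave well.
xc = x - x.mean()
b0, b1 = 0.0, 0.0
lr = 0.01                                    # gradient-descent step size
for _ in range(5000):
    err = (b0 + b1 * xc) - y
    b0 -= lr * 2 * err.mean()                # d(MSE)/d(b0)
    b1 -= lr * 2 * (err * xc).mean()         # d(MSE)/d(b1)

theta1 = b1
theta0 = b0 - b1 * x.mean()                  # undo the centring
print("gradient descent :", theta0, theta1)
print("closed-form fit  :", np.polynomial.polynomial.polyfit(x, y, 1))  # [theta0, theta1]
```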
Okay, so these are basically the main steps you have to take into account when building a machine learning algorithm, and of course there are many issues you also have to take into account. We have seen that, in order to define the learning task, we have to understand what our problem is, which features we are going to work with and how we are going to represent them. Then I have to choose the model and the learning algorithm, and the question is: which method? Well, there is no single method that works for everything, but it is very useful, before trying anything, to look at the data I have; sometimes just doing simple statistics can give us a hint of what is happening, so computing the average and the variability can already provide a lot of information. Then we have to think gradually, and see whether I need to go for a non-linear approach or not, whether I need a deep architecture or not. About training and testing there are also many issues we have to take care of, like for example overfitting, or whether I have enough data to learn an efficient model. We have already talked about the features; in the next slide I will talk a little bit about what overfitting is, and afterwards I will talk about some basic methods (you can interrupt me if you have any doubt). Finally, something that is of course also very important: once we have our results, we should try to interpret them and see how we can use them for our purpose.

Okay, so what is overfitting? Overfitting occurs when my model fits my training data extremely closely, so that it is unable to generalize: I can have a very, very small training error, but because the model cannot generalize I will have a large test error. At the other extreme is underfitting, where my model is too general, so it does not fit my training data well, which means that my training error is also going to be high. Of course, what we would like is to arrive at the best trade-off: having a small training error and also a small test error. One way to control overfitting is through the complexity, or capacity, of the model. When my model is too complex for the problem at hand, it usually overfits: it fits my data extremely closely and, although I have a very small training error, my generalization error is too large. On the other hand, if my model is too simple, it will not fit my training data and I will also have a large error. So I have to try to choose a model with the right complexity for my task. One way to do that is by reducing the number of input features; another way is regularization, where we keep the features but we force their weights to be small.

Let's talk now about methods. Perhaps one of the techniques most used in machine learning is dimensionality reduction, which consists in finding a new representation that is low-dimensional and allows a more efficient processing of the data. For example, here we have a Swiss roll; it is three-dimensional, and we can simplify it by just unrolling it and laying it out flat. Of course, in practice
we are interested in problems where this dimensionality D is very large, and we want to find a compact, simplified representation whose dimension d is much smaller than the original one. There are many methods to perform dimensionality reduction. Perhaps one of the simplest and most popular is principal component analysis, which is a linear, orthogonal transform that maps the input data onto a new coordinate system such that the direction with the greatest variance comes to lie on the first coordinate, the direction with the second greatest variance on the second coordinate, and so on. It is an unsupervised learning technique, and because it is a way of representing the data it can also be considered a representation learning technique. From the mathematical point of view it is very easy to compute these principal modes, these principal directions of variation: we just have to centre the data, compute the covariance matrix and, by diagonalizing it, obtain the eigenvectors, which are in fact the directions of main variation. Because it is the representation of the data that best explains the variance, it is perhaps the most popular variability analysis method, and if I keep only the first few dimensions I am simplifying my original data and performing a kind of dimensionality reduction.

Another method is linear discriminant analysis, which in some sense is similar to principal component analysis, but here we explicitly look for the directions that maximize the separation between the different classes, so it is a supervised learning algorithm. To find these directions we follow a mathematically similar approach as before: we centre the data, and then what we want to do is maximize the between-class variability while minimizing the within-class variability. So we compute a scatter criterion, which is the ratio between the between-class and the within-class scatter, and maximizing it amounts to a generalized eigenvalue problem: I diagonalize the matrix, and the eigenvectors give me the directions which best separate the classes.

Okay, so principal component analysis and linear discriminant analysis are two examples of linear methods, but many times our data live in input spaces that are not linear, and if we process such data with linear approaches, strange things will happen. So we have to look for another kind of approach. What we can do is take this input data and transform it to another space where the data is linear, and then in this new space I can apply my linear technique, for example principal component analysis.
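Before moving on to the kernel version, here is a small numpy sketch of the plain PCA recipe just described (centre the data, compute the covariance, diagonalize), on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 200 samples in 5 dimensions, most variance along two directions.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

# 1) centre the data
Xc = X - X.mean(axis=0)
# 2) covariance matrix
C = np.cov(Xc, rowvar=False)
# 3) diagonalize: eigenvectors = principal directions, eigenvalues = explained variance
eigvals, eigvecs = np.linalg.eigh(C)          # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("fraction of variance per component:", eigvals / eigvals.sum())

# Dimensionality reduction: keep only the first two principal components.
Z = Xc @ eigvecs[:, :2]
print("reduced data shape:", Z.shape)         # (200, 2)
```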
The nice thing about this is that, in order to compute the variance in this new space, I can do it through the kernel (affinity) matrix, which can be obtained directly from the input data. So I do not need to explicitly construct the transformation from my original input space to this new space, and this simplifies things a lot. In practice we again have three steps: centre the data, compute the kernel affinity matrix, which in some sense plays the role of the covariance matrix, and then, by diagonalizing it, obtain the principal components.

Another way of doing dimensionality reduction, which is also nonlinear, is manifold learning. First of all, what is a manifold? A manifold is just a topological space which locally looks Euclidean, although it can have a very different global structure. For example, the Earth is globally a sphere, but to us it looks flat, because we are very small in comparison with the Earth. The idea behind manifold learning is that the data, which live in a very high-dimensional space, in fact lie on, or very close to, a low-dimensional manifold, and that we can learn the geometry of this manifold. How are we going to do that? Given some samples of the manifold, I construct a neighbourhood graph, which is an approximation of the manifold. It turns out that the eigenfunctions of the manifold encode its topological and geometrical information, so by computing the eigenvectors of the graph I can obtain information about my manifold; this is what is called spectral decomposition.

For example, here we have a database of face images which are originally of size 64 by 64, so the original dimension, if I stack all the pixels in a vector, would be 64 times 64. But if I come to think of it, I can try to represent these data in a much simpler way, because there are not so many degrees of freedom: basically whether the person is looking to the right or to the left, whether they are looking up or down, and possibly a third degree of freedom, the illumination. So here a manifold learning technique has been applied to represent the data according to whether the face is looking left or right and whether it is looking up or down, so each face gets a position in this space. But what we can see is that this type of data does not live in a Euclidean space, not even in a vector space, so we cannot take two objects on this manifold and simply average them, because it could yield strange things, something that is not on my manifold. So I have to learn what the manifold is and, what is more important, I have to learn how to operate on it: I have to define metrics on this manifold to be able to operate.
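As a hedged illustration of the neighbourhood-graph idea, here is a Laplacian-eigenmaps-style spectral embedding of a synthetic Swiss roll; this is a generic sketch, not the specific method used in the works discussed later.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Made-up example: 3-D Swiss roll samples assumed to lie on a 2-D manifold.
X, _ = make_swiss_roll(n_samples=800, random_state=0)

# 1) neighbourhood graph approximating the manifold
W = kneighbors_graph(X, n_neighbors=10, mode='connectivity', include_self=False)
W = 0.5 * (W + W.T)                      # make the graph symmetric

# 2) graph Laplacian; its eigenvectors play the role of the manifold's eigenfunctions
L = laplacian(W, normed=True).toarray()
eigvals, eigvecs = np.linalg.eigh(L)

# 3) spectral decomposition: the eigenvectors with the smallest nonzero eigenvalues
#    give a low-dimensional embedding of the samples
embedding = eigvecs[:, 1:3]              # skip the constant eigenvector
print("embedding shape:", embedding.shape)   # (800, 2)
```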
The need to define proper operations and metrics is somewhat related to what Mathieu was talking about before, about doing statistics with large deformations.

Okay, so this is about manifold learning techniques, and the nice thing about all this is that, although there are many different dimensionality reduction techniques, most of them can be represented in a single framework called graph embedding. Basically, what this framework says is that we have to compute a similarity matrix which encodes certain statistical and geometrical properties of the data, and this leads to a cost function whose minimization can be solved as a generalized eigenvalue problem. This is very nice, because we have one framework which can be used for a lot of different dimensionality reduction techniques. The philosophy behind it, mainly, is that we are going to map nearby samples in the input space to nearby samples in the output space. Of course, the tricky thing here is how we define "nearby"; once again, the importance of defining the appropriate distance metric. So basically what I have is a cost function with a similarity term, which I have to minimize, and then typically we add some regularization to constrain the result to functions which are well behaved. For example, this is just an example with regression, where we would be computing the similarity between the predicted values and the ground truth, and then we would impose some regularity in order to obtain functions which are smooth.

Okay, so there are many, many other methods in machine learning; I am not going to go into details, but you can find a lot of tutorials on the web. What is also important is that there are a lot of toolkits and free software ready to use. One of them is scikit-learn, which is free software in Python and which allows you to do a lot of preprocessing and to apply a lot of manifold learning and machine learning algorithms. For the optimization part, it is also important to be able to choose which kind of optimizer we are going to use; one possibility is the library shown here, but there are many others.

Okay, so another important issue is feature combination. We have seen before some examples of input features that we could use, but a lot of times we do not use only one type of feature: we have a lot of possible features from which we can learn, and the question is how to combine them. The most standard approach, although very simple, is just to put them all together: maybe we normalize them so that they have the same variance or the same range, we concatenate them into one input vector, and we give this to our learning algorithm. This is one possibility, but the bad thing about it is that it does not contemplate the possibility that these features might be of very different nature and with very different distributions. Another option, which gives us a little bit more flexibility, is to use kernels, more specifically multiple kernel learning.
In multiple kernel learning, you have different features that you want to combine, and you assign a kernel to each of these features; each kernel can be seen as measuring the similarity of the samples according to that feature. Then we combine these kernels, and the algorithm finds both the optimal projection to the output space and the optimal weight with which each of these features contributes to the final combination. Originally, multiple kernel learning was formulated for supervised learning, basically classification, but there is also an unsupervised formulation which allows, for example, variability analysis. Later on I will talk a little bit about a work using this. So, one possibility to combine features is multiple kernel learning.

Another issue, closely related to feature combination, is feature selection. We have to ask ourselves whether the features are in fact useful for what I want to do, or whether all the variables are useful; maybe some of them are not useful at all. That is what feature selection is about: excluding irrelevant features. There are many methods to do that. One possibility is subset selection, where I try to select the best subset among all my possible features for my learning task. Another possibility is regularization: I keep all my features, but I force their contributions to be small; the most commonly used regularizations are the L2 norm, which gives what is called ridge regression, and the L1 norm, which gives what is called the LASSO.
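To make the regularization route concrete, here is a small sketch on made-up data where only the first three of ten candidate features actually matter: the L1 penalty (LASSO) drives the irrelevant coefficients exactly to zero, which effectively performs feature selection, while the L2 penalty (ridge) only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Made-up data: 10 candidate features, but only the first 3 drive the output.
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.1 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: zeroes out irrelevant coefficients

print("ridge coefficients:", np.round(ridge.coef_, 2))
print("lasso coefficients:", np.round(lasso.coef_, 2))
print("features kept by the LASSO:", np.flatnonzero(lasso.coef_))
```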
Another way of doing feature selection, of course, is to use dimensionality reduction, where I keep only the main components given by the dimensionality reduction.

Okay, then another issue is feature learning, which I already commented on: it is representation learning, learning which is the best representation for my task. One possible way of doing it is through deep learning. Let's assume that we want to recognize an object in an image; this might seem a very complex task, so what I can do is concatenate different layers of simple recognition tasks. I can start at the first layer by computing edges directly from the pixels, which can be done with pixel differences. Once I have my edges, I can use them to construct contours and borders, so in my second layer I already have my contours. From the contours I can try to construct parts of objects, and from these parts of objects I can try to recognize the full object, like for example a car, a person, or whatever. This idea of going from simple to complex, by just putting one layer after the other, is what is known as deep learning, and it is true that nowadays there is a lot of research going on applying deep learning to many different applications.

Okay, so finally I am going to give some examples of machine learning applied to cardiac applications. One of them is this work, led by Nicolas Duchateau and done here at UPF, where the objective was to learn a representation of pathological motion and to be able to compute distances between motion patterns. We assume that we can learn the manifold using a manifold learning technique. The idea is that pathological patterns can be seen as a deviation from normality, so every patient is represented by an abnormality-to-normality motion map, which are these kinds of images. Once we have learned the manifold which embeds this pathological motion, given a new patient we can project this patient onto the manifold and compute the distance to normality along the manifold. This was applied in the context of cardiac resynchronization therapy, by looking at a typical abnormal motion pattern called septal flash.

This is another work, where the objective is to classify between infarcted patients and controls, based on two types of features: shape and motion. The shape is given by the regional signals at end-diastole and end-systole, and the motion is represented by a polyaffine transform of the motion along the cycle. What they do is interesting, because they try several classification techniques on this input representation: they look at the classification error rate when using only the motion information, when using only the shape information, and when using both, either without or with normalization. I also find this work interesting because they compare what happens when a preprocessing step, a dimensionality reduction, is applied: principal component analysis or partial least squares is used as preprocessing on these polyaffine and shape descriptors. What is interesting to see is that, just by applying this preprocessing, almost all the classification algorithms get better, so the role of preprocessing can be very important.

This is another work, where the objective is to denoise. They start with several input images along the cycle, and they assume these are noisy samples which are embedded in a manifold that is parameterized by the motion.
So in fact, along the cycle, each of these samples sits at one of these positions on the manifold, depending on the moment of the cardiac cycle at which it was acquired. From these noisy images, together with the electrocardiogram signal, they estimate the manifold on which the noise-free images should lie, then project back to the original input space and obtain the denoised images.

Here is another example. This work is by Sergio Sanchez, a PhD student in our group. Here what we want to do is study myocardial motion patterns, to see whether the joint analysis of different features can provide us with an insight into the pathology. What we do is combine different velocity traces, at rest and at stress, together with some temporal information, and we do this combination through multiple kernel learning, in its unsupervised formulation. We compute the projection to the optimal output space taking into account the philosophy I told you about before, that nearby samples in the input space should remain close in the output space, and we also find the best combination of the different features. Once we are in the output space we look at the main directions of variation, and because we are very interested in variability analysis we map back to the input space to see this variability. As I told you, what we hope is that this joint analysis provides some insight into what is happening in these patients. We apply it to healthy subjects and to patients with heart failure with preserved ejection fraction.

This is another application, which appeared in Medical Image Analysis, and here the objective is to segment the left ventricle. They do that by first applying convolutional networks to detect where the left ventricle is; once they have detected it, they apply autoencoders to obtain a rough segmentation, a rough approximation of the left ventricle shape, and then they use this rough estimation to initialize a deformable model. Finally, this is another estimation work, which also appeared in Medical Image Analysis, and which applies both unsupervised and supervised learning to estimate the bi-ventricular volumes: first they apply unsupervised representation learning to learn which are the best features for estimating the volume, and then they use these features together with a supervised regression forest to make the prediction of the bi-ventricular volumes.

Okay, so there are many other examples; I would just like to draw your attention to some online tools that you can find, not only tutorials but also software. And this is the conclusion: machine learning is being used in all kinds of medical applications, basically because it offers the flexibility to adapt to the data, to learn from the data without being explicitly programmed about what to look for or what to do. When building a machine learning algorithm there are many, many issues we have to take into account; perhaps the most important are the ones I put here: defining the learning problem, and especially what my training experience is; then which methods I am going to use, so which model, which cost function and which optimization procedure. And there is no single method
that works best for all problems. Then there are many issues related to evaluation and training, which have to do with whether I have enough samples to learn the model and whether I am overfitting my model. Okay, so that's the end of my presentation. I would like to thank some of my collaborators in the work we are doing on machine learning. Do you have any questions?

Thank you very much for this comprehensive overview. As you partially said already, one of the features, in a way, of machine learning is that in the end you get extremely good performance; it is probably one of the best-performing approaches for doing a specific task, but very often you don't know why it is so performant. So the question is a little bit: of course, when you need to do a task, it is a good way of doing it, but when you want to do research, you want to understand something; sometimes you want to know what the underlying mechanism is, which features are being used. How can we try to improve that?

It's true that a lot of times these types of algorithms are, let's say, applied blindly, without understanding. More and more, there are researchers looking into why this works or not. For example, one of them is Stéphane Mallat, who tries to give some explanation of deep learning through a specific example, the scattering transform; he tries to give a mathematical justification of why this could work, but it is a very, very specific case. So yes, I think this is really an open field of research, and most people apply these methods without really knowing what is going on inside. There are some methods that are more amenable to this than others: for example, in multiple kernel learning you have a little bit more control over which weights you are putting and which kernels you are using to define the similarity, so you are more in control of what is going on; but in a lot of neural networks, for example, you really don't understand what is going on behind them.

Any other questions? Everybody's hungry, so good, then it's lunchtime now.