Hello. So, how did you enjoy Marcus's presentation? It was very dense, especially on how to get the essentials for the different tasks. Now I'll share with you some deep learning techniques that are well applied in life sciences. Here are the five I will talk about. I don't intend to give you a complete list of techniques, just some common ones that are widely applied to biological and health data. So please don't mind if you don't see the technique that you expected to see. I'll give you an overview of all of them, trying to explain the concepts and when they can be used, but without going into the details of the methods, which you will see more of in the exercises this afternoon and in a future, more advanced course.

The first one I will show you is the convolutional neural network. First, allow me to remind you of the concept of a neural network, which Marcus also started with some minutes ago. You know that neural networks are meant to imitate our brain. So the most obvious, natural input to a neural network is an image: that's what we see. And the most popular application of neural networks is in computer vision. That's why, when people talk about deep learning, the first thing they mention is images. So imagine you have an input. It can be an X-ray image, but it can also be something else, like an RNA sequence or the gene expression profile of a patient. The important thing is that they are all read in the same way by the computer: the input is encoded into a vector of numbers. Each node, each neuron, here is for example the colour measurement of one pixel in the image. Each node in the next layer is the result of applying the activation function to a linear combination of all the nodes from the current layer, which is defined by a weight matrix: you see the W's here. And so on, up to the output ŷᵢ. Then we compare the output ŷᵢ with the observed value yᵢ and define a loss function. The loss function can be a very simple one, the mean squared error, or something else like the cross-entropy function; it depends on our problem and on the activation function we use. The goal is then to find the parameters W, the list of weight matrices, that minimize the loss, by performing gradient descent with backpropagation.

Now we have a protocol to perform a given task. But here come several questions. Does this protocol work: can we really minimize the loss function? And even if we can, is the minimum good enough? We expect the output ŷ to be very close to y, but of course we don't expect the loss to be zero. I come back here to the question asked to Marcus: what do we do when the loss function is zero? That is something we do not expect. When we have a loss of zero, we have to stop, re-initialize the weight matrices, change the activation, and so on; a loss of zero should not be there. And another question is whether we can obtain the minimum in a reasonable time: will it take hours to minimize the loss function, or days? That is a problem too.
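To make this protocol concrete, here is a minimal sketch in Python with PyTorch; the layer sizes and the toy data are my own illustration, not from the slides:

```python
# Minimal sketch: a two-layer network trained with an MSE loss and
# gradient descent via backpropagation (toy data, illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 10)          # 100 samples, 10 input features
y = torch.randn(100, 1)           # observed values y_i

model = nn.Sequential(
    nn.Linear(10, 32),            # weight matrix W1 (+ bias)
    nn.ReLU(),                    # activation function
    nn.Linear(32, 1),             # weight matrix W2 -> output y_hat
)
loss_fn = nn.MSELoss()            # mean squared error loss
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    y_hat = model(X)              # forward pass
    loss = loss_fn(y_hat, y)      # compare y_hat with the observed y
    opt.zero_grad()
    loss.backward()               # backpropagation computes the gradients
    opt.step()                    # gradient descent updates the weights W
```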
For example, the encoding of an image into a vector of pixels is somewhat too simplified. We ignore the spatial dependency between the pixels: an image is really a two-dimensional matrix. That is the type of reasoning we use when we play puzzles: we try to put together the pieces with related patterns. That's why the encoding into a vector will not always work.

One more thing is the dimensionality issue. Have a look at these two images. You can see that it is much easier to process the first image, with 25 pixels, than the second one, with three million pixels. The problem comes from the full connectivity of the neurons, which means each neuron in a layer is connected to all the neurons in the previous layer; and the problem is there even with high, not full, connectivity. This problem is known as the curse of dimensionality, and it has two aspects. The first one is the sparsity of data: when the dimensionality increases, the volume of the space increases so fast that the available data become very sparse, which is very problematic for statistical significance. To obtain a similar statistical significance, the amount of data has to increase exponentially when adding dimensions. For example, in one dimension, 100 points look very dense. If we move to a two-dimensional space, the same 100 points are spread over a rectangle. And if I put them in a three-dimensional space, they become much sparser still. The second aspect is about the closeness of data, the similarity between data points: when the dimensionality increases, the distances between data points become more and more similar. This is problematic for sorting or classifying data. For example, here you don't see a difference between the two data points, because you are in an online course, so what you see is a two-dimensional image; in an offline course, people in some corner of the room would see it differently. When we increase the dimensionality, we see more and more of this similarity between the distances.
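You can see this concentration of distances with a few lines of Python; the point counts and dimensions here are just an illustration of mine:

```python
# Sketch: distance concentration as the dimensionality grows.
# 100 random points; as dimensions are added, the gap between the
# nearest and farthest neighbour shrinks relative to the distances.
import numpy as np

rng = np.random.default_rng(0)
for d in (1, 2, 3, 100, 1000):
    pts = rng.random((100, d))                       # 100 points in [0,1]^d
    dists = np.linalg.norm(pts[0] - pts[1:], axis=1) # distances to point 0
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative spread of distances: {ratio:.3f}")
```

In high dimension the nearest and the farthest neighbours end up almost equally far away, which is exactly what makes sorting and classifying hard.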
That's where we need a convolutional neural network. So what is a convolutional neural network, a CNN — not to be confused with the Cable News Network, the famous television channel. The CNN is inspired by the organization of our visual cortex, which, if I remember well, is at the back of our brain. Individual neurons respond to stimuli only in a restricted region of the visual field, which is called the receptive field, and a collection of those receptive fields overlaps to cover the entire visual area. The convolutional neural network is a deep learning algorithm that can take an input image, assign importance, with the weights and biases you know from neural networks, to various aspects or objects in the image, and is then able to differentiate one from the other. In fact, each neuron receives connections only from a subset of neurons, not from all of them. In a standard neural network, each neuron in the next layer receives a weight from all the neurons of the previous one; here, each neuron receives input only from a subset. So it reduces the number of parameters: the matrix W keeps the same size, but it will have a lot of zero values inside.

The CNN can capture the dependencies in space and time between pixels in the image. The space dependency is about the relationship between nearby pixels: for example, here you see nearby pixels that are all red, or all black. The time dependency is about the relationship between the same pixels at different moments, when we have a series of images, for example in a video. With this, the CNN can be trained to understand the sophistication of the image better. So the role of a CNN is to reduce the images into some form which is easier to process, without losing the features that can be critical for getting a good prediction.

And how does a CNN really work? You see an image as a pixel matrix, and the idea is to take each square block of pixels as a neuron, not each single pixel as a neuron. This step is the convolutional layer of the CNN, which is central to it: it performs an operation called convolution, which is in fact an operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. You see the weights here: a matrix of weights defined by a filter. This filter has the same size as the sliding window, the block of pixels that we want to consider, and the values of the weights in the filter represent a feature that we want to detect in the image. For example, here you see a filter with ones on the diagonals: we try to find the feature of a small X of 3×3 pixels in the image. So I take one sliding window and apply the filter to it, performing the convolution operation: you take the pairwise products, 1×1 plus 0×0 plus 1×1, and so on. You have three ones matching here, so the sum will be three, quite straightforward. Then we slide the window by one position; this one is zero, so we have zeros times something, and we get three again. And so on: three, zero, five. Here, in the centre of the image, you have the small X; that's why we get a very high value there in the resulting map. So now we have a feature map that summarizes the presence of the small-X pattern in the input: the high value at the centre of the feature map, the five, indicates that the pattern X is most likely found at the centre of the image.

The filters can be handcrafted, such as the small-X, slash and backslash patterns, but the innovation of CNNs is to learn the filters during training; it is like learning the weight matrices in a traditional neural network.
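Here is the same walkthrough as a small Python sketch; the 5×5 image and the 3×3 X filter are the toy example from the slide as I reconstruct it, so the exact numbers may differ:

```python
# Sketch of the convolution step: a 3x3 "small X" filter slid over a
# 5x5 image that contains an X in its centre (values are illustrative).
import numpy as np

image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
filt = np.array([          # ones on the diagonals: the X pattern
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

h = image.shape[0] - 2     # output height for a 3x3 sliding window
w = image.shape[1] - 2
feature_map = np.zeros((h, w), dtype=int)
for i in range(h):
    for j in range(w):
        window = image[i:i + 3, j:j + 3]            # current sliding window
        feature_map[i, j] = np.sum(window * filt)   # sum of pairwise products
print(feature_map)   # the centre value is 5: the X pattern sits there
```

The printed map has the five in the middle, exactly the high value indicating where the X pattern is found.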
The convolutional layers are applied to the input layer, the raw pixel values, but they can also be applied to the output of other layers, meaning we can use multiple convolutional layers. These layers allow for extracting very low-level to high-level features: low-level features like lines, dots, edges, colours, gradient orientations in the image; high-level features can be whole objects or shapes. This also allows for reducing the spatial size, the dimensionality, which helps to decrease the computational power needed to process the data.

One problem with the output feature maps is that they are really sensitive to the location of the features in the input: you have the X at the centre of the input, and so you have the five at the centre of the feature map. It means the position in the feature map follows the position of the feature very closely. The answer to this sensitivity is to down-sample the feature maps. And what is down-sampling? Down-sampling is the job of the next layer, the pooling layer. Pooling layers are used to reduce the size of the feature maps in a CNN and to compress the information to a smaller scale. Pooling is applied to every feature map and helps to extract broader, more general patterns, which are more robust to small changes in the input. It is performed after the convolutional layer and a nonlinear activation function, for each feature map. Usually we use a pooling of 2×2 pixels with a stride of two pixels, meaning we slide the block by two pixels; with this, we reduce the feature map to one quarter of its size. Two types of pooling are usually used, max or average, and usually max performs better than average. With max pooling, for example, in each 2×2 block you take the maximum value: here it is four; for the next window you have two; then five and four. It is the same for average pooling: you take the average value in each block to obtain the pooled feature map. The resulting map after pooling is a summarized version of the features detected in the input: a small change in the location of a feature in the input, as detected by the convolutional layer, will not affect, or will at least have a reduced impact on, its location in the pooled map. This is what we call invariance to local translation.

Then, after all the layers of convolution and pooling, we arrive at the fully connected layer, which is just like a normal neural network. After several layers of convolution and pooling, we obtain a number of feature maps. We flatten these maps, meaning we put them into a vector of neurons, and then we keep going with a normal neural network to the output. And then we perform backpropagation and gradient descent to train our model.

So, what time is it? Okay. When can a CNN be used? You have seen that the CNN was developed for images, with two-dimensional input, but it can also be adapted to one-dimensional or three-dimensional data: in one dimension, the window slides along one dimension; in two, over two dimensions; in three, over three. For a colour image, for instance, we have three channels of colour, red, green and blue; it is just similar. With this, the CNN is, I would say, the technique most applied across subjects in life sciences and across the different types of data: from sequence analysis, to structure prediction, to imaging data; predicting the function of a molecule based on its properties or its NMR structure; predicting the interaction between two biomolecules; and some functional biology.
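Putting the pieces together, here is a sketch of such a network in PyTorch; the channel counts, the 28×28 input and the ten output classes are illustrative choices of mine:

```python
# Sketch of the full pipeline: convolution -> activation -> 2x2 max
# pooling with stride 2 -> flatten -> fully connected output layer.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),                                   # nonlinear activation
    nn.MaxPool2d(kernel_size=2, stride=2),       # 28x28 -> 14x14 (1/4 size)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher-level features
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),       # 14x14 -> 7x7
    nn.Flatten(),                                # feature maps -> one vector
    nn.Linear(16 * 7 * 7, 10),                   # fully connected output
)
x = torch.randn(1, 1, 28, 28)                    # one grayscale image
print(cnn(x).shape)                              # torch.Size([1, 10])
```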
Okay, the next one is about recurrent neural networks, a technique developed for quite a number of data types. Let's first go back to the traditional networks. These ordinary neural networks are only meant for data points which are independent of each other: we have an output ŷᵢ for each input xᵢ. But if we have some relation between the data points, if we have data in a sequence, where one data point depends on the previous ones, then we need to modify the neural network to incorporate the dependency between the data points. We need a concept of memory, which stores the state information of previous inputs to generate the next output of the sequence.

For example, say I want to practise my piano, and I have a list of piano songs. If it's sunny, I'm motivated, and I practise the next song in the list. If it's rainy, I'm not motivated, and I practise again the song I practised the day before. So the output ŷᵢ here is the song, the input xᵢ is the weather today, and the state of the memory memorizes the song I practised the day before. It means that instead of a normal function ŷᵢ = f(xᵢ), we have a function ŷᵢ = f(xᵢ, hᵢ₋₁) of the input and of the previous memory state.

The recurrent neural network, the RNN, is a special type of neural network adapted for time series data, or data that comes in sequences. For text, we have a sequence of words; for video, a sequence of images over time; for time series, you have biological data measured at different time points, heart rate or blood pressure at several time points for a patient, for example; stock prices are also time series data. The recurrent neural network has a memory to store the history information in order to forecast the future values.

This can be shown with a recurrence on the cell itself: the state is a function of itself, I mean of the previous state, and of the current input. So the state hᵢ at time step i is a function of the input xᵢ and the state before, hᵢ = f(xᵢ, hᵢ₋₁). The W's are, of course, the parameters, the weight matrices, and we use the same activation functions: the sigmoid, the hyperbolic tangent. The parameters are shared at every time step, and we have three of them: W_xh to go from the input x to the state h; W_hh to go between the successive states h; and W_hy between the state h and the output y. So we have three main parameter matrices.
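As a small sketch of this recurrence in Python, with the three shared matrices; all the sizes and data here are toy values of mine:

```python
# Sketch of the recurrence h_i = f(x_i, h_{i-1}) with the three shared
# weight matrices W_xh, W_hh and W_hy (sizes are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W_xh = rng.normal(size=(n_hid, n_in)) * 0.1   # input  -> hidden state
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1  # hidden -> hidden (memory)
W_hy = rng.normal(size=(n_out, n_hid)) * 0.1  # hidden -> output

xs = rng.normal(size=(10, n_in))              # a sequence of 10 inputs
h = np.zeros(n_hid)                           # initial memory state
for x in xs:
    h = np.tanh(W_xh @ x + W_hh @ h)          # h_i = f(x_i, h_{i-1})
    y_hat = W_hy @ h                          # output at this time step
```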
We have several types of RNN, with varying architectures. One-to-one: one input produces one output. One-to-many: one input can produce many outputs, for example when we generate music. Many-to-one: for example, several elements in and one emotion as output. Many-to-many: for example, when we try to translate English to French.

And we have the notion of backpropagation through time. Similarly to a normal neural network, we calculate the gradient of the loss function with respect to each parameter; here we calculate the loss as a function of W_y, W_h, and so on. But when we try to take the derivative with respect to W_h, we in fact have a chain of derivatives: it depends on several states of the memory, and we have a repeated computation of gradients from the state hᵢ back to the first one. Taking derivative after derivative, we end up with many multiplied factors of W_h in the gradient. And here comes a problem. If we have many values higher than one: you see, for example, 100 factors of 1.1, which is very close to one, but the product is already over 13,000. This is what we call the exploding gradient: the gradient quickly reaches, in effect, infinity. What we can do is perform gradient clipping, to scale large gradients down. The second problem is the vanishing gradient: if the values are a bit smaller than one, for example 0.9, then 0.9 to the power 100 is almost zero. The gradient goes to zero and we cannot move in the gradient descent; we just stay in the same place. So it is very difficult to learn the long-term dependencies. To deal with this we can try to adapt the activation function, or use weight initialization with an identity matrix or an orthogonal matrix. And another way is to adapt the network structure.

For the network structure, please have a look at this text. It starts with 'I grew up in France', and after 2,000 words of the story of my whole life, I come to the phrase 'I speak fluent ...' and I try to predict the next word. Of course, that word should be 'French' with very high probability. But the word 'France' is far away: the RNN would need to keep that information for 2,000 states, and that's a problem. A variant of the RNN is the LSTM, the long short-term memory network. It can track information through many steps: it has a memory cell that can keep very long time dependencies, and this cell is controlled by three gates, where each gate is a neural network. The forget gate decides which information to ignore; the input gate decides which values from the input update the memory; and the output gate produces the network output based on the memory and the input, with the usual activations of weighted sums.

Yeah, sorry, there are several questions coming in; I'll take them and summarize them in a moment, because I'm switching between several windows to see them.

RNNs have applications across text, for language translation, for speech recognition, and so on. In life sciences, when we have a sequence, for example a DNA sequence containing motifs, we have exactly this kind of sequential dependency. The LSTM can also be applied to medical ontologies, to interactions between proteins, and to evolution, where we have sequential data.
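A small numeric and PyTorch sketch of these two problems and of the clipping fix; the LSTM sizes and the toy loss here are my own illustration:

```python
# Sketch: a product of many factors explodes or vanishes, and gradient
# clipping as it is used in practice with an LSTM in PyTorch.
import torch
import torch.nn as nn

print(1.1 ** 100)   # ~13780.6  -> exploding gradient
print(0.9 ** 100)   # ~2.7e-05  -> vanishing gradient

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
x = torch.randn(2, 50, 4)                    # 2 sequences of 50 time steps
out, _ = lstm(x)
loss = head(out[:, -1]).pow(2).mean()        # toy loss on the last step
loss.backward()                              # backpropagation through time
torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm=1.0)  # clipping
```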
Okay, and the next one is about autoencoders. Autoencoders are a kind of neural network that learns to efficiently compress and encode data, and then learns to reconstruct the data back from the reduced, encoded representation. It encodes the input into a lower-dimensional representation, and then tries to decode, that is, to reproduce, to rebuild, the output from this compressed representation. So the autoencoder learns, in a way, to build the identity function. With this, the loss function, the very simple one, is the mean squared error between the output and the input: we are trying to reproduce the same thing. Of course, we lose information when we reduce the dimensionality.

Autoencoders can be seen as a generalization of PCA, and they perform better than PCA when we have nonlinear data, because a neural network has activation functions and nonlinear layers. But not all the time: if we have linear data, I mean data that can be linearly separated, PCA should perform better. The challenge here is whether the model can learn a meaningful and generic representation in the encoded latent space: is it generic enough to reproduce the input? That depends on the regularity of the latent space, and three factors matter: the distribution of the given data, the dimension of the latent space, and the architecture of the autoencoder. So there are several challenges, and sometimes autoencoders will not work that well. Usually we use autoencoders in combination with another type of network, a CNN for example, to obtain a good classification or prediction.

The variational autoencoder is a variant of the autoencoder. The challenge with the plain autoencoder is that we don't have enough regularity in the latent space: the latent space encodes the input too strictly, so there is no room for any kind of random factor. For this we need the variational autoencoder, which is the same as the autoencoder except that the code is built from a distribution: from the input we calculate a mean and a standard deviation, and the code is not obtained directly from the input but drawn randomly from the distribution with that mean and standard deviation. Then we rebuild the input, the same as with the autoencoder. These are usually used to compress images and to remove noise, because denoising means we reduce the complexity, the dimensionality, in the data; so we can use them to clean images, and to detect anomalies in images as well. And we can use them to extract features and then perform a downstream task afterwards, for example a random forest or whatever simple machine learning algorithm. In life sciences, they can be used for reducing the dimensionality to cluster sequencing data, and, in the same direction, to integrate multi-omics and medical data.
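A minimal sketch of an autoencoder in PyTorch, assuming a 100-dimensional input compressed to an 8-dimensional code; all sizes and data are illustrative:

```python
# Sketch: encode to a low-dimensional code, decode back, and train on
# the reconstruction (MSE) loss between output and input.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 8))
decoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 100))
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 100)            # a batch of 64 inputs
for step in range(100):
    code = encoder(x)               # compressed latent representation
    x_hat = decoder(code)           # reconstruction from the code
    loss = loss_fn(x_hat, x)        # compare the output with its own input
    opt.zero_grad(); loss.backward(); opt.step()

# A variational autoencoder would instead have the encoder output a
# mean and a standard deviation, and sample the code randomly:
#   code = mu + sigma * torch.randn_like(sigma)
```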
Ah, I have one, yes, some questions. Would you like me to answer them now, or at the end? Okay, I see, I can answer now. The first one: is there an example of a combination of CNN and RNN? Yes, there is. In the LSTM, for example, there are several neural networks: each gate is a neural network in itself. And people do use CNNs as the gates in an LSTM. I cannot tell you which paper exactly, but CNNs are widely used in that way, as the gates of an LSTM.

Somebody wants to know the difference between dimensionality reduction and autoencoders. The autoencoder is a kind of dimensionality reduction: we try to compress our data, to reduce its dimension. Among the techniques of dimensionality reduction, the difference from PCA, the standard method for reducing the dimension, is that the autoencoder is a neural network with nonlinear layers, so it can perform better than PCA for nonlinear data. For example, with the data Marcus showed you before, the Swiss-roll-like data, PCA wouldn't work, but an autoencoder can be helpful.

Maybe the last question: how to use CNNs for sequence analysis, and what will the hidden layers actually do? For sequence analysis with a CNN, they will usually not encode the sequence in two dimensions; they can, but usually people use a one-dimensional CNN. For example, for a sequence TTC and so on, they take a sliding window of five characters and perform the convolution along the sequence. For gene expression, I have seen both: they can use, for example, the 10,000 genes as one vector, and they can also arrange the 10,000 genes as an image, using a one-hot encoding, and then train a two-dimensional CNN. I give a small sketch of the one-dimensional case just below.

The last one is about the compression again: we have said that the information stands in the weights. I don't really get the question; can you turn on your microphone? 'Yes, I was wondering: with this compression, the size of your image is reduced when you have encoded it, but you need the network to decode it. This basically means that some of the information you have saved is inside the network itself. So is it really a gain, or is it just a transfer of information, where you reduce the size of the image but you need the network to decode it?' Well, we do gain in compression: the code can be seen as a latent variable. We gain in compression, in dimensionality, but we lose in the preciseness of the reconstruction, I would say. And the information is, of course, transferred to the weights, except that it cannot be interpreted easily. I hope I'm clear. Yeah, okay.
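Here is that small sketch of the one-dimensional case: a one-hot encoded DNA sequence passed through a 1D convolution with a window of five characters. The sequence, channel counts and layer sizes are toy values of mine:

```python
# Sketch: one-hot encode a DNA sequence and apply a 1D CNN whose
# sliding window covers five characters at a time.
import torch
import torch.nn as nn

seq = "TTCAGGATTCA"
alphabet = "ACGT"
onehot = torch.zeros(1, 4, len(seq))          # (batch, channels, length)
for i, base in enumerate(seq):
    onehot[0, alphabet.index(base), i] = 1.0

conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=5)  # window of 5
feature_maps = torch.relu(conv(onehot))       # 8 motif detectors along the sequence
print(feature_maps.shape)                     # torch.Size([1, 8, 7])
```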
Okay, I should make it fast now. The next technique is the GAN, the generative adversarial network. Here I come back to a question, sorry, I don't remember from whom, a question for Marcus about generating sequencing data. This technique is one that is well applied for generating sequencing data, especially in single-cell, because, compared to normal sequencing, with single-cell sequencing we have less data. This method allows for an augmentation of the data, meaning it will create new data. Inside we have two main models: the generator and the discriminator.

The generator model will learn some features, some distribution, from the input data, and from a random input it tries to produce a synthetic sample, I mean a fake sample. Then the generated sample and a real sample, the one we have from experiments, are given to the discriminator model, and this model performs a binary classification to distinguish the real and the fake samples. Once it has done that, it sends feedback to the two models, generator and discriminator, to update them. That is the GAN network. For biological data, GANs are usually used to generate single-cell sequencing data, and also for the design of molecules and proteins. This technique is used when we try to increase our input data: I don't remember if I mentioned it in my first lecture, about the limitations of deep learning, but one limitation is that we need massive data for an acceptable deep learning performance. We don't have that all the time, and sometimes we need synthetic data to do the job.
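A minimal sketch of this generator/discriminator game on toy one-dimensional data; all the sizes, the stand-in "experimental" data and the training settings are illustrative assumptions of mine:

```python
# Sketch of a GAN: the generator makes fake samples from random input,
# the discriminator classifies real vs fake, and each one's loss is the
# feedback that updates the other.
import torch
import torch.nn as nn

real_data = torch.randn(256, 10) * 2 + 5    # stand-in "experimental" samples
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 10))   # generator
D = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    # 1) discriminator step: binary classification, real vs fake
    fake = G(torch.randn(256, 4)).detach()   # fake samples from random input
    d_loss = (bce(D(real_data), torch.ones(256, 1))
              + bce(D(fake), torch.zeros(256, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) generator step: the discriminator's feedback updates the generator
    fake = G(torch.randn(256, 4))
    g_loss = bce(D(fake), torch.ones(256, 1))   # try to fool the discriminator
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```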
And then the last one is about deep reinforcement learning. Reinforcement learning itself, I think Marcus presented it before: it is about learning to make decisions by trial and error, with the goal of performing some actions well. We have an agent; it can be us in the world, or it can be Super Mario, if you know the game. The environment is where the agent lives: for us, the environment exposes us to the laws of physics and to the rules of society. The agent performs an action, and the environment sends back feedback: it can be a reward, or it can be a penalty. The action, for Super Mario, can be to go left, go right, or jump, for example. And the policy is the strategy of the agent for choosing actions so as to obtain some return.

So here the agent has a state, the current situation in which the agent finds itself. The reward: once the agent performs an action, it can be a success or a failure, and there is a reward or a penalty accordingly; this is what we measure. And the Q-value is a function of the state and the action: it is the expected reward given the current state and the chosen action. There are two types of return we can use: the total reward from time t, meaning we sum the rewards gained at time t, t+1 and so on; and the discounted total reward, where we have a discount factor γ, so R_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ... With this we down-weight the rewards in the future: a reward we gain immediately should count more than the same reward gained in a year or two. That is reinforcement learning. So what about deep reinforcement learning? It is the combination of deep learning and reinforcement learning.

When we have very large spaces of states and actions, we have quite a number of states and quite a number of actions. In plain reinforcement learning, in Q-learning, we would have a table of state-action pairs and their Q-values. If we have 10,000 pairs of state and action, that's okay; but if we have 10 billion pairs like that, it will take too much time and too many computing resources to solve the problem. Deep learning here means a neural network whose job is to map states and actions to Q-values. You see the two variants here: the neural network can take a state and an action and produce the Q-value; or the neural network can take only a state and produce the Q-values for all the possible actions. So in deep reinforcement learning we use neural networks to learn the Q-values, and from them we try to produce the optimal policy for a state. For example, traders try to learn from the behaviour of the stock market and then produce the policy: what to do, buy, sell or hold the stock, the securities, according to the situation. Deep reinforcement learning, well, I will say it can be helpful, but the neural network is not better all the time than a simple SVM or a decision-tree method; it really depends on the data, on the amount of data, and on the quality of the data as well. This is also applied to different topics in biology: it can be used to predict the genomes of bacteria, or to predict the interactions of proteins, I mean what kind of interaction will give a gain or a loss on some criterion; and it is also used in brain-machine interfaces.

I think that's it. Thank you for your attention. And if you have