Okay, good afternoon everyone, I hope you can all hear me. You are welcome to this session on deep learning with Keras and TensorFlow in R. And I believe this is one of the last sessions of the useR! 2020 tutorials that we are actually running. And with us is Dr. Shirin joining us from Germany. She's a data scientist with codecentric and also a co-organizer of the MünsteR R user group. So we'll be starting at 1 pm on the dot. As we said in the mail, for those of us who have questions, let's do well to go to GitHub and share your question there. And once it's time for Q&A, we're going to throw the questions open to the house. So please let's ensure we all mute our mics. And if you have any questions, please feel free to go to GitHub and share your question; you can also use the chat section as well. So we're going to have a productive session together. Yeah, thanks for your time. So once it's two o'clock, the facilitator will take the ground and we start the training. Thank you.

Okay, good afternoon, everyone. My name is Foulajima Ali from the Lagos R User Group. Welcome to this session on deep learning with Keras and TensorFlow in R. And with us is Dr. Shirin from Germany. She's a data scientist with codecentric, and she's also a co-organizer of the MünsteR R user group. So she will be taking us through this session. Please do well to mute your mic while the session is ongoing, and if you have any questions, do well to also use GitHub, or you can as well share your questions in the chat. So we look forward to a productive session together. Enjoy the session. Thanks for your time. Over to you, Shirin.

Yes, thank you. And welcome, everyone, to the workshop. I am very happy to have so many people interested in the workshop; it's actually quite a few more than I expected would actually show up. So I'm very happy about that. Yeah, so let's get started, as there is not that much time, I guess. So this is the short overview of what we'll be trying to cover in these two hours. I will talk a little bit about me, not too long. Then there will be a short theoretical introduction to neural networks and deep learning with Keras and TensorFlow. And the way I want to handle this is that I will try to keep the explanations pretty brief during the theoretical part, because I think it kind of gets boring if I talk too much and don't show you enough code. So I will try to keep the explanations short and then go over to the coding section, and I would encourage you to ask questions anytime you like. But if it's not a really urgent question, try to save the questions from the theory for the practical part, when we actually go to RStudio and code something. Because there I also provide a lot of written explanations that you can read up on later, and you will get the GitHub repository with all the material in it. So even if there is something that you don't get, you can always read up on it later and look over it again. So, because there is not that much time, I decided — can you please mute yourselves, everyone? Thank you. — I decided that we will focus on convolutional neural networks today and try to go as much into detail as you would like. And there is a lot of extra stuff; I don't assume that we will be able to cover the entire code that's in the GitHub repository, because of time, but I thought I will start.
So the workshop is designed in a way that we can go as fast as you like, basically. If there are not that many questions, we can go really fast and cover a lot of topics; but if there are questions, then I would rather go into detail on the few things that we do cover, and you will have the chance to look at all the additional stuff on your own later.

A few words about me — I was already introduced briefly. I'm a data scientist at a German IT consulting company. I have been a data scientist now for three years, but before that I actually started out as a biologist and bioinformatician, so I have a kind of side entrance into data science, you might say. My background is mostly statistics, data visualization, and traditional modeling, but I have also been working a lot with machine learning and deep learning during the last years. And I find it all very fascinating, the things you can do with data, and I hope to convey some of this fascination to you today.

So, just a few words on what you will learn in this workshop: hopefully the basics of deep learning. A few words that you might encounter when you talk about deep learning are cross entropy loss, activation functions, optimization of weights and biases, backpropagation, gradient descent. All these words are usually thrown around when people talk about deep learning, and I'm trying to give you a short overview of what these things are and why they are relevant to building neural networks with Keras. And then of course the interesting part: how to build deep neural networks with Keras and TensorFlow, and how to use these models later on to make predictions on test data and do some additional cool stuff. The scripts and code you will get later, once we have done the theoretical part — you will get the link then — but first let me talk a little bit about neural networks.

Before we actually start with neural networks, I just want to briefly classify how I think of machine learning, typically, when I give a workshop. We have the three pillars of machine learning. The first is supervised learning: most of the common tasks that you will encounter fall under the column of supervised learning. Everything that is a classification or regression task, where you have labeled data, is called supervised learning, because you have a known truth that you try to model with data. You try to have a model learn from the data — to learn a mathematical representation of this data, basically, to get to the labels you know — so that you can then use this learned representation on new data and make predictions. In case you want to predict classes, we call this classification. And in case you want to predict numeric values, we usually call this regression. And this is also what we will focus on in this tutorial: we will only be covering supervised learning classification tasks. More advanced, usually, is unsupervised learning, where we have unlabeled data, so we don't know the truth, as you could call it. Things you usually do when you talk about unsupervised learning are clustering, dimensionality reduction like PCA or multidimensional scaling, or also anomaly detection. And reinforcement learning is often the third pillar of machine learning. This is where we have more self-generating data, like a computer game that we can have a computer play on its own for thousands and thousands of rounds.
And what we give this computer then is a reward/punishment setup, and the computer will then play against itself again and again and again, and try to get the highest reward, and in this way learn some task.

So, just a few short words about neural networks, because this is basically what we will be building with Keras in a minute. Neural networks are one of the many algorithms that you can use to do this machine learning — to have a computer learn a mathematical representation of data. Actually, what we are using are artificial neural networks, because they have been built to mimic the learning patterns of our brains. The neural networks of our brains are basically what we are trying to copy, so that we can use the complexity of the structure of our brain to learn from data in an artificial way. The very simplest type of neural network is called a perceptron, and a very simple perceptron would consist of some input data, some weights, maybe a bias; then we combine all this and have it go through an activation function, so that an output is calculated. And the trick, basically, is that we can change these weights, and changing these weights will then change the output. I will talk in more detail about what this actually means in the next slides.

There are a lot of activation functions; I will just talk very briefly about a few of them. Some you will often encounter are, for example, the sigmoid activation function, or rectified linear units — in short, ReLU. And these two examples show you what an activation function does: it is basically just a transformation of an input. So let's say we have input data going into our neural network. And let's say we have 10 input values; each of them is multiplied with a weight, and we sum all these input values multiplied by their weights up, so we have a number, basically, coming out as the input to the activation function. Depending on what this number is — let's say it is five — we would end up here on our scale, and this input would get transformed. And you see the sigmoid activation function would transform our input to lie between zero and one. Another commonly used activation function, the ReLU, does something very simple: it basically keeps the input as it is, as long as it's above zero, and everything that is below zero will be set to zero. So an input of minus 10 would become zero, and an input of minus five would also become zero. There are many more activation functions, and they are usually used for specific cases; we will talk about specific uses of activation functions when we get to the coding section. The reason I'm talking about them here is that I want to explain the concept of why we are actually using these activation functions in neural networks. And the reason is that we can use these activation functions to break linearity. If we did not have them, and we just had our input going into the perceptron, we would not be able to really do something interesting with the data, to calculate an interesting output — we would only have a linear representation between input and output. But if we use activation functions that transform the input in a certain way, we are able to achieve nonlinear combinations and therefore able to learn more complex tasks than just with linearity.
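Just as a quick illustration in R — this is not verbatim from the workshop code, only a sketch of the two functions just described:

```r
# Minimal sketch of the two activation functions discussed above.
sigmoid <- function(x) 1 / (1 + exp(-x))  # squashes any input into (0, 1)
relu    <- function(x) pmax(x, 0)         # keeps positives, sets negatives to 0

sigmoid(5)           # ~0.99, close to 1
relu(c(-10, -5, 5))  # 0 0 5
```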
So the simplest type of artificial neural network that we can also build with Keras is called a multi-layer perceptron. And this multi-layer perceptron consists of an input layer, where we have our input data, which we then feed into a set of hidden layers. There can be basically any number of hidden layers and any number of nodes in each hidden layer. So here we have an example of three hidden layers, where we have four nodes, two nodes, and again four nodes in our hidden layers. And in each of these nodes, our input data will get transformed. The special type of layer we are using here is called a dense layer, where every input gets sent to every node of the next hidden layer. So you can see here, input number one will go to the first node, the second, the third, and the fourth — and so on for every other layer, until we get to the output.

And the way neural networks learn now — the learning of the mathematical representation between input and the known output — is by changing the weights in a certain way, by optimizing these weights, so that the output will be as close as possible to what we expect it to be. And how this works is like this. Here we have a classification example where we have our input data and we have a weight; we don't need to have a bias, but often we do. So the most important things for now are the input, the weight, and the output that we know. Let's take our fruits example that we will use in the coding part later, and say that each image can belong to the class apple, banana, or pear. So we now have three classes. And we know, first of all, that, let's say, image one shows a banana. But as far as the neural network is concerned, it could potentially be a banana, an apple, or a pear. So what will happen now is that the neural network will send the data — the image data — through the network and calculate the output, and the output is basically a score, so a number. This could, for example, be a 2 for class banana, a 1 for class apple, and 0.1 for class pear. Just on its own, this doesn't really help the computer that much, because these numbers can vary in size; they are not standardized. In order to achieve this standardization, we use the softmax activation function — here you have another example of an activation function that transforms the input. What the softmax does is transform the scores into a probability. So let's say, here, the score of 2 will get transformed into 0.7, the score of 1 into 0.2, and the score of 0.1 will stay at 0.1. You can look up the mathematical function of softmax — it's not that complicated, but I don't think it would add that much here. What is important to know is that the probabilities you get out will sum up to one. This means you will always have one number that is higher than the others. And what we tell the neural network — or what we assume the neural network will make of these probabilities — is that the highest number is the predicted class. So in this case the neural network will think: okay, I calculated 0.7 for class one, 0.2 for class two, and 0.1 for class three, so I am assuming this image shows what is class A, in this case a banana. And we know that this is correct.
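Here, too, a tiny hedged sketch in R may help, using the scores and class names from the example above:

```r
# Softmax: turn raw scores into a probability distribution that sums to 1.
softmax <- function(scores) exp(scores) / sum(exp(scores))

scores <- c(banana = 2, apple = 1, pear = 0.1)
round(softmax(scores), 2)
# banana  apple   pear
#   0.66   0.24   0.10  -> sums to 1; the highest value is the predicted class
```

(The exact values differ slightly from the rounded 0.7 / 0.2 / 0.1 on the slide, but the idea is the same.)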
So in the next step we have the probability distribution, but we also have the correct class of this image. The first part is just the same as I showed you before: the probability distribution from the softmax function. And the one-hot encoded vector is what we use to display the so-called truth. We know that our image belongs to class A, showing a banana: we give it a one. And we know that it's not class B and also not class C. This is a case where we have an exclusive classification, so we can only predict one of several classes for our image. And what you can see, if you compare this one-hot encoded vector to the probability distribution, is that it shares a few very important characteristics, and the most important is that it sums up to one — just like the probability distribution. It has to sum up to one; it can't be larger, it can't be smaller.

And now we can use a neat trick. We can subtract them from each other and calculate the distance, or what we call the cross entropy, between the two. Because consider that if, let's say, our neural network made a perfect prediction, it would have a score that gets converted into a probability distribution where the perfect class — the right class — would have a probability of one. And because it has to sum up to one, all the other classes would have to be zero. So in a perfect-case scenario, our neural network would have predicted class A to be one and the other classes to be zero. So we now use the distance. It's very easy to see that in the perfect case the distance would be zero, because one minus one, zero minus zero, and zero minus zero. Of course, in reality we rarely have that perfect case, so if we use our example here and calculate the distance, it will have a certain small distance. What we want to do during model training now is to adjust the weights and the bias in a way that reduces the distance between the probability distribution and the one-hot encoded vector — and not just for one image, but for a set of thousands of images. And in this way, we average the cross entropy over our entire training set, so that we have a loss that is as small as possible.

How this works is via backpropagation, and this is all relatively complex, and I'm going very briefly over this; I don't expect you to understand every word I'm saying if this is completely new to you. I don't expect that you really need to understand backpropagation and gradient descent here 100%. Just keep these words in mind — try to keep in mind things like the softmax function and the one-hot encoded vector, because these are things that you will encounter later on when you are building your Keras model, and then it's good to at least have heard the words and have a bit of an understanding of what they are doing; you can read up on the details once you actually need them. But for now, just keep in mind that backpropagation is going backwards through our neural network. When we start, our data goes in from left to right through all the hidden layers, until we get to the output. We do what I just showed: we calculate the score, have it converted into a prediction probability, and compare it with the one-hot encoded vector. And this distance, the cross entropy error, now has to be backpropagated through the neural network from right to left, so that we can calculate the error part that each neuron has — because each neuron is associated with a weight. We need to know the error landscape, as we call it, for each neuron, so that we can adjust the weights in the correct direction. Adjusting the weights in the correct direction is done via optimization, and the simplest, most often used optimization is called gradient descent.
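Before we get to gradient descent, here is the distance calculation from above as a tiny hedged sketch in R, reusing the softmax values from the previous example:

```r
# Categorical cross entropy between the one-hot "truth" and the prediction.
cross_entropy <- function(truth, probs) -sum(truth * log(probs))

truth <- c(1, 0, 0)           # one-hot vector: the image shows class A (banana)
probs <- c(0.66, 0.24, 0.10)  # predicted probability distribution, sums to 1

cross_entropy(truth, probs)   # ~0.42; a perfect prediction would give 0
```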
So what gradient descent does — the first part you should already be familiar with. We had our neural network, we had our known one-hot encoded vector, compared against the predicted output. And now we have this error landscape, where we backpropagated the error through our neural network. You can think of it as a landscape with valleys and hills, and we have a hiker — the hiker is our neural network — and our neural network is blind, so the hiker gets blindfolded. And we use a helicopter and drop the hiker somewhere in this hills-and-valleys landscape. And the task of this hiker, our neural network, is to find the place with the lowest valley: the global minimum. Because you can think of the landscape as the error landscape, so the lower you go in the landscape, the lower your error. And that's what you want to achieve: you want to find the mathematical representation of your neural network where the error is as low as possible — the global minimum of the error landscape. And the neural network goes about this similarly to the blindfolded hiker. He is blind, and he now wants to know: where do I have to go to reduce the error? So he just has to feel around himself: where is it steepest — where do I have to make a step to reduce the error as much as possible in the next step? Because we only think in steps; we don't go from one point to the global minimum in one step, we use several steps, just as a hiker would — he would not jump there in one leap, he would have to take several steps. So for the first step, he feels around himself and goes to where it's steepest. And then we start the whole process again. You can think of the next step as our updated weights: we go through the entire process again, we calculate the output, compare it with the known output, have an error, and again look around: where is the error steepest? And this is repeated, ideally until we find a place where we cannot go any deeper, and we assume that we have now found the optimal place in our error landscape — the optimal combination of weights to achieve the lowest possible error. Of course, in reality it gets a bit more complicated, because there are things like saddle points and local minima, where you might think you cannot go any deeper, but that's just because you are in a local valley, and you would have to go over another hill to get even deeper again. There are ways you can overcome this problem, and we will talk about a few of them later, but for now just remember that you are trying to find the lowest error, and you adjust your weights in a way that reduces this error.
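As a toy illustration of this idea — again not from the workshop code — here is gradient descent on a one-dimensional error landscape:

```r
# Gradient descent on the toy error landscape f(w) = (w - 3)^2,
# whose gradient is f'(w) = 2 * (w - 3) and whose minimum is at w = 3.
w  <- -5      # where the helicopter drops our blindfolded hiker
lr <- 0.1     # learning rate: how big each downhill step is

for (step in 1:50) {
  gradient <- 2 * (w - 3)  # feel around: in which direction is it steepest?
  w <- w - lr * gradient   # take a step that reduces the error
}
w  # ~3: the hiker has found the lowest point of this landscape
```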
So this was all really, really brief, because we want to talk about the interesting things, which is hands-on deep learning with Keras and TensorFlow. You can do a lot of cool stuff with neural networks and Keras and TensorFlow: you can do object recognition in images and videos, you can create classification tasks for images or text, you can even generate text — all kinds of cool stuff with deep neural networks. And Keras is very suitable for a lot of these things, because Keras is made to enable fast prototyping of neural networks. This means it's a high-level API. It was originally developed for Python. It also works with R, because R uses a package called reticulate to basically create a conda environment on your computer and run Python in the background. So you actually run the Python Keras in the background, but you don't notice it, because you have the Keras R package that makes use of this background conda environment. What's also interesting about Keras is that it works on top of different back ends, so you don't have to use TensorFlow as a back end. But usually people do, because Keras is basically designed to work with TensorFlow, and TensorFlow even recognized this in a way, in that it includes Keras as part of its core as of TensorFlow 2.0 and above. So if you install TensorFlow, you automatically have Keras installed, and it's automatically designed to use Keras with TensorFlow. The big advantage of Keras over bare TensorFlow is that it's more high level: it allows you to define neural networks in a more abstract way, so you're much faster in building neural networks, changing certain things, and going from idea to finished model. Yeah, it enables fast and easy prototyping. Another interesting thing about Keras is that it's highly modular, and this makes it very flexible; what this actually means you will see later, when we talk about layers. And still, Keras is really powerful. You can build a lot of more advanced networks with Keras: you can do convnets, you can do RNNs, you can even do LSTMs, you can do multiple-input, multiple-output models. You can do a lot of really cool stuff with Keras — just because it's easy and fast doesn't mean it's less powerful. And of course, if you want to train more complex or bigger models, you can also use TensorFlow and Keras to run on a GPU.

TensorFlow itself was originally developed by Google Brain. It's also open source software for deep learning, and it's called TensorFlow because it's based on tensors and on flow graphs. Tensors are the most important thing for neural networks if you work with Keras and TensorFlow, and we call basically anything that is a multi-dimensional array a tensor. If you ask a mathematician, he will probably cry in his sleep, because mathematicians, I've been told, have a stricter definition of what a tensor is; but for us machine learning people and data scientists, it's enough to know that a tensor is a multi-dimensional array. So it can be one-dimensional like a vector, two-dimensional like a matrix, or three-dimensional like an RGB image. The really nice thing about tensors is that they can be processed in parallel: we can do matrix multiplications with tensors and process them in parallel, so we have the option of scaling the calculations very well. So here are examples of multi-dimensional arrays. And the graph part of TensorFlow — the flow graphs — just means that we have a computational graph to represent our mathematical operations. We have certain nodes that mean certain operations — how the data gets transformed; the mathematical operation can be anything, usually a matrix multiplication. And then we have input: a tensor, any multi-dimensional array, can be an input. We do something with it, some mathematical magic, and we get an output, another tensor. And this is how the information flows through our graph.
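To make "a tensor is just a multi-dimensional array" concrete in R — a hedged aside, not part of the workshop script:

```r
# Tensors of different ranks, expressed as plain R objects.
v   <- 1:10                          # 1-D tensor: a vector
m   <- matrix(1:6, nrow = 2)         # 2-D tensor: a matrix
img <- array(0, dim = c(20, 20, 3))  # 3-D tensor: e.g. a 20x20 RGB image

dim(img)  # 20 20 3 -- exactly the kind of shape Keras will expect later
```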
If you want to build models with Keras, the first decision you will have to make is whether to use the sequential model or the functional API. These are the two basic modes of Keras for defining a neural network. The sequential models are the simplest way to define a neural network. They are suitable for most cases, but they require that your neural network consists of a linear order of layers. So you are restricted to one input and one output, and the information flow has to go in a linear fashion from input to output through all the layers. If that's not complex enough, you can also use the functional API in Keras. This allows you to build more complex models, and it's usually used if you have multiple inputs or outputs. Let's say you have a model where you have images, but you also have the caption of each image; so you have image input and you have text input. Then you would have to use the functional API to build a model with multiple inputs.

The basic thing you will always encounter in Keras is dealing with layers. You will have an input layer, you will have an output layer, and you will encounter lots and lots of different layers when you work with it. Some of them are dropout layers; you have noise layers, pooling layers, convolutional layers, normalization layers, embedding layers, LSTM layers — all kinds of different layers. And you can combine them with functions like activation functions for each layer. You have regularization techniques, optimization functions. And of course you can save models, layers, and weights to use them later to predict things. So this is how you can think of the pyramid when you build a Keras model. You start with your API — let's say you start with a sequential model. Then you define the model by adding layers; this will become clearer, I think, when we go to the coding section in a minute, and is just so that you already have an idea of what's going to happen. Once you have defined your layers, you add functions: you have to define the loss function, you have activation functions, optimization functions, and so on. And finally you have to define the metrics that you use to measure performance, like accuracy, mean squared error, whatever you want.

So, the practical part: convolutional neural networks. And I promise you, you will get to coding in a minute. Just a few more words about convnets before we actually build one. Convnets, or convolutional neural networks, or CNNs, are typically used to classify images, and an image in this case is classified as belonging to just one class — "this is a cat", say. This is what we will be doing with our fruits images: we will have a set of images that can each belong to one of n classes. So let's say this image belongs to the class tree. A similar task that you can also do is called object detection. This would be the case where you want to detect objects within images, but this is not what we are going to be doing today; we will just be doing whole-image classification. And our data in this case, the input image, looks like this. The simplest case is black-and-white images, and black-and-white images, in terms of how we see them as data, consist of a two-dimensional matrix where each position in the matrix is a pixel value. A pixel value, in the case of a black-and-white image, can range from zero to 255 — I mix this up very often, but I think zero is black and 255 is white. So each position has the information of basically how light or how dark this pixel is. And if you look at the entire image, this is what you would see.
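As a small hedged sketch of how a computer "sees" such an image in R — the file path here is hypothetical; any image from the fruits data set would do:

```r
library(keras)

# Load one image as a grayscale matrix of pixel values (path is made up).
img <- image_load("fruits-360/Training/Banana/0_100.jpg",
                  grayscale = TRUE, target_size = c(20, 20))
x <- image_to_array(img)

dim(x)    # 20 20 1: one matrix of pixel values
range(x)  # between 0 (black) and 255 (white)
```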
Colorful images are not that different data-wise. They just don't consist of one two-dimensional matrix, but typically of three, and each two-dimensional matrix contains the pixel values for one of the color channels. In this case we have R, G, and B: red, green, and blue. We have the same pixel values from zero to 255 in each of these color channels, and combined they make up the image that we see as, let's say, the image of a banana. A convnet will now take this image data and learn a representation of the data by going from abstract forms like edges and lines, through more complex forms like shapes or more complex patterns, to specific patterns — building up basically a pipeline that learns different abstractions of the original image, to end up at an output and perform a classification task like the one I described before.

So here's a bit more about CNNs, and all the things that are highlighted here are words that you should keep in mind, because you will encounter them in Keras in a minute. The first one is the sliding window. You have your image here with all your pixel information. And when you define your Keras model, you will have to tell it how big the sliding window should be. The sliding window is often three by three pixels, or sometimes five by five pixels. What this means is that we have a window of this size, let's say three by three, and we start sliding the window across our image, typically from top left to bottom right. So we go across once, then we go one row lower and go across again. In the end, we will have covered each of our pixels several times with the sliding window. And what happens in the sliding window is that each time it gets applied — it will be applied once, then it moves one over, and we apply it again — we use a filter to calculate an output from the pixels in that sliding window. There can be any number of filters, but typically we have filters that would recognize horizontal or vertical lines, or other patterns. It's a bit abstract to understand now, but I have a few examples in a minute. So what these filters do is detect shapes and patterns. By sliding our window across our image and applying the filter at every position of the sliding window, we transform the image to make certain shapes appear more strongly, for example. Filters are also something you will have to define in Keras — not which filters, but how many filters. And I will talk about this later in more detail.

For now, something else that you will have to tell Keras is the padding. The padding determines how you want to treat the fake border pixels. Because if you think about your sliding window and how you would slide it across your image, it's clear that if you just slide it across the image as it is, your border pixels will be shown to the filters fewer times — let's say only once, in the case of the outer corner pixels. But what you want — usually, not always, but usually — is for all your pixels to be shown to the filters the same number of times. In order to achieve that, we use fake pixels around the edges of the image, so that our sliding window basically starts out there and not inside our image.
Another thing you will have to tell Keras is the so-called stride, and stride means how much overlap you want to have between sliding windows. Typically you slide your window one pixel to the right or to the bottom, but you could also increase the stride and say the sliding window should move two or three pixels to the right or bottom. Now, once we have all this and we apply our filters, what we get is a so-called feature map. This will also become clearer when you actually code with Keras, because then you can see in the output what it means. What we end up with is a stack of feature maps, where each feature map is the output of one filter. And I will show you what this means, to make it clearer, in the Keras setup.

Pooling layers are something that has been used for a long time — it's not used as much now. But if you use pooling layers, this just means you reduce the compute time by boiling the information down, so reducing the image. Let's say you take another sliding window, but this time, for each sliding window, you choose the maximum value. This is called max pooling. So you would reduce your image: here you have three by three, and you would just keep the maximum value, and then go to the next pixels and repeat the same. Convolution and pooling are the traditional, common setup of a CNN, and usually they are combined alternately: you have one or two convolutional layers, typically, then you have a pooling layer, and you repeat, until you end up with a fully connected dense layer and feed it into the output. I guess this is still abstract if you have never seen Keras and have never worked with it, but I think it will become clearer once you actually see the code.

Just before we start, the summary: why are CNNs better to use for image classification than, let's say, the multi-layer perceptron? The really, really big thing about CNNs is that the pixels are now considered as a group of connected information. Before, because we had our dense layers in the MLP, every input would get sent to every node, so all the information is basically independent of each other. That is computationally pretty fast. But if you have something like images, or even text, it is important to consider each pixel in the context of other pixels — or each word in the context of other words, if you have a text model. So it is important to use these sliding windows, because now we don't have to look at one pixel at a time; we can actually look at pixels in combination with the surrounding, neighboring pixels. We now analyze chunks. And the big difference, learning-wise, for our neural network is that before, we learned the weights, and now we actually learn the filters. And what this means will become clearer in a minute, I think.
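Before we switch over to the code, here is the max pooling idea as a toy illustration in base R — not from the workshop material:

```r
# 2x2 max pooling on a 4x4 "image": keep only the largest value per window.
m <- matrix(1:16, nrow = 4, byrow = TRUE)

pooled <- matrix(NA, 2, 2)
for (i in 1:2) {
  for (j in 1:2) {
    rows <- (2 * i - 1):(2 * i)        # the 2x2 window for this output cell
    cols <- (2 * j - 1):(2 * j)
    pooled[i, j] <- max(m[rows, cols]) # the strongest signal in the window
  }
}
pooled  # the 4x4 input is boiled down to 2x2
```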
So this is it for the theoretical part — very, very fast. You will be able to understand it better, I think, once you see the code. For the code, we will go to RStudio, and I have to share my screen again. Let's see. Is it big enough, or should I increase the font size? It's okay. Okay, so this is the link to the repository where you find the slides and all the code. If you want, you can clone this repository now. I will have to set it to public still, I think — I'm not sure if I have done this yet, but I will do it right now. So you will be able to check this repository out; I will not leave it public for that long. I'm not sure how long I will leave it — probably I will make it private again after the tutorial, or maybe I will leave it, I'm not sure — but you will be able to download everything right now, if I find this here... the project is public now. So if you want to start coding along, you are free to do so right now. If you just want to listen and look at everything later, that's also fine. My screen sharing had stopped; now you can see everything again. Great. Let me find the chat again so that I can see your questions.

All right, so I will be working in the general setup of RStudio. I'm not sure everybody is going to use the same, but this is how I will be working with Keras and R here today: I'm using an RStudio project. And this here is actually the repository that you will be able to clone from the GitLab link: gitlab.com/ShirinG/keras_tutorial_user2020. In this repository, here on the right, are all the files. You should see most of these files — I have a few of them in .gitignore — but the important one that you can have a look at later is the Keras cheat sheet: everything I just mentioned very briefly, and even more, is on the Keras cheat sheet. And in the folder keras_workshop_material here, you find the code. The setup code is in 00_setup.Rmd; you can have a look at this again if the setup installation didn't work for you, go through it again, and see if that changes something.

All right. So, we start with the practical part. As you can see, I have added a lot of text to this R Markdown file, so you can go over it in detail later after the tutorial if you didn't catch everything I've been saying here. I know it's fast — I'm giving you lots and lots of information in a very short time, and it's basically impossible to remember everything. I hope I have written down most of it, at least everything that you need to know to understand Keras. I will not be reading this out now, of course; it's just for you to read up on again later. So we will start by loading the library keras. And I'm also loading the tidyverse, because I enjoy working with the tidyverse and I think it's the best way to do data analysis in a tidy way. If you have not been working with the tidyverse, I highly suggest checking it out, but you don't have to in order to work with Keras — you can also use base R.

The first thing we need to do is load our images, and that's the data I told you to download from Kaggle, the fruits data set. You can save it basically anywhere; you just have to give the path where your images are stored. In my case, the images are stored under Documents/GitHub/data/fruits-360. And if you open this folder, you will see that there are two folders: one is called Training and one is called Test. The Training folder contains most of the images and is designed for using the images for training. The Test folder contains fewer images, but they are saved in the same way. So let's look at them. I will just define the file path here now, and if I list all the files in, say, the training images, you see all the folders that you have here — lots and lots and lots of fruits. You can see all the same folders in the Test part of the fruits-360 data set, just with fewer images per folder.
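A hedged sketch of this setup step — the library calls are as described, but the object names and paths are my reconstruction and will differ on your machine:

```r
library(keras)
library(tidyverse)

# Paths to the two folders of the Kaggle fruits-360 data set (adjust to yours).
train_image_files_path <- "~/Documents/GitHub/data/fruits-360/Training"
valid_image_files_path <- "~/Documents/GitHub/data/fruits-360/Test"

list.files(train_image_files_path)  # one sub-folder per fruit class
```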
So this is not yet Keras, but the next thing we are doing is preparing everything for reading the images into Keras. And the way we will be doing this here is to use something called the image data generator. You also have other options — you can go to the Keras help page for more information; I think it's keras.rstudio.com, or just Google RStudio and Keras and you will come to the right page. There you also have examples where they work with images. I think the basic example is still the digits one, MNIST. And what they are doing there is they read every image into R, convert it into a multi-dimensional array, and create the data set this way. So you don't have to read images in with the image data generator. But typically, if you are working with images, it's the much better way to use the image data generator, because it allows you to have a big set of images without having to read all the images in at once and keep them in RAM. What the image data generator does is know where on your computer these images are stored, and during training it will read in the images only when it needs them. So it is much more efficient to use the image data generator if you have a big set of images.

Yeah — I'm sure. Can you increase the font of your screen? Yes, the participants are saying that the font is very small. Is that big enough, or should I increase it even more? Is it okay now? I'll scan the comments in the chat to see if it's okay now. Okay, so should I increase it a little bit? Let's make it a bit bigger here. Okay, it's better now. Okay, I'm good to go. All right.

So let's actually start reading in our images — or maybe I shouldn't even say reading in, because we are not actually reading everything in at once; we are preparing our function for reading batches of images in during training. We are using the image data generator to do that. And in order to have a model that trains a little bit faster, and to reduce the complexity a little bit, I don't want to classify each of these folders — each folder in our data is one type of fruit, and it's lots and lots of folders. I don't even know how many, but at least too many for showing you the principles of Keras. I just want to show you the principle on — I think it's 16 folders, and I picked more or less randomly 16 folders from all the folders that we have. If you play around with this, you can change them out and see if this changes your model; you can have more or fewer, however you want. But in this case I chose 16 fruits that I want to classify with my Keras model. So I write them into a vector called fruit_list. And I also define an object with the number of classes I have now, so this should be 16. Let me run this code. So output_n is 16. You will see why I use this later: I want to make sure I have the correct number for the output layer in my Keras model later, and if I change something here, it will automatically change the output layer in Keras.

Another thing I want to do to reduce the calculation time and the complexity is to scale down the images. The original images are already relatively small — they are 100 by 100 pixels — but because they are pretty uniform, the model trains well even if we reduce the image size, in this case to 20 by 20 pixels. So the target size for our images is now 20 by 20: image width by image height.
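A hedged reconstruction of these objects — the exact 16 fruits I picked may differ; any sub-folders of the training directory work:

```r
# 16 fruit classes, picked more or less randomly from the fruits-360 folders.
fruit_list <- c("Kiwi", "Banana", "Apricot", "Avocado", "Cocos", "Clementine",
                "Mandarine", "Orange", "Limes", "Lemon", "Peach", "Plum",
                "Raspberry", "Strawberry", "Pineapple", "Pomegranate")

output_n <- length(fruit_list)  # 16: drives the size of the output layer later

# Scale the 100x100 originals down to 20x20 to speed up training.
img_width  <- 20
img_height <- 20
target_size <- c(img_width, img_height)
channels    <- 3  # colour images: red, green, blue
```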
We also give the information that we have color images, so we have three channels, and the default for Keras is RGB. This is important to note if you work with other images — with your own images, for example — because some programs work with a different order of channels. The default is red, green, blue, but some programs work with, I think, blue, green, red; I don't really know which order they use, but you have to be careful that you don't mix them up. This is just some housekeeping: we have defined a few convenience objects here so that we don't have to copy-paste numbers later on. And I'm also defining the batch size right now. The batch size relates to the fact that we will be reading in our images, as I told you, once they are needed. The image data generator we are creating now will be fed into the Keras training function, and what it will do is load batches of images. So let's say we have a batch size of 32. What will happen is that 32 images will be chosen and sent through the neural network — basically what I told you in the theoretical part: they go through the neural network, the scores are calculated, the activation function is applied and compared to the one-hot encoded vector, we have our error, we backpropagate the error, we apply our optimization function, and we go to the next step by adjusting our weights and going through the entire process again. A batch size of 32 means that all of our images will be looked at, but in batches of 32 images.

About the image_data_generator function — just a note on the side: all these functions have basically the same names as the original Python version of Keras. So if you ever want to look at examples of other Keras models or networks — because you have a problem where you don't know how to start, you can always look at how other people built a Keras model — you will often find more Python versions than R ones, but it doesn't really matter, because the function names are very, very similar. I think they have some naming conventions with camel case instead of the underscore version, but you can generally recognize which function is the equivalent in R.

So the image data generator is something we define first, and the image data generator, as you can see, allows us to adjust our images. The thing that we are doing here is rescaling all our images by one divided by 255. And the reason I'm doing this is that the original pixel values of our images lie between zero and 255 — just as I explained before, these are the standard pixel values. But for neural networks, it's often advantageous if the values in our tensors lie between zero and one. This just has to do with matrix multiplication and the fact that, the way our neural network learns, it's easier if the numbers are in a smaller range; it's more efficient for the network to adjust the weights and to optimize if these values are between zero and one and they are standardized. So we achieve this by dividing every pixel value by 255, and our values will fall between zero and one. The things I've commented out here are options for data augmentation. I'm not using them here; I've just left them in as an example for you, if you have a model with your own images, for example, and you don't have that many images. So let's say you only have a couple hundred images per class, and that is not enough to train a good model.
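A hedged sketch of this generator definition, with the augmentation options (explained next) left commented out, as in the workshop script:

```r
# Training data generator: rescale pixel values from [0, 255] into [0, 1].
train_data_gen <- image_data_generator(
  rescale = 1 / 255
  # ,rotation_range = 40       # augmentation: rotate images by up to 40 degrees
  # ,width_shift_range = 0.2   # augmentation: shift images horizontally
  # ,height_shift_range = 0.2  # augmentation: shift images vertically
)
```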
What sometimes helps is to augment the data, and data augmentation simply means that you take your images and use the image data generator just as before — every time the training runs, it will pick batches of images, let's say a batch of 32 images. But if you use data augmentation, it will not take the images as they are; it will change the pixels in a certain way. And this is what you define here: a rotation range of 40 means that it might rotate an image by up to 40 degrees; a width shift or height shift range means it might adjust the height or the width of the images, or do some other interesting things with them — basically just change the images a little bit. What comes into play with data augmentation is that we don't only look once at each image, as we do in the normal way — as in our example here, where each image passes through training once per epoch. If we use data augmentation, we want every image to pass through training several times per epoch, and every time it passes through training, it will be changed in a slightly different way. So we artificially increase our sample size, basically.

This we define for our training data. But training alone is usually not a good way to measure the performance of our model. Because if we train our model on a set of images, we have it learn just this set of images, so we have it learn a task on a reduced sub-sample of what we could call the real world. And if we then apply this model to other images — let's say you want to predict fruits in other images — it will probably be really, really bad, even though it was really, really good on our training data. The reason is that our training data was so specific: the model basically learned our training data by heart, optimized on this training data alone, and then wasn't able to generalize to other images anymore. One thing we can do to improve this a little bit is to use validation data during training. This means that during training, our model performance will not be measured on the training data but on a validation set: a separate set of images that is not used to adjust the weights, but only used to measure the performance — left out so that we get a more independent idea of how well our model is learning. Of course, if we are using something like the fruits data set, we have to keep in mind that the validation data is basically the same as the training data. It's not the exact same images, but the images look practically the same: they all show fruits on a white background, with the fruit nicely centered. So it's not something that you can expect to really enable our model to generalize beyond this very strict framework of images, but it's still better than just using the training data. So we define another image data generator, just for the validation data this time. The validation data, if you remember, came from the other folder in the fruits-360 data set, called the Test folder. And something else you should note here: data augmentation is always only done for the training data, never for the validation data — or, let's say, not never, but typically it shouldn't be done.
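The validation generator, correspondingly, only rescales — again a hedged sketch:

```r
# Validation data generator: same rescaling, but no augmentation.
valid_data_gen <- image_data_generator(
  rescale = 1 / 255
)
```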
All right, and now we come to the heart of the matter: we load our images. For this we use the flow_images_from_directory() function. This takes the image data generator we just defined — sorry, this one here. We give it the file path where it has to look for the images. We give the target size — the target size is what I defined up here, where I wanted to reduce the size of the images. I also tell it the class mode, in this case categorical, because we have a classification task with different categories of images. And then, optionally, you can give it a vector of classes, if you don't want to have each of the fruits classified, just a subset. If you remember, this subset was the vector of folder names that I defined before. If you didn't include this line of code, what would happen is that every folder in the training image file path — so all of the fruit folders here — would be taken as one class each. Per default, flow_images_from_directory considers the file structure in a certain way, and that is this folder structure here: in the training folder, it expects a certain number of sub-folders, and each sub-folder is a class label to predict. So it expects the images to already be sorted into their classes; each class has to have its own folder, and the folder name is what becomes the class label in this case. Here, because I don't want to use all of them, I just define a subset. I also give the batch size — how many images should be processed per batch — and the seed. The seed is not necessary for training a model; it's just for replication purposes, to have the pseudo-random number generator start from the same point, so that everything ends up the same if you repeat the process. And I'm doing the analogous thing for the validation images, just using the validation file path and the image data generator for the validation data.
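A hedged reconstruction of these two loading calls — the argument names are as in the R keras package; the generator, path, and list objects are the ones defined above:

```r
train_image_array_gen <- flow_images_from_directory(
  train_image_files_path,
  train_data_gen,
  target_size = target_size,    # resize every image to 20x20 on the fly
  class_mode  = "categorical",  # exclusive multi-class classification
  classes     = fruit_list,     # optional: only these 16 sub-folders/classes
  batch_size  = 32,
  seed        = 42              # only for reproducibility
)

valid_image_array_gen <- flow_images_from_directory(
  valid_image_files_path,
  valid_data_gen,
  target_size = target_size,
  class_mode  = "categorical",
  classes     = fruit_list,
  batch_size  = 32,
  seed        = 42
)
```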
All right, now you can double-check that everything works as expected, because it will tell you how many images were found and in how many folders — in this case, 7709 images belonging to 16 classes, and for the validation data, 2592 images, also in 16 classes. Let's look at how many images we have per class. So here we have the table, which you can see here. And the first thing you probably notice — if you go back up to your files, where I listed all the files in the training data — is that they had nice human-readable names, right? You had papaya, you had pepper, plum, whatever; and even the fruit list I defined here also had the nice human-readable names. I know what a banana is, I know what a strawberry is, and so on. But all of a sudden, all these names are gone. What we have here now are numbers from zero to 15. And this is something Keras always does in the background: it converts the class labels into indices. And it starts with zero, because Python is zero-indexed, while R is one-indexed. A very common problem you encounter is that Python always starts counting at zero and R starts counting at one. If you work with Keras, be sure to keep that in mind: it's a Python package, and Python starts counting at zero. So here we now have all these indices. But in order to make them human-readable again, we have to be able to map them back to the original human-readable labels. And this is actually stored in the train image array generator I defined above, in something called class_indices. This is what you can see here: the class label versus index mapping. And you can note that the order is actually the order I chose in the fruit_list vector, the subset of classes from before. So it starts with Kiwi, Banana, and so on, and ends with Pomegranate. And this is the same order that Keras kept when it converted the labels to indices: Kiwi has the index zero, and Pomegranate has the index 15. This is just something we keep for the predictions later on, so that we know what's what. All right. So I saved this as a separate object to have it for later — this is just what I showed you before, the index mapping. Another thing I want to have as a separate object to use later on is the number of training images and the number of validation images. I already know that there should be 7000-something and 2000-something — just to double-check. Yes, that's correct.

So, are there any questions up to this point? Let me have a quick look. — I think the question has been resolved. So if you have any questions, just share them on GitHub and we'll take it up from there. Thank you. — I did not really understand you just now. — Yeah, I said the question that was asked has been resolved. — I'm having a bit of connection issues, so I don't really understand you. But you can type the questions in the chat here; I have the chat open. So: how is a picture labeled as a pepper? This is by having this folder and sub-folder structure. As I showed you before, we have this folder called Training — this is the Training folder, list.files — and this path gives you a number of sub-folders. And Keras, if you use the flow_images_from_directory function, assumes that the file path you give it contains a number of sub-directories, and each sub-directory contains images. So each image, let's say, in the folder Chestnut is expected to show a chestnut. You have to make sure, if you use your own data, that the images are labeled correctly and sorted into the correct folders, because Keras will take Chestnut as a class label and say: every image in this folder is a chestnut, and this is what I will learn. I hope that answered the question.

How did I decide what image size to scale down to? Basically just trial and error. I tried before with 100 by 100 pixels and the model was super, super good. I thought: right, these images are very uniform, the task is pretty simple, let's try to scale them down pretty drastically. If you have more complex images, you probably won't get away with scaling them down this much. But here it works. How to learn to choose the optimal parameters? The best way is to use hyperparameter tuning. You would typically start by looking at similar examples that you find on the internet — let's say I want to train a CNN, a convolutional neural network: what did people do with similar networks? And then I would go to the hyperparameters and tune them. I can post a link to a hyperparameter tuning site after this, where you find some information about how to do that in Keras. Yeah, and we will talk about convolutions and define them a bit more in a minute; this is just what we did so far. I defined the number of epochs here to be 10. This is also something you can play around with — let's talk about this more if we get to callbacks. Just to keep in mind: you could use a much higher number and use a callback for early stopping. But in this case, I just played around enough to know that 10 is a suitable number. So, epochs is defined here in line 226, for the person who asked.
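For reference, a hedged sketch of the bookkeeping objects from this part — the `$class_indices` and `$n` fields are exposed by the generator objects; the object names are my reconstruction:

```r
# Keep the label/index mapping for making predictions human-readable later.
classes_indices <- train_image_array_gen$class_indices  # e.g. Kiwi = 0, ...

# Number of training and validation samples, for later use.
train_samples <- train_image_array_gen$n  # should be 7709
valid_samples <- valid_image_array_gen$n  # should be 2592

batch_size <- 32
epochs     <- 10  # a suitable number, found by playing around
```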
I have a little excursion about how a computer sees images, but I will skip it, because time is already advanced. It just shows you how Keras — or the computer — looks at the images; you actually see the pixel values there. You can look at that later if you want. So we are going to the interesting part: we actually get to Keras here. Because we are using a simple model, I will be working with the sequential model. And what you do first, before you define your model, is initialize the model by calling the keras_model_sequential() function, and you can give it a name. A few things to know beforehand: if you have worked with the tidyverse before, you will be familiar with the pipe operator. This weird sign, percent-greater-than-percent (%>%), is the pipe operator in R. It's very prominently used in the tidyverse, where you have some input, let's say some data; you pipe it to another function, and then you pipe it again and again and again, so that you have a workflow of transformations of your data until you get to an output. What the tidyverse does with the pipe operator is that it never modifies the original object. So let's say you start with data, use the pipe operator, and apply functions: the output will be printed, but the original data will not be modified. This is different in Keras. In Keras, these objects get modified in place. This is really, really, really important to know, because once you do something with the pipe operator here, it will directly change the object.

So I am first initializing my model, and this is what I've done here. And now I'm building a relatively simple — yeah, I will talk about what these parameters are in a minute — a very simple convolutional neural network. And I do this by adding layers. I already told you: the basics of Keras is working with layers. And the layer we are using now, for two-dimensional convolutions, is called layer_conv_2d(). Here you will encounter all these hyperparameters again that I briefly mentioned in the theoretical part on my slides, and I will explain them here a bit more, because I think they become clearer now. So let's make this a bit easier to see. Our first convolutional layer — and convolution 2D just means that we have a two-dimensional sliding window that we want to use. All right. The first hyperparameter we define is the filters. I told you before, when I explained how a multi-layer perceptron learns, that it learns the weights: it optimizes the weights in a certain way to reduce the error. And a weight is basically just a number — you can think of it as any number — and we change this number in a certain way to reduce the error. What the CNN does is not learning weights but learning filters, and a filter — in this case a filter with kernel size three by three — is just a two-dimensional matrix with three rows and three columns. And what happens now — let me go to Chrome; I think I can show you very quickly, if I find it: image filters. I will share my screen in a minute once I find the correct site. Right, there it is. Here you go. So this is a really neat site — I will copy the link into the R Markdown for you to look at — that explains how filters work. Here you see a three-by-three filter: just a two-dimensional matrix with numbers in it. And you have different filters — these here are predefined filters, let's say. Yeah, sharpen is fine.
And what happens now is that this filter, this is our three by three sliding window, will go across the image from left to right and from top to bottom. And every time we apply our filter, we multiply the positions; you can see this here. If I move my cursor it will not show anymore, but you can see it in the middle: you have the pixel values, 255 and so on. Zero is black and 255 is white. We multiply each position in this three by three window with the corresponding position of our filter, and then we add it all up, so this is just a simple dot product. We add this up and we end up with 44. So this is the output of this filter at this specific position in the image. On the right you see the output of this filter for the whole image: it sharpened all the edges. So this is what a filter does: it applies a dot product calculation for every sliding window position and in this way modifies the image. Okay, let's go back to our code. We define 32 here, and what does this mean? It basically means that we want our neural network to learn 32 filters in our first convolutional layer. It will optimize each of the numbers in each filter to find the optimal way to modify the image at each step, so that it can predict the output as well as possible. And when I show you the model summary in a minute, you will see where the 32 ends up; just keep that in mind for now. Padding: same padding was this fake border pixel thing. Because we want all of our pixels to be seen the same number of times by our sliding window, we create a border of fake pixels. By saying padding equals "same", we say: take the pixel values of the border pixels and replicate them to the outside. We could also use zero padding, which would just create a border of zeros around the image. Something else you have to keep in mind with Keras is that with the sequential API the first hidden layer, in this case our convolutional layer, has to get the input shape. Maybe this is a bit unintuitive if you are used to working with R, where usually, when you run a function, it gets evaluated right away. This is not the case in Keras. What we do here is, let's say we have a whiteboard and we draw on the whiteboard how we want our neural network to look. We are basically an architect: we think, maybe I have one convolutional layer with 32 filters and then another, and I just write this down. Nothing gets calculated here, nothing; it's just writing down how I want the architecture to look. And the input shape tells the neural network later what to expect in terms of image width, image height and number of channels. So this is the image dimension it expects: 20 by 20 by 3. If you try to put in images that are, let's say, 100 by 100 by 3, it will throw an error. So you have to give the input shape. Then, after the first convolution, I added an activation layer, this time rectified linear units, and then I add another convolutional layer; by the way, everything here gets combined with the pipe operator. In our second hidden layer we want to have 16 filters, another three by three kernel, and padding. After that, and this is basically just playing around to show you a few of the different options you have with Keras, you could also simply use activation ReLU again here.
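To see the dot product itself, here is a toy calculation in base R with made-up pixel values and the classic sharpen kernel; the numbers are purely illustrative:

```r
# One filter application = elementwise multiply + sum over a 3x3 window.
patch <- matrix(c(255,  10,  96,
                  120,  45, 200,
                    0, 255,  30), nrow = 3, byrow = TRUE)  # made-up pixels

sharpen <- matrix(c( 0, -1,  0,
                    -1,  5, -1,
                     0, -1,  0), nrow = 3, byrow = TRUE)   # sharpen kernel

sum(patch * sharpen)  # the filter's single output value at this position
```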
And just to show you, there's also something called leaky ReLU. I'm not going to explain all these advanced things here, but if you're interested you can look them up and see what they are. Then we have a pooling layer, here with a pooling size of two by two; what this does you will see in a minute. Then a dropout layer: dropout just means it will randomly set, in this case, 25% of all our nodes to zero, to improve generalization. After that we flatten everything and go into a dense layer. This is something we need in order to convert the high dimensionality of the convolutional filters, and the feature maps we get from them, to go into our output layer. And our output layer has to have the number of classes that can be predicted, in this case 16. That's why I defined this output_n object before: we need 16 nodes in our output layer, one node for every possible class. And here you see I'm adding the softmax activation function. This, as I showed you in the slides, is the way to transform the scores that are calculated into a probability distribution. And finally I'm compiling my model by adding the loss I want to optimize. In this case, because we have a classification task, we are using categorical cross entropy. There are other losses of course; you can even write your own custom loss functions. And there are lots of optimizers you can use; in this case I'm using RMSprop, but you could also use stochastic gradient descent. If you want to know all the options you have, you can start typing ?optimizer and RStudio will automatically suggest all the optimizers; optimizer_sgd is stochastic gradient descent. Alright, so I'm defining this model. And let's look at the model to explain a bit more what all these filters actually do. The summary of our model shows you the layers we have here, and it shows you the output shape. The output shape is interesting to get an understanding of what happens with all these filters. The first dimension here is just for the batch; that doesn't matter right now. We have 20 pixels width, 20 pixels height, and in our first convolutional layer we said we wanted to learn 32 filters, so here we have the 32. This means each of the 32 filters will be a different three by three matrix with different numbers in it, so a different filter that is applied. And we end up with a stack of 32 matrices, 20 by 20, and each of these 32 matrices is the output of one of these filters. So this is now our dimensionality. Then these get combined back, and in our second convolutional layer we wanted to learn 16 filters, so now here we have the 16, a feature stack of 16, and so on. Until we come to max pooling: in our convolutions we always changed the third dimension, right, but with max pooling you see the third dimension remains the same while the first and second dimensions, so the pixel width and height of our feature maps, get reduced. This is because max pooling, in this case with a two by two sliding window, only keeps the highest value of each of these two by two windows. So we basically reduce the image size by half, and this you can see. Then we flatten all this out, so 10 by 10 by 16 ends up being 1600: here we had a stack of 16 feature maps, each of size 10 by 10. Flattening them out just means we break the context; up to here we still have the context of each pixel.
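Putting the pieces together, a sketch of an architecture along these lines, close to but not necessarily identical with the workshop script (the leaky ReLU alpha, for instance, is a guess):

```r
library(keras)

output_n <- 16  # one output node per class

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                input_shape = c(20, 20, 3)) %>%
  layer_activation("relu") %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), padding = "same") %>%
  layer_activation_leaky_relu(alpha = 0.5) %>%   # alpha is illustrative
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%  # halves width and height
  layer_dropout(rate = 0.25) %>%                 # 25% of nodes set to zero
  layer_flatten() %>%                            # 10 x 10 x 16 -> 1600
  layer_dense(units = output_n) %>%
  layer_activation("softmax")                    # scores -> probabilities

model %>% compile(
  loss      = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics   = "accuracy"
)

summary(model)
```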
We know the neighbors of each pixel and the corresponding values in all of the 16 stacks, but if we flatten everything out, we basically make the values independent; we consider them now all independent. We flatten them out in order to be able to go to our 16 output nodes. You also always get the number of parameters being calculated here; this is just interesting to compare and see how big your model gets. Right. And now we fit. Until now, Keras did no computation whatsoever, nothing; it was all just on the whiteboard. But when we start fit_generator, this is when training actually happens. The general function is called fit, and in future versions of the R package it will even throw you a warning if you use fit_generator; with the older versions you had to use fit_generator if you were using a data generator. This is what I'm using here, but in newer versions the fit function alone will automatically be able to handle generators if it encounters one. Here I kept it the old way, just to be clear. So when we run fit_generator, we give it the training data, and this is the image data generator with flow_images_from_directory that we defined before. This tells the generator where to look for the images, how many images to process in a batch, what classes to consider, and so on. Then the number of steps per epoch: this is relevant if you use data augmentation, where you would want to increase the steps per epoch so that each image gets seen more than once. But here I divided the number of training samples by the batch size, so that each image is seen once during each epoch. And the same I'm doing for the validation data. When I start the generator, you can already see that Keras gives us a nice output here: it tells us the epoch it's working on, so one of 10, and the different steps, how long it took, the loss, the accuracy and the validation loss. It will print this for every epoch. And if you are using RStudio, the viewer pane will automatically open with this plot that shows you the loss and the accuracy for training data and for validation data. So we now have to wait a little bit and see how our model performs; let me go to the chat again to see if there are questions. Yeah, I've seen this error a lot; I'm not sure why you get this error that there is no PIL module. On the number of parameters: I just picked them because, yeah, I experimented with them a little and picked something that works fine. But if you don't know anything and you're starting from scratch, you would have to do hyperparameter tuning, and I will copy a link that gives you more information about that. Is there a way to parallelize this? Yes, there is. Typically, if I have a small model I just use my CPU, but you could of course use the GPU version. If you mean the error, I don't know if that's the question, but yeah, we are using the CPU version; if you had bigger images it would make sense to use the GPU version. But of course your computer has to support the correct GPU, or you would have to use, let's say, an AWS instance or something else where you have a GPU running. And yes, if you run GPUs you can of course scale them to run on, let's say, an AWS cluster or something else. What CPU and memory do I have? Good question. I think I have 16 gigabytes of RAM, and the CPU should be a 2.9 gigahertz quad-core Intel Core i7.
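A sketch of the training call, assuming the generator objects (train_image_array_gen, valid_image_array_gen) and the sample counts from before; the batch size is illustrative:

```r
batch_size <- 32
epochs     <- 10

hist <- model %>% fit_generator(
  train_image_array_gen,
  steps_per_epoch  = as.integer(train_samples / batch_size),  # each image once per epoch
  epochs           = epochs,
  validation_data  = valid_image_array_gen,
  validation_steps = as.integer(valid_samples / batch_size)
)
```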
I'm not sure why you get this PIL error for displaying the loaded images, because I was pretty sure that the image loading function is not something from a Python package; it's something in the R package. Could this approach be useful for non-image classification is one question. I guess by "this approach" you mean a CNN, and CNNs are typically used for image classification or for simple text classification. If you use text, you would not want to use layer_conv_2d but layer_conv_1d, because in text we don't have two-dimensional sliding windows but one-dimensional sliding windows. You can think of them in more or less the same way: you slide across, let's say, sentences or words, and you consider a word in the context of what's before and after it. That's why you can also use convolutions there. One question is: I saw that most people use Python to build models with Keras, is it a general practice to use Python for Keras? Yeah, more or less; it has somehow been established that most data scientists tend to use Python for the hardcore computation stuff, and I think that's because R tends to be a bit slow sometimes and the RAM handling makes it a little bit tricky. But yeah, a lot of people use Python with Keras, and I think this is also because most people don't really know that it's possible to use R. You just have to know a few tips and tricks, and then you can use R. If you go to keras.rstudio.com you will find examples of really complex things, like deep dreaming for example, all written in R, and running just fine. I think it's just this weird competition between Python and R, what's better and what's not and who is a serious data scientist. What is the desired output of your network, is it a numerical 2D or 3D matrix, what should be the output layer in this case? The output layer should always have the number of classes you predict, so each class needs one output node. We have 16 possible classes, so our output layer has to have 16 nodes. This is because we want a score calculated for each of the possible classes, which we can then convert into a probability distribution. And this means that if we have more than two classes, we should use the softmax activation function to do this. If we had just two classes, we could also use a sigmoid activation function and just one node in the output layer, because of course with exactly two classes the input is in one class or the other; if it's in one, it's automatically not in the other, so we don't need a second output node. If we have a regression task, we also just need one node, because then a single number is calculated and we don't need a softmax function. Another question: how do you deal with input that might not belong to any class the model was trained for? Yeah, that's not possible with this type of model. The way we defined it here, it will always predict one of the 16 classes; it doesn't handle "doesn't belong to any class". You would have to explicitly train your model that way, or you could build a multi-output model with a separate output per class, so that one image, for example, can belong to two classes or to none. Alright, let me just very quickly go over the rest of this file. The history object here, you can plot it, and it will just plot the curves you saw in the viewer pane in RStudio.
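As a sketch, the three output-layer variants just described could look like this; these are alternatives, so you would pick exactly one as the final layer of a model:

```r
# Multi-class (16 classes): one node per class + softmax,
# trained with loss = "categorical_crossentropy".
model %>% layer_dense(units = 16, activation = "softmax")

# Binary classification: a single sigmoid node suffices,
# trained with loss = "binary_crossentropy".
model %>% layer_dense(units = 1, activation = "sigmoid")

# Regression: a single linear node, no softmax,
# trained with a loss like "mse".
model %>% layer_dense(units = 1)
```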
This is the number of epochs on the x axis and the accuracy and loss on the y axis, and you can see how well your model learned during training. The summary you see here is the final epoch, where you see the loss and the accuracy, both for training and for validation. Because it's already almost twenty to five, I think we can only discuss a few more things, just to let you know. I've added another R markdown down here where I'm discussing a few more things that are interesting when you're training models, and you can look at them on your own later if you want. I'm covering how to use a validation split instead of a validation set: the validation set was a predefined set of images that would always be used during validation, while with a validation split we use, in this case, 30% of the training data as validation data. I'm also showing you how to use a separate set of images, in this case the test data, to predict new images. So you trained your model, and now you have a data folder of new images which you want to predict: you would define another image data generator and use evaluate_generator or predict_generator to evaluate or predict with your model on new images. And the last thing I have included here, which we won't be able to cover due to time but which you can look at, are a few callbacks. One of the callbacks is used to save model checkpoints during training. Another is used for early stopping: if you, for example, don't know how many epochs your model will have to run, you just set the number of epochs to a very high number and tell it to monitor the validation loss and stop training once, say, the model doesn't improve by 0.01 or more for three epochs. And there is also how to visualize training with TensorBoard, but this is advanced stuff and you can look at it later; I'd say we talk a bit more now, so that at least the parts we did cover get understood by everyone. Okay. To predict new images, the pixel format has to be exactly the same as in training. Yes, that is true. This is something you have to define somewhere or remember: if you train a model like this, where you reduce the image size for training, and you save your model, and let's say you want to deploy it somewhere so other people can use it, they will have to know how you prepared the input data, because the input data has to have the same dimensions, in this case 20 by 20 by 3. You can see it here in the input shape: image width, image height, channels. It has to be 20 by 20 by 3; if your images are bigger, you will have to scale them down before you can use the Keras model to predict on them. What's also important: you have to prepare them in exactly the same way. If you remember, we scaled our pixel values to lie between zero and one; this is what we did in the image data generator here. So you cannot feed raw pixel values into the model, otherwise it will do weird stuff that you don't want; all the images you want to predict with this Keras model have to be scaled the same way. Any book suggestions or online resources to learn? Yes, let me share my Chrome again to show you a few resources. The most obvious resource of course is keras.rstudio.com, where you find all the information on the package site; tensorflow.rstudio.com is for the bit more advanced stuff.
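A sketch of the two callbacks just mentioned, with an illustrative file name and the thresholds from the explanation above:

```r
callbacks_list <- list(
  # save the best model seen so far during training
  callback_model_checkpoint(
    filepath       = "fruits_checkpoints.h5",  # illustrative name
    monitor        = "val_loss",
    save_best_only = TRUE
  ),
  # stop once val_loss improves by less than 0.01 for 3 epochs
  callback_early_stopping(
    monitor   = "val_loss",
    min_delta = 0.01,
    patience  = 3
  )
)

# then pass callbacks = callbacks_list to fit_generator()
```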
But there you also find a lot of tutorials, and I think a lot of them include Keras automatically, because this is now TensorFlow 2.0 and later, which is designed to have Keras as a core feature. You see a few tutorials for beginners and some more advanced stuff, so this is something worth checking out. And let me see if the hyperparameter tuning is actually here; it might be on their site. Fine tuning... no, it was on the TensorFlow site. So here you find something about hyperparameter tuning: how it works, how you prepare your script with the so-called flags. This is what you would do if you are starting from zero and, for example, want to play around with different numbers of filters, different dropout rates and different numbers of hidden nodes in the dense layers, something like that. Another resource is the Keras cheat sheet that's included in the GitHub repository, and the book I recommend is by François Chollet; it's called Deep Learning with R. Exactly, here: Deep Learning with R. You can buy it here, if the page wants to load. So this is the book, which also has some code examples. And the really cool thing about this book is that it has a GitHub repository with notebooks, all in R markdown. Let me copy this link and put it all in the markdown; I will push the changes after the tutorial and you will have them all there. Let me put the resources here. So here you see, there are actually a number of notebooks, and the description of what's in them you can find here. There are some nice things: using convnets, using pre-trained convnets, visualizing what they learn, some stuff about natural language processing, and even more advanced stuff like deep dreaming, style transfer, GANs and so on. Okay, the hyperparameter tuning link I will also give you, and the Keras site. Here is my Zoom session, there it is. Okay, where's the chat, always looking for something here. All right, any more questions? Yeah, François Chollet, exactly, is the main author of Keras. He wrote the Keras Python package and the book Deep Learning with Python, and together with J.J. Allaire the R version, Deep Learning with R, as well. Could you explain a bit more the loss and accuracy graphs at the end, how do we know the model is giving good predictions? So let's go back to our graph. What we are looking for when we train a model is, most of all, the loss; if we have a classification task, accuracy is obviously also something to look at. Accuracy means how many images were predicted correctly: because Keras knows which folder our images come from, it knows the true label, and it can compare the prediction against the truth and tell how many images were predicted correctly in each epoch. So this is the accuracy, and the best possible accuracy is one; that would mean 100%, all of the images were predicted correctly. The loss is the loss function we defined, in this case categorical cross entropy, and because the loss is a way to represent the error, we want our loss to go towards zero. So this is the general direction we want to take: loss towards zero, accuracy towards one. And now we have training and validation data, and what will often happen is that your training loss will decrease just as you wanted, or the accuracy will increase just as you wanted.
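The flags mechanism from the tfruns tutorial looks roughly like this; the flag names and values here are illustrative:

```r
library(tfruns)

# In the training script: declare tunable hyperparameters as flags.
FLAGS <- flags(
  flag_integer("filters1", 32),
  flag_numeric("dropout", 0.25)
)
# ... then use FLAGS$filters1 and FLAGS$dropout when building the model.

# From the console: launch a grid of runs over flag values, e.g.
# tuning_run("train.R", flags = list(filters1 = c(16, 32),
#                                    dropout  = c(0.25, 0.5)))
```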
But what you can sometimes see is that your validation loss will drop at the beginning and then increase again, so that you have a very big difference between validation and training loss. This is something you want to avoid, because you are using the validation data explicitly so that your model does not overfit on the training data; it's there to make the model learn a general representation of the data that works on images that are slightly different. We want a model that generalizes. So what we are looking for is for the curves of training and validation to be close together, and of course for the loss to go towards zero and the accuracy towards one. These are the two important things: towards zero, towards one, and validation and training close together. Could you explain again the filter concept of a CNN, what exactly does the CNN learn with these filters? Okay, let's go back to the setosa.io site... damn it, I closed Chrome. I have it here. Yeah. So these are the pixels, sorry. You have different filters, and there are a few filters that have been used in image analysis for a long time and that are predefined, you can say. The simplest filter is the identity filter, and you see this filter has zeros everywhere except for the one in the middle. What happens now is, again, here you see the red sliding window, three by three pixels, which has the same size as our filter. Because the filter and the sliding window are the same size, we can calculate a dot product of them. The dot product takes the top left position of the filter and multiplies it with the top left pixel value in our sliding window; it takes the next position in the filter matrix and the next position in the sliding window, multiplies them, and does this for every position in the filter matrix. The dot product then means that we sum up all of these products, and we end up with a number; in this case here, we end up with 25. This now becomes the output. And because we have a sliding window, in this case one that always slides one pixel to the right, respectively one to the bottom, our image on the right has the same size as the original image except for the border. This is because we have not used any padding here. So let me make this clear, I think you can see the red squares, right? The top left pixel has been seen once, and all the other ones in my sliding window have also been seen once. Now I move it to the right: the second pixel from the left in the first row has been seen twice, and all the others to the right of it as well. I move over again: now the third pixel from the top left has been seen three times. And if I go across, you can see that all of these pixels get seen several times, except for the pixels at the borders. Right. If I wanted to avoid this, I would have to use padding, and this is why here you see the image with this black border. Does this explain a bit more what filters do? If we change the filter, we get a different dot product and our filter output looks different. So let's say we have our model learning all these different filters: each of these outputs on the right would be one of the feature maps in our stack of feature maps, basically. We have this outline filter, we have a sharpen filter, and so on. Yes, exactly: what is learned are actually the values in these filters. This is what's being optimized during training.
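To mirror this on a toy scale, here is a small base-R convolution without padding; with the identity kernel the output is just the interior of the image, and it shrinks by the border, as described:

```r
# Valid (no padding) 2D convolution with stride 1, for illustration only.
conv2d_valid <- function(img, kernel) {
  k <- nrow(kernel)
  out <- matrix(0, nrow(img) - k + 1, ncol(img) - k + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      # dot product of the kernel with the current sliding window
      out[i, j] <- sum(img[i:(i + k - 1), j:(j + k - 1)] * kernel)
    }
  }
  out
}

identity_kernel <- matrix(c(0, 0, 0,
                            0, 1, 0,
                            0, 0, 0), nrow = 3, byrow = TRUE)

img <- matrix(sample(0:255, 36, replace = TRUE), nrow = 6)  # random 6x6 "image"
dim(conv2d_valid(img, identity_kernel))  # 4 x 4: border rows/columns are lost
```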
What are some of the other applications for Keras outside image recognition? Good question. If you go to the keras.io page and want to see a few more examples, you can go to the examples here at the top. You see some more image stuff, but you also see a long short-term memory network, deep dreaming, generative adversarial networks, image captioning, how to do style transfer, how to generate text, lots of different stuff; variational autoencoders, you even see something with TensorFlow Probability, and you can look at them and see how it all works. It's not loading right now, so let's go to the keras RStudio GitHub; they should still have them there. Let me copy this as well. I think under R, in their examples, you see all the examples and a few more that are not included on the site: LSTM text generation, the R file I'm looking for, where is it... So here you can see an example of how to do this: you read some text, do some preprocessing, and the Keras part is here. You build the model, you define the model, you compile it, and that's that. I'll copy this link also into the R markdown. All right, one more question I think, about the optimal values: the aim is to find the optimal filter coefficients to recognize the pictures. The values are optimized such that the resulting filters are useful for extracting features which are relevant for deriving the correct class for each image. Yeah, I think there is not much more to it. I think we should probably stop here. Any major question that's really, really, really important, that you cannot go to bed without tonight? On the deployment of the model: can it be deployed using plumber? I would think so; I've not tried it, but I think it should work. Will it learn several filters? Yeah, it will learn the number of filters we define in the filters hyperparameter. There has been a question about Bonferroni limits and simultaneous statistics when we are optimizing the filters. Yeah, I think that's a pretty complex answer that I would have to research in order not to tell you something wrong here. But yeah, hypothesis testing has been mentioned already; it's a bit tricky, let me just leave it at that. How to deploy the model, say using Google Cloud or AWS? You can save the model; maybe that's something worth showing you. If you save the model, you get a file with the .h5 ending, which is an HDF5 file, the Keras format. So if you are working with Keras, you can save basically just the HDF5 file, you can save in the TensorFlow models format, or you can choose to save only the weights. You can then load this model again into Keras. And what's best practice if you want to deploy it to a cloud service is that you take your model and strip off all the layers that are not needed for inference, like dropout layers and so on, so that you just have the bare TensorFlow graph; then you create a Docker container and deploy it to, let's say, AWS. Okay, there are really, really a lot of questions still, but I think we have to stop now due to the time. You can keep asking questions on Gitter; I'm trying to answer as many as possible over the next few days, but I have a small 14-month-old child, so not that much time, but I will do my best.
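For completeness, a sketch of saving and re-loading a trained Keras model in R; the file names are illustrative:

```r
library(keras)

save_model_hdf5(model, "fruits_model.h5")            # full model as HDF5
save_model_weights_hdf5(model, "fruits_weights.h5")  # weights only

model_reloaded <- load_model_hdf5("fruits_model.h5") # ready to predict again
```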
Organizers, any final comments from you? Doesn't seem that way. Then thank you all very, very much. Thank you. Thank you everybody, thank you for the workshop. Yeah, we are going to close the workshop now, and you will have all the materials available online. Thank you everybody. Bye, see you. All right, bye bye. Thanks for participating, I hope this was helpful.