Hello, everyone. My name is Kaivan Kamali, and I'm going to be presenting Image Classification in Galaxy with the Fruit 360 Dataset. I'm a member of the Galaxy team at Penn State University. This tutorial has two sections. First, I'll go over some slides to discuss convolutional neural networks; that's the model that we're going to use for image classification. I will then briefly talk about the Fruit 360 dataset. Afterwards, we're going to move to usegalaxy.org, where we're going to actually implement the neural network: we're going to train it, evaluate it, and visualize the results. Just so you know, the Galaxy Training Network website, training.galaxyproject.org, has all the Galaxy-related tutorials organized under topics. You can scroll down to Statistics and Machine Learning, and if you scroll down further, you will see Image Classification in Galaxy with Fruit 360 Dataset. These are the slides at this link, and this is the tutorial, the dataset, and the workflow. So let's go over the slides right here. OK, the requirements: you need to be familiar with the Galaxy platform, and Introduction to Galaxy Analyses would be a good tutorial for that. It would also be a good idea to complete Deep Learning Part 3: Convolutional Neural Networks (CNNs). Regardless, we're going to roughly cover both of those tutorials here. Introduction to Galaxy Analyses basically teaches you how to upload a file, run a tool, and see the outcome; we're going to do that here anyway. And we're going to cover a big chunk of Deep Learning Part 3 in this tutorial as well. So the question that we want to answer is how to solve an image classification problem using convolutional neural networks. And the objectives are to learn how to create a convolutional neural network, or CNN as they're called, using Galaxy's deep learning tools, and to solve an image classification problem on the Fruit 360 dataset using a CNN in Galaxy.
So what is a convolutional neural network? There are different types of neural networks; roughly, you can break them into three types. Feedforward neural networks are the classical, if you will, neural networks that don't have any loops in them; they've been used for many classification, regression, and optimization problems. We have recurrent neural networks that deal with time series data, whether it's temporal data or ordinal data. And we also have convolutional neural networks, which are specifically tailored for image processing. So, convolutional neural networks: there has been an increased popularity of social media in the past decade or so, and this has made image and video processing tasks very important. Feedforward neural networks, or FNNs, could not scale up to image and video processing tasks. Convolutional neural networks, or CNNs, are specifically tailored for image and video processing tasks; hence, we're going to be using them. Now, the inspiration for convolutional neural networks: in 1959, Hubel and Wiesel did an experiment to understand how the visual cortex of the brain processes visual information. They recorded the activity of neurons in the visual cortex of a cat while moving a bright line in front of the cat. Some cells in the visual cortex fired when the bright line was shown at a particular angle and location; they called these cells simple cells. Other cells fired when the bright line was shown regardless of the angle and location, and they seemed to detect movement; they called these cells complex cells. It seemed like complex cells receive input from multiple simple cells and have a hierarchical structure. Hubel and Wiesel won the Nobel Prize in 1981 for their groundbreaking research. Inspired by simple and complex cells, Fukushima proposed the Neocognitron, a hierarchical neural network used for handwritten Japanese character recognition.
It was the first convolutional neural network, and it had its own proprietary training algorithm. In 1989, LeCun proposed a convolutional neural network that could be trained by backpropagation. Backpropagation, if you're not familiar with it, is the standard neural network training algorithm. It is used for training feedforward neural networks, the classical neural networks, and recurrent neural networks, the neural networks that handle time series. And with this research, it could now be used to train convolutional neural networks as well. Convolutional neural networks became very popular when they outperformed other models at the ImageNet Challenge. The ImageNet Challenge is a challenge that has been run annually since 2010; it's an object classification and detection challenge on hundreds of categories and millions of images. Notable convolutional neural network architectures that won the ImageNet Challenge are AlexNet in 2012, which basically started the convolutional neural network craze, ZFNet in 2013, GoogLeNet and VGG in 2014, and ResNet in 2015. So here we're going to talk about the architecture of convolutional neural networks. A CNN typically has four layers: an input layer, a convolution layer, a pooling layer, and a fully connected layer. We will explain a 2D convolutional neural network here, but the same concepts apply to 1D or 3D convolutional neural networks. So what do 1D, 2D, and 3D convolutional neural networks mean? We have to go to the next slide to understand what a filter is, but it's basically the number of dimensions that we move the filter in. If we move the filter in one dimension, that's a one-dimensional convolutional neural network; if we move it in two dimensions, that's 2D; if we move it in three dimensions, that's 3D. We're going to get to what a filter is shortly. For the input layer, for example, we could have a 28 pixel by 28 pixel grayscale image.
In a traditional feedforward neural network, we usually flatten this into a vector. In a convolutional neural network, we don't have to do that; we can present the image as is, which is a matrix of 28 by 28. This makes capturing spatial relationships easier for the neural network. The convolution layer is composed of multiple filters, as we discussed; they're also called kernels. The filters for a 2D image are also two-dimensional. Suppose we have a 3 by 3 filter; that's basically nine values. The values are randomly initialized to values between 0 and 1, or sometimes between -1 and 1. Convolution is like placing the 3 by 3 filter on the top left corner of the image, say a 28 by 28 image, multiplying filter values by pixel values, and adding up the results. We then move the filter to the right one pixel at a time, repeating this process. When we get to the rightmost pixel, we move down one row and start from the left-hand side again. This whole process is repeated until we get to the bottom right corner of the image; that's when we stop. What I just explained can be visualized a lot more easily. Let's say you have a 4 by 4 image; that's the light blue image at the bottom. And let's say you have a 3 by 3 filter; that's the dark blue square that you can see. The dark blue square is on the top left now, and the filter values and the image pixel values are multiplied and added, and the result is one value in the output, which is the green square at the top. Then we move to the right, then we move down and again to the right. This animation is on a loop, so you can see it multiple times. As you can see, we applied the filter to the image and we got an output, which is shown here. The convolution operator has multiple parameters: filter size, padding, stride, dilation, and activation function. We're going to go over each of them one by one. The filter size can be 5 by 5, 3 by 3, and so on.
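The sliding-filter process just described can be sketched in NumPy. This is a toy example with made-up values; in the tutorial itself the Galaxy tools handle the convolution internally.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution: slide the filter over the image, no padding."""
    n, f = image.shape[0], kernel.shape[0]
    out = n - f + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # multiply filter values by pixel values and add up the results
            result[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return result

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
kernel = np.ones((3, 3))                          # toy 3x3 filter
print(conv2d(image, kernel).shape)  # (2, 2)
```

As on the slide, a 3 by 3 filter over a 4 by 4 image yields a 2 by 2 output.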
Larger filter sizes should be avoided, because the learning algorithm needs to learn the values of the weights in the filter, and the more values we have, the more difficult the learning process becomes. Also, odd-sized filters are preferred to even-sized filters: they have the nice geometric property that the input pixels are arranged symmetrically around the output pixel, which you can't say for even-sized filters. Padding: you saw that we applied a 3 by 3 filter to a 4 by 4 image and got a 2 by 2 image, so the size of the image has gone down. That was right here: a 3 by 3 filter applied to a 4 by 4 image resulted in a 2 by 2 image as the output. If we want to keep the image size the same, we use padding. What does that mean? It means we pad the input in every direction with zeros before applying the filter. If the padding is 1 by 1, we add one zero in every direction; if the padding is 2 by 2, we add two zeros in every direction; and so on. Here's an example of padding. We have a 5 by 5 image; that's the light blue square at the bottom. We have a 1 by 1 padding, so we add one zero in every direction, and then we apply a 3 by 3 filter. We apply it to the image plus the padding, as you can see, and this results in a 5 by 5 output image. So the size of the input image and the size of the output image are the same; that's what padding does. The other parameter of the convolution operator is stride. How many pixels we move the filter to the right or down is the stride. If we move the filter one pixel to the right or down, that's a stride of one; if we move the filter two pixels to the right and down, that's a stride of two; and so on. Actually, let me go back to our previous example. If you look here, we move the filter one pixel to the right and one pixel in the downward direction. But it doesn't have to be one; it could be two. And here's an example.
Here we move the filter, the dark blue square at the bottom, two to the right and two in the downward direction; that's a stride of two. The effect of a stride of two is that the output image is smaller now. So that's another parameter. Another parameter is dilation. When we apply a 3 by 3 filter, the output is affected by the pixels in a 3 by 3 subset of the image. Dilation is a way to have a larger receptive field, which means the portion of the image that affects the filter's output. If we set the dilation to two, instead of a contiguous 3 by 3 subset of the image, every other pixel of a 5 by 5 subset of the image affects the output. What does that mean? Let's take a look. We have a 3 by 3 filter, which means our filter has nine values, but the dilation is two. So instead of these nine values being contiguous, they are every other value of a 5 by 5 square at the bottom. As you can see, it starts from the top left, moves to the right, then one down and again to the right, and then one down. This increases the receptive field, the part of the input image that affects the output, if that's what we desire. There's also the activation function. After the filter is applied to the image, we use an activation function to introduce non-linearity. The idea is that we want to solve non-linear problems with neural networks, and this is our way of introducing non-linearity. The preferred activation function for convolutional neural networks is ReLU, the rectified linear unit. ReLU leaves outputs with positive values as is and replaces negative values with zero. Here's an example. Say our output is on the left-hand side, and those are the values: some are zero, some are positive, some are negative. After we apply the activation function, the negative values on the left are replaced with zero; zero and positive values are left as they are. So this is an example of a single-channel 2D convolution.
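Putting filter size, padding, stride, and dilation together, the effect on the output size can be checked with the standard one-dimension output-size formula; this is a well-known convention, not something specific to the Galaxy tools.

```python
def conv_output_size(n, f, padding=0, stride=1, dilation=1):
    """Output size along one spatial dimension of a convolution."""
    return (n + 2 * padding - dilation * (f - 1) - 1) // stride + 1

print(conv_output_size(4, 3))              # 2: 3x3 filter on a 4x4 image
print(conv_output_size(5, 3, padding=1))   # 5: padding keeps the size
print(conv_output_size(5, 3, stride=2))    # 2: stride 2 shrinks the output
print(conv_output_size(7, 3, dilation=2))  # 3: dilated 3x3 "sees" a 5x5 area
```

The first two lines reproduce the slide examples: the 4 by 4 image shrinking to 2 by 2, and the padded 5 by 5 image staying 5 by 5.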
We have a 5 by 5 image, and you can see the values of the pixels of the image on the left-hand side. And we have a filter, which is 3 by 3, so it has nine weights, and each weight is initialized to some value. What we do is place the 3 by 3 filter on the top left corner of the image, multiply filter values by image pixel values accordingly, and add them up. The calculation is shown down here, and we get one value, which is five. Then we move the filter to the right and repeat, and that gives us the output image. This is a grayscale image, in that every pixel is represented by one value. Usually, the value of a pixel is between 0 and 255, where 0 is black, 255 is white, and anything in between is a different shade of gray. But if the image is a color image, then we mix three primary colors, red, green, and blue, in different proportions to represent every pixel. So every pixel is represented by three values, and to present this three-dimensional input, we show three two-dimensional images here. For example, channel one holds the values for the red channel, channel two is green, and channel three is blue; that's RGB. And we have three filters, and these filters are applied, say, to the top left corner accordingly, similar to the previous example, and we get three values. The difference is that now we add these three values, and we have a single value in the output. So when we apply a filter, the channel size becomes one: we had a color image in which every pixel was represented by three values, and in the output image every pixel is represented by one value. This is the same thing, only shown in three dimensions.
We have an image on the left-hand side and a filter in the middle; the filter is applied to the image, moving from the top left to the right, then down, repeating until it is done, and we have the output on the right-hand side. As I mentioned, the output of a multi-channel 2D filter is a single-channel 2D image. That's the example we gave here: on the left-hand side we have a color image where every pixel has three values, hence we say it has a channel size of three, but after the three filters were applied and the values were added up, we have one value per pixel. If we apply multiple filters, this results in a multi-channel 2D image. For example, if the image was 28 by 28 by 3 (number of rows, number of columns, number of channels) and we apply a 3 by 3 filter with 1 by 1 padding, we get a 28 by 28 by 1 image. But if we have 15 such filters, our output would be 28 by 28 by 15. So we can use the number of filters to increase or decrease the channel size of the output. Here, with one filter, the channel size of the output is one; but if we have 15 filters and repeat this process 15 times, we would have 15 values in the output, or we could lower the number of filters if needed. So after the input layer and the convolution layer, we now have a pooling layer. The pooling layer performs downsampling to reduce the spatial dimensionality of the input. Very simple. This decreases the number of parameters, reduces learning time and computation, and reduces the likelihood of overfitting. Overfitting is when the learning process learns the training data really well but cannot generalize to any other data, which should be avoided. The most popular type of pooling is max pooling. It's usually a 2 by 2 filter with a stride of two that returns the maximum value as it slides over the input data. Finally, we have a fully connected layer.
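The 2 by 2, stride-2 max pooling just described can be sketched in NumPy with toy values:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Return the maximum of each size-by-size window as it slides over x."""
    out = (x.shape[0] - size) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            result[i, j] = x[i*stride:i*stride+size,
                             j*stride:j*stride+size].max()
    return result

x = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
print(max_pool(x))  # [[6. 8.]
                    #  [3. 4.]]
```

Each 2 by 2 block of the 4 by 4 input collapses to its maximum, halving the spatial dimensions.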
It's the last layer in a convolutional neural network. It connects all the nodes from the previous layer to this fully connected layer, which is responsible for classifying the image. So this is an example of a convolutional neural network. On the left-hand side is the image of a digit that we want to classify; this looks like a handwritten digit zero. We have a convolution layer with many filters, followed by a pooling layer. And we can have multiple convolution-plus-pooling blocks in our convolutional neural network; in this case, we have three, as you can see. We have a convolution with the ReLU activation function followed by pooling, another convolution with ReLU followed by pooling, and a third convolution layer with ReLU followed by pooling. We could have four, five, or ten, depending on how complex a problem we're trying to solve. The same goes for the number of filters in each layer; it could be 16, 32, 64, 120, and so on. Eventually, we get to a fully connected layer, and then we have an output layer. Because we're classifying digit images, we want to have 10 outputs in our output layer, which represent the digits zero to nine. So, an example CNN: a typical CNN, as we discussed, has several convolution-plus-pooling layers, and each of those is responsible for feature extraction at a different level of abstraction. For example, filters in the first layer could detect horizontal, vertical, and diagonal edges; filters in the second layer could detect shapes, which are collections of edges; and filters in the third layer could detect collections of shapes. That's how convolutional neural networks do feature extraction. The filter values are randomly initialized and then learned by the learning algorithm. So a CNN not only does classification but can also automatically do feature extraction, which is very important.
This is something that distinguishes CNNs from other classification techniques like support vector machines, and it makes them very powerful. Okay, so we've discussed what convolutional neural networks are. Now let's talk a little bit about the dataset that we're going to use for image classification. We're using the Fruit 360 dataset. It's a dataset of 90,380 images of 131 fruits and vegetables. Images are 100 pixel by 100 pixel color (RGB) images, so each pixel has three values. About 67,000 images are used for training, and about 22,000 images are used for testing. And this is the link where you can download the dataset. For this tutorial, we only use a subset of the Fruit 360 dataset, and the subset contains only 10 fruits and vegetables. We selected a subset of images so the dataset is smaller and the CNN can be trained faster during this tutorial; otherwise, we would have to wait maybe an hour or more for the 67,000 training images to train the neural network. The subset has 5,000 images in the training dataset and 1,600 images in the test dataset. I created a GitHub repository called Fruit Dataset Utilities; these are the scripts for creating a subset of the Fruit 360 dataset. First, I created a feature vector for each image. Second, we selected a subset of 10 fruits and vegetables, and the training and test dataset sizes went from seven gigabytes and two and a half gigabytes down to 500 megabytes and 177 megabytes; that's a sizable reduction in dataset size. Third, we created separate files for feature vectors and labels. And finally, we mapped the labels for the 10 selected fruits and vegetables to the range zero to nine. Because the original dataset has 131 fruits and vegetables, the labels are in the range of zero to 130; that's not what we want when we're only classifying ten of them, so they need to be remapped.
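The remapping in that last step can be sketched like this; the original Fruit 360 label ids below are made up for illustration, and only the idea of mapping ten arbitrary ids onto the range 0 to 9 matters.

```python
# Hypothetical original Fruit 360 label ids for the ten selected
# fruits and vegetables (made-up values, for illustration only)
selected_labels = [3, 17, 42, 55, 61, 78, 90, 101, 115, 127]

# Map each original id to a new label in the range 0..9
remap = {old: new for new, old in enumerate(selected_labels)}
print(remap[42])   # 2
print(remap[127])  # 9
```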
The images are 100 pixel by 100 pixel color RGB, so each image can be represented by 100 by 100 by 3 values, which is 30,000 values. Now, let's say you want to create another subset of this dataset and you want 30 fruits and vegetables to be included. You can just use the scripts in this GitHub repository, follow the instructions, and create your own dataset. Next, we're going to define a CNN and train it with the Fruit 360 dataset. The goal is to learn a model such that when we give it an image of a fruit or vegetable, it can predict what it is. The neural network will spit out a number between zero and nine, and each of those numbers represents a fruit; for example, zero could be strawberry. We can then evaluate the trained CNN on a test dataset. What we do is use the training dataset to train the model, and we set aside some data as the test dataset. When the training is over, we present the test dataset to the model and compare the predicted output with the actual output to see how well or poorly our model is doing. The way we evaluate our model is by using a confusion matrix, which we'll explain in the hands-on section. As for references, please go to the tutorial's reference section; I think you scroll down here, and these are all the references for this presentation. So, what else? This is training.galaxyproject.org, the Galaxy Training Network's website, where you can find any Galaxy-related tutorials. If you need help, you can go to help.galaxyproject.org or use one of the Gitter channels. We also have events, which you can find at galaxyproject.org/events. So this concludes the slides for this tutorial. The next step is the hands-on section, where we implement a neural network in Galaxy, which we'll get to shortly. Okay, let's start the hands-on section of the tutorial. Let me share my screen. We go to usegalaxy.org.
Besides Galaxy, we need to go to the tutorial itself: go to training.galaxyproject.org, scroll down to Statistics and Machine Learning in the topics, go to Image Classification in Galaxy with Fruit 360 Dataset, and click on the tutorial. We've covered the background knowledge; now we go to the Get Data section. Okay, I'm going to switch between this tab and usegalaxy.org as I implement the steps. The first step is to make sure you have an empty analysis history. You do that by clicking on the plus sign, and now we have a new history. Step two is to rename the history to make it easy to recognize. You rename the history by clicking on the history name box, and it's a good idea to give it a meaningful name, so I'm going to call it fruit360. Let's go back to the tutorial: import the files from Zenodo. What you do is copy these four links, go back to usegalaxy.org, and on the top left corner there's an Upload Data button. Click on that, click on Paste/Fetch data, paste the URLs here, and click Start. This starts four jobs to download these four files. Two of them are training files; the other two are test files. Of the training files, train X is the feature vectors and train y is the labels. So these are the images, and these are the labels of the images, for example, strawberry, apple, apple, apple, apple. These two are used for training, and these two are used for evaluating the trained model. When a job starts, it's shown in gray, which means it's queued; it becomes yellowish when it's running, and when it goes green, that means it's complete. If something fails, it's going to go red, and we'd have to look and see what's going on. The next step, after the upload is complete, is to rename the files: we basically drop the extension. And finally, we're going to make sure that the datatype is tabular. So let's wait for this to complete.
Okay, the file upload is complete; I paused the video just so you don't have to wait for it. Now we need to rename the files and make sure they are of type tabular. We click on this Edit Attributes button. Here's the name field, and we get rid of the extension and save it. Then, under Datatypes, it's of type interval, not of type tabular, so we type "tabular" here, select it, and save. So we're done with train X 10, and we're going to do the same thing for the other three files. Edit, remove the extension, save, and check that the data type is tabular. Same for the third one: get rid of the extension, make sure it is tabular, and save. It seems it didn't take the first time, so let me redo it. And the same thing here: get rid of the extension, set it to tabular, and save it. So what we did was rename the files and make sure that the type of all the files is tabular; if not, we converted them to tabular. Let's go back to the tutorial. We've completed the Get Data part; now we're going to go and basically implement the neural network. One note here is this part: in order to train the convolutional neural network, we have to have the one-hot encoding representation of the training labels. This is needed to calculate the loss function. Basically, when our neural network makes a prediction, we're going to compare the prediction with the actual output, and we're going to measure how far off the neural network is. In order to do that, the labels have to be in one-hot encoding representation. So what is one-hot encoding? One-hot encoding, or OHE, encodes labels as a one-hot numeric array, where only one element in the array is one and the rest are zero. Here's an example to clarify things. Let's say we have three fruits, apples, oranges, and bananas, and their labels are one, two, and three. The one-hot encoding representation of apple would be a one and two zeros: the first element of a vector of size three is one.
The one-hot encoding representation of orange would be zero, one, zero: the second element in the array of size three is one. And the one-hot encoding representation of banana would be zero, zero, one: the third element of the vector of size three is one. We're going to do the same thing in our case; we're dealing with 10 fruits and vegetables, so it's going to be a vector of size 10 where only one element is one and everything else is zero. If the label is one, the first element would be one; if the label is five, the fifth element would be one; and so on. The way we do that is with a tool. First, let's look at the train y 10 data. You view the contents of a file by clicking on this eye icon. This file has three columns: label name, file name, and label. The label name is basically a string representation of what the integer value represents; we represent strawberry with zero and apple with one, and so on. And the file name is just the image file that has this label. What we care about is the third column only, so we're going to use Advanced Cut to get only the third column; we don't need the first two. The way we do that is to go into the tool search box and type "advanced cut"; I know that this tool is under Text Manipulation. There are two Advanced Cut entries; let's see if it's this one. I think it is. So, Advanced Cut from a table: we cut the file, which is our train y 10, so we select that here; that's the three-column file I showed you. We're going to keep the third column, so we leave the operation as Keep; we know that the file is tab-delimited; and we specify the third column. Then we execute. This is going to create a file that is only the third column of train y 10, and if you look at train y 10, the third column is the labels.
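The one-hot encoding described above can be sketched in NumPy; rows of the identity matrix are exactly the one-hot vectors, here shown for our ten fruit labels in the range 0 to 9.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Rows of the identity matrix give the one-hot vectors directly."""
    return np.eye(num_classes, dtype=int)[labels]

labels = np.array([0, 3, 9])      # three of our ten fruit labels
print(one_hot(labels, 10))
# [[1 0 0 0 0 0 0 0 0 0]
#  [0 0 0 1 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0 1]]
```

Each row has exactly one 1, and its position encodes the label.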
So we should have the labels in the output of this tool. After that, we're going to use a tool called "To categorical". This basically calculates the one-hot encoding of those integer values. Let's say we have a column that has values zero to nine. The one-hot encoding representation for each of those values, zero, one, all the way to nine, is a vector of size 10 where all the values are zero except one of them, and the position of the one depends on the label. We do that with this To categorical tool. We saw that the cut command completed; if we view the contents, we should see only the label column. So we go here and type "to categorical", and we find it. We pass it the output of the cut operation, which is here, number five. Does the dataset contain a header? Yes, it does. And how many classes do we have? We have 10 fruits and vegetables, so this should be set to 10. We're going to execute this, and it should give us a vector of size 10 for every label; we'll view the output when it completes. Okay, while that's running, we're going to start creating the neural network. That's done via a tool called "Create a deep learning model architecture". You could try typing it here in the search box, but I know that it is under the machine learning tools, so I'm just going to scroll down. There's a Statistics and Visualization header, and under that we have Statistics, Machine Learning, and Graph/Display Data. If you expand Machine Learning and scroll down, you see "Create a deep learning model architecture", so I'm going to click on that. Okay. Then we're going to go to the tutorial and implement it just the way it's specified here. Select the Keras model type, which is sequential, and the input shape, which is 30,000. It is already sequential, and we're going to change the input shape to 30,000.
Layer one is a reshape layer. Each image is 100 pixels by 100 pixels and is RGB, which means each pixel is represented by three values, so each image is represented by 100 by 100 by 3 values, which is 30,000. We're going to reshape it so it's a two-dimensional image with a channel size of three: reshape to (100, 100, 3). So here I'm going to type "reshape", and I'm going to specify the target shape to be 100, 100, 3. We'll add another layer. We're going to add a 2D convolutional layer with 16 filters, kernel size 5, activation function ReLU, and input shape (100, 100, 3). So let's go here: I'm going to type Conv2D, with 16 filters, kernel size 5, and activation function ReLU, and I'm just going to copy this in here instead of typing it out; this goes in the keyword arguments section. Let's double-check: it is 16, 5, ReLU; 16, 5, ReLU. That's good. Now we're going to do another layer. Usually a convolution layer is followed by a max pooling layer, and that's the case here too: we have a MaxPooling2D with a pool size of 2 by 2. So I'm just going to type MaxPooling2D; it is 2 by 2, so I'll leave it as it is. Now we have another convolutional layer followed by a max pooling layer, and then another convolutional layer followed by a max pooling layer, so we're just going to enter those in. Let's see: we've got the reshape layer, the convolutional layer, and the max pooling layer. So we add another Conv2D; this time the number of filters is 32, and a MaxPooling2D follows that. Okay, let's go back to the tutorial. We've implemented layers four and five. Layers six and seven are again a convolutional layer followed by a max pooling layer, and the number of filters is 64 here. So I'm going to go here and type Conv2D, 64 filters, kernel size is 5, activation function is ReLU, and then a MaxPooling2D. On to the next layer; let's go back to the tutorial.
After the third convolution plus max pooling, we do a flatten: we basically connect all the nodes from the previous layer to a flat layer. So let's do that; I type "flatten" and it finds it. The next layer is a dense layer of 256 units with the ReLU activation function. This is already Dense; we just say 256 for the number of units and select ReLU. And finally, we have a dense layer of size 10 with the softmax activation function: size 10, select softmax. What this does is give us a probability distribution over the 10 nodes, and whichever node has the largest probability, that's our prediction. Let's review this one more time. We have a reshape layer: the input is a vector of 30,000, and we reshape it into 100 by 100 by 3, since the image is 100 by 100 pixels and is a color RGB image, so the channel size is three. Then we have a convolutional layer followed by a max pooling layer, repeated three times: the first Conv2D has 16 filters, the second has 32 filters, and the third has 64 filters. Then we flatten everything and connect it to a 256-unit dense layer. And then we have the output layer: again dense, with 10 units, and the activation function is softmax, which gives us a probability distribution over the 10 possible labels. We click on Execute, and this creates the model. The model, when created, can be downloaded as a JSON file; it's readable, so you can see what the model represents. After that, we need to specify the optimizer, the loss function, and some fit parameters, and that's done via "Create deep learning model". Again, you can search in the search box, or you can scroll down to the machine learning section, expand it, scroll down, and find it. So this is "Create a deep learning model with an optimizer, loss function and fit parameters". The input is the output of the previous step, which is job number seven.
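For reference, the architecture we just reviewed corresponds to the following Keras sketch (assuming TensorFlow 2 with the Keras API; in the tutorial itself the Galaxy tool builds this for you and exports it as JSON):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(30000,)),           # 100 * 100 * 3 flattened pixels
    layers.Reshape((100, 100, 3)),         # back to a 2D RGB image
    layers.Conv2D(16, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # distribution over 10 labels
])
print(model.output_shape)  # (None, 10)
```

Each convolution here uses the default "valid" padding, so the spatial size shrinks from 100 down to 9 by the third pooling layer before the flatten.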
It is pre-populated in this dropdown. So let's look at the tutorial. We want KerasGClassifier; let's make sure that's set correctly. The loss function is categorical cross entropy, so we're going to change this to categorical cross entropy. Cross entropy is a function that objectively measures the difference between the desired output and the actual output. If we had only two labels, we would use binary cross entropy; because we have 10 labels, we're going to use categorical cross entropy. As for the optimizer, this is what minimizes the loss function, and we're going to select the Adam optimizer. Adam is preferred because it has momentum and it also adapts the learning rate separately for each dimension; those are its two advantages. And finally, the fit parameters. We pick epochs to be 40. Epochs is how many times we use the training dataset to train the model; setting it to 40 means we use the 5,000 training samples 40 times. The final parameter is batch size, which is 50. Basically, we present the training data to our neural network, we evaluate the outputs, and depending on how well or poorly the network does, we update the parameters, in a loop. We could update the parameters only after all 5,000 data points in the dataset have been presented, but that would make the updates slow. Instead, we update the parameters after each batch of training data is presented to the network. So instead of updating the network parameters after every 5,000 samples, we do it after every 50. This speeds up the weight updates so we can get to a local minimum of the loss function. And I think that's it; we can click execute here. This is the Keras model builder. So let's go back to the tutorial. Now we're going to train the model.
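The epoch and batch-size arithmetic, plus the cross-entropy idea, can be checked with a few lines of plain Python. The predicted distribution below is made up for illustration:

```python
import numpy as np

samples = 5000   # images in the training set
batch_size = 50  # samples presented between weight updates
epochs = 40      # full passes over the training set

updates_per_epoch = samples // batch_size
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 100 4000

# Categorical cross entropy for a single sample: minus the log of the
# probability the model assigned to the true class.
probs = np.array([0.05, 0.7, 0.25])  # made-up predicted distribution
true_class = 1
loss = -np.log(probs[true_class])
print(loss)  # about 0.357; a perfect prediction (prob 1.0) would give 0
```

So with batch size 50 instead of 5,000, the weights are updated 4,000 times over the 40 epochs rather than just 40 times.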
And that's done via Deep learning training and evaluation. Again, in the machine learning section, I'm going to look for Deep learning training and evaluation; that is right here. Let's go back to the tutorial. I select the scheme, train and validate; that's already pre-selected. Choose the dataset containing the estimator/pipeline object: that's the output of our previous step, job number eight right here, and it's already pre-populated. That's good. Select input type: tabular data is pre-selected. Good. And we're going to train on the train X10 dataset, so let's select that. We're going to go here, select X10, and select all the columns. Okay. And then the dataset containing class labels or target values: as I mentioned, we had to transform the labels into a one-hot encoding representation. So we're going to select that, which is the to-categorical output, job number six right here, and it is already pre-selected. We're going to select all columns. I think that's it. If we click here, we start training the neural network on our training dataset, which has 5,000 images of fruits and vegetables. We're going to train our model using this 5,000-sample dataset 40 times; that's the number of epochs. And we're going to update the values of the weights, the filters, after every 50 samples; that's the batch size. So I click here. I'm going to pause the video, wait for these three jobs to complete, and I'll be back shortly. Okay, so the training step completed. The training generates three files: one of them is the model, or fitted estimator; another is the weights for the model; and the final one is the accuracy of the model on the training data. Let's go to the tutorial now. The next step is model prediction. That's when we take the test data that we set aside and did not use for training, and pass it to the trained model.
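The one-hot (to-categorical) transformation mentioned here is easy to sketch in NumPy. The labels below are toy values, assuming 10 classes as in this tutorial:

```python
import numpy as np

labels = np.array([0, 3, 9])  # integer class labels, e.g. 0 = strawberry

# Indexing an identity matrix by the labels puts a single 1 in the
# column matching each label and 0 everywhere else.
one_hot = np.eye(10, dtype=int)[labels]
print(one_hot[1])  # label 3 -> [0 0 0 1 0 0 0 0 0 0]
```

This is why the target dataset has one column per class: the softmax output and the encoded label can be compared position by position by the categorical cross-entropy loss.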
We want to know how well the trained model does on data not used in training it. This is done via a tool called Model Prediction. Again, this is all under the machine learning header, so let's look for Model Prediction here on the left-hand side. Here it is. Choose the dataset containing the pipeline or estimator object: that's job number 10 here, pre-populated correctly. Choose the dataset containing weights for the estimator above: the weights are in job number 11. The input data type we're predicting on is tabular. Let's go to the tutorial: it's test X10, and we select all columns. So here I'm going to select test X10 from the dropdown, and here I'm going to select all the columns. I think this looks good; let's double-check everything. Yep. So I'm going to execute this. What this does is take the images in the test set, pass them to the model, and the model makes a prediction; then we need to compare the predicted output with the actual output. Okay, so the model prediction task completed. If you view it, it should give you a bunch of labels that were predicted given the test dataset as the input. Now we need to see how well our model predicted, and the way to evaluate that is via a confusion matrix. The tool that generates a confusion matrix is called Machine Learning Visualization Extension. Let's see if I can find it here on the left-hand side, under the Graph/Display Data section; that's here. Let's go back to the tutorial. Select a plot type: confusion matrix for classes. We're going to select confusion matrix for classes here. Select the dataset containing true labels: that would be test Y10, which has the labels for the test data. Does the dataset contain a header? Yes, so we're going to flip this toggle switch. Choose how to select data by columns: we only care about the label column, so we say select columns by column header name, and we enter label.
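Under the hood, building a confusion matrix from true and predicted labels is a simple tally. Here is a minimal sketch with made-up labels for a 3-class toy example (the real tutorial has 10 classes and thousands of test images):

```python
import numpy as np

# Hypothetical true and predicted labels for six test samples.
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1  # rows: true label, columns: predicted label
print(cm)
# Diagonal entries are correct predictions; off-diagonal entries are mistakes.
```

Row sums give how many samples truly belong to each class, which is exactly how we'll read the Galaxy output in a moment.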
As you saw previously, test Y10 and train Y10 have three columns, and we only care about the third column, which is an integer representation of the label. Let's go back to the tutorial. Next, we select the dataset containing the predicted labels: that's the output of our previous job, number 12. It does have a header. So we're ready to execute this; this is going to create a confusion matrix. Let's discuss what a confusion matrix is while the job is running. Here's the tutorial. A confusion matrix is a table that describes the performance of a classification model. It lists the number of examples that were correctly classified by the model: true positives and true negatives. It also lists the number of examples that were classified as positive but were actually negative; these are called false positives, or type I errors. And the number of examples that were classified as negative but were actually positive; these are false negatives, or type II errors. Given the true positives, false positives, true negatives, and false negatives, we can calculate precision and recall. Precision is the fraction of predicted positives that are actually positive; its formula is true positives divided by true positives plus false positives. Recall is the fraction of actual positives that the model correctly predicted; its formula is true positives divided by true positives plus false negatives. So we can calculate precision and recall, but sometimes it's easier to deal with one number that describes the performance of our model. What we can do is calculate the harmonic mean of precision and recall. These are ratios, so we can't just average them; we have to use the harmonic mean instead of the arithmetic mean. The harmonic mean of precision and recall is called the F score. So we can calculate an F score for every class that we classify.
So this is a confusion matrix. The job is still running; when it completes, I'll show you its confusion matrix, but this is the confusion matrix that was generated the previous time I ran this. The rows are the true class labels: anything on the first row is the images that have a class label of zero, and there are 164 of them. Label zero represents a strawberry image, as we see here; label one is an apple red delicious, and so on. So there are 164 strawberries, 166 apples, and the columns are the predicted labels. As you can see for label zero, we have 164 strawberries, and all 164 of them were predicted to be strawberries. So our true positive rate is basically one, 164 divided by 164; we have zero false positives and zero false negatives. So we can calculate precision, recall, and the F score. But we don't have such a perfect prediction for label three; let's see what label three stands for. For label three, which is corn, we see that the CNN has correctly predicted 118 images as corn; you can see on this row that 118 were correctly predicted. Looking at the corn column, 16 are false positives: they were predicted to be corn, whereas they actually belong to other labels. And 32 of them were actually label three but were predicted as other labels, such as four, five, or six; these are false negatives. So we have 16 false positives, 118 true positives, and 32 false negatives. Given these three numbers, we can plug them into the precision, recall, and F score formulas and calculate the F score for label three. We calculate an F score for labels zero, one, two, three, all the way to label nine, and we get 10 F scores. Potentially we could calculate the harmonic mean of those 10 to have just one number. So the confusion matrix job just completed. I'm going to click here, and this is the result. It's slightly different from what we have in the tutorial.
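Plugging the corn counts from this confusion matrix into the formulas from earlier works out like this (pure Python, using the TP/FP/FN values read off the matrix above):

```python
# Counts for label three (corn), read off the confusion matrix.
tp, fp, fn = 118, 16, 32

precision = tp / (tp + fp)  # fraction of predicted corn that really is corn
recall = tp / (tp + fn)     # fraction of actual corn the model found

# F score: harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f_score, 3))
# 0.881 0.787 0.831
```

So corn scores an F of about 0.83, compared with a perfect 1.0 for strawberry, which matches what the matrix shows at a glance.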
That's because the training process is not deterministic; it's somewhat stochastic, so you may get slightly different results. But it's the same picture: for example, you can see right here we have one false negative for label zero, whereas there we have none. Going back to the tutorial, the conclusion: in this tutorial, we briefly described convolutional neural networks and their application to image classification problems, and we then used Galaxy's machine learning tools to solve an image classification problem using a CNN and the Fruit 360 dataset. These are the references for this tutorial. Well, thank you very much, and I hope to see you in the next tutorials.