Hello, everyone. My name is Kaivan Kamali. I'm going to be giving a presentation on feedforward neural networks. This is part one of a three-part series on deep learning, part of the Galaxy Training Network tutorials. As for the requirements, it would be a good idea if you're familiar with the Galaxy platform; Introduction to Galaxy Analyses is a good tutorial for that. There's also an Introduction to Deep Learning tutorial, which is helpful but not required. The questions we're trying to answer are: what is a feedforward neural network, and what are some applications of FNNs? Our objectives are to understand the inspiration for neural networks; learn about activation functions and the various problems solved by neural networks; discuss various loss and cost functions and the backpropagation learning algorithm; learn how to create a neural network using Galaxy's deep learning tools; and then solve a sample regression problem via a feedforward neural network in Galaxy. The last two bullets are basically the hands-on section of this tutorial, which will be done in a separate video. So what is an artificial neural network? Artificial neural networks are a machine learning discipline roughly inspired by how neurons in the human brain work. There has been a huge resurgence of neural networks in the past 10 to 15 years due to the vast availability of data, increases in compute capacity, and also improvements in how neural network weights are initialized and in the activation functions; we'll get to those in a few slides. There are various types of neural networks: feedforward neural networks, where the signals only move in one direction; recurrent neural networks, where you have loops; and convolutional neural networks, which are mostly applied to images and video. Feedforward neural networks are applied to classification, clustering, regression, and association problems, and they have a lot of real-world applications.
So the inspiration for neural networks is, roughly, how the human brain works. A neuron is a special biological cell with information processing ability. It receives signals from other neurons through its dendrites, shown up here, and if the signals received exceed a certain threshold, the neuron fires and transmits signals to other neurons via its axon, shown over here. A synapse, which is the connection where the axon of one neuron meets the dendrite of another neuron, can either enhance or inhibit the signal passing through it, and the theory is that learning occurs by changing the effectiveness of the synapses in our brain. Next, the cerebral cortex: it's the outermost layer of the brain, two to three millimeters thick, with a surface area of about 2,200 square centimeters. It has about 10^11 neurons, and each neuron is connected to 10^3 to 10^4 other neurons, i.e., between 1,000 and 10,000 neurons. So the human brain has around 10^14 to 10^15 connections, just so you get an idea. Now, the neurons in the brain communicate by signals that are milliseconds in duration; however, complex tasks like face recognition are done within a few hundred milliseconds. What does that mean? It means the computation involved cannot take more than roughly 100 serial steps, which is a very interesting observation. The other thing is that the information sent from one neuron to another is very small. This means the critical information is not carried by any single transmission, but rather is captured by the whole set of interconnections. So the brain has distributed computation and representation, and that allows slow computing elements to perform complex tasks quickly: the signal transmission frequency in the brain is several hundred hertz at most, whereas computer chips can be millions of times faster than that.
So even though computer chips are much faster than the neurons in the brain, the brain, with its distributed representation and distributed processing, somehow manages very complex tasks in very few serial steps. That's a very nice observation. Okay, so we're gonna discuss the perceptron now. The perceptron is basically the first neural network that's still in use; I think Rosenblatt came up with the idea of the perceptron in the 1950s. You have an input layer and an output layer. The input layer is connected to the output layer via weights: every input is multiplied by its weight, and the products of weights and inputs are added up together. We also have a bias down here, whose input is always one, with a weight of b1; that bias controls how the function this neuron represents can be shifted right and left. So that's just mathematical sugar coating, if you will. If the sum of the inputs multiplied by the weights, plus the bias multiplied by one, is greater than a certain threshold, the neuron fires; if not, the neuron does not fire. Fire means it has an output of one; does not fire means it has an output of zero. So this is the simplest neural network. I think this was proposed by Rosenblatt after he studied how vision works in flies, hence the name perceptron, and it was implemented in hardware. It's still used; it's just that different activation functions are used instead of the step activation function, which is a kind of threshold function. But the perceptron also has a learning algorithm. That is, we are given a training set, a set of input-output pairs, and the goal of the learning algorithm is to iteratively adjust the model parameters, which in this case are the weights and the bias, so that the model can accurately map inputs to outputs. This is called the perceptron learning algorithm, and it's actually very simple.
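The weighted-sum-and-threshold behaviour just described can be sketched in a few lines of Python; the weights, bias, and inputs below are made-up values for illustration, not anything from the tutorial:

```python
def perceptron_output(inputs, weights, bias):
    """Fire (output 1) if the weighted sum of the inputs plus the bias
    exceeds the threshold of zero; otherwise output 0 (binary step)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Example: hand-picked weights that make the perceptron compute logical AND.
print(perceptron_output([1, 1], [1.0, 1.0], -1.5))  # fires: 1
print(perceptron_output([1, 0], [1.0, 1.0], -1.5))  # does not fire: 0
```

Note how the bias plays the role described above: it shifts the threshold, so the same weights with a different bias would implement a different logical function.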
You make a prediction with the perceptron, and if the value you got is more than the expected value, you reduce the weights by the error multiplied by a small factor called the learning rate; if it's less than what you expected, you increase the weights by the error multiplied by the learning rate. So that's the simple perceptron learning algorithm. But the problem, as highlighted in a paper by Minsky and Papert, is that a perceptron, or single-layer feedforward neural network, cannot solve problems in which the data is not linearly separable. If you have data that's linearly separable, your perceptron can solve it; if the data is not linearly separable, the perceptron learning algorithm basically fails. A simple example is the XOR problem: you can't use a straight line to separate the classes in the XOR problem. This caused what has been named the AI winter, meaning that interest in neural network research, and AI in general, was reduced significantly, and so was the government funding for research on AI and neural networks. This went on for a while until multi-layer feedforward neural networks became practical. I mean, researchers knew about multi-layer feedforward neural networks; they just didn't know how to train them, so that was still a big problem. The idea is that adding one or more hidden layers enables a feedforward neural network to represent any function; that's called the universal approximation theorem. And that's great, but how are you gonna train it? In the 80s, an algorithm called backpropagation became popular that allowed training of multi-layer feedforward neural networks, and interest in neural networks was revived again. So this is a multi-layer feedforward neural network: as you can see, there's an input layer as before and an output layer, but we also have a hidden layer, and we could have one or more hidden layers.
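The update rule just described (nudge each weight by the error scaled by the learning rate) can be sketched as follows; the function and variable names are mine, and the learning rate and epoch count are arbitrary illustrative choices:

```python
def train_perceptron(examples, n_features, lr=0.1, epochs=20):
    """Iteratively adjust the weights and bias: if the prediction is too
    high the weights are decreased, if too low they are increased, each
    time scaled by the learning rate (the classic perceptron rule)."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, target in examples:
            pred = 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0
            error = target - pred          # +1, 0, or -1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Linearly separable data (logical AND) is learned without trouble...
data_and = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data_and, 2)
# ...but for XOR, which is not linearly separable, no weight setting of a
# single-layer perceptron can ever classify all four points correctly.
```

Running the same loop on the XOR examples just cycles forever, which is exactly the failure Minsky and Papert pointed out.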
I mean, based on the universal approximation theorem, if you have one hidden layer you should be able to approximate any function, but in reality training such a network would be very difficult, so generally we have more than one hidden layer, and in deep neural networks we have many, many hidden layers, hence the name deep. Also, I think there are some restrictions in the universal approximation theorem: the function you want to approximate should be continuous, and there are some other conditions. So in reality you usually have a neural network with multiple hidden layers. Now, there are good things and bad things about having more layers. More layers means more weights, and more weights means you're increasing the dimension of your search, because you're searching for the weights that map the inputs to the outputs as well as possible; if you have more weights, your search space dimension goes up, and that increases the training time and the difficulty of the problem. There's also the problem of overfitting: if you have way too many parameters, you are more likely to learn the peculiarities of the training data, and you come up with a model that works great on the training data but does not generalize well to unseen data. So you use a training set to train your model and you're doing great; then you provide an input that was not part of that training set and suddenly your model collapses. That's a typical overfitting scenario. Generally, the more parameters we have, the more data we should have; so if our data is fixed and we increase the number of parameters, we're exposing ourselves to overfitting. As I said, in the perceptron, at the output layer we have a binary step function: if the sum of all the weighted inputs was greater than zero, which was the threshold, the output is one, otherwise it's zero. So this activation function is called the binary step; it's used in the perceptron. It has a range of between zero and one.
Well, actually it has exactly two values, zero and one. Its derivative is zero everywhere except at point zero, where it is undefined, because the function is not continuous there. Then there is the identity activation function: the range is minus infinity to infinity and the derivative is simply one; it's usually used in the output layer in a regression problem, and we'll get to that later in this tutorial. Now, some more useful activation functions should be discussed. The logistic, or sigmoid, activation function is basically a soft step function: it goes from zero to one, but as you can see there's no sudden jump, so this function is differentiable, unlike the binary step function, and the range is still between zero and one. Hyperbolic tangent, or tanh, is similar to sigmoid except that the range is between minus one and one; again, it's differentiable. And then there's the more recent ReLU, the rectified linear unit. This is a function where if the input is negative, the output is zero, and if the input is positive, the output is just whatever the input is, so the range is zero to infinity. For negative numbers the derivative is zero, for positive numbers the derivative is one, and at zero itself the derivative is undefined, because the function is not differentiable at that point. But that's not a problem; we can deal with a function not being differentiable at one specific point, and it doesn't stop us from using ReLU. Now, the setup of supervised learning: we have a training set of size m, where a training set is a set of training examples, and each training example is a pair of a feature vector and a label. A feature vector is a vector, which could be n-dimensional, each of whose elements is called a feature, and there's a label associated with that feature vector, which we call y_i. The assumption is that there is a function that maps the feature vector to the label, and given the data we want to learn that function, or approximate it. That's the goal of supervised learning.
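The activation functions just listed, and the derivatives that will matter later for training, can be written out directly; this is a small sketch using Python's standard math module:

```python
import math

def sigmoid(z):
    """Logistic activation: a soft step with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: like sigmoid, but with range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative input, identity otherwise."""
    return max(0.0, z)

def sigmoid_prime(z):
    """Derivative of sigmoid, s(z) * (1 - s(z)); its maximum is 0.25,
    which is part of why deep sigmoid networks train poorly (this is
    the vanishing gradient issue discussed later in the tutorial)."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

For instance, sigmoid(0) is exactly 0.5, relu(-2) is 0, and sigmoid_prime never exceeds 0.25, whereas the ReLU derivative is a full 1 for any positive input.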
Within supervised learning, if the label is a real, continuous number, it's called a regression problem; if it's a categorical value, it's called a classification problem. For classification problems there are three cases. First, binary classification, which means our class label can have two possible values; a classic example is when we're given patient data and want to predict whether the patient has a disease or not; there are two cases, so it's binary classification. Then there are multi-class classification problems: say you're given a bunch of images and you want to predict whether each image contains a dog, a cat, or a panda; that's a multi-class classification problem. A less common problem is multi-label classification: say you're given some images and you want to see which of these animals are in each image; it could be more than one, or zero, or all of them. That's called a multi-label classification problem. As you can see, all three cases are classification problems and our labels are categorical: you can't say one label is bigger than another, and you can't measure the distance between any of the labels; there's no such relationship; the labels are just nominal, or categorical. Now, the activation function in the output layer for each of these problems is as follows. For a binary classification problem, we usually have a single neuron in the output layer and we use a sigmoid activation function; as you remember, sigmoid's output is between zero and one, so we say that if the activation of the neuron in the output layer is greater than 0.5, the output is one, let's say there is a disease; if it's less than or equal to 0.5, we say the output is zero, no disease. That's how we model a binary classification problem. For multi-label classification it's the same thing, except that we have as many neurons as needed.
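The sigmoid-plus-threshold output schemes just described can be sketched like this (the logit values passed in are invented for illustration; "disease/no disease" follows the example above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_prediction(z):
    """Single sigmoid output neuron: activation > 0.5 means class 1
    (e.g. 'disease'), otherwise class 0 ('no disease')."""
    return 1 if sigmoid(z) > 0.5 else 0

def multilabel_prediction(zs):
    """One sigmoid neuron per label, each thresholded independently,
    so any number of labels (from zero to all) can fire at once."""
    return [1 if sigmoid(z) > 0.5 else 0 for z in zs]

print(binary_prediction(2.0))                   # 1
print(multilabel_prediction([2.0, -1.0, 0.5]))  # [1, 0, 1]
```

The key contrast with multi-class classification, covered next, is that here the neurons are independent, while softmax ties all the output neurons together into one probability distribution.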
So if you have a multi-label classification problem with three possible labels, we would have three neurons, each with a sigmoid activation function, and the activation works the same way: if the output of a neuron is greater than 0.5, we assume it fired; if not, we assume it did not fire. For multi-class classification it's a little bit different: we have as many neurons in the output layer as the number of classes, so if you have five classes we would have five neurons, but we use an activation function called softmax. What softmax does is take the inputs to the neurons in the output layer and produce a probability distribution, such that the sum of all those probabilities adds up to one; the neuron with the highest probability is usually the predicted label. So in the case of dog, cat, panda, we get three probabilities, for the image being a dog, a cat, or a panda, and we say the network predicted, for example, dog, if the probability of dog was the largest of the three. For regression problems, we usually have a single neuron in the output layer and we use a linear activation function; that's what we're gonna use in this training, actually. Next, loss and cost functions: what are they and why do we need them?
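The softmax behaviour just described, together with the cross-entropy loss it's usually paired with (defined in the next part of the lecture), can be sketched as follows; the dog/cat/panda scores are invented for illustration:

```python
import math

def softmax(zs):
    """Turn the raw inputs to the output-layer neurons into a probability
    distribution: every value lands in (0, 1) and they sum to one."""
    m = max(zs)                              # subtract the max for stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, y_hat):
    """Loss between a one-hot desired output y and predicted probabilities
    y_hat: minus the sum of y_i * ln(y_hat_i) over the vector elements."""
    return -sum(yi * math.log(yh) for yi, yh in zip(y, y_hat) if yi > 0)

# Invented dog/cat/panda scores; the largest probability is the prediction.
probs = softmax([2.0, 1.0, 0.1])
print(probs.index(max(probs)))   # 0 -> the network predicts "dog"
```

Note that a confident, correct prediction gives a small cross-entropy value, while putting low probability on the true class gives a large one, which is exactly what a loss function should measure.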
During training, for each training example (x_i, y_i), we present x_i to the neural network and the network makes a prediction, which we call ŷ_i (y hat of i). We have to compare the predicted output with the expected output, and we need a way to objectively compare these two and see how far off we are. For classification problems, the main loss function, which measures the difference between the predicted and the desired output, is called cross entropy; for regression problems, there's something called the quadratic loss function, which is also called mean squared error. We're gonna get to each of those in turn. The cross entropy loss and cost functions are defined as follows. The loss between y_j, the desired output for the j-th training example, and ŷ_j, the predicted output for the j-th training example, is defined like this: since this is a multi-class classification problem, the desired and predicted outputs are vectors, so they have multiple elements. We go through the elements of those vectors one by one, multiply each element of the desired output by the natural logarithm (ln) of the corresponding element of the predicted output, and add them up with a negative sign. So that's the loss for a single training example j, but we have m of those; our training set is of size m. The cost function is basically the average of all those losses: we calculate the loss, given by the first line, for every training example, we have m of those, add them up, and divide by m to get the average. That is the cost function, and it is a function of the parameters of the neural network, which are the weights and biases, i.e., all the weights between all the neurons in the network and all the biases for all the neurons in the network. If you have a regression problem, we're gonna be using the quadratic loss and cost functions. So the loss
function between the predicted output and the desired output, when the output is multi-dimensional, works like this: we subtract the two vectors from each other, calculate the length of the resulting vector, raise it to the power of two, and multiply by one half. That's the loss. We have m training examples, and the cost function is just the sum of the losses for those m training examples divided by m, so we have an average. Okay, so now we know how to calculate the loss and cost functions; next we want to know how to update the weights in the network so we can learn. There is an algorithm called backpropagation; it was proposed in the 80s, and multiple people independently came up with it. It's basically a gradient descent technique, in that the goal is to find a local minimum of a function by iteratively moving in the opposite direction of the gradient of the function at the current point. Think of it like hill climbing in reverse: if the slope is positive and you want to find the local minimum, you walk backwards; if the slope is negative, you walk forward. The slope is given by the gradient, and we usually move in small steps so we don't overshoot. The goal of learning is to minimize the cost function given a training set; our cost function, for example for a multi-class classification problem, is given by J, which is a function of the weights W and biases B, so the goal is to minimize that given the training set. The cost function is a function of the network weights and biases for all neurons in all layers, and backpropagation iteratively computes the gradient of the cost function relative to each weight and bias. Say we have a network with multiple layers, and these layers are fully connected, meaning every neuron in one layer is connected to every neuron in the next layer. In this example we have three times four, which is 12, plus six more, so 18 weights, and also
we're not showing the biases here, so that would be five more biases as well; all of these need to be updated by the backpropagation algorithm. We update the weights and biases in the opposite direction of the gradient, because we want to minimize the cost function; the gradients are used to update the weights and biases, and the goal is to find a local minimum. The derivation of the backpropagation algorithm is somewhat involved. These are the slides for the feedforward neural network tutorial, but there is also a written tutorial, and in it I cite a reference that has the derivation of the backpropagation algorithm. It's basically the chain rule from calculus, that's the bottom line, but you have to get creative to carry out all these calculations. I'm not gonna discuss the derivation; I'm just gonna give you the results, and if you're interested you can look at that paper to see how these values are derived. The whole point is that to do backpropagation we define a term called the error, denoted by a small delta, where δ_i^l(j) means the error of neuron i at layer l for training example j. What that means is it's the partial derivative of the loss function we defined up here on the first line, for multi-class classification, with respect to z_i^l(j), where z_i^l(j) is the input to neuron i at layer l for training example j. So that's how we define the error. Then, if you read that paper, you will see that the error at the last layer, the output layer, of a neural network can be calculated by the first formula, which simplifies, on the right-hand side, to basically the predicted output minus the expected output; these are vectors, because multi-class classification is multi-dimensional. So we can calculate the delta at the output layer. Then, with the second formula, we can calculate the delta at layer l given the delta at layer l+1. The way it works is we calculate the error values for the output layer, then we use the second formula to calculate the error
values for the layer prior to the output layer; then, recursively, we use those to calculate the error two layers before the output layer, and we go all the way back to the input layer. As you can see, the errors propagate backward, hence the name backpropagation. After doing this repeatedly, we reach the input layer and all the delta (error) values are calculated. So what good are they? Well, the same paper will show you that the partial derivative of the loss function with respect to a bias is just the value of delta (there's an i subscript missing here, sorry about that), and the partial derivative of the loss with respect to a weight is given by delta multiplied by the activation of the neuron in the previous layer. So, in summary: we define this term called delta, or error; we know how to calculate delta at the output layer by the first formula; by the second formula we know how to calculate the error in the previous layer given the current layer; and using that, starting from the output layer, we can go all the way back to the input layer iteratively, hence the name backpropagation. Having the values of the deltas, we can calculate the partial derivatives of the loss with respect to the biases and also with respect to the weights, and that's the gradient that tells us whether we should increase or decrease each weight in the gradient descent algorithm. Now, there are different flavors of gradient descent. In batch gradient descent, you calculate the gradient for each weight and bias over all of the samples; as I said, our training set is of size m, so we calculate the derivatives of the loss function with respect to the weights and biases m times, then average those gradients, and then update the weights and biases. This is good, but it's slow if we have too many samples: if, let's say, you have 10,000 or 100,000 samples, you have to calculate the derivative of the loss function for all the weights and biases for all 100,000 samples, then average them, then update
them, and then repeat, so this is gonna be very slow. An alternative is stochastic gradient descent, and the idea is that after one training example is presented to the network, you get the predicted output, calculate the derivatives of the loss function with respect to the weights and biases, and use that single gradient, not the average of the gradients over all training samples, to update the weights. This has the benefit of being fast, but that single gradient may not be representative and you may not get good results. The middle ground is something called mini-batch gradient descent: instead of using, say, all 10,000 training examples to update the weights, we break them down into mini-batches of, say, 500 examples each, and after averaging the gradients over the 500 samples in a mini-batch, we update the weights. That's more accurate than using just one sample per update, and it's faster than using all the samples per update, so mini-batch gradient descent is the preferred way to train a neural network. Now, neural networks suffer from a problem called the vanishing gradient problem. As you can see, the second backpropagation equation is recursive: if you look up here at the second formula, the delta at layer l is calculated via the delta at layer l+1. So if you have a network with, say, five layers, we have to repeat this several times: we go from layer 5 to 4, 4 to 3, 3 to 2, 2 to 1, so actually four times. And if you look at the formula, the second term is g_l'(z_l(j)), where z_l(j) is the input and g_l is the activation function at layer l; so we have to evaluate the derivative of the activation function at layer l at that input, and the resulting value is multiplied by the term on the left-hand side. The point is, on the right-hand side of the formula for calculating delta, we have a derivative, and this is a recursive formula, so in order to
calculate the delta in the first layer of the five-layer network I mentioned, we have to multiply four derivatives by each other, and the derivatives of, for example, sigmoid are generally very small numbers. If you multiply several small numbers together, you get a very, very small number, and that is the delta you want to use to update the weights. If the value you use to update the weights is very, very small, the weights don't get updated, and your gradient descent algorithm either does not converge or takes forever to converge. That's a problem, and it's more frequent in networks with many layers. This actually prevented neural networks from being applied to many complex problems and resulted in a loss of interest in neural networks in the late 90s and early 2000s. But since then, much better activation functions have been proposed: for example, ReLU does not suffer from this problem, so you can have a network with tens of layers, and if you use the ReLU activation function for the hidden layers, you can avoid the vanishing gradient problem. Sigmoid and tanh are still used in the output layers, mostly sigmoid, but for the hidden layers of a very deep network with many, many hidden layers, it's better to use ReLU to avoid the vanishing gradient problem. So, in this tutorial we're gonna solve a regression problem: car purchase price prediction. We have a sample data set giving five features of an individual, their age, gender, miles driven per day, personal debt, and monthly income, plus the money they spent buying a car. We want to train a feedforward neural network to predict how much someone will spend on buying a car. Then we're gonna evaluate this feedforward neural network on a test data set, and we're gonna plot graphs to assess the model's performance. The training data set has 723 training examples and the test data set has 242 examples; the test and training data sets are mutually
exclusive, so when we test our model on the test data, none of those examples were actually used in training the model. The input features are scaled to the range between zero and one; that's a common preprocessing step before data is presented to a network. So, these are the slides for the tutorial; the tutorial itself has a references section where you can find references for the material used here, since it's easier to put the references there than in the slides. Some general information: the Galaxy training material can be found at training.galaxyproject.org, covering various bioinformatics and machine learning topics, with many tutorials and many contributors. If you need help, you can go to help.galaxyproject.org. There are also Gitter channels: there's a channel for Galaxy Training and one for Galaxy itself, the main chat room, plus various domain-specific chat rooms. There are also various events, which you can find out about at galaxyproject.org/events; an upcoming one is, well, this one, GCC 2021. So thank you so much. The next video will be the hands-on section for feedforward neural networks, in which we use Galaxy's neural network facilities to create a feedforward neural network to solve this car purchase price prediction problem, which is a regression problem. I'm gonna be seeing you there soon; thank you. Hello everyone, I'm back. We're gonna be doing the hands-on section of the feedforward neural network tutorial. What you need to do is go to training.galaxyproject.org, the main website for Galaxy training. Scroll down to Statistics and Machine Learning, click, and there's Deep Learning Part 1: Feedforward Neural Networks; if you click on this monitor icon, you're gonna be taken to the tutorial. We already went through the slides in the previous video; here we're gonna do the hands-on section. We're gonna click on Get Data, and then we're gonna solve a simple regression problem. Just a note: you
could download the workflow used in this tutorial here, then import the workflow into Galaxy and run it. However, I'm gonna start from scratch here; I'm not gonna use this workflow. But if you run into any issues, it would be a good idea to use the workflow first, look at what it's doing, and figure out what you're doing wrong. Okay, so let's get the data. This is the section of the tutorial on getting the data: I'm gonna copy the Zenodo links for the data we want to upload to Galaxy. You have to go to usegalaxy.eu, the Galaxy server for Europe; there's also usegalaxy.org, but I think some of the tools are installed on the EU server and not on the ORG server, so let's stick with Galaxy EU. You need to register; I'm registered already, obviously. After that, the page has three panels: on the left-hand side we have the tools, this is the main panel, and on the right side we have the history. We start by creating a new history; you do that by clicking on this plus sign up here and then giving your history a name, so we can refer to everything we do as part of the hands-on section of this tutorial. I'm gonna call it "GCC 2021 feedforward neural networks". So that's that. As you remember, I copied the URLs of the files we need for this tutorial, so we're gonna go back here, click on Upload Data in the top left corner, and we get this page. I'll click on Paste/Fetch data, paste the links here, and hit Start; this is going to fetch the files from Zenodo. These are four files, two for training and two for testing: one file is the feature vectors and the other is the labels, so we have features and labels for training and features and labels for testing. Looking at my notes, we have 723 training examples and 242 test examples; that's the size of our training and test sets. As you can see, the jobs went from gray to a kind of orange and then green: they went from waiting to be executed,
they were executed, and now they are finished. Next we have to rename these files: you click on this edit attributes button here, and what we do is remove the extension and the URL that get automatically included in the name of the file, and then save. The other thing we do is check the data types; sometimes the data types are not detected correctly. All of these files are of type tabular, so select the new type, tabular, and click Change Data Type. We're gonna repeat this process for all the other files. And just so you know, everything is documented in the tutorial, so if at any point you don't understand something, you can pause the video, go to the tutorial, and look it up: this tells you how to create a new history, this tells you how to rename a dataset, this tells you how to change the data type, and so on. I'm just following the instructions in the tutorial. So let's rename and change the type of the remaining files: I get rid of the extension and the URL, save it, change the data type to tabular, and do the same for the next one, and finally the last one. Okay, so two of the jobs are still running; well, one completed, we'll wait for the last one. So x_train is the training feature set and y_train is the training labels; similarly, x_test and y_test are the input features and labels for testing. We're gonna use the training data to train our model and the testing data to test and evaluate it. What we're gonna do next is follow the tutorial, and there are multiple steps. Step one is to create a deep learning model architecture, so I'm gonna do that on the Galaxy page. Almost all of the tools we're using, all of the tools Galaxy has, are on this panel, and you can search for them; that's one way. The other way: I know all of these tools are under the Machine Learning header, so if you scroll down you see the Machine Learning header here;
you click, and you see all of the tools that you need for machine learning. Okay, so let's go back. The tool that we need is "Create a deep learning model architecture with Keras", so I'm going to find that. This is actually a good point to mention that Galaxy has wrapped Keras under the hood for all of its neural network facilities, and Keras is basically a higher-level library on top of TensorFlow, which is a Google library. Keras has a very nice interface: the API is very clean, it's easy to use, and it doesn't take a lot of code to get a lot of things done. There are other libraries; PyTorch is also very popular, but I really like Keras because it's very concise and I like the way they've designed the library. Anyway, here we are. We're going to pick version 0.4.2 of this tool, which is done by clicking on this versions button and selecting 0.4.2. The model type is obviously sequential, since we have a feedforward neural network. The input shape is 5, because we have five attributes in our dataset; looking at my notes, they are age, gender, average miles driven per day, personal debt, and monthly income. These are the inputs to our model, and the output is how much a person spent buying a car. So the inputs for training are given in X_train (those five attributes) and the outputs in y_train (the amount spent buying a car); that's where the input shape comes from. Here we define a layer, and if we have multiple layers we can click on the plus "Insert Layer" button to add a new one. For the first layer, choose the type of layer as "Core -- Dense"; it's fully connected, that's fine. This layer is going to have 12 units, and for the activation function we're going to pick ReLU. Then we insert the second layer: it's also going to be Dense, the number of units is going to be eight, and the activation function is going to be ReLU. And finally we're going to add the output layer.
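Before filling in that last layer, it may help to see the whole architecture this section builds, sketched in plain NumPy as a stand-in for what Keras assembles from the config: 5 inputs, Dense(12) with ReLU, Dense(8) with ReLU, and a single linear output unit for regression. The random initialization here is arbitrary, just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [5, 12, 8, 1]  # input features -> hidden 12 -> hidden 8 -> 1 output

# Random weights/biases stand in for Keras's initializers.
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(x):
    return np.maximum(0.0, x)

def forward(x):
    # Hidden layers use ReLU; the output layer is linear, as usual for regression.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]

# Total trainable parameters: (5*12+12) + (12*8+8) + (8*1+1) = 185
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)  # 185

# One predicted car price per input row
print(forward(rng.standard_normal((3, 5))).shape)  # (3, 1)
```

Training will adjust those 185 numbers; the Galaxy tool only records this structure as JSON at this point.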
It's going to be "Core -- Dense" again for the type of the layer, and the number of units is going to be one, because this is a regression problem: in a regression problem we have one neuron in the output layer, and the activation function is linear. So let's select linear here; if we can't find it... oh, there you go, "None or linear", okay. So we are done defining our neural network architecture and everything looks fine, so we click on Execute, and this starts a job called "Keras model config". It's gray while it's in the waiting state; when it turns yellowish it means it's running, and when it's green it's complete. That's the color coding for Galaxy. This model can be downloaded as a JSON file. While that model is building, I'll show you that if you click on this eye icon you can view the data. If you click here, this is our input feature set for training; as you can see it includes age, gender, miles driven per day or month, the amount of debt, and the amount of income. And if you look at the labels, each is just a normalized value, so I don't know, the first line could represent $32,000 or whatever; it's a normalized value. Okay, so we have this model, and as you can see it's a JSON object. Next, if you look at the tutorial, we're going to create a deep learning model. What that does is take the JSON file we just created, the model architecture, and we have to specify a loss function, as we discussed in the lecture, plus an optimizer, some metrics, and a few other parameters. So I'm going to go here, and again I know that all the tools are under machine learning, so if you scroll down on the left to the machine learning tools, I'm looking for "Create deep learning model with an optimizer, loss function and fit parameters". You click there; let me get my notes here just to make sure I don't deviate from the tutorial. So we do a "Build a training model"; we'll leave
it as it is. Here it says "Select a dataset containing model configuration"; that's the JSON file that was created by job number five, and it's been pre-selected correctly, so that's fine. For "Do classification or regression", the default is KerasGClassifier; because we're doing a regression problem, we have to change that to KerasGRegressor. Next is selecting a loss function: as we said, for regression we use mean squared error, so let's select that. Then select an optimizer; we're going to pick the Adam optimizer. I think Adam should be the preferred one because it has two benefits over the vanilla optimizer. One is that it uses momentum: the step you take also depends on the steps you took in previous states, weighted so that steps further back in time count less. The other is that it uses different learning rates for different dimensions of the search space. Those are the two benefits of Adam, and it's pretty good. For the metrics we can select mean squared error again, and then we pick the number of epochs and the batch size. An epoch is one pass through the whole training set. We have 723 examples in our training set; after using all 723 examples we can use them again, and it still improves the performance of the model, so the number of epochs basically tells us how many times we're going to go through the training set. I'm going to say 150; we have a very small dataset, so that's not going to be a problem. The batch size controls how often we update the weights, the parameters, of our model. Usually we don't want to use the whole training set to do one update of the weights, because that would be too slow, so we're using mini-batch gradient descent, and for the batch size we're just going to pick 50. So that's done too; we click on Execute, and that's a model
builder job that takes the JSON file from the previous step plus all the new parameters we specified and builds a model builder object. While this is building, let's look at the tutorial: the next step is to run deep learning training and evaluation. Let's see if this is complete; okay, it just started running. While that's running, I'm going to look for the deep learning training and evaluation tool. Again, I know all of these tools are under the machine learning header, so let's see if we can find "Deep learning training and evaluation", which we do; it's right here. Let's wait for the model builder to finish. Okay, the Keras model builder job just completed, so we now do the next step, which is deep learning training and evaluation; you can find the tool under the machine learning header, it's here. Let me look at my notes to make sure I'm not doing anything differently; I think I may have picked the wrong tool at first... okay, sorry, I got my notes wrong. For "Select a scheme" under "Train and validate", we leave it as it is, and the next step is "Choose the dataset containing pipeline/estimator object"; that's the output of the Keras model builder in step number six, up here, so we'll just leave it as it is. The input type is tabular data, obviously. The training samples dataset is going to be X_train, so we click here and select X_train; that's the feature set. Does the training data have a header? It does, so we change this to yes, and we say we want all the columns. Now let's worry about the labels: the labels dataset is y_train, and that's the correct one, it's been pre-populated. It does have a header, so we change that to yes, and again we select all columns. I think that's it; we can click Execute, and this is going to train and evaluate our feedforward neural network model. Let's wait for this to complete. Okay, this job
completed; I paused the video while it was running. Anyway, we get three things: one is the model, another is the weights of the model, and the third is the metric. If you look at the metric, that's the mean squared error. I have to double-check what this is, because it should not have been negative (most likely it's scikit-learn's scoring convention, which reports a negated mean squared error so that higher is always better, but I'll verify that). Anyway, these are the evaluation result, the model, and the weights of the model. Back to the tutorial: the next step is model prediction. What we did so far was take a training dataset, provide it to our neural network, and use it to update the network's weights. When the training is done, we have a neural network that ideally is able to predict the price of buying a car given the five attributes of an individual, like age, income, debt, et cetera. So now it's time to test this model: we pass the test data to the model, compare the model's predictions with the expected outputs, and see how the model does. Let's do that; the next job is model prediction. Again we go to the Galaxy page, click on machine learning, and look for "Model Prediction"; it's up here. Model prediction takes a few parameters. The first one is "Choose the dataset containing pipeline/estimator object"; that's the result of job number eight, and it's pre-populated correctly. The second one is "Choose the dataset containing weights for the estimator above"; the weights are the result of job number nine, so we pick that from the drop-down. Next is "Select invocation method"; we want to do a prediction, so we leave it as it is. The input data type is tabular. For the samples dataset, careful, this is not training, this is the test set that we want to use, so that would be X_test; we pick it from the drop-down, it does have a header, and we select all columns. So we're ready to execute this, and this
would basically run the model on all the data in X_test, see what the model predicts, and provide us with those values. It's running now, because the color changed on the job... and it completed. As you can see, the model prediction job completed; if we click on view data, we have our model's predictions for the car prices. What we want to do now is plot the output of our model to get an idea of its performance. If we go to the tutorial, the next step is to plot actual versus predicted curves and residual plots. Almost all of the tools we've used so far were under machine learning; if we scroll down, there is this statistics and visualization section, with headers for statistics, machine learning, and graph/display data. This plot tool is under graph/display data: if you scroll down you find "Plot actual vs predicted curves and residual plots of tabular data". You click there and select the input data file; this is what we're comparing against, which is the labels for the test data, y_test. The predicted data file is the output of the previous step, right here; it is pre-selected correctly. So we click Execute, and this is going to create three plots. The first plot is the true versus predicted values plot, the second is a scatter plot of true versus predicted values, and the third is the residuals versus predicted values plot. We'll let these three jobs complete, and then we'll go over all three graphs one by one. The first one is true versus predicted values: the true values are drawn in blue and the predicted values in orange, and the more overlap you have, the better; if they completely overlap, it means our predictions are 100% accurate. The second graph we get is the true versus predicted values scatter plot: on the x axis we
have the true values and on the y axis the predicted values, so if our predictions were 100% accurate, we would get a line with a 45-degree slope. As you can see, we're slightly off, which means our predictions are not 100% accurate. The root mean squared error for our neural network is 0.11; obviously, if it were zero, that would have been a perfect neural net. Our R-squared metric is 0.87; both numbers are given up here. For the R-squared metric, the closer it is to one the better, and at 0.87 we're pretty close. The last graph has the predicted value on the x axis and the residual value on the y axis. What is the residual? It's the difference between the predicted and true value: if the predicted value is, say, 0.6 and the true value is also 0.6, you get a point on the y = 0 line. The farther the points are from the y = 0 line, the worse we are, and the closer they are to it, the better. Finally, the conclusion. In this tutorial we discussed the inspiration behind neural networks; explained the perceptron, one of the earliest neural network designs, which is still in use today; discussed different activation functions, what supervised learning is, and what loss and cost functions are; and covered the backpropagation learning algorithm, which minimizes the cost function by updating the weights and biases in the network. We implemented a feedforward neural network in Galaxy to solve a simple regression problem: predicting the purchase price of a car from a dataset that we uploaded. This completes part one of this three-part series, on feedforward neural networks; the subsequent tutorials cover recurrent neural networks and convolutional neural networks. Okay, thank you, and see you soon.
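As a quick appendix: the RMSE and R-squared numbers quoted for the scatter plot, and the residuals in the last plot, are all simple to compute by hand. A minimal NumPy sketch follows; the y_true/y_pred arrays here are made-up illustration values, while the 0.11 RMSE and 0.87 R-squared came from the actual Galaxy run.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: 0 means a perfect fit.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    # R^2: fraction of the label variance explained; 1.0 is a perfect fit.
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([0.2, 0.4, 0.6, 0.8])   # made-up normalized car prices
y_pred = np.array([0.25, 0.35, 0.65, 0.75])

print(rmse(y_true, y_pred))        # 0.05
print(r_squared(y_true, y_pred))   # 0.95

# The residual plot shows y_pred - y_true against y_pred;
# points on the y = 0 line are exact predictions.
print(y_pred - y_true)
```

A prediction equal to the truth gives a residual of zero and R-squared of exactly 1.0, which is why the residual plot's y = 0 line marks perfect predictions.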