 In this video, I want to show you how to create a deep neural network for regression problems. So we're in the desktop version of Mathematica here, and this is what I want to discuss with you. We're going to have a little introduction. I'm going to talk about the data and I'm going to show you how to use your own data. So that data exists in a spreadsheet and how to import it and transform it, prepare it for use inside of a deep neural network, a densely connected deep neural network using the Wolfram language. It is a bit different from using curated or resource data that you might find online that's already been prepared for you. That's just not the norm. The norm is for you to have your data in some form of database which you can extract into a flat file such as a comma separated values file that you can then use yourself. So we'll go through that. I'll show you how to create this model and how to create it so that it can be used in a regression problem. Before we looked at a classification problem and there we have a target variable that is categorical in nature. Here though we have a numerical variable as our target variable and we need to make a few changes so that we can use that data to create predictions using the Wolfram language. So in the introduction, remember this is really a series just for domain experts. So you're not a computer scientist, although I suppose you could be, or mathematicians, statisticians, et cetera, but you have domain knowledge, business, health care, it doesn't matter what it is, you have domain knowledge and you want to use deep neural networks and machine learning using the Wolfram language which I think is a fantastic space, a fantastic language to use to create your models in. So as I state in the introduction, really this is about the target variable being of a numerical data type. And we're going to have to do a few things different. A few things will have to be done differently so that we can use this numerical data. So let's import the data. Now all of this is going to be available on GitHub. I'll put the links down below. So we're going to import this simulated data set. It's called regressiondata.csv. The data set does not contain any column headers. So the first row doesn't contain the variable names or column names. So it's easy just for us to import a tray at the top here, though I'm setting the directory to the notebook directory because this file, the CSV file, lives in the same folder as this notebook. So let's execute that. So we're setting the directory and we can, we just refer to the data file here. So let's import that and I'm creating this computer variable, this object data set to hold the CSV file. Now let's use the table form suffix here. Remember, if you put these two forward slashes and then a Mathematica or Wolfram language function, it actually, it's just the same as putting that in the front and then putting this as an argument inside of square brackets. So I'm using the double square bracket notation here and the semicolon, semicolon five means up to row number five. So let's put that in table form and have a look at the first five rows. So we note the 11 columns here and this data set is set up such that these first 10 columns are our feature variables and then this last one is our target variable. And notice that it is to at least one decimal space, this is a continuous numerical variable and that's what we're trying to predict here. Note also that these values here, they are on totally different scales. We see feature variable number 10 here, we see 13, 14 versus if we look down here, say to the second one, it's 0.24, so very different scales and we know that if we want to help the gradient descent process, we would need to put these all on the same sort of scale. But before we get there, let's just look at the dimensions of our data set calling the dimensions function, passing the data set as an argument and we see we have 4898 rows over 11 columns including our target variable. So that's a small data set. We also only have 4898 of these samples, so not a big data set at all. Let's just have a look at just getting familiar with the data that we are dealing with. One way to do that is to create a histogram of each of our 10 variables and the target variable. And to do that, I'm just going to iterate over all of them using the table function. So remember the table function has my first argument here, which tells me what I want to do and then also what I want to iterate over. And I want to iterate over this placeholder variable i and I'm using the defaults. So I'm not saying 1 comma 11 comma 1 to say that I want to start at 1, go up to 11 and go up in steps of 1. If I just put the value where I want to end, it assumes the defaults. I'm going to start at 1, go to 11 in steps of 1. So 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. And that will be my 11 columns. So what I want is the data set. These two semicolons, remember that means all. So all the rows comma column i. So that's going to be all the rows for column 1, all the rows for column 2, all the rows for column 3, et cetera. Of my data set. And I want to create a histogram of that. So let's go. And there we can see the distribution of my 10 feature variables and my target variable right here in the end. You can see the target variable, almost this sort of a normal distribution there. Now we need to split our data into a test set and a training set. And because this is such a small data set, I'm not going to just use 20%. I want my test set to be representative of the data. And then a small data set like that, if I use a very small percentage, say 10%, it might be that I introduce a bit of bias that my test set is not representative of the actual data sets. I'm going to use a slightly larger number and I'm going to use 20%. And let's use this take drop function that we've seen before. So to take randomly, to take 80% of that, I just need to know of my data set. And we saw there 4,898, what represents what integer value represents 80% of that. And for that, I use this floor function. So first I'm just going to store the length of my data set that 4,898 has a variable. And then I'm going to take the floor of 0.8 times N. And see, there's a space there between 0.8 and N. If you put a space in there, the Wolfram language sees that as a multiplication at space. So 0.8 times N, now that might give me a fraction, depending on what N was. So I take the lower value of that, the lower integer value of that with a floor function. So let's do this 80-20 split. We're going to use a random seed value of 1,2,3 so that if you run the code, you're going to get the exact same split of your data. Take drop is going to give me two variables and I put them inside of a list. I'm going to call them data train and data test. And the take drop is going to take whatever number I specify here, which is 0.8 of all the samples. It's going to put them in the data train and whatever remains it's going to put in data set. And then I just want to show you what the dimensions are of these two. So let's run that. And it states that 3,918 made it into the training data and 980 made it into the test data. And I'm hoping that 980 is representative of the data. Now let's split this into our feature variables and our target variables. And I'm going to do that both for train and for test. So there's data train, data train there, and data test, data test there. And all I'm doing is instead of two semicolons, you can also use the key word all. So all the rows in columns 1 through 10 of data train, that's going to go into an object called X train. And then all the rows in column 11 of data train is going to go, that's 80% of the data remember, into Y train. And we use that X and Y to be feature and target, to represent feature and target. And the same for the test data. So I've got my four data sets now. Now remember we mentioned that these feature variables are not on the same scale. So let's normalize them. Remember, you will see the word normalizing inside of literature for deep neural networks quite a lot. But it's just a generic term because there's lots of ways of normalizing your data. So in this instance, we're actually going to, the proper word for what we're going to do is to standardize it. So we're going to change it. So the actual data point values for each one of the feature variables so that we have an overall mean of zero and a standard deviation of one. So we're going to transform all those data points. And you can see the equation there. You take every value you subtract from that, the mean for that variable and divide it by the standard deviation for that variable. So that's standardized. So I'm going to take my X train and I'm going to create a new object called X train standardized. You can call it, of course, what you want. And I'm going to map. There's the map function. The function standardized to the transpose of X train. Why do I want to do the transpose? I'm changing rows into columns. So instead of this long 10 columns and 3900 and something odd rows, I'm changing that into 3900 and something odd rows and only 10 columns because the standardized works on rows. So I'm changing each column into a row and that will map the standardized function into each of the rows. But remember I wanted to be transposed again so that I have 10 specific values as far as 10 specific columns as far as my original data was concerned. And then just to prove that the standard deviation, that we are going to have a mean of zero and a standard deviation of one, I just want to print those out. So let's have a look at that. And we see the means now that says times 10 to the power negative 16, negative 17. So that's just rounding errors. I mean that is such a small value. It actually just is zero. And we see the standard deviation of all 10 feature variables are one. So I've transformed that data so that each one of those 10 variables are on the same scale with a mean of zero and a standard deviation of one. Now we need to do the same to the test data. So I'm going to create X test standardized. But remember it has to be standardized according to the original mean and original standard deviation of each of the training data's variables. So I've actually got to implement this little equation here. And the way that we're going to do that is as we did before, I'm going to eventually just transpose the whole thing because I am going to have to do this in a specific way. So I'm going to take X test all the rows for the first column minus the mean of that column divided by the standard deviation of that column, and I'm running through all the columns one through 10. And just to prove that all of this works, have a look at it, is I'm going to call the mean and standard deviation, which is going to be slightly different from zero and one for all of those. So we can see close to, close to, close to, but there is a difference between the test data and the training data. And I'm using the training data's mean and standard deviation, original mean and standard deviation before the standardization. To subtract that mean and divide by the standard deviation for each and every data point in my test set. And that's why I get values that are different from zero and one. Now let's prepare this data so that it's actually in a format that our neural network here in the Wolfram language can use. So I'm going to create two variables, computer variables, train and test. And what we're going to do is use the table function once again, and I'm just going to create this format. Each sample is going to be a nested list. So I'm going to have the first, second, third, fourth, up until the tenth value. So that's one for each and that for each of the rows of my feature variable set. And then this arrow sign and then the target value. So let's do that. So I'm going to iterate again just over all of these rows for extreme standardized. So it takes the row, all the columns, to the appropriate training values. Training sets value there. So let's execute that so that you can just see and I'm going to look at that first sample. This is just a sanity check, the dimensions. So to show that I have 3,918 rows of data there, nested the lists, I could say, and look at that first one. So you can see there's this outside, this very outside curly braces. And then this inside curly braces set here, this and the arrow and the four, that is exactly what we have here. So we're saying all of the first 10, the first samples, 10 feature variable values, 1, 2, 3, 4, 5, 6, 9, 10, all of those, and then arrow to whatever the target variable was for that first sample. So if we go way back up where we looked at this first, 5.8, so and so and so, and then eventually goes to 6.7. That's what we have. We have one row and it goes to 6.7. Now that is not the first row. Remember we have there because we had a random split of the data into an 80-20%. So it's not necessarily that the first one made it into there. And we can see this first one mapped to a four, so it definitely wasn't that, an original first row. So let's do the same for test so that it's in exactly the same format. This nested list, where we have each one being a nested list of the 10 values and then this little right-facing arrow, which remember is this minus and greater than, and the target value. Let's move on to creating the model. That's more exciting. So we can have three hidden layers and an output layer. And I'm going to keep it very simple. Well, I suppose it's a overdoing things here a little bit. And I'm going to use the syntactic sugar for my linear layers by just, instead of writing linear layer and then putting it in the 20 inside of square brackets to note that this is a linear layer, I'm just going to use the syntactic sugar to say that that's 20. Now I'm going to do something a bit different from the previous video I made for the Wolfram language, where we had the classification problem and then I'm going to do the initialization step different, separately from the creating the network. And because this is just a densely connected neural network, I'm just using the net chain because I'm just stacking up one deep layer after the other. So I'm going to have my first deep layer with 20 nodes and then the activation function and that is going to be the rectified linear unit activation function and the code word for the word for that, the function name for that inside of the Wolfram language is just a ramp. Another deep layer with 20 nodes and an activation function, a Rayleigh activation function, another 20 and then a ramp and then just one. Just one, my output layer is just a single node because remember all I want is a single number and it doesn't have an activation function, you just want that actual number. It's like linear regression, I just want that number. I've got to state that the input is of size 10 so that's going to be my vector of my input variables, remember there are 10 feature variables so input is 10 and I've got to specify that my output must be a scalar. So that one node must be a scalar. I've got to state that and this is a regression problem, so of course this is what I want. So there's my net chain, it'll take a second or two just to create that and we see now it is an uninitialized and we see the input vector of 10, my linear layer with 20 nodes, my activation function, linear layer activation function, linear layer activation function, a linear layer with a one output which is a scalar, one node output which is a scalar. So let's initialize this net and I'm going to use exactly the same computer variable, object net. So net initialize, I'm going to say net and then instead of just using the random initialization I'm actually going to state a method here, I want the random method the random method allows me some sub-options and I'm going to use the weights sub-option for random which allows me to state a distribution and I'm just using shorthand here for the normal distribution by not stating the zero of the mean but only the standard deviation and I want a standard deviation of 0.01. So this is going to be, my weights are going to be initialized according to a normal distribution with a mean of zero and a standard deviation of 0.01 and I want my biases to be initialized to all being zero. And there we have, now we have an initialized deep neural network. Just to have a look at these weights I'm going to create a histogram of all my weights I'm going to flatten them because remember we're talking about a tensor here. So I'm going to extract from the network my weights in my first layer there and I'm stating my bin size there so that we can see it really is a normal distribution and we can well imagine that the standard deviation is 0.01 and I've shown you the bin size I've set the bin size there so that the data actually fits nicely into this histogram. Alright, so let's train the model remembering that this is a regression problem. So I'm going to create the object called trained net train is our function so I'm going to pass my network which is now initialized my training data and I'm going to specify a loss function and the loss function that I'm going to to specify is the mean absolute loss layer I could also choose mean squared loss layer as my loss function and remember that is this going to be this difference somehow this difference between the expected value the actual value and the predicted value because my data size say the small I'm going to run a very small batch size only 32 it's so small I needn't have even done a batch I can run an epoch make an epoch being the full batch but the mini batch size I'll set to 32 my target device is going to be my GPU if you don't have a GPU in your system this is a small data set just run it off of the CPU or just leave out this argument completely and the method that I want to use as my optimizer for gradient descent in this instance is going to be rmsprop just rmsprop and my epochs I'm going to run over 25 epochs so let's execute this and I'll pause the video so that you don't have to wait that long there is the training it is running on a GPU so it's going to be fast it's a laptop GPU it is an Nvidia GPU and so it does work and there we have this trained now we've got to test our model now as before when we looked at classification we had a classifier measurements function and we could pass the trained network and the test data as arguments to that at the time of the recording though the comparison for regression problems which is the predictor measurements function does not work the same way as the classify function so unfortunately I can't just pass trained, the test data then it just won't work I have to do things manually if you want to use the predictor measurements function to create that object which will have all those properties that we can expect you'll have to use the predict function and I'm going to show you how to do that so what I want to do is to use my trained model and I'm going to pass my X test standardized values to it and it's just going to create for me a list of all the predicted values of that 20% of the data which remembers my test data it's going to just create a predicted value for each of those and I'm going to store that in an object called predicted and now what I want to do is just create little pairs so I want a pair of Y test sample number I comma predict I so that I have these pairs here put them inside of quotation I should say curly braces and I'm just going to run from I to the length of my test data so now I have these pairs let's quickly have a look at the first pair so that the actual value was 6.9 and the predicted value according to my deep neural network is 5.4 so across all the test data I'm going to create these I'm going to have these pairs because that allows me to create a list plot so my list plot is going to be this pair that I've called actual predicted I'm putting in a plot label I'm putting my axis origin at 2 comma 2 in my aspect ratio to 1 so that my so just so that we see this this is an actual factor scatter plot we call it a comparison plot in deep neural networks I'm just putting the axis origins in aspect ratio there just to make the plot come out nicely and my axis labels are going to be actual and predicted let's have a look at that so there we see my actual values here and my predicted values on this side so if you run this you're going to see something different so let's just see how accurate this was and one way to do this is just to have look at the mean absolute difference so I'm going to take the difference between each pair and because some are going to be large or small I'm going to have positive and negative layers so I'm going to get the absolute difference the absolute value the difference between each pair and I'm just going to get the mean of all those differences and that's the mean absolute loss so that's one measurement of how accurate my predictor is my model is so what I need to do is to subtract the values in each of these pairs and to do that I'm going to create another set of pairs I'm going to call my an actual negative predicted because I'm taking the actual value and then negative the predicted value it's because I want to take I want to add those two values there's no way for me to subtract them from each other the pairs but if I use plus and the three at symbols what the three at symbols will do it'll take each nested pair remember I have a nested list a big list this actual negative predicted and it has a bunch of nested small little pairs inside of it but I want an addition of those two values because the second one is now being negated with a negative there this is actually a subtraction that I'm performing and I want to take the absolute value of each of those differences and then the mean of all of that so let's do that and I see that I have a mean absolute error here of 0.62 on values that ran from about 2. something till 9. something that's not too bad that's fairly accurate for this small data set that we have so that's it how to create a deep neural network from your own data and as far as the regression problem is concerned I promised you I just want to show you the predict function it goes with the classifier function now what the Wolfram company has done through the Wolfram language and inside of our coding environment Mathematica is just to automate machine learning and they have the predict function for regression problems and the and the classify function we see down here the classify function if it's a classification problem and all I have to do is call predict and just pass the training data and the training data remember isn't that correct format where we have the 10 values and a little arrow and then the target value and I'm going to change one of the defaults because there's actually quite a few arguments have a look if I hover over there and bring up all of this information on the predict function you know the documentation for the Wolfram language it is absolutely excellent so one of the defaults I'm going to change is just the performance goal and I want that to be set to quality and now it's going to run this and you can see it's going to change nearest neighbors random forest linear regression back to random forest and so it's going to try a few machine learning algorithms to create this for us and there we see it's settled on a random forest training on our 3918 samples there and it gives me a predictor function and this p1 which is now a predictor function I can pass to predictor measurements whereas I could pass my model to classify classifier measurements for predictor unfortunately I have to use this predict function I'm sure that will change in the future and then I'm also passing the test function because we need to see how accurate we were now we have this predictor measurement object it has a bunch of properties as you might expect and we can look at most of them so it used a random forest machine learning algorithm here let's do a comparison plot now that's going to be as we did before on our own the scatter plot so we have the predicted values and the actual values there and instead of the mean absolute error let's look at the mean square all done by with default values the Wolfram language decided what it was going to implement all automated for you and we have a mean square error of 0.5 to quite good now I can actually specify a method but I I just have to say neural network again it is going to decide how many layers it is going to decide how many nodes it's going to make a lot of decisions but instead of specifying linear regression or random forest or canerous neighbors there's quite a few you can choose from I'm going to specify that I want to neural network but I'm not telling it anything I'm letting the Wolfram language decide on everything that's going to run for a while as I say this notebook and the data will be uploaded to GitHub there's a link run this file yourself and see and see what you get so in short that is a deep neural network with your own data as a regression problem is concerned give the Wolfram language a try with with your with your deep neural deep neural network or machine learning task you would find that it is a fantastic language to use if you want to learn more about the Math America or the Wolfram language I've got two courses for you on the Udemy platform and I'll put the links to those below as well