Welcome to this video on creating a densely connected deep neural network using Mathematica, or the Wolfram Language. If this is just a random video that you stumbled upon, please note that it is actually part of a larger set of videos, a playlist here on YouTube, where I look at deep neural networks using Keras with a Google TensorFlow back end in the R programming language. In my research unit, though, I make extensive use of Mathematica and the Wolfram Language, and I thought I'd pop in this video to show you how easy it is to create a densely connected deep neural network in Mathematica. So we're going to run through some of the steps that you have to take.

There are videos online about using Mathematica for deep neural networks. A lot of them are for convolutional neural networks with images as input, and it's not easy to transfer what is taught there to something that you want to create yourself with data that's on your own hard drive, data that might be in a comma-separated values or spreadsheet file. How do you prepare that to put it into a network?

I'm going to assume that you know Mathematica, that you know the Wolfram Language. If not, there will be some links in the description below. I have two courses on Udemy that you can take to introduce yourself to Mathematica. It is a fantastic coding environment, and the Wolfram Language is a fantastic language. It has knowledge built into the language, which makes it quite unique: you can write lines of code, the data will be retrieved from the Wolfram servers, and you can compute with that data. I urge you to have a look at the Wolfram Language. There is also a cloud version: you can open a free account on the Wolfram Cloud and code right inside your browser. What you see here is the desktop version of Mathematica. Now, I'm going to add this notebook that I have here.
It's a notebook similar to what you might see as an R Markdown file or, in Python, a Jupyter notebook. This notebook file I'm going to upload to Udemy, and I'm going to make a few extra videos on machine learning and deep neural networks for both of my courses on Udemy. So this is part of that larger effort. At the beginning of this file you can see a bit of explanation about deep neural networks. Since this is part of the current playlist on YouTube, I'm just going to show you the code. I'm going to assume that you know a bit of Wolfram Language code and that this should not be a problem for you.

The first thing that I'm going to do is set my directory to the notebook directory. That means that if I put all the files in the same folder as this notebook, it's easy for me to import them. This is a common feature in many languages. I hold down Shift and hit Enter (or Shift and Return) and that line of code is executed.

Now, the data set that we're going to work with contains 29,963 samples over 13 variables: 12 feature variables and one target. The target variable is coded as 0 or 1, so this is a classification problem. Because the samples are labeled, this is of course an example of supervised learning. It is a very difficult data set. There are many overlaps in this data, so it is very difficult to extract a good model from it. I'm going to import it with the Import command. This data set does not contain any column headers, so I can just import it as is. There we go. Let's have a look at the dimensions. We can see there that it's 29,963 rows over 13 columns.

Let's use the Table function to draw a histogram of each of the variables. I only want the feature variables, columns 1 to 12, so I iterate my table from 1 to 12 and take all the rows in each column. What we're going to see is these 12 histograms. So let's look at the 12 we have.
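Here is a minimal sketch of the import and histogram steps just described. The video does not name the data file, so the file name in the comment is hypothetical, and a small synthetic matrix with the same 13-column layout stands in for the imported data so that the cell runs on its own.

```wolfram
(* In a saved notebook you would point the working directory at the notebook's folder:
     SetDirectory[NotebookDirectory[]];
     dataImport = Import["data.csv"];     hypothetical file name *)

(* Stand-in data (an assumption): 1000 rows, 12 feature columns plus a 0/1 target column *)
SeedRandom[123];
features = RandomVariate[NormalDistribution[], {1000, 12}];
target = RandomInteger[1, 1000];
dataImport = Join[features, Transpose[{target}], 2];

(* Rows and columns of the data *)
Dimensions[dataImport]

(* One histogram per feature column, columns 1 to 12, using all rows *)
Table[Histogram[dataImport[[All, i]]], {i, 1, 12}]
```

With the real file, Dimensions would report the 29,963 rows and 13 columns mentioned above.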
There seems to be a normal distribution for my first feature variable, and again a normal distribution for the second. Then this kind of uniform distribution over zeros and ones, so that was categorical. This was categorical. This was categorical. Feature variable number 6 here seems to be some form of chi-square. Another one seems like it's decaying exponentially, but it might also be chi-square. We see a more uniform distribution here, a categorical variable here, another chi-square, another normal, and what looks like a sort of uniform distribution there. So you can have a look at the distribution of the data point values for each of these features. Just some interesting information.

What we want to do now is split the data into a training set and a test set. I create this variable n, which holds the sample size, the 29,963 rows that I have. Of that I'm going to extract only 10% as my test or validation set. So I create this object called train n, which is 0.9, or 90%, of the data, and I use the Floor function because I want it to round down. That says that 90% of the data is 26,966 rows.

Next I use a random seed and set it to 123, so that when you run this code you'll get the same split into training and test sets. Then I use this easy way to create two variables in one go: pass them as a list, so they go inside curly braces. And I use the TakeDrop function. TakeDrop takes the list that I give it as the first argument and cuts it in two. The first part contains as many elements as the second argument specifies. So we're passing it all the data and we're saying: split it in two and put 26,966 rows in the first part. That goes to the first variable, which will be data train.
And whatever's left goes in the second one, which we'll call data test. I'm also using the RandomSample function, and instead of putting its argument in square brackets I use a bit of syntactic sugar, the @ symbol. RandomSample shuffles the rows of data import; TakeDrop then puts the first 26,966 of those shuffled rows into data train and whatever remains into data test. So I have this very nice random split of my data. Let's run that, Shift-Enter or Shift-Return. Looking at the dimensions of each of these, I indeed see 26,966 rows and then the remaining 2,997 in my test set, both over 13 columns.

Now we've got to separate the features from the target. So I'm going to create X train and Y train, X test and Y test, and I use the double square bracket notation. Remember, to get it in this nice form, hit the Escape key, then the left square bracket twice, then Escape again. It says all the rows in columns 1 to 12, so those are my features, and then all the rows in column 13, which is my target. I do the same for test and train.

Let's have a look at a histogram of the distribution of the target class, because I want to make sure I don't have any class imbalance. There we go. We see the training data and the test data, and for both of them the distribution is roughly equal. We see zeros and ones here at 13,538 and 13,424. So we're not sitting with a class imbalance.

Now we want to normalize our data. The way that we're going to normalize here is to standardize it. That means I take every value, subtract from it the mean of that column or variable, and divide by the standard deviation of that column or variable.
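The random split and the separation of features from target described above can be sketched as follows. The variable names are my guesses at the ones used in the notebook, and the synthetic dataImport is an assumption that stands in for the real file.

```wolfram
(* Stand-in for the imported data (an assumption): 1000 rows, 12 features plus a 0/1 target *)
SeedRandom[123];
dataImport = Join[RandomVariate[NormalDistribution[], {1000, 12}],
   Transpose[{RandomInteger[1, 1000]}], 2];

n = Length[dataImport];
trainN = Floor[0.9 n];   (* size of the training set, rounded down *)

(* Shuffle all rows, then cut the shuffled list after the first trainN rows *)
SeedRandom[123];
{dataTrain, dataTest} = TakeDrop[RandomSample@dataImport, trainN];

(* Features are columns 1 to 12, the target is column 13 *)
xTrain = dataTrain[[All, 1 ;; 12]];
yTrain = dataTrain[[All, 13]];
xTest = dataTest[[All, 1 ;; 12]];
yTest = dataTest[[All, 13]];

Dimensions /@ {dataTrain, dataTest}

(* Quick check for class imbalance in both sets *)
Counts /@ {yTrain, yTest}
```

Shuffling first and cutting second means every row ends up in exactly one of the two sets, which is what makes this a clean split.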
So if I use the Mean function on X train, with the postfix N here just to get numerical values, I see each of the 12 means, and I can likewise see each of the 12 standard deviations of my feature variables. That's for the training sample. Now I'm going to use the Map function to map a function over a list. The function is Standardize, which is going to do this automatically for me. I just have to remember that, mapped this way, Standardize works across the rows. I want it to work across the columns, so I've got to transpose X train first. Then, when I'm done with all of this, I've got to transpose it back so that it's in this long form again. I'm going to call the result X train standardized, so that every value is now standardized. Let's run that. It takes a while to run, because this is a big data set. And there we go. These values are of the order of 10 to the power negative 16, so that is just rounding error; they're all effectively zero. So the mean for each of the 12 columns is now zero and the standard deviation is one. That's very easy to do.

Now I'm going to do the same with the test set, but I've got to do something a bit different. I've got to create a table, because I have to iterate over all of these columns. What I want to do is subtract the mean of the training set from each actual value in the test set and divide by the standard deviation of the training set. So even though it's the test set, I'm still using the mean and the standard deviation from the training set. I iterate over the columns and in the end transpose the result back. So let's have a look. There we go. Now let's look at the mean and standard deviation of my test set. You'll notice that the means are not all zero.
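A minimal sketch of the two standardization steps, assuming stand-in feature matrices in place of the real data. The point it illustrates is that the test set is scaled with the training set's means and standard deviations, not its own.

```wolfram
(* Stand-in feature matrices (an assumption): 900 training rows, 100 test rows *)
SeedRandom[123];
xTrain = RandomVariate[NormalDistribution[5, 2], {900, 12}];
xTest = RandomVariate[NormalDistribution[5, 2], {100, 12}];

(* Column means and standard deviations of the training set *)
mu = Mean[xTrain];
sigma = StandardDeviation[xTrain];

(* Training set: Standardize each column; Transpose turns columns into rows and back *)
xTrainStandardized = Transpose[Map[Standardize, Transpose[xTrain]]];

(* Test set: reuse the training statistics, column by column *)
xTestStandardized = Transpose[
   Table[(xTest[[All, j]] - mu[[j]])/sigma[[j]], {j, 1, 12}]];

Chop[Mean[xTrainStandardized]]          (* zeros, once rounding error is chopped *)
StandardDeviation[xTrainStandardized]   (* ones, up to rounding error *)
Mean[xTestStandardized]                 (* close to, but not exactly, zero *)
```

Reusing the training statistics keeps information about the test set out of the preprocessing, which is why the test means come out near zero rather than exactly zero.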
They are going to be close to zero, because these values came from the same distributions. And the standard deviations are going to be close to one, but they're not exactly one. This is because I used the Table function to subtract the mean of the training variable and divide by the standard deviation of the training variable for the test values. It's important to do that.

Now a very important step. I've got to get the data, the feature set and the target, into a proper form so that the neural network can use it. So I'm going to create a train and a test variable here, and each is going to be a table. I run through each of the rows, taking all of the columns, and then comes this little arrow (remember, that's Escape, minus, greater than, Escape) pointing to the corresponding target value. We'll look at the dimensions and I'll show you the first sample. You'll see the length there is 26,966, as we expected, but look at the form that it's in. It's inside a larger list, and then you see the 12 values, all normalized, and then the arrow to zero. So this first sample had zero as its label. You can go through all of them like that. And we're going to do exactly the same for the test set. So look at the code there. It's got to be in this format, where each row is a sublist, then an arrow, and then the label for that specific sample.

Great. So let's create a neural network. We've worked through our data: it is now properly prepared for a neural network, and we've created the correct format for use in the Wolfram Language. I'm going to use a NetDecoder so that we can decode the values that are output from our neural network. Remember, we have a binary class here, zero and one. So we want to tell it that the predicted value must be decoded as a class, and that there are two class values, zero and one.
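The input format and the decoder described above can be sketched like this. The stand-in arrays are an assumption; in the notebook they would be the standardized training features and the training labels.

```wolfram
(* Stand-in standardized features and labels (an assumption) *)
SeedRandom[123];
xTrainStandardized = RandomVariate[NormalDistribution[], {900, 12}];
yTrain = RandomInteger[1, 900];

(* One rule per sample: {12 feature values} -> label *)
train = Table[
   xTrainStandardized[[i]] -> yTrain[[i]],
   {i, Length[xTrainStandardized]}];

Length[train]
First[train]   (* a list of 12 values, an arrow, then the label of that sample *)

(* Decoder that turns the network's two output probabilities into the class 0 or 1 *)
dec = NetDecoder[{"Class", {0, 1}}]
```

The same Table expression, applied to the test features and test labels, gives the test variable.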
So this NetDecoder makes it very easy to read the output of a network. We don't have to use a NetEncoder at this stage because we're not importing images or anything like that. There we see our NetDecoder.

Now, this is a bit of compact code here. I've put everything together. There are two things going on: I'm creating my deep neural network, that's with NetChain, but then I'm also initializing it. Normally you would do this in two steps: build the NetChain and then initialize it. I've created two networks here as an example for you, starting with net one, and I'm doing it all in one go. NetInitialize is applied second, so it goes on the outside.

So let's look at the network that I'm creating. I'm using a bit of syntactic sugar here, meaning I'm not writing LinearLayer, square bracket, 5. If you just put the value, the Wolfram Language accepts that you are talking about a linear layer, a normal densely connected layer. And since I don't want to add any specific arguments to it, I'm just using 5. So I've got a hidden layer here with five nodes in it. Ramp is next; Ramp means rectified linear unit, so that's the activation function. Then I have two nodes in my next layer, and the activation function there must be a softmax layer. I want the output to be decoded as a class, either zero or one, and that's why we have the NetDecoder there. And for the first layer we have to specify the input: it's a vector with 12 elements, because we've got 12 input features. So that's my network: a single hidden layer and a softmax layer as output.

Now I'm also going to initialize it. What I'm doing with the initialization here is creating random values for my weights (we see the argument there), and the biases I set to zero. Let's look at another example, net two here.
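Before moving on to net two, here is a sketch of the first network as described: one hidden layer of five nodes, built and initialized in a single expression. The video does not state the value passed for the random weights, so the NormalDistribution[0, 0.1] setting below is purely my assumption, as is the use of the RandomSeeding option to fix the seed.

```wolfram
dec = NetDecoder[{"Class", {0, 1}}];

(* 12 inputs -> 5-node linear layer -> ReLU -> 2-node linear layer -> softmax.
   A bare integer n inside NetChain is shorthand for LinearLayer[n]. *)
net1 = NetInitialize[
   NetChain[{5, Ramp, 2, SoftmaxLayer[]},
    "Input" -> 12,
    "Output" -> dec],
   (* Random weights from an assumed distribution; "Biases" -> None leaves the biases at zero *)
   Method -> {"Random", "Weights" -> NormalDistribution[0, 0.1], "Biases" -> None},
   RandomSeeding -> 123];

(* An initialized network can already be evaluated on a 12-element vector *)
net1[ConstantArray[0.5, 12]]
```

Writing NetInitialize on the outside is equivalent to building the NetChain first and initializing it in a second step.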
I'm going to have two hidden layers here: five nodes with a rectified linear unit, five nodes with a rectified linear unit again, then two nodes with a softmax layer, and everything else the same. But my method of initialization this time is going to be Xavier initialization, with the factor type being mean and the distribution being uniform. So let's run that code. There we go, these are the two networks. We can toggle this open and see that we have a 12-element vector as input, so that's a rank 1 tensor. Then we have a linear layer with five nodes, then a Ramp, that's the rectified linear unit activation, then another linear layer with just two nodes and a softmax as its output. There's my second one, a bit deeper because it has a second hidden layer.

Now, just to show you, you can extract the random weights by using NetExtract. So there you have the random weights: this first tensor, the one that's going to be multiplied by the input, holds the weights that were randomly selected. That comes from up here where we initialized it. I put a random seed, 123, there, so if you do the same you're going to get exactly the same random weights. Now let's have a look at these weights as a histogram. This is just for net one; I've made the bin size 0.01. Compare that with the Xavier initialization with a uniform distribution: look at the weight distribution of my net two. So you can use NetExtract to look at how the initial weights were randomized.

Very good. Inside of comments here I've put two further NetTrain calls. The one I'm going to run is here, just to show you some of the things. So let's look at the bare-bones one. I've created my network with NetChain and I've initialized it with NetInitialize. Now I've got to actually run it with NetTrain. That is the fitting. So I'm saying: please use net one, and pass it my training data, which was in train. A mini-batch size of 256. My validation set.
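As a sketch of the second network and of the weight inspection described above: the Xavier suboptions follow the description in the video, while the RandomSeeding option is my assumption for how the seed is fixed.

```wolfram
dec = NetDecoder[{"Class", {0, 1}}];

(* Two hidden layers of 5 nodes each, Xavier initialization (mean factor, uniform distribution) *)
net2 = NetInitialize[
   NetChain[{5, Ramp, 5, Ramp, 2, SoftmaxLayer[]},
    "Input" -> 12,
    "Output" -> dec],
   Method -> {"Xavier", "FactorType" -> "Mean", "Distribution" -> "Uniform"},
   RandomSeeding -> 123];

(* Weights of the first linear layer; Normal converts the stored array to a plain list *)
w1 = Normal[NetExtract[net2, {1, "Weights"}]];
Dimensions[w1]

(* Distribution of the initial weights, with a bin width of 0.01 *)
Histogram[Flatten[w1], {0.01}]
```

The first layer maps 12 inputs to 5 nodes, so its weight matrix has 5 rows and 12 columns.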
Yes, please use the test set as the validation set. Target my GPU. That's very easy here. On this machine I do have an Nvidia GPU, and I don't have to do all the crazy imports and installs; I can just set the target device to GPU or CPU. For the optimizer for my gradient descent I'm going to use stochastic gradient descent, and my max training rounds, that's the number of epochs, I set to 25.

Let's look at some of these other ones. Here I've got net one, train, batch size 256, the validation set, target device GPU. For my loss function I'm going to use a cross-entropy loss layer here, set to binary. That's the default; Mathematica will see that it's a binary output, so it'll use that. My method for optimizing my gradient descent is Adam, adaptive moment estimation. I'm going to set some of its parameters: the beta 1 parameter to 0.9, L2 regularization with its parameter value, and a learning rate of 0.003. Here's another one, also very vanilla: just the target device and max training rounds, and all the other defaults will be used. So have a look at NetTrain: hover over it, click the plus button, and have a look at the options.

So let's train this model. You saw how quickly that model ran. Let's open it up, just to give you an idea again of what happened. Now let's pass the first sample of our test set to this newly trained model. We see the prediction is zero. Let's create a ClassifierMeasurements object: we pass the trained model to it, and then the test data. Let's have a look at that. And there we go. We can see the accuracy was only 71%, the number of classes is 2, and the sample size is there. Let's look at this object's properties. There are quite a few of them and you can work your way through those. So let's look at the accuracy; as we said, that was 71%. We can create a very nice confusion matrix, so we see what the actual classes were, 0 and 1, and the predicted classes.
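A minimal sketch of the bare-bones training call described above. The synthetic data and the helper makeData are assumptions introduced only so the cell runs on its own, and the target device is set to CPU here because the GPU setting needs a supported NVIDIA card.

```wolfram
(* Stand-in data (an assumption): the label depends on the first two features *)
SeedRandom[123];
makeData[m_] := Module[{x = RandomVariate[NormalDistribution[], {m, 12}]},
   Table[x[[i]] -> Boole[x[[i, 1]] + x[[i, 2]] > 0], {i, m}]];
train = makeData[900];
test = makeData[100];

net1 = NetInitialize[
   NetChain[{5, Ramp, 2, SoftmaxLayer[]},
    "Input" -> 12,
    "Output" -> NetDecoder[{"Class", {0, 1}}]]];

(* Fit the network; swap "CPU" for "GPU" on a machine with a supported GPU *)
trained = NetTrain[net1, train,
   BatchSize -> 256,
   ValidationSet -> test,
   TargetDevice -> "CPU",
   Method -> "SGD",
   MaxTrainingRounds -> 25];

(* Predicted class for the first test sample (the left-hand side of its rule) *)
trained[First[First[test]]]
```

The Adam variant mentioned above would replace the Method setting with a list such as {"ADAM", "Beta1" -> 0.9, "LearningRate" -> 0.003}, plus an "L2Regularization" value of your choosing.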
We can see off the diagonal here where the mistakes were made. We can create a lovely report, which shows us the accuracy, the baseline accuracy, and so on. You can see the values there and our confusion matrix. We can also get the area under the ROC curve, and we see 78.6 for both class 0 and class 1. We can plot the ROC curves for both of the classes. Have a look at, perhaps, the mean cross-entropy, and so on. All of these properties you can play with.

So that, in a nutshell, is a densely connected neural network inside Mathematica, using the Wolfram Language. It really is a pleasure to use. As I mentioned, if you want to know more about Mathematica, have a look at the description below for some of the courses that I have. And try your hand at deep neural networks using Mathematica.
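For reference, the evaluation steps can be sketched as follows, assuming Wolfram Language 12 or later. The synthetic data, the helper makeData, and the short training run are assumptions that stand in for the trained model and test set from the video, so the numbers will differ from the ones quoted above.

```wolfram
(* Stand-in data and a quickly trained network (assumptions, see the note above) *)
SeedRandom[123];
makeData[m_] := Module[{x = RandomVariate[NormalDistribution[], {m, 12}]},
   Table[x[[i]] -> Boole[x[[i, 1]] + x[[i, 2]] > 0], {i, m}]];
train = makeData[900];
test = makeData[100];

trained = NetTrain[
   NetChain[{5, Ramp, 2, SoftmaxLayer[]},
    "Input" -> 12,
    "Output" -> NetDecoder[{"Class", {0, 1}}]],
   train, MaxTrainingRounds -> 25];

(* Measure the trained model on the held-out test data *)
cm = ClassifierMeasurements[trained, test];

cm["Properties"]            (* list of everything that can be queried *)
cm["Accuracy"]
cm["ConfusionMatrixPlot"]   (* actual versus predicted classes *)
cm["AreaUnderROCCurve"]     (* one value per class *)
cm["ROCCurve"]
cm["Report"]
```

Each property is requested by name from the same measurements object, so the model is evaluated on the test set only once.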