So the time has come for us to look at designing a proper deep neural network. We're going to have two hidden layers and an output layer, and I'm going to show you how to design it and how to write the code. I'm also going to move away from showing you the RPubs document: we are here in RStudio, and we're going to look at the actual code now. It's already all written, as you can see here. To create it, I just clicked that little triangle and chose a new R Markdown file. Now, an R Markdown file is different from a script file (you see Script there) in that we have this rich environment. Most importantly, up here you can see a little tab that says Knit, and if I click on the downward-facing arrow next to it, we see I can knit this file to an HTML file. That is what we do to create an HTML file to upload to RPubs or anywhere else on the web, and I'll put the link to this exact document in the description below. You can also knit to a PDF and you can knit to Word, so it is a lovely environment from which you can create all sorts of documents. Now, I want to show you the code, the RStudio coding environment itself, and then the deep learning, so there's a lot to this video lecture. The file always starts with three little minus signs and is closed by three more, and inside of those is a type of language called YAML (originally "Yet Another Markup Language"). Now, the web is built on HTML, Hypertext Markup Language, which shows a web page how to display things, together with Cascading Style Sheets; Markdown you can see as a simplified way of writing documents for the web. What we have here are a few things that are pretty standard: title, author, and then output. The title is what will be displayed on the top bar of the web page, it'll have the author listed, and then there's the output.
I'm specifying here that the output must be HTML, so I am going to knit it to HTML; that there must be a table of contents; and that number_sections is false, so the headings won't be numbered one, two, three. All the code that you write goes inside three little backticks; on my keyboard that key is at the top left, next to the 1 key on the top row. You have to open with three of them and you close with three of them, and you can see that RStudio colours the chunk in a light grey, as we see here. The first line stipulates, inside a set of curly braces, a few things. The first thing it states is the language in which this piece of code, called a chunk, is written, and I'm specifying that it's written in R. You can also give this chunk, this piece of code inside the backticks, a name; you needn't do that. This one was created automatically, and it was called setup. If we look right down here, there's a tiny little piece of text: if you click on it, you can see all your code chunks, and you can see "Chunk 1: setup" because it was named. I didn't name all the other code chunks, but if you do name them, it's easy to navigate to them there. Then include = FALSE means this chunk is not going to be shown in the final HTML or PDF or Word document. This is also set up automatically.
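Pulling together the options just described, a YAML header like the one narrated might look roughly like this (the title and author here are placeholders, not the lecture's actual values):

```yaml
---
title: "Designing a Deep Neural Network"
author: "Author Name"
output:
  html_document:
    toc: true
    number_sections: false
---
```

Knitting then renders this header as the page title, author line, and table of contents of the HTML document.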
You don't have to worry about it. There are some options here: echo is set to TRUE, which means that in all the chunks to follow, unless you say otherwise, the code is actually going to show up in the document. And I've introduced this set-working-directory-to-get-working-directory line: it takes the folder on my solid-state drive where this actual .Rmd file is saved and sets that as the working directory. Now, before we get to the next code chunk: there are different ways to run a chunk when you're inside of it. You can click the Run button up there, or you can click the little run button right here on the chunk; you'll see a brief little green stripe as it goes from top to bottom executing the lines of code, and it has now executed. Another way to execute it is to have your cursor blinking somewhere inside the chunk and to press Ctrl+Shift+Enter on PC and Linux, or Cmd+Shift+Return on a Mac; that key combination will also execute the chunk. So here's my second code chunk, and again those three little backticks.
You can type them in; I'll show you a keyboard shortcut for that a bit later. Again there are the opening and closing curly braces, and I only have r in there, just to state that this is R code; I haven't given it a name or specified anything else. I'm going to import three libraries: readr, keras, and DT. But see that each is enclosed in a separate function called suppressMessages. If you import these libraries, each prints a bit of information; some of them contain functions that mask base and core R functions, and you'll get those little messages. I don't want to see them on the screen, so I just say suppressMessages around library(readr), library(keras), and library(DT). readr is a package that helps to import files such as spreadsheet files in a better, extended way than base core R can. keras, of course, is our deep learning neural network package, which provides the functions and code to design and run deep neural networks; in my case it sits on top of TensorFlow. And DT stands for DataTables: it is just a package that lets you create very beautiful, dynamic, interactive tables on a web page, and because I'm going to publish this to RPubs, I use DT to do that for me. Now, this next piece of code is something I introduced: it's a bit of Cascading Style Sheets, and it just says that my heading one, heading two, and heading three should have different colours, that royal blue and the orangey gold that I always use. Then there's a line of text I also introduced: an exclamation mark, open and close square brackets, and then inside parentheses the name of a file that lives in the same folder, the same directory, as this file. It's a PNG file, and it's the image of the logo that you see on the RPubs documents. So if you wanted to put your own logo in there, or any other kind of image file,
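As a sketch, that library chunk and the image line might look like this inside the R Markdown file (logo.png is a placeholder filename, not the lecture's actual file):

````markdown
```{r}
suppressMessages(library(readr))  # read_csv(): faster CSV import than base R
suppressMessages(library(keras))  # deep learning, sitting on top of TensorFlow
suppressMessages(library(DT))     # datatable(): interactive HTML tables
```

![](logo.png)
````

The exclamation mark and empty square brackets are Markdown's image syntax; the parentheses hold the path to the image file.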
this is the way you go about it. Next up we have two hashtags; pound signs, but hashtags is what most people would know them as. That's Markdown, and it indicates that whatever follows must be a level-two heading, which gives a nice heading to the paragraph that follows: the introduction. On the RPubs file, or when you download this specific file from GitHub, you can read all about what is coming in this lecture. So let's start with the data. We're going to import a file, a CSV (comma-separated values) file; it was opened up and created in Microsoft Excel and just saved as a CSV file. It contains 50,000 observations, so 50,000 rows, with 10 feature columns, so 10 feature variables, and then a target variable that is binary: it only has zeros and ones in its sample space. Remember, the sample space is the set of all the different elements from which the values that actually go into the target, the data point values, are chosen. In this case it is nominal categorical, and there's only a zero and a one. So I'm going to use the read_csv function. That's different from read.csv, the built-in core function: read_csv comes from the readr package, and it creates what is called a tibble, which is different from the data frame that read.csv creates. There are slight differences between the two, most notably in the way the data is displayed, especially if you use an R script file: a tibble displays differently on the screen in RStudio, making it more manageable. There are other subtle differences, which we needn't be concerned about now; you can read about some of them there. So here's our code again: backticks, curly braces with r, and I'm going to create an object, a computer variable, called data.set. That's my choice; it's what I use when I import files.
You can use your own name; just bear in mind that it shouldn't contain illegal characters like spaces or leading numbers. And again, my assignment operator there; the assignment operator is easy to type, it's just the Alt (or Option) and minus key shortcut. So: read_csv, and then the name of the file, which will be available on GitHub, the simulated binary classification data set CSV, inside of quotation marks. I'm setting the col_names argument to TRUE because the first row of the data file, in the spreadsheet, holds the column names. Then I'm using datatable; datatable is a function from the DT package, and I'm going to pass it what I want printed out, which is data.set. I want that expressed, eventually, as an HTML table on a web page. Remember what these square brackets are: they address rows and columns, [row, column]. There's the comma, and you see the column slot is empty; if you leave it empty like that, it means all of the columns. So which rows do I want? There are 50,000 rows and I don't want all of them, so I'm just going to take a 1% random sample of all the rows to put in my HTML table. I'm going to specify the first argument of the sample function (remember, this is all part of which rows to select, before the comma): the total number of rows to select from, for which I use the nrow function, passing it the data.set object. replace is FALSE, so when a row is selected at random, it's not put back into the bowl to be reselected. And size, the number of rows I want sampled, is 0.01 times the number of rows of the data set, so 1% of the data set. That gives a random selection, a random sample of all the rows, showing all the columns, when we run this bit of code here. We execute that, and first we have to import the libraries.
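As an aside, that row-sampling logic inside the square brackets can be tried on its own in plain R. This is a minimal sketch with a simulated stand-in for the data (the lecture's actual chunk uses read_csv() and datatable(), which need the readr and DT packages):

```r
# Stand-in for the imported data set: 50,000 rows, 10 features + a target
data.set <- as.data.frame(matrix(rnorm(50000 * 11), ncol = 11))

# Pick 1% of the row numbers at random, without replacement
rows <- sample(nrow(data.set), size = 0.01 * nrow(data.set), replace = FALSE)

# [rows, ] with the column slot left empty keeps every column
preview <- data.set[rows, ]
nrow(preview)  # 500 rows, i.e. 1% of 50,000
```

The same indexed expression is what gets handed to datatable() in the lecture's code.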
So that happens quite often when you work this way: we've got to go all the way back up, because we didn't execute those earlier lines of code. Let's run the setup chunk, let's import all of those libraries, and let's come back down. You see the little red error marker when I tried to execute: it says it cannot find read_csv, because we had not imported the readr package yet. So now we do that, you see the green line, and that's all done. And you can see a representation of what the table is going to look like in the eventual HTML file. Very nice, because you can select these variables and order them in descending or ascending order; you can search for specific values (say, if some of these were nominal categorical variables with names, you could search for them); and you can page through with Previous and Next. A very nice way to look at the data. Now I'm going to use the summary function and pass the data set to that. Let's do that, and it summarizes all of the columns for us. We see that the column names were Var 1 through Var 10, and the last column, column 11, was named target; that's how they're named in the actual spreadsheet file. We can see the descriptive statistics: minimum, first quartile, median, mean, third quartile, maximum. And you can see these values all have a mean of around zero, so they look as if drawn from a standard normal distribution.
This was a specially created data set, which makes the training of the neural network that we're going to create very easy; it was designed specifically with this lecture in mind. Now, we've imported this data, but it does not exist in a format that we can pass into the neural network once we design it, so we've got to prepare the data. That's called preprocessing, and we've got to go through a few steps. The first step is to take this tibble, this data frame, this list, whatever you want to call it, and transform it into a mathematical matrix. Remember, that's a type of tensor. Eventually we are going to bring in images as data, and those will have to be transformed too; here we're transforming this into a tensor, and in this instance a matrix, because we're going to have rows and columns, so that's a rank-2 tensor. We change it into a rank-2 tensor, a matrix, using the as.matrix function: I recast data.set as a matrix, so it is no longer a tibble or data frame or list; it is now a matrix. Then I discard all the column names so that I only have the numerical values, and for that I set the dimnames of the data set to NULL. So all I have is a matrix of numbers. Now, if we had categorical values in there, like benign and malignant or whatever, we would have to change those into numbers as well, but in this simple first example we have none of those worries: you could see they were all numerical to start off with, and all we're going to do is change it into a matrix and remove the variable names, the column headers. Now, the next very important concept to understand is the splitting of your data. You've got to split your data into two parts, a training set and a test set. Very important: you want to take some of the data out of the data set, keep it separate, and call it a test set. Now, the test set must not be seen by the
network during training. It must contain some of the samples, some of the rows, that will never be seen during the training phase. So we've got this test set, and the training set, which makes up the majority; we'll speak about how to split it, and what sizes are used for the split, in a moment. We are going to split it so that the training set is what we actually pass to the neural network, and from that it's going to train and optimize its parameters by minimizing a cost function through the continuous process of forward and back propagation, the back propagation happening through gradient descent. We've talked about all these things. But once it's all done, we want to pass new data to it that it has never seen before, data to which we know the answer, since it comes from the original data set in which the target is known (this is supervised machine learning, remember), and then it can test the accuracy for us. So we've got to do this splitting of the data. By the way, we see the three little backticks, so we're coming up to a chunk of code, and I just wanted to show you how to create one in a shortcut way. Instead of typing it all, hold down Ctrl+Alt+I, the I key, on PC and Linux; on a Mac that will be Cmd+Option+I. If I just do that, you see the whole chunk was created in an instant, ready for me to write some code, so that keyboard shortcut is very useful. I'll just highlight and delete that one, since the code is already written. So what do we have here? I'm going to use the set.seed function, and that means every time this piece of code runs it's going to generate the same random values in the same order, because I'm about to generate some random values here. You can use any numerical value as the seed; I've just used one two three. You could use one, or one two three four, or ten, or fifteen.
It doesn't matter; it just means that every time this code is run, it'll follow the same little recipe during the pseudo-random number generation, and every time we run this code we'll get exactly the same values out. So I'm going to create an object called idx, short for index, and I'm going to use the sample function that we've seen before. The first argument is 2. Now, what happens is it creates a little list: it starts at one and counts up in whole-number, integer values to wherever you want it to stop. So in this instance I'm going to have a sample space of just two values, one and two. The next argument, size, stipulates how many of these I want, and I want that to be equal to the number of rows in my data set. replace is TRUE, so it picks at random, say a one, and puts it back. Imagine there are two little cards in a bowl, one with a one written on it and one with a two; they're folded, you put your hand in with eyes closed, wriggle them around, take one out at random, see what it is, jot it down, and put it back in the bowl. That's what replacement means, and it means I can do this thousands of times over and I'll get one, one, two, two, one, two, one, two, whatever. And I'm also setting a probability for each, in the same order as the two numbers (because we used shorthand and just wrote the single number 2, but remember that stands for a one and a two): I'm going to set the one to be chosen with a probability of 90%, and 10% for choosing the second one, which is a two. So it's very imbalanced here, 90% versus 10% at random; there are going to be many more ones than there are twos.
Remember, these probabilities have got to sum to one. So we're going to run that 50,000 times; let's run it. So I've now got this long list of ones and twos, with many more ones, and its length is exactly the number of rows in my data set. That's great, because now we can actually use this to split our data. Now, there are many ways in R to split data; this is just one way, a little bit laborious, but let's have a look at it. I'm going to create two objects called x_train and x_test. Now, it's customary in machine learning to use x for your matrix of features, the part that does not contain the target, the column vector of the target variable; if you draw out only the features, we usually call that x, and I want a training set and a test set. What I want to do is assign to each the original data set, and then I'm going to use square brackets, which means [row, column] indexing, addressing. For x_train, the rows I want are where the index equals one. This is very compact code: it's got this index, one, one, two, one, one, one, one, two, and it goes through all of those, all 50,000, and wherever it's a one, it uses that row. So it selects about 90% of the original rows and draws them out. And the columns I want are only columns one through ten, the ten feature variables; remember there were eleven columns, and column eleven is the target, which I don't want here. The test set is then where the index is two, which comprises only about 10% of the data, and also just those ten columns. So I'm creating these two matrices, a training one and a test one; I should call them two tensors. That takes care of splitting the data as far as the features are concerned. We need to do exactly the same thing for the target, and we'd better have the rows stay exactly matched, so the same rows stay with the train set and the same rows stay with the test set; otherwise we may mix them up.
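The preprocessing and splitting steps described so far can be sketched end to end in plain R. This uses a simulated stand-in for the imported data, not the lecture's actual file:

```r
set.seed(123)  # same pseudo-random sequence on every run

# Stand-in for the imported data: 50,000 rows, 10 features + a binary target
data.set <- data.frame(matrix(rnorm(50000 * 11), ncol = 11))

data.set <- as.matrix(data.set)  # recast the data frame as a rank-2 tensor
dimnames(data.set) <- NULL       # keep only the numbers, no column names

# One 1 or 2 per row, drawn with replacement: P(1) = 0.9, P(2) = 0.1
idx <- sample(2, size = nrow(data.set), replace = TRUE, prob = c(0.9, 0.1))

x_train <- data.set[idx == 1, 1:10]  # ~90% of rows, the 10 feature columns
x_test  <- data.set[idx == 2, 1:10]  # ~10% of rows, the 10 feature columns
```

Because the same idx vector drives both selections, the feature rows and (later) the target rows stay matched between the train and test sets.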
Otherwise it makes no sense whatsoever. Now, there's one little side track I have to go down, because later, when we do the testing, I want a separate object. I'm going to call mine y_test_actual, and I just want to save it separately, because remember, that is what we're going to test against. That is our ground truth. So I'm going to take data.set where the index is two, so the rows match those of x_test, and column 11, and I'm just going to store that separately. I always do that right at the beginning, just to keep it safe. Okay, we're almost there; we still need a bit of preprocessing. The last bit we're going to do is something called one-hot encoding. Now, one-hot encoding is something that we use quite often, and it changes a single data point value into a set of dummy variables. Remember my target: my target variable consisted only of 0, 1, 0, 1, 0, 1. Because there are only two elements to choose from, my sample space has two elements, I'm going to create two dummy variables, and they're always named 0 and 1. If there were three, they'd be named 0, 1, and 2; if four, 0, 1, 2, and 3. Doesn't matter. One of these dummies, the 0, represents one of the elements in my sample space, and the other one represents the second. So imagine I didn't have 0 and 1 but had benign and malignant, the two elements in my sample space; imagine my target just said benign, malignant, benign, malignant. One-hot encoding means I form two dummy variables, called 0 and 1, and the 0 will be benign and the 1 will be malignant. So now I have two columns in my target variable, 0 and 1, and if a specific row says benign, I mark a 1 in the 0 column and a 0 in the 1 column. Makes sense: one-hot encoding. So only one of the possible columns will have a 1 under it. If the 1 is under benign, which was 0, that means the row was benign; and if the 1 is under malignant, with all the others (just the 0 column in this instance) having a 0 under them,
that means the row was malignant. So let's do that, and there's a Keras function called to_categorical that will do it for me automatically. So pay attention: I'm going to create two objects called y_train and y_test (y underscore train, y underscore test), and I'm going to use the to_categorical function. I'm going to pass it my data set and again use addressing: index equals 1 and index equals 2, and only column 11. Let's run that, and let me show you what it looks like. (The noise outside is obviously tremendous; again, apologies for that. As I mentioned in the other videos, it's right outside my office window, and unfortunately there's nothing I can do about it.) I'm going to open the Environment tab up above, which was not open initially, and all the objects, the computer variables, that we've created are listed here. So we've just created y_train and y_test; let's have a look at them. I'll show you y_test: let's open it up by hitting that little button, and we see it opens up here. Now look at it: instead of there being a single 0 or 1 per row, we have these two variables. For the first row, the first column holds a 0 and the second holds a 1; it's the 1 column that has the 1 under it, so this first value was a 1. You can see the second one was a 1, the third was a 1, the fourth, the fifth, the sixth. And that's the one-hot encoding: if you had more columns, because your sample space was bigger, still only one of them would have a 1 under it. One-hot encoding. Right, let's carry on; let me run this bit of code. I'm going to use the cbind function, which combines whatever I give it; these are R vectors, separated by commas, and it's just going to combine all of them as columns. Let me show you. So the first column is y_test_actual, rows 1 to 10; these were the actual values, which remember I saved separately. So the actual targets of my test set were 1, 1, 1, 1, 1, 0, 0, 0, 1, 0 for the first ten.
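The effect of to_categorical can be mimicked in plain R, which makes the dummy columns easy to see. This is a sketch, not keras code; one_hot is a hypothetical helper written here for illustration:

```r
# Binary target, as in the lecture's column 11
y <- c(1, 1, 0, 1, 0)

# One-hot encode: column 1 stands for class 0, column 2 for class 1
one_hot <- function(v, n_classes) {
  m <- matrix(0, nrow = length(v), ncol = n_classes)
  m[cbind(seq_along(v), v + 1)] <- 1  # +1 because the classes start at 0
  m
}

y_hot <- one_hot(y, 2)
cbind(y, y_hot)  # actual value alongside its two dummy columns
```

Each row of y_hot has exactly one 1 in it, under the column that matches the class, which is exactly the pattern shown in the y_test viewer.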
And now I'm placing y_test, which was one-hot encoded, alongside as the next two columns, so you see column 2 and column 3. Because the actual value was a 1, the one-hot encoding gives me 0, 1; and when it's a 0, the one-hot encoding gives me 1, 0, a 1 under the 0 column, and therefore the actual value was a 0. So I think you get what one-hot encoding is all about, and that's what we're doing in this instance. Now, strictly speaking, you could use a sigmoid activation function in your output node; I'm going off on a tangent here, but bear with me. You could put a single sigmoid node in your output layer, but we're going to do something different here, and I do that on purpose, because in many instances our sample space is going to have more than two elements; that would be quite common. It's good to get used to this one-hot encoding, and to having a different activation function across a couple of final nodes rather than only a single node. It's a side track; you'll appreciate it as this course continues. Now, very, very exciting: let's create our first model using Keras and TensorFlow. I'm going to skip all of the written words, and here we are in the chunk of code where we're going to create a model. First of all, we give it a name, so our object is called model, just model. And I'm going to say that this is a special kind of object. Remember when we created functions: we started off by naming an object and assigning a function to it, with the arguments in parentheses and what the function does inside curly braces. It's a similar sort of thing here: I'm specifying that this object is a keras_model_sequential, a sequential model, using a built-in function in Keras. There are two ways that you can create deep neural networks in Keras. One is the sequential model, which we use quite often.
And the other one is the functional API, which allows very intricate, modern types of neural networks to be designed; we'll get to that in a future lecture. So I'm just instantiating this object, calling it model, and it is a sequential Keras model. Now you're going to see something new: a pipe symbol, written percentage, greater-than, percentage (%>%). What it does, as shorthand, is take whatever is on the left-hand side of the pipe, which here is model, and pass it as the first argument to the function on the right, which is layer_dense, part of Keras. So it's really layer_dense(model, name = ..., etc.). What the pipe allows you to do is chain, or embed, things: you'll see another pipe there and another pipe there. The model goes in as the first argument of the first layer_dense; that whole expression, from there to there, goes in as the first argument of the next one; and then all of that goes in as the first argument of the one after. Layer upon layer upon layer. It's a very nice part of R; it actually comes from what is called the tidyverse, and we might have time to discuss that later. It's a very nice design, like a telescope: you can still set things out like this, but one thing fits inside of the other. Don't worry too much about that. So we say model and pass it to the first layer, layer_dense, a function in Keras, and it says that this first layer is a dense layer, a densely connected layer. We'll just move down and look at the arguments one by one. The first argument is entirely optional: I'm giving the layer a name, something like DeepLayer1; see, there are no spaces there, which would be illegal. You do not have to give it a name; this is just for completeness' sake. Then I'm stating that this first hidden layer must have 10 nodes.
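That first-argument rewiring the %>% pipe does can be seen without any extra packages. The lecture's %>% comes from the magrittr/tidyverse pipe; base R's native |> (R 4.1 and later) behaves the same way for this simple first-argument case, so here's a minimal sketch:

```r
x <- c(4, 9, 16)

# These two lines are equivalent: the pipe feeds the left-hand side
# in as the first argument of the function on the right.
a <- sqrt(x)
b <- x |> sqrt()  # native pipe, R >= 4.1; magrittr's %>% behaves alike here

identical(a, b)  # TRUE
```

So model %>% layer_dense(...) %>% layer_dense(...) is just nested function calls, written so they read top to bottom.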
Remember, I had 10 feature variables, so I'm going to use 10 nodes. It's up to you; that is what is called a hyperparameter. A hyperparameter is something you decide on during the design of your network, and I have just decided that my first hidden layer must have 10 nodes. The activation function you've seen before: I want a rectified linear unit. And for your first dense layer, you have to stipulate the input shape, because the network doesn't yet know what data you are going to pass to it after the design phase. You need to stipulate the dimensions of the incoming vectors, which remember are the row vectors, the samples, passed one after the other into this network, and because there were 10 feature variables, I'm passing the number 10 to it. This refers to the number of feature variables in my data, and it matters because, behind the scenes, forward propagation is the inner product between two tensors, and those dimensions have to be correct; otherwise that inner product, that tensor multiplication, cannot happen. This type of mathematics, remember, is linear algebra, and it cannot happen if the dimensions are not proper, so I've got to stipulate that. Now, all of that gets piped into a next layer, another densely connected layer, hence again the layer_dense name. I'm going to call it DeepLayer2 (you needn't do that), again with 10 units and again the rectified linear unit activation function. This time, though, the dimensions of what gets passed in need not be stipulated: they are inferred from what is coming in, from that first layer's 10 units. You needn't worry about that for subsequent layers. Then another dense layer, and this is going to be my output layer. It's going to have two nodes, and the activation function is not sigmoid; it is softmax. Now, in the future, we are going to discuss these, including softmax.
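Putting the three layers together, the chunk narrated above looks roughly like this. Treat it as a sketch: it assumes the keras R package is installed with a TensorFlow backend, and the layer names are just labels:

```r
model <- keras_model_sequential() %>%
  layer_dense(name = "DeepLayer1",
              units = 10,
              activation = "relu",
              input_shape = c(10)) %>%  # 10 feature variables coming in
  layer_dense(name = "DeepLayer2",
              units = 10,
              activation = "relu") %>%  # shape inferred from the layer above
  layer_dense(name = "OutputLayer",
              units = 2,
              activation = "softmax")   # two probabilities summing to 1

summary(model)
```

Only the first layer needs input_shape; every later layer infers its input size from the layer piped into it.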
Softmax is a very special kind of activation function. It takes the number of units you have, the number of nodes in the layer, and after activation it provides a probability for the value in the first node and a probability for the value in the second node, such that the probabilities add up to one. And you can see where this is going, because we must predict one of two classes: it's going to give a probability for the first node and a probability for the second node, and, similar to what we did with the one-hot encoding, it's then going to take whichever node had the highest probability, and that becomes the predicted output. Lastly, I call summary on the model. Let's run that, and it gives me a little summary of the model. Now, how does this work? It gives each layer and its type, and because I named them DeepLayer1, DeepLayer2, and OutputLayer, we actually see those names there; if you didn't set a name, there'd be something generic. It says the type: these are all densely connected layers, so all the nodes in one layer are connected to all the nodes in the next. The output shape is going to be these column vectors of size 10, 10, and 2, which we specified with the number of nodes. And then there's the number of parameters that it has to learn, through that continuous cycle of forward propagation and back propagation, using gradient descent to minimize the cost function until we have optimal values; it says how many parameters there are in each layer. Now, how did it get to 110? That's very easy. Remember, I had 10 in my input, so there were 10 nodes in my input, and the first hidden layer had 10 units in it. So if each one is connected to each one, that's 10 times 10, that's 100. And remember, each of the 10 units in the first hidden layer must also get input from its own bias value.
And there must be 10 of those bias values, one per unit, so 100 plus 10 is 110. The same reasoning gives the second 110. And for the last layer, remember, there are 10 nodes connected to 2, so each of the 2 units has 10 connections coming in: 10 times 2 is 20. But those two units also get input from a bias node, so that's 2 extra, giving you 22. That means I have a cost function that is a multivariable function with 242 unknowns. Now, remember, from school, y equals x squared: that's a single unknown. I now have a function with 242 unknowns, which I have to optimize through taking partial derivatives. So that is beautiful; you can see it coming together. It is so nice. Now, I've got a little image there that I wanted to show you, but I forgot that you don't actually see it here before we knit it, so I'll show you what it looks like later. Now that my network is created, I have to compile it. I'm going to introduce a few new things here which we're not going to cover in depth; just have a look at them and we'll discuss them later. The compiling of the model says, again with a pipe, use the compile function; the first argument is going to be model, but I don't type that, we use the pipe. Then I've got to specify a loss function, an optimizer, and a metric. Now, for the loss function, instead of mean squared error we can use categorical cross-entropy. We're going to discuss what that is, and how it differs from mean squared error, in the future; just take my word for it, this is a better loss function here. The gradient descent is going to be done in a specific way, which is different from the very generic way that I showed you before: it's going to use what is called the Adam optimizer for this gradient descent (stochastic gradient descent is another famous one). And the metric we want to use is accuracy, so our measure of how well the model is training is going to be accuracy.
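Going back to those parameter counts for a moment, the arithmetic can be checked directly. The rule for a dense layer is weights (inputs × units) plus one bias per unit:

```r
# weights = n_in * n_units, plus one bias per unit
params_dense <- function(n_in, n_units) n_in * n_units + n_units

layer1 <- params_dense(10, 10)  # 10*10 weights + 10 biases = 110
layer2 <- params_dense(10, 10)  # another 110
layer3 <- params_dense(10, 2)   # 10*2 weights + 2 biases = 22

layer1 + layer2 + layer3        # 242 trainable parameters in total
```

Those are exactly the per-layer numbers that summary(model) reports, and the 242 is the number of unknowns the cost function has to be optimized over.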
So we've created our model and we've compiled our model; now let's fit the data to it. To fit the data, there are going to be a few other things in here which we'll discuss later. I'm going to give this fitting a name, creating an object called history. It takes the model and passes it as the first argument to the fit function, which is also part of Keras; the model is passed as that first argument through the pipe symbol. I'm going to pass x_train and y_train to it. So x_train is my matrix of feature variables, and y_train is my matrix of target values, which is one hot encoded. Now remember epochs; we've discussed epochs before. An epoch is one full forward propagation and back propagation: going through all the data once forward, with all the tensor multiplications, the additions of the bias values and the activation functions creating the output values, and then creating a prediction and a loss; then back propagating through gradient descent, through the derivatives, and updating all of our weights, our parameters. That full pass through the network, forward and back, is one epoch. And I want 10 epochs; I want it to run back and forth 10 times, because every time through I should get better and better parameters. The batch size is something new. What the batch size means is: don't run through all 45,000 samples in one go; do it in small pieces and update, small pieces and update. I'm going to set it to 256. By the way, if you're using a GPU for your training, which we're not going to use here on this machine (although I do have a GPU, I've only installed the CPU version of TensorFlow and Keras), make it a power of two.
So two to the power of four, two to the power of five, and so on up to two to the power of eight, which is the 256 I have here. It's just a good way for memory to work if the batch sizes are powers of two. All of these are hyperparameters: my epoch count of 10 and my mini-batch size of 256 (the argument is just called batch_size, not mini-batch size, but this is actually referred to as a mini-batch). These are hyperparameters that I set. Now here's another new thing: a validation split. Just as we split the data into a training and test set initially, I'm also splitting the training set within the learning process. That allows this network to hold a special little set separately, 10% of the training set, and test itself all the time so that we can view it and see that it's actually doing well. It's obviously going to do well on the training set, because it knows what the answers are and it's doing that forward propagation and back propagation on them; but now I'm giving it data that it hasn't seen inside of this training phase, which is called the validation split, just to test itself, and you want that validation loss to come down as well. If it doesn't come down, it means something, again something we're going to discuss in future: the model is not generalizing well. It is learning the training data too well, memorizing it, and if it memorizes the actual training data, it will not generalize well to unseen data. These are points that we'll discuss in depth in future videos. Then there's this verbose argument; it just controls what to show on screen as this runs. So let's do our first training. There we go. We see a few things happen on screen. Even though there are 45,000 rows of data to go through, in batches of 256, forward and backward, we see that even on a CPU it runs quite fast; this is a Core i7 CPU on this laptop, rather high end, so it's not too shabby.
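The fit call described above could be sketched like this; the values follow what is given in the lecture, and x_train and y_train are the training matrices prepared earlier:

```r
# 10 epochs, mini-batches of 256, and 10% of the training set
# held back as a validation split that the model never trains on.
history <- model %>% fit(
  x_train, y_train,
  epochs           = 10,
  batch_size       = 256,   # a power of two: 2^8
  validation_split = 0.1
)
```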
You'll see a little bit of noise up here saying that the TensorFlow binary was not compiled to use some of the features of this specific Core i7. You're always going to see that, and you can just ignore it, but it does confirm that we're using the CPU here. It ran through the first epoch, the second epoch, up to the tenth epoch, and it tells you how long each took: the first epoch took about a second, and then a fraction of a second for each of the other epochs. If I ran this on a GPU, it would be much, much faster. It says it trained on roughly 40,000 samples; remember, a validation set of 4,512 samples was kept out. Now let's see what happened during the first epoch. It had a loss of 0.7 and an accuracy of only 59%. On the validation set, it had a loss of 0.47 but quite a good accuracy of 83%. Remember, the accuracy is the number of correct predictions divided by the total number of predictions. Then it went back through back propagation and had better values for these weights to start off with. Now, something we didn't discuss: the first time it runs through, the values are random. All 242 of those parameters were given random values for the first run through, so we start off at some totally random point on that multidimensional surface. But through gradient descent, it got to better values. So the second epoch was run, the loss fell dramatically for the training set, and the accuracy went up to almost 91%. The validation loss decreased dramatically as well, and the validation accuracy went up. And you can see that as we go along, it gets better and better. You might also have noticed this beautiful graph that RStudio provides for us. This is really great, and one of the reasons why I love RStudio, as opposed to just running this in a Jupyter notebook and using Python, is that this was a dynamic thing that happened. You can see the two sets: the validation is in green and the training set is in blue.
And you can see, as the epochs were running, the loss got lower and lower and the accuracy got higher and higher. And something we'll get into is that these two curves are very close to each other, so it is generalizing quite well. The training is not only specific to the training set, which will always get better: the training loss will always go down and the training accuracy will always go up. But in tandem with that, the validation set, which the model only uses to measure itself all the time, also gets better. That means it is generalizing well to data it has not seen before, and that is a very good marker. Now, this is a toy data set, simulated data that I designed specifically to do this. It is not what you're going to see in the real world, and we'll certainly do some more real-world examples in the future. You'll see these two curves being quite far apart, and that's bad; you'll see what we call that problem and how to change the design of your deep neural network to combat it. I'm just going to use the plot function, and it creates a nice little plot, a ggplot2-type plot, of the loss and accuracy, in case you wanted to save that and use it in a publication. So let's evaluate our model. I'm going to use the evaluate function, now on data that the model hasn't seen at all. It's not the special validation set that was kept out during the training phase; it's the actual data that it has never, ever seen: x_test and y_test. I pass that to the model, and this is where the rubber hits the road, as the saying goes. It says that on the data it has never seen before, the loss was 0.158 and the accuracy was almost 96%. Not too bad. We could improve it, though; there would be ways to change the design to do even better. But that's not bad for data it has never seen: 96% accurate.
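The plotting and evaluation steps described here amount to two short calls; x_test and y_test are the held-out test matrices from earlier in the lecture:

```r
# Loss and accuracy curves for training and validation, as a ggplot2 plot.
plot(history)

# Evaluate on data the model has never seen during training.
# In the lecture's run this gave a loss of about 0.158 and
# an accuracy of almost 96%.
model %>% evaluate(x_test, y_test)
```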
Now, you're not going to get to 100%. When I designed these 50,000 rows, it was designed so that there's a bit of overlap, and that is what happens in the real world. You're going to have variables, and for similar variable values you're going to get different target values, or the other way around. That's real life, and it's because the variables that we gather are not always representative of what the real causes are; the target is not necessarily caused by the variables that you have in there. That's a fundamental problem, which is very difficult specifically in healthcare: the data point values for the variables that we do collect are not necessarily the ones that determine the outcome, the target. They might be surrogates of a deeper-lying physiological process that we don't understand yet and can't collect data on, and that's the true determinant of the outcome, the target value. That is a real-world problem, one that we deal with in normal statistics and here in machine learning: are the variables the actual ones that cause the actual outcome? That's a deep debate we could have, but let's carry on here. I'm going to create an object called predict, and now we're going to use the predict_classes function and pass x_test to it. So it's going to give us the predictions for the test set: is it predicting a one or a zero for each sample? I'm going to pass that to a table, and my table is going to have two rows and two columns, with one dimension labelled predicted and the other actual. Let me run that and show you what it does. It's called a confusion matrix. The actual values go across the top, as the second argument, and the predicted values down the left-hand side. So it says: if the actual value was a zero (remember, that comes from the actual y values), it was predicted as a zero in 2,424 cases.
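The confusion matrix step can be sketched as follows; y_test_actual is my placeholder name for the original 0/1 target vector saved before the one hot encoding:

```r
# predict_classes returns the hard 0/1 predictions for the test set.
pred <- model %>% predict_classes(x_test)

# Cross-tabulate predictions against the actual labels:
# actual classes across the top, predicted classes down the side.
table(Predicted = pred, Actual = y_test_actual)
```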
That long list of zeros and ones is what predict_classes gives me. If the actual was a one, it was correctly predicted as a one in 2,250 cases. But in 34 cases where the actual was a zero, it was predicted as a one, and in 173 cases where it was actually a one, the prediction came out as a zero. This is called a confusion matrix, and it helps us see, in a visual way, how well this trained network managed the test data. Another function you could use is predict_proba. I'm going to save its result in a variable called prob, because I want to show you what happens. predict_proba is going to give me, in this instance (because we have a zero and a one through the one hot encoding), the probability of the first class, the zero. But because of how we've set things up, we're actually interested in the ones, so I'm just going to subtract the values from one: the probability of the first class, the zero, subtracted from one gives me the probability of the second class, the one. If I run that and look at the first five, you can actually see what the probabilities were for my second node in the output, the one. So it was 99%, 99.99%, 56%; that last one was a close call. It shows me, for these first five, what the value was in node number two. That's what the softmax function did: it gave a probability for the first node and the second node, and I'm only looking at the probability output from the second node. What actually happens, for us to get either a zero or a one, is that there's a cutoff of 0.5: if the probability was 0.5 or higher, the final prediction is a one; if it was less than 0.5, the final prediction is a zero. Now, let's look at that through cbind, so you can actually see what happens.
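The probability step and the 0.5 cutoff can be sketched like this; again, y_test_actual is my placeholder for the original labels saved before one hot encoding:

```r
# Probability of the first (zero) node, subtracted from one to get
# the probability of a one, as done in the lecture.
prob <- 1 - (model %>% predict_proba(x_test))[, 1]
head(prob, 5)

# The implied 0.5 cutoff: 0.5 or higher becomes a one, below it a zero.
pred <- ifelse(prob >= 0.5, 1, 0)

# First ten rows side by side: probability of a one, prediction, actual.
cbind(prob = prob[1:10], predicted = pred[1:10],
      actual = y_test_actual[1:10])
```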
So I'm passing one minus prob, which gives the probability in the second node; then the predicted value based on that, for rows one to ten; and then the actual values, which I saved right at the beginning. We see, for example, that the second node was the highest at 99%, therefore the prediction was a one, and that was quite correct, because the actual value was a one. So you can see how this all comes together. And that's it for this very long lecture. I hope it was as exciting for you as it is for me. Deep neural networks, and designing them using Keras on top of TensorFlow here in R, is just such a wonderful, exciting thing to do. It really is such a pleasure, and I really hope that you are as excited as I am. So download this file from GitHub and look at it on RPubs; on RPubs you're actually going to see what it looks like. Let me do that for you. Let's save everything we've gone through, and now I'm going to knit, to HTML. Let's go. I'm going to warn you: if you use the GPU version of TensorFlow and Keras, you may run into problems at times when you do this knitting; it might not work properly for you. With the CPU version, it does. Here on the right-hand side we see the viewer. This is what we can now publish to RPubs; I've already published it, so it's going to say republish. But this is what the document looks like: all the web elements done very nicely. We see what we typed, and there's this very nice table of contents column, so we can jump to section two, section three, section four with that very nice widget. We see the model summary there, and we see the colors of the different headings that I set in the cascading style sheet initially. I just want to scroll down and give you a visual indication of that network. This is the network that we created, in visual form. I have my 10 input nodes, and they are densely connected: each one is connected to each node in the next layer.
And that's why you get the 100 weights here. But there's a bias node as well: after this tensor multiplication, we add this column vector of bias values. So there are another 10, giving me 110 parameters there, of which 100 are weights and 10 are biases. There's another 110 in the next layer, and then the 22 at the end, giving me the final output of these two nodes corresponding to the one hot encoding. Whichever one gets the highest probability through the softmax, that's going to be the final predicted value. I'll speak to you in the next video lecture.