 So in this tutorial, I want to introduce you to the Keras package in R. Now, I'm going to assume that you know a little bit about deep neural networks. I think if you're a healthcare worker, the time has come. You've got to learn about artificial intelligence, about machine learning, and specifically about deep neural networks, deep learning. Always maintain, if you want to be a researcher, you really have to understand statistics. There is no excuse not to. You don't hand over your work to a statistician, learn to do it yourself, and call yourself a researcher. The same is now going for deep learning, deep neural networks, artificial intelligence, machine learning. You've got to learn how to do this. We have to have medical personnel that can stand with their feet on both sides, on the computer science side, as well as on the healthcare side. It's a good thing to do. To understand this, it is the future that is where research is going, and it's obviously taken over so much of our lives. We see self-driving cars. There's no excuse not to learn anything about this. So this is an R Markdown file. You'll find it on my RPUB's page, and I'll certainly link to that. So you'll see all the R Markdown here. What we're interested in here is just the code cells, the code chunks. So I'll show you and walk you through those. First of all, I'm always set my working directory to the current working directory because I save my R file right inside of the same folder as we have the data set, the CSV file. So I'm going to import Keras. If you want to read more, if I just go down here, you can see keras.rstudio.com. Have a look at that. Very easy to install here in RStudio. So read that page and have a look. And I'm also just going to import reader because I just want to import my CSV as a table. That's just a standard practice in the research group. So what we're going to look at is this University of California at Irvine. You see Irvine datasets. They have a lot of datasets for machine learning. This is placing a tachogram for cardiac monitor and foetuses. You can download the dataset there. You can read up more on the dataset what the features mean. It is an XLS file as usual for an Excel file. It's a bit messy and you have to clean it up. I've done that already for you. Saved it as a CSV file and you can find it in my GitHub repository. 2,126 rows of data samples across 21 features with a single target that we are trying to predict using a deep neural network. So I'm going to showcase you what is possible. Just a simple neural network here. We're going to get about 88% accuracy there. That's another spectacular. I'm not going to spend time explaining what a neural network is. Just to show you what a pleasure is working in RStudio and using Keras with a TensorFlow back in the research unit. We do most of our work, of course, in Python and TensorFlow, but really moving across to using R and Keras here with a TensorFlow back in. It really is a nice coding environment. It really makes the work easy and we can concentrate on what we're trying to get from these models as opposed to worrying about the writing of the code. So I can really recommend using RStudio. So first of all, let's read in that CSV file there as a table. We see it there, nicely printed to the screen if you are doing all markdown. One of the reasons we're using a table. Now, you can take these data sets, data frames, and you can use them in Keras in a variety of ways. What I'm going to show you here is just the standard practice voice, and that's just using a matrix. So I'm going to take my data frame Df here, and I'm going to cast it as a matrix, as dot matrix, having a data computer variable. And then I'm going to strip the column headers, the variable names there, using dim names data, and I'm going to set that to null. So we just have a matrix with all the rows and the 22 columns. Now, the target variable has in its sample space three classes, one, two, and three, being a normal or suspect or pathologic, and we need to convert that to zero, one, and two, so that we can do one hot encoding for our target variable. That's easy to do with a bit of broadcasting. So I'm going to take data, all the rows, column 22. I know my target variable is in column 22. I'm going to cast it as a numeric variable, and through broadcasting, just subtract one from each of the one, two, and three, because that is going to leave me with zero, one, and two, as the sample space of my target variable. So that's easy to do. It is now all numerical variables. Now, what we want to do this is a multi-class classification problem. We have three data point values in our sample space here. We've just got to do a train and test split. We're going to split our data into a training and a test set, I should say. And one easy way to do it in R, really easy. It's just to create for the number of rows that we have, just an index. I'm going to use the sample function for that. I'm putting a two here. It's going to start counting from one and counting numbers, so one, two. And replacement, of course, has got to be true, because I need more than 2,000 of these data points. But I'm going to set a probability of 0.7 for choosing a one and 0.3 for choosing a two. So we have that 70% to 30% split. And now it becomes very easy to create two computer variables called x-train and x-test. And I'm going to use indexing, and the row values is where this INDX computer variable that I created has a value of one, that'd be 70% of the data. And where it is two, that is going to go into a test set. And I'm only using my feature variables here. This x-train and x-test is just two matrices of my features. Now, later on, we're going to use a confusion matrix. So I'm just going to save this computer variable y underscore test underscore actual. So it's going to take the test target data and just save all of them, the 0, 1, 2, 0, 1, 2, 0, 1, 2, whatever it is. I'm just going to save that separately to use later on, just in my confusion matrix. But we're going to use one hot encoding for our target variable here. Keras has a two underscore categorical function. We're going to use that, which will do the one hot encoding for us. And I'm going to create the y-train and the y-test. Again, just using that index. So my feature matrix and my response matrix, target vector, they correspond, the rows do correspond. So we indexes 1 and we indexes 2, and it's only column 22, where I know my target variable is hiding. So if I just view the first five rows of you, just to show you what the hot encoding looks like. So we're going from a vector into this matrix format, where this is column 1, 2, and 3, corresponding to my 0, 1, and 2 in my sample space of my target variable. This first sample in the test set is going to be a 0. That's going to be a 0. That's going to be a 0. This is going to be a 2. And this is going to be a 1. So you see the one hot encoding there. The next step is to normalize my training and my test feature sets. Now, I see this mistake being online in some tutorials. Please don't make that. You want for the normalization to apply the mean and standard deviation of the training set onto both the training set and the test set. Don't do mean and standard deviation of the whole lot. Don't do mean and standard deviation separately for the training and test sets. So here we have mean underscore train, mean and standard deviation underscore train. So just to clearly show my computer variable names there that I'm going across all the columns, and I'm getting the mean and the standard deviation for all the 21 columns for the training set. And then I'm using the scale function to apply that mean and standard deviation to both the training set and the test set. So that's the training sets mean and standard deviation being applied to the training and the test set. So look at that carefully. Now let's do a model. I'm going to show you there's a model. That is how easy it is to create a multi-layer neural network. And let's have a look. I'm just going to call my model model and it's going to be a sequential model, keras.underscoremodel.sequential. Now you can use the keras API and you can introduce more than one inputs, more than one outputs. You can have a cyclic graphs, whatever you want. It's really fantastic to do. I'm just going to use the sequential model here. Then I'm going to pass model to the layer dense function using a pipeline. And in my first dense layer, densely connected layer, I'm going to have 22 units. Now just to show off here, but this is not to make my model any better, but we know if we have overfitting that we can use regularization and we can use dropout. So I'm going to use regularization in my first layer here. It's L2 regularization, set at 0.001. My activation function is going to be rectified linear unit and my input shape is going to equal the number of variables in my training set. So that's going to be 21. I don't have to redo that. This graph is going to infer the size of the next layer based on just this first one. So the layer dropout is the next layer. So the pipe, there's again the pipeline with a rate of 0.2, dropping 0.2 of my weights and densely connected layer with 12 units. Again, regularization, again, the rectified linear unit activation. And lastly, I'm going to use softmax with three outputs, three nodes there because we know my sample space has three values. Let's run that and just get a summary of the model. Very easy to calculate the number of parameters there based on the shape here with a total then the end of 799 trainable parameters that our model is after. The next thing we have to do is just to compile our model using the Keras compile function. So with the pipeline, I'm passing my model to the compile function and my arguments being a loss function. I've got the sample space of three, so I'm using categorical cross entropy as my loss function and the standard atom as my optimizer, so I'm not passing learning rates or anything. It's just the default, set at the defaults and the metric that I want to see is accuracy. So let's compile that model. It's now compiled and I can pass my values, my training values to it now to fit the model. So there's our fit function. I'm going to use the pipeline again just to pass the model to the fit function and I'm passing X train and Y train and across 20 epochs with a mini batch size of 64 and I'm also going to introduce a random validation split of 20% of the data being used as a validation. I'm going to store all of that inside of a history computer variable so that we can make use of it later. Let's run this through 20 epochs. Very nicely on the right hand side, we can see these graphs, top for loss, bottom for accuracy, blue for my training set, green for my validation set. I can also click on these if I just want to see one of the two, either the training set or the validation set and we can see the loss comes down, the accuracy goes up. We can still see there's some overfitting there but we try to compensate for that. So we'll see the 20 epochs run there. Finally, I can just quickly plot that history instead of the plot here. I can save a plot, a nice GG plot there. No problem. We can also evaluate this. I'm going to pass my model to the evaluate function and now we're going to put the unseen values into our model. So x-test and y-test pass that to the model and that's going to give us a loss and an accuracy there. You see the accuracy of about 89%. Told you we could just look at a model here of a confusion matrix. So I'm creating a pred and that is just using the predict underscore classes function passing x-test to that. That's going to predict one for us and then I'm also using remember that what I saved initially the y-test underscore actual just to give us a table there. There we go. The actual values and the predicted values, you see the accuracy around the main diagonal and then off the main diagonal where all the mistakes were made. Let me just show you lastly that you can also use predict underscore PROBA probability for the x-test. I'm going to save that in the computer variable prob and I'm going to just use column bind, c-bind and just bind those together. It's going to run over way too many rows there but you can see the output here so for zero, for one and two. So this first sample here the highest probability at 0.9966 was for a zero hence the prediction is zero and then the actual there is also zero just here through my c-bind and then you can go and see where the mistakes were made. So really fantastic. Give it a go. Of course, learn machine learning artificial intelligence, specifically deep learning. I think all healthcare professionals should seriously have a look at this especially if you're on research and if you're going to stick to a place to do it or inside of our studio with a Keras package and a TensorFlow backend you can't do any better than that.