 In this video, I want to show you how you can use images on your local hard drive and take them in batches into your model as it trains. Images are usually fairly large as far as the data size is concerned. So to bring them all into memory for the training might just overwhelm the system. So I just want to show you how we can use image data generation taking images in batches as it is needed by your model. So what we're going to do here is to just use some data from the Kaggle website. You can log in to kaggle.com, create an account and there are many, many data sets that you can download. The particular one we're looking at here is of skin lesions and that makes it a classification problem. We have two classes, it's either a benign skin lesion or a malignant skin lesion. That's very important then that you have your structure, the folder structure, your directory structure put in place properly on your hard drive or your internal drive. So what we're going to have is separate folders or directories for the training data, for the validation data and for the test data. And in each of those we're going to have a sub directory or a sub folder with the separate images. So we're going to have a sub directory for benign images and a sub directory for malignant images in each of the training validation and test directories. So that's very important. So what I'm showing here is just the RPUBs file that we've knitted and uploaded. As always though this file, the actual RMD file, the Rmarkdown file will be available on GitHub. So we're going to use the Keras library, let's just increase the size here but there you can see we're using the Keras library and it is the newest version as of October 2019, in other words it is the version that goes with TensorFlow 2.0. So let's have a look at how the data was structured and I'm going to use the file.path function here and if I pass a series of arguments all inside of quotation marks it'll string those together as a folder structure. So I'm going to call the first one training underscore deer for directory validation deer and test deer. So it's on my D drive, I've got two internal drives on the system, it's in the kaggle folder, R folder, skin and then there's a train, a validation and a test folder. So it's going to create these three folder structures for me. In each of those I have a benign and malignant, benign malignant, benign malignant sub directory and in those we have all the photos already separated and it's very important that this benign malignant, benign malignant are spelled exactly the same as far as all these subdirectories are concerned, lowercase b, lowercase m, everything has got to be exactly the same. Now I'm going to do my training in step sizes, I'm just going to set up the total number of images that I do have by using the length function and I'm just going to call that num underscore train underscore benign and that's going to count list dot files, just list all the files in that directory and the length that's just going to tell me how many files are in there and I'm just totaling all of those, the total number of training data, total number of validation images then and the total number of test images, as simple as that. Now for demonstration purposes for running this code on a laptop with just a small Nvidia graphics card, I'm just going to make these images exceptionally small so I'm going to stick with 112 pixels wide and 112 pixels high and my batch size is going to be a tiny four, just so that we can run it on this laptop and if you are downing it on a similar laptop with a small GPU it'll work for you as well. So I'm now going to use this image data generator function and I've put this whole layout here keres colon colon just to show you that that is a function inside of keres, you don't have to do this keres colon colon, it's image data generator. Now we can use the image data generator to do all sorts of pre-processing on our files and also data augmentation. So have a look at this, as per usual I'm going to divide each pixel by 255, that's just normalizing the value of each pixel, I'm going to allow random rotations up to 10 degrees, I'm going to allow a width shift range of 0.15, height shift range of 0.15 and a horizontal flip and a zoom range of 0.05. So you can look that up on the website, keres website image data generator to tell you what all these things mean. It just means as the image comes in it can do some random changes to the image, that's called image augmentation and that really helps to generalize your model as a trains. It just creates different views of these images as they come in every time so as to almost fake, well I can't say almost it does, a new image by allowing these random changes to be made to the images every time they are bought in. My image gen validation function here, computer variable here is going to also be image data gen generator and all we're going to do there is rescale and the same with a test. The validation data and the test data we don't want to augment them at all, it's only for the training data. Now we're going to set up a computer variable train data gen and then also validation data gen and test data gen and from those we're going to use the flow image from directory, that's actually going to do this flow just to grab these files off of your internal drive as they are required. So we're passing the training directory, remember that's going to have sub-directies benign and malignant in it. The generator that we are going to use is this one we've just set up, image gen train to allow for this image augmentation. The target size is then going to be the image size 112, 112 and the class mode is binary in as much as we have two classes and I'm going to set the same up this flow image from images from directory function, I'm going to set up that for the validation data gen and for the test data gen. So you've got to go through both these steps. Now let's create a model, the model is going to be fairly simple, you can see it there I'm going to have two convolutional layers, convolutional layer one and two, there's going to be max pooling after each one and a 20% dropout after each one. I'm using 16 kernels and then 32 kernels and the kernel size of three by three, the padding we're going to keep the same and the activation function is going to be ReLU and these are color images. So for the input shape in my first layer there I'm telling the R here that is 112 by 112 by three, these three color channels of course. Then we're just going to go into flattening all these layers and our hidden layer here is a 512 node dense layer with that activation function and lastly a single node and activation function is going to be just sigmoid, in other words just so that we can have this binary outcome. There's the model, we see that's 12.8 million parameters that have to be learned even for such a small little network. Now I'm going to use binary cross entropy as my loss function, add them as my optimizer and the metrics assist the accuracy. The training then, I don't just use the fit function now, I use fit generator. Take the model as first argument, the model is going to now have to grab these images as required and so we pass the train data gen computer variable that we saved to generate those images. The step size per epoch we have to specify and we're just going to take the total number of training divided by the batch size. So that'll give us the steps to take per epoch and I'm using the floor function there because we need an integer value. I'm going to use 10 epochs, you see the setup there of my validation data, it's also the validation data gen and I've also got to set the step size there. I'm going to use a callback and I'm going to use early stopping there, we're going to monitor the validation loss, we want improvements of at least 0.01 and if it doesn't do that for four steps in a row there, epochs in a row I should say we are going to stop and I'm not going to run it here, if you've downloaded this file you can run that and it's usually a tiny little network like this with the image size made so small you're going to get about around about 70% accuracy. So that's how we would do the accuracy, I'm going to call it score, model, I passed that to evaluate generator so you can't just use evaluate now, you have to use evaluate under generator, I passed the test data generator to that and my step size again the total test size divided by the batch size and in this instance when I knitted this file we got an accuracy of about 70% as you can see there. Just as a last little note remember that you can always save your model in HDF5 format, suppose the best format for these models so we just use save underscore model underscore HDF5 we pass the model to that and we're going to call the file skin.h5 because I've set the directory it's actually going to save it in the same directory to reload that is very simple we just use load underscore model underscore HDF5 pass the file to that and the model is there with all its hyper parameters and weights and biases all the parameters are saved so if I were to just load this model use this load model in the evaluate generator function again pass the test data gen and the step size to that I'm going to get it back exactly the same accuracy because all the weights have been saved. So there you go how to use data that's on your own local drives, image files in this instance and using the image data generator function to take these images as required for the model to train and you don't have to load all of them inside of memory in one go.