Namaste. In the past few modules, we have been building machine learning models using TensorFlow APIs. There are situations where a model trains for a long time and we would like to store its intermediate state, so that we can assess how it is performing on the test data, or safeguard against unforeseen situations due to which the training loop may not complete. After restoring the weights, the model can resume training where it left off, avoiding long retraining times. Saving the model also helps us share our work with others so that they can recreate it. When publishing research models and techniques, most machine learning practitioners share the code that creates the model along with the trained weights or parameters of the model. Sharing this data helps others understand how the model works and try it themselves on new data. In this module, we will learn how to store the model during or after training. I would like to caution you against using untrusted code, because TensorFlow models are code at the end of the day. Hence, you should be careful and ascertain the origin of any code before using it. There are different ways to save TensorFlow models depending on the API that you are using. Here we use tf.keras, which is a high-level API for building and training models. Let us begin by importing TensorFlow and other dependencies. We install TensorFlow 2.0 and make sure that the right version is in place. We also import the os package because we want to read and write files on the disk. Let us load the MNIST dataset and take 1000 examples each from the training and test sets, so that our model runs faster and we can still demonstrate the save and restore functionality. Let us define the model in a Python function so that we can call this function to create the model both before saving and for restoration purposes. So, we define a simple neural network model which has a single hidden layer with 512 units.
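The setup just described can be sketched as follows. This assumes TensorFlow 2.x is already installed (the lecture notebook installs it with pip); only the first 1000 examples of each split are kept, and each image is flattened and scaled:

```python
# Setup sketch, assuming TensorFlow 2.x is installed (e.g. pip install tensorflow).
import os
import tensorflow as tf

print(tf.__version__)  # verify that a 2.x version is installed

# Load MNIST and keep only the first 1000 examples from each split,
# so that training runs quickly for this demonstration.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.mnist.load_data()

train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

# Flatten each 28 x 28 image into a vector of 784 values and scale to [0, 1].
train_images = train_images[:1000].reshape(-1, 784) / 255.0
test_images = test_images[:1000].reshape(-1, 784) / 255.0
```

Loading the dataset downloads it on first use and caches it under `~/.keras`.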
We use ReLU as the activation function, and the input to this hidden layer is 784 values. These 784 values come from the 28 × 28 images of digits stored in the MNIST dataset. In addition, we use dropout regularization with a dropout rate of 0.2, and finally we have a dense layer with 10 units as the output layer, which uses softmax as its activation; we want to output one of the 10 digits as the desired output. We use Adam as the optimizer and sparse categorical cross-entropy loss, as we are interested in getting integers as output, and we will track accuracy as a metric. Let us create the model and examine it through the model.summary() method. You can see that the model has exactly two weight layers: one hidden layer with 512 units, followed by a dropout layer (dropout is applied to the output of the first layer), and then an output layer with 10 units. In total there are 407,050 parameters in the model. We would like to automatically save checkpoints during training. This way we can use a trained model without having to retrain it, or we can pick up the training where we stopped it last time, in case the training process was interrupted for some reason. We use the ModelCheckpoint callback for this task. This callback takes a few arguments for configuring the checkpointing. Let us look at its usage. First we define and configure the checkpoint callback, which is done through the code highlighted on your screen. We define the checkpoint path, which is where we want to store the checkpoint, and then we configure the callback with this path, specifying what part of the model we want to save. Here we are saving only the weights; we are not saving the architecture or the optimizer configuration along with them.
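A minimal sketch of the model and checkpoint configuration described above; the `create_model` name follows the lecture's convention. Note that the lecture's TF 2.0 notebook uses the path `training_1/cp.ckpt` (TensorFlow checkpoint format), while recent Keras releases require a `.weights.h5` suffix for weights-only checkpoints, which is what this sketch uses:

```python
import tensorflow as tf

def create_model():
    """Build and compile the simple network described in the lecture."""
    model = tf.keras.Sequential([
        # Hidden layer: 512 units, ReLU, 784 inputs (a flattened 28 x 28 image).
        tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        # Dropout regularization with a rate of 0.2.
        tf.keras.layers.Dropout(0.2),
        # Output layer: one unit per digit class, softmax activation.
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = create_model()
model.summary()  # 407,050 trainable parameters in total

# Configure checkpointing: save only the weights (not the architecture
# or optimizer state) at the end of every epoch.
checkpoint_path = "training_1/cp.weights.h5"  # lecture: "training_1/cp.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
```

The parameter count works out to (784 × 512 + 512) + (512 × 10 + 10) = 407,050.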
So, this is the simplest configuration of checkpointing. We will see more advanced usage of checkpointing later in this exercise. Then we create a model with the create_model function. Remember that create_model builds a TensorFlow model with one hidden layer of 512 units and an output layer with 10 units, and we fit the model by running the training loop for 10 epochs. Notice that we pass the callback to the training process. This callback creates a single collection of TensorFlow checkpoint files that are updated at the end of each epoch. Let us train the model; because we have very few examples, training finishes quickly. Let us look at the checkpoint directory. An exclamation mark followed by a command is interpreted as a Unix command and run as if we typed it on the command line, so this snippet prints the directory listing for the checkpoint directory. Note that we trained the model for 10 epochs, and you can see there are a few files created in the checkpoint directory. To restore, we first have to create a model with the same architecture as the original model and then load the saved weights into this new model. It is perfectly fine to reuse the weights from the previous run even though this is a different instance of the model. Before applying the weights, we create a model and evaluate its performance on the test set even before restoring the parameters. In this case some random values will be used for the parameters, and any accuracy we get is just by chance. So, here we get only 10 percent accuracy, as against the 99 percent training accuracy, or 87 percent validation accuracy, that we got during training.
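The steps above can be sketched as follows. To keep the sketch self-contained, small random arrays stand in for the 1000-example MNIST subset, and the checkpoint file uses a `.weights.h5` name (the lecture's TF 2.0 notebook uses `training_1/cp.ckpt`):

```python
import os
import numpy as np
import tensorflow as tf

def create_model():
    """Same architecture as the lecture's create_model."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Stand-in random data; in the lecture this is the 1000-example MNIST subset.
train_images = np.random.rand(100, 784).astype('float32')
train_labels = np.random.randint(0, 10, size=(100,))

os.makedirs("training_1", exist_ok=True)
checkpoint_path = "training_1/cp.weights.h5"
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

# The callback writes a checkpoint at the end of every epoch.
model = create_model()
model.fit(train_images, train_labels, epochs=2,
          callbacks=[cp_callback], verbose=0)

# A fresh, untrained model with the same architecture: its random weights
# give roughly chance-level (about 10%) accuracy before any restore.
fresh = create_model()
loss, acc = fresh.evaluate(train_images, train_labels, verbose=0)
```

In a notebook, `!ls training_1` would then list the checkpoint files written by the callback.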
Now, let us load the weights from the checkpoint path, evaluate the model again, and check the accuracy. We can see that we get an accuracy of 87 percent, as we got earlier during the training of the model. So, just by rebuilding the same architecture as the original model and restoring the weights, we get the same performance as the original model. You can see that this is very powerful: imagine you build a model and share its weights with a friend or colleague; they can then take advantage of these weights, recreate the same model, and use it for the prediction task. Let us look at the other options we have for configuring the checkpoint callback. Instead of saving a checkpoint after every epoch, we can specify a period after which the model should be saved. We do that with the period argument, and here we are setting period to 5, so we are going to save the weights every 5 epochs rather than after every epoch; again we save only the weights. We also give the checkpoint path and configure it to include the epoch number, so each checkpoint gets a unique name containing the ID of the epoch, making it easy to identify which epoch a checkpoint is from. Then we create the model, save the initial weights to the checkpoint path, and fit the model; note that in the fit function we pass the callback as one of the arguments. You can see that the model is saved after every 5 epochs. Here we train the model for 50 epochs, so we should see 10 checkpoints: one at epoch 5, one at epoch 10, and so on up to epoch 50. Let us look at the contents of the checkpoint directory, and you can see that there are now 10 different checkpoints stored there.
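A sketch of both steps: restoring saved weights into a fresh model, and checkpointing every 5 epochs with the epoch number embedded in the file name. Random stand-in data replaces the MNIST subset, and since newer Keras releases deprecate the lecture's `period=5` argument, the sketch uses `save_freq`, which counts batches rather than epochs:

```python
import os
import numpy as np
import tensorflow as tf

def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Stand-in data; in the lecture this is the 1000-example MNIST subset.
x = np.random.rand(100, 784).astype('float32')
y = np.random.randint(0, 10, size=(100,))

# Restore previously saved weights into a fresh model and evaluate it.
os.makedirs("training_1", exist_ok=True)
trained = create_model()
trained.fit(x, y, epochs=1, verbose=0)
trained.save_weights("training_1/cp.weights.h5")

model = create_model()
model.load_weights("training_1/cp.weights.h5")
loss, acc = model.evaluate(x, y, verbose=0)  # same accuracy as `trained`

# Save a checkpoint every 5 epochs, embedding the epoch number in the name.
# `save_freq` counts batches, so 5 * steps_per_epoch means once per 5 epochs
# (the lecture's TF 2.0 code passes period=5 instead).
os.makedirs("training_2", exist_ok=True)
batch_size = 32
steps_per_epoch = int(np.ceil(len(x) / batch_size))  # 4 steps here
checkpoint_path = "training_2/cp-{epoch:04d}.weights.h5"
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path,
    save_weights_only=True,
    save_freq=5 * steps_per_epoch,
    verbose=1)

model2 = create_model()
model2.save_weights(checkpoint_path.format(epoch=0))  # checkpoint for epoch 0
model2.fit(x, y, batch_size=batch_size, epochs=10,
           callbacks=[cp_callback], verbose=0)
```

After this run, the `training_2` directory contains checkpoints for epochs 0, 5, and 10; with 50 training epochs, as in the lecture, there would be one every 5 epochs up to 50.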
If we call the latest_checkpoint function with the checkpoint directory as an argument, we get the latest checkpoint. By default, the TensorFlow format only keeps the 5 most recent checkpoints. So, let us retrieve the latest checkpoint and create a model with the weights from it. You must be wondering what these different files in the checkpoint directory are; let us take a look at them. You can see that there is a sharded format: since we trained our model on a single machine, each checkpoint has all the weights stored in a single shard. If we were training on multiple machines, there could have been multiple shards here. Apart from the callback, we can also save the weights manually; that is the other way of saving them. We simply use the model.save_weights function, providing the path and file name where we want to store the weights. Let us run it to check: we save the weights to the my_checkpoint file and then load them from that file into a new model. You can see that we again get 87 percent accuracy after saving the weights and restoring them. Instead of saving only the weights, we can also save the architecture of the model and the optimizer configuration. Let us see how to do that. The entire model can be saved using the hierarchical data format, HDF5; we specify the HDF5 file with an .h5 extension. Here we create the model, train it, and save it into an HDF5 file named my_model.h5. Later we can load the model back from this HDF5 file and use it for prediction.
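The two ideas above can be sketched as follows. For `tf.train.latest_checkpoint`, a bare `tf.train.Checkpoint` over a variable is used so the sketch works across TF versions (in the lecture the checkpoints come from the ModelCheckpoint callback); for manual saving, a `.weights.h5` name stands in for the lecture's `./checkpoints/my_checkpoint`:

```python
import numpy as np
import tensorflow as tf

# Part 1: tf.train.latest_checkpoint returns the most recent checkpoint prefix.
v = tf.Variable(0.0)
ckpt = tf.train.Checkpoint(v=v)
for step in range(3):
    v.assign(float(step))
    ckpt.save("demo_ckpts/ckpt")  # writes demo_ckpts/ckpt-1, -2, -3 (+ index/data shards)

latest = tf.train.latest_checkpoint("demo_ckpts")  # -> ".../ckpt-3"

v2 = tf.Variable(-1.0)
tf.train.Checkpoint(v=v2).restore(latest)  # v2 now holds the last saved value

# Part 2: manually saving and restoring model weights.
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

x = np.random.rand(20, 784).astype('float32')
model = create_model()
model.save_weights("my_checkpoint.weights.h5")  # lecture: './checkpoints/my_checkpoint'

restored = create_model()
restored.load_weights("my_checkpoint.weights.h5")

# Both models now produce identical predictions (dropout is off at inference).
same = np.allclose(model.predict(x, verbose=0),
                   restored.predict(x, verbose=0))
```

The `.index` file and the numbered `.data-XXXXX-of-XXXXX` shards are exactly the sharded format discussed above.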
I would like to point out the difference between the earlier checkpointing method, where we were storing only the weights, and this method, where we are storing the entire model. With checkpointing, we first had to create the model and then load the weights into it before using it for the prediction task. In this case we do not have to create the model, as the model itself has been saved in HDF5 format: we simply load the model, which recreates the architecture and restores the weights, and the model is ready for the prediction task. It is important to note this difference. Let us load the model; we can see that it has exactly the same summary as before, and when we check its accuracy it is almost the same, around 87 percent. So, this technique saves everything: the weights, the model configuration, and the optimizer configuration, and Keras saves the model by inspecting its architecture. In this module we studied how to store and restore TensorFlow models to and from the disk. These techniques are very handy when you have models that train for long periods of time, or when you want to export a model for deployment on different platforms. Hope you enjoyed learning these concepts. See you in the next module.
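The whole-model workflow can be sketched as follows, again with random stand-in data in place of the MNIST subset. Note that `load_model` needs no preceding `create_model` call, which is the difference pointed out above:

```python
import numpy as np
import tensorflow as tf

def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Stand-in data; in the lecture this is the 1000-example MNIST subset.
x = np.random.rand(50, 784).astype('float32')
y = np.random.randint(0, 10, size=(50,))

model = create_model()
model.fit(x, y, epochs=1, verbose=0)

# Save everything - architecture, weights, and optimizer configuration -
# into a single HDF5 file.
model.save("my_model.h5")

# load_model rebuilds the architecture and restores the weights, so the
# returned model is immediately ready for prediction.
new_model = tf.keras.models.load_model("my_model.h5")
new_model.summary()  # same architecture and parameter count as before
preds = new_model.predict(x, verbose=0)
```

Because the weights round-trip exactly through the HDF5 file, the loaded model's predictions match the original's.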