Namaste. In this session we will build a logistic regression model with the tf.estimator API. Logistic regression is often used as a baseline for classification tasks. In this exercise we will use the Titanic dataset, with the goal of predicting passenger survival from characteristics such as sex, age and class. Let us first install the required libraries, sklearn and TensorFlow 2.0. We use matplotlib for plotting, and numpy and pandas for data manipulation. Let us load the Titanic dataset. The training data goes into the df_train data frame and the evaluation data into df_eval; the labels are in y_train and y_eval respectively. Let us first explore the data by examining a few rows of the dataset. You can see that there are features like the sex of the passenger, the age, the number of siblings and spouses, the fare, the class, and so on. Let us also use the describe command on the data frame to get statistics about each of the numeric columns: count, mean, standard deviation and various quantiles. You can see that there are 627 examples in the training set and 264 examples in the evaluation set. Let us plot a histogram of age: the majority of the passengers are in their 20s and 30s, which is where the histogram peaks. There are approximately twice as many male passengers as female passengers on board, and most of the passengers travelled in third class. Finally, let us look at survival by sex of the passenger. Females have a much higher chance of survival than their male counterparts, so sex can be a very good predictive feature for the model. After exploring the data a bit, let us get into engineering features for the model. Feature engineering is extremely important for getting good results.
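The exploration steps above can be sketched as follows. This is a minimal sketch: the five rows here are hand-made stand-ins with the same column names, not the real df_train (which has 627 rows, loaded from CSV in the session).

```python
import pandas as pd

# Hand-made sample standing in for the Titanic training data; the session
# loads the real CSVs into df_train (627 rows) and df_eval (264 rows).
df_train = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male"],
    "age": [22.0, 38.0, 28.0, 35.0, 54.0],
    "n_siblings_spouses": [1, 1, 0, 1, 0],
    "fare": [7.25, 71.28, 8.05, 53.10, 51.86],
    "class": ["Third", "First", "Third", "First", "First"],
})
y_train = pd.Series([0, 1, 1, 1, 0])  # 1 = survived

print(df_train.head())      # examine a few rows, as in the session
print(df_train.describe())  # count, mean, std and quantiles of numeric columns

# Survival rate by sex; on the full dataset females survive far more often.
with_label = pd.concat([df_train, y_train.rename("survived")], axis=1)
print(with_label.groupby("sex")["survived"].mean())
```

On the full data frame, `df_train.age.hist()` and `df_train["class"].value_counts().plot(kind="barh")` produce the age histogram and class breakdown discussed above.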
We will use feature columns both for converting the raw features into a form that can be consumed by the estimator API and for constructing crossed features from the original ones. We have several categorical features and a few numeric features. We use numeric feature columns for the numeric features; for the categorical features, we first find the unique values of each feature and then use a categorical column with a vocabulary list to convert them into one-hot encodings. Next we define an input function that specifies how the data is converted into tf.data.Dataset format, which feeds the input pipeline in a streaming fashion. A dataset can be built from multiple sources, such as a data frame, CSV-formatted files and many more. Here we define a dataset from tensor slices, shuffle it if required, and return the dataset in batches, repeating the batching operation for the specified number of epochs. This is how we define our input function, and we make one input function for training and one for evaluation. The only differences are that in training we set shuffle to true, while in evaluation we set shuffle to false and specify the number of epochs to be one. Let us inspect the dataset. You can see that it has feature keys that are the same as in the original data frame we explored. Looking at the feature batch for the class attribute, you can see values like Third, Second, Third and so on; there are ten values in the batch because we specified a batch size of ten, and in the same manner the label batch also contains ten values. Let us define our logistic regression classifier using tf.estimator.LinearClassifier, which takes the feature columns as its argument. We then train the model by passing the training input function, and evaluate it by passing the evaluation input function.
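The feature columns, input function and classifier described above can be sketched as below. This assumes a TensorFlow 2.x release that still ships tf.estimator and tf.feature_column (both were removed in 2.16); the tiny data frame is a stand-in for the real df_train, and the column lists and batch size are illustrative choices.

```python
import pandas as pd
import tensorflow as tf

# Stand-in frames; the session uses the full df_train / y_train instead.
df_train = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": [22.0, 38.0, 28.0, 35.0],
    "fare": [7.25, 71.28, 8.05, 53.10],
    "class": ["Third", "First", "Third", "First"],
})
y_train = pd.Series([0, 1, 1, 1])

CATEGORICAL_COLUMNS = ["sex", "class"]
NUMERIC_COLUMNS = ["age", "fare"]

feature_columns = []
for name in CATEGORICAL_COLUMNS:
    vocab = df_train[name].unique()  # unique values -> one-hot vocabulary
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(name, vocab))
for name in NUMERIC_COLUMNS:
    feature_columns.append(
        tf.feature_column.numeric_column(name, dtype=tf.float32))

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_fn():
        # Stream (features, label) pairs from the data frame.
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)  # shuffle only for training
        return ds.batch(batch_size).repeat(num_epochs)
    return input_fn

train_input_fn = make_input_fn(df_train, y_train)
eval_input_fn = make_input_fn(df_train, y_train, num_epochs=1, shuffle=False)

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)
print(result["accuracy"])
```

Inspecting one batch with `next(iter(make_input_fn(df_train, y_train, batch_size=10)()))` shows the feature keys and a label batch of ten values, as described above.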
We store the result in the result variable and print it at the end of the run. You can see that we got an accuracy of about 75 percent on the evaluation data, while the baseline accuracy was 62 percent; a bunch of other metrics are also printed, such as AUC, AUC precision-recall, average loss, loss, precision and recall. Now let us try to include even more features and see whether we can get past this accuracy of 75 percent. We will combine different features, and one combination we will try is between age and sex: we construct a crossed column between age and sex and set the hash bucket size to 100. Let me remind you about crossed columns: whenever we construct a crossed column, it uses hashing for storing the values in order to avoid the problem of sparsity, and here we specify a hash bucket size of 100. Let us add the combined feature to the model along with the earlier feature columns. We construct a derived feature column list with all the crossed features (here there is only a single crossed feature), specify the feature columns to be the original feature columns plus the derived ones, and then instantiate the linear classifier and train and evaluate it as before. Since we have created a crossed column, training takes slightly longer. Now we achieve an accuracy of 76 percent, which is one percentage point higher than the earlier model. Finally, let us look at the ROC curve to get an idea of the trade-off between the true positive rate and the false positive rate. The curve we get is a reasonable ROC curve for this classification task. So, in this session, we studied how to build a logistic regression classifier with the tf.estimator API.
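The crossed column and the ROC curve described above can be sketched as follows. This assumes a TensorFlow 2.x release that still ships tf.feature_column; the labels and probabilities fed to the ROC computation here are synthetic stand-ins, whereas the session obtains probabilities from the trained estimator's predict call on the real evaluation set.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_curve

# Hashed cross of age and sex; hashing keeps the crossed feature space
# bounded (hash_bucket_size=100) despite its sparsity.
age_x_gender = tf.feature_column.crossed_column(["age", "sex"],
                                                hash_bucket_size=100)
derived_feature_columns = [age_x_gender]
# The classifier is then rebuilt with the original feature columns plus
# derived_feature_columns, and trained and evaluated exactly as before.

# ROC curve from predicted survival probabilities. Synthetic stand-ins here;
# in the session: probs = [p["probabilities"][1]
#                          for p in linear_est.predict(eval_input_fn)].
y_eval = np.array([0, 0, 1, 1, 0, 1])
probs = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])
fpr, tpr, _ = roc_curve(y_eval, probs)
print(list(zip(fpr, tpr)))  # plt.plot(fpr, tpr) draws the curve
```

Plotting fpr against tpr with matplotlib gives the ROC curve shown in the session.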
We used the Titanic dataset as an example to predict the probability of survival of a passenger with a logistic regression model. Hope you enjoyed learning these concepts. Hope to see you in the next module. Thank you.