Good evening and welcome to PyData. I'm Karthik, and I'll be giving a presentation on going from scikit-learn to TensorFlow Estimators. This will be a fairly hands-on session: we'll discuss why we're even talking about scikit-learn, what TensorFlow has updated recently, what the latest changes are, and where TensorFlow itself is heading. The main goal of this talk is to keep it practical. It's really about reducing the barrier to entry for machine learning: what scikit-learn did for the classical techniques, TensorFlow is trying to do for deep learning. In this talk we'll see how TensorFlow is adopting scikit-learn's methodology when it comes to developing and deploying machine learning models.

Before I begin: "May your local minima be global, your variance bounded, your labeled data plentiful, and your compute massive." That's not my quote; I'm quoting Ilya Sutskever from Twitter. I think it's a wonderful quote and very apt. Now that we're in 2018 we have massive compute but still quite a lot of data issues, which makes that wish all the more important for those of us working on machine learning.

A brief introduction to who I am. As I said, I'm Karthik Muthuswamy. I work as a Data Science Researcher at SAP, at the SAP Innovation Center Network here in Alexandra Technopark. I'm also a Google Developer Expert in Machine Learning; there are a few of us in Singapore, and I was one of the first here. I'm also a proud alumnus of NTU, where I did my PhD some time back, so I've been in Singapore for quite a long time now.

At SAP we have a complete machine learning center where we look at how to leverage customer data for insights. Privacy is one of the biggest topics at SAP, so this isn't about scraping website data: if a customer has data residing in a database, we try to leverage that information to provide insights on it. That means structured data, such as CSVs or database tables, as well as unstructured data, and images come into play too. As you might have guessed, most of our data revolves around text, which means we need to understand what a user is talking about, or what the customer is actually storing. My own day job revolves around understanding data, coming up with machine learning models, playing around with different models, and bringing new insights out of the data, which is fun. More or less, I work end to end: how do we understand the data, and how do we get insights from it? This talk is set up along the same lines: how do you understand data, how do you make it usable, and how do you train a classifier, or some machine learning model, that actually solves the problem. So let me move on.
The objectives, like I said, are to understand data, visualize features, and above all explore TensorFlow Estimators. There is something called a pre-made estimator, or canned estimator, and we'll see how to leverage that. Another objective is developing our own custom estimator: if there are canned estimators, why would you ever need anything else? So we'll use a pre-made estimator, and we'll also try to develop our own deep neural network with a custom estimator. Finally, we'll juxtapose scikit-learn and TensorFlow Estimators. This is quite hands-on, and all the code is already on my GitHub repository; I'll share the link as well.

Just before we start, a quick show of hands: how many of you have used Python before? Okay, almost everyone. What about scikit-learn? Quite a few. How about TensorFlow? More or less half and half, I would say. And how many of you have used both TensorFlow and scikit-learn? Quite a few, so the intersection between scikit-learn and TensorFlow users is definitely there.

The point is this: people who have used scikit-learn find it very useful because the paradigm is straightforward. You take a classification or regression problem, massage the data, pick the appropriate classifier or regressor, and call fit and predict. That paradigm has proven to give very good results, which is why people find it so useful. TensorFlow, however, said: let's talk about sessions, let's talk about graphs; create a session, run the graph inside that session, and manage all of that yourself. What happened was that the barrier to entry for TensorFlow became quite high. Recently Google decided to revamp this, partly because a lot of people had started creating wrappers around TensorFlow. If you were an early adopter you would have seen wrappers like TFLearn, which is still around, and skflow, which TensorFlow has since absorbed, and the culmination of that is TensorFlow Estimators.

So what we'll do is build a scikit-learn model and a TensorFlow model together, as shown in the sketch below. The data pipeline will be shared: we'll use the same pipeline, train a TensorFlow model as well as a scikit-learn model, and see how easy TensorFlow has made it. That's one of the objectives of this session as well.
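To make that fit-and-predict paradigm concrete, here is a minimal scikit-learn sketch; the dataset and the classifier are just placeholders I've picked for illustration, not something from the talk's notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Massage the data: load a toy dataset and split it.
X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

clf = RandomForestClassifier(n_estimators=100)  # pick an estimator
clf.fit(X_train, y_train)                       # train
print(clf.score(X_test, y_test))                # predict and evaluate
```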
By the end, you should be able to pick up TensorFlow quite easily if you are a scikit-learn user, and even people who don't particularly like TensorFlow should be able to appreciate that it's an evolving piece of software. There's a Dev Summit on March 30th this year, so there could be new breaking changes, newer APIs, newer versions. It's one of the fastest-moving machine learning libraries, so there will always be constant change, and without change there isn't much progress. TensorFlow is moving towards a point where it's quite easy to develop a model and deploy it into production. So let's move on.

First of all, let me introduce TensorFlow input functions. An input function lets you pass features as well as the target data, so it feeds the train, evaluate, predict pipeline directly. It also allows you to do feature engineering: if you have some data and you want some pre-processing, say dropping some NaNs or lower-casing your strings, the input function lets you do that pre-processing inside the function itself. I'll show how this fits in; towards the end all of this comes together and we'll go through a full example. For example, you could write an input function that first pre-processes your data, then maps feature columns to the corresponding feature data, and also returns a tensor for the labels themselves. It's similar to how we used to write pipelines in scikit-learn: an input function is a function that wraps the data pre-processing pipeline around the actual data. You can have multiple input functions doing different things, each nicely encapsulated, so everything stays modular and you don't have to worry about how the pieces interact, much like how we build software out of modules and then integrate them. The input function gives you back your data as features and labels; you can use iterators or generators, batch the data, randomize it, and so on. The features are a dictionary: a key-value mapping from the feature column names to the tensors holding the corresponding feature data.
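Since that slide goes by quickly, here is a rough, hypothetical sketch of what such an input function might look like in the tf.data and tf.estimator style of TensorFlow 1.4. The column names and the pre-processing steps are made up for illustration.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

def train_input_fn(csv_path, batch_size=32):
    """Hypothetical input function: read, clean, batch, and return (features, labels)."""
    df = pd.read_csv(csv_path).dropna()          # preprocessing lives inside the function
    labels = df.pop("label").values

    features = {
        "age": df["age"].values.astype(np.float32),           # a numeric feature
        "occupation": df["occupation"].str.lower().tolist(),  # a string feature, lower-cased
    }
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    # The estimator expects a (features_dict, labels) pair of tensors back.
    return dataset.make_one_shot_iterator().get_next()
```

You would then hand it to an estimator as something like estimator.train(input_fn=lambda: train_input_fn("my_data.csv")), which is what keeps the pipeline and the model nicely separated.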
On the other side you have the labels, which are the target values: another tensor containing the labels themselves.

Moving on to the estimators, the main topic. Estimators are a high-level API. Like I said earlier, TensorFlow has a graph-and-session routine: for every graph you need to build the graph and then run it with a session; every graph is mapped onto a session, and only with a session can you actually run the graph. That becomes quite cumbersome when you're just starting to learn machine learning, and you end up with very verbose code just to build, say, a simple multi-layer perceptron. What the Estimator gives you is a high-level API that makes developing a machine learning program genuinely easy. You write code that looks very similar to scikit-learn: you define your data pipelines through input functions, you pick an estimator, either a custom estimator or one of the pre-made, canned estimators I mentioned, and you call a familiar train, evaluate, predict routine to get your final model.

One of the biggest reasons estimators are useful is production. If you've used TensorFlow before, you know it's quite difficult to move a plain TensorFlow model into a TensorFlow Serving model: the routine is extremely cumbersome, you need to understand which tensors are the inputs, which are the outputs, what the data types are, and certain things can't be ported directly at all. The Estimator makes this easy: if you've defined the input pipeline and the estimator, and you can train the estimator with that pipeline, you can export it directly. Estimators give you an export-SavedModel call that ports your model straight into the TensorFlow Serving paradigm, so you don't have to worry about how the input functions are processed or what happens to the outputs.

The other nice thing with estimators is that summary writing becomes trivial: all your losses, all your accuracies, everything is written out by default, so you don't have to write any of it to the file system yourself. The Estimator does that for you, and I'll show an example that goes full cycle: you define the pipeline, you train the model, the Estimator does the training and logs everything neatly, and when you're done you can easily export the model to TensorFlow Serving.
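As a hedged illustration of that export step, here is a tiny, self-contained toy. The single feature, the linear model, and the directory names are stand-ins I've invented; the export calls are the 1.4-era estimator API for producing a SavedModel that Serving can load.

```python
import tensorflow as tf

# Tiny stand-in model: one numeric feature and a pre-made linear classifier.
feature_columns = [tf.feature_column.numeric_column("x")]
estimator = tf.estimator.LinearClassifier(feature_columns, model_dir="demo_model")

def input_fn():
    features = {"x": tf.constant([[1.0], [2.0], [3.0], [4.0]])}
    labels = tf.constant([0, 0, 1, 1])
    return features, labels

estimator.train(input_fn, steps=10)

# Describe how serving requests should be parsed, then write out a SavedModel
# directory that TensorFlow Serving can load as-is.
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
export_dir = estimator.export_savedmodel("exports", serving_input_fn)
print(export_dir)  # a timestamped folder under exports/, ready for Serving
```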
So you have a full paradigm: from the raw data, through pre-processing, to defining your model, training, testing and evaluating it, and finally deploying it onto a production system. That's how complete the pipeline is in the Estimator model.

Like I said before, there are pre-made estimators as well as the ability to build custom estimators. The pre-made estimators were earlier called canned estimators: the models come ready-made, which means someone has already taken the pains to ensure that the data integrity is taken care of and that the inputs and outputs match. All you have to do is call a wrapper, say a DNN classifier, give it the number of layers, define the input pipeline, and call train, evaluate, and predict. You don't have to worry about what the model is doing behind the scenes; the model itself was written and is maintained by someone at the back end. All you worry about are the hyperparameters, how you manage them, and the final results. Again, this is very similar to how scikit-learn evolved: you don't have to worry about how things are implemented behind the scenes. If you're interested you can definitely take a peek, play with it, even change the source code, but if you simply want to use a pre-made estimator you can do that without re-implementing something that's run of the mill. If all I want is a multi-layer perceptron, I just use a DNN classifier and specify the number of layers and how many hidden neurons each layer has. I'll show you an example of this, and you'll see how similar the scikit-learn version and the TensorFlow Estimator version are.

The custom estimator, on the other hand, is for when you want to write the model yourself, say a convolutional neural network that you want to define on your own. The custom estimator gives you the same wrapper: you just write the model function using the layers they've brought forward, tf.layers dense, convolution, and so on, very similar to how Keras worked before. On that note, Keras has also graduated into tf.keras. So if your entire code base is written in Keras and you want to move it to TensorFlow, all you need to do is upgrade to 1.4 or above and change your headers, which is basically the import statements, to import tf.keras.
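A hedged before-and-after of that header change, assuming TensorFlow 1.4 or later where tf.keras is part of the core package; the toy model itself is just an illustration, not code from the talk.

```python
# Before: standalone Keras, which pulls in its own backend (Theano or TensorFlow).
# from keras.models import Sequential
# from keras.layers import LSTM, Dense

# After: the same layers served from TensorFlow itself, no separate install needed.
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 8)),      # a small sequence model
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```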
I have another notebook and a blog post on exactly this: how to move to tf.keras and keep leveraging your Keras models on top of TensorFlow. Earlier on you had to install Keras separately from TensorFlow, and installing Keras would pull in Theano as well, which was a problem, especially in production, where you don't want unnecessary software being downloaded and installed. With the TensorFlow 1.4 update, tf.keras comes directly into TensorFlow; it was inside the contrib module earlier, now it's core. That means you can simply import tf.keras and have all the backends, the TimeDistributed layers, LSTMs, and RNNs that you wrote in Keras running directly on TensorFlow. The good thing is you don't have to worry about how the model scales on distributed systems: because TensorFlow is running behind it, you effectively don't change your code base, you just change your imports, and you get all the goodness of Keras plus the advantages of TensorFlow, running on distributed servers with full logging and TensorBoard. The advantages of both sides, with minimal effort on your part. So let's move on.

Next, the pre-made estimators. What a pre-made estimator does is encapsulate best practices, like I said earlier, for example in determining where different parts of the computation graph should run. TensorFlow figures out the computation graph and how to run it all by itself, and that logic was written by someone else, so you don't have to worry about it. The strategies for running on a single machine versus distributing across a cluster are also taken care of behind the scenes: if you're running distributed systems on your network, you just need to point the estimator at those machines, and it will scale up, train, and come back with the results. The last part is the summaries: like I said earlier, the estimator itself writes all the logs and summaries for your training, which means you don't have to go through the logs yourself to build graphs and plot the final results. That saves quite a lot of time, because in the end you don't rewrite any of the core code: if you want to change the model, you change the input function or the custom estimator, your pipeline stays intact, and the results simply appear on the file system.

So, a typical, or rather the recommended, workflow for using a pre-made estimator.
This is the best-practice sort of approach: you write your dataset input functions and you define your feature columns. Feature columns are declarations of your features. In this example, population is a numeric column, crime rate is again a numeric column, and median education is also a numeric column but with a lambda attached as a normalizer, which computes against the mean and stores the result as part of the feature definition. So you're effectively defining the feature columns up front. It's a bit like how in C++ you declare integers and floats and tell the compiler that you have arrays of particular data types: the feature column does more or less the same thing, describing the data before the framework even looks at the actual values behind the scenes. Once you've done that, you instantiate the pre-made estimator, which could be a DNNClassifier or a DNNRegressor, any of the pre-made estimators, and finally you just call train, evaluate, and/or the inference method. This keeps your code base small, and like I said earlier, when you want to change the model, all you touch is the estimator part, or you rewrite the custom estimator if that's the plan.

Now some best practices around custom estimators. The custom estimator provides the capability to implement your own model function, like I said earlier, but here is what typically works best within the Estimator framework. Because you have pre-made estimators, don't worry about writing a custom estimator straight off the bat. Instead, evaluate a pre-made estimator if at all possible; stick to a pre-made estimator whenever you can. Check whether it makes sense, whether your data pipeline and the integrity of your data all hold up, and then go through the same exercise with other pre-made estimators. The idea is that you don't reach for a custom estimator until you're quite sure that your data pipeline works and that a pre-made estimator has given you a baseline evaluation. That's what a typical data scientist would do anyway: first establish a baseline estimator. And that's the best practice.
So you start with a pre-made estimator, and only when the pre-made estimator is still not good enough do you go and write your own custom estimator. The final step is to compare the estimator you wrote against that baseline. If you took deeplearning.ai, the course by Andrew Ng, one of the things he keeps saying is that you need a constant, single measure. If you're going to evaluate models, keep a single metric to evaluate all of them, maybe even 20 or 25 models, because at the end of the day you'll have a lot of experiments and a lot of different scores, and you'll never know which to compare, why one works well, or which model to choose. Always stick to a single performance measure. This also helps if you're automating something like a grid search: a single metric, say the F1 score, is very helpful, because in the end you need to convince your boss to get you a larger GPU or let you train a deeper model. Or convince yourself that the baseline isn't working and you really do need your own custom estimator, or that you need more compute, or that your model is overfitting, or that your data isn't sufficient. All of that is possible if you have a single performance measure. So that's one of the important points, and the best practice is still to stick to a pre-made estimator where you can, because it carries all the goodness TensorFlow provides by itself. Otherwise, of course, you can always write the custom estimator and implement the model yourself.

Let's move on to the demo. I've shared the code; everything is available at the bit.ly link, so if you're interested you can download it and run it yourself. Just to keep tabs: all of this runs on TensorFlow 1.4 and scikit-learn 0.19, so do check the versions on your machine when you test it out. Let me switch over.

For the first example, I'll go through a comparison of scikit-learn with a pre-made estimator, the DNNClassifier. The dataset is one of the commonly used ones, the breast cancer dataset, and the task is to classify a sample as malignant or benign. The dataset is also available in scikit-learn. In this notebook I used an earlier version of TensorFlow, but you just need to change some of the import statements to run it with a newer version. First we load the breast cancer dataset; the first part is all about loading and evaluating.
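Roughly, those first cells do something like the following. This is a sketch using standard scikit-learn utilities; the exact notebook may differ in the details.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset and do the train/test split.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Scale the features, using statistics computed on the training set only.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```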
The first few cells do the train/test split and make sure the data is scaled. This is purely feature preprocessing: you get everything set up before you even think about the actual classifier. The implementation itself is a very simple deep neural network classifier. This example dates back to 1.2.1, where you used tf.contrib.learn, the old tf.learn sort of approach; I'll show you later how it looks currently. The point is to juxtapose scikit-learn with a TensorFlow pre-made, or canned, estimator. You simply use a feature column; here I say the data is real-valued columns inferred from the input, and the input is scikit-learn's X_train. Then all I do is define a DNNClassifier: I pass the input, which is the feature columns, and because it's a multi-layer perceptron I just give it the hidden units as a list of layer sizes, which I've set to 10 here, and the number of classes is two, because all I'm trying to decide is whether it's malignant or benign. It trains quite straightforwardly, and the final result is about 96.5 percent.

If you look at what's happening, you're moving back and forth between scikit-learn and TensorFlow: the pipeline interweaves, and it becomes quite easy to go in and out of either API without any issues. All you do is define the canned estimator, and you get the final results just as you would in scikit-learn. Just to check whether the results are good, we also plot the confusion matrix.

But this is not sufficient; what's the big deal about training one model like this and showing it? So we'll also talk about structured data. First of all, what is structured data? Like I told you earlier, it's data in fixed fields, say CSV or TSV files, Excel sheets or Google Spreadsheets, where the fields are all fixed: maybe ten columns, each holding a string or a category or a number. Of course, a column could hold long free text, but that's not our question here; the point is that the data is structured, so when you look at it you know you can retrieve well-defined fields from it. One of the objectives of this notebook is also to visualize the data we have. In this case we'll be visualizing the UCI census income data, a dataset UCI published from US census records that captures the age, income, and other attributes of different people.
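Before we dive into the census data, here is a rough sketch of the pre-made classifier cells just described. The notebook itself uses the older tf.contrib.learn API from 1.2.1; written against the 1.4 tf.estimator API instead, and reusing the scaled X_train and X_test from the preprocessing sketch above, the same idea looks roughly like this (layer sizes are illustrative).

```python
import tensorflow as tf

# The breast cancer data has 30 numeric features per sample.
feature_columns = [tf.feature_column.numeric_column("x", shape=[30])]

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],   # a small multi-layer perceptron
    n_classes=2)             # malignant vs. benign

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train.astype("float32")}, y=y_train,
    num_epochs=None, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test.astype("float32")}, y=y_test,
    num_epochs=1, shuffle=False)

classifier.train(train_input_fn, steps=1000)
print(classifier.evaluate(test_input_fn))  # reports accuracy, loss, and so on
```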
What we'll do is train and evaluate, but also explore the data to get more insight into it even before we talk about classifying it and getting results out of it. If you check the GitHub code, there is also a notebook that uses Facets. Facets is an open-source machine learning exploration tool that lets you understand and visualize structured data. It's an open-source project, so you can clone it and look at it yourself. In this case I simply downloaded the data from the UCI archive and had a look at how it's shaped. Let me run this code. It's typical of what we'd do anyway, a pandas head on the CSV file. You can see the different columns: age, workclass, a weighting number we don't have to worry about (it's defined in the names file anyway), education (the highest qualification of each person), the education number, marital status, occupation, relationship, race, sex, capital gain, capital loss, and hours per week. So we have different data types, but everything is in a structured format, and the target at the end says whether income is less than or equal to 50K or not. The data gives us information about people, and the target is their income bracket; we're trying to predict that target given the rest of the data. So we'll go through this data and try to pull some insights out of it.

What Facets gives you is a very neat feature visualization tool, and what's great about it is that it's interactive. I can look at each of these columns and see what percentage is missing. You can see the number of zeros here is 91.67 percent, so most of this column is zero, and other things pop out as well. You can also change how the visualization is drawn, switch to a log scale, and so on. If you go through age, the distribution is quite interesting: people above 60 are quite rare, people at or below 40 are plentiful, and the number of people in the 20 to 45 age group is very high. All this analysis lets you see what distribution the data has even before you start thinking about modeling it. You can also see that blue is the training set and the orange-brown is the test set.
This also shows you how your train and test distributions compare, and hopefully they are very similar. This is one of the things that gives you that insight: if you have an imbalance in the training set and an imbalance in the test set, hopefully the imbalance is similar in both, so that your sampling keeps the distributions aligned and your training data isn't heavily skewed towards a few classes. All of that can be seen here. There are other things you can do too, like looking at percentages, and you can inspect the categorical features. For the categorical features you can see how much data is missing, and this is very important: before training you need to know whether any data is missing. If you look at workclass, apparently 5.64 percent of the data is missing. These things show up right away, even before you start training or look closely at what the data contains, and they are very useful insights to have before you even know the distribution of the data itself.

I can also split things by male and female and look at the distributions, and you can see the proportions are quite different in this case. Let me expand this. It gives you very nice visualizations of the different categories and of the skew in the data itself. For instance, most of the data here is from males, and probably less than half as much is from females. So this shows the bias, the male dominance, in the data and in our society itself. Exploring the data ends up showing you the underlying problems in the data you have. The same goes for semantics: if you explore words, you can see how words are interrelated. There was very interesting work by one of the universities on bias in word embeddings, looking at how words occur together; they analyzed the past 30 years of text to see how words have evolved and how the contexts of words have changed, hopefully for the better, and they do see us moving towards a more progressive society.

This is why data analysis is so important before you start training. Without exploring the data and doing the data massaging, it makes no sense to start training a classifier and then, at the end, blame the data or the classifier without ever having understood the data. You also have to think about what happens at prediction time: if the test data is, say, purely from females, what is going to happen to a classifier trained on data like this? Those implications become very clear once you do data exploration and visualization; with a visualization it's much easier to understand the data than by just looking at numbers.
And Facets does a very good job here. One of the other nice things is that the Facets website also offers Dive, which takes the entire dataset and lets you look at each and every single data point. Even if you have a million data points you can still visualize them; it will take some time, but it gives you a wonderful visualization and some very interesting insights into the data before you even start thinking about training a classifier on it. So I would highly recommend looking at Facets. Go through the examples provided on the website, and if you're skeptical about installing things on your laptop, or worried it will break something, they have examples online: you can upload a CSV file, something that's already out in the open, explore it, play with the tool yourself, and appreciate the effort that has gone into making this software open source.

With that said, let's move on to structured data classification. Having seen how the data is distributed, we know some of it is missing; for instance, we saw that workclass has null data points. When you go through this notebook you'll see pointers and comments explaining why the different functions are used and what to do to train with it. Let's run through the notebook while I talk. It requires TensorFlow 1.3 and above; we're using 1.4. When you download the UCI data it doesn't come with headers, so we define the headers ourselves, and just like before we do a pandas head (and tail, if you want). Then we split the dataset: we pop the target column so that we have an x and a y, and we apply a lambda so we get a true/false value indicating whether a person makes more than 50K or not. That's all we're trying to predict: given the input features, estimate whether a person earns more than 50K. Then, just as a sanity check, we verify that the train/test split looks reasonable.

Like I said before, you have the input data files, you have the input function that forms the pipeline, and you use the estimators, pre-made or custom, whichever it is. This notebook has descriptions of what the input function is and how to use it. I have two input functions, one for the training data and one for the test data. Why two? Because for training I want to use all the data, batch it, and run it effectively forever.
In other words, I'd like the classifier to keep training until some stopping trigger says it has trained enough, for instance when the validation loss starts to go up. These are the kinds of things you can do by having different input pipelines. In this case, the estimator can take a pandas input function directly; there are different ways to provide an input function, and here I use the pandas one, so the TensorFlow estimator knows the input is coming from a pandas DataFrame. In the training case I also shuffle the data; for the test set I don't need to shuffle, so I don't waste compute, and that, effectively, is why there are two different input functions.

Moving on, we do some feature engineering: we convert the data into numeric columns and bucketized columns. Bucketized means that if you have a continuous range, like age, you split it into bins or buckets; that's the idea behind bucketization. A numeric column you simply use as a numeric column as it is. So we use age as a numeric column, and we also build buckets for it, and in this case the buckets come from Facets Dive: if you run the UCI data through Facets Dive, you can see the distribution of the data across ages, buckets like 17 to 31, 32 to 46, 47 to 60, and you can split it and look at the distribution within each. The blue shows the target values, and there's a high imbalance in the data: most people earn less than 50K, and maybe only around 10 percent earn more. So within a bucket there is quite a lot of bias, and this is again why you need the data visualization. If you're predicting within a bucket and you find you rarely get it right, that could be because of the heavy imbalance: most of the target values are below 50K, and you're asking the classifier to predict something it hardly ever saw for that particular feature. If you put someone into the 17 to 31 bucket, you need to understand the implications of not doing proper sampling there. The exploration also helps us choose the buckets sensibly, and in this case we choose them based on what Facets shows us.

I'll quickly run through this, and I'll also build the education feature column. I'll use only a fixed set of education values; there are quite a lot more values in the UCI dataset, but we'll simply ignore or truncate the rest and not worry beyond that.
That still covers more than 95 percent of the data, so we can continue with it. Next is the categorical information, and this is where the hash bucket comes in. The hash bucket is quite interesting: if you have categorical information and you use each category directly, you end up allocating a slot for every category, so if you have, say, 25,000 categories you're allocating more than 25,000 values. What hashing does instead is compute a hash for each category. Of course there can be collisions, it's a hash after all, but those are negligible, and the estimator has a sensible way of handling collisions too, so nothing to worry about there; the collision rate will be very low. The point is that the hash reduces memory usage by hashing the category value rather than allocating an entire memory slot for every category we have.

Finally, we train the canned estimator here, which is a linear classifier. Like before, we use the train and test input functions. All we do is instantiate tf.estimator's LinearClassifier, give it the feature columns as input, and, one more thing to notice, a model directory that tells the estimator where to save the model, plus the number of classes so it knows what kind of prediction to produce in the end. I'm training for just a thousand steps to keep it quick. Looking at the final result, we get roughly 75 percent. Evaluating with the test input function, you can see that all of this follows the scikit-learn path; these APIs are very much inspired by scikit-learn. The input function is the one thing you change; beyond that it's estimator.train and estimator.evaluate, the familiar fit-and-predict sort of paradigm, and we end up around 76 percent accurate. This is why I said we first evaluate a pre-canned, pre-made estimator and see how much performance we can extract from the data. (The prints you see here are just some debug output, looking at an example prediction to confirm everything is wired up.)

If you want better results, you need better feature engineering: understand the data, make sure all the points are sampled properly, and get the right features in, and that's what we do next, as sketched below. We'll use TensorFlow's numeric features together with an embedding: with an embedding column, just like you would do for a word embedding, you can do the same thing here.
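Pulling those last few cells together, here is a hedged sketch of the input functions, feature columns, and the canned linear estimator. The column names follow the UCI census data, but the DataFrame variables, bucket boundaries, vocabulary, and bucket sizes are illustrative choices of mine, not necessarily what the notebook uses.

```python
import tensorflow as tf

# Assumed to exist from earlier cells: train_df and test_df are pandas DataFrames
# of the census features, train_labels and test_labels are the boolean ">50K" Series.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_df, y=train_labels, batch_size=64, num_epochs=None, shuffle=True)
test_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=test_df, y=test_labels, num_epochs=1, shuffle=False)

# Feature columns declare how each raw column should be interpreted.
age = tf.feature_column.numeric_column("age")
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education = tf.feature_column.categorical_column_with_vocabulary_list(
    "education", ["Bachelors", "HS-grad", "Some-college", "Masters", "Doctorate"])
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    "occupation", hash_bucket_size=1000)   # a hash slot per category, not a full vocabulary

estimator = tf.estimator.LinearClassifier(
    feature_columns=[age_buckets, education, occupation],
    model_dir="graphs/linear",             # checkpoints and summaries land here
    n_classes=2)

estimator.train(train_input_fn, steps=1000)
print(estimator.evaluate(test_input_fn))   # around 0.76 accuracy in the demo
```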
What TensorFlow gives you is the categorical column wrapped as an embedding column: you can map, say, occupation into a 100-dimensional vector, similar to how you would represent a word in something like word2vec; the TensorFlow embedding column makes that possible here as well. Then we take another approach: earlier we used the linear classifier, and now we'll train the DNNClassifier, the deep neural network. We use three layers, starting at 256 and going down through 128 to 64. If you're trying to get a grasp of TensorFlow, or to evaluate how good it is, this is a very neat way to do it: all you need to do is understand the data and call tf.estimator's DNNClassifier. Behind the scenes it implements the neural network for you and takes care of batching and so on; you don't have to worry about how to save it or how to collect the results.

It looks like it has only initialized the training so far, so let's run the training and see. This time we train for 2,000 steps, because it's a much deeper neural network, and you can see it prints more information: the loss and the number of steps per second it's processing. These are very useful. Suppose you have a very deep network that you'll be training for a long, long time; the steps-per-second figure tells you how long the whole run is going to take, so you get much better visibility when training deeper networks. This one is of course very small, and you can see that just by moving to a deep neural network we've already improved performance to 83 percent. We were at 76 percent, and we've gone to 83 just by changing the linear classifier to a DNN classifier, without touching the input pipelines, without touching the train or test functions, without changing anything else from the model perspective.

And, like I said, because we trained it with TensorFlow, let me show you the logs. There is a graphs directory it created, with dnn and linear inside. These are the checkpoints it produces as it looks at the data and trains; since we trained for 2,000 steps there is checkpoint data, the index and the meta files, which will look familiar if you've trained a TensorFlow model before, and there is also a folder called eval, which holds the events pertaining to the evaluation of the model we just trained. To see how the training actually looked, we'll do one more interesting thing. Oops, I'm already in graphs, so I can just point it at the current directory. What we'll do now is go to TensorBoard.
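Before we look at TensorBoard, here is a rough sketch of that switch from the linear model to the pre-made deep model, reusing the input functions and the occupation and education columns from the previous sketch. The one wrinkle is that a DNN needs its categorical columns made dense, for example by wrapping them as embedding or indicator columns; the sizes below are again my own illustrative choices.

```python
# Categorical columns have to be made dense for a deep network, for example by
# turning occupation into a learned embedding and education into an indicator.
deep_columns = [
    tf.feature_column.numeric_column("age"),
    tf.feature_column.embedding_column(occupation, dimension=16),
    tf.feature_column.indicator_column(education),
]

dnn = tf.estimator.DNNClassifier(
    feature_columns=deep_columns,
    hidden_units=[256, 128, 64],   # the three fully connected layers from the demo
    model_dir="graphs/dnn",        # this is the directory TensorBoard will read
    n_classes=2)

dnn.train(train_input_fn, steps=2000)
print(dnn.evaluate(test_input_fn))  # around 0.83 accuracy in the demo, up from the linear model
```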
TensorBoard gives us some neat results. Let's first look at the linear classifier. If you've used TensorBoard before, you know it's quite good: on the left-hand side there are different tabs for embeddings, histograms, scalars, and so on. If you're training a CNN you can even look at the images it's trying to classify before training has finished; all of that is possible with TensorBoard, and you can keep watching the logs while the model is still training. Let me show you a few of these. Things like the global step: it started near zero and is increasing, at about 340 steps per second on average. That tells you how fast your input pipeline is. If you're pumping in images, or text from some other data source, you need to know whether the input is the bottleneck and taking most of the time, because you can't keep blaming your CNN or your neural network for all the lag without understanding how quick the input pipeline is. There are also scalars for things like the labels and the input queue, and the loss: it started off close to 25 and is coming down; I think if we trained longer we would get the loss much, much lower.

The point is that you can do all of this without writing much. We wrote barely four or five lines of code, and we're already able to save TensorBoard logs, see the accuracy, see an improvement, and change most of the pieces without having to rewrite much of anything underneath. TensorFlow is letting you look at most of the details while abstracting away how they're produced. This is quite contrary to how TensorFlow worked before, where you had to write every single thing yourself and make sure everything was proper: that the model directories lined up, that the event files were actually written out, that none of it got lost. You had to do all of that by hand. Here the estimator takes care of all of it for you.

If I wanted to, I could also put this onto TensorFlow Serving. If you run the export function on this model, you'll see another folder appear, export, with a numeric timestamp inside, containing the saved model and the variables folder, ready for TensorFlow Serving. So you don't have to write much.
You simply use the estimator, and it takes care of scaling and takes care of serving; all you need to write is the pipeline and the model function, and you get the model out. It's pretty straightforward in that sense. So let's get back to the presentation.

Some key takeaways. The first and most important thing is to understand the problem and visualize it. Equally important is to get your statistics right: if you don't, you'll be working on skewed data, which is not going to give you good results, however good your classifier or your machine learning model is. The next point is about test data. One thing I keep encountering is that people don't treat test data with care; the test data gets treated like validation data. Invariably, you evaluate on the test data, see that the model doesn't do well, go back and train the model again, and then do a better job on the test data. This should not be done: you've looked at the test data, and through your knowledge of the test error you're porting that error back into the model. You're effectively biasing the model; you're cheating, in a sense. Not cheating on an exam, but I've seen a lot of instances where a model trained this way was put into production and did not generalize, precisely because the test data had already been looked at and the model had been re-trained based on what was inferred from it. This is a big problem, and it's why the best practice is: if you've looked at the test data, throw the model away. Split train and validation within the training data itself, and never look at the test data. This is something you learn the hard way, but it's definitely a best practice: if you've looked at the test data, throw the training away and start from scratch, and because you still carry that bias, ideally start with new test data; even if you can't, at least you're not porting the bias back into the same model. So treat the test data with the utmost respect.

Another very important point: experiment fast, fail fast. This is one of the reasons we have GPUs, and it's important to understand the latency around them. It's fancy to say I run a P100 or I run an M40 without knowing what's happening behind the scenes; if you don't understand the latency of moving your data into the GPU and pulling the results back out to the CPU for further compute, you're missing something, although that's perfectly fine. The point is to fail fast: you might try a hundred different experiments, and not all of them will be good. Invariably you have to get through them very quickly, because you don't have infinite time to create a model.
So you need to run experiments as quickly as possible. One of the recent failures from Baidu was exactly about this: they had a lot of compute and created multiple accounts so they could run many more evaluations, which in that setting amounted to cheating. Having more compute does not give you licence to cheat, but it does let you experiment fast and fail fast: you decide quickly that a model does not work, throw it away, or use the experience from that model to build something new. That's a good takeaway. Another thing I learned the hard way is logging experiment results, and not just the results: logging the parameters is just as important, because with deep neural networks you have a lot of hyperparameters to tune. What happens in practice is that you quickly change several things, say the embedding size, the batch size, the learning rate, and by the end you've changed four or five things, you get a better model, and you can never replicate it again on new data. That's why you should log every single experiment, follow it up and understand it. It also gives you rigour. Tomorrow, if you get new data and someone asks how long it will take to develop a model that comes up to par with the earlier one, your logged experiments and the experience from that first set of runs are what give you the answer. So log the experimental results and the parameters; results are always good, but if you don't know what produced them, they are of little use, because tomorrow you need to make them repeatable. The next point is scalability. Most of the data we deal with lives in enterprises or start-ups; say I have a start-up that extracts features from images. From day one, I cannot say I'll build a model now and think about scalability only when a million users arrive. If you don't think about scalability from the start, then when you finally need it, it will be very difficult to scale a model that was built from scratch without it in mind. This is, again, learned from experience: if you're putting a model into production, scalability should go hand in hand with all your experiments and all your model development. It's fancy to get a deep neural network that matches the state of the art.
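Coming back for a second to the point about logging every run together with its parameters, here's a minimal sketch of one way to do it, assuming you just want a flat CSV per experiment; the file name and field names are made up for illustration:

```python
import csv, datetime, os

LOG_FILE = "experiments.csv"  # hypothetical log location

def log_run(params, train_loss, val_loss):
    """Append one experiment's hyperparameters and losses to a CSV log."""
    row = {"timestamp": datetime.datetime.now().isoformat(),
           "train_loss": train_loss, "val_loss": val_loss, **params}
    write_header = not os.path.exists(LOG_FILE)
    with open(LOG_FILE, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# The same call after every run, whatever you changed this time.
log_run({"hidden_units": "256-128-64", "batch_size": 32, "lr": 1e-3,
         "optimizer": "adam"}, train_loss=0.41, val_loss=0.55)
```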
Back to scalability: a state-of-the-art network makes no sense if it will not perform in the end. If it takes, say, two hours to do a single inference, it makes no sense to put it into production; a much quicker model that is, say, five percent below the state of the art is still fine. Unless you're writing a paper, that five or ten percent improvement does not matter much for production use cases. Next: test on new domains. I might train a classifier on one domain, but I need to understand how it performs on other domains, because that tells me how well it will generalize. There are two ways to evaluate this. One, train on the first domain and test directly on the new domain, without training on that domain at all. Two, train on the new domain with exactly the same parameters you used on the first domain. For instance, if you have healthcare data and banking data, you use the same classifier with no change to the classifier or training parameters; you only swap the data pipeline, plug one out and plug the other in, and see how the classifier performs across domains. This matters because of scalability again: when you want to target more customers, more domains, more verticals, you cannot afford to retrain or re-examine the model every single time you move into a new domain. So testing on new domains becomes very important in the end.
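To make that "swap only the data pipeline" point concrete, a minimal sketch, assuming a text-classification setup with TF-IDF plus a linear SVM; `load_domain` is a hypothetical stand-in for whatever loads texts and labels per vertical:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def build_pipeline():
    # The classifier and its parameters stay fixed across domains.
    return make_pipeline(TfidfVectorizer(min_df=2), LinearSVC(C=1.0))

for domain in ["healthcare", "banking"]:
    X_train, y_train, X_test, y_test = load_domain(domain)  # hypothetical loader
    clf = build_pipeline()
    clf.fit(X_train, y_train)
    print(domain, clf.score(X_test, y_test))
```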
These are some of the best practices I've picked up from the work I've been doing all along. That, I think, concludes my presentation. Any questions? Yes. Is the scaling you mentioned referring to the data or to the classifier? Thank you for asking. First of all, your data is batched anyway, so scalability here is really about inference. Once I've trained a model, what I want to know is whether it can be used in production: how many inferences can I do per second? If every inference takes a second, I cannot deploy it behind a cloud service where people keep clicking a button just to get, say, the content of an image back. If I'm developing an image classifier, and inference with the model I've trained takes two or three seconds, then it makes no sense to put it into production when ten thousand people come to my website wanting an inference. So you look at scalability in terms of how many people can get an inference out of your model. The data itself arrives through different channels, with users feeding it in, so the data, the pipeline and the model you've trained together determine the overall scalability. Yes, next question. I think in your talk there was something about the accuracy of the baseline; how much do you care about the trade-off between processing time and that accuracy? Okay, so the question, if I understood correctly, is how you evaluate your baseline and what it is needed for. When you're evaluating your baseline, you don't care about inference time, because at that stage you're only trying to establish the base score you have to match later. These are classic methods: an SVM, for instance, can act as a linear classifier, which is one reason I started with a linear classifier. These are straightforward classifiers that have been around for a long, long time and are known to work, so your baseline should always be the simplest model. You don't care about its speed because you're not going to use it in the end. Then why does it matter? Because otherwise you have no reference for how well you're doing. Suppose I'm developing a deep neural network: if it cannot do, say, fifteen percent better than a linear classifier, there is no point in building the deep network at all. That's why you do a baseline evaluation; the idea is to beat that performance. Of course, in the end your deep network might not even be as good as the baseline, and in that case you go back and optimize the linear classifier or whatever the baseline is. Until then, the baseline score is just a way of telling myself, okay, I'm doing at least ten percent better, and then I keep improving from there. Did that answer your question?
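A minimal sketch of that baseline-first habit, assuming a scikit-learn workflow and reusing the `X_tr`/`X_val` split from the earlier sketch; the models and names are illustrative, not the demo code:

```python
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Two cheap baselines: a majority-class guess and a simple linear model.
for name, model in [("majority", DummyClassifier(strategy="most_frequent")),
                    ("linear", LogisticRegression(max_iter=1000))]:
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_val, model.predict(X_val)))

# Only if the deep model clearly beats these numbers on the validation split
# is it worth the extra training and serving cost.
```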
Yes, next question. Actually, I have two questions. The first is about treating the test data with respect: if I want to deploy a model to production, I have to test it on the test data, right? And if it doesn't turn out well, I need to go back to the model. No, and that's a good question, because that is exactly the mistake: you never tune against the test data. You always split your training data into a train set and a dev or validation set. The test data only comes in at the very end, when you're going to publish a performance figure, say on your website or to a customer; you never evaluate on the test set before that. Your question is valid in the sense that you do want to evaluate your model before production, and that is exactly why you split your training data into train and validation. So the test data is only for reporting? The test data is there to ensure that your model generalizes well; it's not just for reporting, that was only an example. It is there to confirm that the changes you made to your training data and your model based on the validation data actually translate into generalization on unseen data. What if the result on the test data is very bad? You have to accept it; that's a mistake to learn from. It means your data exploration led you to incorrect assumptions about how the data is distributed, and it's a good time to go back to the root cause: understand the features, how the data is distributed, and what actually contributes. One more important thing here is error analysis. When you evaluate on the test data, look at where it failed. It's great to report 93% accuracy, but if you understand the 7% of cases where it fails, you might find another classifier that does well on that 7% and think about combining the two into an ensemble. Error analysis gives you insight into how to build a better model in the future, but you do not go back and retrain the same model against the test data. So I still deploy my model to production even if it does badly on the test data? Unfortunately, if time is limited, yes: you report what is actually happening, then you go back to the drawing board and restart from what you saw. Because of the lessons you learned earlier, you start from scratch, hopefully without biasing the new model with the test data you've already seen. You don't do your performance analysis on the test data until you have understood what is going on with the training data. That's why the train/test separation is so important; otherwise your model will never generalize, because you keep biasing it with information from the test data. My second question is about testing the model on new domains. When you move to a new domain, the data will be different and the features could be different; how do you feed the new data into your existing model? True, and that's where my assumption comes in: when I talk about changing domains, the data pipelines are more or less fixed, and I'm assuming the classifier is fixed.
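As a small illustration of the error-analysis step just mentioned, a hedged sketch using scikit-learn's reporting utilities; `y_test`, `y_pred` and `X_test` are stand-ins for the final held-out labels, predictions and examples:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Per-class view instead of a single accuracy number.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# Pull out the misclassified examples to inspect by hand; patterns here
# often suggest missing features or a complementary model for an ensemble.
y_true, y_hat = np.asarray(y_test), np.asarray(y_pred)
for i in np.flatnonzero(y_true != y_hat)[:10]:
    print(y_true[i], y_hat[i], X_test[i])
```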
Only the source from which the data is obtained changes. My assumptions are that certain things are constant: for me it's the data types, the input format and the classifier; only the data source, the people generating the data, changes. Even if the data type is fixed, the data itself could be totally different. True, and in that case it's a different domain altogether and the problem is totally different. The cases I'm referring to are, for example, training a text classifier and then using it on different datasets: the same classifier, without changing any of its parameters, say TF-IDF plus a linear SVM, and you never change that pipeline. All you do is change the input data and the target variables and keep retraining, to see how well the classifier performs and how tuned it is to a particular dataset. That tells you whether it can generalize across different data. So the long discussion was about taking care of what you're doing, and also remembering what you're doing; in that case, what's your best practice for logging whatever you're doing? One thing we do is use an internal tool that logs every parameter change in an experiment; we wrote it ourselves, basically a Python class that logs whichever parameters you want logged. That's a fairly involved tool, so don't worry about it. The simplest way, which is what I used to do, is to keep an Excel file. Say you have the deep neural network I showed, with three layers of 256, 128 and 64 units: you put that on an Excel sheet, keep changing the different parameters, and record the final impact on performance. Your input stays constant, you change only the hyperparameters, in this case the number of units in each layer, and nothing else changes. You cannot keep changing everything: you need to keep some things constant and vary one thing at a time. What about the error? Yes, the error is the important part: the training loss and the validation loss are what give you the real evaluation of the model. However good the accuracy looks, say 95% or 96%, it doesn't tell you whether the model has learned everything it can. If your training loss is still high, the model has more to learn, which means you can fit more into it, or use a deeper network that extracts more information. Even if a run only gives you 85%, you only talk about the 85% when speaking to a customer or someone to whom that number means something; as a data scientist, what's worth logging is the loss values. And this is why you keep things constant: if you change the optimizer, for instance, the values change.
If you switch between an Adam optimizer and RMSProp, things change here and there. So you keep certain things constant, log those constants, vary only a few parameters, and log everything on one sheet. If you change a different kind of parameter, for instance moving from RMSProp to Adam, you move to a different sheet and run effectively the same set of experiments again. With Excel you can then plot everything neatly in another sheet and see the results clearly, which is why I say Excel works well for this. Of course, if you prefer, you can keep the spreadsheet on Google Cloud instead. But that's one way to log the experiments. Something just came to my mind. In simulation work we often brute-force this with a lot of for loops over many parameters and a few models. Is it possible in the TensorFlow environment as well, that one night I just hit the button, put everything in ranges, and have it try the combinations together? Yes. There's an Experiment construct in TensorFlow: you give it different sets of parameter values, your model definition stays constant, and TensorFlow goes through all the runs and logs everything, so TensorBoard can show you which model is doing best. So yes, it is possible, if you're completely on TensorFlow. The caveat is that this is not a quick thing. It's fine if you have a lot of compute and a lot of time, but in general, if you're training a deep neural network, say an image classifier, you don't have time to train every candidate for 100 epochs. If I'm running 25 experiments, I cannot run each of them to completion over, say, 200 days. Instead, you train all the experiments for, say, 5 epochs, look at the impact at the end of those 5 epochs, choose the model that does best, and train that one to the end, maybe for 100 or 200 epochs. Effectively you go from a large set of experiments down to a few; you probably can't go down to one because you're still not certain, but you can get to, say, three or five, and run those on a grid or a cluster that has a lot of compute.
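A crude version of that short-budget sweep, sketched with the TF 1.x estimator API; the feature column, the `train_input_fn`/`eval_input_fn` functions and the unit counts are assumptions, not the exact model from the demo:

```python
import tensorflow as tf

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]
candidates = [[256], [256, 128], [256, 128, 64]]  # architectures to compare
results = {}

for units in candidates:
    est = tf.estimator.DNNClassifier(
        hidden_units=units, feature_columns=feature_columns, n_classes=3,
        model_dir="sweep/{}".format("-".join(map(str, units))))
    est.train(input_fn=train_input_fn, steps=500)   # short budget per candidate
    results[tuple(units)] = est.evaluate(input_fn=eval_input_fn)["loss"]

best = min(results, key=results.get)  # retrain only the winner for longer
print(best, results[best])
```

Each candidate writes to its own model directory, so TensorBoard can overlay the runs and show which one is pulling ahead.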
How far along are we compared with PyTorch? That's a good question. One of the reasons TensorFlow is evolving, especially with the 1.5 update, is that it's moving towards something similar to the PyTorch approach, where you can evaluate things in place. Until now, if you created a variable you had to run it in a session to actually look at its value. What PyTorch did very effectively was give you a Pythonic approach where you do the compute in place and the computational graph is handled behind the scenes. With 1.5 and above there's a new mode called eager execution, which does exactly what PyTorch does, or possibly better; there's also a tape where the computations are recorded. It's going to be an interesting few weeks to follow, because RC0 for 1.5 has been released, and eager mode has been available in the source for the past few months anyway. Eager mode is essentially what PyTorch was already doing, and TensorFlow is catching up there; each framework has things it doesn't do well, and TensorFlow is catching up on its weak spots. It's definitely worth following TensorFlow, but the point is not TensorFlow versus PyTorch. The idea is to learn what deep neural networks are and to apply what you've learned theoretically in practice. If you find PyTorch useful, go ahead; just because I have some affiliation with TensorFlow, I cannot say it's the only thing you should follow. It depends on how comfortable you are with a tool. TensorFlow is good at most things, not good at certain things, and it will improve; to start with, TensorFlow and PyTorch have very similar counterparts for most features, so it's completely up to the developer. I recommend TensorFlow for the simple fact that you can move from development to production without much hassle; I've used it personally and we use it at SAP as well. In my experience with scikit-learn, I really like the custom evaluation metrics, where you can write your own; does the same thing exist in the revamped TensorFlow? Yes, and it was available before as well: even in earlier versions of TensorFlow you could always write your own loss functions, and you still can with the new API. If you don't find a triplet loss, for instance, you can write your own and use it for training. With unbalanced datasets, which are often hard to evaluate, just out of curiosity, why did you use accuracy to evaluate the model? That's true, in this case I should have used precision and recall to see the effect across the different classes. This was just a quick example to show the juxtaposition between scikit-learn and TensorFlow and how you can use the pipelines interchangeably. In a real use case you should probably do better sampling, for example using something like SMOTE to oversample the classes where the bias is high. These are all ways to avoid that bias. The easiest way, of course, is to undersample every class down to the size of the class with the fewest records and train on that. That is also one way to do it.
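For reference, a hedged sketch of both ideas, per-class metrics plus SMOTE-style oversampling, assuming a reasonably recent imbalanced-learn package is installed; `clf`, `X_tr`, `y_tr`, `X_val`, `y_val` are illustrative stand-ins, not the demo code:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report

# Per-class precision/recall instead of a single accuracy number.
print(classification_report(y_val, clf.predict(X_val)))

# Oversample the minority classes in the training split only,
# then refit whatever classifier you were using.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
print(Counter(y_tr), "->", Counter(y_res))
clf.fit(X_res, y_res)
```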
But then I would have had to explain what I'm doing with undersampling and oversampling and so on, and I wanted to keep things simple, because for someone who is just starting with TensorFlow, or just looking at it, I don't want things to be complicated. Of course, when people run into such problems they will go and read up on class imbalance and those issues. Thank you. I was training a model in Keras and I saved it; can I load it into TensorFlow? Yes. One thing to understand is that Keras is basically a high-level API specification; Keras is not running anything underneath by itself. It uses Theano, TensorFlow, or CNTK in the newest version, among other backends, to do the actual training. So it's a very nice high-level API, and if you're using a TensorFlow backend, all your variables are stored in the TensorFlow format; when you save your model, even if you trained it with Keras, you still end up with a TensorFlow model, and a Keras HDF5 file is still compatible with TensorFlow. But in TensorFlow the model is saved as the meta, index and data files, and I had a model in the Keras HDF5 format that I wanted to use and ran into problems; you mentioned there's a way around that? True. If you specifically have to move from Keras to TensorFlow and hit that sort of limitation, you restore the variables in Keras, save the weights for each layer individually, and then restore them into TensorFlow. Because your model is effectively the same, with the same network structure in Keras and in TensorFlow, you can save the weights separately, probably as NumPy arrays, and load them into the new TensorFlow graph. That way you keep the weights consistent and you don't have to worry about the file format; in the end it's only about how you retrieve the values. But it's not entirely seamless. Thank you. One quick question. First, thank you for doing the sharing session. You mentioned the interchangeability between scikit-learn and TensorFlow. One of the things I like to use scikit-learn for is feature processing ahead of my model, and I found the scikit-learn Pipeline class pretty useful for defining how everything transforms and feeds into the model. To your knowledge, do TensorFlow estimators suit something like that, and does it play well with TensorFlow Serving? Absolutely, yes. You have to understand that the scikit-learn data-processing pipelines are completely independent of the TensorFlow estimator. The estimator is just the classifier here, defining a deep neural network or a linear classifier; the separation between the data pipeline and the classifier or machine learning model pipeline is clean. So there's no problem using the scikit-learn pipeline with TensorFlow. Like I showed you, you can use the pipelines interchangeably; if you're doing pre-processing with scikit-learn, there's absolutely no issue feeding that into TensorFlow, because in the end these are all plain NumPy arrays, just a matrix of numbers, a representation.
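A small hedged sketch of that combination, feeding scikit-learn-preprocessed arrays into an estimator via the TF 1.x NumPy input helper; the feature name, shapes and unit counts are made up for illustration:

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# scikit-learn handles the feature processing...
prep = StandardScaler()
X_tr_p = prep.fit_transform(X_tr).astype(np.float32)

# ...and the estimator only ever sees plain NumPy arrays.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": X_tr_p}, np.asarray(y_tr),
    batch_size=32, num_epochs=None, shuffle=True)

est = tf.estimator.DNNClassifier(
    hidden_units=[64, 32], n_classes=3,
    feature_columns=[tf.feature_column.numeric_column("x", shape=[X_tr_p.shape[1]])])
est.train(input_fn=train_input_fn, steps=1000)
```

At serving time the same fitted `prep` object has to be applied to incoming data before it reaches the model, which is exactly the point made in the next answer.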
Python takes care of how that's done. Thanks. Just to follow up: I'm not too familiar with TensorFlow Serving, but could you package both the scikit-learn processing pipeline and the classifier and put them in together? What TensorFlow Serving does is give you a way to put a model into production: you effectively have a server looking at the inputs, like an API web service, where you define the input and what the output is going to be. It does not look at how the data was pre-processed or how it will be post-processed afterwards. The served model just takes whatever data you give it and produces an output based on whatever it was trained on, so if you don't pre-process incoming data the same way you did for training, it is not going to do well. So no, you have to do the pre-processing separately; serving is purely the graph definition, the inference and the output. All right then, thank you and have a good night.