Or is it going to flake out? OK. So I know it's been maybe a little bit of a long night. I introduced myself earlier: my name is Yufeng Guo, I'm yet another developer advocate, and I'm from Google. Kaz talked a lot about machine learning use cases and some of the tools that are available today. I'm curious, by a show of hands: how many of you would consider yourselves to do machine learning in some form, at work or nights and weekends, having played around with it to any extent? Great. And how many folks have done something with TensorFlow? OK. So everybody who didn't raise their hands, those are the people you can ask for help later if you have questions about TensorFlow.

My talk is going to focus more on running through a framework for how to approach a machine learning problem, and how you can use this overarching idea to inform, especially, some of the earlier steps, to save you time and energy down the road. We'll walk through the conceptual aspect of it relatively quickly, and then I want to get into a more concrete discussion: we'll run through some TensorFlow code, and I'll show you some other tools that I use to achieve each of those seven steps. So with that out of the way, let's get going. Oh, also, feel free to hit me up on Twitter if you have questions or want to follow up. That'll be at the end of the deck too, I think.

So we've talked a lot about machine learning tonight already. My super short definition of machine learning is that it's just using many examples to answer questions. And you can split that up, of course, into those two sides: showing many examples, and answering those questions. Broadly, those split into training and prediction. The reason I show this slide is to emphasize that machine learning is useless if you drop either side. The best model in the world isn't very useful if you can't serve predictions to your users reliably, securely, and scalably, assuming those are all important requirements for you. Conversely, if you have a really great, scalable system and backend, but your model is really inaccurate because your training didn't work very well and it's not recognizing the patterns in the data, that's also not going to work.

So machine learning is extra challenging because of that: you have these two aspects that are fundamentally different. Training requires a certain mindset, where you're thinking about the data, its structure, and how you can get more information out of it. But when you start serving predictions, you have to switch to another mode in your head, to think about how you can serve those predictions. And so we'll look at that and all the steps in between, right? I promised seven steps; there's only two here, so don't worry, you'll get your money's worth. And let's see, I think there's one more point here. Right: sometimes, as we saw with the machine learning APIs on Google Cloud, for instance, you only have the prediction side. That's a situation where you're not providing your own training data, so you don't have to do the training, but you still get the benefits of a production-ready, production-scale prediction service. And the only other point I wanted to make with this slide is that you can do machine learning both in the cloud and locally, but Kaz has already talked about that.
So now we're going to do this really random thing where I switch to a different presentation, because I didn't stitch these together; they use different slide software. So that's good. All right. To talk through our seven steps, we'll use an analogy and work through a hypothetical problem, okay? Let's say, for some reason, your employer, or a friend perhaps, came to you and said: gosh, I want to build a system, just for fun maybe, to tell the difference between beer and wine. And you might say, well, maybe just have a sip and then you'll know, right? You can just have a drink. But play along. If we think this through, how might we go about doing it? We might say, well, we've got to have a model; we're going to make some kind of model, and it's going to predict either beer or wine. And to do this, you might say, well, some of the things that tell them apart are maybe color (wine is red, beer tends to be yellowish, though of course there are lots of exceptions) and alcohol percentage. So where can we go with that? Well, we'll run down to the store, we'll buy a whole bunch of beer and wine, and then we'll go to the electronics store that's attached to your local grocery store and grab a spectrometer to measure the color and a hydrometer to measure the alcohol content, right?

So then we hit that first crucial step: gathering data. Before you can do any sort of custom training, gathering data is paramount, and thinking about how you want to gather your data, and what data you want to gather, will make a large, large difference down the road in terms of your model's quality, its accuracy, its longevity, et cetera. Also think about whether you're going to want to update it. Maybe you want to build a system where, if a new type of beer gets invented (is that a thing that happens? I actually don't drink that much beer), you'd be able to update your training data and update your model accordingly. And so maybe you make a table, right? This is the simplest case; like I said, a conceptual overview. You've got your color in wavelength (nanometers), alcohol as a percentage, and the answer: beer or wine. So you have your data and a label.

And that brings us to our second step. Once you've collected this data, there's typically some extra work that you need to do. This being a PyData meetup, this is a step that I feel gets emphasized and discussed more often than with a lot of other audiences: making sure your data is well shuffled, making sure that it's well distributed across the two different categories (you don't want a lot more data points on one side than the other), having balanced datasets, normalizing your data, and all that good stuff. And I think this slide is saying the same thing, right? Then, of course, you want to split that data; that's also part of the data preparation step.
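As a concrete sketch of that shuffle-and-split step, here's roughly what it might look like in pandas, assuming a hypothetical beer/wine CSV with the columns from the table above (the file and column names are made up for illustration):

```python
import pandas as pd

# Hypothetical file and column names, matching the beer/wine table above.
df = pd.read_csv("drinks.csv")  # columns: color_nm, alcohol_pct, label

# Shuffle first, so the held-out set is a representative sample.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Hold out 20% for evaluation: a reasonable default for a small dataset.
split = int(len(df) * 0.8)
train_df, eval_df = df[:split], df[split:]

# Sanity check: the label balance should look similar in both sets.
print(train_df["label"].value_counts(normalize=True))
print(eval_df["label"].value_counts(normalize=True))
```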
Once you have this giant body of data, you want to split it into the data you'll use for training, and a subset you'll use to evaluate your model, which you don't show it during training. That's your test set, or evaluation set, or validation set; lots of words for the same thing. Make sure you do that shuffling beforehand, especially with really large datasets. When you're talking tens or hundreds of terabytes, how do you get a good shuffle in, so that you have a representative sample in your evaluation set? And how big should that set be, right? If your data is only one gigabyte, maybe holding out 20% is a reasonable size. But if your data runs on and on, terabytes and petabytes, your evaluation set may end up being a very small percentage and still be representative. That all depends on your actual data, though: if the data tends to be very consistent over time, you might not need such a large test set. So a lot of this is a soft art rather than just a science. Is there a category for data arts? Everyone calls it data science, but maybe we should have data arts next to that as well.

So that brings us to our third step. We gathered our data, we prepared our data, and now we have to choose what kind of model we want to run against it. And this is kind of a meta step, in a way, because you end up playing around with what you use; it's not necessarily a decision you make up front that's final and never changes. The clip art I chose for this lovely slide is just there to refer to the fact that there are models that are particularly well suited for images, some that are better suited for music or sound, some that are more geared toward text, and some that are more geared toward structured data, your standard CSV file or spreadsheet data. So you take your model, you run it through, and you see how it does, right?

And that brings us to our training step. So you do the training, and I'll go into the math a little bit here; feel free to zone out if you're so inclined. This is the simplest version, right? For those who are not familiar with this, this is just a linear model. This is literally thinking back to middle school: y = mx + b, draw a line of best fit. Another way to think about training is that it's like when you're first learning to drive. I didn't research Singapore's driver's license laws, so I hope this analogy holds. But basically, in the US, when you first become eligible to drive (most states it's when you're 16, some states it's 16 and a half, some states it's 17), you get a permit, and you're able to drive with another adult supervising you. You're a student driver, basically. And when you start out, you don't know how the car works, right? You sit down and you're like, wow, all these controls: I have a steering wheel, a gear shift, brakes and pedals, maybe a clutch if it's a manual. And over time, through experience, and hopefully not too many crashes, you learn to drive better and better. And so a student driver becomes a fully licensed, you know, real driver, an adult driver if you will. In a similar way, training your model in machine learning is like that, right? We're showing the model lots of examples; we're practicing driving on the roads.
And of course, now that I say all this, it occurs to me that this has some really direct analogies to all the self-driving car hype. So the analogy is perhaps a little too close to reality. Anyhow, you train your model, and going back to the math that I promised you (I know some of you were really looking forward to this part), we learned about drawing a line of best fit through data, right? The slope, and the inputs and outputs, with these animated in and... okay, it went forward too much. No, oh gosh. Okay. So with machine learning, what makes it better than just drawing a straight line? Because anyone can draw a straight line; any good, or even mediocre, spreadsheet software can draw a straight line of best fit through your data. So what makes machine learning different? What makes linear regression on a bigger dataset different? Essentially, your data has more than one dimension; it's not just one 2D layer, so you're actually fitting a line of best fit for each of the different aspects, typically the different features, of your dataset. Kaz mentioned the image data, where we flattened each image out to 784 values; that basically gives you 784 lines of best fit, one for each pixel. And so you can take those slopes and collapse them down into an array, take the y-intercepts and collapse those down into an array, and re-represent the whole thing as a matrix multiply. I don't think I made a slide for that (slides to add for the future), but there's a quick sketch of it at the end of this section.

And so we have our training: the data goes into the model, and we make some predictions. In a nutshell, the training process takes those predictions and compares them against the true value, right? We say: here's the color, here's the alcohol percentage, and the model, which at first is basically initialized to random guesses, will say either wine or beer. Then we look at the actual data that we collected and say: was this right or wrong? That feeds back in and updates the model. And so, through many, many examples, you get a better and better model that reflects reality.

So that brings us to evaluation. Earlier we had that moment where we split the data and held out the evaluation dataset. Evaluation is in a lot of ways very similar to training; the only difference is that we do not update the model anymore. It's literally the same, except we don't update the model. We just use it as an opportunity to test how well the model is performing. And I left this tip in as an 80/20 split, but the caveat is that it's really dependent on data size and the character of your data. I don't mean that in terms of high quality or low quality, but what your data feels like. Some datasets really need wide representation; I've seen evaluation splits of 50/50, and others split 99 to 1, just because of the way the data is aligned.

And that brings us to almost our final step. We did our training, we did our evaluation, and now we want to go back and see what we can change: what other models are there, and the various knobs and levers we can turn on the model itself and on how we run our training. We can tune these quote-unquote hyperparameters, because the model itself has parameters, and these are the parameters of the model; kind of meta. Maybe we should call them meta-parameters.
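Circling back to the matrix-multiply idea from a moment ago, since there's no slide for it, here's a minimal numpy sketch using the 784-pixel example: one "slope" per feature, collapsed into a weight matrix, plus an intercept term, so the whole model becomes a single multiply. The weights here are random just to show the shapes; in practice training would learn them.

```python
import numpy as np

# 100 examples of flattened 28x28 images: one row per example, 784 features.
X = np.random.rand(100, 784)

# One slope per pixel, collapsed into a weight matrix, plus one intercept
# per output class (say, 10 classes). These would be learned during
# training; they're random here just to illustrate the shapes.
W = np.random.randn(784, 10)
b = np.random.randn(10)

# The "784 lines of best fit" become a single matrix multiply:
# y = XW + b, giving one score per class for each example.
y = X @ W + b
print(y.shape)  # (100, 10)
```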
And so that set of... why are these slides repeating themselves? Okay. So then you run through the training, oftentimes in parallel: you can kick off a bunch of different jobs. And this is where running locally can get you in trouble. Folks will run out of space on their disk, run out of memory, things like that, if they kick off multiple jobs with large datasets or large models. That's where the cloud, et cetera, can help, but a lot of times I'll just do them one at a time and see how it plays out. So you run your parameter tuning, and say you're happy: you found a good set of parameters. And now we're ready to make our prediction, right? Finally. I said there's training and prediction, and eventually you get to prediction. So then we can predict on new data, new data that's coming in. Maybe your friend comes to you and says, all right, let's test this new wine that just came out of Australia or something, and you run that through your model to get a prediction, and hopefully it is what the model thinks it is.

So that's the conceptual overview. We have our seven steps: gathering data, preparing it, choosing a model, training, and then evaluation (I said prediction, but I mean evaluation; we use the same machinery for training and evaluation). Then we do our parameter tuning, and then finally we can do predictions, right? We build that scalable system, hopefully, that does our prediction. So that's just the conceptual overview. Let's get into the meat of it a little bit more, of how each of those steps looks. In particular, how it looks with TensorFlow.

Kaz introduced TensorFlow for us earlier, so I'll be brief in this bit. The thing I'll add is that the name TensorFlow comes from the two words, tensor and flow. A tensor just refers to a multi-dimensional array, or a matrix, and the flow refers to flowing through a computational graph. This diagram is a super simple version of a computational graph, right? The two and the three feed into an addition operation, and a five comes out the other end. The notable thing (this is a very simple example, obviously) is that it shows that you construct the graph first, and then you can feed data through it. This graph has two input nodes, an addition operation, and an output. It knows nothing about the input values that may or may not come later. And then the two and the three show up, and the five comes out the other end.

Another way to conceptualize it is that the outputs are pulled through the graph; they're pulled out by the desired output node. That five is saying: everything needed to calculate me has to be computed. And it's not shown here, but if there were some other operation off to the side, almost like a dangling node that wasn't instrumental in computing that output, it wouldn't get computed. So sometimes, when constructing computational graphs, you can run into trouble there: you have this thing off to the side that you want to run, but it won't run, because your output doesn't need it, so the data won't flow through there.
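Here's a minimal sketch of that two-plus-three graph in TensorFlow 1.x-style code (the version used in this talk), including a dangling node that never runs because the requested output doesn't depend on it:

```python
import tensorflow as tf

# Build the graph first; nothing is computed yet.
a = tf.constant(2, name="a")
b = tf.constant(3, name="b")
c = tf.add(a, b, name="c")           # the output node we'll ask for

# A dangling node off to the side: it's in the graph, but since `c`
# doesn't depend on it, asking for `c` will never execute it.
d = tf.multiply(a, b, name="dangling")

with tf.Session() as sess:
    # The 5 is "pulled" through the graph by requesting `c`;
    # only a, b, and the add op run. `d` is never evaluated here.
    print(sess.run(c))  # 5
```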
And I think Kaz touched on most of this as well: TensorFlow runs on just about everything under the sun, and, being open source, it continues to get more and more support on lots of different platforms, even beyond what I've shown here. There's a bunch of further embedded-systems chips that are getting TensorFlow, or TensorFlow Lite, support.

In terms of TensorFlow's architecture, the whole thing is built on a distributed C++ backend. That engine is nice and fast and all of that, but thankfully we don't have to interact with it directly. That's why this isn't called C++ Data; it's called PyData, right? So we're going to care more about the Python frontend and the various layers of libraries above it. The first one is called layers: it's literally you manipulating the layers of your model directly. Above that is what we call the estimators library, a pre-made framework for constructing your model and running the training and evaluation loops. And then above that is an optional layer, the canned estimators, where not only do you get a framework for constructing and running your models and doing the training and evaluation, the models themselves are already predefined, and all you have to do is supply some parameters, many of which are optional. That's the layer we're going to focus on today, because it allows the greatest amount of experimentation and initial flexibility, without having to get bogged down in the details of constructing your own custom models. My recommendation is typically to start at the top, and, as your requirements arise and you need more and more customizability, descend into the depths of TensorFlow.

Okay, so what does that look like in code? Here's one example of a canned estimator: a DNN, or deep neural network, classifier. Kaz talked about those layers of neurons earlier. So here, you're creating a network by configuration. We're basically saying that this network will have four layers, the first one 1024 wide, all the way down to 128. Literally, you just supply an array. Some of you may be wondering: what about all these other parameters that I care about, my activation function, my decay, my optimizer, my learning rate? You can supply those too if you want, but there are reasonable defaults if you don't. So you can get away with the short version, but you can also do the long one. You can go for a while before you reach the point where you say, well, actually, I need to rewrite my entire network by hand from scratch. The options go for a while, which is nice.
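A minimal sketch of that configuration-style code, using TF 1.x's tf.estimator. The feature columns here are placeholders, and the middle layer sizes are my assumption (the slide only names 1024 and 128); the point is the defaults-versus-explicit contrast:

```python
import tensorflow as tf

# Placeholder feature columns, just for illustration.
feature_columns = [tf.feature_column.numeric_column("age"),
                   tf.feature_column.numeric_column("hours_per_week")]

# The canned version: the network is defined by configuration.
# Four layers, 1024 down to 128 (middle sizes assumed); everything
# else gets reasonable defaults.
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[1024, 512, 256, 128])

# Or, if you do care about the other knobs, you can supply them too:
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[1024, 512, 256, 128],
    activation_fn=tf.nn.relu,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.05))
```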
So let's look at a concrete example of this. Usually I have an example here that talks about food and fried chicken, but I figured, it being evening time, some folks may not have eaten dinner, so I went with something a little less savory. This is just as exciting as food, I promise: it's census data, the 1994 census. It's become somewhat of a classic dataset. It's not the hardest dataset in the world, and it's small enough to fit into memory, so it's easy to run on your local machine. The task is simply to decide, given socioeconomic data, whether a given household has income above or below $50,000. And I have a couple more slides before I have to sit, so I'll stand for now. Here are some columns. They're the columns you might expect: age, education, marital status, what occupation they have, what kind of relationship they're in and whether they're married or not, hours per week that they worked, and finally the income bracket, above or below $50,000. And that is literally what the data is: the string ">50K" or "<=50K". So we'll do some preprocessing there when we get into it.

Okay, so let's look at this concretely. I've got TensorFlow fired up on my machine through a Jupyter notebook. Oh, another opportunity for a poll: who uses Jupyter notebooks? Lots of people, yeah. Jupyter's the best. Okay, let's see. Folks in the back, can you see the words on the screen? I got some nods, great. So installing and importing TensorFlow is all the standard stuff, right? pip install tensorflow, and import tensorflow as tf. There's sounds... okay, I was worried it was coming from me. All right, so we'll import TensorFlow. I happen to be running 1.3 in this particular environment, but ever since 1.0, all the minor releases are backwards compatible; 1.4 is what's out today, the most current. And we'll pull in pandas and take a look at this. Oh, I didn't run this cell. Good job, me. Yes, that is your job: remind me to run the cells. I always forget to run the cells, and then people think I wrote bad code, when I just didn't run the code that made the variables exist in the first place.

So we'll load the data into a DataFrame, just so we can take a look at it. And we have those columns that I mentioned, lots and lots of columns, and that last column with the wacky data representation: rather than doing zero and one, they went with greater-than or less-than 50K. One of the things I like to do is use .describe() on the pandas DataFrame to show some stats about my data. You can find out some interesting things about your dataset through this, stuff you'd probably be able to figure out eventually, but it's nice to have a standardized, easy call to make. Things you might notice here: apparently in 1994, nobody worked more than 99 hours per week. Perhaps the form only had two digits they could enter, right? And also, nobody over age 90 in the United States filled out the census. Also interesting. You know, in hindsight, this is a dataset that was prepared from the census data, right? This isn't literally the raw census data, so I suspect they put some limits on both ends of the spectrum; you'll notice the minimum age is also 17. You can also do .describe() on the categorical columns. We have, you know, education. The "unique" row here is usually what's most interesting to me: you can see which columns have a lot of different values and which columns maybe don't. And in terms of data gathering, I guess we glossed over that with this example, since the dataset is provided.
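Here's a minimal sketch of that loading-and-looking step, assuming local copies of the census files with the conventional column names for this dataset (the test file is assumed to have a header line to skip, as the distributed version does):

```python
import pandas as pd

# Conventional column names for the 1994 census income dataset.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race",
           "gender", "capital_gain", "capital_loss", "hours_per_week",
           "native_country", "income_bracket"]

train_df = pd.read_csv("adult.data", names=COLUMNS, skipinitialspace=True)
test_df = pd.read_csv("adult.test", names=COLUMNS, skipinitialspace=True,
                      skiprows=1)  # first line of the test file is a comment

# Numeric columns: count, mean, std, min/max. This is where you spot
# things like the 99 hours-per-week and age caps mentioned above.
print(train_df.describe())

# Categorical columns: count, unique, top value, frequency.
print(train_df.describe(include=["object"]))
```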
But the data preparation step is interesting here, because I wanted to take this moment to introduce you to Facets, if you haven't heard of it before. This is a project out of Google's PAIR group (People + AI Research), and it's a really neat tool to visualize your data. I feel like they haven't done enough to tell people about it. So this is our census data loaded into Facets. Facets has two components: one is called Facets Overview, and one is called Facets Dive. Facets Overview does what it sounds like. We can see our features, and you'll notice it's very similar to what we saw in pandas with describe: we've got our mean, our standard deviation, our min, our max, our median, all good stuff. But we also see some other things. We have a missing-values percentage. We have a zeros percentage, with the very high ones turned red and bolded. So it's an even better way, I would argue, to see any anomalies in your data: strange patterns, weird behaviors, data imbalances.

And over here we can see the distribution. The blue is our training data; the pinkish red is our test dataset. The test dataset is about 16,000 rows; the training is about 32,000. We can see the distribution across different categories and different values here, and we can also check that the training data and the test data match. That's really what I'm looking at here: the blue and the pink should roughly follow each other, right? Because if your training data is skewed in a different way from your test data, then your training isn't going to transfer well. Then we have our categorical features, which have a similar representation: we have our unique counts, our top values, our frequencies, and we can flip between showing the raw data and showing a chart of it. One interesting observation you might make here is that in our test dataset, for the country of origin, there are only 40 unique values, whereas in the training there were 41. So maybe this prompts me to go digging into my data: why isn't one of my countries represented in my test dataset? Perhaps it should be. Or perhaps there's only one example of that 41st country; maybe I should throw that data point out as an outlier. So it really brings some nice insights to your data, and it's just really helpful in that way.

And then, I promised there was a second piece: Facets Dive. This is where it gets really fun. Faceting is kind of like when you're shopping online and, maybe you're buying shoes, you choose shoe size, color, brand, type. There are these lists of different aspects, different facets, of the product. So here, I've already done a few things: I'm splitting this by age and education, and we can see all the different groupings. I can zoom in and out, all the way down to a raw data point, click on it, and on the right-hand side see the entire record for that one data point. And maybe we just want age-based faceting. Maybe I'll... what did I do? I'll turn this to age, turn this one off. This is all written with Polymer web components, backed with TypeScript, so you can embed it in the web as well as in your Jupyter notebooks, so it can really become part of your workflow. So what I've done here is, across the top, I have age buckets, and you can tune the number of buckets you want. Maybe you only want six buckets, or seven... or, I can't make it say seven. And maybe we want to see: do people work more hours per week across different ages? Your intuition might say, yep, probably. But how does that affect whether their income is above or below 50K? And do we see that trend generally? So I'm doing a scatter plot, sorted by hours per week on this axis here. And you can kind of see this general trend: it goes up and down as you go through the different ages.
When you're younger, you're working fewer hours; when you're in your quote-unquote prime working years, you're working the most hours; and then it starts tailing off toward the end. You can also see, in the distribution of the blue and the red, that the red tends to scatter toward the top of the hours-per-week range. But there are certainly exceptions, right? It's not a perfect split. And that will definitely factor into making it difficult for any model to make accurate predictions here, because the data is kind of messy. Look at this block, for instance; look at where the red and the blue are. We have red dots down here, we have blue dots up there. So there are people who make both above and below 50K. The red (I think... here, we have the legend: red is above 50K, blue is under) is down here, mixed in with the blue. If a human can't separate it, maybe the machine can, but also maybe not. So when we run the training, we'll see how the system does against this kind of dataset. So that's Facets. I find it a useful step, and I wanted to present it to you as perhaps another tool in your data preparation tool belt.

So, TensorFlow reads in data via what are called input functions. Basically, it's a way for you to configure the data that gets pushed into that TensorFlow graph, into that model, in any way you want. Because your model is over here and your data is over there; you just need somewhere to connect them. And the way they've chosen to architect it is to basically let you provide an arbitrary Python function, so you can do whatever preprocessing you want in that input function. If you're pulling from a distributed data store of some form, that's a great place to make that call. If you need to do any preprocessing, or call out to other functions to do that preprocessing, that's another great place for it. Here, we're largely just going to read in our data and push it onwards. TensorFlow offers a handy utility function that constructs a pandas input function for us from pandas data. There's also an equivalent numpy input function if you have numpy arrays, or are able to shove your data into numpy arrays.

Oh, at this point: I should have mentioned this earlier. There's a column, which some of you might have noticed, that is unpronounceable. It was used by the statisticians who processed the census data as a metric of how much they believed the data from the actual people filling out the census. So it's not terribly useful for predicting income, and we throw it out of our analysis. So we make our input function.
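A minimal sketch of building that input function with the pandas utility, assuming the train_df frame loaded earlier. The unpronounceable column (fnlwgt, the survey weighting) gets dropped first, and the income string becomes a 0/1 label:

```python
import tensorflow as tf

# Drop the survey-weighting column, and turn the income string
# (">50K" / "<=50K") into a 0/1 integer label.
train_x = train_df.drop(["fnlwgt", "income_bracket"], axis=1)
train_y = train_df["income_bracket"].apply(
    lambda s: ">50K" in s).astype(int)

# The utility constructs an input function for us from pandas data;
# there's an equivalent numpy_input_fn for numpy arrays.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_x,
    y=train_y,
    batch_size=64,
    num_epochs=None,   # cycle through the data indefinitely while training
    shuffle=True)
```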
And that leads us to our next step, creating our model. The canned estimators have a nifty aspect: the model itself is canned, but the way it receives the data, through what are called feature columns, is very much under your control. And here you can do a number of transformations. So for anything you didn't do in the input function, or anything that was too complicated to perform in the input function at scale, you can do it here, within the TensorFlow graph, where it will execute in C++, where it will distribute, all those good things.

The first thing we'll do is split out our sparse columns, our categorical columns, and there are a lot of different controls here; I'll just show two of them. One is called categorical_column_with_vocabulary_list (apologies for the really long method names; I don't recommend typing them out, just auto-complete them), and the other is categorical_column_with_hash_bucket. The first is for when you know exactly what the possible values are in your particular dataset for that column. The other version is for when you don't know, or you don't care to type them all out. The hash bucket size should be at least the number of unique values; a lot of times you can automate that by just using that column's number of unique values, and if you had hundreds of columns, you could script them out by putting this inside a for loop, essentially. Run that. And then the continuous columns are a little bit easier: we just make a numeric_column and call it a day. You'll notice that I give just a name and nothing else here, right? These feature columns are just that: placeholders for the data to flow through when the time comes. So the input function pushes data into the feature columns, which goes down into the model.

Before it goes to the model, though, you can take these feature columns and do some transformations. The first example I have here takes a continuous column, like age, and turns it into a categorical one by bucketizing it. We saw this earlier in the Facets example, right? We could do those facets with different numbers of buckets. Here, you can control exactly where those boundaries are when you actually go to do your training, by supplying the boundary values. And so now age_buckets is a categorical column. Then we can do what are called crossed columns, where you take two categorical columns and make a new, extra column which is essentially the AND of the two. And when I say two, I really mean two or more: you can take multiple columns, including the previously continuous age column, which we turned into age_buckets, and cross it with two more columns. So you can do quite a lot in terms of combinations of different columns.

It really comes down to understanding your data and how you think about it. This is the art part of the data scientist's work, where you say: well, based on my knowledge of this domain, my knowledge of this particular dataset, and my explorations into it, I wonder if crossing these two columns, or bucketizing this one, might yield a better result, because I can instrument it this way. And you might say we're leaning in the direction of manual feature engineering with this, right? But my argument is: if you can do this in one line and it yields great performance results, versus adding a bunch of layers to your neural network and training an extra 20 hours, it's one of those time-versus-energy trade-offs, right? So it may be worth trying out. And this is what I get for not running the other cells. Okay. Then, finally, I group my columns. In this case, all the categorical ones and the crossed ones I refer to as the wide columns, the categorical, linear columns; and then the deep columns, which have the numerical ones, plus all the categorical ones put into embeddings so that they can have deep, continuous representations.
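A minimal sketch of those feature-column pieces: vocabulary list versus hash bucket, a plain numeric column, bucketized age, and a cross. The boundary and bucket values here are chosen just for illustration:

```python
import tensorflow as tf

# Categorical column where you know the values up front...
gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])

# ...and one where you don't care to type them all out. The bucket size
# should be at least the number of unique values (e.g. col.nunique()).
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    "occupation", hash_bucket_size=1000)

# Continuous columns are easy: just a name.
age = tf.feature_column.numeric_column("age")
hours = tf.feature_column.numeric_column("hours_per_week")

# Turn continuous age into a categorical column by bucketizing it,
# supplying the boundary values directly.
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])

# Cross two (or more) categorical columns into a new combined column.
age_x_occupation = tf.feature_column.crossed_column(
    [age_buckets, occupation], hash_bucket_size=1000)

# Group them: sparse and crossed columns for the linear ("wide") side;
# numeric columns plus embedded categoricals for the deep side.
wide_columns = [gender, occupation, age_buckets, age_x_occupation]
deep_columns = [age, hours,
                tf.feature_column.embedding_column(occupation, dimension=8)]
```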
And when I mentioned earlier about selecting a model, here's an opportunity to see that in action. I have three examples here. One is a linear classifier, which is what I showed earlier on the slides: your standard linear regression applied as a classifier. So we just pass in our wide columns. Then there's the DNN classifier we saw on the slides, your deep neural network classifier, and we can supply how big we want it to be; in this case I made a small one, just two layers, with those deep columns. And then there's actually a third option, with a much longer name (again, don't type this out): the DNNLinearCombinedClassifier. There are regressor versions of all of these, by the way. It combines the wide and deep columns to make kind of one supermodel: you put the wide columns and the deep columns in, and supply the hidden units again. The model_dir for all of these is just a string path for where to store the artifacts of TensorFlow's work.

When you run that, it creates the model object in Python, and we'll assign that to just m. And when we're ready to train, we call m.train. What we're supplying there is the input function, which gets called and passed into the model. When we created the model, we supplied those feature columns, so the core of the model, with the feature columns attached on top, is like your custom model. The input function then gets passed in at training time, which means that when we want to do evaluation, we can pass in the evaluation input function instead. So the data can be different, while still having that same call structure. I'll show you a peek here: evaluate is just m.evaluate. It's the exact same thing. So we'll do the training, and I'll let this run.
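A minimal sketch of creating the combined model and running train and evaluate, assuming the wide_columns / deep_columns grouping and the input function from the earlier sketches (the hidden-unit sizes here are placeholders):

```python
import tensorflow as tf

# The "supermodel": wide (linear) and deep (DNN) parts combined.
# model_dir is just a path where TensorFlow stores its artifacts,
# and it's also what TensorBoard reads from.
m = tf.estimator.DNNLinearCombinedClassifier(
    model_dir="/tmp/census_model",
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])

# Training: pass in the training input function.
m.train(input_fn=train_input_fn, steps=1000)

# Evaluation has the exact same call shape, just a different input
# function (built like train_input_fn, but over the test data, with
# num_epochs=1 and shuffle=False), and the model is no longer updated.
results = m.evaluate(input_fn=eval_input_fn)
print(results["accuracy"])
```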
And here, while this is running, I'll show you another tool. For those of you who use TensorFlow: how many of you use TensorBoard? Crickets. Wow, okay. So this is TensorBoard, TensorFlow's built-in visualizer. You can visualize a lot of the stuff that's happening in your model, and if you're using these canned estimators, you get a lot of metrics for free. You didn't see anywhere in my code where I was instrumenting any of this, but you get it for free. Let me refresh this and see if we've got our new runs coming in. So I have some old ones here, and you can drag and select to zoom in. Let's see. There are a number of different things in here, but what I find really interesting is the graphs view. We talked about how TensorFlow is a graph; I showed the two plus three equals five and said that was a really simple version. And for some reason... why are these not connected? That's really odd. Okay, they're usually connected, but they keep updating this, so maybe this is just the new look. Here's our linear model. This is a visualization of what's going on under the hood. I'm double-clicking to expand here, and we can see the columns that we used, the crossed ones we created, and there's our bucketized age. And over... ooh, zooming out a ton here. Let me make this smaller. Over here, I think... nope. Where is the deep neural network? Did anyone see it? I lost my neural network. Okay, so over here was our deep neural network. We should see our couple of layers in there, as well as the input, which I am having trouble expanding. Da, da, da, da. All right, so there are all the different columns. They decided to lay the deep one out really wide on the screen, unfortunately, and it's a little backwards, but there are all the different columns that we had for the deep side.

You can also see a set of distributions of how the weights are spread out in each layer, and you can use that as a sanity check to see if there are any values that are wacky: really big, really small. So here's the actual distribution for the linear one, for instance. And you can see that it's... is this the most recent one? I built one at some point that was purposely weird and messed up. Okay, so these are actually different models. But sometimes you'll see them not changing: it'll be really tight, where it'll all be going down or all be going up. That's a way to see that things might not be behaving properly. Let me uncheck most of these so we can see more. If you hit the circle, it selects just that one, and then I'll use the checkbox to get both. So you can see the linear activations here as the training progressed. Just by way of explanation: this first line is at the first training step, and then as it goes down, this is 400 training steps in, 700 training steps in, and so on and so forth. So it starts out as the peaked, specific distribution it initializes to, but then, as the training happens, it spreads out. Sometimes, if your training is not progressing, you'll see that tall peak just continue the whole way and never spread out, and that's one indication that, oh, something might not be working so great. So yeah, there's your linear one. The deep ones are harder to see, just because the peak value is so big, but if I make this bigger, you can kind of see there are a couple that pop up. And then there's a whole bunch of other values as well that we're not utilizing here; if you're doing audio processing, or images, you can display those here too, and they'll show up. So that's TensorBoard: another tool for your tool belt that will help you see how your model is performing.

And speaking of how our model is performing: we got to 83%, which, for a relatively straightforward dataset, seems like not that great a result. Partly that's because I just picked some random parameters and crossed some random columns, and I suspect, and hope, that you can do better than me. I'll leave that as, what is it, an exercise for the reader. And then, in terms of making a prediction, it's going to start looking really familiar here: we use that same m and call .predict. Here I'm constructing a simple five rows of data for it to predict on, pulled out of the test dataset. And so, yeah, this should be less-than, less-than, greater-than, less-than, less-than; so 0, 0, 1, 0, 0 is what we're hoping to see. We'll make our prediction input function and call m.predict, passing in that prediction input function. So we had our data that we constructed, and this is basically just a little mini generator function that supplies it in one shot. And then we have our predictions, which I'll run live. With any luck, assuming I ran the cells before it, we'll get our 0, 0, 1, 0, 0. And you can see the probabilities coming back. I formatted this output a little bit, but it does have richer data than only these couple of points.
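A minimal sketch of that last prediction step, pulling a few rows from the test frame loaded earlier and reading back the predicted classes and probabilities:

```python
import tensorflow as tf

# A handful of rows from the held-out data to predict on
# (same preprocessing as for training: drop fnlwgt and the label).
sample = test_df.drop(["fnlwgt", "income_bracket"], axis=1).head(5)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=sample,
    num_epochs=1,   # one pass: just these five rows
    shuffle=False)

# predict() returns a generator of dicts, one per input row, with the
# predicted class and the class probabilities.
for pred in m.predict(input_fn=predict_input_fn):
    print(pred["class_ids"], pred["probabilities"])
```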
So that's the journey from... where is it? Okay, yeah: from gathering data, to preparing that data, to training, evaluation, and finally prediction, right? So we saw TensorBoard; that's how you would kick it off, by the way. What it does is spin up a local Python server, and then you go to localhost on port 6006. Does anyone want to venture a guess as to why the default port number is 6006? Anybody? Yes, yeah, it spells out "goog", exactly. I used to say it spells out "goog" if you turn it upside down, but then someone pointed out, well, it could be a capital G. And I was like, oh yeah, that's fair.

And a personal plug: I'm working on a set of videos called Cloud AI Adventures on the Google Cloud YouTube channel. What are we now, 12-ish episodes in? Basically, I step through some of the concepts I talked about here, but in more detail: each episode talks about a specific piece, and we play around with stuff. Recently, I've also been trying to get time with some of my teammates and colleagues in Google Brain and such, to sit down for interviews. We did one on natural language generation; that was a lot of fun. And I won't tell you who I'm trying to get later; I'll tell you when it comes out. So yeah, feel free to reach out to me on the Twitters. Some resources: the code that we looked at was wide-and-deep-census. I also have an example called, well, it's called Wide and Deep Code; not the best name in the world. But basically it's the same code on a different dataset. It's largely unchanged at the core, but the input function is different and the data processing is different, because that dataset has 41 columns. It comes in three different sizes, depending on how big you want to work: I have a sub-sample that's small, and the big version is up to a terabyte. It also has anonymized hash values as data in some of the columns, rather than literal, human-readable strings, which has interesting implications for doing machine learning on anonymized datasets, in terms of privacy and such. So if you're interested, you can use that as a way to compare the two and say: now, how do I apply this to my own dataset? See what changes I made to adapt the code to that other dataset, and use that as guidance for applying it to your own.

So yeah, that's all I have. Thanks so much for sticking around; I know it's, you know, late in the evening. Hopefully Felipe's pizza ordering worked out. Maybe the pizza is not available until nine? The pizza is not available until nine. So... you only asked me for a quick one... yes, I was just going to say, I'm going to have to tap dance up here for 20 minutes. Okay. Thank you, Han.