So this is the Python and TensorFlow talk. Thank you very much for coming to this hour-long session right after lunch. I know many of you are going to start getting sleepy right around the 30- or 40-minute mark, so do your best to stay awake, I'll do my best to keep you awake, and let's work together to get through the talk.

Just to introduce myself, my name is Ian Lewis. I'm a developer advocate at Google on the Google Cloud Platform team, which encompasses all of Google Cloud Platform, so some of you may be familiar with things like App Engine or Compute Engine; that's what Google Cloud Platform is. I'm on Twitter at @IanMLewis, and I've been tweeting throughout the conference, so you should be able to find me fairly easily there.

A little more background about myself: I'm based in Tokyo, Japan, where I've lived for about 10 years, and I've been active in the Python community there as well. I'm one of the four people who founded the PyCon JP conference, which is about a 600-person conference, just to give you an idea of the size. We're holding it in the third week of September, from the 20th to the 24th, I believe, and if you look up PyCon JP you can find out how to register. As of now I think there are something like 20 slots left, so hurry up. I'm also enthusiastic about other communities, like the Go community, and about open-source projects coming out of Google like Kubernetes, and containerization technologies like Docker. That's the kind of thing you can expect to hear from me if you follow me on Twitter.

First, as background, I want to give a quick overview of what deep learning is. How many of you went to the deep learning talks earlier today? Quite a few of you. I'll do my best to build on those, but there may be a little overlap.

So what are we talking about when we talk about deep learning? We're talking about a specific type of machine learning that uses neural networks. Neural networks are a way of doing machine learning where you build a network of interconnected nodes. You take something like this cat picture, turn its pixels into a numerical representation, and pass that through the input layer into the network. Each of the internal nodes takes values from the input, does some operation on them, and eventually you get the output.

These nodes are typically organized in layers. The blue one here is the input layer, and the orange one is what's called a hidden layer: if you think of a neural network as a black box, the hidden layers are the layers inside that actually do the operations. Each of these little nodes performs some operation on its input, called an activation function, and the nodes are linked together by weighted connections: each of the lines connecting the layers has a weight that indicates the strength of the connection.
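To make that concrete, here's a tiny NumPy sketch of what one hidden node does; the particular weights, bias, and sigmoid activation here are illustrative choices of mine, not anything from the slides:

```python
import numpy as np

def sigmoid(z):
    # one common choice of activation function
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1, 0.9])           # input values (e.g. pixels)
w = np.array([0.8, -0.3, 0.4])          # weighted connections into one hidden node
b = 0.1                                 # bias term

# the node: weight the inputs, sum them, pass through the activation
output = sigmoid(np.dot(w, x) + b)
print(output)
```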
So what are neural networks good for? They're good for classification and regression problems, which are a very wide class of problems you can apply machine learning to. Classification is basically putting things into buckets: you have predefined buckets like A, B, and C, you get some input, you put it through the network, and you get a probability that it goes in A, B, or C. Regression is somewhat more complicated in that instead of a probability between zero and one for each bucket, you get a scalar output. Say the output of your network should be a temperature, anywhere from zero Kelvin up to some value; that kind of scalar output is better solved as a regression problem. I'm going to talk mostly about classification, but regression is also something neural networks are pretty good at.

So what does that actually look like? Here's a little demo, available at playground.tensorflow.org, that lets you look inside a neural network and get an idea of what's going on. We have some input features, values that you feed into the network; some hidden layers in the middle; and some sort of output. If you went to some of the earlier presentations you saw something similar. The example I'll use: say you have the weight and height of a person, and two categories, where the orange points are maybe children and the blue ones are adults. If you want to classify a new piece of data coming into your network, you could train a network to do it, but this one is very easy: it's essentially a linear classification problem, where you can just draw a line between the two groups and get a way of predicting one or the other.

But let's say you have something a little more complicated. Let's look at one where one category is completely encircled by the other. If we try to train on that using just the X and Y inputs, it will basically never converge; it will never figure out how to separate them properly.
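To see why, here's a small NumPy sketch of my own, not from the talk: circular data isn't separable by any line in x and y, but a derived feature like x² + y² (the squared distance from the origin) separates it with a single threshold, which is the kind of transformation hidden layers can learn for you:

```python
import numpy as np

# Made-up data in the spirit of the playground demo: one class inside a circle,
# the other class in a ring around it.
rng = np.random.RandomState(0)
angles = rng.uniform(0, 2 * np.pi, 100)

inner = np.c_[np.cos(angles), np.sin(angles)] * rng.uniform(0.0, 1.0, (100, 1))
outer = np.c_[np.cos(angles), np.sin(angles)] * rng.uniform(2.0, 3.0, (100, 1))

r2_inner = (inner ** 2).sum(axis=1)   # all below 1
r2_outer = (outer ** 2).sum(axis=1)   # all above 4

# one threshold on the derived feature cleanly separates the classes
print(r2_inner.max() < 1.5 < r2_outer.min())   # True
```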
So we can add a hidden layer that does this kind of linear classification multiple times. If we do it with one node, we see that it creates one line: everything on one side it classifies as orange, and everything on the other side as blue. When we add new nodes, not layers but nodes, it gets a little more sophisticated: with two nodes it figures out two lines and aggregates the results, so one node has done one linear classification, the other node another, and combined they make this band here. With three nodes it combines the result three times and we get a triangular structure. So as we add nodes and hidden layers, we can do things that are more and more complicated.

OK, that's great, but how do we classify something that looks like this? This is a spiral: one spiral is blue and the other is orange. This is quite a bit more difficult, and we can't classify it using just X and Y inputs. You can imagine maybe sine would be good, or x squared, but even then we don't get very good output from a shallow network with just three nodes. To solve these more complicated problems we need a much more complex network. For something like this it may or may not actually converge, but it's getting there; right now it's not terribly stable, but it will stabilize. Once you can put together these more complicated networks, you can start solving more and more complex problems, and I'll talk about why that's important a little later. You can see that each of the individual nodes makes its own contribution to the final output, and each of these little lines shows a weight: the blue ones are positive and the orange ones are negative. A negative weight is an inverse relationship, so with those you essentially reverse the orange and the blue to get the right output; the point is that you can have both positively and negatively weighted connections between the nodes. Let's turn this off so it doesn't burn my CPU, and go back here. By the way, that demo is just a way of getting to understand neural networks; it's not actually using TensorFlow under the hood. It's all done in JavaScript in the browser, but it's a good way of getting more familiar with neural networks.

So what is a neural network, then? When you break it down, a neural network is essentially a pipeline that takes something like a matrix, or more generally what's called a tensor, and puts it through a pipeline of operations. You can imagine that each of these is a matrix multiplication type of function, where you take one matrix, multiply it by another, and another, and another, and eventually you get out a tensor that represents the output for your particular problem.
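Going back to the playground example for a second, here's a hand-built sketch of that idea: two hidden nodes each drawing one line, with the output combining them into a band. The weights are hand-picked for illustration, not trained:

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)   # a common activation function

W1 = np.array([[ 1.0,  1.0],    # node 1: one side of the line x + y = 1
               [-1.0, -1.0]])   # node 2: one side of the line x + y = -1
b1 = np.array([-1.0, -1.0])

W2 = np.array([-1.0, -1.0])     # output node: fires only *between* the two lines
b2 = 0.5

def predict(point):
    hidden = relu(W1 @ point + b1)   # two linear classifications
    return W2 @ hidden + b2          # aggregated into one non-linear region

print(predict(np.array([0.0, 0.0])) > 0)   # inside the band  -> True
print(predict(np.array([3.0, 3.0])) > 0)   # outside the band -> False
```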
This pipeline of weighted connections is very loosely modeled after how the brain works, in that the connections between nodes have a strength, or weight, the way the connections between neurons in your brain do. But from a practical point of view, you're doing matrix operations many times over in order to make some sort of prediction.

I mentioned tensors, and this is where TensorFlow gets its name, but a tensor is not something people encounter very often unless you're a machine learning person. Most people are familiar with vectors and matrices, and a tensor is essentially a generalized version of those. Imagine a 2D Euclidean space, or a 3D space, with some value out in that space. A 2D vector could be represented by a single array in a programming language, a matrix is a two-dimensional version of that, and a tensor is the generalized, n-dimensional version: you can have any number of dimensions, say one dimension for each feature you're feeding into the network. And you can do the same sorts of operations on a tensor as you would on a matrix, like matrix multiplication and matrix addition.

So here's how a basic neural network works with these connected nodes. This is our input vector, or input tensor, with x1, x2, x3. We have a weights tensor that we multiply against it, then we add biases in the form of a tensor, and finally we apply softmax to get the output. This is a very basic one-layer network, but you can think of the matrix multiplication as what creates this interconnected pattern between the layers. If you went to some of the earlier talks, most of the operations are formed this way: the input x times W, the weights, plus b, the biases; and you do that once for each layer in the network.

So those are just multiplications and additions, and then we have this softmax at the end. Softmax is just a way of normalizing the data as it comes out, so you'll typically see it at the very end of a network. After you've gone through the network, the raw outputs might be values like 50, 20, 0.32, and you don't really get an idea of what those mean; they're only relative values within your network. When you put them through the softmax function, it normalizes them to values between 0 and 1, so you get an actual prediction output: each value represents the probability that the input goes into a particular bucket. Say the buckets are cat, dog, and human, and we put in an image: the output might be 0.99 that it's a cat, 0.01 that it's a dog, and 0.001 that it's a human, which essentially means it's a cat.
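In code, that one-layer network is just a couple of tensor operations. This is a NumPy sketch with illustrative shapes of my choosing (3 input features, 3 output buckets), not the notebook's code:

```python
import numpy as np

def softmax(z):
    # normalize raw outputs to values between 0 and 1 that sum to 1
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([0.2, 0.7, 0.1])    # input tensor (x1, x2, x3)
W = np.random.randn(3, 3)        # weights tensor: one column per output node
b = np.zeros(3)                  # biases tensor

y = softmax(x @ W + b)           # multiply, add, normalize
print(y, y.sum())                # three probabilities, summing to 1
```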
So that's how we do predictions: input goes in, it flows through all these operations, and we get some sort of predicted output. But how do we actually train a model? A model is trained using a method called backpropagation, which was covered in some of the earlier talks. Here's the neural network as we've been describing it: one layer, a second layer, our softmax, and our output, and we go through it and make a prediction. But now we use test data as we put it through the network: data where the actual input, like the cat picture, is paired with the expected output, "this is a cat". So we know which inputs are cat pictures and which are dog pictures.

We put the cat picture through and it comes out with a result, and then we take that result and the expected value and find the difference between the two. Say it came out 0.86 certain that it was a cat, but we know with 100% certainty that it's a cat; we want to nudge our network in the direction of determining with full confidence that it's a cat. So we take the output and use what's called a loss function to measure the difference. A typical loss function is cross entropy, but there are a number of others you can use depending on the situation. Then you optimize the result using something like gradient descent, which was also covered a little earlier; I'll say more about gradient descent in a bit. Essentially you run the optimization function and then backpropagate the values into the weights and biases for each individual layer, so weight 1, bias 1, weight 2, and bias 2 here are the actual weights and biases used in the network. You're backpropagating these values, updating the weights and biases, and nudging the network toward giving you the proper output. You do this many, many times, training over and over again, and it eventually converges into an accurate network. At least, that's the theory; it doesn't always work, but in general that's the idea behind it. So that's a rough overview of how neural networks work.
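Numerically, one training step looks something like this sketch, with made-up values: cross-entropy measures the difference between prediction and expectation, and a gradient descent update nudges a weight downhill (shown here for a single scalar weight; backpropagation applies this to every weight and bias):

```python
import numpy as np

y_pred = np.array([0.86, 0.10, 0.04])   # the network says "86% cat"
y_true = np.array([1.00, 0.00, 0.00])   # the test data says it *is* a cat

loss = -np.sum(y_true * np.log(y_pred))   # cross-entropy loss
print(loss)                               # ~0.151; a perfect prediction would give 0

# gradient descent: w_new = w - learning_rate * dLoss/dw
w, grad, learning_rate = 0.5, -0.3, 0.1   # illustrative numbers
w = w - learning_rate * grad              # nudge the weight toward lower loss
print(w)
```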
So why are we actually talking about this now? One of the earlier talks mentioned ImageNet, a famous open dataset for machine learning, where the best systems still had something like a 25 percent error rate only a few years ago. The reason we're suddenly talking about deep neural networks is that people have gotten much better at building them, thanks to a number of breakthroughs in training these networks to do things that are practically useful.

You can think of the quality of a neural network like this: traditional learning algorithms increase in performance as you give them more data, but level off very quickly, and small neural networks also level off quickly. So what people did was feed in data up to about this point, because adding more data wouldn't make the model much better, and they would stop there. But we've since found neural network methods that let the learning scale much better: as we throw more data at the problem, the models get quite a bit more sophisticated and perform quite a bit better. So we've been able to create large, deep neural networks that continually get better as we give them more and more data. That comes with other problems, which I'll talk about in a second, but essentially these medium and large neural networks have become possible recently.

Here's an example: this is the Inception model, also known as GoogLeNet, which was trained on ImageNet to label images. You can think of each one of these boxes as a matrix multiplication or some other operation on a matrix; the data goes through many layers and eventually gives you an output tensor that tells you the labels. This is what we mean by deep neural networks: networks with many, many layers before they give you the output. By adding these layers we can solve more and more complicated problems and get pretty good results with them.

But this gives us a problem. Imagine that each of these boxes is a matrix multiplication, and the tensors might come from a large image, a megabyte or so, converted into a tensor. You can imagine how many operations you have to do for even a single prediction, and to train the network you have to do that many, many times over. So people use GPUs, which are very good at highly parallel computation, but you can still end up waiting weeks or sometimes even months for the results of one single training run. So people have started using supercomputers to train models faster, but that's still a problem, because not everybody has access to a supercomputer. How many of you have access to a supercomputer? Somebody does; that's the most I've ever seen, like three or four, I think. How much do you pay for that, by the way? Supercomputers are something you have to lease time on; they're like the mainframes of old.
You had to lease time, say from 7:30 to 8:30 in the middle of the night, and pay tons of money for it. So supercomputers are not exactly the easiest or best approach, and the ideal is for everybody to be able to do machine learning. What you need is distributed training, and at Google we've been able to do that. We use it for a lot of practical applications, things like Google Photos and detecting text in Street View images, so there's a lot of exciting work going on. These breakthroughs have recently enabled a lot of activity at Google: here's the number of projects internally at Google that use machine learning, measured as the number of directories containing a model description file, and you can see the hockey-stick growth over the last few years. And by distributing the training, we've been able to go much, much faster.

OK, so now I'd like to talk about TensorFlow itself. TensorFlow is an open-source, general-purpose machine learning library, particularly for building neural networks, though we're also expanding it to encompass other types of machine learning. It was open sourced in November 2015, and it's used internally at Google for a lot of our projects. It supports flexible and intuitive model construction, it automates a lot of the work, and it supports training on CPUs, GPUs, and so on. One of the nice things is that you define these networks in Python.

Before I dive into what TensorFlow looks like, some core concepts. First, you have a graph. The name TensorFlow comes from the idea of taking tensors and having them flow through a directed dataflow graph. The graph is a representation of the operations, the actual nodes, and the tensors are the data that passes through the network. Then there are other structures: constants, which are values that don't change; placeholders, which are the inputs into your network; and variables, which are values that can change during training, so these are what you usually use for your weights and biases. Finally, a session encapsulates the connection between TensorFlow's core and the models you define.

I should mention that TensorFlow is built on the same concepts as many other scientific libraries: there's a Python interface, or API, on top of a C++ core that does the very fast operations, so when you're actually doing training, you're not going through the Python VM. And here's a non-exhaustive list of the operations you can do with TensorFlow: math operations like addition, subtraction, multiplication, and division of tensors; matrix operations; stateful operations; and so on.
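Here's what those core concepts look like in a few lines, using the TensorFlow 1.x-era API from around the time of this talk:

```python
import tensorflow as tf

a = tf.constant(2.0)            # constant: never changes
x = tf.placeholder(tf.float32)  # placeholder: an input we feed in later
w = tf.Variable(0.5)            # variable: can be updated during training

y = a * x + w                   # this only builds the graph; nothing runs yet

with tf.Session() as sess:      # session: the connection to the C++ core
    # (newer 1.x releases renamed this to tf.global_variables_initializer())
    sess.run(tf.initialize_all_variables())
    print(sess.run(y, feed_dict={x: 3.0}))   # 6.5
```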
So let's look at what this actually looks like; I'm going to run through a Jupyter notebook. How many people have heard of Jupyter, use Jupyter, etc.? OK, and how many have been asked that more than five times at this conference? Yeah, that's what I thought. So I'll just assume you know Jupyter and go from there. Let me restart this kernel. Yes, this is a Python 2 notebook; TensorFlow also supports Python 3, if I remember right, but this particular example is Python 2.

TensorFlow is pretty easy to get started with. This is the MNIST example; Mr. Rashid was talking about MNIST earlier today. It's a collection of images of handwritten digits, and you do OCR on them to determine which number is present in each image. The training set has 55,000 images in one big, long array, and each image has 784 pixels. They're monochrome, just black and white, so if you look at the shape of the data, it's a 55,000-element array where each element has 784 pixel values. Each of those values is a number from zero to one indicating how dark that particular pixel is: 0.23 is a light gray, all the way up to one. So that's what the data looks like; that's how we've represented it here. If you had a color image you'd need to represent it a bit differently, but this is how we're doing it in this case. And here, plotted with matplotlib, is one of the input images as an example.

So that's the training data, and then we have training labels associated with each image. Each label is an array, a vector, of size 10, with zeros everywhere and a one in the position that indicates the digit for that particular image. For this image of an 8, if we look at the shape of the training labels, each one is size 10, and the one is in the column for 8; the positions run from 0 to 9, with the first column representing 0. These are called one-hot vectors: zero in every position except for one. They're often used for training labels; the data you get out of the network will look similar, except it will be a set of values from 0 to 1, essentially probabilities.

Here are some of the images I showed earlier. Once you train the network, you can visualize the learned weights and biases, where individual pixels indicate whether an image is a particular number. In this case we're using a very simple neural network with just one layer, which works this way: if an image has pixels in these blue areas, it's probably a zero, and if it has pixels in the red area, it's probably not a zero; the network aggregates those probabilities to decide whether the image is a zero or not.
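For reference, the data loading at the top of the notebook looks roughly like this, per the standard TensorFlow MNIST tutorial (the "MNIST_data/" path is just an example location):

```python
from tensorflow.examples.tutorials.mnist import input_data

# 55,000 training images, each flattened to 784 grayscale values in [0, 1],
# with one-hot labels of size 10
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape)   # (55000, 784)
print(mnist.train.labels.shape)   # (55000, 10)
print(mnist.train.labels[0])      # e.g. [0 0 0 0 0 0 0 0 1 0] would mean an 8
```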
You can see the same pattern for many of the other digits: for a two, the positive pixels are in this area, and for a three, in this area, so the weight maps look similar to the digits the network is looking for. Let me execute all of these cells. OK, good.

The next cell is where we define our network. We import TensorFlow and create the placeholder I talked about earlier; this is the input into the neural network, and it has a size of 784, the number of pixels. Then we have the weights and biases as variables, which can be updated as we train the model. And here is where we define the network itself. This is just a single-layer network, and you can define it in Python very much the way you would write it in mathematics: we do a matrix multiplication of the input times the weights, add the bias variable, and apply a softmax at the end. TensorFlow takes this and internally builds our dataflow representation of the model.

Once we have that, we can use it to train the model. We have a placeholder for the expected output, y prime, we define a cross-entropy function, which is our loss function, and then we hand it to a gradient descent optimizer to minimize. That creates our training step, which encompasses the entire neural network plus the training we need to do. As some of the other talks explained, gradient descent is a way of nudging the neural network in the direction we want. One of the talks described it as walking down a mountain with a single small flashlight, or torch, going a little bit at a time: you keep moving toward a minimum in order to minimize the loss. The altitude here is the error produced by the loss function, and we nudge the parameters in the direction where the loss decreases, over and over again; each of these steps is a training epoch as we go down with the gradient descent optimizer.

Here we're going to train a thousand times. What's nice is that we can do mini-batch training, where you pick just a small subset of the total training data each time. We're not training on every single piece of the training data on every step; we're training on a randomly selected batch, in this case of 100 elements, I think, and doing that a thousand times. This is taking longer than it usually does. The advantage is that you don't have to train on the entire dataset each step: you pick a randomized subset, which is essentially what you do in a statistical survey, where you ask a subset of people something and get a representative sample of the population.
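Put together, those notebook cells amount to roughly this, in the style of the classic TensorFlow MNIST tutorial (assuming `mnist` was loaded as in the earlier snippet):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])        # input: 784 pixels per image
W = tf.Variable(tf.zeros([784, 10]))               # weights
b = tf.Variable(tf.zeros([10]))                    # biases
y = tf.nn.softmax(tf.matmul(x, W) + b)             # the single-layer network

y_ = tf.placeholder(tf.float32, [None, 10])        # expected one-hot output
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))     # loss function
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(1000):                          # 1,000 training steps...
        batch_xs, batch_ys = mnist.train.next_batch(100)   # ...on mini-batches of 100
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```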
OK, and this has actually finished, by the way. So I've run through the training, and at the end we can check the accuracy of our neural network: in this case we got about 90%, which is pretty bad, but this is a very simple one-layer network. So that's how you can use TensorFlow: you create these steps and run through them, but all of the actual computation is done under the hood in the C++ core, and it's mapped to devices. If you have GPUs or CPUs available, TensorFlow will map the operations onto those particular devices; in this case I'm running on, I think, a 32-core machine, so it will map the work across that. I'll talk a little more about that later, but let me go back, if I can find the back button; it's not a back button, it's a whole new thing.

OK, now let's look at a slightly more complicated example where we can get better accuracy. We're training on the exact same data as before, but this time we're going to build what's called a convolutional network, which was also talked about a little earlier. A convolutional network looks at the image in parts and picks out specific features from each part. That helps with things like translation: the way I had it earlier, pixels in a certain location indicated which number it was, but if you wrote the zero shifted over a few pixels, that network wouldn't be good at figuring out that it's the same zero just moved over. Looking at the image convolutionally helps a lot with that kind of thing.

In this case we initialize the weights and biases a little differently; I think the earlier one picked random weights to begin with. And here's the convolution part. Essentially we go over the image with what are called kernels, and build another tensor that has a value for each position of the kernel over the image, and then we can operate on those. So this is picking out features of each individual part of the image, rather than looking at the image as a whole. After that we use what's called pooling. The most common example is max pooling, where you take the values from one part of the tensor and keep only the maximum value. That also gives you a compact representation of a particular part of the image. You put that together into a layer, and then you can create several layers like that.
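Those two building blocks look like this in the TensorFlow MNIST tutorial style; the stride and kernel-size parameters here are the tutorial's usual choices rather than anything specific to this notebook:

```python
import tensorflow as tf

def conv2d(x, W):
    # slide a small kernel over the image, producing one feature value per position
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # max pooling: keep only the largest value in each 2x2 patch,
    # a compact summary of that part of the image
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```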
So here we build our first convolutional layer, creating the weights and biases and then building the layer itself, and the second convolutional layer takes the outputs from the previous layer and does basically the same thing. At the end we create a densely connected layer. As some of the previous talks explained, a convolutional layer is not 100% connected between values, because it uses a kernel translated over the image, but the final layer is densely connected, which lets us do essentially the same thing we did in our earlier single-layer network, just without the convolutional part, and that gives us much better output. I'm not going to really talk about dropout. Then we have the readout layer, which just does a softmax on the outputs of the last layer. And then you can train and execute the model. In this particular one we use the same cross-entropy loss, but we use an Adam optimizer instead of the plain gradient descent optimizer, and with those optimizations you get much better performance.

We do a lot more training on this one because it's a deeper network, and it scales much better. If we kept training the previous network more and more, it probably wouldn't get much better than 90%, but in this case we can train it quite a lot more and keep improving the accuracy. So we train about 20,000 times on mini-batches of 50. I ran this ahead of time because it takes about five minutes, but you can see from the output that we get about 99.2% accuracy, which is a good deal better than 90%: instead of one in 10, around one in 100 is classified incorrectly. So you can go from very simple networks to much more complex ones.
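Here's a sketch of the training and evaluation step for the deeper network, assuming `y_conv`, `keep_prob`, `x`, `y_`, `sess`, and `mnist` are defined by the notebook cells described above (this follows the standard tutorial; the exact names may differ in the notebook):

```python
import tensorflow as tf

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)   # Adam instead of plain gradient descent

correct = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))   # does the top prediction match the label?
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

for i in range(20000):                                        # 20,000 steps, mini-batches of 50
    batch = mnist.train.next_batch(50)
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels, keep_prob: 1.0}))  # ~0.992
```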
Let me go back to my slides. One of the other things you can do, because TensorFlow has this internal representation of how the graph fits together, is write log output files as you're training, and those can then be read by an application called TensorBoard. We were obviously very original with the names: TensorFlow, TensorBoard, you get the idea. What's really cool about this, let me make it bigger, there it is, is that you can look at things like the accuracy and the values of the loss functions in these graphs as you're training, to see how your network is performing. In this case we're seeing the accuracy as we train; I think this is data from the simple version, so we get up to about 90 percent pretty quickly but don't get much better as we keep training.

You can also look at the actual loss functions: this is the cross-entropy value, which goes down and down and down, so it should be close to the inverse of the accuracy. And you can look at many of the other values, which correspond to the variables, the individual parts of your model. Here cross_entropy was an actual Python object that was defined, and you get this log output data for it; things like the max, mean, and min are all part of that. These here are input images that you can look at. But one of the other cool things is that you can look at a graph of the model itself that you're building. Here we have a two-layer network, and we can zoom in and look at the individual pieces: the weights and biases for individual parts of the network, the dropout values, the loss function, and so on. From the Python code we wrote, this gives you a full graph representation of the network. For something very complicated, like the Inception model I showed you earlier, you'd see a huge graph generated from it. This is really cool because it helps you visualize the neural network you've defined.
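Writing those log files looks roughly like this, assuming the MNIST graph from earlier; note the summary function names changed across TensorFlow versions (older releases used `tf.scalar_summary` and `tf.train.SummaryWriter` instead):

```python
import tensorflow as tf

tf.summary.scalar('cross_entropy', cross_entropy)   # track the loss over time
tf.summary.scalar('accuracy', accuracy)
merged = tf.summary.merge_all()

# also records the graph itself, for the graph visualization view
writer = tf.summary.FileWriter('/tmp/mnist_logs', sess.graph)

for i in range(1000):
    batch = mnist.train.next_batch(100)
    summary, _ = sess.run([merged, train_step], feed_dict={x: batch[0], y_: batch[1]})
    writer.add_summary(summary, i)      # then view with: tensorboard --logdir=/tmp/mnist_logs
```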
Let's go back here. I have about 10 minutes or so left, right? Yeah, we'll get there. So the main difference between TensorFlow and many of the other libraries out there is that TensorFlow was built from day one with distributed training in mind. TensorFlow is built so that you can productionize, so that you can do practical work with the library and your networks. We want to be able to train things faster, and we want to take advantage of the hardware breakthroughs and improvements made in the past to train models faster.

TensorFlow supports multiple types of parallelism. Model parallelism means breaking up the model: each machine takes a different part of the model, you feed the data through, and you split the work that way. It also supports what's called data parallelism, which was supposed to be on one of these slides but disappeared; data parallelism is the opposite, where you break up the data instead, but each machine has a full copy of the model. You take, say, records 1 through 100 and send them to one machine, records 101 through 200 to a different machine, and divide the work that way. There are a number of trade-offs between these, for example whether you do full-graph or subgraph model parallelism, or synchronous versus asynchronous data parallelism; there are pluses and minuses to each, and there's no silver bullet. But I do know that at Google we use data parallelism pretty much exclusively.

So data parallelism is this one. You split up the data, and each of these replicas has the entire model. Once a replica has done some training, it passes the results to the parameter server, which is the thing that holds all the weights and biases, and as those are updated, it pushes them back to the model replicas. There are asynchronous and synchronous versions of data parallelism, where you update the weights and biases in parallel, or you update them synchronously once per iteration. Asynchronous is much faster, but can add some noise to your model, because the parameters can be changed midway through a run; in the synchronous version you split up the data and wait until all the replicas have finished a particular epoch before going on to the next one, which reduces the noise but makes the training a bit slower. Here's an example of how that runs with TensorFlow: you have a bunch of workers doing the parallel training, some parameter servers, and between the servers they use gRPC to communicate.
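For a flavor of how that worker and parameter-server setup is described in TensorFlow, here's a sketch using the distributed-TensorFlow API; the hostnames are made up, and a real job would also partition the training loop per worker:

```python
import tensorflow as tf

# Parameter servers hold the weights and biases; workers run the training
# replicas; everything communicates over gRPC.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],
})

# each process starts a server for its own role in the cluster
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# variables are pinned to the parameter server; each worker computes gradients
# on its own slice of the data and sends updates back
with tf.device("/job:ps/task:0"):
    W = tf.Variable(tf.zeros([784, 10]))
```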
So why is this data parallelism important? Let's say that instead of getting a cat out of our network, we got a dog, and we want to improve our network, to make it better. Where do we fix it? What do we change to make it actually better? I don't know; maybe this is a good idea. So we make our tweak and run it again, and we think, OK, this is right, it's running on my GPU, it's going fine, and then the result comes out and it didn't make things better, and you think, oh crap, what do I do now? You have to go back and start over again. To run these kinds of experiments, you want to be able to run them over and over again, very quickly; you don't want to wait a week just to find out whether your tweak helped. This is a problem even for people who are experts in machine learning: you have experience and literature you can use to narrow down what you might want to tweak, but in the end you have to run it and test whether the tweak you made improved things or not, and that takes time. That's why being able to do distributed training is so important.

But one of the problems is that as you scale up the number of nodes, the number of connections between your parameter servers and your workers grows very quickly, and this doesn't scale: you bottleneck on the network, because these machines are talking over TCP, and you get on the order of milliseconds of latency between them. So you essentially need a dedicated network, and a lot of people use dedicated hardware networking like InfiniBand to make this go faster, but this is a really big problem at the moment. One of the things we did at Google, and we're releasing this as Cloud ML, is that internally we do our distributed training over a dedicated network that doesn't use TCP/IP; it skips the whole TCP/IP stack, so the communication between machines runs on the order of nanoseconds or microseconds instead of milliseconds. We're planning on making this public as Cloud ML, which will let you run TensorFlow graphs inside Google data centers. We're also planning on exposing, as part of the API, the dedicated hardware we use: instead of GPUs, you can use what are called Tensor Processing Units, I think is what they're calling them, which are dedicated hardware for doing tensor operations. So we'll be able to expose those to other people, so they can use that kind of dedicated hardware to run more experiments and things like that.

I think that's all I had, so I want to thank you for coming and spending the last hour here. How many of you are still awake? Raise your hand. About 70% of you. Thanks a lot for coming. Definitely check out the tensorflow.org website; there are tons of really good examples. If you go here, and then up here, there are tutorials and documentation. It's really, really good, with lots of examples of how to use TensorFlow; there are different tutorials for different levels of people, as well as material on using TensorFlow Serving to move toward productionizing your models. So thanks a lot for coming, and I hope you enjoyed the presentation.

We have about two minutes, so two-ish questions: your question has to be about 15 seconds, and then maybe 30 seconds for me to answer.

The first question: is there anything like profiling for these kinds of models, to get an overview of how many multiplications and how many parameters each block of a flow graph needs?

So you're talking about the actual time it took to run? I don't know offhand whether TensorBoard gives you that; I think it probably should. But I think that could be something you visualize with TensorBoard: you log it as a value, view it there, and see how each part of your graph performed, things like that.

Other question; we've got one right there behind you. The previous talks today mentioned that you typically have to do some feature extraction before you can apply neural networks. Will TensorFlow help me speed up my more manually designed feature extraction, or is it designed only to do neural network stuff?

At the moment it's mostly geared toward neural networks. Obviously you can do feature extraction using a separate neural network, so you could have one network that does feature extraction and another that does the actual classification
and so on. But there is some work going on; there's something called, I forget exactly, TensorFlow Wide or something like that, where instead of deep neural networks the idea is to support more standard types of machine learning algorithms. So there is work going on to incorporate more standard machine learning algorithms, so you can do that sort of feature extraction beforehand; that's ongoing work. You might try searching for TensorFlow Wide. I haven't played with it personally, so I can't really give you details. Yeah, thanks a lot.