All right, I think we're going to go ahead and get started here. Matt, did you want to say anything to kick this off, or should I just jump right in? OK, great.

So this is the session on running distributed TensorFlow on DC/OS. If you were at the keynote this morning, you heard Erik talk about how, just yesterday, we released a package into the DC/OS catalog for running distributed TensorFlow. What I'm going to do today is go into the details of what problems this package solves for people trying to run and deploy TensorFlow in production, and how we actually built the package to solve those problems.

My name is Kevin Klues. I work at Mesosphere, where I lead a team called the DC/OS Cluster Operations team. In terms of contributions I've made to Mesos in the past: when I first joined Mesosphere, I added the initial GPU support to Mesos. From there, I worked on some of the pod support, or what we call task groups; specifically, I worked on the containerization pieces that enable that support inside Mesos. And then I moved on to work on some of the attach/exec support. So if you've ever used docker exec to jump inside a container and run a process there, I added similar support to Mesos and pushed it up the rest of the stack. That's my link into Mesos, and the GPU support is what launched me into working on TensorFlow and into the question of how we use the GPUs, now that we have them in Mesos, to do more useful stuff on top.

So what I want to start off with is a really quick intro to TensorFlow, for those of you who don't know it. This is the same slide that Erik showed this morning, with the quote from tensorflow.org that says it's "an open source software library for machine intelligence." That's what TensorFlow purports to be. So what exactly is machine intelligence? Machine intelligence is a broad term used to describe techniques that allow computers to learn by analyzing very large data sets using artificial neural networks.

If that's what machine intelligence is, this is a representation of one of the neural networks you might build to do that kind of analysis. The typical flow for one of these networks is that you have some large data set that you want to flow through the network to produce some output at the end. One of the things that makes deep learning a little different from traditional machine learning is that you have lots and lots of layers that you're passing your images through. So in this example, you might feed a whole bunch of faces into the network. You have some layers detecting patterns of local contrast in those faces; you pass those on to the next layer and see if some facial features pop out. And at the final layer, you're actually able to say: here are some faces. Now, whenever new images come in, I can recognize whether or not they contain a face, with some level of certainty.

OK, so the second part of this definition is that TensorFlow is a software library. If that's what machine intelligence is, what does TensorFlow really give you? It gives you a library that makes it easy for developers to construct these artificial neural networks to analyze the data they're interested in. The picture here shows the idea: I've got some Python application I want to run, and I can import this TensorFlow library.
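To make that concrete, here is a minimal sketch of what using the library looks like, in the TensorFlow 1.x style that was current at the time of this talk; the values and device string are purely illustrative:

```python
import tensorflow as tf

# Building this graph computes nothing yet; TensorFlow executes the
# dataflow graph only when you run it in a session.
a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0], [4.0]])

# Ops can be pinned to specific hardware; left unpinned, TensorFlow
# places them on the best available device (CPU, GPU, or TPU) itself.
with tf.device("/cpu:0"):
    product = tf.matmul(a, b)  # dispatched to a matmul compute kernel

with tf.Session() as sess:
    print(sess.run(product))  # [[11.]]
```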
The library is going to take care of all the low-level details for you: how do I actually execute some dataflow graph? What compute kernels do I want to have? How do I handle networking so that I can ship distributed jobs between the different machines in my cluster? And if GPUs or CPUs or even TPUs are available, how do I make sure I automatically schedule jobs on the right piece of hardware, to take advantage of what exists underneath?

OK, and the last piece of this is that it's open source, which is great for us, because we're able to take it, leverage it, change it, and do whatever we want with it. It was open sourced by Google back in November 2015.

OK, so the second part of this: we're trying to take TensorFlow and run it on top of DC/OS. I think most of you by this point probably know what DC/OS is, but for those of you who don't, it's an open source distributed operating system. Its main goal is to take Mesos and build on it with additional services and functionality. The way I usually like to think about it is that what it gives you on top of Mesos is built-in support for service discovery, load balancing, security, and ease of installation; some extra tooling, including a comprehensive CLI and a GUI on top of that; and built-in frameworks for launching long-running services. The canonical scheduler we use for this is Marathon, and we now have Kubernetes support as well. We also have batch job support in the form of Metronome or Chronos. In addition, we have a repository, or app store, for installing other common packages and frameworks; this is what Erik was referring to earlier as the catalog. For a while now we've had Spark, Kafka, Cassandra, and some other frameworks in there, and now we're adding TensorFlow as well.

So, a very high-level picture of this: imagine you've got some GPU-enabled hardware underneath, running in the cloud or on-premise. You've got Mesos sitting on top of that, plus your batch scheduler, your Marathon scheduler, and any other frameworks you want. All of this packaged together is what we call DC/OS, so you can leverage all of those things in a nice, coherent way. And what we're adding now is integrated support for running TensorFlow on top of this infrastructure.

OK, so a quick overview of the talk. First, I'm going to go through a really quick deep learning primer for those of you who don't have a background in it: just enough to know what types of problems you can actually solve with TensorFlow and why you would use this package. Then I'm going to very quickly set up my demo, because it takes a while to train these models; I want to get some models training, jump back into the rest of the presentation, and then look back at the demo later to see how they're doing. Then we'll go through a typical developer workflow for TensorFlow as it looks today, both in a single-node environment and in a distributed setting, which is the problem we're trying to solve here. I'll then talk through the existing challenges in running distributed TensorFlow today, and how running TensorFlow on DC/OS helps solve a lot of those problems. And then, as I said, I'll jump back to the demo and we can analyze how it's been performing.
And then I'll talk about some of the next steps for where we're going with this TensorFlow package we've built.

OK, so, real quick, the deep learning primer. For those of you familiar with traditional machine learning, the process you usually go through to train a model is this: you've got some input; let's say we have pictures of dogs that we want to analyze. Traditionally, the first step would be to have a human look at the dog and ask: OK, what are all the different features here? There are ears, a nose, eyes, paws, a tail. The human would have to mark all of those things and make them available to the model, so that the model could look at pictures of dogs, see if it could find those features, use the neural network to classify whether it recognizes a dog given those features, and then output: is this a dog or not?

What deep learning lets you do is take the human out of the equation. Now, instead of marking all the different features of a dog, all you have to do is say: here's a picture, and here's a little box around the dog. The neural network will take that, figure out on its own what features make up a dog, and, beyond that, start to recognize what a dog is, so that once you get to the phase of actually inferring what's in new pictures, it can tell you whether something is a dog or not.

Diving a little deeper: if you think about the phases a deep learning model goes through, there's always a training phase and an inference phase. The training phase is the one that takes hours, days, or weeks. You have this model, and you want it to recognize dogs, in this case, so you feed it a whole bunch of pictures of dogs and you label them, saying: this object in this picture is a dog. After the neural network has seen lots and lots of samples of what a dog looks like, it eventually trains itself to recognize one, and out pops a trained model. From there, you can use that model to infer whether a new picture contains a dog, and that process is nearly instantaneous: you hit some endpoint, the input walks through the network really quickly, and it tells you with some amount of certainty that this picture is 97% a dog, or maybe a panda in some very small number of cases. The advantage is that you can train your model offline for a very long time, then hand it to someone else and say: this is now a trained neural network that can recognize dogs. They can put it in their software, and it will run very quickly.
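To illustrate that split between slow offline training and fast inference, here is a rough TF 1.x-style sketch of the inference side; the checkpoint path and tensor names are hypothetical, and the point is just that inference is a single cheap forward pass through an already-trained graph:

```python
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Restore a graph that someone else spent days or weeks training.
    saver = tf.train.import_meta_graph("trained_model.meta")  # hypothetical path
    saver.restore(sess, "trained_model")

    # Look up the input and output tensors by (assumed) name.
    images = sess.graph.get_tensor_by_name("images:0")
    probs = sess.graph.get_tensor_by_name("probabilities:0")

    # Inference: one forward pass, milliseconds instead of weeks.
    p = sess.run(probs, feed_dict={images: np.zeros((1, 299, 299, 3))})
    print("P(dog) = %.2f" % p[0][0])
```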
Okay, so just to recap real quick what these different layers do: instead of a human having to extract the features, the deep neural network does it itself. At one layer, it automatically detects pieces of what might end up becoming a dog's snout; at the next layer, those form into something like "I recognize this as a common feature that makes up part of a dog's face"; and at the final layer, it's able to say: great, this is actually what a dog looks like. Given lots of different samples, it's able over time to extrapolate the abstract idea internally. Who knows exactly what the computer is "thinking," but it figures out what a dog looks like in terms of its own neural network.

Okay, so for the demo itself, what I'm going to do is train the Inception V3 image classification model on the CIFAR-10 data set. For those of you not familiar with those: Inception V3 is an open source image recognition model; it's something I just took off the internet, and if you train it with enough data, it will eventually be able to recognize a wide range of images. The image I show down below is actually a snapshot of the representation of the real Inception V3 model, not just a mock-up. The CIFAR-10 data set is a well-known data set of 60,000 low-resolution images across 10 classes of objects: trucks, planes, ships, birds, cats, et cetera. So we're going to train this model to recognize all of those different types of objects.

What I'm going to do for this demo is set it up in two different ways, running two different TensorFlow jobs. One is a non-distributed job running on a single worker: I feed the input into that one worker and have it try to output a trained model. Then I do the same thing as a distributed job, where I have several workers, some running on CPUs and some on GPUs. In this setup, I have a one-master, eight-agent cluster, where each agent has four Tesla K80 GPUs, eight CPUs, and 32 gigabytes of memory, and I'm running it all on Google Compute Engine.

Once the demo is going (we won't look at this part right now), we'll be able to connect a visualization tool called TensorBoard to the data output by the training. The training won't be done, because this is a serious model that takes weeks to run, but we'll be able to watch in real time how the two jobs are progressing alongside each other: the single-node version and the distributed version, side by side. And just to reiterate: this is a serious model; it will take potentially over a week to fully train, even on a cluster of expensive machines. The goal here is simply to demonstrate how easy it is to deploy and monitor these large TensorFlow jobs on DC/OS.
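Incidentally, if you want a feel for the CIFAR-10 data set I just described, a TensorFlow build that bundles Keras can pull it down in a couple of lines; this is just to show the shape of the data, not part of the demo:

```python
import tensorflow as tf

# CIFAR-10: 60,000 low-resolution 32x32 colour images in 10 classes,
# split into 50,000 training images and 10,000 test images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3)
print(y_train.shape)  # (50000, 1): integer labels 0..9 (plane, car, bird, ...)
```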
Okay, so with that, I will jump to the demo. I usually like to record my demos, and I didn't this time, so I'm really hoping everything works as expected. I have a script here to give you a quick rundown of everything I'm going to do, and the first thing is to show you that I actually have this cluster up and running.

So I have a terminal already open here, and I have a command-line tool that I wrote called dcos-gcp. It's not something that's available for everyone to use, but it's something I use myself to get clusters up and running on GCE hardware. You can see that in this one I'm running a stable version of DC/OS, which is DC/OS 1.10. I've got a single master and, sorry, eight agents. The status of the cluster is currently healthy, and this is my leading master IP.

From there, I want to show that we now have this TensorFlow package inside the DC/OS catalog. If I come back here, go to the catalog, and start typing "TensorFlow," you'll see there is a beta TensorFlow package available; this went live yesterday. So if any of you go spin up a DC/OS 1.10 cluster now, this should be available for you to use.

From there, what I would do is clone this repo called dcos-tensorflow-tools. It's got a bunch of examples, including the CIFAR-10 example I'm going to run through. I've obviously already pre-cloned it, so I'll cd into it and show you the two examples I'm going to run: the CIFAR single example and the CIFAR multiple example. If I open these two up, we can see what an actual package definition of something you might want to launch with DC/OS TensorFlow looks like. One way to do this would be to click on the package, go to configure, and manually specify a whole bunch of parameters in the UI, but typically the simpler deployment strategy is to define a JSON file that fills in all those parameters; then you can use the command line to deploy it. That's what I'm going to do here.

So we can see there's a URL to a zip file that contains all the artifacts we need to run this application on any individual node in the cluster. There's a path to the actual binary I want to execute once that zip file has been downloaded onto a machine. I've got a name for the job I want to run. There's some job context, which is some variable setup inside the job: basically, parameters that I can see in TensorBoard and that I could potentially change on the fly to optimize how the model runs over time. And then there are a bunch of things pointing it at a shared file system where I want my output to eventually land, including the data for the model being trained and so on.

Where the real meat of this comes in is that there are sections for specifying how many GPU workers I want, how many workers I want (which are just normal CPU workers), and how many parameter servers I want. So I can quickly specify exactly what I want the configuration of this cluster to be, in terms of parameter servers and workers, and DC/OS takes care of stitching it all together and getting it running for me. And if I quickly show you the CIFAR multiple one, the only real difference is that the name has changed and it now runs on three GPU workers and two parameter servers. Make sense? Okay.
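To give a feel for the shape of such a file, here is a sketch of an options definition written as a Python dict; the key names are illustrative rather than the package's exact schema, so check the examples in the dcos-tensorflow-tools repo for the authoritative format:

```python
# Sketch of a DC/OS TensorFlow options file (illustrative keys only).
options = {
    "service": {"name": "cifar10-multi"},
    "tensorflow": {
        "job_url": "https://example.com/cifar10.zip",   # zip with all artifacts
        "job_path": "cifar10",                          # entry point in the zip
        "job_context": {                                # tunable hyperparameters
            "learning_rate": 0.1,
            "batch_size": 128,
        },
        "shared_filesystem": "gs://my-bucket/mesoscon", # checkpoints, summaries
        # The meat: cluster shape, stitched together by DC/OS for you.
        "gpu_workers": 3,
        "workers": 0,            # plain CPU workers
        "parameter_servers": 2,
    },
}
```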
All right, so with that, the next thing I want to do is quickly show you the storage bucket I mentioned, where all of this output is going to be dropped. I ran this a little earlier today, so there's a folder in here that was produced, called mesoscon. I'm going to go ahead and delete it, to show that once this actually runs, it gets created again with brand new data populated inside. I also want to quickly show that at the moment there are absolutely no CPUs or GPUs allocated to anything in the cluster; once I launch these jobs, we'll see resources get allocated to them as they start executing. Right now I've got 64 CPUs free, 32 GPUs free, and so on.

So if I now take this "dcos package install beta-tensorflow" command and run it from the command line with that CIFAR single JSON configuration I showed you before, I say yes, I accept that it's a beta version, and it goes through and executes. If I start up the second one, there we go, agree to that, then jump back to the dashboard and go to Services, I should see both of these coming up and running soon. They're both in the deployment phase. Once they actually start running, they'll have a whole bunch of jobs underneath them, because the TensorFlow package is built with our SDK: it's not just a single job that's launched, it's a collection of jobs, where the first job that comes up is the scheduler, and then eventually a bunch of other jobs pop up underneath it. So I'm just going to leave this running for now, and we'll come back at the end of the talk to see what progress these jobs are making. Actually, we can see the CIFAR single one has already started and is healthy. Okay, and so has the other one.

All right, so, jumping back to the talk. What I wanted to go through next, as I mentioned, is what the typical developer workflow looks like for someone working with TensorFlow, first in a single-node environment and then in a distributed environment, to show you some of the challenges of actually getting something running in a distributed setting.

The first thing someone would do if they wanted to work with TensorFlow is download and install the Python TensorFlow library; you need that as the very first step before you can get anything going. Then you design your model in terms of TensorFlow's basic machine learning primitives and make sure you can use those primitives to do the job you have at hand. Then you write your code, optimize it for single-node performance, train on your data on that single node, and out pops a trained model at the end. Pretty straightforward, in relative terms.

Now, if you want to move this to a distributed setting, the first three steps are very similar: you download the TensorFlow library, you design your model in terms of those primitives, and you write your code, now optimized for distributed computation rather than single-node computation. But where the real challenge comes in is that you now have all these extra steps to get it up and running. You have to provision a set of machines to run your computation. You have to install TensorFlow on them. You have to write the code that maps each of your distributed computations to the exact IP address of the machine where it will be performed.
You have to deploy your code on every machine, and only once you've done all of that can you start training on the cluster and eventually output a trained model. This whole process can be tedious and error-prone, especially if you want to iterate on different optimizations in your code over time. So what DC/OS does is automate all of this for you. It gives you a workflow very similar to the single-node setup, but now in a distributed setting.

So what are some of these challenges, in a little more detail? The first, which I hinted at a minute ago, is that there's a cluster spec you have to specify at the top of any distributed TensorFlow application you write, and it's incredibly tedious to keep it in sync with whatever machines you've created and deployed. Users basically need to rewrite this code for every job they want to run in a distributed setting. You provision your machines, you come back to this cluster spec, you specify the IPs and ports of all the workers you've provisioned, you do the same for all your parameter servers, and you even have to do this for any code you inherit from standard models. So if you take some code off the internet and want to run it in a distributed setting, you have to go into the code, create this cluster spec, and stitch everything together before you can run it.

The next thing is that dealing with failures is typically not graceful in a traditional TensorFlow setup, unless you do a lot of work yourself. Users need to stop training, change their hard-coded cluster spec, and manually restart any jobs if something goes wrong. There's no easy way to just tear it all down and deploy something new if you find a bug in what you're working on.

Next, manually configuring each of the nodes you want to run your TensorFlow job on can take a long time and is typically very error-prone. One example is setting up access to a shared file system so you can checkpoint the summaries and output data you want back from your trained model; that requires authenticating onto every node, somehow passing credentials around, and so on. On top of that, if you ever want to tweak any parameters to make your model run in a more optimized fashion, you have to re-upload your code to every node and redeploy everything.

How does DC/OS fix this? Well, instead of having you define one of these cluster specs for everything you do, we have a high-level service definition, which I showed you earlier for the two examples we're running now. Instead of saying "here are the exact machines, and here's which worker or parameter server each one maps to," you just say: this is how many workers I want, and this is how many parameter servers I want. DC/OS and the SDK take care of automatically creating the cluster spec for you and passing it down to the code you're actually going to launch. That's what this next picture shows, and the thing I want to highlight is that what we're really doing here is separating the deployer responsibilities from the developer responsibilities.
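For a sense of what that hand-maintained boilerplate looks like without DC/OS, here is a sketch of the cluster spec in the TF 1.x API; the IPs and ports are illustrative, and every one of them has to match a machine you provisioned yourself:

```python
import tensorflow as tf

# The hard-coded cluster spec at the top of a distributed TF 1.x job.
# Change your machines, and you must come back and edit this by hand.
cluster = tf.train.ClusterSpec({
    "worker": ["10.0.1.10:2222", "10.0.1.11:2222", "10.0.1.12:2222"],
    "ps":     ["10.0.2.10:2222", "10.0.2.11:2222"],
})

# Every process must also be told which entry in the spec it is.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
```

With the DC/OS package, this dictionary is generated for you from the worker and parameter-server counts in the options file, which is exactly the separation of responsibilities I'm talking about.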
The developer can say: I'm going to focus on building my model the best way I possibly can. And it's left up to the person who actually deploys it to specify how many parameter servers and how many workers should execute the model for this particular run. The SDK and DC/OS take care of the mapping for you.

The other thing we do, because we're running with the DC/OS Commons SDK, is cleanly restart any failed tasks and reconnect them to the cluster if something goes wrong. As I mentioned before, if this happens in a traditional setting, you have to, first of all, notice that it happened, and then go in and restart the jobs yourself manually. Here, we restart them for you using the SDK.

The second thing is that for any credentials you want to pass around, you don't have to set up authentication onto the rest of the system yourself. You can use our integrated DC/OS secrets service to store secrets beforehand, and then every service has them available. I actually use this in this example to give my TensorFlow application credentials to write all of its output to the Google storage bucket running external to the cluster. I didn't show you this, but one thing I did when I first set up the cluster was download some service account credentials from Google, put them in a secret, and point my jobs at that secret, so they would know how to pull it down and use it to write their output.

The last thing is that we use a runtime configuration dictionary to quickly tweak any hyperparameters you might want to change between different runs of the same model. So if you run something and realize it isn't running as quickly as you'd like, and you want to optimize something, you can jump in and change those parameters without tearing down and redeploying the cluster from scratch. You tweak them on the fly, and for the ones that matter, the SDK automatically redeploys just the affected instances of the workers, parameter servers, and so on. Make sense?
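As a sketch of how that runtime context might reach your code: the job's entry point receives the server the SDK built from the generated cluster spec, plus this context dictionary. The exact signature and key names below are assumptions for illustration; check the dcos-tensorflow-tools examples for the real interface:

```python
# Hypothetical entry point invoked by the DC/OS TensorFlow package.
def main(server, log_dir, context):
    # 'server' would be the tf.train.Server built from the auto-generated
    # cluster spec; 'context' carries the tunable hyperparameters.
    learning_rate = context.get("learning_rate", 0.1)  # tweakable per run,
    batch_size = context.get("batch_size", 128)        # no full redeploy needed
    print("training with lr=%s, batch=%s, checkpoints in %s"
          % (learning_rate, batch_size, log_dir))
```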
Okay, so that's it for the meat of the talk. Hopefully I've given these jobs that I launched on DC/OS enough time to produce some output, so we can look at it through TensorBoard. If I go back to my UI, I should be able to click through to CIFAR multiple, and I see there are still some nodes staging here. Let's see; it looks like I didn't quite give these enough time to finish deploying and start running. Either that, or they're actually failing, which is not good. There's nothing I can really do to combat this at the moment, so unfortunately I'm not going to be able to show you data from the live demo. This is why I normally record this stuff, right? But what I do have is a backup. Instead of pointing at the data I would have had in this bucket, I have a run of this from earlier today, or last night actually. If I grab the TensorBoard command, which would normally point at all the data that popped out of these jobs, and point it at a klueska-tensorflow-backup bucket instead, I should be able to pull up the data from that earlier run. Okay, so if I do that, I now have a server running on my localhost. I'll go ahead and grab that, come back to the browser, and open it.

What this is going to do is load all of the data from that run, and we should be able to browse through it. Let's see. It actually takes quite a while to load, so we may not see it come up for a bit. So why don't I jump back to the slides and finish up what I have from here, and hopefully we can come back and see what's going on there afterwards. One thing I wanted to do is reiterate what the demo setup was: training this Inception V3 image classification model, running it in the two different settings, and then visualizing it through TensorBoard.

One thing I wanted to touch on, then, is what our next steps for all of this are. What we have today is a single framework. What that means is that you take this framework, built using our SDK, and install it via the standard DC/OS package management tools; the command line you saw me run was "dcos package install beta-tensorflow" and so on. But because of the way that currently works, you have to manually start, stop, and remove the framework from the cluster whenever it completes. That's a little tedious, because you have to monitor it, know when it's done, tear it down, and remove it from the cluster; and if you ever have a second one to run, you have to do the same thing all over again.

The world we want to move to is one where we have a sort of meta-framework that you launch once. It sits there in the cluster, you interact with it through a CLI, and it takes the instances of the frameworks we're able to launch today and launches multiple of them side by side. You can interact with this meta-framework to track the progress of any of the frameworks it has launched, notice when they've completed, and analyze their data. Even after they've completed, the meta-framework will keep some information about how they ran, where their output data was shipped, and so on. And in addition, it will automatically start and stop these frameworks as they complete, and remove them. The flow we want is the ability to say: DC/OS TensorFlow, run this job, I want this many workers and this many parameter servers; you communicate that to the meta-framework, and it just goes off and launches everything. Again, we don't have this today, but it's the world we want to move to over the next few months to a year.

Okay, so special thanks to all the collaborators on this. Sam Pringle was the one who really took control of this and built a lot of the functionality you see here. He was an intern last summer, and he's continued throughout the year, working about ten hours a week on improving it. A bunch of other people were involved as well, especially folks on the SDK team, who were very helpful with any questions we had about getting things up and running in a way that makes sense: not just "let's get it kind of working," but eventually moving towards a certified version of this package over time.

Okay, so, questions and links. You can follow these links; all of these slides are available on sched.com, or whatever the website is for MesosCon. You can navigate to this talk, download the slides, and all of these links will be in there.
The third link is probably the most important if you want to quickly spin up a cluster, install the TensorFlow package, and start running something today. So, yep, I think that's it. I apologize that the demo didn't quite go as planned, but these things happen sometimes, and that's why you record your demos beforehand. All right, thank you.

Oh, yeah, sorry, good point. I mentioned that sometimes TensorBoard takes a while to load up; it's still loading now, but maybe after the talk, if people want to look at it and see what's going on, you can come up here and we can talk. I think we have about 20 more minutes in the session; I ended early because of the problems with the demo. Cool, all right, thanks again. Any questions? Yep, sorry.

Q: How different is this from the TFMesos package that's available on GitHub?

Yeah, so we originally looked at the TFMesos package and thought about using it, or leveraging it and building our stuff around it. But one thing TFMesos doesn't give you that this does is the integrated experience with DC/OS that we wanted. We're really pushing to have most of our services running with the SDK framework because, if you remember something Erik mentioned in his keynote earlier today, when you build frameworks using the SDK, there's a very common pattern for how you manage them, upgrade them, and so on. So we really wanted to make sure we were building in that same ecosystem. Alongside that, I actually worked with the TFMesos people for quite a while to figure out how to get it running on DC/OS. It runs very well on standalone Mesos, but I couldn't quite get it running on DC/OS the way I wanted to. And it wasn't that I just gave up; it was that we had these other reasons to build it with the SDK, so I moved to that instead.

Q: On this topic of removing the engineering layer between the data scientists and the cluster: GPUs are usually quite scarce, right? So can we define quotas on GPUs for particular data scientists, like how many hours of, say, K80s or P100s they can consume?

Yeah, you definitely can in Mesos. I don't think that support has been pushed up through DC/OS yet, but it's definitely something we're looking at, because that is the model we want to move to in the future. We do have some things in place to try to combat this scarce-resource problem specifically with GPUs. One is that a framework has to be GPU-aware in order to do GPU-aware scheduling for any jobs that come in. Marathon happens to be one of these GPU-aware schedulers, which is why it can launch jobs on GPUs, and we also built this TensorFlow framework to be a GPU-aware scheduler so it can do the same. But any other random framework you bring up probably can't schedule things on your GPUs. So it's not perfect, but it at least keeps random jobs from landing on those GPU machines, using up all the other resources, and leaving the GPUs idle because no other work can land there.

Okay, any other questions?

Q: Kevin, I had a quick question. There were a couple of terms you introduced early on in the slides that maybe, for folks like me who are not too familiar with the machine learning world, you could quickly explain. The first was compute kernels, and the second one was TPUs.
So, compute kernels. I'll probably not do a great job of this, because I'm a systems person more than a machine learning person, but compute kernels, as far as I understand, are standard ways of doing computations, typically on matrices, in order to produce some result. There are lots of different types of compute kernels that are very standard and that you can integrate into your machine learning algorithm, or whatever else you're working on. All I was trying to get at was that TensorFlow ships implementations of all of these, so you can leverage them from your code.

Q: Yep, and TPUs?

And TPUs: so, CPU is central processing unit, GPU is graphics processing unit, and I don't know what the T is, but it's a something processing unit. It's a piece of hardware that only exists on Google infrastructure. They built it themselves, and it's kind of a more heavy-duty GPU, I guess, is one way to think about it. They're designed specifically for running TensorFlow-type applications.

Q: Yeah, maybe it stands for TensorFlow processing unit, does it?

Okay, yeah. So the comment was that it's basically a collection of GPUs. Awesome, thank you. Yeah. Any other questions?

Q: These jobs that you showed us, it seems to me they're mainly batch-oriented.

Correct.

Q: So maybe this is not the time to take up this specific issue, but I have a very specific case where I have a lot of vectors that I need to compare with other vectors, and I would like to do this as part of my web frontend. Would it be possible to set up and use TensorFlow as a backend for requests in real time?

Yeah, I'm probably not the right person to answer that, because I'm actually not a TensorFlow expert. I know how the backend works to get these things running, but I don't know the details of what the best usage of TensorFlow itself would be, unfortunately. All right. Any other questions? One in the back.

Q: I think you mentioned that it's currently closed source. Is there a date or something for when it will be available?

There's no date yet, but we are in talks about making it open source. There hasn't been a final decision about when, or even if, that will happen, but the hope is that it will be fairly soon. Yep. Okay. All right. Any other questions? All right, thank you very much, and I hope you enjoyed the talk. Thanks.