Okay, hi everyone. For those of you who don't know me, my name is Sam, and I'm one of the Google Developer Experts for machine learning. A little bit about myself: I work in deep learning, mostly on things related to language and dialogue. I've done multiple startups in the past, both B2B and B2C. Currently I'm working on a startup called Red Dragon with my co-founder Martin, who's also here today. So what does Red Dragon do? We do a number of different things: deep learning consulting and prototyping, plus education and training — we just finished a training in Singapore, and we'll have more trainings coming up later in the year. We're also working on some key products relating to conversational computing: creating natural voices and learning to reason over knowledge bases.

All right, so today's goal is to go through some of the key steps of what it takes to actually make an AI product. One of the things I see is that a lot of people learn how to do MNIST, they learn how to build a basic model, maybe in Jupyter notebooks, but then they don't think about taking that basic model and transforming it into an actual AI product. So I want to discuss some of the real-world challenges, and I'm going to do it with a real-world example that I built about ten days ago. I gave this talk last weekend in Bangkok, and I built this just a couple of days before that, so you'll see how it goes.

Pretty much all of Google's products now contain some form of machine learning or deep learning. One of the classic examples is this slide.
It's actually quite old now — almost two years old — and it shows the number of repositories inside Google that use machine learning, deep learning, or ideally TensorFlow. It's well over six thousand now; I think it's getting close to seven thousand or more. Deep learning is now being used inside Google across all these different products. The reason I show this is that Google is definitely leading the way, but everyone else is following, and you're going to see more and more that most digital products will use machine learning in some way. You've got companies like Netflix making extensive use of this sort of technology, and all sorts of other fields doing the same.

So what I wanted to do today was look at some of the challenges you face when you're building an AI product. The first one may sound a bit silly, but I think it's one of the most important: can it be done? Is it something that's even possible? We get a lot of people asking us, "Oh, can you build something that does X?" — where they might as well be asking, "Can you build something that reads people's minds, predicts the stock market, and does 25 other things in one shot?" So the big thing you want to do is ask: is it possible? And one of the questions you can use to justify that is: has someone else done it before? Just because someone hasn't done it before doesn't mean you should stop, but you should be wary of things that haven't been done before.

Is the data available?
I think this is one of the key questions we're constantly asking clients or people we work with. You find time and time again that what people want to do is actually quite doable — if they had the right data. You often end up telling them, "Your business could collect this data, but you need to go away and collect it for the next six to twelve months, and then come back before we can do something with it." The other question to ask yourself is: can you find or synthesize the data? Especially if you're a startup, you're often not going to have a lot of data at the start, so you'll be looking for ways to synthesize it. The key point here is that this way of looking at things is very different from building a model just to learn about machine learning, and it's also very different from the academic approach of writing a paper for NIPS or something like that.

All right. The way I like to think about it is: can this task be broken down into component parts? One of the biggest myths is that because people learn about end-to-end models, they assume everything in industry is end-to-end. That's usually not the case — usually you have different parts, or different models, doing different things. The other question I like to ask is: what is your problem like? Is it a classification problem? A regression problem? A generative problem? You'd be surprised how often something that looks one way can be refactored or changed into a different kind of problem.
A classic example of this is Smart Reply inside Google. It actually started out as an April Fool's joke in 2009 — Google "made" a product that would just reply to your emails. Then people started to think, well, maybe this could actually be done, and the first academic papers approached it as a sequence-to-sequence generative model: it would read your subject line and the text of your email, and based on that it would generate text for a reply. The problem was that it would generate three replies, and almost always one of the three was "I love you" — because it was just safe, right? The model learned that this was a safe response for a large proportion of emails. To get to the actual production version — and this is an older slide; I think Smart Reply on mobile now makes up about 30% of all replies — it needed to be changed. What happened is they moved it from being a generative problem to being a classification problem: it looks at your email and picks one of 29,000 different response classes. So it's moving from something that is essentially infinite to something
that's very finite. And I must admit, when we first heard about this, we were quite disappointed, because the academic paper was so amazing at the time. We thought, wow, if Google's actually got this working, this could be a really big step toward a whole bunch of different products. But it turned out in the end that it was just turned into a classification problem. And this is something that's very common when you're building AI products for the real world: something that seems totally generative in the academic sense becomes a classification problem, or something much simpler, by the time it makes it out into the real world.

Okay, so this is what I wanted to build. I remember reading about this paper — I don't know if you read about it last year — and I thought, oh, this is really cool, at some point I'd like to do this. So about two weeks ago I started actually reading the paper, thinking about how I was going to implement it, and looking at some of the examples that were out there. What it does is this: you draw a picture of a website, and it turns it into code. It takes your drawing and works out how to represent it in HTML.
This is the paper it came from. The original one was pix2code, where they did this for HTML, and I think also XML for iOS and Android displays; I just focused on the HTML part. Now, one of the first questions I always ask myself is: what is this like? What kind of problem is this compared to other things I've seen in the past? The classic analogue is image captioning. You may have seen, a year or two back, the whole Show and Tell concept, where you train a model so that it looks at an image and writes a caption based on what it sees. It's pretty amazing: it has the ability to look at this picture, and even though we can't see any string between the human and the kite, the model works out that if the person is facing the kite, and the kite is a certain height in the air, then it's probably a person flying the kite. Given enough data — both images and captions — you can build a model that learns to do this. This is something Martin and I were both very interested in a bit over a year back, and Martin showed a good example of this with what's now become a famous paper, "Attention Is All You Need", at our TensorFlow meetup — so I encourage you to come along, because we often show very cutting-edge things there. So this is the first thing I thought:
well, this paper is kind of like image captioning. The second analogue is the idea that if you use some sort of RNN or LSTM, you can generate a config file. The slide here comes from a paper called Neural Architecture Search, which led to NASNet — which James was talking about; the current state of the art among ImageNet models is NASNet. I can't go into all the detail of how that model worked, but one of the key parts was an RNN controller that would predict certain components, which would become like a config file, which would then get turned into a network. For example, here it's predicting one layer of a CNN: the number of filters, the filter height, the filter width, the striding. Then — just very quickly, about NASNet — it would take that config, build the model, train it up, check the accuracy score, and use reinforcement learning to feed that back into the RNN as a reward, so over time it would improve. But the key concept I was interested in is that you can take some sort of RNN and use it to produce a config file, and that config file can then be used to build something like an HTML file.

Another thing this is like: I'm sure many of you have played around with char-RNN — how many people know char-RNN, where you build a recurrent neural network to predict the next character of text? Maybe you've done it with Shakespeare, or some other text, and you find that after a certain number of iterations the model can actually produce readable text. So these are the things I started out thinking: this problem is kind
of like those, because we're taking a picture and turning it into some sort of config file, which I'm then going to turn into an HTML file.

So let me explain a little bit about the system. There are two key pieces of data we're going to input: images, and GUI configs. The GUI config — I'll show you a picture of one later on — is just a representation of an HTML page in a much simpler format. The model I'm going to use is a convolutional neural network feeding into an LSTM: we extract features from the image and feed them in. The GUI file becomes, in effect, the vocabulary that we're predicting. And last of all, we use something that's not deep-learning-based at all, just simple coding: a compiler that takes our predictions and turns them into a proper HTML file.

Another way to look at it: the CNN is a feature extractor — it looks at the hand-drawn image and tries to determine the key features in it. We also take the GUI configs and tokenize them. We put both of those into a model to train an LSTM, and at the output we use a softmax to simply predict — we've turned this into classification now — what the next token is, based on what we've seen already.
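The architecture described here — a CNN over the sketch, an LSTM over the tokens predicted so far, and a softmax head choosing the next DSL token — can be sketched in Keras roughly as follows. This is a minimal illustration, not the talk's exact model: the layer sizes, VOCAB_SIZE, and MAX_SEQ_LEN are assumed placeholder values.

```python
# Minimal sketch of a pix2code-style model: CNN image encoder + LSTM token
# encoder, merged into a softmax classifier over the next DSL token.
# VOCAB_SIZE and MAX_SEQ_LEN are illustrative, not the talk's actual values.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20    # number of DSL tokens (assumed)
MAX_SEQ_LEN = 48   # context window of previously predicted tokens (assumed)

# --- image encoder: CNN feature extractor over the 256x256 RGB sketch ---
img_in = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
img_feat = layers.Dense(128, activation="relu")(x)

# --- language encoder: LSTM over the token ids seen so far ---
seq_in = layers.Input(shape=(MAX_SEQ_LEN,))
e = layers.Embedding(VOCAB_SIZE, 50)(seq_in)
seq_feat = layers.LSTM(128)(e)

# --- merge both and predict the next token as a classification problem ---
merged = layers.concatenate([img_feat, seq_feat])
out = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = models.Model([img_in, seq_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The point of the design is the last line of the graph: the "generation" step is just a softmax over a small, fixed vocabulary, which is exactly the classification reframing discussed earlier.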
Then we take those tokens — we call them the domain-specific language, or DSL — and we convert them to HTML. So let's look at a picture of the model architecture. You can see we've got a CNN and an LSTM feeding into another LSTM, and that predicts our DSL tokens. At inference time, we literally just put in the image itself and rely on the CNN to extract the features and bring them into the LSTM, which produces our config file.

Another key part of this — I've touched on it already — is the compiler. This is a real trick of the trade that you see in a lot of these systems. How many of you know LSTMs and RNNs? Quite a lot, right? Okay, good. If you've dealt with LSTMs and RNNs, you know they have a very limited amount of prediction power, meaning there's no way we could realistically train an LSTM to predict a thousand characters of an HTML file. It's just not going to be accurate at that, and it's certainly going to fail at things like learning to correctly open and close different tags.
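The compiler idea — a plain, non-learned program that expands a short DSL token sequence into full HTML via templates, so the LSTM never has to emit raw markup or balance tags — can be sketched like this. The token names and templates here are illustrative placeholders, not the paper's actual DSL:

```python
# Toy DSL-to-HTML "compiler": maps predicted tokens to HTML snippets via a
# template lookup, handling '{' / '}' nesting so tags always balance.
# Token names and templates are made up for illustration.

TEMPLATES = {
    "header": "<header>{}</header>",
    "row": "<div class='row'>{}</div>",
    "btn-active": '<button class="active">Button</button>',
    "text": "<p>Some text</p>",
}

def compile_dsl(tokens):
    """Expand a flat DSL token list (with '{'/'}' nesting markers) into HTML."""
    def parse(pos):
        html = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "}":
                return "".join(html), pos + 1
            if pos + 1 < len(tokens) and tokens[pos + 1] == "{":
                inner, pos = parse(pos + 2)              # recurse into block
                html.append(TEMPLATES[tok].format(inner))
            else:
                html.append(TEMPLATES[tok])              # leaf token
                pos += 1
        return "".join(html), pos
    body, _ = parse(0)
    return "<html><body>{}</body></html>".format(body)

print(compile_dsl(["header", "{", "btn-active", "}", "text"]))
```

Because the templates are hand-written and always well-formed, the model only has to get a short token sequence right — the compiler guarantees valid HTML.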
So we need to produce this sort of config file — that's what I was showing you before with the NASNet slide — and then we have the domain-specific language: tokens that represent, and get compiled into, the raw HTML.

All right, this is what an example of a config file looks like. Each one of these is a separate token that the network is going to be able to predict. Of course, remember these get tokenized, turned into numbers, and fed in; when we make predictions, numbers come back out, and this is what we want to get — something like this. You can see it has a start token and an end token, so if any of you have worked with NLP in deep learning, or any sort of sequence-to-sequence work, this should look quite familiar.

We then take that GUI config and use a template to turn it back into HTML. The config file just represents chunks and snippets of HTML, if you think about it, and the compiler sticks them all together. One thing we need for that is a template file that maps from the tokens we're predicting to actual raw HTML. You can see we've got, for example, an opening tag, a closing tag, and different entries for a header, a button, and a few other things like that.

Okay — data. Now, this is where it becomes a real challenge. Synthesizing data is one of the biggest challenges you're going to face in making any sort of AI product. Usually you'll have to find some way to make data at scale if you're going to build a commercial product. It's a very well-known secret that Google, Baidu, Facebook, etc.
often make products simply to gather data. This has become very controversial, obviously, with a lot of the things going on with Facebook recently, but Andrew Ng has talked about this at Baidu: there are a number of products they've made with no intention of making any money from them — loss leaders purely to gather data. If you can do something like that for your particular situation, you definitely want to. This will usually be one of the biggest challenges you face in making AI products. You'll also often have to build your own tools to annotate or tag data. Another thing I don't have time to go into today is that you can also use unsupervised or semi-supervised learning to take raw data from the wild and label it, or turn it into something usable.

Obviously, the more data the better — but the data must simulate the real thing. In this case, for example, the data had to simulate what someone might actually draw. Another really simple tip: this model crunches the image right down, but when you're gathering data, try to capture it at the highest resolution possible, because you never know when you're going to want something higher-res. In this case I actually got two of my staff to sit there for two days and draw pictures to get the data, and if I had then gone back to them and said, "Sorry, the images aren't high-res enough," they would have been pretty pissed off at having to do it all again.
So in this case we needed a lot of hand-drawn examples. One thing I did try — which didn't work — was different CSS styles as a way of generating data. This is something I found on CodePen: a way to make buttons look like they were hand-drawn. In the end, though, that didn't work for what we were trying to do. If I were taking this further, I'd maybe look at building our own CSS so we could suddenly churn out a hundred thousand web pages. But what we actually did was draw them. These are some examples from the test set — the training set is very similar — and you can see the idea is just to capture where the key parts of the website layout go.

Next: prototyping. Once you've got your data and you're starting to build the model, you've got to start thinking about prototyping. You want to map out the product workflow and create each part of it. The big thing is to just get something working — ideally, for me, in a Jupyter notebook, so that I have something I can take to people, show them, and test whether it works or not. The other key thing to keep in the back of your mind is: what's going to be the end UI for this product? How are people actually going to interact with it?

So, let's have a look. How's that for size?
Okay, this is the prototype I put together in a Jupyter notebook, so I'll walk you through it. I built the model — here I'm just loading it. I did the whole thing in Keras, just to show that you can build something like this with something as simple as Keras. And certainly, if you're using TensorFlow and you're new to it, I'd encourage you to use Keras: it's very simple to start with, and as you get better you can graduate to other things. So we've got our model.

All right, I'm going to load up an image here — and here's the image. You can see that what I'm actually doing is crunching the image down to just 256 by 256. When I tried building different versions of the model at the start, I used transfer learning, very similar to what James was talking about — I tried different pretrained models to see what would work. In the end it turned out that the ImageNet models were probably a bit too complicated for the feature extractor I needed, so I went with a simpler network, but I stuck with the same input size as the paper: 256 by 256, three channels deep.

Then we've got the tokens. These are the tokens I'm using, and you can see I'm just doing a lookup table to map each token to a number, which is what gets fed in. I then need some way of generating the predictions. For those of you who aren't used to this sort of model: I make a prediction of one number, then pass that number back into the network to predict the next number. So I'm predicting one token after another, and I get a sequence. You'll see that here.
So if I run this, we can see what it's doing: it's predicting each token, one at a time, as a number, and I'm using the lookup table to change it back into the GUI config. And that's really it — the key deep learning part has already happened; we've extracted the features from what we passed in. I save the output, and then I've got the compiler set up to generate the HTML. This is the HTML it generates. Let's look — this is our prediction. Let's see how it compares to what we put in. Not bad.

Okay, if you were like me, you'd be very suspicious: "Oh, Sam picked one that was just going to work." So what I did then was think, well, really we want this in some sort of web form, or some way people can draw. So I'm going to do one on the fly, drawing on my trackpad — so it's not going to be very good. Let's say I draw a few buttons — and this doesn't always work, because it's still very much in the prototype phase. Three or four buttons; let's go for four. All right, and then I want one long box — not exactly neat. We have a heading, we want a bunch of text there, and we have a button. Actually, that's too similar to the one we already did, so let's add some more stuff.

Now, you'll notice one of the things I've added in here is the page style. We can pass in an argument — a condition — for what type of page we want this to be. That relates to the CSS we want to apply. For the first one I'll do Bootstrap, and then press my "create HTML" button. And you can see this drawing is definitely not as neat as the training examples, or even the test examples.
Okay, let's press it and see. All right — did it get the buttons right? Yes. When I did it last week it added an extra button for some reason. I'll just quickly show you that we could actually add even more to this — the model can take quite a bit. And just to talk about this while I'm drawing: you want to stress-test your product. In the real world people are not going to draw nice and neat, so there's no point in trying to draw neatly here — you want to make it as close to a real-world scenario as you can. This is one of the challenges I see with things like dialogue — I work a lot on dialogue — where people building chatbots assume humans will respond one way, and humans respond a totally different way. So you always want to gather as much data as you can from the wild. In fact, one of the things I built into this is that every time I draw something and we create an HTML page, we save the image as well. That way, if we looked at the results and saw that it got this one right and that one wrong, we could go back and use those as training data going forward — and this is a really common thing that Google and a lot of other big companies do.

So let's try Material Design — I don't know how well it's going to do with that. All right, let's create the HTML. Thinking, thinking... Another challenge with this is that this response time, for example, is just way too slow for the real world. If I were going to release this as a startup or as a product, I would definitely need to look at optimizing for speed. All right — did it get it all right?
Yes. Okay, let me go back to the presentation.

So, where to from here? I've now got a prototype. If I wanted to take this into production, there would be a whole bunch of other considerations, and this is where you'd be working with product managers and other people who maybe aren't going to know a lot about deep learning or machine learning. For example: would this product be better as a mobile product or a web product? In some ways I think I could probably sell this as an iPad app — or it might work as a free iPad app for some web hosting service, one that only generates files that work on their hosting, that kind of thing. You want to think about microservices versus monoliths — I'm not going to talk about that much today, but generally with these sorts of things you'd want TensorFlow running on a microservice, or in TensorFlow Serving like James talked about. You'd want to decide whether you're going to use a GPU or not. The reason inference was taking so long just now is that I was doing it on my CPU here; with a GPU in the cloud I could get much better speed, but then I also need to think about the cost per API call — and especially if I'm doing something at scale, those costs become really important. Model size: you always want to be optimizing for smaller, faster models. One of the tricks for doing that — we don't have a lot of time to talk about it today — is that you often train a big model, and then train a smaller model to learn from the big one.
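This big-model-teaches-small-model trick is usually called knowledge distillation: the small "student" is trained on the big "teacher's" softened probability outputs rather than hard labels alone. A minimal sketch, assuming a generic classifier teacher — the layer sizes, temperature, and model names are illustrative, not anything from the talk:

```python
# Minimal knowledge-distillation sketch: soften the teacher's predictions
# with a temperature, then fit a much smaller student on those soft targets.
import numpy as np
from tensorflow.keras import layers, models

def distill(teacher, x_train, temperature=2.0):
    # soften the teacher's class probabilities with a temperature
    logits = np.log(teacher.predict(x_train) + 1e-9)
    soft_targets = np.exp(logits / temperature)
    soft_targets /= soft_targets.sum(axis=1, keepdims=True)

    # a much smaller student network, trained on the soft targets
    student = models.Sequential([
        layers.Dense(32, activation="relu"),
        layers.Dense(soft_targets.shape[1], activation="softmax"),
    ])
    student.compile(optimizer="adam", loss="categorical_crossentropy")
    student.fit(x_train, soft_targets, epochs=5, verbose=0)
    return student
```

The student ends up cheaper to serve while inheriting much of the teacher's behavior, which is exactly what you want for the API-cost and latency constraints above.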
That's a way of getting around a lot of those constraints. If I were doing a mobile version of this, I'd definitely want to use something like TensorFlow Lite or Core ML. I could also add in the parts for iOS or Android, meaning it could generate layouts for Android or iOS apps. Personally, I wouldn't do that — I've had a lot of experience in mobile, certainly on iOS, and no one really uses the stock Apple widgets anymore; people tend to make their own. So I feel that wouldn't be a big advantage. If I were making it for the cloud: where would I deploy those HTML pages to? At the moment it's all still running locally.

Business considerations. This is one I think is really important: just because you can make something that's technically great doesn't mean people will use it. And I actually feel this particular product falls into that category — it's a really cool gimmick, and it looks really cool, but how many people would actually use it to design their website? Probably not many. So you always want to be thinking: will people pay for this? I see this all the time when I'm mentoring startups: they make something that's technically very impressive, but in the real world no one's going to pay for it. Again — is it a painkiller or a vitamin? Does the ML or AI shine more than the actual product?
In this case, I think it does — and it's a perfect example: when I show it to you, you understand there's a lot of hardcore deep learning going on; but if I show it to the average person on the street and it gets one button wrong, they just think it sucks. So you want to think about those sorts of things.

From a more research perspective, another thing I think would be really cool is to develop a GAN for this — maybe something I'll play around with in the future. You could build a GAN both for making training images and for making the different types of GUIs. You could also build a really cool system that crawls existing websites and uses them to generate training data. Another thing: I'd definitely want to add more tags. At the moment I don't have any image tag, so really I should have, say, a box with a cross through it to indicate that we want an image there.

Anyway, some tricks of the trade. Follow Google's example: play to your strengths. Google is definitely the king of search, and they tend to turn things into search problems as much as possible. I've signed a number of NDAs with Google, so I can't tell you anything secret, but I will say that one of the things we've definitely learned over time is that Google will often turn things into a rank-and-retrieve problem, or some sort of search problem, rather than a straight-out prediction problem. That's something you should definitely think about — and obviously Google Home really is just a big, massive search engine in many ways. Play to your data: make things that you either have the data for, can get the data for, can "steal" the data for, or can create it. This is by far the biggest one.
When you see AI startups, it really comes down to: do they have the data? Do they have a unique dataset that allows them to do this sort of thing? Medical data can be really hard to get, yet so many people I see are trying to do the medical thing, the radiography thing. A lot of the models now are becoming off-the-shelf — you can take the best model from Kaggle — so it really comes down to who has the data to actually implement it. And getting millions of examples of something you need from the public is really, really hard, so that's something you also want to think about.

And that's it — I'm happy to take some questions. There are some links to the original paper and the original implementation of the paper as well.

Okay, any questions? One right at the back. [audience question, inaudible] Is it turned on? I can't hear anything. No, there was no reinforcement learning used in this — that was in the NASNet one.

[audience question, inaudible] There's no real need to, yeah. You're talking about tokenizing the actual vocab, right? In this case I think I used the Keras built-in function; often I'll write my own function, depending on how I want to do it. Often when prototyping you start writing one thing while you're thinking about something else. I don't tend to use scikit-learn that much anymore either, by the way.

[audience question, inaudible] Okay, yes — in this case I went with about 18 tokens to define an HTML page, which is what the paper did. There's no reason you couldn't build that up to double or even triple the size quite easily.

Any other questions? [audience question, inaudible] Okay — simply because when I was using the ImageNet models, they all expect three channels.
So I already had my preprocessing written for that, and I just went with it for simplicity and speed. That's definitely the kind of thing that, if I were going to take this to production, I would go back, redo, and refactor. Yes, good point.

All right, any other questions? No? Okay — next up we have one of our former students.