I'm going to talk for about 40 minutes, and on the subject of learning, I hope during that 40 minutes you will learn to learn to learn. Hopefully you'll also see what that is. This is kind of a faster version of something I did at FOSSASIA, but this is definitely a fuller room, so that's good. In case you haven't been to the TensorFlow Meetup, here's a very quick rundown: I have a background in machine intelligence, startups and finance. I moved from New York City to Singapore in 2013. I spent 2014 just doing open source stuff, reading papers, playing with robots and drones. But since 2015 I've been doing this kind of serious AI thing with a local company: natural language processing, deep learning. With Sam I've been running a developer course whilst doing Red Dragon AI. We've also had the opportunity to publish papers; we've had something at NIPS, so that's really moving somewhere, which is good. Red Dragon AI is something Sam has really driven. We've been doing deep learning consulting and prototyping, and we're also doing education and training; I've got an ad slide at the end. We're interested in conversational computing, by which we mean actual natural voice output, and also listening, and how to connect these things to a knowledge base. If you've ever had a conversation with a Google Home, you'll know it's kind of limited. Even though Google Home is kind of the king of the assistants, you can't talk for more than a sentence or two; we think there's a lot more to do there.

So: learning to learn to learn. I'm going to talk about the very basic ideas of learning: how to learn from a lot of data, how to learn from some data, and how you would learn from just a little bit of data. Now, how many people have seen this TensorFlow Playground? Quite a lot, because it's kind of a go-to thing. If you haven't seen it, I'm going to run through it very quickly, even better if it really works. So basically, let me just set it up.
So this is a very simple playground which Google put together to show people how the basics of neural networks work, and I'm just going to show you the very first steps, so you can understand what I'm talking about with machine learning. Over on the left-hand side is my data set. Basically the data here is a bunch of orange points and a bunch of blue points, and what I want to do is train a model to say: is this area meant to be blue, or is this area meant to be orange? You see that at the moment, using these two features, which are lefty-righty and upy-downy, it's actually made a model which is almost entirely wrong: it's saying that this area should be orange and this area should be blue, which is incorrect. So what these models do is, because it's making an incorrect prediction, I'm now going to penalise it for having the wrong answer. And how is it coming to this answer? By adding up these inputs. You can see the line here: it's adding up a contribution from this one and a contribution from this one, but it's getting the wrong answer. So the way in which you would fix it is to decide which one of these is to blame for getting the wrong answer, and I can work out how much each is to blame mathematically, by taking the derivative of my badness. Basically I'm assessing my badness, figuring out who to blame, and then adjusting the weighting of that particular thing. If I do that repeatedly (look at how bad my model is, assign blame, adjust the weights, then do the whole cycle again), it will learn very quickly how to make a nice model. So the badness of my model has gone from some pretty bad model to like zero badness very quickly, because it's now correctly saying this is the blue area and this is the orange area. And every time I restart this, it comes up with a random initial model.
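The look-at-badness, assign-blame, adjust-weights cycle just described is ordinary gradient descent. Here is a minimal numpy sketch for a single linear unit on made-up blue/orange points; the data, learning rate and step count are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for the playground: "blue" points (label 0) on one side,
# "orange" points (label 1) on the other.
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)),
               rng.normal(+2.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w = np.zeros(2)   # the model: one weight per input feature, plus a bias
b = 0.0

def badness(w, b):
    """Cross-entropy loss: a single number measuring how bad the model is."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return float(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

loss_before = badness(w, b)
for _ in range(200):                       # the repeated cycle from the talk
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    blame = p - y                          # derivative of the badness at each point
    w -= 0.1 * (X.T @ blame) / len(y)      # adjust each weight by its share of blame
    b -= 0.1 * float(blame.mean())
loss_after = badness(w, b)

accuracy = float(np.mean(((X @ w + b) > 0) == (y == 1)))
```

After a couple of hundred cycles the badness drops from its initial value to near zero on this easily separable data, which is exactly the training-curve behaviour shown in the playground.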
So here's a randomly pretty bad one. I can now retrain it. See, the model has learnt the difference between the blue points and the orange points pretty easily. Now suppose I go for a slightly different thing; let me just get something flat here again. Here you can see that we've got a checkerboard kind of pattern: some orange points and some blue points in a checkerboard. Rather than put you on the spot, I'm going to claim that you can't do this with one line, and it should be fairly obvious that if I train this model mixing these two features together, with one pair of weights essentially, I can't ever get anything which will match this up. But if I start combining them in different ways, let's try this one. What it'll do now is try to combine a little corner up here and a little corner down there into something which will at least classify two of these properly. Let's try this again; okay, here's another way of doing it. So this is trying to put the best-fit two lines through as much of the data as it can, but it still hasn't got the right idea. If I add more units here to produce more intermediate results, it might start doing a better job. But this still isn't great, in that it's really using these three lines, you can see a line here, here and here, to approximate what's going on; it doesn't really understand that this is a cross. The way I might encourage it is by adding another hidden layer, and saying: well, what if I did this using those three things? Oh, now this has got a better understanding of the data, because it has started to pick out features which can carve off this piece and this piece to make a nice answer. So in some sense, this is a model which is much closer to how a human would look at this problem.
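The "one line can't do it, but combining lines can" claim is easy to wire up by hand. A minimal numpy sketch, with weights chosen by hand for illustration (nothing is learned here): two threshold neurons each implement one line, and an output neuron fires when exactly one of them is active, which is the checkerboard pattern.

```python
import numpy as np

def step(z):
    # A hard-threshold "neuron": fires (1.0) when its input is positive.
    return (np.asarray(z) > 0).astype(float)

def checkerboard(x1, x2):
    h1 = step(x1)              # first line:  x1 = 0 (left vs right)
    h2 = step(x2)              # second line: x2 = 0 (bottom vs top)
    # Orange when exactly one of the two half-planes is active: XOR of the lines.
    return step(h1 + h2 - 0.5) * step(1.5 - (h1 + h2))

# One representative point per quadrant: opposite quadrants share a colour.
print(checkerboard(np.array([1.0, -1.0, 1.0, -1.0]),
                   np.array([1.0, 1.0, -1.0, -1.0])))   # → [0. 1. 1. 0.]
```

No single `step(w1*x1 + w2*x2 + b)` can produce that output pattern, which is why the playground needs hidden units for this data set.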
But the interesting thing is that in order to do this, I've added these intermediate steps. And this blame game, saying: well, if I make a mistake, do I blame this one, or if this one's making a mistake, do I go further back? This is called back propagation. This is how errors can flow through a model which would initially be bad, and gradually it comes into focus. Hopefully it will figure it out. It may have actually got stuck here; let's do it again. That was a stuckage. Ooh. Okay. So you can see that, and you can imagine that if I were to make this more complicated, I could get it to fit better. On the other hand, if I go for something even more complicated, like this donut shape, maybe this model is sufficient to figure it out; maybe not. Ooh. Okay. So you can see that by having a deeper and/or wider model, I can fit the data better. This is a very nice example from Google. Let's go for an even more complicated example: spirals. Now, if I just leave this training for a bit, it's going to make some attempt to fit these spirals. But the problem is it doesn't understand the concept of a spiral; it has no idea. My five-year-old could solve this pretty quickly: just tell her there are two spirals, now colour this in orange and blue. This model is going to try to fit as best it can, and gradually it will try to wrap around these things. It probably hasn't got enough capacity; let's add some more stuff. Maybe. Okay. So now this is a much bigger model, and my guess is, now that it's getting bigger, there's some chance it will get there, but it will get there much more slowly, because there's a lot of detail for it to sort through. And it might start getting stuck, in that here it's worked out that this sector has a lot of blues in, I don't know, because there's this unfortunate gap here, right? It doesn't understand that that's a gap in the spiral.
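The back propagation just named is the mechanism behind all of these playground runs. A sketch of it in numpy on the four checkerboard corners, with hidden-layer size, learning rate and seed chosen for illustration: the blame at the output is shared out to the output weights, then propagated back through the hidden layer to its weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four checkerboard corners: opposite corners share a label (XOR).
X = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)   # hidden layer: 8 units
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)   # output layer

def forward(X):
    h = np.tanh(X @ W1 + b1)                  # hidden features ("the extra lines")
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))      # output probability
    return h, p

def loss(p):
    return float(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

loss_before = loss(forward(X)[1])
for _ in range(2000):
    h, p = forward(X)
    dz2 = (p - y) / len(y)             # blame at the output
    dW2 = h.T @ dz2                    # ...shared out to the output weights
    dz1 = (dz2 @ W2.T) * (1 - h**2)    # blame propagated BACK through the hidden layer
    dW1 = X.T @ dz1
    W2 -= 1.0 * dW2; b2 -= 1.0 * dz2.sum(axis=0)
    W1 -= 1.0 * dW1; b1 -= 1.0 * dz1.sum(axis=0)
loss_after = loss(forward(X)[1])
```

Like the playground, this can occasionally get stuck in a bad configuration from an unlucky random start; with this seed and enough hidden units the badness drops steadily.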
It just thinks: here's some data, there's some data missing, here's some other data. And gradually it fills out a picture of how to build this thing, but it doesn't understand that these are spirals; it has no concept of that. This is kind of entertaining, but it is a typical neural network training. Here we've only got six and five and five, so 16 neurons, and you'll have some hundred weights or something. You can see it's now solving it pretty well; I guess it's not doing so well on this orange area across here. And this is the training curve; now it's kind of got stuck. There we go. So this is a thing which has now learnt. My summary here is: the goal was to learn how to predict the regions, given the input features. We know what a single neuron can learn, which is just a straight line. We've described this kind of blame game, and then how deep neural networks can create the features that they need. So this is where you have seen, or learnt, how to learn. These are the very basics of how all these models work.

So let's go for something a bit bigger. There's a big image competition called ImageNet, which has been going since the early 2000s, I think. They have 15 million images in 22,000 categories, and this has been an ongoing competition for machine vision people. What used to happen in the 2000s is that people would write papers on "how I wrote an eye detector", or "how I wrote a fur detector", or "how I can detect metal in images". These are things people would write papers about, because it was a very low-level feature creation exercise; that was kind of the academic game: how do I create, or recognise, these features? The point of these neural networks is that they can develop the features that they want, just by being given images and classes and told: just learn this thing.
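As an aside, the "some hundred weights" estimate for that six-five-five playground network checks out if you count one weight per connection and one bias per neuron:

```python
# inputs, the three hidden layers from the demo (6, 5, 5), and the output
layers = [2, 6, 5, 5, 1]

hidden_neurons = sum(layers[1:-1])                        # 6 + 5 + 5 = 16
weights = sum(a * b for a, b in zip(layers, layers[1:]))  # one per connection
biases = sum(layers[1:])                                  # one per neuron
print(hidden_neurons, weights, weights + biases)          # → 16 72 89
```

So 16 hidden neurons and just under a hundred parameters, which is why even a laptop trains these instantly.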
So this is a Google network from 2014. Suddenly we've got quite a lot of layers. This was pretty heavily hand-engineered by Googlers, and achieved excellent results at the time, essentially wiping everything else away. By this stage the deep learning things had just taken over, because instead of writing a paper about all the different feature detectors you've figured out, you just say: we have a network which is shaped like this, and we trained it on data. That then moves on to 2015: now we've got a more complicated network from Google, and essentially we can draw a curve of these things over time. Before 2012, this is what people were doing with their feature detector papers. In 2012, AlexNet, Alex Krizhevsky, some guy with a GPU in his bedroom, came along and figured out that yes, we can train this whole thing end to end; apparently at the computer vision conference most people didn't even know what these networks could do, and he could suddenly beat them by a huge, huge margin. As time goes on these networks get better and better, and the number of layers goes up and up. This is a Microsoft contribution. Human performance is about 5% error, so this is superhuman performance, and this is now two or three years ago. So basically this ImageNet competition has been abandoned, because these things have got better than groups of people, and it has moved on to: let's do it for video, let's do it for other things, or let's do it on super small images, just because these machines have got too good. So what we've done here is an ImageNet model trained from zero: just like the spiral detector or the donut detector, we've started with random weights, used huge numbers of images and huge computational resources, and it will do exactly what we told it to. If you put in an image, it will tell you the class, one of a thousand.
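Loading such a trained-from-zero classifier and asking for its one-of-a-thousand answer takes only a few lines of Keras. A sketch, with one deliberate change: `weights="imagenet"` would fetch the real pretrained weights, so `weights=None` is used here to keep the sketch runnable offline, and the random input array is a stand-in for a preprocessed photo.

```python
import numpy as np
from tensorflow.keras.applications.nasnet import NASNetMobile

# weights="imagenet" would download the actual pretrained ImageNet weights;
# weights=None builds the same architecture but keeps this sketch offline-runnable.
model = NASNetMobile(weights=None)

x = np.random.rand(1, 224, 224, 3).astype("float32")  # stand-in for a preprocessed photo
probs = model.predict(x)

print(probs.shape)   # (1, 1000): one vote per ImageNet class
```

With the pretrained weights loaded, `tf.keras.applications.imagenet_utils.decode_predictions(probs)` turns those thousand votes into readable labels like "tabby".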
So the next trick we could do is a thing called transfer learning: we take an existing model, pre-trained on ImageNet for instance, and then use that model to learn new classes, stuff which isn't in the thousand classes, stuff which is novel to it. Hopefully we can do this using far fewer than millions of examples. So I can do a little demo of this. Oh, it's not even here. I've got a repo online which has got all of this stuff in, so all these notebooks are right there: transfer learning in Keras. I have limited time today, so I'm going to take a pre-trained network; I'm going to load up Keras, which is our go-to high-level thing. Keras has a model zoo, and the model zoo includes many of these different models. This is one of the very early models, this VGG model; it has hundreds of millions of parameters, and it scores in the 70s in terms of performance.
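The demo that follows feeds the pretrained network's thousand class scores into an SVM. The scikit-learn half of that pipeline can be sketched on its own with synthetic stand-in score vectors; in the real demo each 1000-dimensional vector comes from running a car photo through the ImageNet network, and the class indices bumped here are purely hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)

# Stand-ins for the 1000 ImageNet class scores of 10 classic and 10 modern car photos.
# Classic cars bump up one (hypothetical) class, modern cars another: the
# "pattern of errors" signal described in the talk.
classic = rng.normal(0, 1, (10, 1000)); classic[:, 0] += 2.0
modern = rng.normal(0, 1, (10, 1000)); modern[:, 1] += 2.0

X_train = np.vstack([classic, modern])
y_train = ["classic"] * 10 + ["modern"] * 10

clf = LinearSVC().fit(X_train, y_train)   # 20 examples: trains in well under a second

unseen = rng.normal(0, 1, (1, 1000)); unseen[0, 0] += 2.0
print(clf.predict(unseen))
```

Twenty points in a thousand dimensions are essentially always linearly separable, which is why this half-second training step works at all; the hard part (a feature space where the classes differ) is inherited for free from the pretrained network.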
As time went on, this Inception V3, which is the 2015 one, has got many fewer parameters and much better performance. Gradually, as time progresses, there's kind of an efficient frontier, in financial terms, of models which are either better in performance for the same number of parameters, or fewer parameters for the same performance, and currently the frontier is things called NASNet. So I'm just going to load up one of those, which includes all this structure, and I can do it in one line: this NASNetMobile, I just load that thing in, and there's the model. I have a couple of image-to-input helpers, which is kind of pre-processing, and I can then look at the results for a single image on my disk. So its vote is 67% tabby cat. This is something you too could do: pull this model out of my repo, have a look at images on your disk, and it would do good things. You could even put a whole bunch of different images in the images directory. So this is the "I hate you" Siamese cat; this is the disappointed owl. I guess it doesn't have snowy owl as one of its classes, so it has no concept of what this owl is; clearly it thinks it's a fox, and other things which might be white, or a golf ball, or stuff. This is clearly outside its scope. This one it gets closer on, but actually, no, that's not in the training set either. Here's the tabby cat again. So this is essentially using many, many examples, and it's doing exactly what we told it to; in particular these ones are wrong because it can't be right. So let's do something called transfer learning, where we can use a limited sample of our own data to get actual proper good answers out of this. What we'll do is take this ImageNet network, where I've got an input image, I put it through a CNN, and I get these probabilities or logits; but instead of taking the best guess of which class is most likely, I'm going to replace
it with an SVM, a standard scikit-learn SVM. The reason for doing this is that when the network makes an error, it will make the same kind of error for the same kind of objects, because this thing understands images in general, and the pattern of errors is actually indicative of the class. Let me show what that looks like, with some more helper functions. I'm going to use a bunch of classic and modern cars. I've got a bunch of these cars on disk, and they look like this: here's a classic sports car, here's another classic sports car. I've got 10 classic sports cars on disk, and 10 modern sports cars. The modern sports cars are clearly different, to a human; on the other hand, these are not part of the training set for ImageNet, which doesn't know about sports cars in general. So what this does is go through each of these images, run it through the ImageNet model, and get the pattern of errors it makes. It can't possibly get the right class, but the pattern of bad guesses it makes will be different between the modern cars and the classic cars. It may be that it thinks this one looks like a lotus leaf or a lotus seed pod or something, whereas the classic cars look more like plates. These are classes within ImageNet which it will identify: this one is pretty platey and the other one is pretty lotusy, and the fact that there's some difference there is something I should be able to pick up with an SVM. So, having put in these 10 of each, I can then train a linear SVC. This trains within half a second; it's only got 20 examples to train from, each of which is a vector of a thousand things. I can then go through and classify a test set: this reads through a directory of test images which it's never seen before, runs them through the same ImageNet network, sees what kind of errors it makes, and that is then put into a
classifier, which is just the SVM, and it says: okay, this is a modern sports car, this is a classic sports car. This trained within half a second, so we're leveraging Google's 62,000 GPU hours or whatever, and spending half a second on our own classes. It's making good guesses so far. Prius: well, it's only got two options here, and it's not exactly a sports car. And it's got the classic ones; this one is misclassified for some reason, so it's not perfect. But this is an example of how we can use very limited data; it could be classifying e-commerce images, say, taking an existing pre-trained network and then faking it into training on our own stuff.

Okay, so let's talk about next-level learning. Let me recap: train-from-zero was the plain ImageNet; transfer learning took an existing model and leveraged it to train on new classes. So, next-level learning. The problem with the previous methods is that we had to learn from a very large amount of data, but humans can learn from very little data indeed. What we'd like is models which can also learn from very, very little data; ideally we'd want the models to learn how to learn. We don't just want them to be taught, we want them to be willing students already. There are two main types of meta-learning. One is to learn how to build the very best model, which is called structure meta-learning: we're trying to make a very good model which is willing to learn what we're going to teach it. Another way is to build a model which is ready to learn as quickly as possible, primed so that it will learn every example you give it very, very quickly. So these are the two directions this has taken. For structure meta-learning, the issue is that it's very difficult for humans to build models, so
it took Google a long, long time to build their GoogLeNet, their Inception: lots of graduate students, it's called stochastic graduate descent or whatever. What we want is to give the computers the ability to build models, and we do that by enabling them to search through all the different architectures efficiently, and also to predict which is the most promising. So once we train them, we want them to predict what would be a good architecture, and then it's kind of like a game: it has some bad networks, getting better and better, and now it wants to predict where the next good network will come from, and create that structure. Now, for the other direction, there's a nice data set which consists of 1,623 handwritten characters from 50 different alphabets. So not only have you got the Latin alphabet, but you've got Greek and all these other ones; the key thing they've got in common is that they're all drawn by hand, each by 20 different people, I think. So they've got quite a lot of commonality, in that they've been drawn as strokes: if I can have a network which learns what strokes are in common, then it will be primed to learn new characters quickly. One more time: this is the Omniglot data set, 1,623 characters, each drawn by just 20 people. Contrast this with MNIST, which was 10 characters drawn about 5,000 times each. I've got a little demo here, and you can play it at home. Each task is one-shot classification: it will train on one example for each of three different classes. It runs in JavaScript, using TensorFlow.js I think, and it shows what this meta-task is; it's hands-on, so you could actually do it on your phones, I guess. So here we go; it should look something like this. So this is one example: I can draw a B, draw a C, and so
here I'm going to say: well, let's draw there. It thinks I'm starting a C, which is probably right. What happens if I do this? Now it's convinced it's going to be okay. What happens if I do this? Now I think it's on the B. So, I've given it the training data, the examples of the three classes, and this model is set up to learn these classes as quickly as possible; it can now do essentially one-shot learning. Suppose I do this... oh, A, right, so yes, it's all clear. Okay, so we can play this game again and again. No, that's not very good, is it? Let's draw a cat. There we go: there's a cat, there's a mouse, and there's a house. So if I draw an ear... oh, that's not right. Another ear... oh, I think it likes the mouse. Okay, the cat's mouth is something which gives it away. There we go. What if I draw an ear... there, house. Okay. So if you go online, this is at redcatlabs.com; you can play with this yourself, no need to listen to the rest.

Okay, I do have a TensorFlow t-shirt, so here is the TensorFlow content; as you've heard, this pays the bills, so I need to go through this, and I'm not going to mention PyTorch on this slide. TensorFlow eager mode gives you tons of quick development benefits, which researchers love in tools, right? But one of the benefits of TensorFlow is that it's production ready, because it has this whole story: from the model that you build, you can move it to TensorFlow Serving, you can do all this distributed stuff, you can also move it down to your mobile phone. There's a whole bunch of interesting stuff that TensorFlow does, plus the TPU thing, which is awesome hardware which maybe we might get to use one day. So at the Summit they announced this TensorFlow eager mode, which essentially enables you to use TensorFlow in a very similar way to how you might use other packages, and there's a
nice video online. I've got a little example here with this Reptile thing. Basically I went through it, and instead of this other package, I just replaced each piece with the TensorFlow eager way of doing it. So for each line, I've taken out whatever this is and replaced it with a Keras model; I'm replacing it line by line, doing exactly what you might do in your research-quality or prototype-quality code, but doing it in proper TensorFlow. You can see that Keras will also give me the nice model description, and the TensorBoard interface, all this kind of thing, with a little bit of fiddling around, because there's some mechanics they do to make this work; but the same thing can be done in TensorFlow quite easily, and then you can get similar kinds of things happening. So basically this enables you to use TensorFlow not just as your production thing but also as a research tool. We're kind of excited about where this might go, but it may be that sometimes we might use other tools, who knows.

So, what you have learned. You've just seen how to make machines learn at all, right, that was the TensorFlow Playground. You've seen how they learn from a lot of data, which was ImageNet; from some data, which was the transfer learning; and from a little data, which is the learning-to-learn thing. So you have learned to learn to learn. In doing this, I've been learning how to teach you how to learn to learn, and now you've understood how I've learned how to make you learn to learn: that's meta-learning.

So, to wrap up: the field is advancing extremely rapidly, with lots of stuff coming out day to day, but it's still within the grasp of individuals, just me and my laptop, or me and my Nvidia card, to do stuff which is up-to-date, cutting-edge research. Something from Uber came out like two days ago with lots of MNIST results; you can still
see very interesting things about MNIST. Of course, people with huge hardware can do huge things, but there's still room for the gentleman or lady scientist to do this at home. The other thing is open source, and open source kind of applies to research too: all of these machine learning things are on a site called arXiv, where you can read new papers every day, and people are publishing code within three days. It's not like you have to wait for it to appear in your university library; it's all right there. There's so much of it that there's a tool called Arxiv Sanity, just to make sure you only have a sane amount, or a less insane amount, of stuff to deal with. So there are a ton of interesting things from people, and I would encourage you to write blog posts, give talks, do this stuff, because it's quite easy to move into the actually-doing-something category, and it's an excellent thing to do. So thank you very much.

Oh, before that, I should mention we've got a deep learning meetup group. The next meetup: I'm not sure the 17th is accurate now, but some time in May. We've been holding one regularly, like at Google, every month, and that's one of the good things: a lot of people turn up, and it's now one of the largest TensorFlow groups in the world, which is amazing for Singapore. We have stuff for beginners, stuff for the bleeding edge, and lightning talks as well; whoever wants to talk can talk. There's a jump-start thing; we've just done our first batch of these, and we'll probably do more batches. It's to make people actually do a project rather than just listen to stuff, because by tomorrow it all starts slipping away, whereas if you actually do it, there's a big benefit, we believe. So basically we get people to do a project over a week, and it forces them to bang their head against a wall, for instance, which is a good learning experience, and then they'll have built something for themselves. We also, last year,
(this is a 2017 thing, so not now) did an eight-week developer course. This was kind of hardcore, probably too much information for eight weeks, right? We will probably do something similar upcoming; we're trying to figure out the format, and also funding sources for Singaporeans, because we like that. That's also going to be kind of intensive, and it may be that the first module of it is the jump-start course itself, so you can start with that, and then there's a lot further to go afterwards.

Okay, questions? Any question about anything, within reason. [Audience question, partly inaudible, about using a sequence-to-sequence model to translate between the two things.] Ooh, that would be good, except we'd need to make sure that the Google code worked, right? You could also automatically generate bug reports; that might be a good thing. There are a lot of interesting learn-to-learn things there, and also open source, like the Linux kernel and so on, provides a huge resource of known-good code, or known-interesting code, so you can play all these kinds of games; there are lots of interesting things to do. Who's next? And then Alan should start up. Anyone else? Oh, I know, you were asking for the next question, yes, yes, yes. Let's switch over to the laptops first, and this is on my hand. Thank you, guys.