[transcription unintelligible: the opening remarks were not captured cleanly] ... If you open the index.html, you'll get a screen like this. This has got the links to the different sections: there's a link to the presentation, and there are links to the JavaScript examples. Let's not leap ahead. Let me start with the deep learning. OK. So I'm Martin, I moved here back in September 2013. I have a background... [transcription unintelligible: the speaker's background and interests were not captured cleanly]
[transcription unintelligible] ... So they were checking that users were human with one of the numbers, and the other number was finding out what they thought it was. They can now do this better than humans can, because they can measure exactly the error rates of humans: get lots of people, all trying to be human, find out how many of them disagree with each other, and you get a percentage error rate. This thing is better than humans now. ImageNet — we'll talk a bit more about ImageNet going forwards — but this thing can recognise photos and objects in images, and the ImageNet competition has given rise to a huge amount of investment and excellent work on deep networks. Captioning: by reading the captions and labels that everyone has put on their Flickr images or their Google Drive, they can now label what's in images fairly accurately.
If you put an unknown image into one of your Google folders, it will come up with a label for it, hopefully correctly. If you've got this on your drive, you can follow along in the presentation and have a look at some of these labels. On the left-hand side it's pretty good: you've got "a person riding a motorcycle on a dirt road", which is pretty accurate. Then it gets less accurate. The next column over is "two dogs play in the grass" — it's actually three dogs playing in the grass. The next one is "a skateboarder does a trick" — well, this is a motorcyclist, so that's fairly wrong. But another very wrong one is "a dog jumping to catch a frisbee": that dog is not jumping, there's no frisbee, it's confused. So this is kind of the tip of what people are doing with this stuff on photos and images. Then people started to look further. Google started to have some fun with reinforcement learning. Their major Nature paper had a thing where they learned to play Atari games. They would learn to play Space Invaders essentially by giving the computer pictures of the screen of Space Invaders and four or five buttons to press, and just telling it the score that it was receiving along the way. And this thing, in about two hours, can learn to play it about as well as a human. It gets pretty good, which is interesting. But then they moved on to Go. This was kind of the major news story, in AI terms, this year, I guess, so far: beating Lee Sedol, which was very cool. They're going to do another one later this year versus a Chinese player who is potentially better. On the other hand, they didn't switch the self-learning off, so it may be a challenge. We'll talk about the reinforcement learning — in fact, reinforcement learning is something where we have an actual module, like an advanced module, which we'll talk about at length later on. So this whole talk is meant to be an hour and a half.
It will divide roughly into three sections. The first half hour is going to be this little piece of introduction and some JavaScript examples. The second half hour is going to be Theano and the Python environment for doing this deep learning. And the last half hour is going to be reinforcement learning, for which I've got a nice little example. One thing with which I should preface the whole thing is the AI effect. This whole AI field is exploding, but as soon as something is done, people will say, well, that was pretty easy. I mean, Go is only a game, right? It's hardly interesting once the computer can do it. Before the computer could do it, they were saying, well, you know, Go is like the pinnacle game for humans; the computer will never do it, it's too intuitive. Now it's kind of a mechanical process. Hopefully there won't be too many unemployed Go masters, but the AI effect is a real thing. Artificial intelligence along the way has given us a lot of cool stuff, but as soon as something becomes doable, it's not AI any more, really. So that's kind of a bugbear for the field. So let us start with the very, very basics. The interesting thing about neural networks is that this has been going on since the 80s or 90s, or even before. The idea was: let's try and do computation like the brain does. And a lot of people are very uncomfortable with that. A lot of reporters like to report that this is a brain-like computer, and a lot of the scientists are saying that it's not really much like the brain at all, because basically we use matrix multiplies, and a lot of them. This is not how the brain works at all. The model, the idea of the connectivity, may be the same, but the actual mathematics is completely alien. So here is a picture of what we call a single neuron. I'm sorry — many people have seen this before — but just so everyone's on the same page.
Basically, on this diagram we've got the input coming in at the bottom, which is the x1, x2, x3. There are some weights, and the neuron is the thing in the middle: it sums up all the inputs times the weights. The output of the neuron is then just some kind of nonlinearity applied to that sum. Now, people used to be really worried about whether this should be a tanh function or a logistic function — what kind of nonlinearity it was. What people have discovered is that just taking the positive-only part of the answer (a ReLU) is good enough. It doesn't really matter that you can't differentiate it at zero or anything. This simplifies the whole requirement for mathematics enormously: you're just doing a matrix multiply, then taking everything above zero. And if you want to change the function of x that you're computing, you just need to change the w's. So what you can then do is say: let's still have some x's at the bottom, and some outputs that we want at the top, with some intermediate units. Now these intermediate units — we don't actually have any data about what they should be. We know what the output should be if we're doing supervised learning, and we know what the inputs are, but we don't actually know what the representation in the middle is. So this is another thing which caught everyone out in the 80s and 90s: how do you train these hidden units? And it turns out it's quite doable — you just have to use brute force, basically. So supervised learning, which I mentioned before, is: you pick a training case, say this x maps to some target. Then you say, well, what does x actually give me? And then you change the weights so that what it gives you is slightly nearer what you wanted. So suppose I have a picture of a cat. I start at the top. I pick a picture of a cat; I know it's a cat. But if I look at the output of my whole neural network, it says dog.
Now this is wrong. So what I do is change the weights in the neural network so that the output is slightly more cat-ish than before. And then I go on to other examples — millions of examples, hundreds of millions of examples. Once I've done that a lot of times, eventually this thing learns that a cat is a cat. The surprising thing is that this really works. It never used to work, but now we've got hundreds of millions of images which you can download — the internet is spreading data like crazy — and people have got very fast machines and GPUs, so you can do an awful lot of processing. That is kind of the secret sauce. So here's just the main idea of this gradient descent. When I said let's move the weights so it gets slightly closer to the right answer: basically you can take a gradient of your error function, because I want to move in the direction which makes my error less. Quite how you work out the gradient of this horrendously complicated network is another matter. You can work out the gradient of a single neuron pretty easily — it's a linear function. When you've got multiple layers, you might have several routes through each weight, so it becomes a bit more challenging, and you'll see that people start to do more and more complicated things. So finding the gradients is not so easy, but gradient descent is what people do. So now we're getting to the point where we can actually do something. What I want to do is train a little neural network, and we're going to play with how wide the layers are and how deep the network is, and we're going to use stochastic gradient descent. So if you now go to your presentation folder and have a look — it's the second page of your little presentation thing here — you'll see the JavaScript example, painting. Is anyone utterly confused by this? If you're utterly confused, ask your neighbour. There is a thing called painting here; if you click on that, you'll get to this example. If you're not connected, or if you don't have a laptop —
So if you don't have a laptop, you're stuffed. If you do have a laptop but haven't got VirtualBox or anything installed, or you couldn't copy anything, you can go to the web and look at ConvNetJS online. What we're going for is the cat example — a fuzzy picture of a cat — because that's what we're going to be learning about. So if we go into this painting example, you'll see — and hopefully people are seeing this — and maybe it's time for a demo. So excuse me, let me go and open this thing. So here is my version. This is the demo. You can see that gradually this thing... sorry — what this network is doing: the network is defined in this box at the top. There's an array — this is JavaScript — and we say there's an input layer, then a fully connected layer, then an output layer. And that's all there is: a very simple one-layer network. And then we're just training this as best we can to reproduce the colours of the picture of the cat. Now, the thing is, there are only four neurons, and what you'll see is it's trying to fit four lines as best it can to fit the image of this cat. Obviously you can't draw that picture of the cat with four lines. So let's try something a bit bigger: let's go with 12 neurons and reload. Sorry — when I change the box on the left, it updates the code at the top. So now you can see the number of neurons is 12 in the third line. So here we've got 12 lines trying to approximate this cat. Now the problem here is it doesn't really have any concept of roundness or eyes. There are higher-level features of this cat that you can't approximate if you're just trying to draw lines on it. Even if we move up to 48 neurons — try again — sorry, at some point my laptop will die. Okay, so at 48, we're getting quite a lot of lines we can draw, and basically you can see some of the catness coming out.
But basically, every time it draws a line, it's then going to have to cancel out the bits which don't match, and it's got to kind of fix this up. So here we've got 48 neurons, and it's going to have a tough time, for obvious reasons. But if we say instead — this is the fourth one down — two layers of 24 neurons, now we've got different code, and we restart. Now it's got the chance to form some kind of intermediate useful feature, like "am I in the lower half of the picture, in which case do this". And you'll see it gradually fixing up something a bit more like a cat. So it's got like a two-step decision it can make, and it's getting somewhere. It's also training some internal representation — and we don't really know what the internal representation means, we haven't looked at that yet — but it's plainly doing a better job than the single-layer network. The reason it's lines is that a single neuron is input times weights with a little nonlinearity on top, and the boundary where the nonlinearity goes positive is just a linear function: there's a linear delineation between the positive side and the zero side. So it's going to be constructed from lines. With one layer it is strictly lines; with two, it's lines built of lines. One side's got some slope, and you're then saying: let's do slopes of slopes. That was two layers of 24; let's go to four layers of 12. So we're still on the same number of neurons, roughly the same number of weights, but let's see what this does. This starts off really not knowing anything — the internals of it know nothing about images or whatever it's doing, it's just constructing a representation internally — and hopefully we can start to see it figure out what's going on. So here we're starting to get things which are more like curves. If we let this run and run and run, you'll see this turn into a fairly decent cat. And we're only on 48 neurons.
Now, a decent-sized model would be 48 million neurons, so we're not going to run that in our browser. But this is a simple example you can play with in any browser — you don't need the virtual machine or anything like that. You can play with it, show people; it's kind of interesting. It shows the effect of having the same number of neurons but arranged in layers. Now, what are layers? Depth. And what is deep learning? It's depth. And it works surprisingly well. Let me just... I'm going to kill this actually. Can I go back? Excuse me. Let's kill this. Oh no, actually don't kill that — it's this one. So if you want to show people the presentation, it's got some images of what we're talking about. This last one I let run with quite a large number — this is deeper and wider. This thing is beginning to understand something about shapes. So what's going on inside? You've got this thing which apparently learns some kind of function; let's see whether we can see what's going on inside. First, we're interested in what features we're using. For that image we were just using coordinates and colour, but for sound you'd want to use the waveform or the Fourier transform or some other features, and for words you may want to featurise them in some way. Then, inside, you want to figure out what each neuron itself is learning. It's great that the whole thing learns something, but what's actually going on, and how is this converging? So that's what the next thing is. There's another example on there, another JavaScript thing, called TensorFlow Playground — I believe there was a talk about TensorFlow this morning. This is Google's simplified TensorFlow in the browser; it's a nice playground example. If you open this up, you'll get something which looks like this. And if I can do that... they're promising they're not going to break your process; I'm not sure that's true. Is this working? No, it doesn't seem to be.
Sorry — I'm not connected to the internet; maybe it's pulling some fonts in from somewhere, which I'll deal with. Basically, you can see that this is a neural network. On the left-hand side there are the inputs, which are some features. Some of the features are left-and-right or up-and-down. Then there's another layer which has learnt slightly different features from those, because it discovered: oh, I want something to the right, something a bit further to the right, a bit sloped, a bit that way, a bit that way. Then, combining those all together, I can make a round shape. If we want to add another layer, we can do that, and replay this thing. This is kind of interesting — you can play with this. I find it kind of entertaining to pick bad features, because you want to engineer your way out of whatever your mistakes have been; typically you think of the real world as being against you. So can it survive being given easy data, difficult data? So the fact that it's got just these two features is... let's give it this — see, that's a bit easy. How about that? What happens here? So here it's learnt to distinguish what I would call the blue in the middle from the surroundings. All it can do is this function with the extra bits, so it's made some mistakes at the top and the bottom, classifying these incorrectly — that's apparently the best it can do. We can try again, and gradually it'll figure out how to do something. So this is how neural networks tend to learn. It's nice that you can do this in the browser and easily fiddle around with it. In fact, it's interesting: it's hardly using this neuron at all; this is the one it's interested in having a look at — this one is doing most of the work. But you can see the internal states, the patterns each one of these is learning. And just by doing this gradient-descent trick, you can learn to do things with circles — or, this is too easy. Okay, this is too easy. Spirals.
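The setup in the playground — two input features, a small hidden layer of ReLU units, gradient descent on a "circle versus surroundings" dataset — can be sketched in plain numpy. This is a toy version with sizes and a dataset of my own choosing, not the playground's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "circle" dataset like the playground's: an inner disc (class 1)
# surrounded by a ring of points (class 0), with a gap between them.
def ring(n, r_lo, r_hi):
    r = rng.uniform(r_lo, r_hi, n)
    a = rng.uniform(0, 2 * np.pi, n)
    return np.stack([r * np.cos(a), r * np.sin(a)], axis=1)

X = np.concatenate([ring(250, 0.0, 0.8), ring(250, 1.2, 2.0)])
y = np.concatenate([np.ones(250), np.zeros(250)])

# One hidden layer of 8 ReLU units and a sigmoid output unit.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(0, X @ W1 + b1)   # ReLU: keep the positive part
    z = np.clip(h @ W2 + b2, -30, 30)
    p = 1 / (1 + np.exp(-z))         # sigmoid output probability
    return h, p.ravel()

lr = 1.0
for step in range(5000):
    h, p = forward(X)
    # Cross-entropy gradient at the output, pushed back through the layers
    d_out = (p - y)[:, None] / len(X)
    dW2 = h.T @ d_out; db2 = d_out.sum(0)
    d_h = (d_out @ W2.T) * (h > 0)   # ReLU passes gradient where it was active
    dW1 = X.T @ d_h; db1 = d_h.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, p = forward(X)
accuracy = float(((p > 0.5) == (y > 0.5)).mean())
```

With a handful of ReLU units, each contributing one line, the network encloses the disc with a few half-planes — the same "lines built of lines" effect as in the painting demo.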
So back in the 90s, everyone was playing with really toy examples like this. Now — there's not enough information here; let's give it a bit of help — gradually it's going to learn something. People in the 90s were doing these toy examples and running into enormous trouble, because it was so difficult to learn these internal features. People basically abandoned neural networks for quite a long time because they couldn't make it work; it proved very difficult. But then people discovered that, if you have more data, your world isn't quite as adverse as these toy examples, and suddenly there was a resurgence, back in, say, 2006, and then on to 2010 when people had GPUs. There's a whole bunch of interesting stuff there, but basically in the mid 90s most people gave up — it's only now that this stuff works properly. So let's go back. Okay, now we're on to the big deal, the second half hour. Hopefully you have VirtualBox installed — if you've got a Mac, you already do — and you've copied the one-gigabyte file. If anyone hasn't got the one-gigabyte file, now is the time: you really need it. So there are keys out there — there are 13 USB keys in the audience; I shouldn't be holding on to any. Oh no, no there aren't. In New York, all of the keys would have disappeared completely; in Singapore we're doing a lot, lot better. I should have expected that. If anyone needs a key, they should raise their hand or whatever. So, in the thing you've got — if you want to follow along, let me do this. Sorry, I'm running Linux, but it doesn't really make much difference — here is VirtualBox. You all have the keys. Basically you want to do File, Import Appliance, and that will want the OVA file. Once you do that, you just say okay, okay, and you'll get something which is here, powered off. So what we do here is we just turn it on.
And basically you'll see that this is booting. This is booting a Fedora machine inside your machine — it's completely isolated — and it'll just come up to, basically, a login prompt. And that's all we need, because what we do from here is open Jupyter. This session of Jupyter, which was formerly known as IPython, is running from within your virtual machine, and it's got a whole bunch of, what I would say, cool stuff ready to roll. So: start the virtual machine, okay, open Jupyter. Now, if you really want to SSH into your machine — because working on the console is terrible, because it doesn't have the right keys or anything — there's an SSH session on port 8282, user at localhost, password is "password". It doesn't have to be secure because you're not exposing it anyway. And let's go. So, in order to move on to bigger and bigger networks, what people do is use something whereby you can express what you're doing better than just matrix multiplies or for-loops, right? You want to do this at a higher level. Then, having expressed the neural network at a higher level, you can let the machine map it onto the cores. You've got a lot of mathematics to do — the machine will be doing a lot of mathematics — and if you've got a four-core machine, you want all those cores; if you've got a GPU, you've got 2,000 cores. You want to be able to map your problem onto this thing. And so people use frameworks — I would say there are four big ones now. It suffers from a JavaScript-framework kind of problem: everyone wants to do their own. Caffe was one of the fairly early ones; it's used a lot for vision, but it's very C++-ish. There's Torch, which has Facebook and Twitter supporters, and has kind of a Lua interface. In a way, these describe the problem precisely as they want it done.
That has advantages in terms of efficiency, because if you know how your hardware works, you can lay the problem onto your hardware. However, if you want to do something really funky — and you'll see some funky stuff — you're going to want something higher-level than that. And that is what frameworks like Theano, and then TensorFlow, are for. TensorFlow is very much modelled on Theano — one of the main developers of Theano worked on TensorFlow. Theano is a Python library for doing these higher-level descriptions of computation. It was developed by people in Montreal and it's used very widely; TensorFlow has just come out from Google. Theano is very much duct-taped together, and it works, but it could be better engineered in many ways — I think everyone would agree that creating code by having print statements is probably not the right way to do it. But basically, Theano will spew out C++, it will spew out numpy code, it will spew out CUDA or, in fact, OpenCL — it will do all of these things for you, constructing it all from Python. So basically, Theano does optimisation of your computation. What you do is express your whole computation as an expression tree, and then Theano will go through the tree optimising pieces out. So if you write, say, A equals B plus one and then C equals A plus one, but you never use A again: Python will just read this through — it won't take out the intermediate step even though it doesn't need it — whereas Theano will. Now, A, B and C could be enormous matrices; there could be all sorts of redundant computation. Having this kind of metaprogramming is what you should be doing, particularly if it's CUDA code, for instance. Not many people want to write actual CUDA code for the GPU, but using Theano — or TensorFlow — is a great way to do this. Since we're Python here, I'm going to talk about Theano only.
One other reason I'm doing that is because TensorFlow was programmed at Google, and the Google engineers all have great, huge machines — they tend to assume good CPUs and big RAM. The Theano people focus more on the reasonable machines that everyone has on their desk, unlike the Google engineers with their fantastic machines. I made an interesting attempt at putting TensorFlow into this virtual machine, but by the time you load any decent-sized model you're blowing through 8 gigabytes of RAM, no sweat. TensorFlow is all very well, and maybe they'll get the memory question under control, but it's not there yet. Google doesn't really care, because they have machines which just lay everything out in RAM — and in a way that's what you want to do if you don't have a RAM constraint. For us, it's much better to be using stuff built by people who are working with smaller machines. So, there's another little workbook to do — this is the first workbook, called Theano Basics — and just in case you haven't seen IPython before, there is a play button at the top, which is over here. If you don't want to read everything, you can play along, or you could just watch me; but one of the advantages of having this on your machine is that you can come back to it, and there are instructions and everything. Okay — there's no need to log into the machine; because of systemd, it all boots up automatically. You don't need to log in. If you started the virtual appliance and got to the login prompt, you can then immediately go to localhost:8080, and that should get you to Jupyter. Does it have a Red Cat Labs logo at the top? If so, then it's working. Okay.
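Before stepping through the notebook, the core idea Theano is built on — describe the computation as an expression tree first, evaluate (and optimise) it later — can be sketched in a few lines of plain Python. This is an illustration of the concept only, not Theano's actual API:

```python
# Toy deferred-evaluation sketch (illustration only, not Theano's API).
# An expression like 3*x**2 + 1 is held as a tree of nodes; nothing is
# computed until .eval() walks the tree with a value for x.

class Expr:
    def __add__(self, other): return Op('+', self, wrap(other))
    def __mul__(self, other): return Op('*', self, wrap(other))
    def __rmul__(self, other): return Op('*', wrap(other), self)
    def __pow__(self, n): return Op('**', self, wrap(n))

class Var(Expr):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Const(Expr):
    def __init__(self, value): self.value = value
    def eval(self, env): return self.value

class Op(Expr):
    def __init__(self, op, a, b): self.op, self.a, self.b = op, a, b
    def eval(self, env):
        a, b = self.a.eval(env), self.b.eval(env)
        return {'+': a + b, '*': a * b, '**': a ** b}[self.op]

def wrap(v):
    return v if isinstance(v, Expr) else Const(v)

x = Var('x')
y = 3 * x**2 + 1          # builds a tree; no arithmetic has happened yet
print(y.eval({'x': 2}))   # -> 13
```

Real Theano does the same thing with symbolic variables (`T.dscalar`) and `theano.function`, but with the crucial extra steps of optimising the tree and generating numpy, C++ or CUDA code from it.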
Assuming people can see what's going on: basically, you're going to step through these cells. The first thing you'll do is load up Theano. The way Theano works is that, instead of defining X as an actual number, you explain to it that X is a variable that will be a number, and you can then manipulate it. So here we're saying Y is some function of X, where X is as yet unknown — it's another variable. If you ask what Y actually is, well, it's an add function: the "plus one" up here is the head of the tree, because the way you'd compute this function of X is: take X, square it, multiply by 3, then add one, and that's your answer. So if we pretty-print Y — okay, this is not very pretty, but it's what I just said: this is the tree it's manipulating, the graph of the computation. But we haven't yet evaluated Y for any given X; this is just what Theano has created as a tree to represent the computation behind Y. Obviously this computation could be enormously complicated, and that's what it's saving us from. So if we evaluate Y at X equals 2, Y is 13, which is hopefully the right answer. You saw that when I clicked go on that, there was a pause: having constructed this tree, it was writing out numpy functions connected appropriately, then it went away and executed those numpy functions on X — and it may have compiled that in C++. If we'd told it this was to run on the GPU, it would have written it to the graphics card in CUDA, done it on the graphics card, pulled it back and told you the result. So there's some cool stuff here. This is compiling functions: not only can you say "I want a simple relationship", you can ask for more complicated functions of multiple iterations — and this is telling you exactly
what it's going to do. If you look, it's done something interesting here, because Theano knows what libraries you have installed: it's detected that I've got a BLAS library installed, and it's mapped this onto a matrix multiply with a vector add, because that's what you'd want, compared with doing all of these operations manually. Because it's got control over the whole expression, it can output efficient code. If we keep going along, we've got lots of different tensor types, we can do funky indexing — there's a lot more functionality in Theano, but for our needs all we need to know is that Theano lets us do all this computation beautifully. So let me just kill this. Now that we've got some computational machinery, let's go back to one of the older examples. This is a test set from the 80s, used by the US postal service I think, and it was originally considered quite tough. It's a training set of 50,000 images of numbers written in boxes on letters; they're 28 pixels square. The interesting thing is that it's no longer a useful benchmark — it used to be one that people would compete on; now it's one you test on, like the hello world of neural networks. So what you do with this is take a simple network: now we're talking about 784 inputs (28 × 28) with some hidden units, and then the output is which class am I looking at — a 0, a 1, a 4? There's a thing called a softmax: basically each of these classes votes for "how much of this number am I", and you pick the top one. There is an example of this in your VirtualBox — we'll have a look here. We can just run through the imports; one thing I am importing here — I'm not sure whether I mentioned it yet — is that, on top of Theano, I use a library called Lasagne, which helps with the layers. There are other frameworks on top of Theano; I
like Lasagne because it doesn't try to hide things — if you want to interact with Theano directly, you can always do that. There are some other libraries, like Keras for instance, which has a TensorFlow back end as well, which is good, except it has abstracted everything away so thoroughly that you don't know what you're working with any more, so it's very difficult to get at the mechanics of it. Lasagne is simple in as much as it leaves the Theano exposed. So, what we're doing here, if we go through this thing: okay, this thing already has the MNIST data installed — 50,000 examples, each input 28 × 28. This is what they look like; these are examples of the images. And we'll define some kind of network. Here is the one I've just executed — this is probably worth having a little look at. This is defining l_in, a variable which is an input layer, and l_out, a dense layer acting on l_in with 10 units, whose nonlinearity is a softmax. So this is a very simple network. If we just step through this, then okay, now we have a network. Now we need to define what we mean by a bad result or a good result — we need to set up a function which is "how much do we hate this result". Now, maybe for this one we want to score everything equally — is it as bad to say a 1 is a 2 as to say a 1 is a 7? But maybe you really hate the number 4 and never want to mistake a 4, so you would put a higher weight on that as a penalty. This loss function could be anything you want — people tend to choose simple stuff — but it still needs defining; you can't just assume that everyone's going to want to do it the same way. Okay, now the neat thing in this next box is "grad = T.grad(...)" — this is part of the secret magic which is why you're using these higher-order tools in the first place, because this says that grad is the gradient of the loss with respect to the parameters. So this is working
Concretely, it does all the derivative calculations simultaneously and produces another Theano expression tree, which you can evaluate in one line. So this grad thing is doing something really amazing to do with the chain rule and derivatives, and it makes our life so much easier, because then we can just say: I want to do SGD, using the gradient, on the parameters, with a certain learning rate. That tells it how to do updates. Now I define a training function, which tells it: here are some inputs and some targets, and use the updates we just set up. So here I'm just producing some functions to help me along the way; now I define some batches of training, and finally we can train this thing. Executing this box, you'll start to see some training up here: we've done 10 epochs of training and got an accuracy of 91%, which means you're basically missing one digit in 10. That's not great, but it's a very simple network. We can then ask what it has learnt. These are the weights it's learnt for the different digits, and you can kind of see that it's doing a pattern match on each digit: each output is looking for something, like a hole in the middle for 0, or some loops for 3. Some of it's not so clear, and we know 10% of this is garbage. There are some other exercises here, which I'm not going to drag you through, so we can kill this. That was just a fairly simple network getting about 91%; let's go back to the slides. Okay. Now, the very simple intuition of having these layers of neurons all fully connected to each other gets old quickly, because if you've got a large image and you want to connect every pixel to every other pixel, you've got an awful lot of weights. But the reality is that if you've got an image of a cat, this pixel and that pixel have some relationship.
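What T.grad plus the SGD update rule buy you can be mimicked by hand on a toy problem. This is a sketch under the assumption of a simple softmax-regression model in plain NumPy, with the gradient written out manually rather than derived automatically, and an invented synthetic data set standing in for MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for MNIST: 200 samples, 20 features, 3 classes, with labels
# generated by a hidden linear rule so the problem is actually learnable.
W_true = rng.normal(size=(20, 3))
X = rng.normal(size=(200, 20))
y = (X @ W_true).argmax(axis=1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((20, 3))   # the parameters we train
b = np.zeros(3)
lr = 0.5                # the learning rate handed to the update rule

for epoch in range(500):
    p = softmax(X @ W + b)
    # For softmax + cross-entropy, the gradient of the mean loss w.r.t. the
    # logits is (p - onehot(y)) / n; this is the step Theano derives for you.
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)
    W -= lr * (X.T @ g)     # the SGD update: parameter -= lr * gradient
    b -= lr * g.sum(axis=0)

accuracy = float((softmax(X @ W + b).argmax(axis=1) == y).mean())
```

In the real notebook you never write the gradient yourself; that hand-derived `g` is exactly the part `T.grad` automates, for networks far too hairy to differentiate by hand.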
That relationship isn't so different from the relationship between the corresponding pixels over here: basically, images have local structure. Images are organised, with up, down, left and right, and if I had a picture of a cat and shifted it by a pixel, it would still be a picture of a cat. So what people did is start to apply convolutional filters, which are very much like a Paint Shop filter: it could be a blur, it could be a sharpen, that kind of thing. The trick is that you use the same filter over the entire image to give you another image; you would use several of these filters, and that's your next layer, and because the parameters are shared you've got very few parameters defining it. Here it is a bit more mathematically: the picture on the left-hand side comes through a little filter to the answer. So that's an intuitive explanation of convolutional neural networks: typically you have some convolutional layers at the front, then some fully connected layers at the end, and then your answer. We can have a quick look at this with this example. It's a lot of the same thing, but here we have another network being defined: an input layer as before, then we reshape it from linear into a square, which is the actual picture; we then do a convolutional layer on top of that, with three different convolutions, and then a dense layer on top with the 10 outputs. Same stuff; this is going to run for a while. This idea of doing convolutions on pictures: Paint Shop has it, so it can't be a bad idea. And if you talk to neuroscientists, they may say, well, the brain does recognise edges, so maybe edges are a good thing to recognise, and maybe it's intuitive that these pixels are not related to each other unless they form part of a cat. If you also look at the brain, as you go from the optic nerve inwards, you kind of get different features being recognised at each stage.
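The "same filter over the entire image" idea is easy to sketch. Here is a hedged NumPy illustration using a Sobel-style vertical-edge filter (an invented example, not the notebook's actual layers; like CNN layers, it computes a sliding correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D sliding correlation: one small filter over the whole image."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

# an image with a vertical edge: left half 0, right half 1
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# a Sobel-like vertical-edge detector (like a Paint Shop "find edges" filter)
edge = np.array([[-1., 0., 1.],
                 [-2., 0., 2.],
                 [-1., 0., 1.]])

response = conv2d(img, edge)   # fires only where the edge is
```

The same nine weights are reused at every position, which is why a convolutional layer needs so few parameters compared to a fully connected one.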
Meanwhile, this thing is going to train for a few epochs. Basically, what happens when you start training on huge numbers of images is that the first layer of convolutions will recognise edges, say; the second layer, above those edges, will recognise curves; above the curves you start to get segments and shapes; and above the segments and shapes you start to get pieces. So suppose it's a database of faces: you'll start to get noses or eyes, above that sections of face, and then you'll be able to piece faces together. This all comes out of just matrix multiplies; it's a remarkable thing that it works, but it works. And people aren't doing this in multiple layers purely because of the brain, but it does seem like a reasonable thing to do. One thing you can note about the brain is that a single neuron takes tens of milliseconds to produce a result, so if you can react to a picture of a cat in 100 milliseconds, you've only got about ten layers to go through. So there's a kind of constraint on how deep you really want to go in this very hierarchical way: the brain is doing it to some depth, but it can't be doing it very deep. Okay, while I've been talking (that's a good question) we've done five epochs of training. This is doing something more complicated, so it took a little while, but now you can see that the training accuracy is at 97.9, call it 98%. So treating these pictures of numbers as pictures is a win over treating them as just collections of dots. Let's have a quick look at what we have. We've only got one layer here, so we're just asking what the network is actually producing. In this diagram, the output has three channels before the 10, so we've got one hidden layer, which is a convolutional layer, and you can
actually ask what pictures your filters are producing. Because we've got three channels, we might as well call them RGB, right? You can see that the blue stuff, when it's looking at a 4, tends to be looking at the underneath of the strokes, and the green seems to be looking at something else, so they're looking at different parts, different aspects, of the digit. A 3 is a whole bunch of colours jumbled together in a way that a 1 isn't, so that's quite an easy thing to distinguish. So this featurisation, which it's found just by looking at the 50,000 digits, is intuitively reasonable, which is very encouraging, because we haven't actually had to do any hard work in the sense of telling it how to recognise digits. We've just told it: these are pictures, here are the answers, tell us what you think; we don't even care how it works, really, which is interesting. Okay. So now we're moving up in scale, and this takes us to the end of the 2000s. People are saying: okay, we've got Google-scale data sets and large clusters of computers, and suddenly they realise they shouldn't be working on toy problems; in fact they'd made the whole thing way too complicated, and if you just initialise these things properly it works out much better. So what happened is a competition to do with ImageNet: they've got 15 million images in many, many categories, and the competition is to recognise which of a thousand different classes each picture belongs to. Here I think each row is a different class, so I guess at the top you've got something like soups, then hot dogs, then sandwiches and hamburgers, and then something I can't make out. These are very small images, so it may not be clear what they are unless
you can see them clearly and you're human, in which case it's completely obvious. But if you're a computer, you're given a collection of pixels in 15 million images and told what to say for each of them, so this is tough, and it would be surprising if it worked. What tended to happen is that it didn't really work very well. There were lots of people doing image recognition without neural networks, because you can do fancy filters, object placement, bounding boxes, all sorts of games with standard OpenCV kind of stuff. But then suddenly the neural network people took this over, because suddenly it started working, and it took everyone by surprise. There may still be people doing image processing outside the neural network world, but the network people have got such embarrassingly good results, even though we don't necessarily know how the models work precisely, or hardly at all. So this is the problem, and of course the networks got a little more complicated. Instead of the two-layer network, this is a network called GoogLeNet, which won, or was highly placed in, the 2014 competition. It's got quite a few layers: there's convolution, there's pooling, there are softmaxes all along the way, so it's assembling the answer in some way. Training this could take many days on a cluster of CPUs, but then some bright spark in a bedroom found that he could do it on his GPU; people hadn't realised what powerful cards they had. So suddenly you can train this in a few days, or a few hours with a decent GPU. To train these things you need images and lots and lots of gradient descent, because you're descending little by little, but at the end of the day you have a network with weights, and fortunately we have those: this is ImageNet with the GoogLeNet network, a 27MB
parameter file, and that's in the virtual machine, and this builds the model. Now, the GoogLeNet model is kind of large, as you saw, it has many layers, but it's still fairly simple: a convolution, another convolution, some pooling, layer by layer. Theano is working away on my little laptop loading this thing; your laptops probably aren't doing quite as much of that. Okay, so this is loaded and built. I do have a GPU in my laptop, a weedy GPU, but because of the way virtual machines work you probably can't access the GPU through the virtual machine (it's really advanced hardware work to get that going), so we're just using the CPU, and I'm not sure how many CPU cores you've allowed VirtualBox to use: it might be two, it might be four, it might be one. That's loaded some parameters, and there's some housekeeping in this notebook to show you the different things. Okay, so here is an image. There are some little test images on the disk; we prepare that image, which is pretty much the same as before, and then we print the classes. This just prints which of the outputs of GoogLeNet (a thousand little softmax outputs) it thinks this is, and it says tabby, tabby cat, and that's not a bad answer, it's quite a good answer. It's been shown thousands of examples of tabby cats, it's never seen this one before, and this is what it thinks it is. I pulled these off random pages, and you'll see why. In your virtual machine there's an images directory; using the Jupyter interface you can put any images you want in there and just evaluate the cell, and it will tell you the classes it thinks for all the images in there. So if you want to play around with this at home, it's all right there; it will just pick up the images.
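Printing the top classes from those thousand softmax outputs looks roughly like this; the class names and probabilities below are made up for illustration, since we can't reproduce GoogLeNet's real output here.

```python
import numpy as np

# hypothetical scores for a tiny label set (GoogLeNet has 1000 of these)
class_names = ["tabby cat", "golf ball", "spaniel", "tractor", "crane"]
probs = np.array([0.62, 0.05, 0.21, 0.02, 0.10])

def top_k(probs, names, k=3):
    order = np.argsort(probs)[::-1][:k]   # indices of the k largest scores
    return [(names[i], float(probs[i])) for i in order]

best = top_k(probs, class_names)   # [("tabby cat", 0.62), ("spaniel", 0.21), ...]
```

The notebook's "print the classes" cell is essentially this argsort-and-label step applied to the network's real output vector.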
And so here's what it says: tabby cat, which I approve of. Golf ball: less convinced about that, but you can see why it's saying it. The baby white owl may not be one of the classes it knows about, so it isn't making that great a choice really; it doesn't know, it goes band-aid, nipple. So yes, you can do many things; Simon's Cat is actually quite good. I just found some images, and this is what it thought, which is kind of nice. So that's GoogLeNet, a 2014-style network, trained extensively on clusters of machines. I need to move along, sorry, I need to hurry. It gets more complicated: this is Google's Inception, a 2015 network. There are little units which they replicate, and it's deeper and deeper and deeper, which is presumably why they call it Inception. In fact, we've got the Inception network on the machine, and what I might do is just run all the cells, if that works. You saw that there are units of layers going across; this is the definition of one of those units, this Inception-A, and there's another one, Inception-B, and then you link them all together with this network of units of units of units. When you do this build-network step, it thinks for quite a while, because out of all that stuff it's now constructing numpy code, or, if you had a GPU, CUDA code, and off it goes. In the interest of time, I think we'll come back to this; I have no internet connection, so we'll just see what it did (this notebook can also go out to the internet and find images). Oops, sorry. Okay, so this is on the machine, a fully pre-trained network, tens of megabytes (if you use this in TensorFlow you get a six-gigabyte model; Theano keeps it manageable). So the fact that I've moved along just shows
you the need for speed. If you look at the stats, basically, you need GPUs, because GPUs are fantastically better at this stuff than your CPU is; on the other hand they're less flexible as hardware. And these are now-dated prices for the 700 series; there are much, much faster things coming out from Nvidia all the time. Meanwhile Google, for their Go stuff, have started to produce actual ASICs which do convolutions: they've got an ASIC on a card which will have much better power consumption per computation, so they've actually leapt off the GPU bandwagon for well-known operations. Let's see whether this has got anywhere. Okay, so this is Inception on the same four images. Simon's Cat: well, we thought that was pretty good; it knows this is a cat for sure, and these are the top five results. It thinks this one is a spaniel; my guess is it doesn't know anything about owls, so it can't tell you it's an owl, but it's talking about other dogs, it's talking about a poodle, and I think poodle is probably the best answer it has. Tabby cat again. Then here it thinks it's a kind of dingo, or a husky; it's not doing corgis. Anyway, that's the improvement of a year, in neural network terms, rendered simply: these networks are getting pretty good, and they're getting better all the time. I can't get stuff off the internet now, but there's code in there to do that if you want. So, having got this thousand-class classifier, but knowing that it takes a long time to train such a network, is there anything interesting to do? What can you do if you've got pictures of stuff it doesn't know about? Do you want to train from scratch? Do you have a million images of whatever it is? There are tricks you can do. I have another example here, which we're not going to go into in the interest of time, but it is in the machine: you can use these networks to classify images they don't know anything about, and each will produce a
whole collection of guesses. You can then say: for these images I actually want these other classes, and you can do a single SVM classification on the outputs. This is something which does work. The example here is that I've got a whole bunch of old cars and modern sports cars. ImageNet doesn't train on that distinction, but you can label them yourself and then do an SVM classification on the network's outputs to distinguish modern car from classic car. It works because modern cars are much more angular, so the network will mistake them for cranes, whereas the classic cars are much rounder, so it will mistake them for wheels and tractors. The fact that it makes these particular mistakes is very characteristic of the type of car, and you can then fit an SVM on those outputs to learn the distinction. So this is a nice example, and it's available in the machine. There's another thing you can do: you can take one of these networks and abuse it. Not only can you get the features out of your images; you can tell it what features you want and see what images it produces, by maximising the response. This is known as Deep Dream, which came out maybe last year. If you look carefully at this image, you'll see that it's been told to emphasise how much animal it sees: this is obviously a big picture of a river, but in there it sees lots of animals, it sees some kind of slug. Unfortunately this one is cut off. Be careful when you search for these things, because some of these images you can't unsee. Anyway, there's another thing, also in the virtual machine, called style transfer: it lets you put in a photo of your own, plus an artist's style you would like it done in, and it then matches up your photo's layers with the stylistic layers it derives from the art. Essentially, it's pretty: you'll get your photo rendered as a Starry Night.
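The SVM-on-network-outputs trick can be sketched with a hand-rolled linear SVM. The "feature vectors" below are synthetic stand-ins for the network's 1000-class output vectors (shrunk to 10 dimensions), so this only illustrates the shape of the method, not the actual cars example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are the frozen network's outputs: "modern" cars lean towards
# crane-like scores, "classic" cars towards wheel/tractor-like scores.
modern  = rng.normal(loc=[1.0, 0.0] + [0.0] * 8, scale=0.3, size=(40, 10))
classic = rng.normal(loc=[0.0, 1.0] + [0.0] * 8, scale=0.3, size=(40, 10))
X = np.vstack([modern, classic])
y = np.array([1] * 40 + [-1] * 40)   # +1 modern, -1 classic

# A linear SVM trained by sub-gradient descent on the hinge loss.
w = np.zeros(10)
b = 0.0
lam, lr = 0.01, 0.1                  # regularisation and step size (chosen ad hoc)
for epoch in range(300):
    viol = y * (X @ w + b) < 1       # samples inside the margin
    gw = lam * w
    gb = 0.0
    if viol.any():
        gw = gw - (y[viol, None] * X[viol]).sum(axis=0) / len(y)
        gb = gb - y[viol].sum() / len(y)
    w -= lr * gw
    b -= lr * gb

train_acc = float((np.sign(X @ w + b) == y).mean())
```

The key point from the talk survives even in this toy: the expensive network is frozen, and only a cheap linear classifier is fitted on top of its outputs.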
It's kind of effective, but I can't run it now, there's no time. Now, language processing, another thing we can't go into deeply. One of the things about language processing is that it's all variable length: a sentence is a variable number of tokens, whereas everything so far has been fixed size, we have an image, we have a set of stuff of a set size. For variable length, what do we do? The trick, invented in Europe back in the 90s, is called LSTMs, but it never used to work everywhere because people couldn't train it. Basically you have one unit, which has an internal memory state, and you pass it over your input. You have a known input and some desired output, all the parameters have some bearing on the relationship between the unit's guess and what it should say, and you can still differentiate this whole thing. There's no way you would want to differentiate it by hand, but because the machine has a complete graph of everything it's doing, it will do that for you; and because it's differentiable you can minimise the error, and because you can minimise the error you can make it improve. So this is an LSTM unit. It has complicated internals, but for our purposes that doesn't matter: the machine takes care of the mathematics. There is a natural language example on your drive; it takes a while, particularly since it takes too long to train any decent language model, and I need to find a better example, but basically you get it to read some poetry and produce poetry. It produces essentially line noise at the beginning; then it starts to get the hang of maybe spacing and word lengths, but it's still rubbish; then it gets the hang of forming some words. This is 1,000 epochs in, with a larger network looking at Shakespeare's plays: it actually understands, somehow, that it's got to introduce the characters and have some nice spacing. It's still rubbish, but remember it's being produced one character at a time.
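For the curious, those "complicated internals" are only a few gating equations. This is a hedged NumPy sketch of one standard LSTM formulation, with random untrained weights; the sizes and the weight layout (four gates stacked into one matrix) are choices made for the example, not anything from the workshop's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell: x is the input, (h, c) the carried state.
    W, U, b hold stacked weights for the input, forget, output and candidate
    gates (4 * hidden rows)."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0*n:1*n])   # input gate: how much new information to write
    f = sigmoid(z[1*n:2*n])   # forget gate: how much old memory to keep
    o = sigmoid(z[2*n:3*n])   # output gate: how much memory to expose
    g = np.tanh(z[3*n:4*n])   # candidate values to write
    c_new = f * c + i * g     # the internal memory state the talk mentions
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(2)
n_in, n_hid = 5, 8
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

# run the SAME unit over a short sequence, carrying (h, c) along
h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.normal(size=(6, n_in)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The point is that every operation here is differentiable, so a framework holding the whole expression graph can back-propagate through the entire unrolled sequence without anyone deriving the gradients by hand.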
Producing one character at a time, it doesn't understand anything about sentences; it's been given no preconceptions other than Shakespeare's plays. Another thing you can do, in the same vein: take an English sentence at the bottom, and ask for the French sentence at the top. This is kind of crazy, and it's crazy that it works. At the bottom you have dual LSTMs going back and forth to produce some output, and that output then suggests where the network at the top should be looking in the sentence for its next word. If you look at the English, you've got "economic growth has slowed down in recent years"; in French the words come in a different order, so it's had to switch the order around, translate into the best French words, and actually make the grammar work. But it's all differentiable, and because it's differentiable you can learn it. So if you have parallel texts, you throw the parallel texts into this thing, you apply a graduate student, or a team of graduate students, and you get a translation module. It doesn't have to know about grammar; it's all implicit in the text it reads. This is part of the theory that getting rid of linguists makes these things better, because actually understanding the sentence isn't as important as having more sentences. And this whole thing can be used for image labelling. You've got an image, you get some output, you put it into the hidden state of this thing, and you say: give me the words you think of. Maybe it's an image of a dog that catches a frisbee: the first word it might think of for the picture may be "the", the next word "dog", and so on, generating one word at a time, sampling until it comes to the word "stop", so it generates a little spew of words. Because this is all differentiable, and you've also trained the first part, the ImageNet part, this thing will produce captions.
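The generate-until-"stop" loop can be sketched with a toy deterministic next-word table standing in for the learned, image-conditioned softmax; everything in this table is invented for illustration.

```python
# A toy stand-in for the caption decoder. The real model conditions each step
# on the image features and the LSTM hidden state; here a fixed table plays
# that role so the loop structure is visible.
next_word = {
    "<start>": "the",
    "the": "dog",
    "dog": "catches",
    "catches": "a",
    "a": "frisbee",
    "frisbee": "<stop>",
}

def generate(table, max_len=10):
    words = []
    w = "<start>"
    while len(words) < max_len:
        w = table[w]          # real model: argmax/sample from a softmax here
        if w == "<stop>":     # the special token that ends the caption
            break
        words.append(w)
    return words

caption = generate(next_word)  # ["the", "dog", "catches", "a", "frisbee"]
```

Swap the lookup table for a trained network and the same loop spits out captions one word at a time, exactly as described.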
And that's how this captioning stuff is done; it's insane that it works, but it does work. Now on to the final example, and we're going to have to do this in 15 minutes, I'm afraid, because we all need to finish on time. Reinforcement learning has been one of the big advances, and this is an actual module inside the virtual machine; we'll have to rattle through it quite quickly. Reinforcement learning is interesting because instead of having a training set with answers, the training is ongoing: every time you do something the state of the world changes, so you need to know what impact your actions have and try to discover what good actions are. If you look at Google winning at Go, basically they're trying to choose the next best move to make, then what their opponent would do, then how much they like the result, because the eventual goal is winning the game. People thought this would take decades, but it was done much more rapidly, and Google bought DeepMind because they really wanted to play Go well, or rather they wanted to do advertising well. You can think of advertising as a game where I want to know whether to give you an advert for razor blades, for instance. It may be that some adverts are better than others, or maybe you've seen one too often, or maybe I should challenge you with an advert for dog food. The question is: should I be exploring your tastes and learning about them, or should I be exploiting you, trying to get you to click on the razor blade advert? There's a whole game to play with people, in which you implicitly learn about them, which is why they're interested in this stuff. So here's the setup. At the top you have an agent, which is your actor; at the bottom you have the world. Going down the right, the actor does something which changes the state; you can observe the state, which is your picture, and you get a reward. Or, going the other way round, you decide
what to do in order to optimise your reward. So there's this whole loop going round and round, and there's no supervisor per se: you get a reward, and the reward may be very delayed. In Go you only get a one or a minus one, you either win or you don't, or your probability of winning falls to zero, and that could be hundreds of moves away, so it's very difficult to learn. There's a technique for this called Q-learning, which has been around for a long time, but it was very difficult to use, because people couldn't train these functions. Basically, your Q value is an estimate of the entire future reward from your current state: how much do I like being in this Go position? The reason you want that is because you can then ask: if I were in this other position, how much would I like that? And that one? So, depending on the different actions you can perform, you can decide which is your preferred next board, and choose that action: you do the best action you can, and then you see what happens. Now, it may be that that wins you the game. If it was the last move in Go and you won, that was a good action; and actually the move beforehand was a great position to be in, because from that position I could make the winning move. So, gradually, by updating these Qs, one Q value is related to the next: each step supervises the previous step. You take a Q, choose an action, measure a reward, update the Q. It used to be very, very difficult to think about these Q values and how to train them, but why not make the Q value the output of a deep neural network? You have an input, which is the state of the world, and an output, which is your Q value, whatever that turns out to be, and then you just train it again and again: you let the thing behave on your Atari game and see what happens.
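Tabular Q-learning on a toy problem shows the "each step supervises the previous step" update before any neural network gets involved. The four-state corridor environment here is invented purely for illustration; in the deep version the table is replaced by a network.

```python
import numpy as np

rng = np.random.default_rng(3)

# A four-state corridor: start at state 0, state 3 is the goal worth reward 1.
# Actions: 0 = left, 1 = right. Q-learning is off-policy, so we can explore
# with purely random moves and still learn the greedy Q values.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9   # learning rate and discount (ad hoc choices)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    done = (s2 == n_states - 1)
    return s2, (1.0 if done else 0.0), done

for episode in range(300):
    s, done = 0, False
    for t in range(200):                      # safety cap on episode length
        a = int(rng.integers(n_actions))      # behave at random (pure exploration)
        s2, r, done = step(s, a)
        # one step supervising the previous one:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r if done else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
        if done:
            break

policy = Q.argmax(axis=1)   # the greedy policy read off the learnt Q values
```

The delayed-reward point shows up directly: only the final step ever sees reward 1, yet the bootstrapped targets propagate value backwards until "go right" wins everywhere.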
And gradually these Q values will hopefully become meaningful. What we have on your virtual machine is a strategy game. Rather than tackle Go in the next five minutes, we're setting our sights slightly lower: we're going to play this bubble breaker game, which is kind of like a Candy Crush. I can quickly explain the rules. It starts with a full random grid; you can see that in the left one you've got a red L-shape, and if you click on one of those, all of the connected reds disappear. If you've got two or more of a colour together you can crush them, and as you crush them, everything above falls down, so it just mechanically does this. If you crush a complete column, new columns arrive from the side, and you keep going; it's game over when you've got no two matching bubbles next to each other. I like this; it's an Android game, it's free, and you can spend way too much time playing it, because it is actually more strategic than you might think: you can plan ahead which bubbles are going to end up next to which bubbles, and you also have uncertainty in the columns arriving from the side. So let's go to the reinforcement learning example. This has been written from scratch, and we've trained the whole thing from scratch: this numpy array is your bubble breaker field, this is your bubble breaker board, and this is your bubble breaker board with dots in. Because of the magic of Jupyter, with Python as a back end and JavaScript, we can now play this game, which I like. I haven't become an expert, despite the time I've spent on this, but you can see what I'm working at here. Here I had the potential for doing a good thing, and now I'm stuck; trust me, we can redo this.
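The crush-and-fall mechanics described above can be sketched directly; this is a hedged reimplementation for illustration, not the code from the virtual machine, and the new-columns-from-the-side refill is omitted.

```python
import numpy as np

def find_group(board, r, c):
    """Flood-fill the same-coloured group containing (r, c); 0 means empty."""
    colour = board[r, c]
    if colour == 0:
        return set()
    group, stack = set(), [(r, c)]
    while stack:
        i, j = stack.pop()
        if (i, j) in group:
            continue
        if 0 <= i < board.shape[0] and 0 <= j < board.shape[1] and board[i, j] == colour:
            group.add((i, j))
            stack += [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return group

def crush(board, r, c):
    """Remove a group of 2+ bubbles at (r, c), then let everything fall down."""
    group = find_group(board, r, c)
    if len(group) < 2:            # singletons can't be crushed
        return board
    out = board.copy()
    for i, j in group:
        out[i, j] = 0
    # gravity: within each column, non-zero cells sink to the bottom
    for j in range(out.shape[1]):
        filled = out[:, j][out[:, j] != 0]
        out[:, j] = 0
        if len(filled):
            out[-len(filled):, j] = filled
    return out

board = np.array([[1, 2, 2],
                  [1, 3, 2],
                  [3, 3, 1]])
after = crush(board, 0, 1)   # click the group of 2s in the top-right corner
```

Wrapping this in a score counter and a game-over check (no adjacent matching pair left) gives the complete environment the Q-learner plays against.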
Trust me, once you start clearing whole columns you'll get new columns coming in, and this is terrible: basically I'm going to score mid-500s, and that was not a strategic game. In my history of playing this, I can usually be sure of getting a thousand points, which is like a couple of screens' worth, and I've tried lots of techniques to figure this game out. Anyway, let's move along, but try it with a smaller board. The smaller board works like this; it should be not as difficult, but I'll probably fail. There we go, there's a cleared column... oh, now it's not working. So you can see we're playing the same game on a small board, and there's an element of planning involved, so this is not a trivial game. There's code in there, which I'm happy to talk about, that turns this into features, and there's logic to run a game, and now we're ready to train a network, so this should start training. (Question: what was the model architecture?) So, there's one interesting thing about this. There are five colours, or on a small board four colours. If you just encode the colours as 0 for blank and 1, 2, 3, 4, those are actually bad features, because you could learn more: permuting the colours doesn't change the game, so you should be able to learn four-factorial variants from every position you see (in the main game, with five colours, that's 120), and you want a representation that captures that. So the features I'm using for a given board are: I take an outline of the board, and I also ask, if I shifted the board up by one, which points are the same? If I shifted it across by one, which points are the same? I do that for several different shifts, so it can see how connected the board is, without knowing about the colours per se. So there's some featurisation going on, then a couple of convolutions, and then one output, which is: what do I think the reward is going to be, the reward being how many columns I get in?
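The shifted-board-equality features can be sketched like this. The exact set of shifts used in the real code is a guess on my part, but the important property, invariance under permuting the colours, is easy to check.

```python
import numpy as np

def shift_match_features(board, shifts=((1, 0), (0, 1), (2, 0), (0, 2))):
    """For each shift, a binary map of which cells equal their shifted neighbour.
    Colour identity never enters, so relabelling the colours leaves the
    features unchanged."""
    h, w = board.shape
    maps = []
    for dr, dc in shifts:
        m = np.zeros((h, w), dtype=bool)
        m[:h - dr, :w - dc] = (
            (board[:h - dr, :w - dc] == board[dr:, dc:])
            & (board[:h - dr, :w - dc] != 0)   # empty cells never "match"
        )
        maps.append(m)
    return np.stack(maps)

board = np.array([[1, 1, 2],
                  [3, 1, 2],
                  [3, 3, 2]])
feats = shift_match_features(board)

# permuting the colours (1->2, 2->3, 3->1) gives identical features
perm = np.array([0, 2, 3, 1])[board]
```

This is exactly the trick from the talk: instead of handing the network four arbitrary colour labels, you hand it connectivity, so every colour permutation of a position trains the same weights.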
So that's the whole story, and if you want to know more, the code is right there; this is all on GitHub, so if you've got any issues or problems, raise them there, and we'll get to that. So here we are, with this thing learning something, and you can see that it's learning: it does 100 games in 22 seconds, so this takes a couple of minutes. When we start, the mean score, with what is just a random network, is 214: playing at random scores 214. This thing is now going through training and training, and we're at four minutes to go. Okay, so here we've got to the end, and you can see that the mean score, down at the bottom, is 384, just from learning a Q value. So this thing has definitely learnt to play something, and it's also scored some decent individual scores with this little network. And because you probably want to see how it plays: well, you can see it's way better than I was. This is the network which has learnt to play this game while we've been sitting here, and it's not doing too badly. Okay, it's got to a failed state now, but let's just... something's wrong; oh sorry, I need to re-execute something. It's all very well, but can it do the real deal? Because the real deal is the one on my phone, and I want to know whether it can do that. Because I redefined the sizes, I need to reload the model; and here's a real board, and here it's doing some real play. The interesting thing, from my perspective, is that instead of playing down in the bottom-right corner, it's choosing to play really near the advancing edge, which is tactically something I'd already worked out was a good idea; and it's also playing around with leaving stuff undone, which is interesting. There's stuff to learn about Candy Crush-style games from what it's learnt.
The other thing is that it's actually doing pretty well, and I would probably say it's a better player than I am, which is sad; at least I'd like to think I still have an opinion. Anyway, this was trained on the GPU in about five hours. Obviously it would be nice to run it for longer and see how good it can get; it's still improving after five hours, and it's already pretty good at this game. Okay. What does AlphaGo do extra? This is learning one step at a time, so it only has a look-ahead of one move. AlphaGo looks ahead many moves: they have a tree, they have a way of looking at the right parts of the tree, they played it against itself a lot, and they ran on about 1,200 CPUs and 176 GPUs. So there is more there than we can cover in five minutes. Wrapping up: deep learning can do some cool stuff; having the tools in one place, and actually having a GPU, is really helpful. This is all on GitHub: not only all the notebooks but also the thing which constructs the virtual machine. It's all there because I want to have something, a Singapore-originated thing, whereby you can do deep learning as a workshop in a box, handed out on USB keys. Hopefully some people have managed to follow along; I know it's been very fast. If you like it, please star it on GitHub (that's my KPI, please star it, not for any particular reason, it would just be wonderful to have people star it). If you have any problems, let me know and I'll fix them; if you have any ideas, I'll be glad to hear them. Questions? We've got about two minutes. On finance: you know, I'm not really the one to talk about that. I did finance, in the CDO markets, which had the great financial crisis; now I want to do machine learning, and now, at least, I can choose to. Finance was very exciting and fascinating, and I have nothing bad to say about it, but this seems disjoint: learning to predict the stock market is probably the most difficult thing you can learn, because you've
got a crowd optimising against you all the time so is beating these adults easy because you're just beating one guy beating the financial markets is full of noise data is expensive for AlphaGo they can play millions and millions of games against themselves for the financial markets you haven't got millions and millions of days of data because and every time you use a data point you should be paying for it because you've lost subjectivity you've lost objectivity by looking at data because as soon as you know something doesn't work you know a whole bunch of models not to do because you've done that all the remaining models are coloured so I'm not that optimistic about deep learning or not that I want to stand up about in some ways it's very easy to apply badly so I'm very wary for being machine learning even regular machine learning I'm sceptical on the other hand if I'm trying to recognise speech or text people aren't optimising their reports against me they're trying to communicate information if I learn stuff from Wikipedia because Wikipedians want me to learn it it's not like they're trying to obfuscate what they're telling me unlike to say the financial markets where people are deliberately not telling you what you should know so this is why I like machine learning in a more general there's a lot of stuff to do which cancer is not optimising against my machine learning it hasn't got time to because the financial markets in six months can optimise against me yes if anyone's got a still got a usb key and doesn't want to go to hell then they should hand it back sorry sorry so one of the my day job is involved in using deep learning on natural language but it's kind of to do that I need to understand sentences and paragraphs and entities all this stuff understanding is a very tough term but to recognise the dependency tree within sentences or entities are doable things but really you're talking about a crossover between the fuzziness which is language and hard facts 
and facts and rules have kind of a different feel to them and so this is one area which is very very interesting and kind of an advancing kind of research so one of the interesting things about pictures is that pictures of cats tend to be stuff you know fairly flexible and near each other in some way but facts are like needles in haystacks you have a correct fact and there aren't nearby facts it's it's a very different kind of game you're playing and so this is a very very interesting thing and it seems to me that the bigger AI goals is not doing this intuitive stuff like finding human intuition or dog intuition better and better and better until it works there's a meta level where you're thinking about facts and manipulating knowledge which is a whole different game so I'm interested in the bigger game this is just cool
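The one-move-lookahead versus deeper-tree-search distinction mentioned above can be sketched with a toy minimax example. To be clear, this is a hypothetical illustration, not AlphaGo's actual method (which combines learned networks with Monte Carlo tree search); the game tree, the scores, and all the function names here are made up for the sketch.

```python
# Toy two-move game, scored from the first player's point of view.
# Each inner state maps to its child states; leaves carry a final score.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}
LEAF_SCORES = {"a1": 3, "a2": -2, "b1": 10, "b2": -5}

def evaluate(state):
    """Stand-in for a learned value function: exact at leaves,
    a shallow heuristic (average of children) at inner nodes."""
    if state in LEAF_SCORES:
        return LEAF_SCORES[state]
    children = TREE[state]
    return sum(evaluate(c) for c in children) / len(children)

def one_step_player(state):
    """One-move lookahead: pick the child the value function
    likes best right now, ignoring the opponent's reply."""
    return max(TREE[state], key=evaluate)

def minimax(state, maximizing):
    """Deeper lookahead: assume the opponent replies optimally."""
    if state in LEAF_SCORES:
        return LEAF_SCORES[state]
    values = [minimax(c, not maximizing) for c in TREE[state]]
    return max(values) if maximizing else min(values)

def tree_player(state):
    """Pick the child whose minimax value is best for us,
    knowing the opponent moves next (minimizing)."""
    return max(TREE[state], key=lambda c: minimax(c, False))

# The greedy player is lured toward "b" by the big 10,
# but the opponent would then force the -5 leaf; the tree
# player sees this and prefers "a" instead.
```

The point of the sketch: the greedy player and the tree player disagree at the root even with the same value function, because only the tree player accounts for the opponent's best reply. AlphaGo's actual search is far more sophisticated (and ran on the cluster of CPUs and GPUs mentioned above), but the qualitative gap is the same.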