If you don't have a machine, I'll post the link after this, so you can follow along at home when you have the video. And I do want the USB keys back at the end, so pass them along. Basically, on the key there are two directories: there's a big OVA file, and there's also a presentations folder. You'll want to copy both of those. If you go into the presentations directory you've copied, you'll find the main presentation plus a variety of browser-based demos, which we can do even if you don't have VirtualBox. And then we've got the whole VirtualBox thing on top. So let's start the presentation, and keep the copying going in the background, because we'll get to the VM later.

Basically, we're going to be talking about deep learning. This is a hot topic. I first did this workshop at FOSSASIA last year, and it's turned into quite a large repo, all open source. There are lots of fun experiments in it, and I've actually prepared something new for today; hopefully it will work.

So, this is about me. I have a background in machine intelligence, start-ups and finance, and I'm in Singapore. I basically spent 2014 reading papers and working with open source, getting back to the forefront of the field, because I'd done machine learning back in the 90s. I wanted to step away from the finance work and do the machine learning thing seriously. Since 2015 it's been serious: I've had a job doing proper natural language processing and deep learning, and worked on some papers, all in Singapore.

So, here's a quick rundown of what can be done now with deep learning. I'm sure you've seen a lot of this before, so I'll go down the list quickly. Speech recognition: it's been in the cloud for years, and on your phone since around 2014. Translation: Google's phones can do this; they can even translate text in place on the screen in real time, which is great. This is all deep learning. Google has been capturing house numbers from huge numbers of Street View cameras in other countries; and because they've got CAPTCHAs which people fill in to verify that they're human, Google has at the same time collected data on what the actual house numbers are, by cross-referencing these humans, or supposed humans. So not only have they got this data, they've got a human-annotated data set, and they now recognise this stuff better than humans do.

Then there's image classification. Machines can just look at these images and tell you what they are: this is a container ship, and so on. Google's got this in their Photos app, where it automatically labels all of your photos; same with Facebook. They can also generate detailed captions. So over here you've got someone on a motorbike, two dogs playing a game.
So you've got this thing where the machine can spew out captions which explain the whole scene in the picture, rather than just attaching a single label. We've also heard about reinforcement learning, where it's been applied in AlphaGo. Actually, in the workshop folder in the repo, I've got stuff which addresses all of these things. But today we're going to go from the very basics all the way to something interesting.

The basic thing about neural networks is that the field has been fairly unchanged, fundamentally, since the 1980s, in that you've got very simple units computing very simple functions, but when you combine huge numbers of them, you get something much more complex. So, here is a single neuron. What we have at the bottom are the inputs. These are the features: they could be the values of pixels, or they could be the temperature and the humidity. These inputs are added together with weights. Then we put the sum through this nonlinear function to get the output. Now, this nonlinear function is just: if it's more than zero, keep it; if it's less than zero, make it zero. So you would think you can't do very much with this, and that's true. We'll have a little look at what we can do with it, and why joining these things together makes a difference.

From there you move on to a multi-layer neural network. This has the same kind of inputs at the beginning. Each unit in the first layer has weights to all of the inputs; same with this one, but its weights will all be different. Similarly, this unit in the next layer has connections only to the units in the layer before. So the final output is a combination of the layers before it. Each of these pieces is a very, very simple function: a weighted sum, then max with zero. So we've only got something very, very simple. How can you actually make this turn into something useful?
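To make that concrete, here's a minimal numpy sketch of the single unit just described: a weighted sum of the inputs pushed through the keep-it-if-positive nonlinearity (ReLU). The inputs, weights and bias are made-up illustrations, not taken from the slides.

```python
import numpy as np

def neuron(x, w, b):
    """One unit: weighted sum of the inputs, then the nonlinearity
    described above (keep positive values, clip negatives to zero)."""
    return np.maximum(0.0, np.dot(w, x) + b)

# Two inputs (say, temperature and humidity) with illustrative weights:
x = np.array([0.7, -1.2])
w = np.array([0.5, 0.3])
print(neuron(x, w, b=0.1))  # max(0, 0.5*0.7 + 0.3*-1.2 + 0.1) ≈ 0.09
```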
So, what we can now do is go to the first of the demos, which is this TensorFlow Playground. If you've got the stuff on your machine already, you'll find it in the presentations folder; you should easily find something like this, and you can click on the tensorflow thing. Is everyone following along? Has someone not got a USB key yet? If you haven't got a USB key, this is actually something which Google put up on the web; it's called the TensorFlow Playground, so you can just find it online. The reason I've got it on the USB key is that last year the FOSSASIA Wi-Fi was terrible, so everything on the key works without an internet connection.

If you click on this, we get to this page. Does anyone fail to get to this kind of page? If so, I'm going to move on quickly. It's already set up, and it's a neural network playground. The task we're trying to do is this thing over here: a set of orange dots and blue dots, and the idea is that we want to create something which separates them and predicts which regions are orange and which are blue. What we have to play with are just two features: one tells you how far left or right a point is, and the other how far up or down it is. Those are the only two features we're allowing. So if we combine these two features, the only thing we can get is a diagonal line.

If you click this training button, it says: I've got a weight from this feature to my output, and another weight from that feature to the output; I'm going to adjust these weights until I make these regions as good as possible. So we can start this. Well, at first it's completely the wrong answer: it actually chose the weights at random, the wrong way around. So when it picks one of these blue dots and says, I think this should be orange, it then says: I got that wrong. Why did I get that wrong? Well, I probably paid too much attention to this feature here. So it starts saying: if I paid too much attention, I should be changing this weight. It knows which weights to blame for getting this wrong. At every step, if a point is coloured wrong, it feeds back to these weights to adjust them: I prefer this feature, I don't want that one. What happens, fairly quickly, is that this trains to categorise this properly. So this is a very simple blame game: we're trying to get all of these things into the blue region and all of those into the orange region, and whenever it gets one wrong, it puts a demerit on the feature which gave it the wrong hint, and a credit on the one which worked.

Now, this is fine until we pick a data set like this one, which is like quadrants, like a checkerboard. And the question is: what is the best line that separates these two? Because all we can do with two features and two weights is generate a line. If you try to train this, it's basically not going to do anything: there is no way you can separate this checkerboard with a single line. So here's where you say: what I really should do is have some hidden units. Let's just start with two. If I add a layer in the middle, each of these hidden units can produce its own line, and the output can be a combination of two lines. So that may be something where it could figure something out; and it's beginning to get the idea. In fact, it can't do this perfectly, because this is a parity problem; but it's been able to say: well, I've got the lower bit wrong, which I'll ignore, but I'll fix up the other bit. So it works by using two lines to try to do something. Equally, we could add some more lines. So here, essentially by putting the right blame on the right weights and deciding which features to use, it can solve this thing by using the hidden layer in the middle. So, that works well.
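Here's a small numpy sketch of that blame game on a checkerboard/XOR-style problem: one hidden layer of ReLU units, with the error pushed back through the weights as demerits (plain gradient descent). The layer sizes, learning rate and step count are illustrative, and a random start isn't guaranteed to converge every time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Checkerboard/XOR-style data: label is 1 when x1 and x2 disagree.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer, 4 ReLU units
W2, b2 = rng.normal(size=4), 0.0                # output unit

lr = 0.1
for step in range(2000):
    h = np.maximum(0.0, X @ W1 + b1)            # forward pass
    out = h @ W2 + b2
    err = out - y                               # how wrong each prediction is
    # The blame game: push the error back through each weight (chain rule).
    gW2 = h.T @ err / len(X)
    gb2 = err.mean()
    gh = np.outer(err, W2) * (h > 0)            # demerit passed to hidden layer
    gW1 = X.T @ gh / len(X)
    gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

h = np.maximum(0.0, X @ W1 + b1)
print(np.round(h @ W2 + b2, 2))  # typically approaches [0, 1, 1, 0]
```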
But let's try something else: this data set is a circle with a ring around it. So we reset this and try again. It's had to work a bit harder, but if we look at what it's doing: when it looks at one of these orange points and gets it wrong, the feature which gave it the wrong idea gets a demerit. And because of the demerits on the weights passed back here, a hidden unit can say: well, I wouldn't have earned that demerit had I not been given the wrong information from the level before me. So at each level, the units not only have the weights into them changed, but also get an account of how good the information they passed on was, how well they contributed to the whole outcome. This lets each of these things have its own demerit system. That is how the error is back-propagated through the network. The signal is forward-propagated through the network, and because the blame game puts an error on every weight, you can pass the demerits back through the whole network. This is what back-propagation learning is.

And here's a fun little one. It's not clear whether this will work; it probably won't. But it illustrates that not only can you vary the weights on the features, because there are other input features you can choose here, the network can also create features which could be useful. So this is just to get the idea of features, and the ability to generate these internal features.

So, to sum up what we've just run: the goal here is supervised learning; we're trying to learn to predict what the inputs mean, in terms of blue and orange here. We can choose which input features to use. We've seen what's inside a neural network, which is hardly anything: very simple units. We've seen how the blame game is played with back-propagation, and how, through that blame game, the network creates internal features which are useful for making this work. Those are the takeaways from this.

So, let's talk about something where deep networks have had real success: image classification. Through the 2000s, OpenCV was the classic image recognition and image processing library, and people would hand-build lots of different features to look for. If you were trying to detect a cat, you might build a specialised fur detector, a specialised eye detector; you'd build an arsenal of features and combine them to make the best possible detector. That changed in 2012. Since then, the deep learning people have said: instead of hand-picked features designed by humans, we should let the output drive, back through the network, which features we want, and let the machine determine every feature that goes into this. This is one of the things which suddenly made deep learning a huge deal: deep learning just took over those competitions.

To understand what's going on with these image tasks: instead of the orange-and-blue-dots thing, we're now looking at a whole image. This was previously considered something which humans could do but machines never would, because what is in an image is just so fuzzy. But one of the nice things about images is that they're actually organised. You've got the idea of up-down and left-right, so the pixels are related to each other: the pixel one above is closer than a pixel elsewhere in the image. You've also got the idea that the cat could be anywhere in the image.
Basically, images have some organisation, compared to just x1, x2, x3. So the idea of what's now called a CNN, which is a slightly different kind of neural network, though it's all the same machinery, is to use the whole image as our input, and the parameters that we're going to twiddle are basically the elements of a Photoshop filter. In Photoshop, you can have a sharpen filter, you can have a blur. The mathematical term for these Photoshop filters is a convolutional filter, or convolutional kernel; a CNN is a convolutional neural network. The idea of a CNN filter is that you have this little matrix; here's your input image, and I'm going to pass this little matrix across it, multiplying and adding up, to produce an output image. The point is that these numbers are the parameters of my convolutional filter for this layer. So the parameters here are just translating one image into another image.

To see that more clearly, we can have a play with a convolutional filter, which is the next demo. Here is something which hopefully you've got; it's also available on the web, though I only made it last night. This is our input image, which we can play with, and this is our output image. If I start to change some of these parameters (this is a 3x3 convolutional kernel, one convolutional layer), I can just change the numbers. You can see that by changing them I emphasise different pieces of the image: I can make it lefty-righty, or I can look at edges; I can make it blurred; I can make it sharp. But it's not my job to make this happen, because I can let the neural network play the blame game on what features it wants to have as input.

Back to the presentation. Here is the workflow going on inside a big CNN. This is our input image, and the first convolutional layer is a whole series of images which are basically the car with different filters applied. Then you put this nonlinearity after it, and then you do the same thing again: you say, OK, let's have another layer; let's combine the blurred version with a sharpened version. I'm not sure whether you've played with Photoshop, but if you blur an image and then sharpen it, it's not the same image. So you can imagine that if you've got a righty image and a lefty image and you combine them, you can start producing many more different kinds of images, and you might start highlighting different elements. If you look at this in detail, or at some of these pictures you can find online, you'll see it starts here with just outlines and vertical lines and different aspects of pixels; but as you get over here, it starts to pick out shapes. These layers will be very responsive to circles, which mean car wheels. If you train on a whole bunch of cars, it will start to pick out features which are useful for recognising cars, and it will do this purely automatically.
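As a sketch of what one of these layers actually computes, here's the 3x3 filter idea in numpy/scipy; the image and kernel values are made up (this kernel happens to be a classic sharpen), and the 2x2 max-pool at the end previews the squashing step described next.

```python
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(8, 8)          # stand-in for a grayscale input image

# A 3x3 kernel, like the numbers in the browser demo. This one is a
# classic sharpen filter; the network would learn these 9 numbers itself.
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)

feature_map = convolve2d(img, kernel, mode='same')
activated = np.maximum(0.0, feature_map)   # the same ReLU nonlinearity

# The "squashing" pooling step described next: a 2x2 max-pool.
pooled = activated.reshape(4, 2, 4, 2).max(axis=(1, 3))
print(img.shape, feature_map.shape, pooled.shape)  # (8,8) (8,8) (4,4)
```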
So, to complete the flow: you take all of these image pictures, and there are also these things called pooling layers, which are basically a squashing operation. You take your whole image and do a 2x reduction. At the end of it, you get some rather small images which are basically saying: there's a wheel in this one, and there's a headlight in this one, and so on. It can't necessarily explain what the terms are; it doesn't tell you. All it says is: in order to predict that this is a car, which is the output I'm striving for, I would like there to be wheels and headlights and so on. Or whatever it is.

There's a question: you said it's a convolutional transformation, the same filter applied at every position; how do we know it's doing that, is that adjusted from the data? Well, by construction: the only thing I'm allowing it to do is apply this filter. It's not going to learn a separate weight for every pixel; it's that one filter with its nine weights, so it has no option. And in particular, you'd never have just one filter; you'd have 64 filters or something, because you want it to have a palette of different things. The input here is actually a three-plane thing because it's got three colour channels; this would then be, say, a 16-channel image, and this another 16-channel image. We do it this way because we like the translation invariance: we know that an eye looks like an eye wherever it sits in the image, so we're going to exploit that.

So, now let me explain why this has been so successful. There are competitions for this, and the ImageNet competition is a big deal in image processing land: 15 million labelled images, 22,000 categories. People are playing a game to win this thing; it could be Google playing, it could be Microsoft Research. Since the whole deep learning thing, which is still only about four years old, people have been pouring money into winning this game, and that is advancing the state of the art extremely quickly.

The essential game here is to take an image. This is a visualisation by a guy called Karpathy, who was at Stanford and is now at OpenAI. This is a picture of a dog, and each of these strips is a different class from the stuff in the ImageNet database. Going across here are the model's guesses for the image, with their confidences. The way the competition is scored is top-5 error: the model gets five guesses, and it counts as correct if the right class is among them, because many images genuinely contain more than one thing. Before deep learning, the best systems were at about 25% error. In 2012, the first deep network brought that down to 16%, and since then the error has kept falling every year.
And Karpathy actually sat down and did this task himself: a careful human comes in at around 5% error. So that's the benchmark the models have been chasing. The model we're going to play with is GoogLeNet, which won this competition in 2014; it's a deep CNN.

If you haven't got VirtualBox, then you can't do this VirtualBox part. What I'll do is import the appliance; for this, I just bring in the OVA. So, I've produced this; fingers crossed it works. What's shipped here is a VirtualBox appliance which will load and run. The VM contains Fedora 25. It has TensorFlow set up, it has Theano set up; it has data sets; it has pre-trained models. It's got Jupyter, which, if I do this... So this is the first time I've run it, and everyone else should be able to see the same thing: no matter what machine you have, we have to get to this login page. Let's go back to the presentation: on your localhost, you can go to this address. Basically, we don't have to log in; the login page is misleading. If you do want to log in and have a look around, the username is user and the password is password. But it's not designed to be secure, and there's no need to go into it. There are also instructions for SSHing into it, now you know the user and password; you'll get a console, and there are various things in there. It runs TensorFlow as well; it does stuff. On the other hand, it can't attack your machine, so it's fine to bring it in.

Here, what we're looking for is this ImageNet notebook, the GoogLeNet one; you'll see it as number three, ImageNet with GoogLeNet. What I'm going to do is just run the whole thing. Normally, I would go through this in some detail, because one of the things I've been doing is explaining how these frameworks work. The reason people can piece together these neural networks so casually now is that the open source movement in particular has produced not just the data sets, but these frameworks which enable you to just piece the networks together and have the whole blame game implemented for you. The blame game is really the derivative chain rule, and the frameworks can calculate the chain rule through any kind of network. This one's implemented in a thing called Theano, which is a Montreal university project. Then there's the Google thing, TensorFlow, or MXNet, or CNTK. The university research stuff is really good, but it's basically a series of duct-taped graduate projects, whereas the newer frameworks, Google's in particular, started out with a much bigger overview of what's needed, as actually engineered products.
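Since the notebook is Theano-based, here's a tiny sketch of what "the framework does the chain rule for you" means there; the toy loss and the numbers are mine, not from the notebook.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')                              # symbolic input
w = theano.shared(np.array([0.5, -0.3]), 'w')  # weights to be learned

loss = (T.dot(w, x) - 1.0) ** 2   # toy loss: squared error against 1.0
grad_w = T.grad(loss, w)          # Theano derives the gradient (chain rule)

# One compiled function does a forward pass plus a blame-game update:
step = theano.function([x], loss, updates=[(w, w - 0.1 * grad_w)])
for _ in range(50):
    step(np.array([1.0, 2.0]))
print(w.get_value())              # w has moved so that w.x is close to 1.0
```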
What this notebook does is load up some stuff. It imports the model; there's a model definition here, and the model we're looking at is this rather deep neural network. The code defines the layers, then opens a big file which populates those layers with the pre-trained model values. There's a function for preparing an image, and there are images on the device. Then basically all you need to do is ask: given the features the CNN computed, what is the most likely of these classes? It says tabby cat. That's a good result. So here we've got a neural network which is fully open: the whole thing is open source, you can go in and have a look, and you can play around with it a bit. One disadvantage is that if you change the structure, you lose the ability to use the predefined parameters. The nice thing about pre-trained models is that you can download them for free, whereas training one of these ImageNet things yourself can take a month of GPU farming to get a good result, so it's worth making use of other people's work.

There are other scripts here which enable you to drop image files into a directory so it classifies everything it sees. It calls this one a tabby cat; it's got the golf ball; this one is not great, but I don't think it actually has owl as one of its training classes anyway; rabbit it probably gets okay; and it even makes something of Simon's cat. So this is a 2014 state-of-the-art model applied to photos I found on the internet, and you can play with your own photos, whatever they are.

But wait, there's more. The next year, Google came along with Inception v3. This is a much larger network with even better performance, and there's also a pre-trained copy of Inception v3 over here; except instead of taking a quarter or half a second to run, it's going to take five seconds or so, because there are a lot more ops going on in it, a lot more flops.

And then there's this trend of going deeper and deeper. You start back in 2010 with comparatively shallow networks; this axis is going forward in time, and this is the performance. Before deep learning, it was at about 25% error. The first deep network came along and took it down to 16%, with 8 layers. Then 19 layers, then 22 layers, which is the GoogLeNet we've been sitting on. But most recently, hats off to Microsoft: it's now beating human-level performance with a 152-layer monstrosity called a ResNet, which we haven't got on here because it would kill every machine in the class. That's a really deep idea, and my guess is there will be more such ideas. One of the problems with scoring above human performance is that it's very difficult to get any more training data, because who do you trust to label better than a human? This is one of the reasons all of this stuff can get up to human level; and if you have committees of humans, you can exceed a single human, but to beat the committees, who do you trust? So that's a problem. So, what we've seen so far is that CNNs are good at images.
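The workshop scripts themselves are Theano-era, but as a sketch of the same load-pretrained-and-classify flow, here's the modern Keras equivalent with Inception v3; the file name cat.jpg is hypothetical.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights='imagenet')      # downloads pre-trained weights

img = image.load_img('cat.jpg', target_size=(299, 299))  # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Top-3 guesses with confidences, ImageNet-competition style:
for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
    print(label, round(float(score), 3))     # e.g. tabby, Egyptian_cat, ...
```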
And hopefully you found that interesting: the fact that your machine went from zero at the beginning of the hour to recognising Simon's cat is pretty amazing, since it's very difficult to distinguish a Simon's cat from a tabby cat without talking about stripes, which are much more abstract human concepts than angles and pixels. So these machines can cover a huge area of research, and a very commercial area of research too, and CNNs have come a long way.

So, if they're so good, why not use them to do things they weren't meant to do? The example we're going to do here is speech recognition, which is not really what you should be doing with this stuff; but these things are so flexible, why not? Basically, we're going to turn speech recognition into an image recognition task, and then solve the image recognition task.

On your VM there's a folder called speech, and under that a thing called data. This is divided into two pieces, one of which is the thing that prepares the data sets. If you're into any kind of data science, you'll know that this is the 80% problem; the 20% problem is actually learning to do the thing, but the data collection is the pain. I have to admit that last weekend I did not know what I was going to do here at all; I was still searching around for an idea. On Monday I decided, let's try speech recognition; Monday night I kind of figured out spectrograms; Tuesday I was collecting some speech. What I did is I have a nice little voice recorder app on my phone which produces WAV files. When you get a WAV file off your phone, it looks like this when plotted. At the beginning you can see this is me fumbling with the phone, and this at the end is me fumbling with it again, so I want to crop those off. These are the digits: this is my version of MNIST for audio recognition. So this is 0, 1, 2, 3, 4, 5, 6. I've created a crop tool which lets you crop this down; this speech is just a series of numbers, a numpy array, so it's easy to crop, and having done that, you can save it back.

Then what I've got, and I'm trying to lay out how this all works so you can play with it yourself, on your own data, it's all open, is a function which computes a spectrogram. It takes the samples, takes out the mean, applies a Hamming window, does an FFT, as a moving function along the recording. So here's what I get with my rather naive spectrogram. At the bottom there are low frequencies, up here high frequencies, and this blue line is roughly the amplitude. I then want to crop this into individual words, because rather than take the whole recording as my image, it's easier to snip out isolated words. If you're into speech, you'll know that I'm cheating here: I'm doing this on isolated words, which is so much easier than continuous speech, because with continuous speech you have to figure out where the word boundaries are, and here I've got very simple, nicely separated words. That's where these other lines come in: they're the detected word boundaries.
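A sketch of that naive spectrogram step, using scipy rather than the workshop's own code; the file name six.wav and the window sizes are assumptions.

```python
# A naive spectrogram along the lines described: slide a Hamming window
# over the samples, FFT each chunk, keep the magnitudes.
import numpy as np
from scipy.io import wavfile
from scipy import signal

rate, samples = wavfile.read('six.wav')   # hypothetical recording
samples = samples - samples.mean()        # take out the mean (DC offset)

freqs, times, spec = signal.spectrogram(
    samples.astype(float), fs=rate,
    window='hamming', nperseg=512, noverlap=384)
log_spec = np.log1p(spec)                 # compress the dynamic range
print(log_spec.shape)                     # (frequency bins, time steps)
```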
Another nice thing you can see: for 6 and 8 I've got two distinct pieces, or several distinct pieces, in the spectrogram; there's 5; and this is 0, 1, 2, 3, 4, 5. You see this little tick here? That's the 'i' sound; same with this one, which is 9. So you can see, just from the spectrograms, that at this point I was thinking: OK, maybe this could work. So I was greatly encouraged on Tuesday. I then have this contiguous-region detector which slices it all up into nice little chunks.

The other thing is that I also spent a lot of time looking around at all the open data sets. It's not so easy in speech, because there's a lot of closed data, a lot of non-commercial licences, a lot of restricted use; this is why I started doing it on my phone. There are huge data sets, but they're all encumbered, so I couldn't really hand them out on the keys. So this tool lets you collect a fair amount of speech, and it's got this idea of having a prefix, so these sentences are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. There are some animals below, which are extra, and then some other things which are interesting. 'The quick brown fox jumped over the lazy dog' has all 26 letters of English in it; but there's also a 'quick beige fox' version, which has every phoneme in the English language in close to the shortest possible sentence. These would be really interesting data sets to collect huge numbers of recordings for, because then you'd have examples of every phoneme in just one sentence; and there's a whole bunch of those, other classic passages like the North Wind one. There's interesting stuff to do there, but by this point I was down to three days left.

There's also something great from Bing. Bing has a very nice speech API, and of course you can get a speech API to say your word lists, creating free data for speech recognition; so thank you, Microsoft, there's some Bing-generated speech on here. The thing is, it produces beautiful words, but there's only one variation per voice.

Now let's move on to actual python_speech_features. This is an actual package by people who know what they're doing, so theirs is better than my slightly worse version. Here are its nice spectrograms, the same style, but because they're proper speech people, they've made the spectrograms correspond to what you hear rather than what the raw FFTs say. And I wanted it to look like what you hear, because I'm going to do vision on it; so looking right is a good sign for me.

I've got some other code which helps build the data set. Here, I said this stuff 15 or 16 times, and Bing has also contributed Catherine, Linda, Ravi, Susan, George, Sarah and Benjamin, who are machine voices from different countries, so that's more labelled data. Then I've got a thing which converts the WAVs into 'stamps': I combine them and make the stamps all the same size, so I can just do vision on them. You can see what word this is: this is clearly a 6. 6 is interesting because it's got the 'x' sound, so it's very clear what's going on, which is another good sign. So here are all my 6s. And I also want some test data, so I keep my test data separate; you can see the directory structure, and here's what the digits look like for the test data.
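For the python_speech_features version, a minimal sketch using the package's log filterbank features; the file name and the number of filterbanks are assumptions.

```python
# The proper-speech-people version: log filterbank energies, which look
# closer to what you hear than a raw FFT spectrogram does.
from python_speech_features import logfbank
from scipy.io import wavfile

rate, samples = wavfile.read('six.wav')          # hypothetical recording
feats = logfbank(samples, samplerate=rate, nfilt=32)
print(feats.shape)   # (time frames, 32 filterbank channels)
```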
And now it's time to train and run. By the way, there's also some other stuff in here, maybe for another day; there isn't enough time to really see everything that's going on. Basically, we're now going to train our own CNN to recognise these stamps. I have labels; I don't have much data, maybe 18 or 20 of each of these stamps; and I'm going to try to make it tell the difference. When I described this to a Google guy I met, I've got maybe 20 examples per class and it's got to train in less than five minutes, he was like: you're definitely joking.

So what we've got here is the actual definition of the CNN. We've got the input features, which are the stamp itself; we then do a convolutional layer (you can see this is TensorFlow, it says layers, 2D convolution), then a pooling layer, then another convolution, another pooling, and then a dense layer. This is probably 90% Google's own example code for doing MNIST, the digit image recognition task. We have a dropout layer, which helps stop it overfitting, and then there's some boilerplate to make it all work. Because I said train-all, this thing has now started processing, and you can gradually see the training cycle going on: it's done 1400 steps, each step is 20 examples picked at random, and you can see that this loss number started at about 2 and is now going down; it bounces around a bit.

The next thing: along with each of the training examples, I store a random number between 0 and 1. Instead of having a strict partition between training and validation sets and messing around with lots of different files, I have one data set and I partition it based on the value of this variable. If I want 90% of them for training, I just check whether the random number is less than 0.9; it means I only need to manage one data set.

So let's see what's happened. After the training, I ran this evaluation, and I have an accuracy of 1, which means it guessed all of my digits correctly, which is amazing; but that's on my validation split. So let's see how it does on these predictions from the test set. This is speech it has never seen: quite a lot of me speaking, but also some other voices. I would love to collect voices from around the room and put them all into one beautiful data set, and then do a bit of multi-speaker; I don't even know whether it will do multi-speaker. And this is the confusion matrix of the thing: it's 99% sure that my 0 is a 0; it's not so clear about 1, which it thinks could be a 5. So here's the test set illustrated: we've trained this thing, it takes about three minutes to train; if we trained it more, it would certainly get better, but there's no time. It has a reasonable idea of most of these digits; on number 1 it's very confused; with more training it would get that too. So, in a nutshell, we have speech recognition live on your machines; you can train it yourselves, and if you do two or three more rounds, it will become close to perfect.
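Here's the shape of that model, sketched in the TensorFlow 1.x layers style of the Google MNIST example it was adapted from; the filter counts and sizes are illustrative, not the exact ones in the repo.

```python
import tensorflow as tf

def model_fn(stamps, training):
    # stamps: [batch, height, width, 1] spectrogram "stamp" images
    net = tf.layers.conv2d(stamps, filters=32, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
    net = tf.layers.conv2d(net, filters=64, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
    net = tf.layers.dense(tf.layers.flatten(net), 128,
                          activation=tf.nn.relu)
    net = tf.layers.dropout(net, rate=0.4, training=training)  # anti-overfit
    return tf.layers.dense(net, 10)    # logits for the ten digits
```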
There's another interesting thing which I started to play with, though we haven't got much time. We've heard the animals: cat, dog, fox, bird. What you can do, instead of building a whole new training set for cat-dog-fox-bird, is feed these words into my digits network and see what they look like in terms of numbers. I have to say the raw output doesn't encourage you: bird looks like somewhere between an 8 and a 9, which is kind of weird. But when you classify these with the wrong network, you get a kind of fingerprint. It's a fingerprint of errors, because none of the answers are correct; but you can then train on that fingerprint. You can look at it in more detail by taking the logits of the network, and you can then train an SVM on them.

So here's what it gets. I put in some training phrases as well: I have five each of cat, dog, fox, bird, which I used to build the classifier, so I've got only five examples of each to train the SVM on the errors. And then I put in the test words: classifying cat, it thinks it's a dog; dog, it thinks it's a dog; fox, it thinks it's a fox; bird, it thinks it's a bird. So, just by training on the errors, it has essentially learnt to recognise words which it was never taught in the initial training. We've only ever trained the CNN on digit speech, and we've exploited that nicely trained model to do stuff we never trained it on. This opens the door to why people use the big CNNs trained for ImageNet to recognise T-shirts and all kinds of other things, just by manipulating the outputs. So that was that.
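A sketch of that trick with scikit-learn: train an SVM on the 10-dimensional fingerprints of logits. The fake_logits function below is a stand-in for actually running a stamp through the digit CNN, so that the sketch runs on its own.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_logits(label):
    """Stand-in for running a stamp through the digit CNN: each animal
    word gets a characteristic (noisy) 10-dim fingerprint of logits."""
    centres = {'cat': 0, 'dog': 3, 'fox': 6, 'bird': 8}
    base = -np.ones(10)
    base[centres[label]] = 2.0
    return base + 0.3 * rng.normal(size=10)

animals = ['cat', 'dog', 'fox', 'bird']
# Five examples per animal, mirroring the five recordings used in the talk:
train_X = np.stack([fake_logits(a) for a in animals for _ in range(5)])
train_y = [a for a in animals for _ in range(5)]

clf = SVC(kernel='linear').fit(train_X, train_y)   # SVM on the fingerprints
print(clf.predict(np.stack([fake_logits(a) for a in animals])))
```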
So, to wrap up: deep learning may be somewhat hyped, but it's kind of cool what it can do, and the field is advancing very rapidly. In a way, the CNN thing, in my mind, is fairly done. There's a whole bunch more to do with speech recognition, and with recurrent neural networks and tree structures there's a whole lot of interesting stuff to learn. There's a lot of data out there. Having a GPU would be very helpful, and that's one of the reasons there's a price to doing this seriously, but you've seen from your laptops that you can get somewhere without one. This is all open source; the repo is called deep-learning-workshop.

I should point out that me and this guy Sam are running a deep learning meetup group, TensorFlow and Deep Learning in Singapore, hosted by Google. The next one is Monday; it fills up rather quickly, though hopefully spots free up. It's a mix of some TensorFlow, which is one of these big frameworks, and a lot of deep learning, which is the driving factor. We typically have three things: a talk for people starting out (I'll probably cut this workshop down into a CNN talk for that); something from the bleeding edge, which I think is going to be generative adversarial networks coming up; and maybe lightning talks, where people show what they've done. We've already had people abusing the models I made in previous workshops to recognise their family. meetup.com is where you can find it. We're also looking to do an 8-to-10-week deep learning developer course. It would be like a Udacity course, but with actual in-person sessions; we'd have some instruction, but also projects, because what we see, as hiring people, is that when someone says 'I did all these courses at university and we had this group project', it's kind of meaningless; whereas if you've actually done a project of your own, that means something. It's like the benefit of standing up at these open source events and showing something you've actually done, rather than being shepherded through a set project. Cost to be decided soon. Any questions?

Question: the deep network finds the best features for the classification, but how do you find the structure, the best structure, of the network? That's called graduate student descent: you basically throw graduate students at the problem until you find the right structure. In this case, I took the MNIST network from Google's code example; it seemed a bit big, so I dialled it down, but I still had 10 classes; I made it smaller so it would train quicker, and it still works.

Follow-up: so behind the convolution there's an assumption of translation invariance; you really have to give that to the network rather than hope the network guesses it. And what insight gave you the hunch that this could work for speech recognition? When I looked at the spectrograms, I could tell the difference myself; that it visually could work was the only intuition. The thing is, if you know what an 'i' sound looks like, you can spot that tick; you could then imagine passing a window over continuous speech looking for 'i's. If you can spot it yourself, then the machine can definitely do it too. One of Andrew Ng's rules of thumb for 'can machine learning do this?' is: if you as a human can do it at a glance, then yes. In other words, don't bank on a task you can do at a glance, because the machine will be able to do it pretty soon. Driving a car is going to be doable, recognising words, radiology: these huge things are going to be very doable. Things which take a lot of pondering will take much more time.

And back to the features question: the guy who recognised his family was basically using ImageNet features to do it, but the things the network picked out about his family may not have been the shapes of their faces, the things you'd expect it to look at. It may have been that his wife's eyes were particularly round, or that she always wears a necklace; it loves the necklace, it doesn't care who the person is. Who knows what it's really doing? You can delve into it, but it's not necessarily what you'd think of as good features. Still, the fact that you can spot something yourself means it probably can too.

Question: both Inception and LeNet seem to have set structures, sets of convolution, dropout and pooling layers, and then Inception has this joint layer where things run in parallel. Yes, and that's also an instance of graduate student descent: one guy figured out that this is a good idea, and the next year it's in all the networks. This is where the Microsoft thing comes in, the ResNet, the residual network. The idea is that you have a fairly shallow network which does a kind of terrible job, and then you fix it up by stuffing in more layers; each extra layer is trying to pull off the remaining error, which is similar to other statistical concepts. So instead of training 152 layers from scratch, they trained a very small thing and then grew it out, and at each stage the errors were reducing.
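The residual idea in miniature, as a numpy sketch (sizes and weights are made up): each block adds a learned correction F(x) onto a shortcut that carries x through unchanged, so stacking many blocks doesn't destroy the signal.

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = relu(x + F(x)): the shortcut carries x through unchanged,
    and the two weight layers only have to learn the fix-up F(x)."""
    fx = np.maximum(0.0, x @ W1) @ W2
    return np.maximum(0.0, x + fx)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
for _ in range(10):              # stack many blocks; x still flows through
    x = residual_block(x, 0.1 * rng.normal(size=(8, 8)),
                          0.1 * rng.normal(size=(8, 8)))
print(np.round(x, 2))
```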
That 152 is in some ways a fake number, but it is the actual number of layers which gets executed. The Google people are probably a bit stuck on the nice modular structure they have; it takes someone else to come up with a brutally different idea.

Question: when we ran through your training-set example, the results on your screen were slightly different from what I saw on mine; the cat, dog, fox, for example, it got spot on. I have to say that this morning, before I burned the USB keys, I was still fiddling around, and I thought I'd set the seeds to fix the training runs; so I'm surprised, and actually encouraged, that your machine is different, because it means I've missed a seed somewhere. Is there variation in the learning depending on the random start? Yes: all these networks start out with just random filters, and you'll occasionally see, even in the TensorFlow Playground, that out of your five neurons you get two which have fundamentally the same weights, and those two lines are difficult to disentangle; they just sit on top of each other. If they'd started off skew, you could manipulate them separately; but because they're essentially co-linear, you never get enough gradient to pull them apart. So people with different random initialisations start off in different places, and there's not enough data to force us all into the same place at the end. If you had more than 200 examples, we'd probably all be forced into at least a local minimum that looks similar enough.

So playing with the noise element, the random initialisation, would be useful in some sense? Yes; in fact, the initialisation thing was one of the big differences between the 90s and now. In the 90s, people always used to set their weights to some small number; it was one of those footnotes, oh, we'll just make it 0.1. But if you set all your weights to such a small number, then from layer to layer you're shrinking the whole signal like crazy, which means all of the gradients become minuscule, things go horribly, and the network never learns. People learned that you need to set the initial values in a certain range, and then it all just works beautifully. That was a big, big difference, a 15-year kind of difference, and it comes just from choosing the random numbers differently.

One more question: did you use pre-initialised networks for this training, or pre-trained ones? No, it's just random, from zero. And the data is horribly dirty; I mean, it's me and my phone. The fact that it works is the really encouraging thing.
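A quick numpy demonstration of that initialisation point: with everything set to a tiny scale, the signal vanishes after a few layers, while scaling by roughly sqrt(2/fan-in) (the modern He-style initialisation) keeps it alive. The widths and depth are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
for scale, name in [(0.01, 'too small'), (np.sqrt(2.0 / 256), 'He-style')]:
    x = rng.normal(size=256)
    for _ in range(20):                      # push the signal through 20 layers
        W = scale * rng.normal(size=(256, 256))
        x = np.maximum(0.0, W @ x)
    # "too small" shrinks toward zero; the scaled version stays near std 1.
    print(name, 'std after 20 layers:', float(np.std(x)))
```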
Can we take one more question? Sure. So, the accuracy is 1 because I've only got about 20 validation examples, so each one is simply right or wrong; it's like a cross-validation error. If I had a huge data set, I'd have a much more fine-grained measure. One of the things about the ImageNet data set, which is huge, is that there are also mislabelled images; there's human error in the labelling. Fortunately, unless Bing gave me the wrong words for what I asked it, and I didn't listen to them all, I think I have no errors in my labels; but a typical data set will have errors in the labels, and bad examples too. Big data, on the other hand: Wikipedia, which is a fairly typical machine-learning natural-language-processing training set, is about 6 GB of data. So, thank you.