I've been doing machine learning, startups and finance in New York for a long time; I moved here in 2014 and had a fun year, and basically I was doing machine learning, deep learning, natural language processing, a bit of robots, a bit of drones. Since 2015 I've been working with a fine local company doing serious natural language processing and deep learning; I've written a couple of papers, and I've been having fun doing this kind of stuff.

Quick outline. I'm going to talk about the tools that we use: dense networks, CNNs, RNNs, embeddings, just going over them very quickly. I'll then talk about what the captioning problem is, what sequence learning involves, and about embeddings. Now, in fact, what I started out to do was to make this whole talk about how I would play around with the embeddings: what can I feed in, what can I pull out, what would work best? Because I find the 27,000-neuron output kind of an embarrassment; it would be nicer to do something a bit cleaner, to my mind. Unfortunately, that stuff just didn't work. What did happen, on the other hand, is that apart from the LSTM, the standard sequence model, I decided to do something with CNNs, because there's a nice DeepMind paper, from November I guess, and that would be an interesting thing to do. Having done that, there was a nice Facebook paper maybe a month ago on attention with CNNs, so I was like, okay, I'll do that too. And then last Thursday Google came out with a great paper called 'Attention Is All You Need', so I just had to do that as well. So this talk has turned away from the embedding thing, which was an itch I wanted to scratch, towards building some of the most state-of-the-art models. There could be a demo with a voiceover, though I may well run out of time. All this code is on GitHub, open source; you may see quite a few updates a day, and quite a few updates a night.

So, a quick review; this was done previously in some of our meetups. On Redcat Labs, in the presentations folder, there's always stuff to look at; if you go there, you'll find it. Quick show of hands: does everyone know what this slide means? Does everyone know what one neuron does? Yes? Okay. We're going to combine them into multiple layers. Does everyone understand this? The trick is, how do you train these hidden layers? That's pretty tricky: you've got your inputs, and you've got what your outputs should be, but how do you train the bits in the middle, which you don't know what they should be? So, has everyone played with the TensorFlow Playground? Yes, quite a lot of nodding heads. If you haven't, it's very cool. It's a Google thing, in JavaScript: it lets you play around with some inputs on one side, with your desired outputs on the other side, and then you can fiddle around with how much stuff sits in the middle, press the big training button, and off you go. Works very nicely.

So the main takeaways from all that, which was the introduction to dense networks: what we're trying to do is predict an output for a given input, and we train the network using tons of pairs of inputs and outputs together.
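(As a minimal sketch of that idea, not from the talk's slides: a tiny Keras dense network learning a made-up input-to-output mapping by gradient descent. All the data and sizes here are toy assumptions.)

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Tons of (input, output) pairs; here just a made-up toy rule to learn
x = np.random.rand(1000, 2)                  # 1000 examples, 2 input features
y = (x.sum(axis=1) > 1.0).astype('float32')  # the output we want to predict

model = Sequential([
    Dense(8, activation='relu', input_shape=(2,)),  # the tricky hidden layer
    Dense(1, activation='sigmoid'),                 # the predicted output
])
model.compile(optimizer='sgd', loss='binary_crossentropy')
model.fit(x, y, epochs=10)   # gradient descent adjusts all the weights
```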
And we're going to play a blame game. This is gradient descent, the application of the chain rule. When we have an error in our output, we say: well, why did I get that error? Then we adjust all the weights through the network to try and fix up that error, so that the next time we get the same input, we get something closer to the right output. And we do that tons of times; tons meaning millions, hundreds of millions. That's gradient descent. The neat thing about these deep networks, which never used to work, is that they create features. To produce the output properly, I would really like to have some higher-level features: in a sense, the English sentence input needs to be converted into some meta-language vector, and then I can do something with it. To do that you have to be able to get to higher levels, which, evidently, machines can now do better than linguists.

Processing images; this is my introduction to CNNs. When you've got an image, the pixels are spatially organised, so the idea is: let's treat the whole image as the feature, and apply something like Photoshop filters to it. The mathematical term for this is a convolutional kernel; CNNs are convolutional neural networks. Is everyone familiar with this picture? Basically, we're applying a little kernel all over the picture, the same little kernel everywhere. So instead of having lots of individual weights, this one kernel scans the whole image, essentially converting a picture of a cat into a picture of a cat with highlights. There's a little tool here you can play around with too. Okay, that's my introduction to convolutional networks.

Processing sequences. The thing here is that variable-length input doesn't really fit into any of these previous architectures. So we're going to run a network for every time step, but use the same network at every time step: when we learn one of these mappings for something near the end, we apply that same thing all across time. When you actually run this network, we pass along some kind of internal representation that it makes up all for itself; these are high-level features which it carries forward through time. The idea is that if you can train this enough, it will learn features that are useful, and it can do that because you can still differentiate everything. So here's a picture of a basic RNN, which, as Sam said, is basically one network feeding its internal state back to itself. When you unroll it, you put your inputs in here ('how', 'are', 'you', question mark), and your outputs are interrogative, verb, whatever they might be. This is a one-to-one mapping; it could be parts of speech, it could be something else. But the question is: how do we produce something which isn't a one-to-one mapping? Now, each of those little boxes where I said RNN is in fact going to be a GRU, a gated recurrent unit, which is itself composed of lots of little boxes. There ends the lesson on recurrent networks.
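(A minimal sketch of that one-to-one sequence setup in Keras, with made-up sizes: 100-dimensional word vectors in, ten possible tags out, the same GRU applied at every time step.)

```python
from keras.models import Sequential
from keras.layers import GRU, TimeDistributed, Dense

# One network reused at every time step, passing its internal state along;
# the output is one label per word (part of speech, say). Sizes are toys.
model = Sequential([
    GRU(64, return_sequences=True, input_shape=(None, 100)),  # state flows along time
    TimeDistributed(Dense(10, activation='softmax')),         # one output per word
])
model.summary()
```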
Word embeddings. This was a major advance in 2013. Has everyone heard of word embeddings? Does anyone need this explained? Basically, if you scan text, words that occur close together in the text should be somewhat similar to each other. So you initialise a huge random vector for every word in your vocabulary, and for every word within the window you nudge the vectors together; every word which isn't, you nudge apart. You keep iterating until it's good enough, which may take hours or days. The cool thing is that this vector space self-organises; for instance, you can produce a very nice visualisation using TensorBoard, and you can go back and have a look at our RNN session from last month for that. Similar words end up in similar places in this vector space. This is the first step you'll do in basically any language task: you may want to put characters in, but words are a good unit of thing, and if you can put them into a pre-made, known vector space, so much better. It's also possible to work with characters, but even with characters it helps to have a character embedding. That's embeddings.

So here is the task we're going to play with. The idea is that we take an image and we try to generate a caption. Here are some sample captions, and it's even more obvious on this one: here's a dog, here's a bush, here's a car, here's a hose, here's a stream of water, here's a hose on the green grass. These are human-generated captions: 'a large brown dog running away from the sprinkler in the grass', 'a brown dog chases the water'. These were produced by humans; this is what a good caption looks like. What I'm using here is a dataset called Flickr30k. You've got 30,000 images, all annotated by humans, with about five decent captions for every one of them. It's downloadable for free, but I can't hand it out, because you need to email them for a download link. This is the paper; you can find it on the web pretty easily, but the data is not on torrent.

So, in order to featurise this, to turn the image into something I can handle, I'm going to featurise it with Google's Inception v3. This is a model you can just pull down, a big CNN: you put your picture in at the beginning, it chugs through, and, particularly if you cut it off here, you get something like 2,048 features. This whole image turns into 2,048 numbers. That's enough to do the classification task, but we don't want to do classification; we're just going to use those numbers as the truth about the image.

For the text, because I wanted to look not at an industrial-scale application but just to get some nice results, I'm going to make sure that all the captions are learnable. So I'm only going to use captions whose words are all reasonably common: I'll throw away every caption containing a word that doesn't have at least five images associated with it. If there's only one image of someone in a kimono, I throw that image away entirely, because I'd never be able to learn enough about kimonos to handle it; on the other hand, with 30,000 images, kimonos probably are in the vocab. I also require all words to be within a 100,000-word vocabulary, and I make sure that all the stop words, the really common words, sit at the beginning; I'll explain why later. So here you can see a link: this is in my deep learning workshop repo, which is on GitHub.
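(For a flavour of what that featurisation step looks like, here's a hedged sketch using the stock Keras API rather than the repo's actual script; the image filename is made up.)

```python
import numpy as np
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.preprocessing import image

# Chop the classifier head off Inception v3 and keep the pooled features:
# the whole image turns into 2,048 numbers.
model = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

img = image.load_img('dog.jpg', target_size=(299, 299))   # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = model.predict(x)
print(features.shape)   # (1, 2048)
```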
In that repo, under the captioning part, there is a notebook, images-to-features, which will featurise every image you have in a folder and produce a nice blob of data for you to load in. Similarly, there's something which will take all the captions, throw away the unusable stuff, build the embedding, and hand you back a vocab with everything in the right order. These things are laid out this way because it simplifies the next step a lot: you can just load the data and it's ready to go.

What I want to do now is explain the pieces we haven't covered: generating sequences from networks, the word-by-word decoding I do for testing, teacher forcing for training (these are things you've heard before from Sam), and the embedding choices, which I did want to talk about even though they didn't work out.

Generating sequences. This is what I'm going to do: I put in the image features at the beginning, somehow. Magic occurs in this box. I kick the thing off with a start symbol, and it pops out something like word one. I then take that word one and say, right, I actually mean word one, pop it back in, more magic occurs, and I get word two. Keep going, and eventually I get to stop; at that point my output is finished: word one, word two, the end. That's the loop you go through at test time. At training time, though, I actually know the caption I want, and I know the features from the image. So I've got knowns on the left-hand side, knowns at the bottom, and I know what the word should be at every step of the loop, just by shifting my inputs over by one. Because of that, I've got a fully known network, everything known around the edges, so I can force all the weights to converge to what they should be to make all the magic work simultaneously. This speeds things up.

So, a quick thing about the embedding choices. You can use word vectors, which we've talked about. You can use one-hot embeddings, where you just have a position within a vector which stands for the word. Or you could use an actual numeric index: you could just say, I want the number 28 to be the output of my network; not the 28th vector position, just the number 28, maybe in binary. Let me quickly go through these, one-hot first; there we go. Word vectors: fixed dimensions, great; independent of vocab size, good stuff. The problem with stop words is that you might get a very murky embedding, because the embedding for 'the' sits in between lots of words. What is 'the', really? 'Cat' should be near 'feline' and 'dog' and, I don't know, 'burglar', right? But 'the' is something different, so a pure word vector for 'the' is going to be very difficult to guess. Also, there are the action words like start and stop, and by construction I'm never actually going to see an UNK; but for the padding I've got to have some kind of embedding, I've got to make it up somehow. Still, word vectors are often used as an input stage by many people.

The one-hot thing: if I've got a vocab of 7,000 words, which is what I have here, I've got to have a vector which is 7,000 long, a very large number of entries which are going to be 1 or 0, mainly 0. It's often used as an output stage: the index of my word is the argmax of a softmax function. This is the Keras kind of thing that you'll be doing.
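(That output stage, as a tiny hedged sketch; the 200-wide hidden state is an assumption carried over from later in the talk.)

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

VOCAB = 7000
# One-hot as an output stage: a 7,000-way softmax over the vocabulary
head = Sequential([Dense(VOCAB, activation='softmax', input_shape=(200,))])

state = np.random.rand(1, 200).astype('float32')   # made-up hidden state
probs = head.predict(state)[0]    # a probability for every word in the vocab
word_id = np.argmax(probs)        # the index of the most likely word
```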
Now, this is kind of crazy: if I could output just the number of the word, I would only need 14 outputs to cover a 7,000-word vocabulary. It's difficult to believe that could possibly work, because instead of 7,000 options I've got 14 binary digits, and I have to get them exactly right; if I get any digit wrong, I've got completely the wrong word. But it does work: some Japanese researchers published a paper maybe a month ago that made it work, with some error-correcting codes. It's insane. I would love this to work; this is what attracted me to testing out the embeddings in the first place. Wouldn't it be cool not to have 7,000 outputs? I would love to have 14.

Or we could do some combo, which is kind of where I ended up: a one-hot encoding over the action and stop words at the beginning, and then a 50-dimensional word embedding at the end, just concatenated together. I'm going to use that as my input stage. So if I say 'the', it's going to be one-hot position 20; but if I say 'kimono', an extra one-hot position flags 'use the embedding', and the kimono vector goes in the embedding part at the end.

Now let's cover some extra machinery, because I said I'm going to be building some new networks. Who's heard of dilated CNNs? Yes? Okay, some. Batch norm? Some. Residual connections? Okay, we're kind of moving through time here. Gated linear units? Probably not. Fishing nets? Actually, that was a test question, because fishing nets are used by fishermen. And the attention-is-all-you-need layer? This is pretty new; it was announced last week, on Thursday.

So, dilated CNNs. This is a thing from DeepMind: they announced it in the WaveNet paper, less than six months ago now, or maybe a bit more. You can find this in Keras via the dilation rate parameter. Instead of a convolutional layer which just looks at a contiguous little patch, you expand the patch by leaving gaps in it. The bottom layer looks at the patch directly underneath it, side by side; but as you move up, each layer looks at wider and wider strides. The point of doing this is that the receptive field behind a neuron at the top grows exponentially. Why would you want that? With a regular CNN, the receptive field only grows with how many layers you have; here you get a receptive field of roughly two to the n. So if you want a really big receptive field, you might want to do this. And when do you want a really big receptive field? When you've got a lot of history. And when might you have a lot of history? When you're working with sequences and you don't want to use recurrent neural networks.
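(A hedged sketch of that dilation idea in Keras; the 200-channel, 16-word sizes are toys. Doubling the dilation rate each layer is what makes the receptive field grow as two to the n.)

```python
from keras.models import Sequential
from keras.layers import Conv1D

# Stacked dilated 1-D convolutions: kernel size 2, dilation doubling per layer
model = Sequential()
model.add(Conv1D(200, kernel_size=2, dilation_rate=1, padding='causal',
                 activation='relu', input_shape=(16, 200)))   # 16 words, 200 dims
for rate in (2, 4, 8):
    model.add(Conv1D(200, kernel_size=2, dilation_rate=rate,
                     padding='causal', activation='relu'))
# With dilations 1, 2, 4, 8 the top neuron sees 16 steps of history,
# where an undilated stack of the same depth would see only 5.
```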
Batch norm. Sam talked about the activation and parameter explosion problems you get with very deep networks. One way to solve this is to add a new layer which learns to scale your activations, squashing each layer's output towards mean zero, standard deviation one. In Keras there's a BatchNormalization layer: you slap it in there and it fixes things up for you. The idea is that it works over a whole batch, learning to fix the batches up so they're all of the right mean and variance. Doing that means you never have an activation which is stuck at zero, because everything gets blown up into a decent range, so it's much easier to learn from. There are other things called layer norm; let's not go into that yet.

Residual connections. This is a thing from Microsoft, in their ResNet paper, which was probably a year ago, maybe eighteen months. People are playing around with this, and these skip connections are now very common. What you do is: you take your input layer, and then, as you ordinarily would, a ton of stuff happens to it; but instead of just giving the answer, you add the input layer back in with a big skip, you just add them together. So when I'm training this thing, with gradients flowing backwards, the skip lets the training signal get down to the early layers really easily, because it's just one wire, while also training the network in the middle. The nice thing about residual connections is that they allow much deeper networks without the multiplicative weight problem, because you can always skip a layer and get back to earlier and earlier stages.

Gated linear units. This is what Facebook use in their convolutional thing, which is the context I'm explaining it in. You take your input and form two new layers from it: one with a linear activation, and one squashed with a sigmoid, and then you multiply them together. So your input is split into two different kinds of stuff, and the network can switch itself on and off. It's kind of funky, but it works, so people are using it, particularly for natural language.
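(A hedged sketch of one such gated block, with the residual connection and batch norm thrown in; the shapes are illustrative, and this is not Facebook's exact recipe.)

```python
from keras.layers import Input, Conv1D, Multiply, Add, BatchNormalization
from keras.models import Model

x_in = Input(shape=(16, 200))            # 16 word positions, 200 channels (toys)

a = Conv1D(200, 3, padding='same')(x_in)                        # linear half
b = Conv1D(200, 3, padding='same', activation='sigmoid')(x_in)  # the gate
glu = Multiply()([a, b])                 # the gate switches features on and off

out = Add()([x_in, glu])                 # residual skip: add the input back
out = BatchNormalization()(out)          # keep activations in a decent range

block = Model(inputs=x_in, outputs=out)
block.summary()
```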
Now, here's the thing from last week: attention-is-all-you-need layers. This is where it gets kind of complicated. Sam explained that you would have a 30-step by 100-value input for your one sentence, so each position has a vector of 100 values, right? What I would like to do is choose which positions I'm using and what to pass up to the next layer, which will also have 30 word positions and 100 dimensions; I want to know which ones to promote upwards. It may be that I just want to copy things straight up, in which case I'd say: choose this one, choose this one, choose this one. But what these guys do is convert each 100-dimensional vector into a query, a key and a value. The key says what this word is about, and the query says what I want at my position to feed upwards. You then do a big matrix multiplication of all the queries with all the keys, take a softmax, and use that to decide what gets fed up. Basically, this gives you attention over your input, feeding up to the next layer. This is something where you have to read the paper several times to see what on earth is going on.

The reason it's called an attention-is-all-you-need layer is that they don't really have anything else in the network: they have attention on attention on attention. These things are just looking each other up; all the words look at all the words, manipulate whatever they need, and feed it through. And that's just the simple version: they also have multi-head attention, whereby a single layer actually has multiple heads, multiple queries, passing up multiple versions of the feed, and then you slap them all together. The crazy thing is that once you've implemented it, which I've now done in Keras (which I guess is a first anywhere, because no one else has had a chance, right?), it just kind of works: you slap it in as a layer and there you go. This is very hot off the press: even though LSTMs are currently state of the art everywhere, these guys would claim that's no longer true. Anyway, this is why this has turned into a models talk.
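(Here's that core operation as a bare numpy sketch, to make the query/key/value shuffle concrete; it follows the paper's scaled dot-product form, and the weight matrices here are just random placeholders.)

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sentence.
    x: (30, 100), i.e. 30 word positions each with 100 values."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # query, key, value per word
    scores = q @ k.T / np.sqrt(k.shape[-1])        # every query against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over positions
    return weights @ v                             # mix of values fed upwards

d = 100
rng = np.random.RandomState(0)
x = rng.randn(30, d)
out = self_attention(x, rng.randn(d, d), rng.randn(d, d), rng.randn(d, d))
print(out.shape)   # (30, 100): same shape, so you can stack attention on attention
```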
So, here are some network pictures. My notebook has four different models in it. Network picture 1 is the basic network, the GRU version: you take your inputs, pop them into a bunch of GRUs, 200 wide, and for outputs you get a softmax, an argmax-over-one-hot kind of thing, to give you a word; you then pop that word back in here. Those are the GRUs. Network picture 2 is the DeepMind-like thing with a dilated CNN: I'm going to treat the sentence not like a recurrent thing but like a picture, and do the whole thing using CNNs only. The whole thing is a picture; I'm mapping words to words. It doesn't sound like it should work, and that's why I wanted to try it. Network picture 3 is using the Facebook thing they fixated on, the gated linear units: instead of plain CNNs, I use whole rafts of CNN-powered gated linear units. It's kind of radically simplified here, but that's another model. Network picture 4 is basically the layout of the Google translation thing. They were putting in English and getting out French: they take the English sentence, apply some kind of clocks to it (positional encodings, which I haven't talked about), and then this thing just attends to itself a lot and outputs a field of stuff that can be attended to by the other side, which produces the output. And it's just one flow through: no recurrence or anything, just outputs from the bottom. They would have six stacks of these layers to get to state of the art, so it's a pretty complicated thing; on the other hand, once you've built the modules, you just glue them together, and it has way fewer parameters than the other models. In my case I've got a picture rather than a source sentence, so I didn't need that first part, and I only tried one layer, though you could try two.

So this is the stuff which works; my GPU has been running continuously for the last week trying to get more and more results out. I can do a quick walkthrough, though it's probably getting quite late. This also ties into what we're going to do next time, which will probably be a tips-and-tricks kind of session, and there's a ton of tips and tricks in this notebook. Let me whiz you through it, just to prove that I did do something for today; it's online, so you can have a look yourself.

Hold on, now. So this is what it looks like; it has some documentation, it's just crazy. Can you see this? Because I've saved all the caption stuff off, I can load it in in one go; here are all the features, which I can also load in one go. Then I've got a ton of imports, because Keras for the win, and then I start to create the IO stages. This is where I want to play around with the different embedding styles, so I've created some representation classes: they allow me to pass in a class which has encode, loss and decode functions in it, which means I can abstractly pass in whatever I want as my embedding and the network will just use it. I wanted to make it pluggable, so I could try every kind of embedding in with every kind of embedding out, and fiddle around with it. The code's all there and it runs; but does it work? Not so much, though I've got some interesting results. There's also some test code, just to make sure that whatever you pass in and encode can be decoded again, for each embedding; there are a lot of subtleties associated with this.

So now let's go on to the models. Here is the RNN captioner, captioning based on recurrent neural networks: the RNN is a GRU, and then I output through a time-distributed dense layer, so this makes the right kind of thing. I do one layer of GRUs; I could have multiple layers, whatever. By modularising it like this, the actual models turn into pretty simple code. Then there's the CNN-with-dilations thing, the DeepMind paper: there's some feature handling, and at the core of it we've got this Conv1D kind of CNN, then piles and piles more CNNs, all with this dilation factor over here. You can see the dilation rates: I'm looking at a single word at a time, or skipping words, or words by fours, or by eights; although it's not really words at that point, it's creating whatever new features it wants. Here's another one, for the gated linear units: you can see it's got the A and the B and it's multiplying them together; there's a residual thing, there's a bit of batch normalisation, and then I use this for the captioning thing, basically taking it, dilating it, and adding things up. It may look scientific, but it didn't happen that way. And attention is all you need: the attention layer itself is a bit of a brain strain, but once you have it, the attention-is-all-you-need captioner just says 'I want an attention layer', and then I add a residual connection and normalise it. The nice thing about Keras is that it encapsulates all this stuff really nicely. I haven't delved into TensorFlow at all; I have zero TensorFlow dependency in here. In fact, the Keras embedded inside TensorFlow lags too much to be used for this, so you need to use separate Keras; I think Chollet is pretty badly over-pressured with stuff to do, so fair enough.

So, having defined my four different models, I can plug them in. Here's my embedding idea: the whole reason for doing the embedding classes at the top was so I could say, here's my embedding input, and my embedding output should be a one-hot-with-embedding and a full one-hot. I just wanted to be able to declare what my embedding style is, so I could flip it around if I wanted to.
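(For the shape of that first model, here's a hedged sketch, not the notebook's actual code: image features seed a GRU's initial state, teacher-forced word embeddings go in step by step, and a time-distributed softmax over the vocab comes out. Feeding the image in via the initial state is one choice among several.)

```python
from keras.layers import Input, Dense, GRU, TimeDistributed
from keras.models import Model

VOCAB, EMBED, WIDTH = 7000, 191, 200     # sizes from the talk's training regime

img_in = Input(shape=(2048,))                         # Inception v3 features
state0 = Dense(WIDTH, activation='tanh')(img_in)      # project to GRU state size

words_in = Input(shape=(None, EMBED))                 # shifted caption embeddings
h = GRU(WIDTH, return_sequences=True)(words_in, initial_state=state0)
words_out = TimeDistributed(Dense(VOCAB, activation='softmax'))(h)

model = Model(inputs=[img_in, words_in], outputs=words_out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```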
And then here's my model choice. Here's a bit of a thing: I can choose a model, they all have the same inputs and outputs, the model compiles, and it produces some of these captions. Okay, let's not run it, because, well, you'll see; let me just go up to above here. Okay, while that's cooking: here is an untrained network. You'll see that its first output is 'cables burning gracefully pin shine soups arranged marshy solar boards', so, not a very good description of the image. I'll do the models in order.

Here's my typical training regime. I've got a 141-dimensional input for my stop words and action words, with a 50-dimensional embedding, so that's 191 in total. My output is going to be the softmax over the 7,000-word vocabulary. Internally I'm standardly going to use 200-width stuff, 200 channels of CNN, but then a whole bunch of layers depending on which model I'm running. I don't play around with the learning rates at all, because that seems too enthusiastic to me. 50 epochs, so 50 runs through all 150,000 image-caption pairs from the 30,000 images, takes about three and a half hours on my now two-year-old Titan X, so I'm very fortunate to have that; on the laptop, one epoch takes four hours.
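(For concreteness, that regime as a hypothetical training call, reusing the `model` from the captioner sketch above; the placeholder arrays, maximum caption length and batch size are all made up.)

```python
import numpy as np

# Tiny placeholder arrays (the real run used ~150,000 caption pairs; a full
# one-hot target array of that size would not fit in memory in one piece)
N, T = 100, 16                           # examples; max caption length is a guess
image_features   = np.zeros((N, 2048), dtype='float32')
shifted_captions = np.zeros((N, T, 191), dtype='float32')   # teacher-forced inputs
targets          = np.zeros((N, T, 7000), dtype='float32')  # one-hot next words

# 50 epochs at the default learning rate (tuning it "seems too enthusiastic")
model.fit([image_features, shifted_captions], targets,
          batch_size=128, epochs=50)     # batch size is an assumption
```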
So here's the GRU version: 'a black dog running in a park', 'two big dogs play ball across the grass'. Okay. You'll see 'a dog chases a ball while a man in a vest holding a hand', okay; so this thing clearly knows something about something, right? And this is a test image, so it's never seen this image before, and it learnt to do this in about four hours. It's gone from zero: its only knowledge of the English language is which words are similar to each other; its only knowledge of images is which ImageNet class they belong to, and not even that, because I stripped those away. So from just 2,048 numbers per image and a whole bunch of numbers for the words, it learnt to write these sentences, which is kind of cool, in four hours.

Here's another one, the dilated CNN: 'a brown dog is standing on a yard'. Since I've got a ton of training data, I can also see how these performed numerically. 'One dog bites another baseball player who's found behind in the background', okay. This one is a bit more consistent: it knows about the dog and the park and the running and the playing. 'Winter grass': I don't understand what's going on there.

The gated linear unit one, the Facebook-like one: 'grey dog is running on a grass field', 'a dog jumping over a bush' (I don't see the bush, but it's got some good ideas), 'a dog on unleash is near a fountain', so now it's actually picking up more things, 'a brown dog is running through the muddy rain', which is kind of interesting, 'one dog with a brown jacket is playing around'. I should also say this is without any language model: these are just the raw first five things that came to mind, at random. If I had a language model, some of this stuff would look wrong and it would throw those away; or if I had beam search or something (a lot of these papers don't say much about the beam search they're doing, or the detail), that cleans up these models a lot.

The attention-is-all-you-need one: I wish this worked better, but it doesn't quite yet. 'Two dogs play in the grass' (my guess is it thinks this is a dog, but anyway), 'two dogs race by the', 'two dogs fighting to a grassy yard', 'brown dogs on either side of the fire'. Here it hasn't got the idea of the water at all, whereas the others did. So at the moment Facebook is in the lead; the gated linear units, there we go.

As a wrap-up: this whole session was more challenging than what we've done before, and deliberately so, because we wanted to have an advanced thing. If you're a beginner who's kind of going 'whoa', do come back, because often the beginner stuff is more targeted at you; on the other hand, if you thought this was so cool that we need more of it, then yes, we should do more of it, and you should tell us. If you're so into it that you read these papers and want to do research, there are some really cool guys starting a PyTorch meetup that might interest you, though I can't really talk about that here; I didn't mention it, in fact, because it's not TensorFlow. There's tons of innovation going on in natural language processing: LSTMs seem so last year, this stuff seems so cool, but who knew CNNs could be useful, and then attention everywhere, which is even more crazy. And you need a GPU. You can definitely use the fantastic GPUs or TPUs that Google will offer you in the cloud, but you'd certainly need to spend money on that; or you can spend money on an NVIDIA card (maybe AMD will help soon, I don't know about that). A lot of deep learning people will have something sitting beside the desk to keep the room warm, and then, when they're ready for real production, they'll put it in the cloud. All my stuff for this is on my GitHub; please add a star, I'd love the star.

And questions? Go. No, no, it's one caption at a time; I just generate five examples, because five is more illuminating than one, that's all. Yes. Yes, at the top of each of these is a softmax. During training I'm trying to force it to the correct word; during testing I've never seen the input before, so I've no idea. I start with the word 'start', and then I say, well, what word do I feel like next? There'll be a distribution amongst the words: maybe 'a' is a very common first word, or 'two' seems like a common word, and basically it'll flip a coin between 'a' and 'two'. Say it chose 'a'; then I say, okay, if 'a' is my first word, what's my second word? And I keep flipping coins amongst the softmax outputs. If it's determined to do one particular word, it's an easy choice; but sometimes it doesn't know much, and then I get a really huge variety. One of the problems is that if you don't get much variety, it might just say 'concrete concrete concrete concrete', because if the word following 'concrete' is 'concrete', it will keep saying it again and again; so there are some interesting failure modes in here. Does that suggest...? No, I essentially just do a weighted Monte Carlo kind of thing; it may pick an extremely unlikely word, but then it's extremely unlikely to be anything else. Maybe I should. We'll do a Kaggle thing, I guess.
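(That answer as a tiny sketch: a weighted Monte Carlo draw from the softmax output, with a made-up toy distribution.)

```python
import numpy as np

def sample_next_word(probs):
    """Pick a word id in proportion to its softmax probability: common words
    come up often, but an unlikely word is only unlikely, not impossible."""
    return np.random.choice(len(probs), p=probs)

# e.g. if "a" and "two" are both plausible first words, it flips a weighted coin
probs = np.array([0.45, 0.40, 0.15])     # hypothetical P("a"), P("two"), rest
print(sample_next_word(probs))
```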
I've got some announcements. The Deep Learning meetup group: if you don't know what this is, you should probably join it. Our next meetup is on the 20th of July, and we're going to do something like tips and tricks, just because doing a huge modelling exercise like this one is huge, whereas with tips and tricks, we've used these throughout and can show you a ton of them; but also, all of you are probably doing something interesting which has tips and tricks that everyone else would find interesting. We've had a fantastic talk today, and there are lots of people playing around with this stuff; if you're enthusiastic, even if you can only maintain enthusiasm for five minutes, everyone would be happy about it, right? And then you'd actually have on your resume: yes, I gave a talk at the TensorFlow and Deep Learning Singapore group. We are apparently larger than London now, so we've done pretty well; in fact, everyone here has done pretty well.

I was going to say: on Saturday we have a beginner thing; 9:30 is when it starts, just in case anyone wants to know, and it will be playing with real models. It costs $15, just to cover lunch, but it's now full, so this is redundant.

The eight-week deep learning course: we're embarrassed to keep saying July with no actual fixed dates, but we're desperate to make it happen. We're thinking of an eight-week developer course, with three-hour sessions each week, and probably also another clinic kind of session, where it wouldn't actually be taught, but you'd come along with your laptops and we'd be there saying 'why don't you try this', because a lot of it is going to be project-based. And I'm not telling you the price. Was that your question, is it possible to reserve in advance? Not until we know the price, I guess, right. The idea here is that the projects are kind of important. Structured courses, university, Udacity, all these things are fantastic, except that if you turn up to an employer saying 'I did everything that was required of me', that's kind of lame. Kaggle's pretty good, because then you're actually on a world leaderboard. But on the other hand, if you said 'I was really curious about my dog's feeding habits, so I built this neural network and I feed her with it', then you've actually done something, and you can talk so much more authentically about it. That's what we want people to do. People will need a project they want to do; it could be anything, and as you've seen, you can do any crazy stuff with this. But then you'll really know it, and really knowing it is kind of the key to doing it as a job. If you want more information, there's redcap.com; there'll be a form to fill in. We may think about putting some more structure around this, but for now it's a watch-this-space kind of thing. It's not going to be easy, and it's also probably not going to be cheap, but we expect everyone to work hard and we want to make it work.

Questions? And also, home time; maybe we should say: go home, or stick around at the front, and the speakers will be here and we'll answer questions. Okay, thank you very much.