So, a quick bit of background about me: I made the jump from finance into machine learning back in 2013. I spent 2014 just reading papers, writing code, giving talks and things, and basically since 2015 I've been a kind of serious machine learning person, having essentially rebranded myself from finance quant guy to machine learning guy. I've written a couple of papers, and this whole deep learning thing has come along at kind of the right time.

So, what we already covered last month and the month before was some very basic neural network ideas, which hopefully you've understood if you were there. Can I just get a quick show of hands: how many people have been to one of these before? Quite a lot. How many have been to zero of these before? Okay. How many people do not know what a CNN is? That Google guy. Okay.

So one of the takeaways is that when you train these deep neural networks, you're actually creating features. With most machine learning methods, creating features is a difficult thing, and people would hand-engineer them. The trick with deep learning is that just by training the thing on huge amounts of data, it can create intermediate features which are helpful for solving the task. So the benefit of training is feature creation, and the layers at different depths of the network will create different kinds of features. Essentially there will be a hierarchy of understanding within the network.

If you want to look at the previous presentation, or see this one online, if you have a laptop it's under redlabs.com slash p. That's always where my latest presentation lives, and there's an intro-to-CNNs presentation there too.

To quickly review what's in that presentation (there was a lot more): the basic idea behind CNNs is that the pixels in images are organised. Rather than treating the pixels as just independent inputs, we recognise that the image itself has some kind of coherence: the ideas of up/down and left/right are actually meaningful, compared with a bunch of unrelated features which could be, say, rainfall, sunshine, temperature, time of day. Those are all kind of orthogonal features. An image looks like a lot of pixels, but they're not independent; the very fact that they're organised into an image, with colours, is meaningful.

So the idea here is that we're going to use the whole image as a feature. Instead of trying to create lots of small independent features, we're going to look at images as features in themselves. And to transform from one image to another, the transformation, the filter parameters we're going to use, are basically those behind a Photoshop filter. A Photoshop filter, mathematically, is a convolutional kernel, so the CNN is a convolutional neural network.
So mathematically, what happens is: you have your image, which is this thing at the back, and you have a little kernel of filter weights, in this case three by three. You multiply the kernel against the pixels underneath it and sum across all of those locations, and that gives you one answer, one output pixel. Then you scan this across the image, each position giving you a pixel out, and you construct an entire output image. Now, this looks like a lot of array operations, and that is why you should be using a GPU to do this stuff: GPUs are ideal for exactly this kind of thing.

Online there's a little example of how to play with one of these filters. This is a little array of parameters, a three-by-three array; this is my input image; and this is what it looks like after being passed through the filter. What gets picked out here is these horizontal edges, and I just put those numbers in at random. You can think of this filter's output as a new colour plane, one which basically highlights the whiskers. Now, the input image consists of three planes, and this output is one of maybe 64 or 256 different ways of looking at that one image; that would be one layer. The next layer of the CNN would be operations combining those planes with each other.

So here is the basic layout of a proper CNN. You start with the real-life image, and the target is to classify it as being, say, car versus truck versus airplane, or however many categories you have. In the first convolutional layer you'll have, in this case I think it's say 10, different ways of representing this image after passing it through some little matrix. You then apply some non-linearities, and then you have a layer which applies to that layer; similarly you'll do some non-linearities, some downsizing, and then apply new layers on top of those, and new layers on top of those, gradually stacking layers upon layers of these convolutional filters. At the end you have one big fully connected layer which takes all of this hierarchy, these high-level features of the original image, and it's been trained to be as useful as possible for classifying the image into the appropriate class. The fact that you're forcing it to be as useful as possible, by assigning blame, lets you train all the other parameters in the network via backpropagation.

When I said that people typically train these as classifiers, there's a very common training task people use called ImageNet, which has been an ongoing competition since 2010. Suddenly, in 2012, neural networks began to dominate it, which is kind of the start of the whole deep neural network explosion. The task has 15 million labelled images in 22,000 categories, but the actual game is played on a thousand categories. So this image here, I believe, is "hot dog", because there's a hot dog with mustard in a bun, and here are the different classes of foods which are most similar: I think these are all hot dogs, and these are more like sandwiches, I guess. So there are a bunch of distinctions which are not necessarily that easy even for a human to make. People have measured that human-level error, for getting the right answer within your top five guesses, is about five percent, and as of 18 months ago these networks exceeded human capability on that measure. That's quite impressive.
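To make the scanning concrete, here's a minimal NumPy sketch of a single filter being slid across a one-channel image. The kernel values are illustrative (a rough horizontal-edge detector), not the ones from the slide:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel across the image; at each position, multiply
    # elementwise and sum, producing one output pixel.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 3x3 kernel that responds to horizontal edges (dark above, light below):
kernel = np.array([[-1.0, -1.0, -1.0],
                   [ 0.0,  0.0,  0.0],
                   [ 1.0,  1.0,  1.0]])

image = np.random.rand(64, 64)      # stand-in for one colour plane
edges = convolve2d(image, kernel)   # one new "colour plane" out
```

The two nested loops are exactly the kind of embarrassingly parallel multiply-and-add work that a GPU performs thousands of at once, which is why real frameworks never loop like this in Python.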
The problem now is: where do you get the training data to get better? The only training data you can get is from humans, so maybe you need a committee of humans. The same problem will occur in medical images: you can't get that much better than radiologists, because who do you learn from? There's no super-radiologist.

Anyway, people have been playing this game of improving on the task, and it has led to a huge surge in GPU technology and network architecture work: Google versus Microsoft versus Amazon versus whoever else; Facebook and Baidu are all tackling this problem. It's kind of a toy problem, in the sense that no one's particularly interested in those thousand classes, but it's a hard problem, and typically people will train on a GPU cluster for weeks or months in order to submit their final result.

One of the nice things that people like Google do, along with TensorFlow, is release pre-trained weight sets for these networks, together with the network definitions used to create them. So this is a Google network of 2014 vintage; things have got much better since then. On the other hand, the networks have got larger and the number of parameters has grown, which means that, since my little laptop is going to run these examples on its CPU, bigger networks take much, much longer. If you've got a GPU, running one of these is fine, but there's definitely a trade-off between number of parameters, speed, and accuracy. For what we're doing here we don't actually need an accurate classifier, for reasons I'll explain soon.

So basically we put the image in at the beginning, and then go through different convolution, pooling, and concatenation layers. These green things represent points at which the whole image has been condensed into some featurisation, and it goes through a whole bunch of these to get to the final output class.

And what you find when you look inside one of these CNNs (this is a slightly earlier CNN, kind of easier to visualise) is this: the original image is 224 by 224 by three colour planes, which is expanded out into 96 filters, each of which is 55 by 55, and this is then expanded out into this block of features, and this block, and this block, down towards the final 1000 classes. In the beginning blocks, the things it's looking for are edges and smears of colour. When you get onto what I call higher-level features, you're getting textures: these higher-level features are combining the lower-level ones into kind of interesting observations about the image. As you go still higher, it's beginning to recognise shapes: it will look at eyes and noses and petals. You can also think of these as being parts of objects, so at this stage you've got a crossover between what is easy to represent in terms of pure pixels and what is actually meaningful in a semantic sense. And when you get to the end, you're getting clouds of dogginess, or cat-ness, or tractor-ness, which are very useful for classifying. So this is the way in which a CNN which is pre-trained on the classification task actually learns a much more general visual representation, a visual hierarchy.
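If you want to see what "pre-trained weights plus network definition" looks like in practice, here's a minimal sketch. It uses the VGG16 model bundled with tf.keras for brevity, rather than the TF-Slim Inception checkpoint used in the demo later, and the layer name is just one example choice:

```python
import tensorflow as tf

# Download (once) and load the network definition plus pre-trained
# ImageNet weights in a single call.
model = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
model.summary()  # stacks of conv/pool layers ending in the 1000-class head

# Any intermediate layer can be treated as a "featurisation" of the
# input image, which is exactly what the style-transfer trick relies on:
features = tf.keras.Model(
    inputs=model.input,
    outputs=model.get_layer("block4_conv3").output,
)
```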
So, what we did last time, if you weren't here: I took some speech spectrograms and used a vision network to recognise them, and that actually works surprisingly well. So now, instead of abusing the vision technology that way, we're going to use it in a properly visual sense, but creatively: let's create some art.

The problem statement for style transfer is this. I would like to provide a regular photo of something; that's my photo. I would like to copy a given artist's style, so I'll take another image which is representative of that artist. Then I'll try to understand both of these using the CNN features and, in some way, make a new image which is the combination: I want my picture, within that style.

The basic idea is that I'm going to start off with a copy of my photo, and then change it on a pixel-by-pixel basis; every little pixel is up for grabs. I'm going to change the pixels until it looks right: until it mimics the fine features of the original photo, so that it looks as similar as possible to the original, while also mimicking the stylistic features of the artist's image. And I want it to be a good-looking result, in as much as I don't want it to just turn into noise that happens to best fit those first criteria; I want it to actually look like a coherent image.

To go into slightly more mathematical detail: I'm going to start with my actual photo (I could start with random noise, but for speed I'll start with the photo). Then I'm going to update this image by gradient descent to minimise an L2 difference between the original photo's features and my image's features, plus a thing called the Gram matrix similarity against some style features; I'll gradually explain how that works. I'll also minimise a total variation loss, which is a measure of how jagged my picture is. And then I'm going to fiddle with the parameters according to taste. The taste part is more where the art comes in, because this isn't a magic formula which produces the answer. This is trying to produce art, and I'm no artist, but there is a certain amount of fiddling around, and there's no right answer; it just has to look good. I can't say one result is definitively a good piece of art versus another, so: fiddle with parameters according to taste.

So now I'm going to do a demo, which will hopefully work, and all of this code is available online. I have a thing called this deep learning workshop repo, which has a ton of notebooks in it, and under the CNN directory there's a style-transfer directory. This example is an Inception TensorFlow example, and it's been updated many times in the last few days. I'm also kind of updating the deep learning workshop as a whole. Before, it was oriented towards being a virtual appliance, so I could distribute it to a whole crowd of people and they could run it on their laptops live. But now that we're talking in this kind of format, it makes more sense that you can just load that one file and, instead of running inside the virtual machine, it will pull in all the dependencies it needs for that one file. I don't want everyone doing that here, just because it will destroy the network as it pulls down tens of megabytes of data.
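As a minimal sketch of those three losses, assuming we already have layer activations as float tensors of shape [height, width, channels]; this follows the description above rather than the notebook's exact code:

```python
import tensorflow as tf

def gram_matrix(feats):
    # Flatten the spatial dimensions, then correlate every feature channel
    # with every other. This throws away *where* things are and keeps
    # *which* features co-occur -- that's what captures "style".
    channels = feats.shape[-1]
    flat = tf.reshape(feats, [-1, channels])
    n = tf.cast(tf.shape(flat)[0], tf.float32)
    return tf.matmul(flat, flat, transpose_a=True) / n

def content_loss(photo_feats, image_feats):
    # Plain L2 difference: keeps the generated image close to the photo.
    return tf.reduce_mean(tf.square(photo_feats - image_feats))

def style_loss(art_feats, image_feats):
    # L2 difference between Gram matrices: pushes the generated image
    # toward the artist's texture statistics.
    return tf.reduce_mean(tf.square(gram_matrix(art_feats) -
                                    gram_matrix(image_feats)))

def total_variation_loss(image):
    # Penalise change between neighbouring pixels (up/down, left/right)
    # so the result doesn't go jagged; tf.image has this built in.
    return tf.reduce_sum(tf.image.total_variation(image))
```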
But it's there, and it will clone stuff for you and do all the right things to make sure it works. So just hold on a moment, I need two hands. OK, sorry, this is the online thing; this is what it looks like. Can anyone hear me like this? Can you hear me at the back? There is a nod. All right.

So there's a little bit of introductory text, and there are also some other implementations out there; people have been interested in this thing. There are other versions of this, but no need to go there yet. Basically, this is live TensorFlow. Here we go. This has loaded up the TensorFlow library, and then there's a thing called the TensorFlow-Slim models, which I've already got. TensorFlow-Slim is a little framework on top of TensorFlow in which they've defined all of Google's latest snazzy models, and this is the one that's going to load. As I said, this network has several different interesting points in it. If I just keep going down: this has now loaded the checkpoint, which is basically all the parameters, and it's now loading the module which defines the network. I could show you the code for it; it defines a whole bunch of convolution layers all connected together in a wild way.

Then I define some kind of image-preparation function, and I'm going to choose some photos. You put whatever photos you want into a directory. There is a typical photo which people use: it's either the Lena photo or this Tübingen canal thing. So here's the photo we're going to use as my original; assume I've been there. Then I'm going to choose some art. I've got a whole bunch of different art on my machine, and I'm going to pick out a nice one; I know that this one works quite well. You will see online that people like Starry Night, because it has very clear features. So let's just go on.

This is defining the TensorFlow graph. If you haven't seen this before: in TensorFlow you build an entire computation, you describe it to the machine, so rather than doing the computation step by step, you describe the whole computation and then tell it to run right at the end. The benefit of that is that Google's smart engineers have made it so it can give you optimised computation. The problem is that when it gives you an error message, the error messages are poor, because you don't exactly know which line of your code corresponds to which cell in its computation graph; it becomes a bit murky.

So now I can look at the endpoints within this network: here are basically the named pieces of it which we're going to play with. I have chosen some of these: for my photo layers, the original fine features, I'm using this "mixed 4B" one, and then for the style layers I'm going to choose these other ones, which are different layers within the network. I've played around with this so that it works, and this is where the taste, or lack of taste, comes in, right? And so I can now actually grab those layers and show you what's in them.
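Before we look at the layers, the "describe the computation, then run it" model mentioned above can be shown in a few lines. This is a minimal sketch of TensorFlow's graph mode, written with the compat.v1 API so it still runs under modern TensorFlow:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Building the graph: nothing is computed yet, we're only describing it.
a = tf.placeholder(tf.float32, name="a")
b = a * 2.0 + 1.0

# Running the graph: only now does the computation happen, which is
# what lets TensorFlow optimise the whole thing before executing.
with tf.Session() as sess:
    print(sess.run(b, feed_dict={a: 20.0}))  # -> 41.0
```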
So this is one of the layers, and you can see... well, you can't see it too clearly, but this is an easy one to see. This is what Conv2d_1a_7x7 is: basically several filters on top of my canal picture, and also several filters from Starry Night. But for each of these layers I'm only going to use the top one or the bottom one: I'm either going to choose it as a photo-like layer or as a style-type layer. And you can see that different points in this network pull out different things. In particular, I'm not sure whether you can see it, but in Starry Night there's a big swirly thing here. You can't really see it in these feature images, but you can see it here: this layer loves the swirly thing, while this one kind of ignores it. So different layers are picking out different features they like.

If you continue going down, here's another important part: I'm defining some functions to calculate the losses. The content loss, for my photo versus the generated image, is just a mean squared loss. Then I have a style loss, which is something to do with Gram matrices: for a Gram matrix you kind of explode out the features against each other, comparing all of the features versus all of the features. It's a huge operation, but I'm only doing it on fairly small matrices. And then there's the total variation loss, which is there to make a self-consistent picture: I'm just making sure there's not too much change between neighbouring pixels, up and down and left and right, so it's not all jaggedy.

Having done that, I combine these things with some weighting parameters: I'm just adding the losses up, multiplying some by 10 or dividing by 1000, and I've played with these until it looks right. Having worked out this total loss, I can then work out the total gradient, and once you know the total gradient, you know how to blame the picture for the mistakes it's made. So I'll start off with the photo and then kick off this training process. Here I'm starting another TensorFlow session, but I'm using a SciPy function rather than a TensorFlow optimiser: I'm running SciPy's L-BFGS optimisation, which optimises the whole image at a time against the loss.

And you can see that it's turned horrible. We start off at 72,000 as a loss, and gradually the loss decreases to 2,200, and down and down we go. This is 50 iterations of the thing, and this is my result at this point, which I would call a midway result: I've got my original photo, my Starry Night, and my hazy night. So let's run it again; the way it's written, it will just execute again if I continue. We're watching this number, the total loss, go down. If you're interested in playing with this yourself, the reason I've got these eval-loss components here is that each component shows how important each term is: this is how alike it is to the original photo, this is one of the style elements, these are the other style elements, and this is how much the self-consistency constraint is costing it. What we find here is that it's actually more concerned about style; the self-consistency term has hardly ratcheted up at all.
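The optimisation step looks roughly like the sketch below. The helper evaluate_loss_and_grad, the session_eval call, image_shape, and photo_pixels are hypothetical stand-ins for the notebook's actual plumbing; the point is that SciPy's L-BFGS drives the pixels, not the network weights:

```python
import numpy as np
from scipy.optimize import minimize

def evaluate_loss_and_grad(flat_pixels):
    # Hypothetical helper: feed the image through the network and return
    # content + style + total-variation loss, plus d(loss)/d(pixels).
    image = flat_pixels.reshape(image_shape)
    loss, grad = session_eval(image)  # stand-in for the sess.run(...) call
    return float(loss), grad.ravel().astype(np.float64)

result = minimize(
    evaluate_loss_and_grad,
    x0=photo_pixels.ravel().astype(np.float64),  # start from the photo
    jac=True,                  # the function returns the gradient too
    method="L-BFGS-B",
    options={"maxiter": 50},   # ~50 iterations per pass, as in the demo
)
stylised = result.x.reshape(image_shape)  # feed back in to keep improving
```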
So what I try to do in the cell above is set the weights so that these components end up as roughly the same number, just because that seems to be a good idea. Gradually, down we go; it's more swirly now. Let's try again. Now we're getting to something more like it.

The idea here is: if you just go online, this is all open source. You can pull this thing in; you might need to install TensorFlow nicely, and it's a Python 3 thing. You can put in your own photos of your family, of your dog or your cat or whatever, put in whatever style image you find on the web for free, and just let this thing train and train and train, and it will turn into a nicer and nicer image. It's all there for playing with, and playing with this stuff is the way to learn.

Some quick examples, with apologies to the artists. This is if I run it a bit more: we've got a starry canal. This is if we use this Mediterranean Picasso picture: the Mediterranean canal. And if I use this Cézanne, we get a Cézanne cat, which is kind of a nice portrait of a tabby cat.

So that's just one example, kind of the intro example; Sam's got something cooler. What people have been doing is finding a way to use this in a direct mode. Here, for each pair of images, we have to run an optimisation to get to the final image, which is a long process. But given the originals and their final images, if I used lots of originals and computed the finals, I could work out the transformation that goes from tabby cat directly to the stylised result, without worrying about Cézanne in the middle: I could train a CNN itself to do that transformation as a single pass, and then I could run it on video. So people can now do this on your phone, or online, in real time. And the genesis of this was only about a year earlier, when people worked out that this could even work. So the field is moving forwards very quickly.

As a wrap-up on these CNNs: they have rich features, which is why we're going to keep abusing them, and it turns out you can do some interesting stuff. If you want to play with this stuff, having a GPU is super useful. This is all part of a thing on GitHub; if you like it, give it a star, I would like that. We'll have some announcements at the end, and that's me. Thank you. Sam, why don't you set yours up?