How many of you are new to deep learning? OK, quite a lot of people. All right. How many think that you're super experts or researchers? OK, a few. And intermediates? All right, another few. OK, so given that there are quite a lot of beginners in the room, how many of you have actually heard of a GAN, or generative adversarial network? How many of you have built a GAN? A lot fewer hands. All right. OK, so tonight I'm going to be talking about GANs, and hopefully by the end of the night you'll be able to go home and at least build some simple GANs to get started. My name's Sam, I'm a Google Developer Expert for machine learning, and since it's quite a beginner audience, I'm going to approach the topic with a fairly simple explanation. We're going to talk about generative adversarial networks. So first of all, what is a generative adversarial network? Yann LeCun, who's head of Facebook AI Research, described it as the most interesting idea in machine learning in the last 10 years. One of the big things, just like we saw in the previous talk, is this whole concept of being able to manufacture data from the data you already have when you don't have enough. And this is what GANs do. The key thing with a GAN is that it learns to look at a probability distribution in a different way than we normally would with, say, a classifier in deep learning. When we're just trying to predict a classification, we're mostly interested in getting the prediction right; we don't really care about modelling the actual distribution. Whereas with GANs, we're much more interested in being able to reproduce a distribution. And you can see that if you look at some examples here, we've got a whole bunch of training examples, and we want to be able to produce images that are similar to these images.
But they're not going to be exactly the same images, so we're building generative networks. You can think of supervised learning as basically f(x) = y: a function that takes our data and produces some y as an output. You can also think of it as the probability of y given x, with x being our input data. This is very standard machine learning, especially for deep learning classification. When we talk about generative adversarial networks, though, we're looking for something different: we're looking for reproduction. A good example of where this whole f(x) approach tends to fall down is that often what it produces is a mean. Think about it: if I hold a pen up here, balance it nicely and let it fall, what's the probability that it falls to any particular point? What would the f(x) prediction of where it's going to fall look like? What standard machine learning would predict is the mean, which would look like the pen's mass smeared right around the full 360 degrees, but the actual pen wouldn't be in any one place. This is where GANs are different. A GAN doesn't want to do that. What it wants to do is show you a real example of how the pen would fall, where you see the whole pen in one particular location. A good example of this, shown in one of the first GAN papers, is predicting the next frame of a video: using traditional ways of making a prediction, we don't get very good results, we get this mean effect. If you look here, on the left we've got the ground truth of what the next frame was, and in the middle we've got a mean-squared-error prediction.
And you notice that the ear is all blurred; it's really not good at predicting what an image is going to look like. Whereas an adversarial model can actually produce a sample that is specifically true to one instance from that distribution. And this is what the whole concept of GANs is about. They were originally introduced in 2014 by Ian Goodfellow, who's now a researcher at Google Brain, and they took off a lot in 2016 and 2017. I actually gave a talk about GANs about a year ago here at Google. At that stage, a lot of this stuff was still very new, and certainly in the past year and a half to two years we've seen lots of types of GANs, lots of variations. There's the DCGAN, which is, I think, what you were using to make the eyes and things like that. We've got the Wasserstein GAN, which is basically a set of tricks for training GANs better. We've got things like StackGAN, where we can take an embedding of text and use that text as an input to predict an image. So we can do things like, and you've probably seen this in the news in the last year or two, typing into an algorithm "it's a red bird with a yellow chest", and the model is actually able to make an image of a red bird with a yellow chest. What it's doing is taking an embedding, feeding it into the GAN, and then generating an actual instance of it. There are lots of GANs. One of the best ones, which came out just at the end of last year, is the progressively growing GAN, which achieved really good results with really big images, and it's probably one of the most accurate GANs to date. So what is a GAN? If you're new to deep learning, you may not have come across the concept, but actually it's very simple.
It's basically a model where we have two separate models fighting against each other. People often describe it in terms of counterfeiting money: you have the counterfeiter and you have the police. The counterfeiter, which we call the generative model, has the job of coming up with new versions of counterfeit money, right? The police are the discriminator: the police are there to check and say, this is real money, or this is fake money, right? And here's the thing, the whole reason this works, and this is what we call a generator and a discriminator, is that as these two parts of the model train, they both get better and better. If you think about the counterfeiting analogy: as the counterfeiters get better at making counterfeit money, the police have to get better at detecting it, and as the police get better at detecting it, the counterfeiters have to get better at counterfeiting. You get to a point where you've got these two networks fighting it out, both parts of the model trying to win, and the ideal situation is an equilibrium where it becomes like a flip of a coin whether a sample is real and comes from the real distribution, or is fake and comes from the generator. So the whole thing with GANs is that we want to balance that. And for those of you who know a little bit more, it's what we call a minimax game, where we're trying to optimize a saddle point. We want to be right on that point of the saddle, at 0.5, where the generator is able to fool the discriminator 50% of the time and the discriminator is able to detect the fakes 50% of the time.
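For reference, the minimax game being described here is written in Goodfellow et al.'s 2014 paper as:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator D tries to push the value up, the generator G tries to push it down, and at the ideal equilibrium the generator matches the data distribution and D(x) = 1/2 everywhere, which is exactly the "flip of a coin" point on the saddle.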
So that's the ideal. Now obviously, when we're training them, we don't always get the ideal, right? And one of the big problems early on was that this is very hard to do. What we're trying to do is make loss functions that balance the losses of two networks. If either part of the network becomes too strong, it will just dominate and take over the other part. For example, if you've got a bad discriminator, a bad policeman, he'll let all the counterfeit stuff through no matter how bad it is. If you've got a really strong policeman and not a good counterfeiter, then nothing will ever get through, and therefore your generator will never get good at making new images. So we talk about this concept of adversarial loss, and the thing to think about here is that it's basically a loss over the two predictions, where one side, the discriminator, is trying to minimize the loss, and the generator is trying to maximize that loss by fooling the discriminator. This is an image from Ian Goodfellow that shows exactly how the network works. We put in some sample data: basically we just put in some noise to kickstart it, and the model will take that noise and reshape it into an example. And here's the thing: sometimes we give the discriminator real images and sometimes we give it images made by the generator, and it has to determine which ones are real and which ones are fake. If it says a real image is a fake, it gets penalized, and vice versa. If the generator is not producing something convincing enough, it also gets penalized. So this is how we try to reach that equilibrium.
So the first one, what nowadays is called the vanilla GAN, uses this very simple setup: we put some latent noise, which we call z, into a generator, which we call G. We also take some real data, and then we randomly present real and generated samples to a discriminator, which has to decide: are they real or fake? We then use that to score our loss and update our weights, so the model gets better at taking that noise and reproducing images. The big thing to understand here, though, is that what we're most concerned about is the generator. Well, sometimes we want to use the discriminator too, but generally the generator is the one we care about most, because ideally we eventually want to be able to snap the generator off as a module and just say, right, make me 100,000 eye pictures, like for the previous problem, and have it produce images that are realistic and have the same probability distribution as a real set of eye images. There are times when you want to use this to create better discriminators for certain things, too; that is possible. Okay, so we take the generator loss and the discriminator loss, we add those together, and that's our total loss for the network. Obviously, we flip the generator's loss, because the generator is actually trying to push the loss up, so we flip that, get a total loss, and use it. But here's the thing: with GANs, we don't think about losses in the same way we do when we train, say, a classifier. You'll find that the loss when you're training a GAN often doesn't have a lot of meaning, so people will think, okay, my loss isn't going down, something must be wrong, but that isn't necessarily the case. What you want is to make sure the losses are balanced.
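To make this vanilla-GAN loop concrete, here is a minimal toy sketch (not the code from the talk): a two-parameter generator learning to match a 1-D Gaussian, with a logistic-regression discriminator and hand-derived gradients. All names, sizes, and hyperparameters are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Real data: samples from N(4, 1). The generator must learn to map
# standard normal noise z onto this distribution.
def real_batch(n):
    return rng.normal(4.0, 1.0, n)

# Generator G(z) = a*z + b (two parameters); discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr = 0.05

for step in range(2000):
    z = rng.normal(0.0, 1.0, 64)
    x_real, x_fake = real_batch(64), a * z + b

    # --- Discriminator update: push D(x_real) -> 1 and D(x_fake) -> 0 ---
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = -np.mean((1 - d_real) * x_real) + np.mean(d_fake * x_fake)
    grad_c = -np.mean(1 - d_real) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- Generator update: push D(G(z)) -> 1 (non-saturating loss) ---
    d_fake = sigmoid(w * (a * z + b) + c)
    dLdx = -(1 - d_fake) * w   # gradient of -log D(G(z)) w.r.t. each fake sample
    a -= lr * np.mean(dLdx * z)
    b -= lr * np.mean(dLdx)

print(b)  # should drift toward 4, the mean of the real distribution
```

The alternating updates are the whole trick: the discriminator's step minimizes its classification loss, the generator's step moves its samples toward whatever the discriminator currently calls "real", and neither loss on its own tells you much about image quality.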
That's the key thing here. Okay, so let's jump into some code. I'm going to show you a more advanced way of doing GANs, and then I'm going to show you an easy way. So this is a data set called the Quick Draw data set. I don't know how many of you have played with the app, Quick Draw. If you haven't, go and check it out. It's a very cool little Google app, quite old now. Basically, you draw something and the algorithm keeps trying to guess what you're drawing. It's got, I think, about 100 different things you can draw, or 80; I've forgotten exactly how many. Google often uses this at roadshows, where they have it on a big touch TV and people draw on the touch TV. It's a fun game: it looks at what you're doing, runs it through a model, and tries to make a prediction. Now here's the thing: every time someone's drawn something, whether on the mobile version or at a roadshow, Google's kept your pictures. So just like Schupper was saying about people trying to do evil and steal your data, well, the data of how you draw a bicycle has been kept. And the Quick Draw data set, which Google has now released for research, is really cool because it has 50 million hand drawings across, I think, about 100 classes or so. It's a really interesting data set because people draw things differently around the world. There's a blog post about this; I don't have it in here, but if you search for it, there's a great post from Google showing that, for example, in Russia people draw chairs differently than they do in other parts of the world, or in South Korea they draw certain things differently than in other places.
So it is kind of interesting just to see that. But it also makes a really nice data set for us to train up a GAN. The particular type of GAN that I trained for this is a conditional GAN, so let me explain the difference between a vanilla GAN and a conditional GAN. Everything's the same; we still have noise going into the generator, even though I'm not showing it here. What's different is that we also pass in some labels. So now the GAN isn't just producing some image from across the whole data set; it's producing a bike image, or an angel image, or a bee image, some particular class from that data set. The way we do this is by passing the labels into the generator, so it learns to use its weights differently: if it's generating a bike, or, with MNIST, which we'll use later on, if it's generating a number one, it uses the weights one way; if it's generating a number seven, it uses them slightly differently; and it learns to reproduce multiple things. You can then tell it, okay, produce me 100,000 ones or 100,000 sevens, and it will be able to do that. Okay, so let's look at some older code for doing this, if old is the right way of putting it. Where is it? Okay, so for a start, you'll see this is a lot of code, right? So we've got our normal things where we're inputting data. This is an example of a bicycle that someone's drawn, and you can imagine that people draw bicycles very differently, right? Would you have drawn a bicycle exactly the same way as this? This is one of the reasons I chose this data set: it's interesting precisely because the drawings are so varied.
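The conditioning step described above, feeding labels in alongside the noise, is often implemented by simply concatenating a one-hot label vector onto the latent vector z before it enters the generator. A minimal sketch (the class list and sizes are made up for illustration):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

batch, z_dim, num_classes = 4, 64, 5   # e.g. 5 Quick Draw classes: bicycle, angel, banana, brain, bee
z = np.random.default_rng(1).normal(size=(batch, z_dim))
labels = np.array([0, 2, 2, 4])        # ask for a bicycle, two bananas, and a bee

# Conditioning: the generator sees [noise | one-hot label] as a single input
# vector, so its weights can learn a different mapping per class.
g_input = np.concatenate([z, one_hot(labels, num_classes)], axis=1)
print(g_input.shape)  # (4, 69)
```

The discriminator is usually given the same label information, so it judges not just "is this real?" but "is this a real bicycle?", which is what forces the generator to respect the label.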
With MNIST, people draw numbers differently, but certainly not as differently as they draw bees or basketballs or brains and all these sorts of things. So we're just going to use a few classes here. I chose to bring in 40,000 of each, so it's reading in 40,000 bicycles, 40,000 angels, 40,000 bananas, 40,000 brains, 40,000 bees, all right? And, oh, you can't see; I can see. Thank you for pointing that out. Is that getting bigger? Okay, how's that? Okay, so you can see I'm just bringing them in from NumPy files. Like I said, there's this whole data set you can download yourself. Whoa, that's gone a bit over. The full data set is, I think, about 11 or 12 gigs of data; it's not a small data set by any means. There's some serious lag on this, which I'm not sure why. Why is Jupyter lagging so much? So I'm just going to leave it this size for now. Let me try and zoom in a little bit. So basically we're just bringing in five classes, I'm assigning them some labels and so on, and then what we're going to do is build a model. And you can see here I'm building the full model by hand: I have to write out the code for all the different loss functions, and I have to tell TensorFlow that at certain times I want certain variables trainable so their gradients update, and at certain times I don't. The key point here is that this is a lot of code, right? Just to show you some of the images that come out, though: when we train this code, the generator starts off like this. It really has no clue how to produce any of these things, but gradually over time it starts to learn. And we can start to see that, okay, remember we've got a bicycle, a banana, an angel.
And we can see that as we go on, it's starting to learn how to produce the different types of things. We can see here that we're now probably getting some of the bananas; we've got some down here that are maybe the angels; up here we've got a bee, I think. Anyway, then if we train this for a long, long time, and I'm printing out about every hundred steps or so, by the end we've got a model where the generator has the ability to produce these hand-drawn figures quite well, right? We've clearly got some things that are like a bicycle, we've got our bananas, we've got our bee, and we can see that this is like the wings of an angel or something. The problem, though, is that to do this you have to write a lot of code. And so one of the key things Google did recently, in December, was release a library called TFGAN. What TFGAN does is make building basic GANs and training them a lot easier. I'll put this link up later for you to read the post, but let's jump in and look at the same sort of conditional GAN using TFGAN, right? We're doing the same sort of thing, on MNIST this time instead of Quick Draw. Basically, we do our imports; we've got a couple of helper functions just to print out images and the like; we load our data. Everyone knows what MNIST is, yes? It's basically handwritten digits, zero through nine. Then we make our models: we make a generator and we make a discriminator, right? And here, for me, the biggest downfall of all of this is that it's using an old TensorFlow library called TensorFlow Slim. But that may change, who knows? Let's see. Basically you make your models; we've got our conditional generator here.
And you can see that in the conditional generator we're passing in inputs: some noise, that latent z vector we had before, and also one-hot encoded labels, for MNIST in this case, so that the generator knows what it's actually making, right? Then we've got a discriminator, whose whole job is to detect what's real and what's not, right? And you see that here our model is much simpler than the code I showed you before, all right? For those of you who were in the deep learning developer course, we went through all of that code line by line and talked about it. With TFGAN it's much simpler. And the cool thing is that we can then come in and assemble our model and say, right, we're going to make a TFGAN model: our generator function is going to be this generator we defined up above, our discriminator function is going to be what we defined up there, and we're going to pass in some data and some noise. Then we can tell it, okay, here are the loss functions I want you to use. And this is one of the cool things with TFGAN: it has all the loss functions made for you, so you don't have to go through and code them. It also has optimizers; everything is done for you in that sense. Then, to get the model ready for training, we just say, right, we're going to pass in our conditional GAN model, we're going to use the GAN losses we defined up here, and we're going to pass in our optimizers for the generator and the discriminator. And then we train it, and it will just go. Here I trained it this afternoon on a GPU, and you can see that gradually over time it's starting to learn the numbers.
And it's able to produce these numbers well, certainly by the end; in fact, even by halfway through the training. MNIST is not that hard to produce, certainly a lot easier than Quick Draw. And you can see that by here we're actually able to produce realistic MNIST digits that are not digits that were given to the model; these are digits it's made up by itself. Let me come back to what I was going to show you here. So what is TFGAN? TFGAN is basically a lightweight library for GANs in TensorFlow. It has sets of pre-made losses and GAN components, so all the things that were kind of a pain to build with GANs, you can just take off the shelf and put into a model. It's a much simpler way to make a GAN. Another thing, which I'll show you in a second, is that you can also make a GAN as an estimator. You heard Martin talking about estimators earlier on. For those of you who have been around TensorFlow for quite a while, estimators are something that has really only taken off in the past two or three versions of TensorFlow. One of the big reasons TensorFlow estimators are important is that they take all the details of distributed training, training on TPUs, all that sort of stuff, and automate it for you. So if your model is an estimator, you can train it on a TPU. The model I showed you before, with the training loop in Python, won't train on a TPU, right? So estimators are gradually becoming more and more important, especially if you have any ambitions to use Google Cloud or distributed training with TensorFlow. So why TFGAN? I think I've covered most of these things already, right?
It's very easy to get started, check out how GANs work, and get a good understanding of them. Let me show you another one quickly, which is this one. This is a vanilla GAN, but built as a GAN estimator. Estimators have a few key functions: we have the model function, we have the input functions, right? Then, generally, we have some sort of evaluation function, which the estimator takes care of for us. So here I'm defining my helper functions, which are just for plotting things out; I'm loading my data exactly as before; and I've got my data input pipeline, a pipeline for pulling the data in a format that's estimator-friendly. Then for the model, here are my generator and discriminator again, very similar to what you saw before. And then I can just take it and build this estimator: I say tfgan.estimator.GANEstimator, and then I tell it, for the generator function, use this; for the discriminator function, use this; for the generator loss function, use this. And here's the cool thing; I'll just uncomment this to show you. One of the biggest things that's changed in GANs over time, and one of the things that has improved GANs, is different sorts of loss functions, different ways of dealing with these things, and TFGAN has a lot of them built in. So for example, these losses, I can, let's see, whoa, why is that not working? Why is the autocomplete not working? Okay, this is what's going on here, and I'll put this notebook up so you can play with it yourself: you can pick what losses you want, and it's literally just picking off-the-shelf losses.
So if you wanted to try building a GAN for your eye images and you wanted to see what works best, you would just run it a few different times with different types of losses and see which works best. Something like the Wasserstein losses were not easy to code, so the fact that it's all done for you means you can just take it and use it off the shelf. We then put in our optimizers; you can see I'm just using Adam there. Then we can set things like whether we want to show it on TensorBoard. Then we literally just train it, like any other estimator, run an evaluation, and print some out. And we can see that yes, this too, in quite a quick time, is producing MNIST digits quite well. Okay, very quickly, to finish up. The challenge with TFGAN is that it's heavily reliant on TF-Slim, which I kind of hate, partly because we've been told that Slim is supposed to be deprecated from TensorFlow at some point. The other thing is that this kind of library will work for a certain number of GANs, but if you get too far off the main path, it won't; it has its limitations. But certainly for people starting out who are new to GANs and just want to be able to say, yeah, I've built a GAN, I checked it out, I got an understanding of what it's like, it's worth trying. GANs themselves have a lot of advantages; it's one of the most interesting areas of deep learning at the moment. There's still a long way to go. So far, they haven't been used for a lot of real-world applications; they have been used for some, but certainly not as many as we might hope. I think that will change this year. I think this year we'll definitely see a lot of things happen with GANs, perhaps related to medical imaging and a lot more detailed imaging.
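As a rough sketch of what such off-the-shelf losses compute (this is the standard textbook formulation, not TFGAN's actual code), the two families mentioned here differ mainly in what the discriminator outputs:

```python
import numpy as np

eps = 1e-7  # numerical guard for log

# Vanilla (cross-entropy) GAN losses; d_real / d_fake are discriminator
# probabilities in (0, 1).
def vanilla_d_loss(d_real, d_fake):
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def vanilla_g_loss(d_fake):  # non-saturating form: maximize log D(G(z))
    return -np.mean(np.log(d_fake + eps))

# Wasserstein losses; here the inputs are unbounded critic scores,
# not probabilities.
def wasserstein_d_loss(d_real, d_fake):
    return np.mean(d_fake) - np.mean(d_real)

def wasserstein_g_loss(d_fake):
    return -np.mean(d_fake)
```

Swapping one family for the other is exactly the kind of one-line experiment the library makes easy: the training loop stays the same and only the loss functions change.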
Things like the progressively growing GAN have shown that you can build GANs that produce unbelievably high-resolution images really well. Some other problems with GANs, with some of the typical GANs, if you go and build these yourself, is that they have a problem with counting. As you can see, these are some images taken from Ian Goodfellow's early GANs, and they will often produce animals which look kind of like animals, but maybe not exactly right. They also have problems with perspective. Now, a lot of these issues have been solved in some of the newer GANs, right? I'm not going to go through the more advanced problems of GANs here; if you are training GANs and have questions, feel free to ask, and I can go into depth later on. Same with tips and tricks. There are a lot of tips and tricks, things like label smoothing, batch normalization, virtual batch normalization, and unequal training, where you don't always train the discriminator and generator the same amount; you sometimes train the generator twice as much as the discriminator. There's a bunch of different tricks that you learn in order to get good at this. For evaluating GANs, like I said before, we don't really evaluate them just on the lowest loss, because that doesn't really mean anything; we're trying to make images that humans think are real. So often what we use is what's called an inception score, which is basically running the generated images through, for example, the Inception model trained on ImageNet, and looking for a very low-entropy result. So basically we want it to say, okay, I think this is a hot dog, or I think this is a cat, and be very certain about that; not, well, I think it's half a cat and half a dog, because that probably means our image is not that great.
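The "low entropy" idea can be shown in a couple of lines. (For completeness, the full Inception Score also rewards high entropy across the whole batch of generated images, i.e. class diversity, but the per-image intuition is just this.)

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (in nats)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + 1e-12))

confident = [0.95, 0.02, 0.02, 0.01]   # "I'm sure this is a cat"
confused  = [0.25, 0.25, 0.25, 0.25]   # "half cat, half dog, half..."

print(entropy(confident) < entropy(confused))  # True: a confident
# classifier output has much lower entropy than a confused one
```

A generated image that a strong ImageNet classifier labels confidently (low entropy) is, by this proxy, a recognizable object; a uniform, high-entropy output suggests the image is a muddle.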
One of the GANs that I love a lot, and I showed this at the PyTorch talks, is CycleGAN. CycleGAN, I don't think, can be done with TFGAN, unfortunately. With CycleGAN, what you're learning to do is train on what we call unpaired images. There are a lot of models out there called pix2pix models where, if you see over on the left here, we've got a drawing of a shoe, and the model will produce an actual shoe that looks like that. That's great if you've got paired before-and-after examples that match exactly; those models are fantastic, and you can do a lot with them. The problem is that a lot of the time you don't have that kind of data, right? Especially if you're doing something like this. This is a CycleGAN that I trained up to look at an image, work out whether it's spring or winter, and then change it to the opposite. You can see on the left here at the top, there's an image which is spring, and the model has learned to turn that into winter; the picture on the right is actually the fake picture. Which is pretty good, right? It looks pretty realistic. Again, the same with the bottom: we've got one winter, one spring. Here are some more, and here are some that didn't work out as well. You can see, for example, on the top left there, it's trying to turn that winter picture into spring, but it doesn't really know what to do when you take the snow away. Now here's the thing: we haven't told this model what snow is, nothing like that. It's basically just learned, by looking at a lot of pictures of spring and a lot of pictures of winter, what these pictures have in common and how you could change from one to the other.
One way, a very simple description, of how CycleGAN works is that it basically says: right, this is a picture of spring; let me make a picture of winter from it, and then let me convert that back from winter to spring and compare it against my original picture. Does that make sense? So it's a very cool trick, and it's a trick we'll see a lot more of this year, with text and with a lot of the AI things you're going to see happen. But the thing is, it doesn't always work, and it certainly takes a lot of training. I did see a really nice example of this where someone making a home video took a video from the front of their car driving down their neighborhood, and they were able to turn their neighborhood into a winter scene. And it looked impressive enough that you could probably get away with it in a movie or something like that. Here's what happens when you take MBS and try to turn it into winter. Now, the training set here had no buildings in it at all, so it shouldn't work at all, right? Really, if I'd wanted to do this properly, I should have taken pictures of buildings with snow and buildings without snow and done it that way. But it does produce some pretty interesting images, right? So I like to call these the nuclear winter, or MBS at the end of days, or something like that. Another thing you can do, and this is what the paper became very famous for, is take horses and turn them into zebras, and zebras into horses. So you can see here I've taken two pictures of horses and turned them into sort of zombie zebras, and it's learned roughly what the lines were doing. Now, this was after a day, sorry, a day and a half, of training on a very fast GPU, so it does take a lot of time to do this sort of thing. But anyway, this is sort of what GANs are.
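The there-and-back comparison described above is CycleGAN's cycle-consistency loss, ||F(G(x)) - x||_1, where G maps one domain to the other and F maps back. A toy sketch, using hypothetical mappings that happen to be perfect inverses, just to show the shape of the computation:

```python
import numpy as np

# Toy "domains": pretend winter images are summer images shifted down
# by 10 brightness units. G and F here stand in for the two learned
# translation networks; they are made-up placeholders, not real models.
G = lambda x: x - 10.0    # summer -> winter
F = lambda x: x + 10.0    # winter -> summer

def cycle_consistency_loss(x_summer):
    # Translate there and back, then compare to the original (L1 norm).
    reconstructed = F(G(x_summer))
    return np.mean(np.abs(reconstructed - x_summer))

x = np.random.default_rng(2).uniform(0, 255, size=(8, 8))
print(cycle_consistency_loss(x))  # ~0: perfect inverses give near-zero cycle loss
```

In the real model this loss is added to the usual adversarial losses in both directions; it's what stops the generators from producing plausible winter images that have nothing to do with the input, since any information thrown away on the way there can't be recovered on the way back.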
I would encourage you to learn them. A lot of people find them very intimidating, but once you get the hang of them, you can actually build some really cool things. If you are looking at doing data augmentation or increasing your data sets, they're always worth trying. Any questions? I'm happy to take questions, or if people just want to come up and ask, I'm happy to do that too. Thank you. All right.