My name is Vivek Singhal. I am a co-founder and data scientist at CellStrat. We are an AI startup; we do AI consulting, AI training, and AI evangelism. So, is anybody here interested in fighting? Do you watch any kind of fight shows? You are? What do you watch? Oh, UFC. That is great. Anybody else? Oh, wow, I will have to be careful with you guys around. I am actually a peaceful guy, but I do fight neural networks, because we deal with GANs. So let us come to our topic: today we will discuss generative modeling with GANs.

This slide is just a plug for our startup, an AI startup based in Bangalore and Delhi, and a one-minute intro on myself: I was in the US for a long time, worked for AT&T, then for IBM in Bangalore, and now I run a startup in the AI space.

Okay, so let us talk about fighting neural networks, if you will. A GAN is an advanced type of neural network. The theory is rather simple, but it has only come around in the last few years. Ian Goodfellow, then a researcher at the Université de Montréal, invented the concept along with a couple of fellow scientists. The idea is that we are dealing with two dueling neural networks. So, is it part of deep learning or machine learning? Hard to say. You could call it deep learning because we are dealing with neural networks, but it has become almost a science of its own, and there is a lot of progress and research happening.

So, we are dealing with two neural networks. Do all of you know what a neural network is? Raise your hands, please. Most of you do, okay, great. Let us talk about them in the context of images, or computer vision, if you will. One neural network, which we call the generator, also the forger, creates forged images from random noise. The discriminator, the second neural network, tries to detect which is a forged image and which is a real image. So there are two dueling neural networks: the generator is constantly trying to produce images that look authentic, just like the real thing, whereas the discriminator is trying to detect which one is a forgery and which one is real. We do need a dataset; it is not unsupervised learning in that sense. We do need real images to compare against.

The way this works is that the generator starts with random noise, as you can see, and starts creating forgeries. The discriminator keeps comparing the forgeries with real images; the two networks run in alternating fashion, and the discriminator marks which images are real and which are forged. This process goes on for millions of iterations, the way you train any neural network, and ultimately the generator starts producing images that look very much like real images.

So basically we can use this for generative modeling. You can create new forms of art, even new forms of text; you can combine images and create brand-new images; any number of applications. Yes, you can also create new data. Say you have limited data, a limited set of images, and you want to train a convolutional neural network for image processing; you can use a GAN to create new images. You will of course have to check whether those generated images satisfy the kind of dataset you are looking for. Any questions on this? Yes, please.
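To make the two networks concrete, here is a minimal PyTorch sketch of a fully connected generator and discriminator of the kind used in the MNIST demo later in the talk; the layer sizes and activations are my own assumptions for illustration, not the speaker's exact code.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """The 'forger': maps a random-noise vector z to a flattened 28x28 image."""
    def __init__(self, z_dim=100, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),   # pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """The 'detective': scores an image; a high logit means 'real', a low one 'forged'."""
    def __init__(self, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),                    # a single real/fake logit
        )

    def forward(self, x):
        return self.net(x)

# Sanity check: forge a batch of images from pure noise and score them.
G, D = Generator(), Discriminator()
fake_images = G(torch.randn(16, 100))
print(D(fake_images).shape)   # torch.Size([16, 1])
```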
See, the generator has no knowledge of the real images. The generator is a neural network that initially just uses random noise as its input and creates images from it. So initially, what it creates will look nothing like the real images, and the discriminator network will immediately say, this is a forgery, this is not at all like the real thing. This process goes on millions of times, and eventually the generator starts producing images that the discriminator finds harder and harder to distinguish from the real ones. So that is what is happening: there are two dueling neural networks. This was invented by Ian Goodfellow, who later joined the OpenAI team co-founded by Elon Musk, and he is often called the GANfather. It started as a research paper on the arXiv website, which you can Google. Any other questions?

There is no explicit feedback shown here, apparently, but once we see the code we will look at that. On that specific question, yes, obviously some feedback is going back; that is why the generator can adjust itself. Otherwise, how would it adjust? Sorry, let me make sure I understand your question. Why would it generate a real-looking image in the beginning? It does not; it starts with random noise. Eventually it starts generating images that look very close to real. How this happens programmatically, how the loss works, I will explain shortly, and then it will become clear. When I first started looking at this topic, I wondered the same thing. I knew neural networks well, CNNs, RNNs, and I kept thinking, how is this really happening? But once I looked at the algorithm, it was clear, and I will come to that. In the interest of time, let me continue.

So now we go a bit deeper. The discriminator tries to output a zero for the forged image and a one for the real image. So it tries to output a label of one for the real image and a zero for the forged image, aye or nay, if you will. The generator, on the other hand, is always trying to produce a one, because real is one. The real image is labeled one, and the generator is also trying to get a one, but the discriminator's job is to mark the forged image as a zero, as a fake. The generator keeps trying to get a one. So if both come out as one, the generator has figured out how to produce real-looking images; the target label is one for both neural networks. Let me keep going, and it will become clearer.

This example is from a research paper. It is called StackGAN, because there are two GANs stacked on top of each other. The first GAN in this case also takes a text caption as input, like caption-to-image processing; as you may know, you can go from caption to image using an RNN and CNN combination, if you know neural networks. The caption here reads, "this small brown bird has wide stripes on the coverts, wingbars and secondaries," so it describes a brown bird. The Stage-I GAN produces somewhat blurred images, and the Stage-II GAN sharpens them. So there are two GANs stacked on each other.
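For reference, the real-equals-one, fake-equals-zero game described above is exactly the minimax objective from Goodfellow et al.'s original GAN paper:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator pushes D(x) toward 1 and D(G(z)) toward 0, while the generator works to push D(G(z)) back toward 1.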
Again, this is from a research paper. We can do the same thing with this line: "this flower is white, pink and yellow in color and has petals that are multicolored." The Stage-I GAN produces images, but somewhat blurred; the Stage-II GAN sharpens them. It uses the text line as well as the images produced by Stage I, and is able to produce sharper images. Once you can do this, you can imagine the applications: fashion design, image search, image translation, changing colors, shades, contrast, shapes, everything in images. Anything you can think of doing with an image, you can do with a GAN. And obviously GANs work in conjunction with traditional neural networks, often convolutional neural networks, to do this.

So that is one more application of GANs. Apart from GANs, which other algorithms do you think can do generative modeling, anybody? Can that one do generative modeling? No, Naive Bayes is a classifier, the probability of this or that happening. Sorry? Yes, a variational autoencoder can do the same thing, exactly right. Any others? That one I am not sure about. Hidden Markov models, correct? By the way, GANs borrow some theory from HMMs. Some of the mathematics behind GANs comes from RBMs and HMMs; they use the energy-function concept and minimize it. If you have read the Deep Learning book by Ian Goodfellow, he talks a lot about RBMs and Boltzmann machines, and then he goes on to describe generative models and the mathematics behind them; it is quite intricate.

Anyway, coming back to this, another example of what we can do: I take a smiling woman, subtract the neutral woman, and add a neutral man who is not smiling, which is normally the case, right? And we produce a smiling man. So you give a wife to a husband and you get a smiling man, yeah? Smiling woman minus neutral woman plus neutral man gives you a smiling man. You can guess I am married, right? That is why I am smiling. Similarly, take a man with glasses, subtract the man, so you are left with just the glasses, add a woman, and you get a woman with glasses. So you can do all sorts of funky stuff with image processing. This is manipulating the latent image vector to create new visual concepts, and the mathematics can get quite intricate.

Another example is artistic style transfer. Here you can see that I can take any image and apply any style to it. This is the original image, a couple of buildings, it looks like a European town; you apply a certain style and you get a mixed image with similar buildings, but rendered in that style. This is called style transfer. It is a very interesting use case; it could even be misused if you are a criminal, and on the other side there are crime-detection use cases as well. And then there is art, craft, and fashion; the fashion industry is always interested in creating new designs, so you can imagine the possibilities.

Another application: here we are trying to do video. There is a head rotating, and the model is trying to predict the next position of the head.
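A rough sketch of that vector-arithmetic idea, assuming a trained generator; in the DCGAN paper this arithmetic is done on latent z vectors (averaged over a few examples per attribute), and every name below is purely illustrative, not the talk's code.

```python
import torch
import torch.nn as nn

# Stand-in for a trained generator mapping a 100-dim latent vector to a flattened image.
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())

# Hypothetical latent vectors; in practice each would be (an average of) z vectors
# whose generated faces show the attribute in question.
z_smiling_woman = torch.randn(100)
z_neutral_woman = torch.randn(100)
z_neutral_man = torch.randn(100)

# "smiling woman - neutral woman + neutral man", decoded back into an image.
z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
smiling_man = G(z_new.unsqueeze(0))   # with a real trained G, this renders a smiling man
print(smiling_man.shape)              # torch.Size([1, 784])
```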
So this is the head of a person rotating on a pedestal, and the model is trying to predict the next frame. This row is the ground truth, the actual reality; all of you understand ground truth. Using a normal mean-squared-error (MSE) loss, I get a blurred prediction, but when I apply an adversarial loss using a GAN, I get a sharper image. The same region is expanded here: regular MSE, regression if you will, gives me a blurred image, while the adversarial loss gives me a sharper one, even though there is still a bit of blurriness. You can train these models to do so many things. Most of these examples are from research papers, not something we invented; a lot of good research has gone in, and now folks like us are trying to create commercial POCs out of these research papers, if you will.

One more example. This is very useful for image-based search. Does anybody like e-commerce shopping online? Okay, fair enough. I have seen my wife constantly on fashion websites looking for clothing and shoes. I cannot understand why you have to spend so much time on e-commerce sites; I pretty much know which book to buy and just go and get it, but she wants to look at lots of styles. So what if we give the user a paintbrush? Say Flipkart were to give them a brush tool. The user starts making edits, and the system starts generating images based on those edits. In this case we are looking for a church, and the user says, I want a triangular shape, so it generates an image of a church with a triangular shape. Then the user says, I also want a blue sky on top of it, so the generated image has a blue sky. The user says, I want a green stretch in front of the church, and it creates a green stretch. These are generated images based on the user's edits, and the system then goes and finds images from the real database that match the generated images. So this is a fairly large image-search application that the e-commerce industry could use, where the user actually participates in the search process by suggesting what shapes and colors they want. Here the user has been given a color carousel on which they can make edits. In this case the user is suggesting, show me structures of this shape; I am looking for a church again, structures of this shape, and then add a red tinge here, and it tries to find equivalent matches. So lots of interesting, visually rich applications can be created.

One more before I go to my demo; this is the last application I will show you. Here we are doing image-to-image translation: we are saying, generate an image that looks something like this. What is happening in this case is that I am feeding an additional image to both the generator and the discriminator. This is an aerial photo from a satellite; I feed it into my GAN model and it produces a map as output. Here, I am able to convert day to night: this is a daytime photo, and the model converts it to night. So this is the input I give it.
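A sketch of the "feed the extra image to both networks" idea, in the spirit of the pix2pix line of work on image-to-image translation; the shapes and the tiny network below are assumptions for illustration only, not the speaker's code.

```python
import torch
import torch.nn as nn

# Conditioning image (e.g. the aerial photo) and a candidate output (e.g. the map),
# here 3-channel 64x64 tensors purely for illustration.
aerial = torch.rand(8, 3, 64, 64)
candidate_map = torch.rand(8, 3, 64, 64)

# The conditional discriminator sees the pair concatenated along the channel axis,
# so it judges "is this a plausible map *for this particular* aerial photo",
# not merely "does this look like a map".
D = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, kernel_size=4, stride=2, padding=1),
)
patch_logits = D(torch.cat([aerial, candidate_map], dim=1))
print(patch_logits.shape)   # torch.Size([8, 1, 16, 16]) - a grid of real/fake scores
```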
And I tell it to apply that to the generative model. So you can call this image-to-image translation. Here, I am converting a black-and-white grayscale photo to a color photo, and here I am creating a purse from just this outline shape. By feeding this kind of input image, I can create output images that follow it. So there are amazing use cases for product generation, for designers of different kinds, for artists. Any questions before we go to the programming demo? Any questions so far? Okay, it seems we have about 10 minutes, so we have to keep it a bit short. Yeah, please. Sorry. Yes, actually, I will show that in the demo. Any other question?

See, if you did this with classic machine learning, it would not be able to produce very accurate models. That is the point we are trying to make with a GAN: by comparing against real images, we are able to create more accurate models. That is the whole point of a GAN. Sorry? Yes, when I say adversarial training, I am using a GAN. Correct. When I say just MSE, I am not using a GAN; that could be any of your other classical models. Any other questions? Okay, let me go to the demo. Then you will see the programming mechanics of it.

Let us look at the traditional MNIST digits, probably the most common dataset. You can see that we are forging MNIST with a fully connected GAN. We could use DCGANs, deep convolutional GANs, but in this case, just to keep it simple, we will use two fully connected neural networks, the discriminator and the generator. All of you understand neural network programming, okay. Both G and D are fully connected neural networks, so the network itself looks pretty simple if you are familiar with the concept. And you can see that with increasing epochs, or training loops, it produces more and more accurate images.

Let me explain the code first; then I will actually run it for you in PyCharm. What is happening programmatically? There was a question earlier about how it actually does this. The generator takes random noise z, and its output G is the fake image generated by the generator. The discriminator runs two times: once it takes the real input x and outputs the logits and the sigmoid output, and once it takes the output of the generator and outputs the fake logits and the fake sigmoid. So the generator runs one time and the discriminator runs two times, once on the real image and once on the fake image, and they run in an alternating cycle: first the generator, then the two passes of the discriminator, then the generator again, and so on. Those of you who do not do programming, that is fine; we will just follow it logically. Then, as you know from neural networks, we try to minimize some kind of loss; this is the loss function, and in this case we are using a cross-entropy loss. This is the generator loss: we are minimizing the loss function for the generator, and it is computed from the discriminator output, so there is indeed a reverse feedback loop; something from the discriminator goes back to the generator.
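Here is how that loss wiring looks in a minimal PyTorch sketch. The talk's own demo appears to use logits plus a sigmoid cross-entropy, which `binary_cross_entropy_with_logits` combines in one call; the tensors below are stand-ins with assumed shapes, not the demo's variables.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the discriminator's raw outputs (logits) on one batch.
d_logits_real = torch.randn(64, 1)   # D(x): scores on real images
d_logits_fake = torch.randn(64, 1)   # D(G(z)): scores on generated images

ones = torch.ones_like(d_logits_real)
zeros = torch.zeros_like(d_logits_fake)

# Generator loss: push D(G(z)) toward 1, i.e. "make the forgeries look real".
g_loss = F.binary_cross_entropy_with_logits(d_logits_fake, ones)

# Discriminator loss: real toward 1, fake toward 0, then summed.
d_loss_real = F.binary_cross_entropy_with_logits(d_logits_real, ones)
d_loss_fake = F.binary_cross_entropy_with_logits(d_logits_fake, zeros)
d_loss = d_loss_real + d_loss_fake
print(g_loss.item(), d_loss.item())
```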
So there is a feedback loop, because the fake logits are produced by the discriminator. I take the discriminator's output on the fake images and compare it against 1; I am trying to minimize the loss between the discriminator output and 1, that is, to bring it closer to 1. That is the generator loss. The discriminator, meanwhile, runs two times: first it tries to reduce the loss between the real characters and 1, and then between the fake characters and 0. So the discriminator pushes the generated output towards 0 and the real towards 1: fake goes to 0, real goes to 1, for the discriminator. But the generator takes the output of the discriminator and tries to make it 1. So if the generator figures out how to make it 1, it has learned how to produce real-looking images. All of you are developers here? How many developers? Okay, quite a few, so we are good.

Then what are we trying to do? We are trying to minimize both the generator loss and the discriminator loss. The generator loss is simply this one term, but the discriminator loss is the sum of the loss on the fake images and the loss on the real images; it is a combined sum, and I am trying to minimize it. So what happens? An equilibrium is reached. This guy is pushing hard one way, this guy is pushing the other way; they are both pushing against each other, and finally the generator finds enough strength to hold its own against the discriminator, and the discriminator against the generator. So there is an equilibrium. All right, clear?

I am nearly out of time, so before I show the code, the quick takeaways: GANs can produce new images and other forms of content; there are two dueling neural networks involved, the generator and the discriminator; the generator eventually learns how to produce real-looking images; and they can be used in innovative applications.

Let me quickly show you the program run; we will be done in a minute or so once we see the program, and then we can take any questions. You can see it is a regular training loop, for those of you who are developers. Here I take the generator output G and pass it to the discriminator, so G is the fake output from the generator, and I take the discriminator's output on it. I also take the output of the discriminator on the real images x. The discriminator loss is then the combination of the real-image loss and the fake-image loss, and the generator loss is merely the loss on the discriminator output that was produced earlier. So the generator is trying to reduce its loss with respect to ones, whereas the discriminator is trying to mark the fake images as 0 and the real images as 1. Then we just run the usual neural network optimizer and it trains. I was going to actually run this now, but we would have to watch the time, because sometimes the computation takes a while. Fine, okay, I guess I will just show you the output and then we will be done. You can see that in subsequent epochs it gets more and more accurate, producing more and more realistic MNIST digits.
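Putting the pieces together, a minimal end-to-end training loop might look like the sketch below; the tiny networks and the random stand-in data are assumptions so the snippet runs on its own, and a real run would swap in MNIST batches and the larger networks shown earlier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in networks; a real MNIST run would use the fully connected nets shown earlier.
G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.rand(64, 784)   # stand-in for a batch of real MNIST images
    z = torch.randn(64, 100)     # random noise fed to the generator

    # Discriminator step: real -> 1, fake -> 0 (detach so only D's weights update here).
    fake = G(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(64, 1)) +
              F.binary_cross_entropy_with_logits(D(fake), torch.zeros(64, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: push D(G(z)) toward 1; this is the feedback from D back into G.
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    if step % 500 == 0:
        print(step, d_loss.item(), g_loss.item())
```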
The training prints the output every 500 training loops, and the images keep getting sharper. I would have to run about 100,000 iterations to get almost real-looking images; this run is based on just 1,000 iterations, so the final output is still rough, and it has not produced perfect images yet. Okay, so that is it. Any final question? I mean, conventionally you cannot produce images like this at all, and with a variational autoencoder the results are much more inferior; this is better. Okay. Thank you.