Hi again. If you want to call me, just go by Alf; it's just easier. So what are we going to be talking about today? Today we are going to be talking about unsupervised learning, in the specific case of generative models, and for starters we're going to be talking about autoencoders. So, let's see what this stuff is.

All right. So again, you've seen this stuff so many times so far. We have here an input on the bottom part, then we have an affine transformation, I have here my hidden representation, and on top I'm going to have some other item. Before, we were calling the top part y hat, but in unsupervised learning we don't have the labels. We only have, as I was saying in the first lesson, different kinds of fruit, but the teacher, he or she, went on strike, so she's not there, and you have this poor kid trying to figure out things on its own.

So what are we going to be using here? In this case we are going to be trying to reconstruct our initial input. So we have the same input here and the same input there; I mean, we try to reconstruct the same input. Somehow, in between, we are going to be learning, perhaps, an efficient representation: if the dimensionality of this guy here is smaller than this guy, we had to somehow encode our information in a smaller representation. But okay, this is just the initial diagram. Potentially, yes, it could have a lower dimensionality than x, so you have to be somehow smart in the way you encode this guy, because you have fewer ways of representing this guy here, and then you try to get back to the same input. I'm going to say more in a few slides, don't worry; you're going to see soon.

All right, so, equations. They are all the same, right? We have our hidden representation, which is an affine transformation of the input to which we apply a nonlinearity, and then our estimation of, again, the input is going to be an affine transformation of our hidden representation to which we apply a nonlinearity: h = f(W_h x + b_h) and x̂ = g(W_x h + b_x). Again, the input and the target, or actually the reconstruction, both have the same dimensionality, because it's the same thing. This is why it's called an autoencoder: it is trying to encode its own input. The hidden representation is in R^d, where d is the dimension of my hidden space. The first guy here shoots into d starting from n, and the other guy shoots into n starting from d. Okay, so nothing, nothing new.

So here I introduce a new kind of representation. On the left-hand side, I represent with those circles different vectors: here is my input vector, here is my hidden vector, and here is my output vector, whereas my lines here represent an affine transformation plus the application of the nonlinearity. In this case, instead, it's a block diagram, which is used more, let's say, and things represent different things: in this case the arrows represent my vectors, so I have my input vector, my hidden representation, and my output, and this guy here represents the process of actually converting between those different units, so in this case the affine transformation plus the nonlinearity. All right, so, dual representation: arrows mean something here, arrows mean something else there, but you can use either one. This is more of a notation for neural networks; that is more of a representation for signal processing, perhaps, when you want to apply some specific transformation to your signal.
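To make those two equations concrete, here is a minimal sketch of such an autoencoder as a PyTorch module (the dimensions and the names Autoencoder, encoder, decoder are my own, not from the lecture's notebook):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n=784, d=30):
        super().__init__()
        # Encoder: affine map R^n -> R^d, followed by a nonlinearity
        self.encoder = nn.Sequential(nn.Linear(n, d), nn.Tanh())
        # Decoder: affine map R^d -> R^n, followed by a nonlinearity
        self.decoder = nn.Sequential(nn.Linear(d, n), nn.Tanh())

    def forward(self, x):
        h = self.encoder(x)      # h = f(W_h x + b_h)
        x_hat = self.decoder(h)  # x_hat = g(W_x h + b_x)
        return x_hat
```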
All right, so again, why would we be using this stuff here? Because labels are expensive: someone has to get some human involved in order to label every single example of, perhaps, some kind of objects, or some images in this case. Here we try to learn some kind of interesting representation without the need of using these external labels. There is a paper that has just been released this week, I think, or last week, from Facebook AI, FAIR, which is not fair, but it's a very nice paper you can check, about pre-training with unsupervised techniques, which is very related to this topic.

Also, we are going to be using these techniques whenever we have very few samples: if you simply perform supervised learning, we are going to end up overfitting the data set. Basically, we would be learning those labels by heart without actually understanding the overall structure of the data. Instead, if we pre-train those kernels, perhaps of our convolutional neural net, with an unsupervised technique, we have those kernels learn the structure of the data, and then we can use just a few samples in order to train our last classifier.

All right, let's get more meat on the fire. Sometimes you're going to see that the weights are tied. So this guy here is called the encoder, and this guy is called the decoder, and sometimes the weights of the decoder are equal to the weights of the encoder but transposed, right, because you have d×n and n×d. So why would you use tied weights, for example, in the two blocks? [Student: So whatever you put in, you're going to get the same thing back?] But you have also a nonlinearity, right? So why would you use this kind of relationship? How many parameters do you have if you apply this kind of constraint? Half as many, right? So your optimization is actually going to be easier; you have fewer parameters to play with.

All right, so let's go on. Say again? [Student: There should be some other argument, because you are actually taking matrices which are transposes of each other, so it has some other mathematical properties.] Okay, which one? Why are we using this approach? Are you telling me, or... I don't know. Okay, but the argument, then: we need to use this to have fewer parameters. Okay, if that's not the argument, let's go to the next slide.

So we can choose different kinds of losses, for example, to train this autoencoder. At the beginning we are going to be using this capital L here, which is the average of all these per-sample losses ℓ(x, x̂), where we try, again, to reconstruct our input with our own prediction. Sorry, this one should have a hat symbol there. If you have different kinds of data, you're going to be using a different kind of per-sample loss. For example, if you have binary data, you'd like to use a binary cross-entropy, which is the same thing we have seen in the first lesson, when we were trying to classify those pixels as belonging to different spirals. That was a cross-entropy, which is the same as shown here but for multiple classes; this one is just the cross-entropy for two classes. I would spend more time, but I'd like to show you some examples later. So, if there are no specific questions, I'll just keep going to this slide here.
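As a sketch of those two per-sample losses — the binary cross-entropy just mentioned, and the squared-error one described next (my own minimal version, assuming the reconstruction x_hat lies in [0, 1] for the binary case):

```python
import torch
import torch.nn.functional as F

x = torch.rand(16, 784)      # a batch of 16 "images" with values in [0, 1]
x_hat = torch.rand(16, 784)  # stand-in for the decoder's reconstruction

# Binary data: per-sample binary cross-entropy, averaged over the batch
bce = F.binary_cross_entropy(x_hat, x)

# Real-valued data: squared l2 distance between x_hat and x (a regression)
mse = F.mse_loss(x_hat, x)
```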
This loss here I'm going to be using for binary data. For example, I have images which are black and white; black could be like some receptors: you find out, oh, I have data there, or I don't; I get an event, or there are no events. So I'm going to use this guy here, for example. Otherwise, if I have real data, which is, I don't know, maybe the color of the pixels of an image, or some audio signal, then I'm going to use this kind of representation, where I compute the squared two-norm of the distance between these two guys, ‖x − x̂‖². So this one is going to be basically a regression, and that one is going to be basically classification, right? All right, is it clear so far?

Okay, so we have a loss, we have our input, which is going to be basically the same as our target, and here we have our output prediction.

All right. So, as I was saying before, here is a mixed representation between the two I showed before. Here I show, with those circles inside those blocks, the sizes of the specific encoder, decoder, and whatnot. Actually, no: these are actually my vectors, right? So in this case here I have my input, for example of dimension 3, then I go to some kind of smaller representation, and then I go back to the larger representation, and I try here to match what I see here. So you can think about this kind of scheme as a compressor, right? Let's say I'd like to load images on a computer faster, so I would like to find a smart way of compressing my images. You can use a neural network in order to compress data, and it's going to be squeezing out all the redundancy in your natural data; you know, it has those specific statistics we have talked about last time, over and over. Going from here to here, we squeeze out all this extra, unnecessary, redundant information, and then, when we expand back here, we try to restore whatever we have compressed here.

[Student: So if we try to use a standard data compression technique, would that be counter to our overall goal, or are we just trying to find the...] That's a very good question. So, if you don't use any nonlinearity here, this becomes basically PCA, principal component analysis. If you start from three components and go to two, by training this guy here it is going to converge to the same result you would get by applying PCA: basically a dimensionality reduction, where you discard the directions in which the data has less variance, right? You know PCA, right? Yes? Okay.
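To illustrate that PCA connection, here is a small sketch: a purely linear autoencoder (no nonlinearity, no biases) trained with squared error on synthetic 3-D data ends up spanning the same 2-D subspace as the top two principal components. All names and data here are my own toy setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic 3-D data that mostly lives on a 2-D plane
basis = torch.tensor([[2.0, 0.0, 0.5], [0.0, 1.0, 0.3]])
x = torch.randn(1000, 2) @ basis + 0.01 * torch.randn(1000, 3)

# Linear autoencoder: 3 -> 2 -> 3, no nonlinearity anywhere
model = nn.Sequential(nn.Linear(3, 2, bias=False), nn.Linear(2, 3, bias=False))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((model(x) - x) ** 2).mean()
    loss.backward()
    opt.step()

# Compare with the top-2 principal components of x
_, _, V = torch.pca_lowrank(x, q=2)
print(loss.item())  # reconstruction error comparable to PCA with 2 components
print(V.T)          # principal directions: same plane the autoencoder found
```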
[Student: How is this different from Huffman encoding?] So, Huffman encoding works on a specific symbolization of your data; this one works directly with your input data, so it's going to find the best way of encoding. Huffman coding is expressed with binary numbers, I guess; here you can have much better compression, because you just learn from your statistics the best representation in order to squeeze out all that kind of redundancy. [Student: Yeah, but it's different, because Huffman encoding is lossless, and the point of this is to do a lossy compression.] Not quite. I would argue that for a specific dimension of this guy, this guy is still lossless. Well, it depends on what you are using: if you're using images, you can get to compress your image with, like, a hundred percent reconstruction here, because it's just exploiting that kind of unnecessary representational power that images have. [Student: So it's a lossless compression, like what PNG is?] PNG is Huffman encoding; it's actually a zip. This reminds me more of JPEG; I would say it's like a JPEG. But if you are in a specific setup, you still retain all the information; then, of course, if you start reducing even more, you start having some lossy compression. But here the point is that you can learn, directly from your input data, the best way of representing the data, without the need of actually resorting to specific algorithms, right?

All right. So this is usually what I thought about when I was thinking about autoencoders: something that allows me to re-encode my own input, but using a smaller, condensed representation. I usually also think about this one as a concept vector. If I have here different words or sentences, and I try to re-encode the same word or the same sentence — say you're using a one-hot encoding — maybe this representation gives you a more condensed concept of the one that is taking place in that long, long, long vector.

All right. But then there are also the overcomplete autoencoders. How does this work? Here we have a hidden state, an internal state, which is larger than the actual input. So you could say: oh, this guy, since it just has to replicate this guy here, can simply copy it here, and then copy it over here, right? So how come we are interested in something that is overcomplete, for an autoencoder? Well, it depends on what you're actually sending in here. Here we still have to introduce some kind of constraint. That could be, for example, that we force some of these to zero, so we have a sparsity constraint, so that each of those neurons here encodes a specific characteristic of your data; see the sketch below. Or, for example, we apply some noise here, and we try to reconstruct the initial image without the noise: maybe we'd like to implement a denoising autoencoder, which is learning how to get rid of external noise, and therefore it's better to use a larger representation in the middle, because, with a larger dimensionality, you can, as we have seen in the first lesson, push things around in an easier way. Other things are, for example, the contractive autoencoder, or the variational autoencoder, where you basically have sampling from a distribution. All those techniques introduce some kind of constraint over the hidden representation: although it is larger than the input, there is somehow some constraint beyond just copying the information through. And this is also very, very useful; we are going to be seeing that very soon.
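As one concrete instance of such a constraint, here is a minimal sketch of an L1 sparsity penalty on the hidden activations of an overcomplete autoencoder (my own toy version; the penalty weight is arbitrary):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 1000), nn.ReLU())  # overcomplete: d > n
decoder = nn.Linear(1000, 784)

x = torch.rand(32, 784)
h = encoder(x)
x_hat = torch.sigmoid(decoder(h))

# Reconstruction term plus an L1 penalty that pushes most hidden units to
# zero, so the network cannot simply copy the input straight through
loss = ((x_hat - x) ** 2).mean() + 1e-3 * h.abs().mean()
loss.backward()
```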
All right, so let's see, for example, an example with a notebook. Do we have time? Yes, we have some time. So let's train our first one, the normal kind of autoencoder — the undercomplete version, an undercomplete autoencoder. This stuff is on the git repo; you're going to have to type git reset --hard origin/master in order to get the latest version on your machine. I'm going to be using the autoencoder notebook, number six. So here I'm going to be executing some of these lines, and here you can see we go from 28 by 28, which is seven hundred and eighty-four dimensions for representing one digit in the MNIST data set, and we are going to be going down to 30 dimensions. Okay, so this is going to be definitely a lossy compression of the image, and then from 30 we are going to go back up to this large number. So, I have executed that one, I get my optimizer, then I start training, and I display some results, okay?

And here we have the reconstruction, I think. These are the reconstructed images, and, let me see... yes, I'm showing you right now, there you go. So here you can see the before and after: this is my input, which has been compressed down to dimensionality 30, I think, and then we get it back up to whatever dimensionality here. So we have a very, very, very strong compression, and then, of course, you're going to see there is a kind of bad reconstruction. But again, we managed to compress all those different pictures into a very small representation, which is sort of managing, somehow, to reconstruct those initial values.
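Here is roughly what those notebook cells amount to: a minimal training loop for a 784 → 30 → 784 autoencoder on MNIST. This is a sketch in my own names, not the actual notebook code; I'm assuming the standard torchvision MNIST loader:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=256, shuffle=True)

model = nn.Sequential(                 # 784 -> 30 -> 784
    nn.Linear(784, 30), nn.Tanh(),
    nn.Linear(30, 784), nn.Sigmoid())  # outputs in [0, 1], like the pixels
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for img, _ in loader:                  # labels are ignored: unsupervised
    x = img.view(img.size(0), -1)      # flatten 28x28 -> 784
    x_hat = model(x)
    loss = criterion(x_hat, x)         # reconstruct the input itself
    opt.zero_grad()
    loss.backward()
    opt.step()
```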
All right, so let me show you instead the denoising autoencoder and how it works. In this case here, we assume that we have our initial input, but the input that we observe is not necessarily that one: it's like a random variable conditioned on our observation. So here I represent the manifold of my input data: all those dots are examples of those digits. We have said those digits are 28 by 28, which is 784, and you can think about each digit as one point in this very large dimensional space. And here I'm showing you that all these points just follow a sort of lower-dimensional manifold: those dots are not really sparsely distributed over the whole space, but they form a sort of shape in this space.

So here we have our input x, and we are going to be corrupting our input with a specific noise. For example, we are adding some Gaussian noise: we are moving this input from the manifold to outside the manifold. The task of the autoencoder is going to be just getting this guy back to the x. So the denoising autoencoder takes things that are outside the manifold and brings them back down here. We assume that at the beginning we have this guy here, the x; we perturb the x, and then we let the autoencoder remove the perturbation. By performing this operation, we are going to be learning the statistics of our input manifold. Here we have a very strong assumption: we assume that the noise distribution we are injecting is the same one we are going to observe later on, in reality. In this game, we can learn how to be robust to that data: if we observe noisy input, we can just apply this denoising autoencoder in order to remove that extra noise that we have previously learned. So here we are going to be learning this kind of perturbation, and then we learn how to reverse those perturbations.

So let's see how this works in the notebook. In here, you should scroll a little bit up, and we are going to be changing things: instead of going through 30, we are going to 500. Yes, I understand that 500 is still less than 784, but I would assume this is actually overcomplete, because 784 is way too many dimensions just to express those digits of the MNIST data set; I would say that 500 is way more than the information content of those digits. So, I am initializing my autoencoder, which now has a much larger internal representation. I have my optimizer, and then here, in this cell, I am going to be activating this module, which is called dropout, which is dropping out some of my pixels from the input. Then here I am uncommenting this line, so I have my images, which are basically my input images, getting corrupted. Here I have my input, the corrupted image — call it img_bad — which is going to go through the model and generate the output, and then I have my loss, which is computed between the output and the non-corrupted input. Finally, here I am going to show you img_bad versus my output. So now it's training, and basically now we are going to see whether we are able to remove that external noise, that additional noise, and have basically learned a representation.
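A sketch of the step that cell performs; I'm guessing at names like img_bad, and the one essential point is that the loss compares the output against the clean input, not the corrupted one:

```python
import torch
import torch.nn as nn

model = nn.Sequential(            # overcomplete: 784 -> 500 -> 784
    nn.Linear(784, 500), nn.Tanh(),
    nn.Linear(500, 784), nn.Sigmoid())
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
do = nn.Dropout(p=0.5)            # kills ~50% of the input pixels
                                  # (and rescales survivors by 1/(1-p))

img = torch.rand(256, 784)        # stand-in for a batch of flattened digits
img_bad = do(img)                 # the corrupted input we actually feed in
output = model(img_bad)
loss = criterion(output, img)     # compared against the CLEAN input
opt.zero_grad()
loss.backward()
opt.step()
```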
So here we have the initial digits; these are, you can see, maybe a 0, a 6, a 9, an 8. And then we have these green dots, which are basically set to 0: those guys there are set to 0, and I think the blue is minus 1 and the yellow is plus 1, or something like that. So in here we have corrupted our input in this way. Has our denoising autoencoder managed to disentangle this kind of noisy layer from our digits, and then remove that noise from my data? What do you think? There you go, right? So you can see here that this has been really nicely restored from that initial corrupted input. These are the inputs to my network, and the output of the network is just those very pretty images. So here we have learned how to be robust: 50 percent of the pixels here, actually, have been turned to 0, and the network managed to reconstruct the missing half of the image.

This can also be used, for example, for image inpainting, when there are regions of your image that are missing, or if you want to delete some part. So, you have your image, and you would like to delete that... photobomber, how do you call it? Photobomber, right, the guy there. So you just draw black pixels there, you send it through my denoising autoencoder, and the image can be... [Student: You just told us that if we draw some pixels, the algorithm will return back the same, right? So if you put these pixels on this face, and you pass this one, then this face will show up back?] No: if you put some black pixels, it's going to try to fill them up with the background, and it's going to be removing the guy who was trying to ruin your photo. [Student: The loss function's x is the picture from before...] Yes, that was for training, yes, of course: I trained with the original image, which is not perturbed, and I tried to go back to the unperturbed image. But then you just feed in noisy images, and you're going to get cleaned-up versions out afterwards.

All right, so I can show you some of the weights that have been learned. And you can see here — these haven't really been trained much — but these kernels here, they are like convolutional kernels, which are tuned to resonate with some specific curvatures of your drawings, of your digits. So those ones may extract something, and if you have a linear combination of these guys, you can generate those images.

So let's actually check what OpenCV has. Right, so OpenCV has inpaint, with the NS and TELEA options — the Navier-Stokes and Telea inpainting techniques — which are there for cleaning up your data. So let's see how they work. This one is my noise: this is my fifty percent data corruption. These are my digits with corruption, these are the original digits, these are the results from my autoencoder, and these are the results from the other two techniques. Oh, right: so right now, after spending a few minutes — one minute of training, even less than thirty seconds — we get results that are much better than state-of-the-art algorithms that, you know, important people spent so much time creating. We just made it from scratch. [Student: But you just learned it from data, yeah, and the reason it's better is because you trained on your own data.] That's it, right? This one only works with the specific noise and the specific data we have trained with; if you apply it to other data, it's not going to be that good. But if you can tailor your algorithm to your data statistics, then you just have to reconstruct the manifold, right? That's it.
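For reference, those two classical baselines are exposed in OpenCV as cv2.inpaint with two different flags. A sketch with stand-in data (cv2.inpaint expects an 8-bit image and an 8-bit mask marking the corrupted pixels):

```python
import cv2
import numpy as np

img = (np.random.rand(28, 28) * 255).astype(np.uint8)   # stand-in digit
mask = (np.random.rand(28, 28) > 0.5).astype(np.uint8)  # 1 = corrupted pixel
img[mask == 1] = 0                                       # 50% corruption

# Navier-Stokes based inpainting
restored_ns = cv2.inpaint(img, mask, 3, cv2.INPAINT_NS)
# Telea's fast-marching inpainting
restored_telea = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
```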
All right, how much time do I have left? My god, I have fifteen minutes, I guess. All right, let's see if I can do some more stuff; we have much more stuff. I'm really sorry, I'm going to be rushing so badly, yeah. All right, so here we have the contractive autoencoder. What is this guy? So, despite the very scary name, this is actually very easy to understand. Here we have again our input manifold, where the dots represent the specific data points of our data set, and this one represents the surface, or whatever this kind of object is, in this high-dimensional space where all my things are laid out.

All right, so I have there my input sample, and then I have its reconstruction. So what are we doing here? My reconstruction has a loss associated with it, the reconstruction term, and then there is this guy here. What is it? [Student: Regularization?] Hello, what is it? [Student: A nabla.] Oh, nabla, okay, nabla, sure. What does nabla usually do? [Student: Gradient.] The gradient of my hidden representation with respect to my input, ‖∇_x h‖². So what does this term say? [Student: Penalize it.] Penalize what? [Student: If the hidden representation is very far away from the input.] Uh, no. [Student: Large gradients.] Large gradients. So this one is going to be penalizing the sensitivity to the x. Basically, if this term is very, very strong, my hidden layer is going to be completely insensitive to the input: whatever my input is, my h is going to be always the same. So if this is very, very large, and I try to minimize this loss, it means I don't want to be sensitive to the input: anything you send, I just always output the same constant. So it is insensitive to noise in the input; actually, this alone is insensitive to anything. It's like: you're talking, I don't listen; I don't want to listen to anything that's happening over there. Okay, so this one alone doesn't work; it's that this one and that one together are trying to do something. This one is trying to reconstruct, whereas this one is trying to avoid any kind of variability of my hidden state with respect to every direction in the x.

So how does it work? The first part penalizes insensitivity to the reconstruction. In this case here, my x can move in this direction, right? So this guy here penalizes it whenever this guy here is not sensitive to variations in this direction; this basically makes this guy respond to variations in this direction here. But this other guy penalizes sensitivity in any other direction. So this one says: okay, this guy shouldn't be able to move anywhere; but that one says: okay, make it move along this direction here. So this is called the contractive autoencoder: if we happen to fall outside the manifold, it's actually bringing us back onto the manifold. I go outside the manifold, it takes me back to the manifold; if I am on the manifold, everything is just fine, because there is my reconstruction term, which is saying life is good.

All right, so this was the contractive autoencoder. Now we're going to be doing fancy things. Ready? I have five minutes now — ten minutes, disaster. All right, yeah: penalize sensitivity in every direction, and incentivize it in the one direction along the manifold. All right, cool, huh?
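A sketch of that loss, reconstruction plus λ‖∂h/∂x‖². For a sigmoid encoder layer the squared Frobenius norm of the Jacobian has a closed form, which I use here; this is my own minimal version with an arbitrary λ:

```python
import torch
import torch.nn as nn

W = nn.Parameter(torch.randn(500, 784) * 0.01)  # encoder weights
b = nn.Parameter(torch.zeros(500))
decoder = nn.Linear(500, 784)

x = torch.rand(32, 784)                         # a batch of inputs
h = torch.sigmoid(x @ W.T + b)                  # hidden representation
x_hat = torch.sigmoid(decoder(h))

recon = ((x_hat - x) ** 2).sum(dim=1).mean()
# For a sigmoid encoder, dh_j/dx = h_j (1 - h_j) W_j, so the squared
# Frobenius norm of the Jacobian dh/dx can be computed in closed form:
jac = (h * (1 - h)) ** 2 @ (W ** 2).sum(dim=1)  # per-sample ||dh/dx||_F^2
loss = recon + 0.1 * jac.mean()                 # lambda = 0.1, arbitrary
loss.backward()
```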
All right, that's my basic autoencoder. So what does it do on this stuff? Here we have all those samples that we observe in the data; those balls are the samples. Now, as I said before, every digit can be thought of as one dot in that 700-whatever dimensionality. This is one image, another image, another image, another image, and this is basically the path where the only possible images, the possible natural images, can occur. So I have my lowercase x, which belongs to this manifold, curly X, which is a subspace of R^n. This guy is my example, and that guy is the manifold.

On the other side, what do we have? So the point of using autoencoders is: oh, how many dimensions do I need in order to represent this guy here? Although we are in three dimensions here, you can think: I grab this one, I do that — how much do you need? Just one dimension, right? So that's what the autoencoder does: we just get one dimension, although three were used there in the representation. And I say: all right, so I have my z, which is my hidden representation; it lies on this manifold here, which is another subspace of this guy here, and I get my first element here, I get the last one there, and it's just... stretched, right? All right, sweet, very cool. This is basically what the autoencoder does.

Now, how can we go back in the other direction? Meaning: how can I get a sample here and then try to go to the other side? How can I choose samples? If I would like to sample from here, like using a non-parametric sampling, then I have to learn the distribution of those dots here, just to see where they usually happen to be. Or, I have to enforce some structure here, so that I know what the structure is, and then I just sample from that distribution. [Student: What if it's not a unique function? Like, what if there are loops?] Where? [Student: In this curve here: you've drawn like a straight line, but what if it's not a unique function?] So here I'm just showing you three dimensions and one dimension. We said it's 700 dimensions, and this stuff is maybe 30 dimensions; loops and things are taken care of by the larger dimensionality.

All right, so I really have to rush now, sorry. So, autoencoder recap: we have here the neurons, we said, with the affine transformations, and here we can show those blocks, encoder and decoder.

So, variational autoencoder. Someone yesterday told me: I spent the whole last month trying to study those variational autoencoders. Wow. I mean, I even read the article; I didn't... it's too hard, I cannot understand this, so complicated. So I try now to explain the variational autoencoder in one minute; maybe I can do a better job, I don't know. All right, so let's start. These guys here are my classical autoencoder: we have an encoder here, we have a decoder over here. Variational autoencoder: we have an encoder over here, and a decoder over there. Same, right?
Nothing changed. Second step: oh, okay, I'm outputting two things here. I'm outputting some means, and then I'm outputting some variances. These are maybe like a diagonal matrix, so they are all independent. All right. Then here I just sample: so here I output these guys here, and then, given that I have a mean and I have a variance, I can sample one of those z's, and use those z's to reconstruct my x. So it is basically like having here my encoder, which ends up with those two guys here, and here I add noise: I go to the representation, oh, I get some noise, I get knocked kind of off the representation, but then my decoder tries to get me back onto the manifold. So this one, instead of adding noise to the input — before, I was at the input, I got pushed outside the input, then I went to the latent space, and then I tried to go back to the input — in this case here, I add the noise to the latent representation, but then, again, I try to get back to the same point where I started, okay?

All right. And there, what did we do? Well, that one can be thought of as the variational autoencoder without the variance part, right? It just outputs the mean: it gives me, from the input, a value, and I reconstruct this guy. This guy, instead — from here I can think that I basically managed to get a sort of distribution, and then from the distribution I do a sampling, and then I go back up there, yeah.

[Student: You're just appending these random bits to the latent representation, right?] Say again, sorry? [Student: You're just appending these random bits to the latent representation.] So if here I have d items, here I have 2d items, right, where half of them are the same — the same d from there — and half of them are not just random numbers: they express how much those specific characteristics can be further away from that specific point. So, for each of the different explanatory factors of your input, the variance tells you how much uncertainty there is over that specific value. So here you just have: okay, my input can be expressed with a few values; and here I tell you: oh, my input can also be expressed by those few values, but I also tell you how certain I am about each of those four, perhaps, characteristics.
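What "sampling given a mean and a variance" looks like in code is the reparameterization trick; here is a minimal sketch (the sizes and names are my own, and I follow the common convention of predicting log-variances):

```python
import torch
import torch.nn as nn

d = 4
encoder = nn.Linear(784, 2 * d)          # outputs means and log-variances
decoder = nn.Linear(d, 784)

x = torch.rand(32, 784)
mu, logvar = encoder(x).chunk(2, dim=1)  # first d = means, last d = log var
eps = torch.randn_like(mu)               # the noise, added in the LATENT space
z = mu + (0.5 * logvar).exp() * eps      # z ~ N(mu, sigma^2), differentiable
x_hat = torch.sigmoid(decoder(z))        # decoder pulls us back to the manifold
```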
[Student: So you say x is larger on the left?] Both x's are the same. This one, let's say, we start from 100 and we go to four. But in this case here, I also know how much uncertainty is associated with each of those four characteristics. Here, you don't know how far off you are in guessing the correct code; here, you also have this notion, which tells you how certain you are about a specific feature.

Really running out of time. All right, so let's see what it does. The encoder maps from the input to 2d, and we have said that this guy here is twice the dimension, because we have the means and the variances. And the other guy samples from the latent variable space and gets back to my prediction.

All right, so the variational autoencoder, in drawings. We start there with initial samples, we go down here, and then from here we actually add some noise in the encoding part. So we start encoding, but then you get somehow pushed away; from here we go back over there by something, and then we try to get this guy here to be very close to the initial guy. Potentially, if we didn't have the additional noise, we could be basically going here; but here, every time, we get new noise, so every time you end up pointing somewhere else. And by training this guy, you get to reduce the distance between those two guys, and you become insensitive to this guy, this noise.

Something else we would also like to do here is to be basically invariant to this kind of distribution. So we can also enforce here this guy, which is a loss associated with the distribution that those guys here should assume. In this specific case, I'm saying: oh, the distribution of these z's, these latent variables, should follow a normal distribution, so that I can easily sample from here and get meaningful examples over there. In this way, I basically try to remove the association between this guy here and the guy who generated it, so that I can have a nice distribution from which I can sample, and then have a nice generative model.
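That extra term pushing the z's toward a normal distribution is usually the closed-form KL divergence between N(mu, sigma²) and N(0, 1). A sketch of the full objective, continuing the names from the previous snippet:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term: how close did we land to the starting point?
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # KL( N(mu, sigma^2) || N(0, 1) ) in closed form, summed over dimensions
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```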
All right, a few slides more, because I think you're interested in this one. Are you? Sorry for not taking too many questions, but I have like one minute. All right, generative adversarial networks in one minute. Yeah, I'm sorry, seriously, but tomorrow we have more stuff coming. So, one thing before that: the next notebook on the GitHub page is going to be simply running that variational autoencoder, which also allows you to do something very nice. You're going to have one point here, one point here, and then you can just move that point from one location to the other, and you're going to be seeing how interpolation in the latent space corresponds to interpolation actually in the input space. You're going to see that mathematical operations performed in the latent space, in the hidden space, correspond to interpolating two different images. So you can go from, like, your face to your girlfriend's face, or boyfriend's face, by just interpolating those latent variables: the first one corresponding to your face, and the other one to your girlfriend's face, and you're going to have very weird faces in between. All right — not that you're weird, but, you know. All right, so one minute then, one minute and a half, bro. All right, thank you. I'll be fast, but don't stop me — well, stop me later.

All right, so you understood this one, right? More or less. Instead of having just the z's here, I have twice as many guys, so I also have the uncertainty associated with each of those elements. So, that's the variational autoencoder, right? Generative adversarial networks — actually, they are almost the same. I mean, very similar. All right, so we have similar colors. We have a sampler here — same sampler as there. That sampler was getting information from here, so there was some information coming from our input; here, there is no input, so this one is unconditional, whereas that one is sort of conditioned on whatever we have seen down here. So, we have a generator — same color as the decoder, same. Here I have my x hat; I have my x hat, I guess it's the same. So far we have the same modules, right? Next one: x, there, okay; everything is the same so far. Oh, what is it? It's a switch. A discriminator. Hmm. So what is this stuff?

So we have a generator, which is basically getting some noise and trying to generate something cool, and here we have another x; these two guys are not related. Then we have a discriminator here, which is just trying to say: huh, is my input coming from real data, or is my input coming from fake data? So this guy's task is to discriminate between fake input and true input. At the beginning it's going to be very easy, because this one is just randomly initialized, so this guy's job is very easy: you're going to see trash here and good things here. The nice point is that this guy is going to provide me gradients through it, so I can improve this guy's performance by seeing what the actual meaningful information is that this guy used in order to tell me that this is fake and this is real. So this guy here, when he gets the job done of telling these two apart, is going to provide me — it's like a spy — it's going to tell me: ha, you made a mistake. Okay, what is the mistake? That one? Okay, let me fix it. So this one is actually trying to fix the mistakes this guy detects, using simply a gradient signal. And so, overall, as you train this guy, this one is going to be improving, improving, improving its own performance, and it's going to converge to producing outputs which look exactly the same as the data from our input distribution, so that they are basically indistinguishable, and this guy's best possible job is going to be just guessing — one half, no, 50% — which one is the real data and which one is the fake data.

The point here is that we don't have any conditioning on the input, so these are just generating very real-looking data without actually ever having observed real data. This guy here — the variational autoencoder — instead generates, you know, real data, but it has some knowledge of whatever is in the input, and that's why we had to enforce that kind of normality, no, over here, with that KL divergence loss.

All right, one more. So the generator maps from the hidden space to the input space, from z to x hat, and the other one goes from x, or x hat, to my binary output, which is going to be either zero or one: like, it's true, it's coming from the real data, or it's coming from the fake data.
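A minimal sketch of that two-player training step (all module names and sizes are mine; this is a toy illustration of the mechanics, not a recipe for training good GANs):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(5, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.rand(64, 784) * 2 - 1  # stand-in for a batch of real data
z = torch.randn(64, 5)                # 5-element Gaussian white noise

# Discriminator step: push real -> 1 and fake -> 0
x_fake = G(z).detach()                # don't backprop into G here
loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: fool D into saying 1 on fakes; the gradient flows
# through D back into G (this is the "spy" providing the fixes)
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```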
So, finally, how does it work, with the graphics? First one: we have our input, remember? In the variational autoencoder, we were starting from our input data, we were going here, and then we were going back. Here, we don't have the input involved: we just start from random noise. From the random noise, we go through the generator, and we say: ha, this is my guy here. And then we have the discriminator saying: hmm, I think it's fake. So this guy is going to tell this guy why he thinks it's fake, and this guy is going to try to improve its shooting, in order to shoot onto the manifold. Then the discriminator is also going to get some other input, and it's going to say: huh, this is a good value. So this guy is trained using both these samples and these samples, and this guy here just tries to shoot onto the manifold. And, finally, the way you train these guys is just using a min-max game over this value function, min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))], where one is trying to fool the other: the generator tries to fool the discriminator, and then you have the discriminator trying to do a good job of classifying the real data. So again, this is going to be a min-max game over this value function.

I know it was a lot of stuff, but the whole point of this stuff here was to tell you that once you train this guy, you end up with two different things. The first one is going to be a very awesome generator, which goes from random noise — like a vector of five elements of Gaussian white noise — and that one is going to be generating absolutely gorgeous pictures of your model, or whatever you train the thing on. And really, don't train it on bad things; don't check bad things. Ah, seriously.

The other one — so this is one of the things, right, why you use a generative adversarial network: because you have an absolutely amazing generator. But what do you have here? A discriminator, right? What does it do? This one is absolutely super well trained in telling apart data that stays on the manifold versus data that is not on the manifold. So, some of you yesterday asked me: oh, how do I train a classifier in order to do anomaly detection, right? This way: you get this discriminator, which has learned very, very well what your input data manifold is, and therefore, if you have some outlier — ha! — it's going to catch it. It's like: ha, it's fake data, right? So don't forget: whenever you train these guys, you don't just get the generator — which is why people mostly train these generative adversarial networks — but you also get a very, very, very nice discriminator, for free, right?

All right, sorry for rushing so much. You have the other examples on GitHub, and I'll see you, I guess, tomorrow, for sequence learning with recurrent neural networks and LSTMs. Thank you for listening.