All right, so, Foundations of Deep Learning again. Do interrupt me; it's going to be the last class we have in person. We have a very bad situation, especially in Italy: people are dying, we don't have any more beds in the hospitals, we don't have doctors. Doctors are working 24/7, and they are going home and infecting their own families. So it's a really bad situation right now back home, and it looks like we are two weeks away from that situation over here. So just wash your hands, try not to go to very crowded places, stay healthy, please. All right, let's start class.

All right: self- or unsupervised learning, generative model capabilities. Here I'm going to give you some eye candy, such that you get hungry, and then I can feed you with the first part of this lesson. This lesson is going to be split in two parts: half is going to be today, half is going to be online, through the Zoom system, I think.

So, hands up, who thinks the picture on the left is real? Okay, now hands up, who thinks the picture on the right is real? Okay, who thinks both of them are real? Who thinks this one is fake? Who thinks this one is fake? Who thinks both are fake? Okay, you are right. These two images have been generated by a model that this guy, Karras, trained. If you go on the website thispersondoesnotexist.com, you can find several examples of very good-looking non-existent people. If you keep clicking, sometimes you're going to find a person with a hole in their face; that's kind of easy to recognize as not quite likely a real person, but otherwise all of them look pretty legit. You can notice they have very nice teeth, very nice cheekbones, whatever you call those things. You can clearly tell here, though, that if you check the background behind them, the network is not producing a very accurate background, although the faces look very good. Why is that?
Because the network has been provided many samples, and those samples are featuring faces. The thing that is not constant is the background, right? So the background is the variable here, and therefore you can't learn every possible background. The background will look like some weird stuff, because there is much more variability there than in the possible appearances of a face. Given a face, in how many ways can you distort it? A human face, like your face: how many degrees of freedom does it have? You know this answer, right? We already covered this in class. Eight, he said eight. How much? Fifty? Yeah, fifty, correct. So you have roughly 50 muscles, something like that, plus rotation, tilting and whatnot. So all possible faces form just a manifold in about 50 dimensions; everything else is outside that manifold. Although these are several-megapixel pictures, so each point here lives in a huge-dimensional space, all the possible variations are restricted to a subspace. Okay, so that's the training manifold, the data manifold. Check out their website.

Okay, so we have here a very cute doggo, and on the other side a less cute bird, sorry. If you do a linear interpolation between the doggo and the bird, what are you expecting to see in the middle? And for this one I'm actually going to turn off the lights, and turn them back on soon after. Again: a blurry image, what do you mean? So what do you expect to get over here? Roughly, yeah, you can talk back. So I'm going to do 100% of this one plus 0% of that one, 90% of this one plus 10%, 80% of this one plus 20% of that one. So what are you going to see here? You should guess. You're less chatty than usual, so you actually have to talk back more than usual. Flying squirrel? No, okay. So you have a dog, you have a bird; if I do a linear interpolation of these two images in pixel space, what do you get?
Sorry? Yeah, a superposition, right. So you're going to get something that looks like this: basically an overlay of the first image with the second image. All right. So let's get back here, and instead let's do an interpolation in the latent space of the network. I input both of those two guys, I get the two hidden representations, those hidden layers, then I do a linear interpolation between the hidden layers, and then I do the decoding part. What do you expect to see here? And you still have to talk back to me; you have to shout. The background is still a mess, but then what's going to be the subject? Right, right. So you're going to start seeing a birdie dog and then a doggy bird. Okay, and that's how it looks: a birdie dog and a doggy bird. All right, this is a network that did this. Atrocious, right? It's blasphemy. You should watch the Fullmetal Alchemist episode where the father takes the daughter and the dog and makes something similar. It was so interesting. If you watch cartoons: Brotherhood, right, the second series. I don't know; because there's a lot of green, I guess it got rid of it, I don't know. Good question, though. Good eye.

So let me show you a few more. This stuff comes from Brock's article. Okay, so the first one: you have a smaller version of those images. We start from something that looks like a shark, or a manta, I think that's the name; then that stuff looks like a... how do you call them? Octopus, right. Yeah, something like an octopus, and then the octopus becomes like a monkey, and then it looks like a dog. So you can see the shark becomes an octopus, that becomes a monkey, that becomes a dog. It's interesting how you can change breeds by just keeping the two end points fixed and then walking through the latent space. You have another one here: you go from this puppy here to a bird.
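The two kinds of interpolation can be contrasted in a few lines. This is only a sketch of the mechanics, with untrained linear maps standing in for a real trained encoder and decoder, and random vectors standing in for the dog and bird images:

```python
import torch

torch.manual_seed(0)

# Untrained stand-ins for a trained encoder/decoder pair; only the mechanics
# of *where* the blending happens matter here.
encode = torch.nn.Linear(12, 4)   # pixel space -> latent space
decode = torch.nn.Linear(4, 12)   # latent space -> pixel space

x_dog = torch.randn(12)           # flattened "doggo" image (made-up data)
x_bird = torch.randn(12)          # flattened "bird" image
alpha = 0.5

# Pixel-space interpolation: just a transparent overlay of the two images.
pixel_mix = alpha * x_dog + (1 - alpha) * x_bird

# Latent-space interpolation: blend the hidden representations, then decode.
h_mix = alpha * encode(x_dog) + (1 - alpha) * encode(x_bird)
latent_mix = decode(h_mix)
```

With a real trained network, `latent_mix` is where the "birdie dog" appears, while `pixel_mix` is always just the double-exposure overlay.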
So again, you have a birdie squirrel, whatever, then you have a bird, and on the other side you have a doggy. Or this one: you have the smelly guy, what's it called? Skunk, thank you. Oh, that's why: you smell like a skunk, it's the same, right? Okay, I see. So you have a skunk here, and then you get to a dog. It actually looks a lot like a dog after this thing here, and finally you get a bird turning into a fly. I think these are pretty awesome examples. So this should make you hungry, such that I can feed you with the second part of the class. Right, so you have these images on one side, you have these images on the other side, you get the embeddings, and they basically do an interpolation of the embeddings; you can check out the paper for more details. The point here is just to show you the difference between interpolation in pixel space and in the latent space. So what is the difference? The latent space captures the basic semantics of an image, and therefore you go from the pixel space, which kind of doesn't really...
play well with our tools, to a kind of more well-behaved space, which is the internal hidden representation of the network. Again, this is just to give you some appetite; I'm not being formal at all. What next? Okay, you can zoom on dogs, you can shift on the x, on the y, you can change the brightness. The interesting part here is that when you change the brightness, you actually change day to night or night to day, because that's what the most normal brightness change in pictures looks like, okay? So whenever you change brightness you actually change the time of the day. Or you have a 2D rotation, or even a 3D rotation. It's so interesting: it means that a network can somehow have an internal representation of the 3D world. That's something Yann was mentioning yesterday: if you shift a little bit and you see this kind of parallax, then the easiest way to express, to basically address this phenomenon, is actually to imply that there is a 3D world, okay? So in this case they trained the network in order to actually be able to handle specific transformations. I have to put the reference on this paper; again, I'm just giving you some, how to say, eye candy. I'm not giving you anything formal right now. In this case, I like this one so much: you can get the anime version of your picture, so you could try this out. Don't try it with hentai, right? We don't want to do those things. Okay. All right. So some of you actually know this stuff, interesting. Those of you who don't, it's okay, just forget it.
All right. Okay, this is other stuff which is pretty cool: you can go from lower resolution, on the left-hand side, to higher resolution, on the right-hand side, and here you have a few examples. But this of course is black on white; it's much easier to do this stuff. And the same for the zebra. This is a pre-deep-learning technique: they were just using some hierarchical model. Nevertheless, it works pretty well. And then a few years later you actually get GANs giving you really high resolution. This is the upsampling done by linear interpolation, instead this one is the upsampling done with the neural net, and the final one is the actual real image. You can see clearly on the third row how this Asian dude became European. Why is that? Bias, right, correct. So the network has seen a lot of white dudes, and therefore the easiest way to reconstruct kind-of-unknown face features is to plug a white dude's face in there. Or this lady looks like she's having a stroke, because we don't have many side views among these images, right? This lady changed sex on the bottom side, and then this guy looks like he had an accident, because again, we didn't have many glasses in the dataset. Therefore the network saw some very dark thing and implied that someone had just punched him very hard. Okay, but again, this is very old stuff.
I mean, we are four years in the future now. Again, these were the first results, and this allows you to leverage actual data in order to fill in the gaps, and the gaps are the actual details. We have very many new results recently, but these again are the pioneering results, the pioneering examples. One more here: you basically block the face with a gray square, and then you ask the network to reconstruct the face such that it gives you the best-looking closest point. So you take an image which is sitting on the training manifold, you put a patch on the face; this patch will make the image go away from the training manifold, and then you can do, for example, gradient descent in this energy space, such that you can find the closest point, the point with the lowest energy, that is associated with that specific initial image. Okay, so you get an image, you perturb the image, which makes it go away from the training manifold, and then you can do gradient descent in the energy landscape, such that you can pick whatever sample looks the most like it, the closest sample on the training manifold. And this is some of the stuff Yann was covering yesterday about energy-based models: whenever you learn an energy, you can actually use the energy to do inference, and to do inference you have to minimize the energy, right? So energy minimization means inference, not training; training is something else. Again, we are going to cover energy-based models in more detail in the following classes. Here you have another few examples, one using a variational autoencoder and another using a generative adversarial net. This is also another example, from Reed. This is crazy.
You can go from an English description to an actual drawing of what the English description means. So you go from a sequence, I guess, to a vector, which is like the concept, and then from the concept you use a decoder, a generative net, which is going to decode your specific output. And that was pretty much it for the eye candy. So this should make you very hungry for the second part of the class. Are you hungry? Yeah, I didn't have dinner either, but okay.

All right, so: autoencoders. What are these things? Unsupervised learning. This is the first model we are going to dive into, in order to see how we can train a network without targets or labels. What are targets? What was the difference between targets and labels? You already know everything. Someone else? Prediction? Both of them are actually given; both of them are annotations. Thank you. So labels are going to be categorical: you can label things, this is a chair, that's a table, that's a door. Targets are going to be, you know, targets, right? Okay. All right. So we're going to observe how we can train this model without actually having targets or labels. This is the first network, the architecture we are going to play with. It's very similar to what we've observed so far, but the big difference is that we start from the bottom; we know already why it's pink. You get to an intermediate hidden layer, in green, and then at the top you get back to the input. So the output of the network is going to be a prediction of the input, okay?
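That pink-green-pink architecture can be written out directly. A minimal sketch, with made-up dimensions (784 in, 30 hidden) and tanh as the squashing function:

```python
import torch

torch.manual_seed(0)
n, d = 784, 30                     # made-up sizes: input in R^n, hidden in R^d

W_h = 0.01 * torch.randn(d, n)     # encoder "rotation" (affine transformation)
b_h = torch.zeros(d)
W_x = 0.01 * torch.randn(n, d)     # decoder "rotation", back to input size
b_x = torch.zeros(n)

x = torch.randn(n)                 # input
h = torch.tanh(W_h @ x + b_h)      # hidden layer: squashed rotation of x
x_hat = torch.tanh(W_x @ h + b_x)  # output: squashed rotation of h, predicting x
```

The second half, `h` to `x_hat`, is the generative part that will reappear below.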
So you will also have a different kind of representation. Those are the equations: the hidden layer is going to be a squashed rotation of my input, and then the final output is going to be, I guess, a squashed version of the rotation of the hidden layer, where "rotation" means an affine transformation. We have some dimensionality: h lives in R^d, but then both x and x hat live in R^n. Therefore the second part is going to be our generative network: the one that goes from h to x hat is my generative net. And here you have a different diagram which is basically doing the same thing, but some people prefer these diagrams where the actual transformations are drawn as boxes. Okay, so you have people saying this is a two-layer neural net, although we know that it is a... how many layers? How many balls do you see? Three. A three-layer neural net, fantastic. That's my convention, okay? Okay, hold on: you can actually have some tied weights, in order to try to reproduce somehow PCA, although you don't have guarantees over the ordering of the different weights, of the different bases. And if we use Yann's notation, we are also going to add those little arrow things there, which represent the transformations, okay?

So why are we using autoencoders? What's the point of predicting my input? Let's say I use an identity matrix. Okay, I provide an input, which is a vector; I do an identity matrix times my vector, and I get the same vector. There's an autoencoder: you get the same thing out as you put in. Why the hell are we doing this? "You don't have the input when you predict"... but then I can just learn the identity matrix, right? I have an identity matrix, I put something in, I get something out.
I put something in, I get something out; I put the same thing in, I get the same thing out. So the most trivial thing for a network to learn would be the identity matrix, right? The thing I put inside comes out, and that's how we train this stuff. Sorry? "If d is less..." Okay, there is the first point. So if we have an intermediate dimensionality d lower than n, then we can start seeing what this can be used for. For example, it could be used for compression: if I have an intermediate representation which takes less space than my input representation, I can use this encoder as a compressor. Then I have my hidden representation, my code, which is addressing what the specific input is, and it takes less space. So I can use it as an image compressor, for example. That was my initial idea for autoencoders, but that's just one type of application, and it's kind of not the proper way of thinking about these guys. An autoencoder's task is to be able to reconstruct data that lives on the manifold, okay? So we have a data manifold, we get some points, we have data points; I use these points for training my system, and I'd like my autoencoder to be able to reconstruct only things that live on the training manifold, on the data manifold. Okay, so that's actually the task of these autoencoders: to reconstruct only a small subset. And I got tripped up by my microphone, one sec. Okay, this is too short. Okay, we have to enforce it to be able to reconstruct only a small set of possible inputs. Now it becomes interesting, because if you can only reconstruct a small set of inputs, then you cannot reconstruct things that are away from it, right? And so, for example, like I showed you before: I have a picture, and I have a gray box in front of my face. So I take my point, and I move it away from my training manifold. If I try to reconstruct it, and my network can only reconstruct things that are on the manifold, it will reconstruct something that is here, which doesn't have that patch on the face, okay? Do you see this or no? So if you're only able to reconstruct things that have been observed during training, any variation that you apply to new inputs later on, when you're going to be using this network, is going to be removed, because the network will be insensitive to those kinds of perturbations. So let's see a bit more detail about this stuff. Is it clear so far? Yes, no? Okay.

All right, so let's figure out what reconstruction losses we can use. The first one, the classical loss for the overall dataset, is going to be the average of the per-sample losses. And there are two per-sample losses. The first per-sample loss is going to be the binary cross-entropy, which is going to penalize a lot if you make a mistake. The targets are going to be zero or one, so you have a categorical distribution, and then your output is going to be something that also lies between zero and one: you have a sigmoid network, a sigmoid nonlinear function at the end, and then you try to minimize this guy here. Otherwise, if you have real-valued inputs and outputs, which are for example color images, you may want to use the MSE. Okay.
All right. So, as your friend there mentioned before, it's pretty obvious to think about an undercomplete hidden layer. An undercomplete hidden layer has a dimensionality which is smaller than the size of the input. In this case the network cannot just copy, cannot use the identity matrix, because you have an intermediate representation which is smaller, and then you have to expand it back to the original dimensionality. Again, you can use an undercomplete autoencoder for doing compression, for example. Okay, so this is pretty standard, I would say. Does it make sense so far? Okay, we're going to play with this in a second in the notebook.

Nevertheless, I actually like this one more, and you are going to tell me why. You should be able to; you should have all the ingredients. This is the sixth week, seventh? Sixth week, seventh week, right? I think; I'm not sure. Why do I want a larger intermediate representation, expanding into a space that's larger than the actual input space? Okay, did everyone hear him? No? Try again, that was the correct intuition, just shout a little bit louder. "You want the encoder to extract as much information as possible from the input. If you expand it into a space that is larger than the original space it belongs to, then it's easier for the network to extract more features out of it, because it's sparser." Right. So we always said that the larger we go with the intermediate representation, the easier the optimization is going to be, right?
And so, although the information contained in the first layer and in the hidden layer is going to be the same (I can't add information), it's much easier now for the network to play with a representation that has many more dimensions. The problem is that we can now simply learn the identity matrix and just copy everything through: you copy the first guy into the first spot, the second guy you copy here, the third one you copy here, and you copy everything through. You haven't learned anything; you've learned the identity. Therefore, we have to apply some other type of constraint on the information: we have to introduce an information bottleneck. Okay, so although we enlarge the intermediate representation, we have to constrain the representation; we have to constrain the possible configurations that the hidden layer can take. The input layer can take as many configurations as you want; the hidden layer should only contain the possible configurations that the training data, the data on the manifold, can have. So the input can be anything you want, but you're going to be training only with data that is on the manifold. Therefore the hidden layer only has to be able to model, to capture, the variability within the training data, and be insensitive to anything that is outside. Okay, such that we can have a selective reconstruction of a subset of this very large input space. Are you with me?
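To see concretely why an overcomplete layer can cheat, here is a tiny construction with made-up sizes, ignoring the squashing nonlinearity for the moment: a wider hidden layer can just copy the input through and learn nothing about the data.

```python
import torch

n, d = 4, 8   # overcomplete: hidden layer wider than the input

# Copy x into the first n hidden slots, then read those slots back out.
W_enc = torch.zeros(d, n)
W_enc[:n, :] = torch.eye(n)
W_dec = torch.zeros(n, d)
W_dec[:, :n] = torch.eye(n)

x = torch.randn(n)
x_hat = W_dec @ (W_enc @ x)   # exact copy of the input, for any x at all
```

Since this works for any input whatsoever, nothing restricts reconstruction to the data manifold; that is why the bottleneck, or some other constraint, is needed.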
Yes? No? Unless... We are going to see now how we avoid overfitting on the training data. So there are a few ways to make this stuff on the right-hand side work, and moreover you can have the same rationale for this left-hand-side guy here. Let's say I have a super awesome decoder. Then my encoder could simply encode all my training data as: the first training point is number one, the second training point is number two, the third training point is number three. So I can associate each of my training points with one number, one, two, three, four, five, six, seven, whatever, and then the decoder has memorized all the training data points, and you just output the training point that you want from this kind of selector, right? So you potentially may need only one ball here, only one neuron in the hidden layer, in order to have a network that overfits, as long as the decoder and the encoder are very powerful. Okay, so the point, as your colleague mentioned: how do we avoid overfitting? This stuff can overfit; this stuff will overfit, unless we are careful about how we design these things. And so there are a few different methods: there are contrastive methods, there are regularized methods, and there are architectural methods. We saw some of this yesterday as well, and we can cover a few of them now; we have 20 minutes left. Okay, yeah, next slide.

So, the denoising autoencoder: how does it work? I take my input, the pink one, and I move it away from its original point. So these are my training points, and this is my data manifold; each of these points is going to be a sample I'm providing for training. I take my point and I displace it, okay? How? I just add random crap. Okay, cool. Now...
I enforce the network to reconstruct that initial point, right? So this is the denoising autoencoder: you take a point from your training manifold, you move it away, and then you enforce the network to take it back. Here I take the same point, I move it away in the other direction, and I put it back here. I take the same point, I move it in another direction, and I put it back down here. Okay, so what are we learning right now? We're learning a vector field which has everything coming back to this point. Then I start moving around my training manifold, and I have this kind of vector field: the vector field is going to be pointing towards the training samples, right? But if you have a training sample here and a training sample here, this guy will try to attract towards here, that one will try to attract towards there. So things that are on the manifold stay there, and things that are outside the manifold will collapse towards the manifold. Okay, questions? Okay.

All right, so actually there is a caveat. Caveat, caveat, how do you say it? Caveat, thank you. We assume that we are injecting the same noise distribution we are going to observe in reality. In this way we can learn how to robustly recover from it, right? So if we assume that we have access to the type of perturbation we are going to observe later on at inference time, then we can train the model to be insensitive to those kinds of perturbations. And this is a very big if. Therefore... oh, nice pictures, okay. All right. So this is my training data, these pink points, and I'm going to turn off the lights. So I have here my pink points, which look white to you. Then I have the orange points, which are the displaced points, okay? They originated from these points here, and then I displaced them in very many directions. Then I train my network to get all these orange points back to their original starting points, right? And so this is the output of the network.
I input this cloud of orange points into the network, I train my network to output the points on the actual spiral, and so these blue points are going to be the reconstructions from the network. Okay, so if points were already on the manifold, they didn't move; if points were far away from the manifold, they moved a lot. Guess what: I can measure how much they moved, and that's going to be your energy. How cool is this, huh? Okay, maybe you haven't understood yet. So in this case, in order to be a bit more thorough, I just send every possible x, y combination on this plane into my network, right? So here you have this line, because this whole bottom left corner got squashed down here, and you can see here points are quite sparse, and then there are very many of them densely occupying the manifold; nevertheless, there are a few points here, right? And so this is showing you with colors the distance those points have traveled. So points over here in the bottom left corner have traveled one unit, and they got down here; points over here have traveled also like 0.9-something, and they went down here. As you can tell, points between these two branches didn't go anywhere. Yeah, why is that? They are attracted by both the points on this side and the points on that side, on average during training. Nevertheless, if you forget about the stuff that is curling on its own, everything just drops down to here. Guess what: I can put the points that have only moved a little bit back inside the autoencoder, and I can keep doing this a few times, until these points collapse down to the manifold, okay?

All right, or I can do something cool. It's cool, but it's a trick, right? So what did I do here? This was my denoising autoencoder, where I got back to the initial point I started my displacement from, right? So I got my initial point, displaced it, and then I forced the network to go back to the initial point. What happened here?
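The spiral experiment just described can be sketched roughly like this. A small MLP with made-up point counts and noise levels stands in for the lecture's actual notebook; the distance a point travels through the network plays the role of the energy:

```python
import math
import torch

torch.manual_seed(0)

# A 1-D manifold embedded in 2-D: a spiral, standing in for the pink points.
t = torch.rand(500, 1) * 4 * math.pi
clean = torch.cat([t * torch.cos(t), t * torch.sin(t)], dim=1) / 10.0

model = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(300):
    noisy = clean + 0.3 * torch.randn_like(clean)  # displace points (orange)
    loss = ((model(noisy) - clean) ** 2).mean()    # pull them back to the originals
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Travelled distance as an energy: small on the manifold, large away from it.
with torch.no_grad():
    on_dist = (model(clean) - clean).norm(dim=1).mean().item()
    off = clean + 0.5 * torch.randn_like(clean)
    off_dist = (model(off) - off).norm(dim=1).mean().item()
```

Feeding the outputs back through `model` a few more times is the iteration mentioned above that makes the remaining points collapse onto the manifold.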
How did I fix this? How can you fix this ridge here, in this dark region? Any guess? "You can send them randomly", or... hold on, someone there, top right. "Push it up." How do I push that up? Oh, okay, pushing up would also be very good: so I also tried to push up everything that is not on the manifold. It didn't quite work. What did I do here? It's very... it's a hack, okay? It's not elegant; it cannot be done in a high-dimensional space. So what I've done here is make each point fall onto the closest point on the manifold. And so I did an exhaustive search for the closest point on the manifold, and then I enforced my network to always make my points fall on the closest point, although they were maybe generated from another initial point, right? So if this point over here initially originated from here, it's still always going to fall in this direction. Just a few points will not, you know: they are just midway and they don't fall anywhere. All right, so... No: in more dimensions everything is far, right? So it doesn't quite work. Oh, we haven't covered that notebook yet. Okay, I'll show you next time, I guess, well, next next time. Yeah, so in this case I've done the denoising autoencoder; in this case I got the displaced points to fall onto the closest point on the manifold, so I did an exhaustive search. It's simple because I have a hundred and fifty points here, but it's a hack; you cannot do that in reality.

Anyhow, the point is that we have kind of developed some kind of understanding. And this one instead is the one Yann really likes, but I am not yet able to make it work, so I guess I'm not that smart yet. It's a regularized autoencoder: in this case I have an L1 regularization cost on my hidden representation, so I force my network to come up with hidden representations which are short, short of a few dimensions, right?
So if I have an L1 regularization on my hidden representation, I will only have a few items active at a given time. The problem is that if you set all those other elements to zero, then you have zero gradients to send back, okay? And so then you may want to use target prop and other cute fancy things, and I am still working on this, so I have no idea how to make it work. The point is that this is the regularization term, this is the L1 penalty on the hidden representation, and this dark region here should actually be extending all around. Again, it's very hard for me at the moment to get this to work. I'm not saying it's impossible; I'm just saying I'm not smart enough. All right.

The contractive autoencoder, and then we're going to see the notebooks. Let me turn on the lights... or maybe not, I don't know. Shall I? I like it so much dark. Okay, whatever. Okay, back on the camera. All right. Again: data manifold, training points. What is the contractive autoencoder doing? This guy here simply has the reconstruction term, plus that thing there. What is it? The gradient of my hidden representation with respect to the input, norm squared, in the overall loss, right? So my overall loss will try to minimize the variation of my hidden layer given variations of the input. Okay, so here you want a representation of the input which does not change much as I wiggle my input. And so this one basically makes you insensitive... sorry, it penalizes sensitivity in the reconstruction directions. So you will actually still be able to reconstruct things along the manifold, but it will make you otherwise insensitive to any other possible direction. And for this one we don't have an assumption about the perturbation I'm applying: I'm just insensitive to everything. But then I still have many points here.
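The contractive loss just described, reconstruction plus the squared norm of the gradient of the hidden representation with respect to the input, might look like this. The sizes are made up, and `lam` is a hypothetical trade-off weight:

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)

encoder = torch.nn.Sequential(torch.nn.Linear(8, 3), torch.nn.Tanh())
decoder = torch.nn.Linear(3, 8)

x = torch.randn(8)
h = encoder(x)
x_hat = decoder(h)

reconstruction = ((x - x_hat) ** 2).sum()

# Contractive term: squared norm of dh/dx, so the hidden layer moves as
# little as possible when the input is wiggled. (For actual training you
# would pass create_graph=True so this term is itself differentiable.)
J = jacobian(encoder, x)            # shape (3, 8): dh_i / dx_j
contractive = (J ** 2).sum()

lam = 0.1                           # made-up trade-off weight
loss = reconstruction + lam * contractive
```

The reconstruction term keeps sensitivity along the manifold alive; the penalty kills it in every other direction.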
So you will have to minimize the Reconstruction as I provide different samples Okay, and yeah penalize and incentivize as well, and that's just penalized Finally Okay, ten minutes left. Finally. What does this auto encoder do? As you can see I can use my plot sleep very well Here we have this training manifold, which is my single dimensional You know thing going in three dimensions and here I have all those data points. Okay Cool, so The X leaves on these set of data and It lives in our end what an out encoder has to do is gonna be basically Getting that curly line stretch down in one direction, right? And therefore you have there your Z in this case called latent space And so you get the first one there and then the second one over there the point is that How do I know how can I go from these back to here? I know if I'm in this first location, I can go back to this location I know if I'm this location I can go back there. I'm not entirely sure what's happening here There's no I only have training samples, right? So I only have the correspondence between points in the input space and points on the latent space. I Don't have any correspondence between Regions of the input space and regions of the latent space. Okay, so as as of right now You only know how to connect inputs to Regions here in the latent space and how to get back then we have learned that the denoising out encoder takes the input Shakes it, but you enforce to go back here same point and then you go back to the other location To define a location, right? So you take this one you shake it It's gonna be always going here and then you'll get back to the correct location or the denoising the Contractive you're gonna be the input and every you try to penalize any possible Wiggling of this one when you wiggle this. Okay, this is contract about encoder. Nevertheless, how can I? 
start from here, move around, and get something that actually looks like a decent output? Meaning: if I have a dog here and a bird here, embedded in the latent space, and I move along this line, how can I ensure that the points on this line here will actually look like meaningful transformations over here? We don't know that right now. We only know that this image is connected to this point, and this image is connected to this point. We don't have any knowledge about what kind of behavior, or how well-behaved the space is, whenever I move in this space and map back down here, right? So we don't know how this decoder, which goes from the latent space to the input space, behaves when we are not exactly at those points. So right now we have a point-to-point mapping only; next time we're going to be learning how to map regions of the input space to regions of the hidden space. Okay? Right now: point to point.

All right, so: notebooks, in the last seven minutes. Thank you for sticking with me. Yeah, I'm moving too much; I'm panting.

cd into the course folder (work, github, PDL), conda activate PDL, jupyter notebook. Okay, so I'm going to be using this autoencoder notebook, number 10. Over there it's invisible, I know, but it's going to be number 10. All right, so I'm going to be just executing stuff. So what are we doing here? Let me see. We import some random stuff. You can see, right? Yeah, please complain if you don't see things right; I can't check too many things. So: we import some stuff. We have an image conversion routine, which simply adds one and then multiplies by one half, because otherwise, when I get my data, I try to get it zero mean, and I also have it in a range from minus one to plus one, so it is centered; then here we get it back.
I sum one: so instead of being zero mean, it goes to 0.5 mean. Actually, it starts from minus one to plus one; then here I just sum one, so it goes from zero to two, and then I halve it, so I get it from zero to one. This is just some displaying routine. And here, okay, I show you that I subtract 0.5 from my data and I divide it by 0.5. Those are the MNIST digits from Yann's website. Here we set the device, whether you want to run on CPU or GPU, and in this case we have images, the digits we saw when we were training the convolutional net, which are 28 by 28 pixels. And in this case I'm going to be creating an autoencoder which has a 30-dimensional intermediate layer, right, hidden layer. So we go from 784 to 30 and then back to 784.

So here is going to be my autoencoder model: just a linear layer, an affine transformation, from 28 squared to d; a hyperbolic tangent; and then I have the decoder, which is my generative model, which goes from the hidden space, the latent space, which is d, to 28 squared; and then I have again a hyperbolic tangent, such that I limit my output range to minus one to plus one. And my forward simply sends things through the encoder and the decoder. I create my model, and then I create my criterion, which is going to be the MSE loss, a learning rate, and the Adam optimizer.

So this is going to be the training part. You're going to have, whatever, 20 epochs. The first part is sending the images through the model; that's point number one. Point number two is the computation of the loss, which is line number 15. The third point is clearing the gradients, otherwise we accumulate; that's line number 17. Then we do back-propagation, the computation of the partial derivatives of the final loss with respect to the weights; that is line number 18. And finally we do a step against the direction of the gradient. I'm talking a lot because the computer is just training.
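Putting the cells just described together, here is a minimal sketch in PyTorch; the batch of random data stands in for the normalised MNIST images, and the exact hyper-parameters are assumptions, not the notebook's actual values:

```python
import torch
import torch.nn as nn

def to_img(x):
    """Undo the (x - 0.5) / 0.5 normalisation: [-1, 1] -> [0, 1]."""
    return 0.5 * (x + 1)

d = 30  # size of the under-complete hidden layer

model = nn.Sequential(
    nn.Sequential(nn.Linear(28 * 28, d), nn.Tanh()),  # encoder: 784 -> 30
    nn.Sequential(nn.Linear(d, 28 * 28), nn.Tanh()),  # decoder: 30 -> 784
)
criterion = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a dummy batch (MNIST loading omitted here)
img = torch.rand(32, 28 * 28) * 2 - 1  # stand-in for a normalised batch
output = model(img)                    # 1. forward pass
loss = criterion(output, img)          # 2. compute the loss
optimiser.zero_grad()                  # 3. clear accumulated gradients
loss.backward()                        # 4. back-propagation
optimiser.step()                       # 5. step against the gradient
```

The five comments mirror the five steps called out in the lecture: forward, loss, clearing gradients, back-propagation, and the optimizer step.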
Okay. All right, so you can see here we went through 20 epochs. And actually, okay, let me show you how they look. So these are the reconstructions of my network, okay? These are the outputs of the network, given that we compress the input to that 30-dimensional intermediate representation. I'm going to show you the kernels in a sec.

Let me change this one to the denoising autoencoder. So here I create a dropout module, which randomly turns off neurons; I create my noise mask; and then I create my bad images, which are those images multiplied by this binary mask; and then I send those bad images, these altered images, to the network, and I train this stuff again. We also go up to 500 dimensions, right? So it is now an over-complete hidden layer, and then we train again. Okay, this is correct, and this is training.

All right, so, just a recap of the difference between the previous training and the current training. We were saying that before we were just using an under-complete autoencoder: we were going from 784 dimensions as input to a 30-dimensional hidden layer. But now we are going to be using an over-complete one. Still, I use 500 here, which is less than 784. So one proper question would be: why 500 dimensions? Why can an autoencoder with a 500-dimensional hidden layer be considered over-complete? Think about the number of pixels that are black, for example, on average, in these images.

All right, so we are actually already around this part; we were down to the training. How does the training change right now? I have a dropout mask here, which allows me to introduce some kind of perturbation on the original images. Then I have my noise, which is simply the result of applying a dropout mask to a vector of all ones.
This is going to be useful for later visualizations. Then I create my bad images, the perturbed images, which are simply the multiplication of my images by this noise. So if we didn't have any neuron dropped, the noise would be just those ones, and then you get one times the image, so you get the same image. Otherwise, where the neurons are dropped and set to zero, the image is going to be multiplied by zero at those specific pixels. So these bad images are images with black dots. Then I input these bad images into my model. And then the criterion is the distance between the output and the original image, right? So here we are inputting these perturbed images into the model; those are points that are outside the training manifold. But then I enforce them to map to the original point, right? So you take the original point, you perturb it, so you push it away, and then you force the network to actually output the original one. So it's going to be trying to counteract any kind of perturbation that happened to this original point, okay?

All right, the rest is the same, right? zero_grad, backward, step. So this one also trains, and we can check how these reconstructions look. If I remember correctly from the previous iteration, they actually look much cleaner, because I guess we are using a much larger hidden layer, right? But before, we couldn't use such a large hidden layer, because you would have been overfitting, right? If you try to reconstruct things that are always at the same point, you can just copy them over. In this case you can't copy, because the input is not this point; the input is actually the displaced point. So you learn a vector field that brings you back to the original position on the training manifold.

Okay, so let's go down and visualize... actually, let's first have a look at the previous filters, which I didn't show you. So these are the filters of the autoencoder with the under-complete hidden layer.
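Before moving on to the filters, the perturbation step described above can be sketched in PyTorch; note that `nn.Dropout(p=0.5)` scales the surviving entries by 1/(1-p) = 2, so the mask actually contains zeros and twos rather than zeros and ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
do = nn.Dropout(p=0.5)            # dropout module used as a noise source
img = torch.rand(32, 28 * 28)     # stand-in batch of images

noise = do(torch.ones(img.shape)) # mask: zeros where "neurons" are dropped
img_bad = img * noise             # perturbed images, pushed off the manifold

# Training then compares the reconstruction of the *bad* input against the
# *original* image: output = model(img_bad); loss = criterion(output, img)
```

Keeping the mask as a separate tensor (rather than applying dropout to the images directly) is what makes the later mask visualizations possible.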
Okay, and so you can see here there are some kinds of patterns in the central area of these filters. These filters are simply the rows of my W matrix that have been reshaped into, basically, an image, such that I can visualize them for you. So in this case, in this notebook, we are not using any convolutional network; we are just using those images that have been unrolled into vectors, and they are compared, multiplied as a scalar product, against the vectors, the rows, of my matrix. So these are the rows of my matrix, reshaped such that you can make sense of what they represent. So here, for example, it looks like there is an upper-loop detector; or not quite a detector, because they are purple, right, so these would give a negative output. Here you have something like a zero; here it looks like some eight or three; and then you have this kernel over here, which has basically learned nothing. Well, that's the only one that didn't learn much.

Moreover, you can notice that all those points outside the region where the number, or anything interesting, happens are multiplied by a constant, right? Because that area is outside the digits, and so things don't change there. And therefore those noisy kernel values out there will, on average, produce a score of zero, right? And therefore the network didn't care about giving any specific value to those outlying pixels, because, again, on average they will not contribute to the final score.

What happens now, when we are inputting data which has a variable number of pixels set to zero? Now those points will matter, right? Because the value of the image is no longer constant there. And so, if I show you the new kernels: boom. How cool is this, right? This is completely different. Can you see? So here you can still see some patterns, right?
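The reshaping of the rows of W described above can be sketched as follows; the untrained `nn.Linear` layer here is just a stand-in for the trained encoder:

```python
import torch
import torch.nn as nn

enc = nn.Linear(28 * 28, 30)  # stand-in for the trained encoder layer
W = enc.weight.detach()       # shape (30, 784): one row per hidden unit

# Each row, reshaped to 28x28, is the template that the corresponding
# hidden unit correlates (via a scalar product) with the flattened image
filters = W.view(-1, 28, 28)
```

Each of the 30 resulting 28x28 images can then be plotted as one of the filter panels shown in the lecture.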
But then, in the majority of these kernels, if you don't consider the ones that didn't learn anything (so this kernel here didn't learn much), all the other kernels, which have learned some kind of specific edge filter or specific shape filter, have now had all the outside pixels set to some uniform value, right? Because, again, the input images are now no longer constant in the areas outside the digit, and therefore the values of the kernels in those specific regions now do matter. Okay. This is a big, big difference. And again, these maps here, these kernels here, didn't learn anything.

Okay, so let's now compare our denoising autoencoder with state-of-the-art algorithms for denoising images. Here we are going to be importing some functions from the OpenCV library: there is the Navier-Stokes algorithm and then the Telea algorithm. So let's import them, and let's see how this stuff looks.

So here the first row shows the noise masks, the ones we generated before: these are the maps of all ones where we have dropped out some specific values, setting them to zero, right? So yellow is one and purple is zero, in this case. Then the second row is going to be the bad images. So these are the bad images, meaning purple is minus one, yellow is plus one, and this green is the zero value. So all those black points, purple points, in the first row are represented here in green; those are the values that have been set to zero, the masked values. Then we have the original images, and the reconstructions from our denoising autoencoder, which look reasonably okay, if you think that half of the pixels were actually missing, right? So only half of the pixels are provided to the network, and the network actually reconstructed, more or less, what looked like the original image, right?
Cool. So let's now have a look at what the state-of-the-art algorithms output. We're going to start with Telea and then Navier-Stokes. And so: this is Telea, and this is Navier-Stokes, right? As you can tell here, the quality of our model is clearly superior in terms of qualitative output. Nevertheless, pay attention: this model works just for this specific kind of perturbation that we have introduced and have learned how to counteract, okay? So, again, an autoencoder that we have trained in a minute performs much better than state-of-the-art computer vision algorithms; again, when data is available. Okay.

And so I think that's it for today. Thank you for listening. Subscribe to my channel, hit the notification bell if you'd like to get information about the latest videos, and follow me on Twitter. Peace. Bye bye.