Great, so let's take a closer look. We've now talked about all the different architectures, and it went by very fast. Let's talk a little about what the effect is. This graph is really nice at highlighting it. So what do we have here? On this axis, we have the runtime we need, basically, for training something. On this axis, we have the top-1 accuracy. And the size of the dots shows how many parameters each model has. And look at that: relative to a lot of the models that came afterwards, AlexNet actually has a very large number of parameters, and VGG has even more. And yet both are eclipsed by later models.

So what do we see for the later models? It almost looks like there's a certain frontier here, where, at least within this range, spending more runtime allows you to get better accuracy. So there's a bit of a tradeoff. Now, look at the outliers here. There's MobileNet: a very small number of parameters, yet very good performance. If you have some time, I highly recommend looking into these models. But this gives you a flavor of the tradeoffs involved. Of course, by now we are constructing much bigger models than they had back then. But in a way, these models, with a top-1 accuracy of around 75 to 80 percent, provide incredibly good performance on the ImageNet dataset. We can consider it solved, in a way, by now.

Okay, let's talk a little about other applications of ConvNets. The first one, of course, is image recognition, and by now there are a lot of products that recognize objects this way. We can also use ConvNets for style transfer. So let's talk about style transfer. Say we have an image, and we would like to render that image in different modern-art styles. Now, what do we have? We can say that the content, what is there, is measured in a way by the deeper layers of the neural network. What about style? We can say the style of an image is measured by the local correlations between feature vectors at the lower layers. What does that mean? If we have a style like this one, it means that nearby points tend to have similar color, and if there's a stripe, the stripe will tend to continue. We see that here too, in this style; it's very different in its distribution, but we can still say that style, to a good approximation, is a local property.

And how do we solve that, deep learning style? Well, we define a compound objective with two parts. We want the high-level representations to be similar, so we minimize the content difference between the new image and the content template at the high layers. And we minimize the style difference between the new image and the style template at the low layers. Putting these together allows us to do style transfer.
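To make that concrete, here is a minimal sketch of such a compound objective in PyTorch. This is not the exact loss from the lecture: the layer names, the weights alpha and beta, and the use of Gram matrices to capture the local feature correlations follow the common Gatys-style formulation and are illustrative assumptions.

```python
import torch

def gram_matrix(feats):
    # feats: (C, H, W) feature map from one layer of a frozen ConvNet.
    # The Gram matrix captures the local correlations between feature
    # channels, which is what we use as a proxy for "style".
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def style_transfer_loss(new_feats, content_feats, style_feats,
                        content_layers=("conv4",),
                        style_layers=("conv1", "conv2"),
                        alpha=1.0, beta=1e3):
    # new_feats / content_feats / style_feats: dicts mapping layer names
    # to feature maps of the image being optimized, the content template,
    # and the style template, all taken from the same frozen network.
    content_loss = sum(
        torch.mean((new_feats[l] - content_feats[l]) ** 2)
        for l in content_layers)
    style_loss = sum(
        torch.mean((gram_matrix(new_feats[l]) - gram_matrix(style_feats[l])) ** 2)
        for l in style_layers)
    return alpha * content_loss + beta * style_loss
```

Note that the pixels of the new image are the optimization variables here: we run gradient descent on this loss with respect to the image itself, while the ConvNet's weights stay frozen.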
We can also use ConvNets for image segmentation, using an architecture called a U-Net. We take an image and go down through convolutions, as usual, but we then also go back up to high-dimensional data using up-convolutions. And we have side channels, skip connections, that basically allow the features from the way down to also be used on the way up. Such architectures are very good when it comes to, say, segmenting biological cell membranes. It's also an example of how to make a fully convolutional network, which goes from pixels to pixels.

So what is an up-convolution? In a way, it just means upsampling. In other words, we go back up from a lower-dimensional space, maybe one we'd normally reach by using a max pool: we upsample and then apply a convolution. That allows the upsampled map to be refined with learned weights, and it goes along with decreasing the number of feature channels. It produces the kind of connections that form the U in this architecture.

Now, ConvNets are also being used for face recognition. So I give you a database of K persons. You get an input image, and your goal is to identify it as one of the K people, or to say that the person is not in there. I want to know: have I seen this person before, and if so, in which other images do they appear? In that case, we want one-shot learning. I cannot give you a thousand images of Conrad; what I want is for you to learn about images in general and then be able to take one photo of Conrad and generalize. You want to recognize a person given just one example, and we call that one-shot learning.

One way of doing that would be to take that one image and train a network on it. But of course, that wouldn't produce good generalization, because you'd have just one training example per output, and if a new person joins, you'd immediately have to retrain the whole ConvNet. How can we solve this kind of problem? Well, the idea is that instead of learning to recognize, we learn a similarity function, and then we can solve the one-shot learning problem basically outside the ConvNet. What I want is a similarity metric that tells me how similar this photo is to that photo: if Conrad is visible in both of them, they should be similar, and if Conrad is visible in one and Lila in the other, they should be very different from one another. We can then say: if a new image is sufficiently close to another image of Conrad in some embedding space, we label that image with Conrad.

What we usually do is use networks called Siamese networks to solve this. So what's the idea there? We have a network that takes one photo and, through the ConvNet, produces a feature vector at the end. Another image goes through the same neural network and produces another vector. We can then define a distance between the two images, which is basically the distance between the outputs of the network, the 2-norm between those two vectors. And what's the goal here? The parameters of the network define how that output depends on the image, and we want to learn the parameters so that if the two photos are of the same person, the distance is small, and if the two photos are of different people, the distance is large.
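As a minimal sketch of that setup in PyTorch: the encoder below is a placeholder architecture, not the one from the lecture. The only essential points are that both photos go through the same network with shared weights, and that the distance is the 2-norm between the two embeddings.

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    # Placeholder ConvNet mapping an image to an embedding vector.
    # Both photos go through this *same* network (shared weights).
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        return self.net(x)

def distance(encoder, img1, img2):
    # d(x1, x2) = || f(x1) - f(x2) ||_2, the 2-norm between the
    # embeddings produced by the shared encoder.
    return torch.norm(encoder(img1) - encoder(img2), dim=-1)
```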
We often use a triplet loss for this, where we take an anchor image, another image of the same person (the positive), and an image of a different person (the negative). The ConvNets are all the same; they're one shared mapping from image to embedding. The triplet loss then says: we want the anchor and the positive to be similar to each other, so their distance enters positively, and we want the anchor and the negative to be dissimilar from one another.

And we also add a margin, so there's a maximal gain you can get from any one triplet. The loss is cut off at zero once the negative is farther away than the positive by at least the margin. This is so the network can't specialize by taking a small number of easy anchor-positive pairs and weighing them very strongly: beyond the margin, a triplet simply contributes nothing. And now, why don't you try to understand this approach in the facial recognition exercise?
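Here is a minimal sketch of such a triplet loss, assuming the Siamese encoder from above; the margin value of 0.2 is just an illustrative choice.

```python
import torch

def triplet_loss(encoder, anchor, positive, negative, margin=0.2):
    # All three images go through the same encoder (shared weights).
    f_a, f_p, f_n = encoder(anchor), encoder(positive), encoder(negative)
    d_pos = torch.sum((f_a - f_p) ** 2, dim=-1)  # squared distance to same person
    d_neg = torch.sum((f_a - f_n) ** 2, dim=-1)  # squared distance to other person
    # Clamped at zero: once d_neg exceeds d_pos by the margin, the triplet
    # contributes nothing, so easy triplets can't dominate training.
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```

PyTorch also ships a built-in nn.TripletMarginLoss that implements essentially the same idea.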