So we're definitely not going to do any particle physics today. Instead, I'm going to help you take your first few steps towards training your first image classifier using PyTorch. Just before we start, can I get a hand from everyone who has used PyTorch before? Only a few. That's good, because we're really going to start at the beginning. So first I'm going to look at what an image classifier actually is; that's not so difficult. Then we'll go over what a neural network is and how we actually build one in PyTorch, and finally what we can do with them. And in the spirit of this morning's keynote, I'm going to show you a bit of how I played around with an image classifier and what cool things you can do with one.

Okay, so first let's have a look at what a classifier is. Suppose we have a labeled training set of points in two dimensions, x1 and x2, and the points come in two classes, either red or green. In this case we can see a pattern: the green ones are on the left side of the screen and the red ones are on the right side. So we could say, for example: let's draw a line somewhere around here, and then everything to the left of this line is green and everything to the right of it is red. Now we have a model, a very simple model, just a line, that can tell us whether a point belongs to one class or the other. Then, when we have an unlabeled data set, points like these, in this case just white points, we can use the same line to color all of those points green or red. So far, that's a classifier, and I'm pretty sure most of you have seen something like this before.

Now, sometimes you want to do something that's a little bit more complicated than a simple line, and in that case we can use neural networks. Most of you have probably heard about them; they look something like this. Here you see a series of circles, which represent neurons, and they come in layers: an input layer on the left, one or more hidden layers in the middle, and an output layer on the right, all connected by these lines.

So how do these neurons actually work? Well, a neuron looks something like this. You have a set of incoming signals on the left, and each of these signals is multiplied by a weight. These weights are the things we train: they are what we must learn when we build a neural network. So we take the inputs, multiply them by the weights, sum them up, and in the end apply some kind of activation function, whose main property is that it's nonlinear, which is what makes the neural network capable of learning more complicated things. Typically, and we'll see this more today, we use the rectified linear unit, or ReLU for short, which is y = max(0, x). So basically it's 0 if x is negative, and otherwise it's x.
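To make that concrete, here is what a single neuron computes, as a minimal sketch in plain Python; the inputs, weights, and bias are made-up values:

```python
def relu(x):
    # Rectified linear unit: 0 for negative inputs, the identity otherwise.
    return max(0.0, x)

def neuron(inputs, weights, bias):
    # Multiply each incoming signal by its weight, sum everything up,
    # then apply the nonlinear activation function.
    return relu(sum(i * w for i, w in zip(inputs, weights)) + bias)

print(neuron([0.5, -1.2], weights=[0.8, 0.3], bias=0.1))  # ~0.14
```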
Okay, that's fairly simple. So how do we actually use a neural network like this? What do we do with it? Well, in the case we had before, the 2D points with two dimensions x1 and x2 that we wanted to classify as red or green, we could use the neural network like this. We say we have two input nodes, where one of them is x1 and the other one is x2, and we propagate these signals through the network with these weights and these functions. And then we say we have two output nodes, where one output node is the probability that the point belongs to the green class and the other is the probability that it belongs to the red class. Now of course, when you first make a neural network like this, it's not going to do what you want, because you have to set these weights to the correct values. So what you do is you take all those points you labeled before, pass them through the network, see whether it makes an error, and then tune the weights in such a way that it will actually perform better. You have to do a lot of tuning before you get a neural network that does what you want.

But we're talking here about image classification, not about 2D points. So instead of having just two inputs, we want to say: when we have an image, in this case the left one here is a dog and the right one is a cat, which is something we can easily see, we want a neural network to recognize that for us. And for that you cannot use a simple neural network like the one I showed before; we're going to need something a little bit more complicated, something called a deep convolutional network. That's quite a mouthful, but in the end it's not so extremely complicated.

Let's start with the deep part. A deep neural network is just a neural network that has more than one hidden layer. In this case I think 10 or so; it could be 20 or 30, but just more than a few. That's all there is to deep neural networks.

And then the convolutional part. We need convolutions in order to interpret those images. On the left side here you see that the input to a neural network like this is an image. What we do with a convolution is we take a box, aggregate all the pixels from that area of the image, and apply some function over them. For example, we could check whether there's a big difference between the left and the right, or the top and the bottom, or whether all the pixels have approximately the same value. It's just some function that we apply using the weights of a neuron. Then we shift this box around across the image, so that we get a matrix of the results of this function across the whole image. And in this case, you see we don't use just one convolution: we end up with four feature maps, so we use four convolutions. Of course you could use many more.

Then, typically, after a convolution you sub-sample. You say: for each two-by-two block of results we take the maximum, which is called max pooling, or we could take the average, or something like that. We do that to reduce the size of our neural network, because otherwise it might become too big, and if we have too many weights to tune, it becomes too difficult to train. Typically after that we do more convolutions, so you get more feature maps; in this case we do 10 more convolutions, then more sub-sampling. And in the end we add a fully connected layer, which is just like the ones in the simple neural network I showed you before, and we end up with an output, in this case two outputs. So one of them could represent the probability of the image being a dog, and the other of it being a cat. Something like that.
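Here is a minimal PyTorch sketch of that convolve-then-pool step, just to show the feature maps and the sub-sampling; the kernel size and padding are my own choices:

```python
import torch
from torch import nn

image = torch.randn(1, 3, 32, 32)  # a batch of one 32x32 RGB image

conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)  # four convolutions
pool = nn.MaxPool2d(kernel_size=2)                # maximum over each 2x2 block

features = conv(image)
print(features.shape)        # torch.Size([1, 4, 32, 32]): four feature maps
print(pool(features).shape)  # torch.Size([1, 4, 16, 16]): sub-sampled
```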
Okay, a typical example of a convolutional neural network is VGG-16. VGG-16 is a big neural network: it consists of 16 layers and has in total 144 million weights. I hope you recognize some of the ingredients here. We start with an image of 224 by 224 pixels in three layers, one for each color: red, green and blue. Then we start at the left with two convolutions, then some max pooling to sub-sample, two more convolutions, more sub-sampling, then more convolutions, sub-sampling, more convolutions. And in the end we arrive at the blue areas: these are the fully connected layers, so a standard neural network at the end, but a big one, with like 4,000 nodes per layer, and finally one layer with a thousand output nodes.

So why a thousand? Well, this network was made to be trained on ImageNet, and ImageNet is a collection of 14 million images annotated into a thousand classes, among which, for example, cat and dog. This network was trained, using a lot of computers, to get something like 90% accuracy on those 1,000 classes.

So, cool, this thing already exists, but weren't we going to train our own image classifier? Of course we are, and to do that we use transfer learning. Remember we had this network, and it has already been trained: it has 144 million weights that all have reasonable values, such that it can accurately classify all those kinds of images. Now we're going to make use of that by taking off that last part with the 1,000 classes, just removing the whole last layer, and putting our own layer at the end. Not necessarily a simple layer like that; just a new classifier that we put on top. So we remove the end and we add our own layer. This has the advantage that all the weights in the earlier layers already have reasonable values: this network already knows how to recognize sharp edges, round edges, strange patterns, all the things you typically see in a photo. It already knows how to deal with those, and in the end it comes up with a set of features, and we build a classifier on top of that.

Isn't that cheating? Of course it's cheating. But hey, you never get anywhere in life without a little bit of cheating. So we're going to make use of what people have done before and build our own classifier for things they didn't train it on. Because if you want to classify images into exactly those 1,000 ImageNet classes, then of course you can use the pre-trained version of VGG as-is. If you don't, this is the way to go: you just remove the last layer, and then you're all set.
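As a sketch, that head swap on VGG-16 could look like this in PyTorch; the four classes are just an example, and on recent torchvision versions you would pass weights=... instead of pretrained=True:

```python
from torch import nn
from torchvision import models

num_classes = 4  # our own classes instead of ImageNet's 1,000

model = models.vgg16(pretrained=True)  # downloads the pre-trained weights

# The last fully connected layer maps 4,096 features to the 1,000
# ImageNet classes; replace it with one that maps to our own classes.
model.classifier[6] = nn.Linear(4096, num_classes)
```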
So now you know exactly how to train your own image classifier, right? Well, let's have a look at the code. But before that, you might ask me the question: why did I choose PyTorch and not Keras? Because you may have heard of Keras as well; Keras is also a library that lets you build neural networks. You could say Keras is easier, or that PyTorch is more flexible. You could say Keras is faster, which might sound very important. But the main thing that I think is important is that PyTorch lets you play with the internals. Basically, that means you get to tweak the neural networks, and not just import them and use them, so you learn more from PyTorch. That was the main reason I chose PyTorch.

Well, okay. Now let's have a look at it, and here it gets a little bit technical, so bear with me. First, if we want to use a neural network in PyTorch, we have to define the neural network, and you do that by creating a class. In this case we create a class called Net, which inherits from PyTorch's nn.Module. When we initialize this class, we first initialize the superclass, the module; not so interesting. And then we define the layers of our class. First, there's the convolutional layer, which is a 2D convolutional layer with some parameters; we'll go over those in a minute. Then a pool layer, which is 2D max pooling: a sub-sampling step where, in this case with kernel size 2, so over each 2 by 2 block, we take the maximum value in order to reduce the size of our neural network a bit. And then we have two fully connected layers, the normal layers that come one after the other at the end, where the second fully connected layer ends with 10 nodes. So we have 10 output nodes.

Secondly, we have to define the forward method. The forward method accepts a single argument called x, which is the input, and the input comes in batches; in this case these are 32 by 32 pixel images in three channels. The first thing we do is apply the convolutional layer to x, which converts it to 18 channels, so we get 18 different types of convolutions, again at 32 by 32 pixels. Then we apply the ReLU function, just to make it nonlinear; that helps our neural network learn more complicated things. Then we apply the pooling, which reduces the picture size from 32 by 32 to 16 by 16 pixels. And then, since we are done with the 2D stuff, we reshape the whole thing into a single very long vector of size more than 4,000. Then we are ready to apply the first fully connected layer, again the nonlinear function, and lastly the last fully connected layer, after which our output has size 10.
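Put together, a reconstruction of that Net class could look like this; the kernel size, the padding, and the width of the first fully connected layer are assumptions on my part, chosen so the shapes work out as described:

```python
import torch
import torch.nn.functional as F
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input channels (RGB) -> 18 feature maps; padding keeps 32x32.
        self.conv = nn.Conv2d(3, 18, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)  # 32x32 -> 16x16
        self.fc1 = nn.Linear(18 * 16 * 16, 64)   # 4,608 inputs; width assumed
        self.fc2 = nn.Linear(64, 10)             # 10 output nodes

    def forward(self, x):
        # x is a batch of 32x32 RGB images: shape (batch, 3, 32, 32).
        x = self.pool(F.relu(self.conv(x)))  # convolve, ReLU, sub-sample
        x = x.view(x.size(0), -1)            # flatten to one long vector
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```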
Okay, but hold on. We weren't going to train our own neural network, right? We were going to do transfer learning. Yes, that's right. If we're going to do transfer learning, we first have to import a pre-trained network. In this case I've chosen SqueezeNet, and I did that because VGG is actually quite a bit bigger than SqueezeNet and takes longer to run; I went for the easy option. SqueezeNet you can simply import and then instantiate with pretrained=True, and it will download the weights for you. It's a big set of weights, so that takes a while, but then you get a pre-trained network, ready to use.

But we weren't going to use that pre-trained network with a thousand classes, so we're going to modify it. Let's have a look at the internals before we do. If we simply print the network, it will show us all the layers, and we find that it consists of two parts. The first part is called features, and it has a lot of layers; I couldn't fit them all on the slides, I think it's 20 layers or so: lots of convolutions and pooling and these ReLU functions, all in a sequence after each other. And at the end there's the classifier part, which consists of four pieces, most of which you'll already recognize: there's a 2D convolution, there's a ReLU, and there's average pooling at the end. The first piece is dropout. Dropout is a technique that helps your neural network learn a little quicker by, while you're training, dropping the outputs of some of the neurons, in this case with a probability of 50%, so half of them. That makes it impossible for the network to rely on a single neuron or a small subset of neurons, so it must make more connections to learn the same information, which basically makes it more robust. So what happens in the classifier is: we apply this dropout during training, then there's this 2D convolution from 512 to 1,000, which is again where you see the 1,000 classes, and then the ReLU and the average pooling. So in the end we again have 1,000 outputs, one for each of the classes it was built to classify.

Now, if we're going to change this into our own classifier for our own classes, all we need to do is define the number of classes we have, for example four, download the model, and set it up. First, it has a parameter called num_classes, so we can update that to four; internally it's hardly even used, but let's do it to be complete. And then we can take the classifier apart. Remember that the 2D convolution layer was the one with index one; we can simply replace it with a new 2D convolution layer that goes from 512, just like the original, but now to our number of classes instead of 1,000. That's all you need, and now you have a new neural network that you can train to classify your own classes.

Okay, now let's have a look at how you train a model like this. We start by setting our model to training mode; that's important, and I'll get to why in a little bit. Then we need to define our criterion: how do we score whether the model is good or bad? In this case we use cross-entropy loss. And we need to define an optimizer, in this case stochastic gradient descent, where we say: these are the model parameters, and then there are some arguments that we'll have a look at at a later stage. Then we loop through what comes out of a loader object (we'll look at the loader later too), and that gives us the inputs, so the images, and the labels, the classes you've labeled them with. For each of those batches of images and labels (we do this in batches, so you always process multiple images at the same time) we first reset the optimizer, because we don't want to reuse gradients from the last batch. Then we simply pass the images through our neural network and get some outputs. We calculate how good the outputs are: do they correspond to the labels we gave it? Then we propagate this loss backward through the network, which computes for each weight how much it contributed to the error, and once we know that, we can optimize the weights. Every such pass through all your training images is called one epoch, and you're going to do this quite a few times if you want a classifier that works reasonably well.
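A minimal sketch of that modification, assuming torchvision's SqueezeNet 1.1 (the talk doesn't say which variant was used):

```python
from torch import nn
from torchvision import models

num_classes = 4  # for example

model = models.squeezenet1_1(pretrained=True)  # downloads the weights
model.num_classes = num_classes  # the bookkeeping attribute mentioned above

# classifier[1] is the 1x1 convolution from 512 channels to 1,000
# classes; swap it for one that maps to our own number of classes.
model.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=1)
```

And the training loop described above could look like this; the learning rate, momentum, and epoch count are placeholders, and train_loader is defined a bit further down:

```python
from torch import nn, optim

model.train()  # training mode: dropout is active
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(20):                    # one pass over the data = one epoch
    for images, labels in train_loader:    # batches of images and labels
        optimizer.zero_grad()              # forget the previous batch's gradients
        outputs = model(images)            # pass the images through the network
        loss = criterion(outputs, labels)  # how wrong were we?
        loss.backward()                    # propagate the loss backward
        optimizer.step()                   # update the weights
```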
Okay, and once you've done that, suppose you've trained 20 epochs, then of course you want to know: how well does my model actually work? For that, we first set the model to evaluation mode. So what is the difference between training and evaluation mode? Most importantly, evaluation mode disables the dropout. While you're training, it might work well to let your model use only half of the information in some of the stages, but when you're evaluating, when you're trying to actually classify an image, you want to make sure you use all the information you have, so you disable dropout. That's the most important reason why we must always call these eval and train methods.

Then we can use no_grad, which tells PyTorch to skip internal gradient calculations that you don't need here. And then, again, we loop through a loader and pass the inputs through the model to get the outputs. The outputs are vectors with a score for each class, and we can take the maximum of these, which gives the class the model would classify the image as. So we can get the predictions from that, and we can sum the loss to get some idea of how well our model is performing.

I also promised you a closer look at the loader. So where does our data actually come from? What you need to do first is specify where your images are on disk, and you do that by defining image folders. You want separate train and test sets, so you define two image folders, one with the path to the train images and one with the path to the test images. But for both of those you also need to define a transform: which operations will be applied to the images when they are loaded? We define two different transforms, one for the training images and one for the test images. Let's first have a look at the test images. We compose a transform that consists of multiple steps: first we resize the image to size 256, then we crop out the center 224 pixels, and then we convert it to a tensor so that PyTorch can work with it. That's fairly simple. For the training images, we do something a little different: we take a randomly resized crop of the same size from the image. So we don't always look at the same part of the image; it can be a little more zoomed in or zoomed out, or a little more to the left or to the right. This means that every time we train an epoch, our model actually gets to see a different set of images. The source images are the same, but the actual image it looks at is a little bit shifted or zoomed. So it gets to learn not from the individual pixels, but from the information that's actually in the image. That's really important.

Once we've defined those train and test sets, we can define the train and test loaders, which are simply a DataLoader where we provide the data set we want to use. We set the batch size, which is the number of images we process at the same time. The number of workers is the number of processes that load and preprocess the images. And we say we want to shuffle, which means that every time we train an epoch or evaluate, we go through the images in a random order; for training this is really important, for testing it isn't. Put together, it looks something like the sketch below.
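First the data pipeline; the paths, batch size, and worker count here are placeholders:

```python
import torch
from torchvision import datasets, transforms

# Training transform: a random crop, so every epoch sees slightly
# different versions of the same source images.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

# Test transform: a deterministic resize and center crop.
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder('data/train', transform=train_transform)
test_set = datasets.ImageFolder('data/test', transform=test_transform)

train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=32, num_workers=4, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    test_set, batch_size=32, num_workers=4, shuffle=False)
```

And then the evaluation loop, reusing the criterion from the training sketch:

```python
model.eval()  # evaluation mode: dropout is disabled
total_loss, correct = 0.0, 0
with torch.no_grad():  # no gradient bookkeeping needed here
    for images, labels in test_loader:
        outputs = model(images)
        total_loss += criterion(outputs, labels).item()
        _, predictions = outputs.max(dim=1)  # highest-scoring class per image
        correct += (predictions == labels).sum().item()

print(f'accuracy: {correct / len(test_set):.3f}, loss: {total_loss:.3f}')
```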
Okay, so we're almost there, but I skipped something that's fairly important. Remember that when we defined our optimizer, stochastic gradient descent, I said there were these arguments at the end. The most important one is the first one: lr, the learning rate. This is the rate at which we change the weights while we're training, so we need to figure out what a good value actually is.

Suppose we have only a single weight (I can only make a plot in a single dimension), and we want to optimize it: we want to find the point right there at the bottom of this graph. Suppose we start all the way at the right of the graph and want to find the bottom by taking little steps. Then, of course, we want to make sure we don't take steps that are too large. If we make steps that are too large, we could step all the way across the valley to the opposite side, and if we're unlucky we might even go so far that we step out of the valley entirely and reduce the performance of the model. On the other hand, if the learning rate is too small, then first of all it takes a very long time to get there, and in this case you'll end up in this local optimum and you won't find the global optimum. So balancing the learning rate is really important.

So how do we actually find the best learning rate for our problem? Well, the best thing you can do is just try them out. Here we define a function that sets the learning rate of the optimizer to a certain value, for a range of values: a log space from some minimum learning rate to some maximum learning rate, in a number of steps. For each of these learning rates, we set the optimizer to that rate, train for a number of batches, and then evaluate for a number of batches. What you'll find is that, starting with a very low learning rate, your model improves very, very slowly; after a while the improvement gets quicker and quicker, until the learning rate is so big that the model jumps all the way away from the local or global minimum it was in, and the performance degrades enormously. If you plot this, it looks something like this: first you have some value of the loss, and as you increase the learning rate the loss goes down, until at some point it shoots up, all the way until the model doesn't do anything anymore. What we found here is that something like 10 to the minus 3 is the optimal learning rate, so that's what we set it to.

But of course, the optimal value also depends on the state of your model. If your model doesn't do anything yet, then a high learning rate is probably good; if it's almost there and you just want to squeeze out that last percent of accuracy, then a very low learning rate is probably the way to go. For that, we have learning rate schedulers. We can use, for example, ReduceLROnPlateau, a scheduler that, whenever the performance of your model during training has reached a sort of plateau, where it's stable, reduces the learning rate and tries again. So after every epoch we call scheduler.step with the loss that we found, and based on that it might reduce the learning rate. Both ideas are sketched below.
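A rough version of that learning-rate sweep; the range, the number of steps, and the batch counts are illustrative rather than the exact values from the talk:

```python
from itertools import islice

import numpy as np
import torch

def try_learning_rates(model, optimizer, criterion, train_loader, test_loader,
                       lr_min=1e-5, lr_max=1.0, steps=20, n_batches=10):
    """Train a few batches at each learning rate in a log-spaced range,
    then record the average loss on a few test batches."""
    results = []
    for lr in np.logspace(np.log10(lr_min), np.log10(lr_max), steps):
        for group in optimizer.param_groups:
            group['lr'] = lr  # set the optimizer to this learning rate
        model.train()
        for images, labels in islice(train_loader, n_batches):
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            losses = [criterion(model(x), y).item()
                      for x, y in islice(test_loader, n_batches)]
        results.append((lr, sum(losses) / len(losses)))
    return results
```

And the scheduler wraps around the training loop from before:

```python
from torch import optim

scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer)

for epoch in range(200):
    epoch_loss = 0.0
    for images, labels in train_loader:  # the same loop as before
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)  # may lower the learning rate on a plateau
```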
In a plot, that looks something like this: while you're training, your accuracy goes up at the beginning, and after a while the scheduler figures out, okay, maybe we're stable now, let's reduce the learning rate. You'll see it take these steps all the time, until at some point you decide that the accuracy is good enough.

Okay, we're all set. Let's have a look at some actual data that I've played with. Of course, if you want to train a model, you need data. So what I did is I took one of these, a Raspberry Pi, and I set it to work for a couple of months, and I gathered a data set of photos taken in the world's largest cities: 72 cities, half a million images from 10,000 photographers, all in all some 30 gigabytes of data. And I made sure all of these are licensed for reuse, so that I can show them to you right now.

Of course, the first thing you do when you gather a data set is have a look at the images themselves. I live in Amsterdam, so I first had a look at a subset of the images that were taken in Amsterdam. This is a nice one. Typically we don't have weather like this so often, but, well, it's nice, right? Something very typical that you'll also find in Amsterdam is bikes; this is a very typical scene from Amsterdam, and for those of you who have been there, I'm sure you'll recognize it. Okay, this looks good, right? Let's have a look at another one. This is a really nice view. But hold on: this wasn't taken in Amsterdam. We don't have any cliffs within 200 kilometers of Amsterdam, probably even further. So what's going on? Let's have a look at the metadata. It says Amsterdam, so probably someone thought it was taken in Amsterdam. Well, it wasn't. On the other hand, it also has a tag that says Dublin. Interesting, interesting. There are actually quite a few tags on this image, and I couldn't fit more than this on the slide; this is less than 5% of the tags on this image. And I certainly don't see a teddy bear museum in this image, or all kinds of these other things. So it turns out people don't always tag their images as they should.

So I had a look at where in the world all the images that were supposedly taken in Amsterdam were actually taken. Well, all around here. And the bench we were just looking at was taken right there, at the edge, in Korea; some nice island in Korea. Definitely not Amsterdam. Of course, we only want the images from Amsterdam that were taken right there, at the red dot in the middle; that's where Amsterdam actually is. So what you can do is take all the images and compute the median latitude and longitude. Now the mathematicians will cringe, because these are of course circular values, so strictly you can't just take the median of a latitude or a longitude. Well, in the end you can just do it, and it works. Then you remove all the images that were taken more than five kilometers away, and you repeat this for all cities, a bit like the sketch below. Then we have a clean data set, right? Let's do it.
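A sketch of that filter, under the assumption that each photo is a dict with 'lat' and 'lon' keys (the talk doesn't show this code):

```python
import numpy as np

def filter_city(photos, max_km=5.0):
    # The mathematician-annoying part: treat lat/lon as plain numbers.
    # This works because all the legitimate points are close together.
    med_lat = np.median([p['lat'] for p in photos])
    med_lon = np.median([p['lon'] for p in photos])

    def km_away(p):
        # Rough equirectangular distance; good enough at city scale.
        dlat = np.radians(p['lat'] - med_lat)
        dlon = np.radians(p['lon'] - med_lon) * np.cos(np.radians(med_lat))
        return 6371.0 * np.hypot(dlat, dlon)

    # Keep only the photos within max_km of the city's median location.
    return [p for p in photos if km_away(p) <= max_km]
```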
Okay, after that I had a look at all the other tags and thought of something cool we could do. These were the most common tags in this data set. Of course, if you go looking for photos taken in cities, the most common tag is 'city'. But other than city, I think this one, 'skyline', is the most interesting one. So let's try to make an image classifier that recognizes the skylines of cities.

What I did is I took the ten most common cities in my data set; those were these, and here are the image counts. I split them into a train and a test set like this, and then we train a model. Hold on. We wait. First, we wait. And the waiting actually is quite annoying, because training a model like this takes a while. In this case I took a fast GPU; I used my boss's credit card. He doesn't know yet. He'll be a little bit surprised. And then I spent like 20 hours or so of training time, and then I had a model.

So then you feed in an image, and who knows where this image was taken? This is London, and it got it right. Okay, that's nice. So this one, where is this? This is Sydney, and it learned that. All right, that's cool. This one? Anyone? This is Toronto; I heard it right there. Okay, cool. This one is tough; where is this? This is LA, and the model actually got it right, so I was pretty impressed. So this is clearly Chicago, right? And then here we have Philadelphia; got it right. Again, this is Tokyo. Cool. Even this one, which is not really so complicated, it doesn't have too many buildings, I would say I wouldn't have known it, but it got it right: it's Houston. Then here we have Shanghai. And this is clearly Chicago, right? Wait. What? What just happened?

Well, it turns out there was one photographer who labeled all his photos with the tag Chicago, and all he did was take photos of sandals on pavement. And my model, well, it got it 'right': it learned that a sandal on pavement must be in Chicago. And of course, when you then feed this test image to the model, it says: this must be Chicago.

Okay, so we need to fix this. Let's come up with a plan. First, instead of splitting the images randomly into a train and a test set, we can split them by photographer. That way, at least all those sandals will end up either in the train set or in the test set, but not in both. A sketch of such a split is below.
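That split could look something like the following, again assuming each photo record carries a (hypothetical) 'photographer' key:

```python
import random

def split_by_photographer(photos, test_fraction=0.2, seed=42):
    # Assign whole photographers to train or test, so one person's
    # near-duplicate photos can never leak across the split.
    photographers = sorted({p['photographer'] for p in photos})
    random.Random(seed).shuffle(photographers)
    n_test = int(len(photographers) * test_fraction)
    test_names = set(photographers[:n_test])
    train = [p for p in photos if p['photographer'] not in test_names]
    test = [p for p in photos if p['photographer'] in test_names]
    return train, test
```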
So then we wait again. It takes a while, and this is really annoying once in a while, because it's late at night, you want to do some hacking on your project, you think of a solution, you fix it, you start training, and then you must wait till tomorrow to see the results. Anyway, the results of this one were terrible, because of course, if you put all the sandals in the train set, you'll get a very high train accuracy, but the test accuracy will be terrible; the model just overtrains on them. In the end, we simply had too many mis-tagged photos, so I needed to come up with another plan.

What I did this time is I built another model, a model that classifies only two classes: either the photo shows a skyline or it doesn't. I trained it on all the data I have, so half a million images, and I gave them labels: it's a skyline when it has the skyline tag, and it's not a skyline when it doesn't. Then I could make predictions for all the data and only use the images with a positive skyline prediction for my original model. Again, we have to wait; it takes a while; it gets really annoying after a while. But in the end, the results of this were pretty nice. Out of the images tagged skyline, about 6,000 were labeled by the model as actually having a skyline and about 1,000 as not having one, so I could just get rid of those. But on the other hand, I got about 1,000 images that did have a skyline according to my model but didn't have the skyline tag, so I still ended up with about the same number of images.

So I recreated the train/test split and had to wait yet again; as I said, this gets really annoying, and my boss will not be happy with me. And in the end, I got yet more results. As you can see, the accuracy ended up at about 70%, after 200 training epochs or so, which is, I think, 24 hours of training. I think that's fairly reasonable.

Okay, that's cool. Let's have a look at some of the actual results. This one it got right: Chicago, of course; that's one I would also recognize, so that's cool. I think it actually learned to recognize some of these cities. Los Angeles it also got right. In this case, the model said New York City, while in reality the label was Philadelphia; but to be honest, looking at this picture, I probably would have gotten it wrong as well, so sometimes its errors aren't that bad. Here we have an example where the model says London, probably because of the bad weather, but it was actually taken in Toronto. Sometimes, though, you cannot explain the errors the model makes. In this case, although the skyline is a bit difficult to see, you can see some high buildings in the background, and you can clearly see that this street is definitely not an American street; it's somewhere in Asia. This was actually taken in Shanghai, and the model got it wrong.

Yeah, so that was it. But before I end, some final remarks. Training your own image classifier really isn't that difficult; all you need to do is cheat a little and do transfer learning, otherwise you won't be waiting for 24 hours but for months on end. PyTorch is fun; Keras might be easier and faster, but PyTorch is a lot of fun. And in the end, having clean data is way more important than having a good model. Thank you. If you want to have a look at my code afterwards, you can follow this GitLab link; you'll find all the code I used to create this image classifier. And keep an eye on our blog, blog.godatadriven.com, where I will post a sort of transcript of this talk. Thanks.

Are there questions? We have two mics, so please line up.

Hi. You mentioned two very different kinds of hardware, the Raspberry Pi and the GPU. Can you say a little bit more about whether this is really practical on a Raspberry Pi alone, and if it's not, how does one actually take the next step to use a GPU?

Yes, of course, thank you. I did not train any of these models on the Raspberry Pi; I merely used it to collect my data, since that's just a bunch of web scraping, and you don't need any big hardware for that. For training a model, I tried my MacBook, and that takes way too long, so I did have to get a machine with a GPU. On the other hand, if you have a very small training data set, you might give it a try on your laptop; it's still fun. You might get as far as 10 or 20 epochs and a reasonable accuracy, and it's still fun to play with. If you want something a little better, getting a GPU, or getting a cloud machine with a GPU, is always the way to go.

On that one miscategorization, where you had the Shanghai photo with a very small section of skyline: why include that in the test set?

I didn't actually make the choice myself to include it in the test set.
I included all the images that were classified by the previous model as showing a skyline, so apparently it learned to recognize the properties of a skyline and decided that the background here looks a bit like one.

My question is: you mentioned that clean data is better than having a good model, but this to me doesn't look like clean data. I'd expect a super-genius classifier to figure it out, but I would have excluded this image if I had just seen your talk and not this example. So I'm curious: am I just wrong, or is there some value in examples like this?

Well, in this case I guess it's just laziness. I didn't go through my entire test set before using it, because going through thousands of images and manually labeling them as good or bad is not my idea of fun. I think a proper data scientist must be lazy.

Any more questions? Well then, again, thanks.