Now, here's another thing that we almost always do, which is data augmentation. Look, here we have an image of a bird. What can we do to get more varied training data? Well, for example, we can flip it left-right. Almost all things in the world stay the same if we flip them left-right, maybe apart from written text. The bird is also still the same bird if we crop the image slightly differently, or if we slightly change the brightness or saturation. At the top you see the original image; at the bottom you see all kinds of minor changes that leave the meaning of the image the same. One way of thinking about this is that it's just a trick to get a lot of extra training data, and as you saw before, more data means better performance. In fact, more data is almost always better than a better algorithm.

Now, and I'm referring to a paper by Alex Hernández-García here, there are many different ways of doing data augmentation. Here you see a couple of images: the top row are the originals. Then we can do a little bit of light augmentation, like cropping intelligently and making other minor changes, and then we can do heavier augmentation. These heavily augmented images look pretty bad, but we will see that in many cases they are still useful.

Here we see a comparison where they basically said: let's have a baseline without regularization and then see how much we gain by adding things. We can either add light data augmentation or heavy data augmentation, which you see in the two different bubbles. Alternatively, we can add weight decay and dropout, the standard regularization techniques, either lightly or more heavily. And look what's happening here: it's quite interesting that data augmentation appears to be doing a lot of the work, and the additional benefit we get from weight decay and dropout, at least in this analysis, seems to be relatively small. Why? In a way, it makes a lot of sense. We need regularization primarily if we are limited by the size of the dataset. If we can use data augmentation to produce very, very large datasets, then the importance of regularization like weight decay and dropout will be much smaller. Of course, we can be in domains where the problem is so big that we cannot meaningfully cover it with data augmentation, but at least in smaller domains, it seems that we get a lot of mileage out of data augmentation.

So now I want you to edit the training loop, so let's have a look at that. What do we want? We want some resizes, we want some flips, we want some rotations, say up to 20 degrees, and we want some color jitter. Read the Colab text for the relevant functions and then see how much data augmentation can help.
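To make the exercise concrete, here is a minimal sketch of what such an augmentation pipeline might look like, assuming the Colab is PyTorch-based and uses torchvision transforms. The dataset (CIFAR-10 here), the image size, and the exact parameter values are my own placeholders for illustration, not necessarily what the notebook uses, so check the Colab text for the actual functions it expects.

```python
# Sketch of a training-time augmentation pipeline: random crops/resizes, flips,
# rotations up to ~20 degrees, and color jitter. Assumes PyTorch + torchvision;
# dataset and parameter values are placeholders.
import torch
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),      # random crop, resized back to 32x32
    transforms.RandomHorizontalFlip(p=0.5),                   # left-right flip
    transforms.RandomRotation(degrees=20),                     # rotations up to 20 degrees
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2),                    # small photometric changes
    transforms.ToTensor(),
])

# No augmentation at test time: we only want to enlarge the training distribution.
test_transform = transforms.Compose([transforms.ToTensor()])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)
```

Note that the transforms are applied on the fly, so every epoch the model sees a slightly different version of each image; that is where the "extra training data" comes from. Widening or narrowing the parameter ranges is one way to move between the light and heavy augmentation regimes discussed above.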