Okay, hi everybody, and welcome to Lesson 14. The numbers are getting up pretty high now, huh? Last time we talked about calculus and how we implement the chain rule efficiently in neural network training, which is called backpropagation. I just wanted to point out that one excellent student, Koshik Sinha, has produced a very nice explanation of the code we looked at last time, and I've linked to it: it's got the math and then the code. The code is slightly different from what I had, with some minor changes, but it's basically the same thing, and it might be helpful for linking the math and the code to see what's going on. You'll find it in the Lesson 13 resources, but I thought I'd quickly try to explain it as well. So let me copy this and explain what's going on with this code. The basic idea is that we have a neural network and a loss function that together calculate the loss. Let's call the loss function l; it's applied to the output of the neural network. We'll call the neural network function N, and it takes two things: a bunch of weights and a bunch of inputs. The loss function also requires the targets, but I'm going to ignore that for now because it's not really part of what we care about here. What we're interested in is updating the weights. Let's say this is just a single layer, to keep it simple. To update the weights, we need to know how the loss changes if we change the weights, one weight at a time, if you like. So how would we calculate that?
Well, what we could do is rewrite our loss function. Let's call capital N the result of the neural network applied to the weights and the inputs. Now we can rewrite the loss as big L equals little l, the loss function, applied to the output of the neural network. Maybe you can see where this is going. The derivative of the loss with respect to the weights equals the derivative of the loss with respect to the outputs of that neural network layer, times the derivative of the outputs of that layer with respect to the weights. That's the chain rule. I'll use partial-derivative notation to keep things consistent, since these are not scalars. You can see the two ∂N terms effectively cancel, leaving the change in loss with respect to the weights. That's all the chain rule is. Now, for the change in the loss with respect to the output of the neural network: we did the forward pass here, and this here is where we calculated the derivative of the loss with respect to the output of the neural network. It came out of here and ended up in diff. So there it is: out.g contains this derivative. Let's do one more. We can also get the change in the loss with respect to the inputs, using the chain rule in exactly the same way, except this time we have the inputs. You can see that's this line of code. input.g is the change in the loss with respect to the inputs. It's equal to the change in the loss with respect to the output, which is what out.g means, times this derivative. It's actually a matrix multiplication, because we're doing matrix calculus. And since the layer we were looking at is a linear layer, this derivative is simply the weights themselves (transposed, in the matrix product).
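For reference, here is the chain-rule step above written out. N = N(w, x) is the layer output and L = l(N) is the loss. The second part assumes a linear layer N = xw + b, with x of shape (batch, n_in) and w of shape (n_in, n_out). These are the standard results for that layer, matching the "matrix product with a transpose" form mentioned in the lecture; they are not a transcription of the on-screen code. The bias line is included for completeness.

```latex
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial w},
\qquad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial x}

% For N = xw + b, writing out.g = \partial L / \partial N (shape: batch x n_out):
\texttt{inp.g} = \texttt{out.g}\, w^{\top} \quad (\text{batch} \times n_{in}),
\qquad
\texttt{w.g} = x^{\top}\, \texttt{out.g} \quad (n_{in} \times n_{out}),
\qquad
\texttt{b.g} = \sum_{\text{batch}} \texttt{out.g} \quad (n_{out})
```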
And then we have exactly the same thing for w.g, which is the derivative of the loss with respect to the weights. Again, you've got the same thing: you've got your out.g. And remember, we showed how we can simplify this into a matrix product with a transpose as well. So that's how what's happening in our code maps to the math. Hopefully that's useful. As I say, do check out this really nice resource, which has a lot more detail if you're interested in digging deeper. The other thing I'd say is that some people mentioned they didn't study this at high school, which is fine. We've provided resources on the forum recommending how to learn the basics of derivatives and the chain rule. In particular, I'd recommend 3Blue1Brown's Essence of Calculus series, and also Khan Academy. It's not particularly difficult to learn. It'll only take you a few hours, and then this will make a lot more sense. The same goes if you did it at high school but have forgotten it. So don't worry if you found this difficult because you'd forgotten, or had never learnt, the basic derivative and chain rule stuff. That's something you can pick up now, and I'd recommend doing so. Okay. What we then did last time, which is actually pretty exciting, is get to the point where we had successfully created a training loop that does these four steps. The nice thing is that every single thing here is something we have implemented from scratch. We didn't always use our from-scratch versions. Once we've re-implemented something that already exists, there's no particular reason not to use the existing version. But every single thing here (well, not argmax, but that's trivially easy to implement) we have implemented ourselves.
And we successfully trained an MNIST model to recognise handwritten digits with 96% accuracy. I think that's super neat. This isn't a great metric, though. It's only looking at the training set, and in particular at only one batch of it. Since last time I've refactored a little bit. Let's talk about this report function, which now just runs at the end of each epoch and prints out the loss and the accuracy. Hopefully you've seen f-strings before. They're a really helpful part of Python: you put a variable or an expression inside curly braces in a string, and it gets evaluated. You might not have seen this colon thing, though. It's called a format specifier, and it lets you change how things are printed in an f-string. This is how I'm printing to two decimal places. It says: print loss as a floating-point number with two decimal places, followed by a comma. I'm not going to go into more detail than that. Python f-strings and format specifiers are really helpful, so if you haven't used them before, do look up a tutorial or the documentation. They're definitely something you'll find useful to know about. Okay, so let's rerun all those lines of code. If you're wondering how I just reran all the cells above where I was, in the Cell menu here there's Run All Above. It's so helpful that I always make sure there's a keyboard shortcut for it. You can see I've added the shortcut QA, so if I type QA it runs all cells above, and if I type QB it runs all cells below. For anything you do a lot, make sure you've got a keyboard shortcut, so you're not fiddling around moving your mouse everywhere. You want it to be as easy as thinking. So this is really exciting.
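Here is a small illustration of the format specifier just described, in plain Python. The variable names loss and accuracy and their values are made up for the example; the report function in the lecture may format things slightly differently.

```python
loss, accuracy = 0.1234567, 0.9612

# Inside the braces, the part after ":" is the format specifier.
# ".2f" means: a floating-point number with two digits after the decimal point.
report = f"{loss:.2f}, {accuracy:.2f}"
print(report)  # 0.12, 0.96

# Other specifiers change the presentation without changing the value:
print(f"{accuracy:.1%}")  # 96.1%
print(f"{loss:8.3f}|")    # prints "   0.123|" (right-aligned in a field 8 characters wide)
```

The value itself is never rounded. Only the printed text is, so the same variable can be shown at different precisions in different places.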
We've successfully built and trained a neural network model from scratch, and it works okay. It's a bit clunky, though. There's a lot of code, and there are features we're missing. So let's start refactoring it. Refactoring is all about making it so we have to write less code to do the same work. I'm going to show you something that's part of PyTorch, then show you how to build it, and then you'll see why it's really useful. PyTorch has a submodule called nn, torch.nn, and in there is something called the Module class. We don't normally use it this way, but I just want to show you how it works. We can create an instance of it in the usual way we create instances of classes, and we can assign things to attributes of that module. For example, let's assign a linear layer to it. If we now print it out, you'll see it says this is a module containing something called foo, which is a linear layer. But here's something quite clever. We can ask this module to show us all of its named children, and it says there's one called foo and it's a linear layer. We can also ask it to show us all of its parameters, and it says, okay, sure, there are two of them. There's this four by three tensor, which is the weights, and this vector of length four, which is the biases. So somehow, just by creating this module and assigning this to it, it has automatically tracked what's in the module and what its parameters are. That's pretty neat, and we're going to see both how and why it does that. By the way, why did I add list here? If I just say m1.named_children(), it prints out "generator object", which is not very helpful. That's because this is a kind of iterator called a generator. A generator only produces its contents when you actually do something with it, such as list them out.
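To see the generator point in isolation, here is a plain-Python sketch. named_children here is a hypothetical stand-in written for illustration, not PyTorch's method, and the dictionary value is just a string standing in for a layer.

```python
def named_children(children):
    """Lazily yield (name, child) pairs, one at a time."""
    for name, child in children.items():
        yield name, child

gen = named_children({"foo": "Linear(in_features=3, out_features=4)"})
print(gen)        # <generator object named_children at 0x...>: nothing has run yet
print(list(gen))  # [('foo', 'Linear(in_features=3, out_features=4)')]
print(list(gen))  # []: a generator can only be consumed once
```

Because a generator is used up as it runs, call the function (or method) again if you need to iterate over the results a second time.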
So wrapping a generator in list is one way to run it and get its output. That's a handy trick when you want to look inside a generator. Okay, as I said, we don't normally use Module this way. What we normally do is create our own class. For example, we create our own multi-layer perceptron and inherit from nn.Module. Then in __init__, the special magic method that constructs an object of the class, we say how many inputs there are to this multi-layer perceptron, how many hidden activations, and how many output activations. There'll be just one hidden layer. Up here we assigned things as attributes, and we can do the same thing in this constructor. We create an l1 attribute, which is a linear layer from n_in to nh. l2 is a linear layer from nh to n_out, and we'll also create a ReLU. When we call that module, we take the input, run l1, then run the ReLU, and then run l2. I can create one of these, as you see, and have a look at the attribute l1, and there it is, just like I had. I can print out the model, and the model knows all the stuff that's in it. I can also go through each of the named children and print out the name and the layer. Now, if you remember, although you can use __call__, we showed how to refactor things using forward so that the automatic gradient machinery works correctly. So in practice we wouldn't define __call__; we'd define forward. So this is an example of creating a custom PyTorch module. The key thing to recognise is that it knows all the attributes you added to it, and it also knows all the parameters.
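Here is a minimal plain-Python sketch of the "define forward, not __call__" idea. Module and AddThenDouble are hypothetical classes written for illustration. The real nn.Module.__call__ also runs hooks and other bookkeeping around forward, which this sketch leaves out.

```python
class Module:
    def __call__(self, *args, **kwargs):
        # Calling the object, e.g. m(x), is routed to forward
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses define forward")

class AddThenDouble(Module):
    def __init__(self, k):
        self.k = k

    def forward(self, x):
        return (x + self.k) * 2

m = AddThenDouble(3)
print(m(1))  # 8, same as m.forward(1)
```

Because the base class owns __call__, it can add behaviour around every forward call in one place, and subclasses only describe the computation.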
If I go through the parameters and print out their shapes, you can see I've got my first linear layer's weights, my first linear layer's biases, my second linear layer's weights, and my second linear layer's biases. This 50 is because we set nh, the number of hidden activations, to 50. So why is that interesting? Because now I don't have to write all this anymore, going through layers and making sure they've all been put into a list. We've just added them as attributes, and they automatically appear as parameters. So we can just go through each parameter and update it based on the gradient and the learning rate. Furthermore, you can just call model.zero_grad() and it will zero out all of the gradients. That's made our code quite a lot nicer and quite a lot more flexible, which is cool. Let's check that this still works. There we go. To clarify, if I call report on this before I fit it, then as you would expect, the accuracy is a bit less than 10% and the loss is pretty high. After I fit this model, the accuracy goes up and the loss goes down. So basically all of this is exactly the same as before. The only thing I've changed are these two lines of code. That's a really useful refactoring. So how on earth did this happen? How did it know what the parameters and layers are automatically? It used a trick called __setattr__. We're going to create our own nn.Module now: if there were no such thing as nn.Module, here's how we'd build it. So let's actually build it, and also add some things to it. In __init__, we have to create a dictionary for our named children, which will contain all of the layers. And then, just like before, we'll create a couple of linear layers.
Then we define the special magic method that Python has called __setattr__. If you define it, Python calls it automatically every time you set an attribute, such as here or here. It's passed the name of the attribute, the key, and the value, which is the actual thing on the right-hand side of the equals sign. Generally speaking, we use names that start with an underscore for private stuff, so we check that the name doesn't start with an underscore. If it doesn't, __setattr__ puts this value into the modules dictionary under this key. Then it calls the normal Python __setattr__ to make sure the attribute actually gets set. super is how you call whatever is in the superclass, the base class. Another useful thing to know about is how it does this nifty thing where you just type the name and it lists out all this information. That's a special method called __repr__, and here __repr__ just returns a stringified version of the modules dictionary. And then here we've got parameters. How does parameters work? We go through each of the modules' values, which are the actual layers. Then we go through each of the parameters in each module and yield p. That creates a generator over all the parameters, just like the iterators we looked at earlier. So let's try it. We can create one of these modules, and if we loop through its parameters just like before, there they are. Now I'll mention something optional: kind of advanced Python that a lot of people don't know about. There's no need to loop through an iterator and yield each item. There's a shortcut: you can just say yield from and then give it the iterator.
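Here is a stdlib-only sketch of the __setattr__, __repr__ and parameters machinery just described. Linear is a hypothetical stand-in that stores its weights and biases as nested lists instead of tensors. Like the version in the lecture, MyModule registers every public attribute rather than checking whether it is actually a layer.

```python
class Linear:
    """Hypothetical stand-in for a linear layer: weights and biases as plain lists."""
    def __init__(self, n_in, n_out):
        self.weight = [[0.0] * n_in for _ in range(n_out)]  # n_out rows of n_in values
        self.bias = [0.0] * n_out

    def parameters(self):
        return [self.weight, self.bias]

    def __repr__(self):
        return f"Linear({len(self.weight[0])}, {len(self.bias)})"

class MyModule:  # no explicit base class, so it inherits from object
    def __init__(self, n_in, nh, n_out):
        self._modules = {}          # name starts with "_", so it is not registered
        self.l1 = Linear(n_in, nh)  # triggers __setattr__("l1", ...)
        self.l2 = Linear(nh, n_out)

    def __setattr__(self, k, v):
        # Register every public attribute as a child...
        if not k.startswith("_"):
            self._modules[k] = v
        # ...then actually store it, using object's normal __setattr__
        super().__setattr__(k, v)

    def __repr__(self):
        return f"{self._modules}"

    def parameters(self):
        # Explicit double loop; "yield from" can shorten this
        for layer in self._modules.values():
            for p in layer.parameters():
                yield p

m = MyModule(784, 50, 10)
print(m)                                 # {'l1': Linear(784, 50), 'l2': Linear(50, 10)}
print([len(p) for p in m.parameters()])  # [50, 50, 10, 10]
```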
With that, we can get this all down to one line of code, and it does exactly the same thing: yield from yields everything in here, one at a time. So that's a cool little advanced Python thing. Totally optional, but if you're interested, I think it's kind of neat. We've now learned how to create our own implementation of nn.Module, and therefore we are now allowed to use PyTorch's nn.Module. That's good news. So how would we use PyTorch's nn.Module to create the model we started with, where we had self.layers? We want to register all of those layers at once, and that's not going to happen based on the code we just wrote. So let's make a list of the layers we want, and again create a subclass of nn.Module. Make sure you call the superclass's __init__ first. We just store the list of layers. Then, to tell PyTorch about all of them, we loop through them and call add_module, saying what the name of the module is and what the module is. Again, I probably should have used forward here in the first place. You can see this has now done exactly the same thing. If you've used a sequential model before, you can see we're on the path to creating one. Okay, Ganesh has asked an interesting question: what on earth is super calling, given that we don't actually have a base class? In fact, we don't even need the parentheses here. If you don't put any parentheses, or you put empty parentheses, it's a shortcut for inheriting from object. Python's object does all the normal object things, like storing your attributes so you can get them back later. So that's what's happening there. Okay, it's a little bit awkward to have to store the list and then enumerate and call add_module.
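Here is a plain-Python check of the two claims above. First, "yield from" produces the same items as an explicit inner loop. Second, a class with no listed base inherits from object. The inner lists are hypothetical stand-ins for each layer's parameters.

```python
def params_nested(layers):
    # Explicit double loop: yield each parameter one at a time
    for layer in layers:
        for p in layer:
            yield p

def params_yield_from(layers):
    # "yield from it" yields every item of the iterable `it`, one at a time
    for layer in layers:
        yield from layer

class Empty:  # no base class listed...
    pass

print(list(params_nested([[1, 2], [3]])))      # [1, 2, 3]
print(list(params_yield_from([[1, 2], [3]])))  # [1, 2, 3]
print(Empty.__bases__)                          # (<class 'object'>,): ...so the base is object
```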
So now that we've implemented that from scratch, we can use PyTorch's version, nn.ModuleList, which just does that for you. If you pass nn.ModuleList a list of layers, it goes ahead and registers all those modules for you. So here's something called SequentialModel, which is just like nn.Sequential. If I create it, passing in the layers, there you go. You can see my model, containing my module list with my layers. (I don't know why I didn't use forward for these things. It's silly, though I guess it doesn't matter terribly at this stage.) Okay, so call fit, and there we go. In forward here, I just go through each layer and set the result equal to calling that layer on the previous result, then return it at the end. There's another way of doing this which I think is kind of fun. It's not shorter or anything at this stage. I just wanted to show an example of something you see quite a lot in machine learning code, which is the use of reduce. This implementation here is exactly the same as this thing here, so let me explain how it works. Reductions are a very common, fundamental computer science concept, and reduce is something that does a reduction. A reduction starts with the third parameter, some initial value. Here we start with x, the thing being passed in. Then it loops through a sequence, here each of our layers, and for each one calls some function; here's our function. The first time around, the function is passed the initial value and the first item in the list, so x and our first layer. It just calls the layer function on x. The second time around, it takes the output of that, passes it in as the first parameter, and passes in the second layer.
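Here is the reduction just described, sketched with functools.reduce from the standard library. The layers are hypothetical plain functions standing in for model layers. The point is that reduce(f, sequence, initial) computes the same thing as the explicit loop in forward.

```python
from functools import reduce

def forward_loop(layers, x):
    # The explicit version: feed each layer the previous layer's output
    for layer in layers:
        x = layer(x)
    return x

def forward_reduce(layers, x):
    # reduce starts with the initial value x, then repeatedly does
    # acc = f(acc, next_layer), here f = "call the layer on the accumulator"
    return reduce(lambda acc, layer: layer(acc), layers, x)

layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(forward_loop(layers, 5), forward_reduce(layers, 5))  # 9 9
```

With an empty list of layers, both versions simply return the initial value, which is why reduce takes that third argument.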
So the second time through, it calls the second layer on the result of the first layer, and so forth. That's what a reduction is. You'll certainly see reductions talked about quite a lot in papers and books, and you might sometimes see reduce in code too. It's a very general concept. So that's how you can implement a sequential model using reduce: there's no explicit loop, although the loop's still happening internally. All right, now that we've re-implemented Sequential, we can go ahead and use PyTorch's version, nn.Sequential. We pass in our layers and fit. Not surprisingly, when we look at the model, it looks very similar to the one we built ourselves. All right. This pattern of looping through the parameters, updating them based on their gradients and a learning rate, and then zeroing the gradients is very common. It's so common that something does it all for us, called an optimizer: that's the stuff in torch.optim. So let's create our own optimizer. As you can see, it just does the two things we saw. step goes through each of the parameters and updates them using the gradient and the learning rate. zero_grad goes through each parameter and sets its gradient to zero. Using .data is basically just a way of avoiding having to wrap the update in torch.no_grad(). In the optimizer's __init__ we pass in the parameters we want to optimize and the learning rate, and we just store them away. Since the parameters might be a generator, we call list to turn them into a list. So we create our optimizer, passing in model.parameters(), which nn.Module has constructed for us automatically. Here's our new loop, and now we don't have to do any of that stuff manually.
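Here is a plain-Python sketch of the optimizer just described. Param is a hypothetical stand-in for a tensor, holding a .data value and a .grad. With real PyTorch tensors the update would be done on .data or inside torch.no_grad(), as mentioned above.

```python
class Param:
    """Hypothetical stand-in for a parameter tensor: a value and its gradient."""
    def __init__(self, data, grad=0.0):
        self.data, self.grad = data, grad

class Optimizer:
    def __init__(self, params, lr):
        # list() so a generator (like model.parameters()) can be reused every step
        self.params, self.lr = list(params), lr

    def step(self):
        # Plain SGD: move each parameter against its gradient
        for p in self.params:
            p.data -= p.grad * self.lr

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

ps = [Param(1.0, grad=2.0), Param(-1.0, grad=-4.0)]
opt = Optimizer((p for p in ps), lr=0.5)  # pass a generator, as model.parameters() would be
opt.step()
print([p.data for p in ps])  # [0.0, 1.0]
opt.zero_grad()
print([p.grad for p in ps])  # [0.0, 0.0]
```

Converting to a list in __init__ matters. A generator would be exhausted after the first step, so later calls to step and zero_grad would silently do nothing.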
We can just say opt.step so that's going to call this and opt.zerograd and that's going to call this. There it is. So we've now built our own SGD optimizer from scratch so I think this is really interesting like these things which seem like they must be big and complicated once we have this nice structure in place an SGD optimizer doesn't take much code at all and so it's all very transparent, simple, clear if you're having trouble using complex library code that you've found elsewhere this can be a really good approach is to actually just go all the way back, remove as many of these abstractions as you can and like run everything by hand to see exactly what's going on it can be really freeing to see that you can do all this anyway since PyTorch has this for us in torch.optim it's got a optim.sgd and just like our version you pass in the parameters so you really see it is just the same so let's define something called getModel that's going to return the model, the sequential model and the optimizer for it so if we go model,opt equals getModel and then we can call the loss function to see where it's starting and so then we can write our training loop again go through each epoch go through each starting point for our batches grab the slice into our x and y in the training set calculate our predictions, calculate our loss do the backward pass do the optimizer step, do the zero gradient and print out how you're going at the end of each one and there we go so let's keep making this simpler there's still too much code so what we could do is we could replace these lines of code with one line of code by using something we'll call the dataset class so the dataset class is just something that we're going to pass in our independent and dependent variable we'll store them away as self.x and self.y we'll have something so if you define done to len then that's the thing that allows the len function to work so the length of the dataset will just be the length of the independent 
variables and then done to get item is the thing that will be called automatically any time you use square brackets in python so that just is going to call this function passing in the indices that you want so when we grab some items from our dataset we're going to return a tuple of the x values and the y values so then we'll be able to do this so let's create a dataset using this tiny little three line class it's going to be a dataset containing the x and y training and they'll create another dataset containing the x and y valid and those two datasets will call train ds and valid ds so let's check the length of those datasets should be the same as the length of the x's and they are and so now we can do exactly what we hope we could do we can say xb, yb equals train ds and pass in some slice so that's going to give us back our check the shapes are correct should be 5 by 28 by 28 5 by 28 times 28 and the y's should just be 5 and so here they are the x's and the y's so that's nice we've created a dataset from scratch and again it's not complicated at all and if you look at the actual PyTorch source code this is basically all datasets do so let's try it we call getModel and so now we've replaced our dataset line with this one and as per usual it still runs and so this is what I do when I'm writing code is I try to always make sure that my starting code works as I refactor and so you can see all the steps and so somebody reading my code can then see exactly like why am I building everything I'm building, how does it all fit in see that it still works and I can also keep it clear in my own head so I think this is a really nice way of implementing libraries as well alright so now we're going to replace these two lines of code with this one line of code so we're going to create something called a data loader and a data loader is something that's just going to do this so we need to create an iterator so an iterator is a class that has a dunderItter method when you say for 
in in Python behind the scenes it's actually calling dunderItter to get a special object which it can then loop through using yield so it's basically getting this thing that you can iterate through using yield so a data loader is something that's going to have a data set and a batch size because we're going to go through the batches and grab one batch at a time so we have to store away the data set and the batch size and so when we call the for loop it's going to call dunderItter we're going to want to do exactly what we saw before go through the range just like we did before and then yield that bit of the data set and that's all so that's a data loader so we can now create a train data loader and a valid data loader from our train data set and valid data set and so now if you remember the way you can get one thing out of an iterator so you don't need to use a for loop you can just say iter and that will also call dunderItter next we'll just grab one value from it so here we will run this and you can see we've now just confirmed we've xb is a 50 by 784 and yb there it is and then we can check what it looks like so let's grab the first element of our x batch make it 28 by 28 and there it is so now that we've got a data loader again we can grab our model and we can simplify our fit function let's just go for xbyb and trainDL so this is getting nice and small don't you think and it still works the same way okay so this is really cool and now that it's nice and concise we can start adding features to it so one feature I think we should add is that our training set each time we go through it it should be in a different order it should be randomized the order so instead of always just going through these indexes in order we want some way to say use random indexes so the way we can do that is create a class called sampler and what sampler is going to do I'll show you is if we create a sampler without shuffle without randomizing it it's going to simply return all the 
numbers from 0 up to n in order and it'll be an iterator see this is done to iter but if I do want it shuffled then it will randomly shuffle them so here you can see I've created a sampler without shuffle so if I then make an iterator from that and print a few things from the iterator you can see it's just printing out the indexes it's going to want or I can do exactly the same thing as we learned earlier in the course using iSlice here's the first five things from a sampler when it's not shuffled so as you can see these are just indexes so we could add shuffle equals true and now that's going to call random.shuffle which just randomly permits them and now if I do the same thing I've got random indexes of my source data so why is that useful well what we could now do is create something called a batch sampler and what the batch sampler is going to do is it's going to basically do this iSlice thing for us so we're going to say pass in a sampler that's something that generates indices and pass in a batch size and remember we've looked at chunking before it's going to chunk that iterator by that batch size I'll say alright, please take our sampler and create batches of four as you can see here it's creating batches of four indices at a time so rather than just looping through them in order I can now loop through this batch sampler so we're going to change our data loader so that now it's going to take some batch sampler and it's going to loop through the batch sampler that's going to give us indices and then we're going to get that data set item from that batch for everything in that batch so that's going to give us a list and then we have to stack all of the x's and all of the y's together into tensors so I've created something here called collate function and we're going to default that to this little function here which is going to grab our batch pull out the x's and y's separately and then stack them up into tensors so this is called our collate function okay so 
if we put all that together we can create a training sampler which is a batch sampler over the training set with shuffle true a validation sampler will be a batch sampler over the validation set with shuffle false and so then we can pass that into this data loader class the training data set and the training sampler and the collate function which we don't really need because it's we're just using the default one so I guess we can just get rid of that so now here we go we can do exactly the same thing as before x, b, y, b is next iter and this time we use the valid data loader check the shapes this is how PyTorch actual data loaders work this is all the pieces they have they have samplers they have batch samplers they have a collation function and they have actual data loaders so remember that what I want you to be doing for your homework is experimenting with these carefully to see exactly what each thing is taking in so PyTorch is asking on the chat what is this collate thing doing so collate function it defaults to collate what does it do well let's see, let's go through each of these steps so we need we've got a batch sampler so let's do just the valid sampler so the batch sampler here it is so we're going to go through each thing in the batch sampler so let's just grab one thing from the batch sampler so the output of the batch sampler will be next it's a okay so here's what the batch sampler contains just the first 50 digits not surprisingly because this is our validation sampler if we did a training sampler that would be randomized there they are okay so then what we then do is we go self.dataset i for i and b so let's copy that copy paste and so rather than self.dataset i we'll just say valid dsi it's not i and b it's i and o that's what we called it oh and we did it for training sorry training okay so what it's created here is a list of tuples of tensors I think let's have a look so we'll call this um p whatever so p0 okay is a tuple it's got the x and the 
y independent independent variable so that's not what we want what we want is something that we can loop through we want to get batches so what the collation model is going to do sorry not the collation model the collate function is going to do is it's going to take all of our x's and all of our y's and collate them into two tensors one tensor of x's and one tensor of y's so the way it does that is it first of all calls zip so zip is a very very commonly used python function it's got nothing to do with the compression program zip but instead what it does is it effectively allows us to like transpose things so that now as you can see we've got all of the second elements or index one elements all together and all of the index zero elements together and so then we can stack those all up together and that gives us our y's for our batch so that's what collate does so the collate function is used an awful lot in PyTorch increasingly nowadays where hugging face stuff uses it a lot and so we'll be using it a lot as well and basically it's a thing that allows us to customize how the data that we get back from our data set once it's been kind of generating a list of things from the data set how do we put it together into some into a bunch of things that our model can take as inputs because that's really what we want here so that's what the collation function does oh this is the wrong way around like so this is something that I do so often that Fastcore has a quick little shortcut for it just called store address store attributes and so if you just put that in your dunder in it then you just need one line of code and it does exactly the same thing so there's a little shortcut as you see and so you'll see that quite a bit alright let's have a seven minute break and see you back here very soon and we're going to look at a multi-processing data loader and then we'll have nearly finished this notebook alright see you soon alright let's keep going so we've seen how to create a 
data loader and sample from it. The PyTorch data loader works exactly like this, but it uses a lot more code because it implements multiprocessing. Multiprocessing means that the actual data loading, this code here, can be run in multiple processes, in parallel, for multiple items. So this code, for example, might be opening up a JPEG, rotating it, flipping it, et cetera, because remember, this is just calling __getitem__ on a dataset, so that could be doing a lot of work for each item, and we're doing it for every item in the batch, so we'd like that to happen in parallel. I'll show you a very quick and dirty way that basically does the job. Python has a multiprocessing library. It doesn't work particularly well with PyTorch tensors, so PyTorch provides a drop-in version of it, torch.multiprocessing, which is identical API-wise but does work well with tensors. So this is basically what I've grabbed: multiprocessing. This is not quite cheating, because multiprocessing is in the standard library and this is API-equivalent, so I'm going to say we're allowed to do that.
So, as we've discussed, when we use square brackets on an object, it's actually identical to calling the __getitem__ method on that object. You can see here, if we say give me items 3, 6, 8 and 1, it's the same as calling __getitem__ passing in 3, 6, 8 and 1. Now why does this matter? I'll show you why it matters: because we're going to be able to use map, and I'll explain why we want to use map in a moment. map is a really important concept. You might have heard of MapReduce. We've already talked about reductions and what those are; maps are kind of the other key piece. map is something which takes a sequence and calls a function on every element of that sequence. So imagine we had a couple of batches of indices, 3 and 6, and 8 and 1. Then we're going to call __getitem__ on each of those batches. So that's what map does: map calls this function on every element of this
sequence. And so that's going to give us the same stuff as this, but now batched into two batches. Now why do we want to do that? Because multiprocessing has something called Pool, where you can tell it how many workers, how many processes, you want to run, and it then has a map, which works just like the normal Python map, but it runs this function in parallel over the items from this iterator. So this is how we can create a multiprocessing data loader. Here we're creating our data loader again; we don't actually need to pass in the collate function, because we're using the default one. So if we say n_workers=2 and then create that, and we say next, see how it's taking a moment? It took a moment because it was firing off those two workers in the background. So the first batch actually comes out more slowly, but the reason we would use a multiprocessing data loader is that if this is doing a lot of work, we want it to run in parallel, and even though the first item might come out a bit slower, once those processes are fired up it's going to be faster to run. So this is a really simplified multiprocessing data loader. Because this needs to be super, super efficient, PyTorch has lots more code than this, but the idea is this, and this is actually a perfectly good way of experimenting, or of building your own data loader to make things work exactly how you want.
So now that we've reimplemented all this from PyTorch, let's just grab PyTorch's versions, and as you can see, it's exactly the same data loader. They don't have one thing called sampler that you pass shuffle to; they have two separate classes called SequentialSampler and RandomSampler. I don't know why they do it that way, it seems a little bit more work to me, but same idea. And they've got BatchSampler. So it's exactly the same idea: a training sampler is a BatchSampler with a RandomSampler, and the validation sampler is a BatchSampler with a sequential
sampler, passing in the batch size. And so we can now pass those samplers to the data loader. This is now the PyTorch DataLoader, and just like ours, it also takes a collate function. Okay, and it works, cool. So as you can see, it's doing exactly the same stuff that ours is doing, with exactly the same API. And it's got some shortcuts, as I'm sure you've noticed when you've used data loaders. For example, calling BatchSampler is going to be very, very common, so you can actually just pass the batch size directly to a DataLoader, and it will then auto-create the batch sampler for you. So you don't have to pass in batch_sampler at all; instead you can just say sampler, and it will automatically wrap that in a batch sampler for you. So it does exactly the same thing. And in fact, because it's so common to create a random sampler or a sequential sampler for a dataset, you don't have to do that manually: you can just pass in shuffle=True or shuffle=False to the DataLoader, and that does, again, exactly the same thing. There it is.
Now something that is very interesting is that, when you think about it, the batch sampler and the collation function are things which are taking the result of the sampler, looping through it, and then collating the results together. But because our datasets know how to grab multiple indices at once, we can actually just use the batch sampler as a sampler. We don't actually have to loop through them and collate them, because they're basically instantly collated; they come pre-collated. This is a trick which Hugging Face stuff can use as well, and we'll be seeing it again. So an important thing to understand is how come we can pass a batch sampler to sampler, and what it's doing. Rather than trying to look through the PyTorch code, I suggest going back to our non-multiprocessing, pure Python code to see exactly how that would work, because it's a really nifty trick for things that you can grab multiple things
from at once, and it can save a whole lot of time; it can make your code a lot faster. Okay, so now that we've got all that nicely implemented, we should now add a validation set. There's not really too much to talk about here. We'll just take our fit function, and this is exactly the same code that we had before, and then we're just going to add something which goes through the validation set, gets the predictions, sums up the losses and accuracies, and from time to time prints out the loss and accuracy. get_dls we will now implement using the PyTorch DataLoader. So now our whole process will be: get_dls, passing in the training and validation datasets. Notice that for our validation data loader I'm doubling the batch size, because it doesn't have to do backpropagation, so it should use about half as much memory, so I can use a bigger batch size. Get our model and then call fit, and now it's printing out the loss and accuracy on the validation set. So finally we actually know how we're doing, which is that we're getting 97% accuracy on the validation set, and that's on the whole thing, not just on the last batch. So that's cool. We've now implemented a proper, working, sensible training loop. It's still a bit more code than I would like, but it's not bad, and every line of code in there, and every line of code it's calling, is all stuff that we have built ourselves, reimplemented ourselves. So we know exactly what's going on, and that means it's going to be much easier for us to create anything we can think of; we don't have to rely on other people's code. So hopefully you're as excited about that as I am, because it really opens up a whole world for us.
So one thing that we're going to want to be able to do, now that we've got a training loop, is to grab data, and there's a really fantastic library of datasets available on Hugging Face nowadays. So let's look at how we use those datasets, now that we know how to bring things into data loaders and stuff, so that now we can
use the entire world of Hugging Face datasets with our code. So you need to pip install datasets, and once you've pip installed datasets, you're going to say from datasets import, and you can import a few things, just these two things for now: load_dataset and load_dataset_builder. We're going to look at a dataset called Fashion-MNIST. The way things tend to work with Hugging Face is there's something called the Hugging Face Hub, which has models, and it has datasets, amongst other things. Generally you'll give them a name, and you can then say, in this case, load a dataset builder for Fashion-MNIST. Now a dataset builder is basically just something which has some metadata about this dataset. So the dataset builder has a .info, and the .info has a .description, and here's a description of this. As you can see, again we've got 28 by 28 grayscale, so it's going to be very familiar to us, because it's just like MNIST. And again we've got 10 categories, again we've got 60,000 training examples, and again we've got 10,000 test examples. So this is cool. As it says, it's a direct drop-in replacement for MNIST. The dataset builder will also tell us what's in this dataset. Hugging Face stuff generally uses dictionaries rather than tuples, so there's going to be an image of type Image, and there's going to be a label of type ClassLabel. There are 10 classes, and these are the names of the classes. So it's quite nice that in Hugging Face datasets we can get this information directly. It also tells us if there are some recommended training and test splits, and we can find out those as well: this is the size of the training split and the number of examples. So now that we're ready to start playing with it, we can load the dataset. So this is the difference between load_dataset_builder and load_dataset: this one will actually download it and cache it, and here it is. And it creates a DatasetDict. So a DatasetDict, if you've used fastai,
is basically just like what we call the Datasets class; they call it the DatasetDict class. So it's a dictionary that contains, in this case, a train and a test item, and those are datasets. These datasets are very much like the datasets that we created in the previous notebook. So we can now grab the training and test items from that dictionary and just pop them into variables. And so we can now have a look at the zero-index thing in training, and just like we were promised, it contains an image and a label. So as you can see, we're not getting tuples anymore; we're getting dictionaries containing the x and the y, in this case image and label. I'm going to get pretty bored writing "image" and "label" as strings all the time, so I'm just going to store them as x and y: x is going to be the string "image", and y will be the string "label". I guess the other way I could have done that would have been to say x, y = the features, which would probably be a bit neater, because it's coming straight from the features, and if you iterate over a dictionary you get back its keys; that's why that works. So anyway, I've done it manually here, which is a bit sad, but there you go. Okay, so we can now grab train[0], which we've already seen, and we can grab the x, i.e.
the image, and there it is, there's the image. We could grab the first five images and the first five labels, for example, and there they are. Now, we already know what the names of the classes are, so we could now see what these map to by grabbing those features. So there they are. This is a special Hugging Face class, and most libraries, including fastai, have something that works like this. There's something called int2str, which is going to take these and convert them to these. So if I call it on our y batch, you'll see the first is ankle boot, and there that is, indeed an ankle boot, and then we have a couple of T-shirts and a dress. Okay, so how do we use this to train a model? Well, we're going to need a data loader, and for now we're going to do it just like we've done it before. It's going to return... well, actually, we're going to do something a bit different: our collate function is actually going to return a dictionary. This is pretty common for Hugging Face stuff, and PyTorch doesn't mind; it's happy for you to return a dictionary from a collation function. So rather than returning a tuple of the stacked-up... hopefully this looks very familiar. This looks a lot like the thing that goes through the dataset for each one and stacks them up, just like we did in the previous notebook. So that's what we're doing, all in one step here in our collate function. And then again exactly the same thing: go through our batch and grab the y, and this just stacks them up into a tensor of integers, so we don't have to call stack. And so we're now going to have the image and label bits in our dictionary. So if we create our data loader using that collation function and grab one batch, we can go batch[x].shape, which is 16 by 1 by 28 by 28, and here's the y of the batch. So the thing to notice here is that we haven't done any transforms or anything, or written our own dataset class or anything; we're actually putting all the work directly in the collation
function. So this is a really nice way to skip all of the abstractions of your framework if you want to: you can just do all of your work in collate functions. It's going to pass you each item, so you're going to get the batch directly, and you can just go through each item. So here we're saying, okay, grab the x key from that dictionary, convert it to a tensor, do that for everything in the batch, and then stack them all together. So this can be quite a nice way to do things if you want to do them very manually, without having to think too much about a framework. Particularly if you're doing really custom stuff, this can be quite helpful. Having said that, Hugging Face datasets absolutely lets you avoid doing everything in the collate function, and if we want to create really simple applications, that's where we're going to eventually want to head. So we can do this using a transform instead. The way we do that is we create a function that's going to take our batch, replace the x in our batch with the tensor version of each of those PIL images (and I'm not even stacking them or anything), and then return that batch. Hugging Face datasets has something called with_transform, and that's going to take your Hugging Face dataset and apply this function to every element. It doesn't run at all now; behind the scenes, when it calls __getitem__, it will call this function on the fly. So in other words, this could include data augmentation, which can be random or whatever, because it's going to be rerun every time you grab an item; it's not cached or anything like that. Other than that, this dataset has exactly the same API as any other dataset: it has a length, it has a __getitem__, so you can pass it to a data loader. And PyTorch already knows how to collate dictionaries of tensors. We've got a dictionary of tensors now, so that means we don't need
a collate function anymore. I can create a data loader from this without a collate function, as you can see, and this gives exactly the same thing as before, without having to create a custom collate function. Now even this is a bit more code than I want; having to return this seems a bit silly. The reason I had to do this is because Hugging Face datasets expects the with_transform function to return the new version of the data. So I wanted to be able to write it like this, transforming in place and just saying the change I want to make, and have it automatically return that. So if I create this function, it's exactly the same as the previous one, except it doesn't have a return. How would I turn this into something which does return the result? So here's an interesting trick: we could take that function and pass it to another function, to create a new function, which is the version of this in-place function that returns the result. The way I do that is by creating a function called inplace. It takes a function and it returns a function. The function it returns is one that calls my original function and then returns the (now modified) data. So this is a function-generating function, and it's modifying an in-place function to become a function that returns the new version of that data. So this function is passed to this function, which returns a function, and here it is: here's the version that Hugging Face will be able to use. So I can now pass that to with_transform, and it does exactly the same thing. This is very, very common in Python. It's so common that this line of code can be entirely removed and replaced with this little token: if you take a function's name and put an @ at the start, you can then put that before a function definition, and what it says is take this whole function, pass it to this function, and replace it with the result. So this is exactly the same as the combination of this and this. When we do it this way, this little bit of syntax sugar is called a
decorator. So there's nothing magic about decorators; it's literally identical to this. Well, I guess the only difference is that we don't end up with this unnecessary intermediate underscore version, but the result is exactly the same. And therefore I can create a transformed dataset by using this, and there we go, it's all working fine. So none of this is particularly necessary, but what we're doing is seeing the pieces that we can put in place to make this stuff as easy as possible, so we don't have to think about things too much.
Alright, now with all this we can basically make things pretty automatic, and the way we can do that is with a cool thing in Python called itemgetter. itemgetter is a function that returns a function, so hopefully you're getting used to this idea now. This creates a function that gets the a and c items from a dictionary, or something that looks like a dictionary. So here's a dictionary; it contains keys a, b and c. This function will take a dictionary and return the a and c values, and as you can see, it has done exactly that. I'll explain why this is useful in a moment. I just wanted to briefly mention what I meant when I said something that looks like a dictionary. This is a dictionary, but Python doesn't care about what type things actually are; it only cares about what they look like. Remember that when we index into something with square brackets, behind the scenes it's just calling __getitem__. So we could create our own class whose __getitem__ gets the key k and just manually returns 1 if k equals "a", 2 if k equals "b", or 3 otherwise, and look, that class also works just fine with an itemgetter. The reason this is interesting is because a lot of people write Python as if it's C++ or Java or something; they write it as if it's this kind of statically typed thing. But I really wanted to point out that it's an extremely dynamic
language, and there's a lot more flexibility than you might have realized. Anyway, that's a little aside. So what we can do is think about a batch, for example, where we've got these two dictionaries. PyTorch comes with a default collation function called, not surprisingly, default_collate, so that's part of PyTorch. What default_collate does with dictionaries is it simply takes the matching keys, grabs their values, and stacks them together. That's why, if I call default_collate, a is now 1, 3 and b is now 2, 4. That's actually what happened before, when we created this data loader: it used the default collation function, which does that. It also works on tuples rather than dictionaries, which is what most of you would have used before. So what we can do, therefore, is create something called collate_dict, which is going to take a dataset and create an itemgetter function for the features in that dataset, which in this case are image and label. So this is a function which will get the image and label items. We're now going to return a function, and that function is simply going to call our itemgetter on the result of default_collate. What this is going to do is take a batch of dictionaries and collate it into a tuple, just like we did up here. So if we run that, we're now going to call DataLoader on our transformed dataset, passing in collate_dict of the dataset (remember, this is a function that returns a function, so it's a collation function for this dataset), and there it is. This looks a lot like what we had in our previous notebook: it's not returning a dictionary, it's returning a tuple. This is a really important idea, particularly for working with Hugging Face datasets: they tend to do things with dictionaries, and most other things in the PyTorch world tend to work with tuples. So you can now use this to convert anything that returns dictionaries into something that provides tuples, by passing it as a collation
function to your data loader. So remember, the thing you want to be doing this week is things like import pdb; pdb.set_trace(): put breakpoints in, step through, and see exactly what's happening, not just here but, even more importantly, inside the innermost inner function. So then you can see what's... what did I do wrong there? Oh, it's set underscore trace. So then we can see exactly what's going on: print out b, list the code, and I can step into it, and look, I'm now inside the default_collate function, which is inside PyTorch, and so I can now see exactly how that works. There it all is. It's going to go through, and this code is going to look very familiar, because we've implemented all this ourselves, except it's being careful that it works for lots of different types of things: dictionaries, NumPy arrays, and so on and so forth.
So the last thing I wanted to do... oh, actually, something I did want to mention here: this is so useful that we want to be able to use it in all of our notebooks. So rather than copying and pasting this every time, it would be really nice to create a Python module that contains this definition. We've created a library called nbdev, really a whole system called nbdev, which does exactly that: it creates modules you can use from your notebooks. The way you do it is with this special thing we call comment directives, which is hash-pipe, and then hash-pipe export (#| export). You put this at the top of a cell, and it says do something special for this cell. What this one says is: put this into a Python module for me, please; export it to a Python module. What Python module is it going to put it in? Well, if you go all the way to the top, you tell it what default export module to create (#| default_exp), so it's going to create a module called datasets. What I do at the very end of this notebook is I've got this line that says import nbdev; nbdev.nbdev_export(), and what that's going to do for me is create a Python library, and it's going to have a datasets.py in it, and we'll
see everything that we exported. Here it is: collate_dict will appear in it for me. And so what that means is that in the future, in my notebooks, I will be able to import collate_dict from my datasets module. Now you might wonder: how does it know to call it miniai? What's miniai? Well, in nbdev you create a settings.ini file where you say what the name of your library is. We're going to be using this quite a lot now, because we're getting to the point where we're starting to implement stuff that didn't exist before. Previously, for pretty much all the stuff we've created, I've said, oh, that already exists in PyTorch, so we don't need it, we just use PyTorch's. But we're now getting to a point where we're starting to create stuff that doesn't exist anywhere; we've created it ourselves, and so therefore we want to be able to use it again. So during the rest of this course, we're going to be building together a library called miniai. That's going to be our framework, our version of something like fastai; maybe it's something like what fastai 3 will end up being, we'll see. So that's what's going on here. Once I start using miniai, I'll show you exactly how to install it, but that's what this export is. I also had an export on this inplace thing, and I also had it on my necessary import statements.
Okay, we want to be able to see what this dataset looks like, so I thought now is a good time to talk a bit about plotting, because knowing how to visualize things well is really important. And again, the idea is we're not allowed to use fastai's plotting library, so we've got to learn how to do everything ourselves. So here's the basic way to plot an image using matplotlib: we can create a batch, grab the x part of it, grab the very first thing in that, and imshow means show an image. And here it is, there is our ankle boot. So let's start to think about what stuff we might create, which we can export, to make this a bit easier. So let's create
something called show_image, which basically does imshow, but we're going to do a few extra things. We will make sure that it's in the correct axis order; we will make sure it's not on CUDA, that it's on the CPU; if it's not a NumPy array, we'll convert it to a NumPy array; we'll be able to pass in an existing axis, which we'll talk about soon, if we want to; we'll be able to set a title if we want to; and also this thing here removes all the ugly 0, 5, blah blah blah axis ticks, because we're showing an image and we don't want any of that. So if we try that, you can see, there we go. We've also been able to say what size we want the image. There it all is. Now here's something interesting: when I say help, the help shows the things that I implemented, but it also shows a whole lot more things. How did that magic happen? And you can see they work, because here's figsize, which I didn't add... sorry, I did add it, that's a bad example, but these other ones all work as well. So how did that happen? Well, the trick is that I added **kwargs here, and star-star kwargs says: pass as many other arguments as you like that aren't listed, and they'll all be put into a dictionary with this name. And then, when I call imshow, I pass that entire dictionary; star-star here means pass them as separate arguments. That's how come it works. And then how come it knows what help to provide? The reason why is that fastcore has a special thing called delegates, which is a decorator, so now you know what a decorator is. You tell it what it is that you're going to be passing kwargs to, and I'm going to be passing them to imshow, and then it automatically creates the documentation correctly to show you what kwargs can do. So this is a really helpful way of being able to extend existing functions like imshow and still get all of their functionality and all of their documentation. delegates is one of the most useful things we have in fastcore, in my opinion. So we're going to export that, so now
we can use show_image anytime we want, which is nice. Something that's really helpful to know about matplotlib is how to create subplots. For example, what happens if you want to plot two images next to each other? In matplotlib, subplots creates multiple plots: you pass it the number of rows and the number of columns, so this here has, as you see, one row and two columns, and it returns axes. Axes is what it calls the individual plots. So if we now call show_image on the first image, passing in axes[0], it's going to get that here, and then we call ax.imshow, which means put the image on this subplot. They don't call it a subplot, unfortunately, they call it an axis: put it on this axis. So that's how come we're able to show one image on the first axis and then show a second image on the second axis, by which we mean subplot, and there are our two images. So that's pretty handy. I've decided to add some additional functionality to subplots, so I use delegates on subplots, because I'm adding functionality to it and I'm going to be taking kwargs and passing them through to subplots. The main thing I wanted to do is automatically create an appropriate figure size: you just tell us what image size you want. And I also want to be able to add a title for the whole set of subplots. So there it is. And then I also want to show you that it'll automatically, if we want, create documentation for us as well, for our library, and here is the documentation. As you can see here, for the stuff I've added, it's telling me exactly what each of these parameters is, their type, their defaults, and information about each one, and that information is automatically coming from these little comments. We call these docments. This is all automatic stuff done by fastcore and nbdev. So you might have noticed, when you look at the fastai library documentation, it always has all this info. That's why. You don't actually have to call show_doc; it's
automatically added to your documentation for you. I'm just showing you here what it's going to end up looking like, and you can see that it's worked with delegates: it's put all the extra stuff from delegates in here as well, and they're all listed out here too. So anyway, subplots. Let's create a 3x3 set of plots, and we'll grab the first two images, and now we can go through each of the subplots. It returns them as a 3x3, basically a list of three lists of three items, so I flatten them all out into a single list. So we'll go through each of those subplots and each image, and show each image on each axis. And so here's a quick way to show them all. As you can see, it's a little bit ugly here, so we'll keep on adding more useful plotting functionality. So here's something that again calls our subplots, and delegates to it, but we're going to be able to say, for example, how many subplots we want, and it will automatically calculate the rows and the columns, and it's going to remove the axes for any ones that we're not actually using. And so here we've got that. So that's what get_grid is going to let us do. So we're getting quite close. And so finally, why don't we just create a single thing called show_images that's going to get our grid and go through our images, optionally with a list of titles, and show each one. We can use that here, and you can see we have successfully got all of our labeled images. So I think all this stuff for plotting is pretty useful, and as you might have noticed, they were all exported. So in our datasets.py, we've got our get_grid, we've got our subplots, we've got our show_images, so that's going to make life easier for us, since we have to create everything from scratch. As I mentioned, at the very end we have this one line of code to run, and so just to show you, if I remove the contents of miniai/datasets.py so it's all empty, and then I run this line of code, now it's back, as you can see, and it
tells you it's autogenerated.
Alright, so we are nearly at the point where we can build our learner, and once we've built our learner, we're going to be able to really dive deep into training and studying models. So we've nearly got all of our infrastructure in place. Before we do, there are pieces of Python which not everybody knows that I want to talk about, and some computer science concepts I want to talk about, so that's what 06_foundations is about. This whole section is just going to talk about some stuff in Python that you might not have come across before, or maybe it's a review for some of you, and it's all stuff we're going to be using basically in the next notebook, so that's why I wanted to cover it. We're going to be creating a learner class. A learner class is going to be a very general-purpose training loop which we can get to do anything that we want it to do, and we're going to be creating things called callbacks to make that happen. So we're going to spend a few moments talking about what callbacks are, how they are used in computer science, and how they are implemented, and look at some examples. They come up a lot. A common place that you see callbacks in software is GUI events, that is, events from some graphical user interface. The main graphical user interface library in Jupyter Notebooks is called ipywidgets, and we can create a widget like a button, like so. When we display it, it shows me a button, and at the moment it doesn't do anything if I click on it. What we can do, though, is add an on_click callback to it, which means we're going to pass it a function which is called when you click it. So let's define that function. I'm going to say w.on_click(f), which is going to assign the f function to the on-click callback. Now if I click this, there you go, it's doing it. Now what does that mean? Well, a callback is simply a callable that you've provided. Remember, a callable is a more general version of a function.
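The idea that any callable can serve as a callback can be sketched without a GUI. This is a minimal illustration, not the lecture's code: ToyButton is a hypothetical class (not part of ipywidgets) that mimics the register-then-call-back pattern. Its on_click stores callables and its click method calls each one with the button, much as ipywidgets' Button.on_click passes the clicked widget to its callback. A plain function, a lambda and an object that defines __call__ all work.

```python
from typing import Callable, List


class ToyButton:
    """Hypothetical stand-in for a GUI button; NOT the ipywidgets API."""

    def __init__(self) -> None:
        self._handlers: List[Callable] = []

    def on_click(self, handler: Callable) -> None:
        # Store any callable so it can be "called back" later.
        self._handlers.append(handler)

    def click(self) -> None:
        # Simulate a user click: call every registered handler with the button.
        for handler in self._handlers:
            handler(self)


class ClickCounter:
    """Any object that defines __call__ is a callable, so it can be a callback."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, button: ToyButton) -> None:
        self.count += 1


log: List[str] = []


def f(button: ToyButton) -> None:
    log.append("hi")


counter = ClickCounter()
btn = ToyButton()
btn.on_click(f)                               # a plain function
btn.on_click(lambda b: log.append("lambda"))  # an anonymous function
btn.on_click(counter)                         # a callable object
btn.click()
btn.click()
print(log, counter.count)  # ['hi', 'lambda', 'hi', 'lambda'] 2
```

Because on_click only relies on being able to call whatever it was given, the button never needs to know whether it holds a function, a lambda or an object.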
So in this case it's a function you've provided that will be called back when something happens, and the thing that's happening here is that someone is clicking a button. So now we're defining and using a callback as a GUI event. Basically everything in ipywidgets works like this: if you want to create your own graphical user interfaces for Jupyter, you can do it with ipywidgets by using these callbacks. These particular kinds of callbacks are called events, but they're just callbacks.
All right, so that's somebody else's callback; let's create our own. Let's say we've got some very slow calculation: it takes a very long time to add up the squares of the numbers 0 to 4, because we sleep for a second after each one. So let's run our slow calculation. Still running... how's it going... come on, finish our calculation... there we go, the answer is 30. For a slow calculation like that, such as training a model, it would be nice to do things like print out the loss from time to time, or show a progress bar, or whatever. Generally, for those kinds of things, we'd like to define a callback that is called at the end of each epoch, or each batch, or every few seconds, or something like that. So here's how we can modify our slow_calculation routine so that you can optionally pass it a callback. All of this code is the same, except we've added one line that says: if there's a callback, call it and pass in where we're up to. Then we can create our callback function. Just like we created the f callback function earlier, let's create a show_progress callback function that tells us how far we've got. Now if we call slow_calculation passing in our callback, you can see it calls this function at the end of each step. So here we've created our own callback. There's nothing special about a callback: it doesn't require its own syntax, and it's not a new concept. It's just an idea, really, which is the idea of
passing in a function that some other function will call at particular times, such as at the end of a step or when you click a button. So that's what we mean by callbacks. We don't have to define the function ahead of time: we could define it at the same time that we call slow_calculation, by using lambda. As we've discussed before, lambda just defines a function without giving it a name, so here's a function that takes one parameter and prints out exactly the same thing as before. It's the same approach, just using a lambda. We could make it more sophisticated now: rather than always saying "Awesome! We've finished epoch" whatever, we could let you pass in an exclamation and print that out, and in this case have our lambda call that function. Another thing we can do, again, is create a function that returns a function. So we could create a make_show_progress function where you pass in the exclamation, and there's no need to give the inner function a name, actually, so let's just return it directly: we return a lambda that calls show_progress with that exclamation. So here we're passing in "Nice", and that's exactly the same as what we've done before. Instead of using a lambda, we can also create an inner function like this, so here again is a function that returns a function, and it does exactly the same thing. So that's one way with a lambda and one way without a lambda. One of the reasons I wanted to show you that is so I can show you that we can do exactly the same thing using partial. With partial, it does exactly the same thing as this kind of make_show_progress: it's going to call show_progress and pass "OK I guess" as the first argument. So this is again an example of a function returning a function: a function that calls show_progress, passing in this as the first parameter, and again it does exactly the same thing. We tend to use partial a lot, so that's certainly something worth spending time
practicing. Now, as we've discussed, Python doesn't care about types in particular, and there's nothing about any of this that requires cb to be a function. It just has to be a callable, and a callable is something you can call. As we've discussed, another way of creating a callable is defining __call__. So here's a class that works exactly the same as our make_show_progress thing, but now as a class: there's an __init__, which stores the exclamation, and a __call__ that prints. Now we're creating an object which is callable and does exactly the same thing. These are all fundamental ideas that I want you to get really comfortable with: __call__, dunder things in general, partials, and classes, because they come up all the time in PyTorch code, in the code we'll be writing, and in fact in pretty much all frameworks. So it's really important to feel comfortable with them. And remember, you don't have to rely only on the resources we're providing: if there are certain things here that are very new to you, google around for some tutorials or ask for help on the forums.
Next I'm going to briefly recap something I've mentioned before, which is *args and **kwargs, because again they come up a lot, and I want to show you how they work. Let's create a function that has *args and **kwargs and nothing else, and have it just print them. Now I'm going to call the function, passing 3, passing 'a', and passing thing1="hello". The first two are passed what we'd call by position: there's no blah= in front of them, they're just stuck there. Things that are passed by position are placed in *args, if you have one. It doesn't have to be called args; you can call it anything you like, as long as it has the star. So you can see here that args is a tuple containing the positionally passed arguments, and kwargs is a dictionary
containing the named arguments. That is all that *args and **kwargs do. And as I say, there's nothing special about these names: I'll call this a, I'll call this b, and it does exactly the same thing. This comes up a lot, so it's important to remember that this is literally all they're doing. On the other hand, let's say we had a function g that takes a few parameters, a, b and c, and just prints them directly. Rather than only using star and double star in a parameter list, we can also use them when calling something. So let's say I create something called args (again, it doesn't actually have to be called args) which contains 1, 2, and something called kwargs that contains a dictionary mapping c to 3. I can then call g(*args, **kwargs), and that takes the 1, 2 and passes them as individual positional arguments, and takes the c: 3 and passes it as a named argument, c=3. And there it is. So these are two linked but different ways to use star and double star.
Now here's a slightly different way of doing callbacks, which I really like. In this case I'm passing in a callback that's not callable; instead, it has a method called before_calc and another called after_calc. So my callback is a class containing a before_calc and an after_calc method, and if I run that, you can see there it goes: it's printing before and after every step by calling before_calc and after_calc. So a callback doesn't actually have to be a callable or a function; a callback could be something that contains methods. We could have a version of this where, as you can see here, both the epoch number and the value it's up to are passed into after_calc, but by using *args and **kwargs I can just safely ignore them if I don't want them. It's just going to chew
them up and not complain. If I didn't have those here, it wouldn't work, see, because it got passed val=, and there's nothing here looking for val=, so it doesn't like that. So one good use of *args and **kwargs is to eat up arguments you don't want. Or we could use the arguments: let's actually use epoch and val and print them out, and there it is. So this is a more sophisticated callback that gives us status as we go. I'll skip this bit, because we don't really care about it.
Okay, so finally, let's review this idea of dunder, which we've mentioned before, just to really nail it home. Anything that looks like this, underscore underscore something underscore underscore, is special. It could be that Python defines that special thing, or PyTorch does, or NumPy does, but they're special, and these are called dunder methods. Some of them are defined as part of the Python data model, and if you go to the Python documentation it'll tell you about these various ones: here's __repr__, which we used earlier, and here's __init__, which we used earlier, so they're all here. PyTorch has some of its own, and NumPy has some of its own. For example, if Python sees +, what it actually does is call __add__. So if we want to create something that's not very good at adding things, because it always adds an extra 0.01, then SloppyAdder(1) + SloppyAdder(2) equals 3.01. So + here is actually calling __add__. If you're not familiar with these, click on this data model link and read about these specific eleven methods, because we'll be using all of them in the course. I'll try to revise them when we can, but I'm generally going to assume that you know them. A particularly interesting one is getattr. We've seen setattr already, and getattr is just the opposite. Take a look at this: here's a class that just contains two attributes, a
and b, set to 1 and 2. I'll create an object of that class, and a.b equals 2, since b is set to 2. Now, when you say a.b, that's basically just syntax sugar: what Python is actually calling behind the scenes is getattr. It calls getattr on the object, so this one here is the same as getattr(a, 'b'). And this can be kind of fun, because you could call getattr(a, ...) with either 'b' or 'a' chosen at random. How's that for crazy? If I run this: 2, 1, 1, 1, 2. As you can see, it's random. So yes, Python is such a dynamic language that you can even set it up so you literally don't know which attribute is going to be accessed. Now, behind the scenes, getattr first does the normal attribute lookup, using the default __getattribute__ from the object base class, and if that fails it calls a method called __getattr__, if you've defined one. So here's something just like A: it's got a and b defined, but I've also got __getattr__ defined. __getattr__ is only called for attributes that haven't been defined, and it's passed the key, the name of the attribute. Generally speaking, if the first character is an underscore, it's going to be private or special, so I'm just going to raise an AttributeError; otherwise I'm going to return "Hello from" followed by the key. So if I go b.a, that's defined, so it gives me 1. If I go b.foo, that's not defined, so it calls __getattr__ and I get "Hello from foo". This gets used a lot in both fastai code and Hugging Face code, often to make it more convenient to access things. So that's how the getattr function and the __getattr__ method work.
Okay, I went over that pretty quickly, since I know for quite a few folks this will be a review, but for folks who haven't seen any of this, it's a lot to cover. So I'm hoping you'll go back over it, revise it slowly, experiment with it, look up some additional resources, and ask on the forum about anything that's not clear. Remember, everybody has parts of the course
that are really easy for them and parts of the course that are completely unfamiliar to them. So if this particular part is completely unfamiliar to you, it's not because it's harder or more difficult; it just happens that this is a bit you're less familiar with, or maybe the calculus in the last lesson was a bit you were less familiar with. There isn't really anything in the course that's particularly more difficult than other parts; it just depends on whether you happen to have that background. So if you spend a few hours studying and practicing, you'll be able to pick these things up. Don't stress if there are things you don't get right away; just take the time, and if you do get lost, please ask, because people are very keen to help. If you've tried asking on the forum, hopefully you've noticed that people are really keen to help.
All right, so I think this has been a pretty successful lesson. We've got to a point where we have a pretty nicely optimised training loop, we understand exactly what data loaders and datasets do, we've got an optimiser, and we've been playing with Hugging Face datasets and have those working really smoothly. So we really feel like we're in a pretty good position to write our generic learner training loop, and then we can start building and experimenting with lots of models. I look forward to seeing you next time to do that together. Okay, bye.
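As a compact recap of the rest of this section, the sketch below exercises *args and **kwargs on both the defining and the calling side, a callback object with before_calc and after_calc methods instead of __call__, a SloppyAdder that overrides __add__, and a class whose __getattr__ answers for undefined attribute names. The names follow the spoken description, but the exact bodies are assumptions: functions return values and callbacks append to a log list, rather than printing, so the results are easy to inspect.

```python
def f(*args, **kwargs):
    # Positional arguments are collected into a tuple, named ones into a dict.
    return args, kwargs

def g(a, b, c=0):
    return a, b, c

def slow_calculation(cb=None):
    # Same sum of squares as before, but cb is an object with two methods.
    res = 0
    for i in range(5):
        if cb:
            cb.before_calc(i)
        res += i * i
        if cb:
            cb.after_calc(i, val=res)
    return res

class PrintStepCallback:
    # *args/**kwargs swallow whatever the caller passes (including val=),
    # so arguments this callback doesn't need never cause a TypeError.
    def __init__(self):
        self.log = []

    def before_calc(self, *args, **kwargs):
        self.log.append("About to start")

    def after_calc(self, *args, **kwargs):
        self.log.append("Done step")

class PrintStatusCallback:
    # Uses the arguments it cares about and still ignores any extras.
    def __init__(self):
        self.log = []

    def before_calc(self, epoch, **kwargs):
        self.log.append(f"About to start: {epoch}")

    def after_calc(self, epoch, val, **kwargs):
        self.log.append(f"After {epoch}: {val}")

class SloppyAdder:
    def __init__(self, o):
        self.o = o

    def __add__(self, b):
        # `x + y` calls x.__add__(y); this one is deliberately off by 0.01.
        return SloppyAdder(self.o + b.o + 0.01)

    def __repr__(self):
        return str(self.o)

class A:
    a, b = 1, 2

class B:
    a, b = 1, 2

    def __getattr__(self, k):
        # Only reached when normal lookup fails, e.g. for b.foo but not b.a.
        if k.startswith("_"):
            raise AttributeError(k)
        return f"Hello from {k}"

args, kwargs = [1, 2], {"c": 3}
print(f(3, "a", thing1="hello"))          # ((3, 'a'), {'thing1': 'hello'})
print(g(*args, **kwargs))                  # (1, 2, 3)
print(SloppyAdder(1) + SloppyAdder(2))     # 3.01
print(getattr(A(), "b"), B().a, B().foo)   # 2 1 Hello from foo
```

Raising AttributeError for underscore names matters: hasattr(B(), "_secret") is False, so code that probes for private or optional attributes gets the normal "not found" answer instead of a string.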