Yeah, so I guess I should share my screen and say what I feel like I've learned from this sweep script. Okay, so on Pets, this is the top 15, after running a few more sweeps to see if there were some better options. What's interesting is that in the top 15 on Pets you have a bit of everything. You've got ResNet-RS and ResNetV2 distilled. You've got the Vision Transformer ones: ViT, Swin, MobileViT. Classic ResNet26d is still in there, actually the fastest of all, which maybe reflects that it's the most well optimized, because it's been around a while and NVIDIA has probably worked hard on it. RegNet too. It's kind of interesting that there are all these very different approaches, but they all end up in there. One interesting thing when you look at the graph is these green ones, which are ViT; they cut off here at a fit time of about 150. I think that's because the larger vision transformers only work on larger images, and I only did 224 pixel images. So if I included larger images, we might see ViT at the very best in terms of error rate. I was pleasantly surprised to see that some of the ViTs didn't actually use much memory, and they were pretty fast. And it's interesting, because the vision transformers paper came out quite a while ago, and the way I remember it, it was basically the first and most obvious thing people thought of when it came to making transformers work on vision. The fact that it still works so well suggests people haven't improved on it much, other than perhaps Swin. So those are my takeaways. On the small and fast models, I hadn't really looked much at ResNet-RS before, so it's interesting to see that one right up there.
And then on Planet, it doesn't look that different, except it really is just ViT, Swin and ConvNeXt. In fact, the whole top 15 is entirely ViT, Swin and ConvNeXt. So I guess those are some of the things I noticed.

That's cool. You were able to try so many different kinds of models.

Yeah, all run in less than 24 hours on my three GPUs. Something like 1,200 runs. I thought this was interesting; look at this one, ViT small patch32. Memory usage is amazing, speed is the fastest, and it's third from the top of the smaller, faster ones.

I have a question. Can you hear me?

Sure.

You mentioned small and large. Does large mean the size of the image?

Sometimes. When I was talking about ViT, yes it does. I was specifically just doing 224 by 224 pixel images, and pretty much all the transformer-based models are fixed; they can only handle one size. So there are two meanings of "larger" here. There's larger as in bigger models: more layers, wider layers. And I think all the ViT models which are larger capacity, slower and probably more accurate only run on literally bigger images. It doesn't have to be that way; that's just what they happen to do. So that's why ViT only goes this far. There are bigger ViT models that should be more accurate, but they don't work on 224 by 224 pixel images.

Is there a good threshold for knowing when it's a good time to use large versus small, or is it all experimental?

Larger images? You basically do everything on the smallest images you can, as long as you get reasonable results, because you want to iterate quickly. And then when you're finished and you want the best accuracy you can get, as in the case of Kaggle...
...you try bigger and bigger images to see what you can get away with, and keep doing that as long as the accuracy improves. In a production environment it's kind of similar, but you make them bigger and bigger until the latency of your model is too high, and you find the right trade-off between model latency and accuracy.

Generally speaking, large images will give you more accurate results, but it's a bit slower?

Correct. A lot slower, actually, because if you increase one side from 224 to 360, you're going from 224 squared pixels to 360 squared. So you end up with a lot more pixels.

So for example, an application like object detection on live video: even if the source is a larger size, it would still be good to use small images because it's faster?

Certainly while you're iterating. There's no need to have a really accurate model while you're iterating, because you're trying to find out what data pre-processing works best, or what architecture works best, or whatever. So yeah, there's no point using large models and large images, generally, as long as they're big enough to get the job done reasonably well.

Okay, thank you very much.

No worries.

If you were going to do this in a business context, let's say, and someone said, "Hey Jeremy, I want a vision model," would you just pick a reasonable one and go with that? And if the results were good, just use it?

I would do exactly what I'm doing here, which is to try a few things: small, fast models on small images on a small subset of the data, to find out what data pre-processing and what architecture to use. And then I would look at the constraints in terms of operationalizing it. How much RAM do we have? How much latency can we get away with?
How expensive is it going to be to scale it up to the point at which we're getting acceptable results using acceptable resources? So it wouldn't look very different at all to a Kaggle competition in terms of the modeling. But then there'd be a whole piece of analysis around user requirements and costs and so forth.

I see.

Jeremy, I tried doing what you're doing, going from smaller to larger models, and mine somehow started out with much lower accuracy. Is it just a fluke? I had several issues happen.

That means you pressed the wrong buttons somehow. I think I've already shared my notebooks; if I haven't, I'll certainly share them today. So what I suggest you do is start from mine, make sure you can rerun them, then look at yours, see how they're different, and figure out where you went wrong. But also, I always tell people when debugging to look at the inputs and the outputs. What predictions are you making? Are you always predicting zero, for example? Did you run the LR finder to find what learning rate works well? Stuff like that.

Thank you.

No worries.

Jeremy, a question on your workflow from last week. You mentioned learn.export, and then later we can load it back. I found that when we save to the models folder we can give it whatever name we want, but when we load, it's actually looking for the .pth suffix at the end. So I'm not sure...

Yeah, so just make sure you save it with a .pth suffix. It certainly would make sense for it to add that automatically, you're right.

But in the documentation it says it's saved as pickle. The format is pickle?
Yes, but this is just PyTorch. PyTorch uses a variant of the pickle format, and it normally uses .pth as its extension. So yes, it is pickle, and it does use the .pth extension.

Okay.

Jeremy, when you were opening this window, you typed something into the URL bar. What was it?

Oh, I just typed my port number, because I know the only thing running on 8888 is this. No magic going on. Let me just shut these down.

I did have one more idea about this competition, which is that there was that CSV file, right? train.csv. And it has this variety column. I looked at df.variety: there are 10,000 rows, and 7,000 of them are one variety, but there are 3,000 rows that contain other varieties. So the only idea I had for this was something a bit counterintuitive, but those of you that did the 2017 or 2018 fast.ai course might remember it. Sometimes, if there are two different things to predict, in this case what kind of rice it is and what kind of disease it has, trying to get your model to predict both of those things makes it better at both. So if we tried to get our model to predict both what kind of disease it is and what kind of rice it is, it might actually get better at predicting the disease. That might sound counterintuitive; I find it counterintuitive, because it sounds like the model has more work to do. But you're also giving it more signal; there are more things you're teaching it to look for. And so maybe if it knows how to recognize different types of rice, it can use that information to also recognize how different kinds of rice are impacted by different diseases. I have no idea if that's going to be useful or not, but I thought it would be an interesting exercise to try. So that's what I thought we might have a go at today, if that sounds of interest.
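The check on the variety column can be sketched like this. This is a minimal, made-up stand-in for train.csv; the column names follow the competition but the rows here are hypothetical, and the real file has about 10,000 rows.

```python
import pandas as pd

# Hypothetical miniature of the competition's train.csv: one row per image,
# with both the disease label and the rice variety.
df = pd.DataFrame({
    "image_id": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "label":    ["blast", "blast", "normal", "tungro", "blast"],
    "variety":  ["ADT45", "ADT45", "ADT45", "KarnatakaPonni", "ADT45"],
})

# value_counts shows how skewed the varieties are; in the real file
# roughly 7,000 of 10,000 rows are a single variety.
counts = df.variety.value_counts()
print(counts)
```

On the real file you'd call `pd.read_csv("train.csv")` instead of constructing the frame by hand.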
It's also, frankly, a good exercise in delving into models in a way we've never done before. This is going to be much more sophisticated than anything we've done with deep learning so far, which means it's very much up to you folks to stop me any time something is slightly confusing, because I actually want everybody to understand this. It's a really good test of how well you understand what's going on inside a neural network; if you're not understanding it, that's a sign I haven't explained it well.

So let's have a look. One thing I did yesterday afternoon was train a model three times to see what the error rate was, because I wanted to get a sense of how much variation there is. I found that if I use a learning rate of 0.02 and just train for three epochs, I seem to pretty consistently get reasonable results. So here's something I can now do in two minutes to see how I'm going. This is one thing I really like doing. People are often very into reproducible training, where they set the seed and run the same thing every time. I think that's normally a bad idea, because I actually want to see what the natural variation is. If I make a change, I want to know whether the difference I see in the result might just be due to natural variation, or whether it's actually significant. So that's why I did this.

What if the natural variation is really large? That's going to be tough.

Yeah, that's going to be tough to see whether you improved things. But if the natural variation is so large that improvements are invisible, then trying to improve it seems pointless, right? Because it sounds like you haven't really found a way to stably train something. Normally, that happens because my learning rate is too big.
So if you try this yourself and bump the learning rate up to 0.04, you'll see, at least for me, I got something like 5%, 6%, 5.5%. It's all over the place. Training for more epochs at a lower learning rate will generally give you more stable results, but there's a compromise, because doing more epochs is slow. That's why I was trying to find a learning rate and number of epochs that's both fast and stable. You could also try using a smaller subset of the data. And sometimes, in the end, things will just be slow; such is life. But most of the time I find I can get a compromise, and I certainly did here, I think. With six epochs at half the learning rate, I can certainly do better; I can get to 4% rather than 5%. But that's okay, I just want something for testing.

One thing that was always counterintuitive to me, that I think you talk about, is that these improvements you make at the small scale show up at the larger scale. Like, always.

Oh yeah, absolutely. They pretty much always will, because they're the same models, just with more layers or wider activations. If you find some pre-processing step that works well on a ConvNeXt Tiny, it's going to work well on a ConvNeXt Large 99.9% of the time. Most people act as if that's not true, I find, particularly in academia.

I feel like you have to do a full sweep of everything.

Yeah, which most people just never think to try. But intuitively, of course it's the same. Why wouldn't it be? It's the same thing, just scaled up a bit. They behave very similarly.

I mean, it's hard to argue with you, because it works.

You can argue that it's not intuitive; that's fine. But I feel like the only reason it would be unintuitive is that everybody's told you for years that it doesn't work that way.
Do you know what I mean? If nobody had told you that, I think you'd say, yeah, of course it works that way.

That's fair.

Okay, so let's do something crazy. Let's actually look at a model. Inside our learner there are basically two main things: the data loaders, learn.dls, and the model, learn.model. We've seen these before, and if you've forgotten, go back and have a look at the older videos from the course. The model itself has things in it. In this case, the first thing in it is called a TimmBody, and the TimmBody has things in it, and the first thing in that is called model. And TimmBody.model has things in it: the first is called the stem, the next is called the stages, and so forth. So you can see it's this kind of tree, and we actually want to go all the way to the bottom. At the very top level there are two things: there's the TimmBody, and there's a thing here which doesn't actually have a name, but we always call it the head. The body is the bit that does all the hard work of looking at the pixels and trying to find features and so on; that's something we call a convolutional neural network. At the very end, it spits out a whole bunch of information about those pixels, and the head is the thing that tries to make sense of that and make predictions about what we're looking at. And as you can see, the head is pretty simple, whereas the body, which goes from here all the way to here, is not so simple. And we want to predict two things: what kind of rice it is, and what disease it has. Now look at the very last layer. It's a linear layer. A linear layer, if you remember, is just something that does a matrix product.
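The body/head tree can be sketched with a toy model. This is not the real architecture; the real body is a timm CNN wrapped in fastai's TimmBody, and the sizes here are made up purely to show how indexing walks the tree, the way learn.model[1] picks out the head.

```python
import torch
import torch.nn as nn

# Toy stand-in for the body/head split that fastai's vision_learner builds.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),                            # "stem"
    nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU()),  # "stages"
)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),        # last layer: features in, 10 class scores out
)
model = nn.Sequential(body, head)

# Indexing walks the tree: model[1] is the head, model[1][-1] its last layer.
last = model[1][-1]
print(type(last).__name__)                 # Linear
out = model(torch.randn(2, 3, 16, 16))
print(out.shape)                           # torch.Size([2, 10])
```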
And the matrix here takes as input 512 features and spits out 10 features, so it's a 512 by 10 matrix. So let's do a few things. Let's grab the head: the head is the index-one thing in the model. So there's our head.

Quick question. I've seen these model x-rays, or whatever you want to call them, a lot. Is there a way, that maybe I don't know about, to see the shape of the data as it flows through the model?

Yeah, absolutely. There it is.

Oh, I didn't even know about this.

Dude, you should try watching some fastai lectures. So this will tell you how many parameters there are, and the shape as it goes through. The key thing is, since we're predicting 10 probabilities, one for each of the 10 possible diseases, we end up with a shape of 64 by 10. The 64 is because we're using a batch size of 64, and for each image we're predicting 10 probabilities.

It's very thorough. It even shows the callbacks. Wow, I didn't remember this.

Yeah, we don't muck around here at fast.ai. So, in the head, let's create something called the last layer, which is going to be the very end of the head. Our last layer is this linear thing, and we can actually see the parameters themselves. A lot of these things are generated lazily, so when you see it saying "generator object", it literally is lazy: it's too lazy to bother calculating what it is until you force it to. If you turn it into a list, that forces it to generate. So it's a list of one thing, which is not surprising. There it is.
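The lazy-generator point can be shown directly on a standalone linear layer like the head's final one (sketched here outside the learner, so the sizes are just the ones from the session):

```python
import torch.nn as nn

# The head's final linear layer: 512 features in, 10 class scores out.
last_layer = nn.Linear(512, 10)

# .parameters() is a lazy generator; wrapping it in list() forces evaluation.
params = list(last_layer.parameters())
weight, bias = params      # a Linear holds exactly two parameter tensors

# PyTorch stores the weight transposed: 10 x 512, not 512 x 10.
print(weight.shape)        # torch.Size([10, 512])
print(bias.shape)          # torch.Size([10])
```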
And so the last layer's parameter is a matrix which is, there we go, 10 by 512. It's transposed relative to what I said, but that's okay: we're getting 512 inputs, and when we multiply by this matrix we end up with 10 outputs. Oh, my daughter needs me, sorry about that; home schooling transitions always require some input.

All right. So if we got rid of this, then our last linear layer here would be the one taking in 1,536 features and spitting out 512 features. So what we could do is delete this layer, and instead take those 512 features and create two linear layers: one with 10 outputs, as before, and one with however many varieties there are. Which...

Oh, Jeremy. I was just contemplating: back in that linear layer, the output was 10 by 512?

Not the output; that's the matrix. The output is 64 by 10.

Sorry, yes. So when you want to mix diseases with rice in the output, I was wondering... I don't know how many rice types there are.

There are 10.

Okay. So might that 10 become a 10 by 10 matrix output?

No, two sets of 10. You want one set of probabilities for what type of rice it is, and one for what disease it has.

Okay, yeah.

So let's do the easy thing first, which is to delete the layer we don't want. This says Sequential. Sequential means PyTorch is literally going to go through and calculate this, take the output and pass it to the next thing, take that output and pass it on, and so forth. So if we delete the last layer, that's no problem: it just won't ever call it. I can't quite remember if we can do this with Sequential, but let's assume it works like normal Python: we should be able to do del h[-1]. Yep, we can. So it's got normal Python list semantics.
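The deletion step can be sketched on a stand-in head. The layer sizes match the ones discussed in the session, but the structure here is simplified; the real fastai head has pooling, batch norm and dropout layers in between.

```python
import torch
import torch.nn as nn

# Stand-in for the head: ... -> Linear(1536, 512) -> final Linear(512, 10).
head = nn.Sequential(
    nn.Linear(1536, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# nn.Sequential supports normal Python list semantics, so del just works.
del head[-1]

out = head(torch.randn(64, 1536))
print(out.shape)    # torch.Size([64, 512]): the 10-way layer is gone
```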
So this model will now be returning 512 outputs. So we basically want to wrap it in a model which instead has two linear layers. There are a couple of ways we can do this, but let's do it the most step-by-step way first. We're going to create a class. In PyTorch, modules are classes, so we're going to create a class which includes this model. Let's call this class DiseaseAndTypeClassifier. PyTorch calls all the things it uses as layers in a neural net "modules", so this is going to be a module. Now, if you haven't done any OO programming in Python before, it would be very helpful to read a tutorial on basic Python OO programming, because PyTorch assumes you're pretty familiar with it; I'm going to work on the assumption that you've done some. There are a lot of weird things in Python, and one is that the constructor is called dunder init, __init__, where "dunder" means double underscores on each side. And it's always passed the object being constructed as its first argument, so we'll give that a name. And then we're basically going to create two linear layers. One easy way to create the correct kind of layer would be self.l1 = ...; we could do that.

One question: I understand this subclassing thing, but is there some other way, where you could just push two additional layers onto the existing thing? Or does that not make sense?

Yeah, we could try that. Let's see if we get this one working, and then we'll try it the other way. How about that? And then we could also try using fastai's create_head function as well. We'll see how we go. So here's linear layer number one; as you can see, I literally just copied and pasted it. It's inside the nn submodule, so I just had to add that. But the representation of it is nice and convenient in that I can just copy and paste it.
In real life we wouldn't normally write in_features and out_features by name; everybody knows the first two arguments are the in and out features, so I might just make it look more normal. Then the second layer, and maybe we'll just give ourselves a note here: we'll use this one for rice type and this one for disease. So at this point, once we create this, these things are going to be in it. And then we also need to wrap the actual model, so we'll just call that m and store it away: self.m = m. So what happens is, modules act exactly like functions; in Python terms, they're "callable". But the way PyTorch sets it up is that when you call this function, which is actually a module, it will always call a specially named method in your class, and the name of that is forward. So you have to create something called forward, and it will be passed the current set of features, which I always call x; I think most people call it x, if I remember correctly. So this is going to contain a 64 by 512 tensor. Actually, no it's not: it's going to contain the input tensor, because this is going to be our model. We need to create the 64 by 512 tensor from it by calling the model, like so. In fact, what we often do is go x = ..., because we're kind of making it like a sequential model. Oh, you know, another idea: something else we could try is making this whole thing a sequential model. Let's do that next. What I'm doing here is probably the least easy way, the most manual way. So first of all, call the original model, and then basically create two separate outputs: the rice type output and the disease type output. And then we return both of them.
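The class described above can be sketched like this. The names and sizes follow the session (512 features in, 10 diseases, 10 varieties), but the trunk here is a tiny hypothetical stand-in rather than the real truncated timm model:

```python
import torch
import torch.nn as nn

class DiseaseAndTypeClassifier(nn.Module):
    """Wraps a trunk that outputs 512 features and adds two heads."""
    def __init__(self, m):
        super().__init__()             # construct the superclass first
        self.l1 = nn.Linear(512, 10)   # rice variety predictions
        self.l2 = nn.Linear(512, 10)   # disease predictions
        self.m = m                     # the original (truncated) model

    def forward(self, x):
        x = self.m(x)                  # e.g. 64 x 512 features from the trunk
        return self.l1(x), self.l2(x)  # tuple: (variety preds, disease preds)

# A minimal trunk standing in for the truncated fastai model:
trunk = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 512))
model = DiseaseAndTypeClassifier(trunk)
rice_preds, disease_preds = model(torch.randn(64, 3, 8, 8))
print(rice_preds.shape, disease_preds.shape)   # both torch.Size([64, 10])
```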
So what I would then do is create the new model, DiseaseAndTypeClassifier, like this, and we need to pass in the existing model, which is this thing here. Oh, yes, and you always have to call the superclass's __init__ to construct the object before you do anything else. There's a lot of annoying boilerplate in Python, I'm afraid. Okay, there we go.

I just wanted to point out how cool it is that you modified the model just by deleting its last layer. That is so cool. I didn't know this existed, and I had to go look at the PyTorch code, because how can you delete stuff like that? It's not a list, and yet it has all the functionality to support this.

Yeah, it's nice. I generally find I can work on the assumption that PyTorch classes are well designed, because it turns out they generally are. To me, a well-designed collection class would have the exact same behavior as Python's; for example, fastcore's L collection class has the exact same behavior as Python's. So PyTorch is very nicely made, I find.

So that thing where you deleted the layer, that's a PyTorch thing, not a fastai thing?

Yeah, it's a PyTorch thing. This is just part of the Sequential class.

Okay, nice, that's cool. The way I would have done it is to explode the model into layers and then reconstruct it using Sequential without the layers I don't want. But you can actually do this. This is so nice.

Yeah, exactly. So let's create a new learner, which will just be a copy of the last one, and then set its model to our new model. So we've now got a learner that contains our new model. I guess at this point we should be able to get some predictions, right?

Wait... oh yeah. The main thing I'm waiting for is the loss function.

Yeah, yeah.
We're going to get there. Let's do this first.

So you're doing the predictions just to verify the plumbing is working?

Exactly. Okay. And it's not. Stack trace: input type... oh, right, floating point 16. I think to simplify things we'll remove the .to_fp16() and worry about that later.

Is that some kind of mixed precision?

That is exactly mixed precision, yes. So let's just pretend that doesn't exist for a moment and come back to it. What on earth just happened? All right, let's go back even simpler.

It's useful if you talk through what's going through your mind when you see this error.

Yeah, I want to create a minimum reproducible example. So let's just create a learner, copy it, and not change it at all. Huh, I can't even do that. All right, so then let's not even copy it; let's just call the original directly. Okay, that works. So doing a copy apparently doesn't work, though generally speaking I'd be inclined to change copy to deepcopy at this point.

Wait, but you still got the stack trace in the end.

Yeah. Why are we still getting half precision somewhere? That's curious, isn't it? Oh, it's probably because our data loaders got changed somehow; let's recreate the data loaders as well. It nearly made it, didn't it? It looked like it was working, and then at the very end... there we go. How are we getting half precision? What on earth is making it half precision? That's odd.

Do you think resetting your kernel might help?

I think so, yeah. I don't see how it would, but there's never any harm, right? The fp16 state might be one of the culprits somehow. Yep, that's exactly what happened. Okay, well, that shouldn't happen, so that's not a great sign.
I don't know exactly what happened; something had some state that was keeping things in half precision, which shouldn't be happening. At some point we can try to figure out what that is, but not now. Okay, so let's make a copy of the learner and chuck this model into it. Actually, before we do, we'll just use the copy directly; we'll make as few changes each time as possible. Okay, that worked.

What are you looking for in this?

I'm just trying to see why it's returning two things. I thought that was a decoded thing, so I was wondering why it's being returned, since it says with_decoded=False here. Oh, those are the targets; that's why. So it actually returns preds comma targets. Okay. All right, so now it's looking quite nice. And so I'd now be inclined to create a really minimal model, which I'm going to call DummyClassifier, and all it does is call the original model. Let's see if that works, because if this works, then we're at a point where we can try out new models.

Interesting. I would have just gone straight back to the full model and tried that next, but you're slowly walking towards it.

Yeah, I probably should have done it more slowly in the first place, but I got over-enthusiastic. Great. Oopsie-daisy. We could do this inside our model, I guess. This is all pretty hacky, but we're just trying to get something working. So the head is the index-one thing in the model, the last layer is the end of the head, and we don't need that, so we delete that last thing. We might as well inline that; keep it simple. So we delete it from the head and store the head away. Then we create our learner and create our class.
This time we call it the disease-and-type classifier, and set the model to that. Cool. So we're now at the point where it's trying to calculate the loss, and it has no way to do that. I'm slightly surprised it's trying to calculate the loss at all, since with_loss is False, but that's fine. So, to remind you, the loss function is the thing that produces a number saying how good this model is. The loss function we were using was designed for a model that only returned a single tensor, and we're returning a tuple of tensors; that's why, when it tries to call the loss function, it gets confused, which is fair enough. The loss function is another thing stored inside the learner. There it is. So what's the best way to do this? One option is to look at the source code for vision_learner and see how that creates the loss function. It just passes it to Learner; let's look at that. Okay, so it's trying to get it from the training dataset. The training dataset knows what loss function to use, which is pretty nifty. So to start with, let's create a really simple loss function: the disease-and-type classifier loss. We're going to be passed predictions and actuals, sometimes called the targets. And what we could do for now is say the current loss function is whatever loss function we had before, and just try to get it working on the disease prediction, which is this bit here. The predictions will be a tuple, so this will be rice predictions and disease predictions; that's what's in our preds. And just to start with, let's keep it so that it works on the disease predictions only.
So we'll just return whatever the current loss function was, called on the disease predictions. Okay, so now we set learn2.loss_func to that function we just created.

That's interesting. Sorry, when you set the loss function like that, doesn't the learner... I guess you need to create the learner again. Oh, never mind, sorry.

Okay. Jeremy, could you go back up to your loss function? Don't you actually want to pass it the disease target?

No, sorry: I'm just going to ignore the rice type prediction for now, and just get our new structure working so it continues to do exactly what it did before.

Do we have to split the targets as well?

No, because at the moment our targets don't include anything other than the diseases. We're going to have to change our data loading to include the rice type as well, but we haven't done that yet.

Okay, yes. So then we've got metrics. Metrics are the things that just get printed out as you go, and we don't yet have a metric that works on this. A very easy way to fix that is just to remove the metrics for now. Great. Now preds.shape shouldn't work. Good, it doesn't, because now we've got two sets of predictions: a tuple, because the predictions are just whatever the model creates, and the model is creating two things, not one. So we've now got rice predictions and disease predictions. That's actually pretty good progress, I think. But for those of you who are involved in fastai development, it's pretty clear to me, in trying to do this, that it's far harder than it should be; it feels like something that should be easy to do.

I used to see people using the patch decorator and just adding some little thing on top of it.

Yeah.
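The interim loss described above can be sketched as follows. Cross-entropy stands in for "whatever loss function we had before" (which is what fastai would pick for a single categorical target), and the tensors here are random stand-ins for real batches:

```python
import torch
import torch.nn.functional as F

# The model now returns a tuple (rice_preds, disease_preds), but the targets
# are still disease-only, so apply the original loss to the disease half and
# ignore the rice half for now.
def dtc_loss(preds, targs):
    rice_preds, disease_preds = preds
    return F.cross_entropy(disease_preds, targs)

torch.manual_seed(0)
preds = (torch.randn(64, 10), torch.randn(64, 10))   # (rice, disease) logits
targs = torch.randint(0, 10, (64,))                  # disease labels only
loss = dtc_loss(preds, targs)
print(float(loss))    # a positive scalar; random logits give roughly ln(10)
```

In the session this then gets assigned with `learn2.loss_func = dtc_loss`.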
It's not so much about patching. I feel like there might even be some multi-loss thing; if there's not, this is something we should add to fastai to make it easier.

Can you explain a little bit about why the loss is stored in the data loader? How is that a good thing?

Yeah, sure. Generally speaking, what's the appropriate loss function to use, or at least a reasonable default? It depends on what kind of data you have. If your target is a single continuous output, you probably have a regression problem, so you probably want mean squared error. If it's a single categorical variable, you probably want cross-entropy loss. If it's a multi-label categorical variable, you probably want binary cross-entropy, the log loss without the softmax, and so forth. So by having it come from the dataset, you get sensible defaults that ought to work for that dataset.

I see.

That's why most of the time we don't have to specify what loss function to use, unless we're doing something non-standard. All right, so we're about to wrap up. The last thing I might do is put the metric back, which we can do exactly the same way; I'll call it dtc_error. So we just return the error rate on the disease predictions, and set learn2.metrics = dtc_error. Cool. So I guess we should now be able to do things like learn2.lr_find, for example. And we should be able to just replicate our disease model at this point, because we're not doing anything with this extra rice type thing yet. And fine_tune for one epoch at 0.01. While I wait for that, let's see if I search for "fastai multiple loss function" or something. 2018 is going to be too long ago. Nothing there. Multitask learning...

I've got to go, but thanks a lot.

No worries. Okay, so it looks like this person did something pretty similar: they created their own little multi-task loss wrapper.
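The disease-only metric, and the obvious next step once variety targets exist, can be sketched like this. The multi-task part is an assumption: it supposes the targets will become a (rice, disease) pair, which hadn't been wired into the data loading yet in the session.

```python
import torch
import torch.nn.functional as F

# Error rate on the disease half of the tuple (the session's dtc_error):
def dtc_error(preds, targs):
    _, disease_preds = preds
    return (disease_preds.argmax(dim=1) != targs).float().mean()

# Hypothetical combined loss for when both targets are available:
# the simplest multi-task wrapper just sums the two cross-entropies.
def multitask_loss(preds, targs):
    rice_preds, disease_preds = preds
    rice_targ, disease_targ = targs
    return (F.cross_entropy(rice_preds, rice_targ)
            + F.cross_entropy(disease_preds, disease_targ))

# Disease logits that peak at classes 0..3, matching the targets exactly:
disease = 5 * torch.eye(10)[:4]
err = dtc_error((torch.zeros(4, 10), disease), torch.tensor([0, 1, 2, 3]))
print(err.item())    # 0.0: every argmax matches its target
```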
All right, cool. I think we're at a good place to stop. We've got it back to where it's not totally broken, so that's good. Next time we'll try to plug this stuff in. Anybody have any questions before we wrap up?

Just a quick question, Jeremy. It says the valley is 0.001, but you used 0.01 for the fine-tune?

Yeah, I'm not sure. I can see it's picked out something pretty early in the curve; I thought something down here seemed more reasonable, just by eyeballing it. It tends to recommend rather conservative values. So I tend to look for the bit that's as far to the right as possible but still has a pretty steep gradient. I guess that's my rule of thumb.

Cool, thank you.

All right. See you again. Thanks.