I've got a question. Is there a way that machine learning can actually find the sort of conditional probabilistic segments that exist in, say, heterogeneous data? I'm having trouble parsing that question. Can you give an example or something? Yeah. Okay. All right. Well, I'm modelling road surface friction and road risk, and straight away there's a set of stereotypes in road analysis. We all know them as highways, freeways, urban arterials, and they actually go through a series of stages, almost like states. Each of those states has its own conditional probabilistic relationship between a set of predictors and the response variable, the crash response variable. Is there anything like that in deep learning? So how is that different to a normal predictive model? I mean, all predictive models are conditional probabilities, right? Yeah. Well, take something like XGBoost, for example, and say you want to predict the risk of a given road. It'll give you a value, but you've got no idea what's happened inside the model. We're really interested in that, because once you find the distributions, you can start to do some quality testing on whether they actually follow the domain, or whether the segmentation process that actually determines the predictions is good or not. So rather than just predicting some sort of crash rate or risk, I'm really looking for those probabilistic distributions themselves. So all deep learning models will return a set of probabilities. That's what their final layer returns, and then we decode them by taking the argmax across them. But there's nothing to stop you using those probabilities directly. I'm probably misunderstanding your question, though. It's a little abstract for me to follow.
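To make the point about probabilities concrete, here is a minimal plain-Python sketch (not any particular library's API): softmax turns a model's raw final-layer outputs (logits) into a probability distribution, the usual decoding step takes the argmax, and nothing stops you keeping the whole distribution. The logits and the three risk classes are made-up illustrative values.

```python
import math

def softmax(logits):
    """Convert raw final-layer outputs (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(probs, classes):
    """The usual decoding step: pick the class with the highest probability (argmax)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best]

# Hypothetical logits for one road segment across three made-up risk classes
classes = ["low", "medium", "high"]
probs = softmax([0.2, 1.5, 0.9])
print(decode(probs, classes))           # the single decoded "answer"
print([round(p, 3) for p in probs])     # the full distribution, usable directly
```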
I mean, I know there are lots of things you can do with confidence intervals and whatnot, but it really depends a great deal on the specific details of the application: what you're trying to do and how you're trying to do it. Good question, Daniel. Are you talking about the probability of an incident, or risk related to the road surface? Because you're going to need some sort of tabular data that has the occurrences for each road surface that you're trying to... And why wouldn't XGBoost give you that if you had a predictive model of incidents? In my mind, at least, one of the biggest disadvantages of XGBoost is the fact that it only gives you a single set of variable effects. Whereas in what we're dealing with, we've got, say, some really high-crash roads that have a different conditional probability relationship between the predictors and the response compared to the average road. So XGBoost does an excellent job of making the predictions, but you've got no idea which group of instances it's making each prediction for, or what the actual variable effects are. Okay, so I think I understand your question now, and I think the answer is that it actually does. What I suggest you do, if you haven't already, is read the chapter on tabular modelling in the fastai book. It covers something very similar, random forests, which is another ensemble of decision trees, and it shows you how to get exactly the kind of insights I think you're looking for. All of the techniques there would work equally well for XGBoost, and they also work equally well for deep learning. So maybe after you've done that, you can come back and let us know whether they've helped. Yeah, well, I've played with random forests a bit, and they don't really give me what I'm looking for. I strongly suggest you read the chapter before you decide that. I will. Because I'm pretty sure it will.
And if it doesn't, that would be very interesting to me. I will. In fact, as I mentioned to you last time, I'm really looking forward to trying it on the data. Nice. Cool. Great. I'll show you guys what I've been working on, which has been fun. So the first thing I did after our last call was throw together the most obvious basic steps one would take for a standard image recognition competition, just to show people that that can be quite good. And it was actually a little embarrassing, because I didn't mean to do this: when I submitted it, it turned out I came first on the leaderboard. So now I feel I'm going to have to write down exactly what I did, because during an active competition, anybody who shares what they're doing semi-publicly needs to share it with everybody. So I thought I'd show you what I did here, but I think the scores are about to go up quite a lot. These are interesting images for a couple of reasons. One is that they're kind of like the things you see in ImageNet: they're pictures of natural objects, they're photos. But I don't think ImageNet has any categories about diseases. Its categories are about what the main object in the picture is. So it might have a category for, I don't know, some different kinds of grass, or even some different types of fields or something, but I'm pretty sure it doesn't have anything about different kinds of crop disease. So it's a bit different to ImageNet, which is what most of our pre-trained models are trained on, but it's not that different. It's also interesting because nearly all of the images are the same shape and size, so we can try to take advantage of that.
So when we fine-tune a pre-trained model... let me pull up this Kaggle notebook I just created. I published it yesterday, and it looks at which are the best vision models for fine-tuning. I realised that there are two key dimensions that really seem to affect how well a model can be fine-tuned. One is what I just talked about: how similar your dataset is to the dataset used for the pre-trained model. If it's really similar, like Pets is to ImageNet, then the critical factor is how well fine-tuning maintains the pre-trained weights, because you're probably not going to be changing them very much. And you're probably going to be able to take advantage of really big, accurate models, because they've already learned to do almost exactly what you're trying to do. So that's the Pets dataset. On the other hand, there's a dataset called Planet, which is satellite images. These are not at all like anything ImageNet ever saw: they're taken from above, from much further away, and there's no single main object. So a lot of the weights of a pre-trained model are going to be useless for fine-tuning on this, because they've learned specific features like what text looks like, what eyeballs look like, what fur looks like, none of which are going to be very useful. So that's the first dimension. The second dimension is just how big the dataset is. On a big dataset, you've got the time, the epochs, to take advantage of having lots of parameters in the model and learn to use them effectively. If you don't have much data, you don't have much ability to do that.
So you might imagine that deep learning practitioners already know the answer to what the best models for fine-tuning are. But in fact, we don't. As far as I know, nobody's ever done an analysis of which models are the best for fine-tuning. So that's what I did over the weekend, and really over the last couple of weeks. I did it with Thomas Capelle, who works at Weights & Biases and is another fast.ai community member and alumnus. What we did was try fine-tuning lots of models on two datasets: one which has over 10 times fewer images, and where those images are not at all like ImageNet, that being the Kaggle Planet sample; and one which is a lot like ImageNet and has a lot more images, that being Oxford-IIIT Pets. I figured that if we got some insights from those two, perhaps there'd be something we could leverage more generally. So Thomas wrote this script, which is 86 lines, but really there are only three or four lines that matter, and they'll be lines you recognise: untar_data, one of the ImageDataLoaders.from_ methods, and then vision_learner with the DataLoaders, the model and so on. So it's the normal three or four lines of code we see over and over again. The rest of it basically lets you pass into the script different choices for batch size, epochs and so forth, and that's about it. That's how simple the script we used was. Then, partly because Thomas works at Weights & Biases and partly because Weights & Biases is pretty cool, we used Weights & Biases to feed in different values for each of those parameters.
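For a sense of what such a script can look like, here is a hedged sketch. It is not Thomas's actual script: the option names and defaults are hypothetical, it only handles Pets, and it assumes a recent fastai (where `vision_learner` accepts a timm model name as a string) with timm installed. The fastai import sits inside `train` so the argument parsing can be tried without fastai.

```python
import argparse

def parse_args(argv=None):
    """Options a sweep tool can vary from run to run (names are hypothetical)."""
    p = argparse.ArgumentParser(description="Fine-tune a pretrained vision model on Pets")
    p.add_argument("--model_name", default="resnet34")
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--epochs", type=int, default=5)
    p.add_argument("--learning_rate", type=float, default=2e-3)
    return p.parse_args(argv)

def train(args):
    """The 'three or four lines': get data, build DataLoaders, build a learner, fine-tune."""
    from fastai.vision.all import (untar_data, URLs, get_image_files, ImageDataLoaders,
                                   Resize, vision_learner, error_rate)
    path = untar_data(URLs.PETS) / "images"
    # Pets file names look like "Bengal_12.jpg", so the breed is everything before "_<n>"
    dls = ImageDataLoaders.from_name_re(
        path, get_image_files(path), pat=r"^(.*)_\d+\.jpg$",
        valid_pct=0.2, seed=42, item_tfms=Resize(224), bs=args.batch_size)
    learn = vision_learner(dls, args.model_name, metrics=error_rate)
    learn.fine_tune(args.epochs, args.learning_rate)
    return learn
```

A script built this way would end with `train(parse_args())`, so a sweep tool could launch it as, for example, `python fine_tune.py --model_name convnext_tiny --batch_size 32` (the file name is hypothetical).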
So this is a YAML file that Weights & Biases uses, where you can say: try each of these different learning rates, try each of these different models, try each of these different resize methods, each of these different pooling methods, this distribution of learning rates, whatever, and it goes away and tries them. Then you can use their web GUI to look at the training results. So you basically say "start training", and it trains each of these models on each of these datasets with each of these pooling values and each of these resize methods and a few different selections from this distribution of learning rates, and creates a web GUI that you can dive into. I personally hate web GUIs; I would much rather use Python, but thankfully they also have an API. So once we'd run that script for a few hours, I checked the results into a gist. A gist, if you haven't used one before, is basically just a place to check in text files. I checked my CSV file in here, and as you can see it displays it in a nice way, or you can click to see the raw data. I find that quite a nice place to check in things that I'm going to share publicly. And there's the URL to the gist. Let me show you how I did that. I like everything to be automated so I can always easily redo it, because I always assume my first effort is going to be crap, and it always is. And normally my second and third efforts are crap as well. So here's the little notebook I put together. Each time you do one of these sweeps on Weights & Biases, it generates a new ID, and we ended up doing five different ones as we realised we needed to add different models and change things a little bit. And they have this API that you can use.
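The sweep file itself isn't shown, so here is a sketch of the same idea written as the Python dict that `wandb.sweep` also accepts in place of YAML. The parameter names, values and script name are hypothetical, and they only mean something if the training script reads options with those names. The `random` method is used because, as far as I know, W&B's `grid` method cannot sample from a continuous distribution such as a learning-rate range.

```python
# Hypothetical sweep configuration, following the structure of a W&B sweep config
sweep_config = {
    "program": "fine_tune.py",                   # hypothetical training script
    "method": "random",                          # sample parameter combinations at random
    "metric": {"name": "error_rate", "goal": "minimize"},
    "parameters": {
        "model_name": {"values": ["resnet26d", "convnext_tiny", "vit_small_patch16_224"]},
        "resize_method": {"values": ["crop", "pad", "squish"]},
        "concat_pool": {"values": [True, False]},  # concat pooling vs plain average pooling
        "learning_rate": {                         # sampled log-uniformly between min and max
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 1e-2,
        },
    },
}

def create_sweep(config, project="fine-tune-comparison"):
    """Register the sweep with W&B and return its ID (needs wandb installed and a login)."""
    import wandb  # imported here so the config above can be inspected without wandb
    return wandb.sweep(sweep=config, project=project)
```

Runs are then started with the `wandb agent` command-line tool pointed at the returned sweep ID; with random search, an agent keeps pulling new combinations until it is stopped.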
So you basically go through each of the sweep IDs, ask the API for that sweep, and grab the runs from it. Then for each run, create a dictionary containing the summary and the model name. The details don't matter too much, but hopefully you get the idea. Then turn that into a DataFrame. So I end up with this DataFrame that contains all the different configuration parameters along with the loss, speed, accuracy, GPU, maximum memory usage and so forth. That's basically what I wanted to chuck into a gist. Specifically, I only wanted a subset of the columns; these are the columns I wanted. So I can grab those columns and put them into a CSV. Now, one thing you might not realise is that for most Python libraries, or at least most well-written ones, anywhere you can put a file name you can instead put a file-like object. So normally when you call to_csv you put a file name or a path here, but you could instead put something called a StringIO object, which behaves exactly like a file but actually just stores everything in a string. I don't want this stored in a file; I want it in a string. So if you then call .getvalue(), you get the string. And even things like creating the gist, I want to do automatically. So there's a library I'm very fond of, and I'm very biased because I made it, called ghapi, which is an API for GitHub, where we can do things like create a gist. You give it a description, the text, which is the contents of the CSV, and the file name, and make it public. Then you can get the HTML URL of the gist. So in this case I used a notebook as my interactive REPL, my read-eval-print loop, for manipulating this dataset, putting it together and then uploading it to GitHub. Jeremy, I had a question about this pandas DataFrame.
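A sketch of that pipeline, with two assumptions flagged: the `wandb.Api().sweep(...).runs`, `run.config` and `run.summary._json_dict` attributes are the ones that appear in W&B's public-API export examples (check them against your wandb version), and the sweep paths are placeholders. To stay dependency-free, the CSV step uses the standard-library `csv` module; with pandas, `df[columns].to_csv(buf, index=False)` accepts the same kind of `io.StringIO` object. The fake run at the end only shows the shape of the data.

```python
import csv
import io
from types import SimpleNamespace

def fetch_runs(sweep_paths):
    """Collect the runs from each sweep; paths look like "entity/project/sweep_id"."""
    import wandb  # imported here so the rest works without wandb installed
    api = wandb.Api()
    return [run for path in sweep_paths for run in api.sweep(path).runs]

def rows_from_runs(runs):
    """One flat dict per run: its config (minus W&B-internal keys) plus its final metrics."""
    rows = []
    for run in runs:
        config = {k: v for k, v in run.config.items() if not k.startswith("_")}
        rows.append({**config, **run.summary._json_dict})
    return rows

def to_csv_string(rows, columns):
    """Write only `columns` as CSV into an in-memory string rather than a file."""
    buf = io.StringIO()  # behaves like an open text file
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# A stand-in for a real W&B run, just to show the data flow
fake_run = SimpleNamespace(
    config={"model_name": "convnext_tiny", "_wandb": {}},
    summary=SimpleNamespace(_json_dict={"error_rate": 0.05, "fit_time": 120}),
)
csv_text = to_csv_string(rows_from_runs([fake_run]), ["model_name", "error_rate"])
```

The resulting string is what would then be handed to the gist-creation step as the file contents.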
I looked at your gist, and the dataset column had entries for Planet and for Pets. So how did you populate it? So what's your question? How did I populate this dataset? Yeah, that pandas DataFrame. Yeah, just here. So I passed it a list of dictionaries, and I created the list of dictionaries using a list comprehension. Okay, got it. Each key becomes a column, so all the dictionaries should have roughly the same keys; any that are missing are going to end up as NA. Then I just fiddled around with it slightly. For example, I wanted everything to have an error rate equal to one minus the accuracy, but on the Planet dataset the metric isn't called accuracy, it's accuracy_multi, so I copied accuracy_multi into accuracy. Nothing very exciting. Thank you. Jeremy, what's the actual goal? Let me show you. What we've now got is a CSV, on which I can use pandas' pivot table functionality to group by the dataset and the model family and name, and calculate the minimum of error rate, fit time and GPU memory. I can then take the Pets subset of that, sort by score, where score represents a combination of error and speed, and take the top 15. This now shows me the top 15 best models for fine-tuning on Pets. And this is gold, in my opinion. I don't think anybody's ever done anything like this before; there's never been a list of the best models for fine-tuning. Sorry, I have a question. So you fine-tuned different models on Pets and then collected this information, is that correct? That's correct. And based on the information you collected from the fine-tuning runs, five or whatever number of iterations... We did three runs for each model. Yes. And then you used this information to find out which is the best-behaved model for this specific case?
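Here is a hedged plain-Python sketch of the two steps just described: rows built from dictionaries that don't all share the same keys, then a "pivot table" that groups by dataset, family and model and keeps the minimum of each metric. In pandas this is roughly `pd.DataFrame(rows).pivot_table(index=["dataset", "family", "model_name"], values=["error_rate", "fit_time", "gpu_mem"], aggfunc="min")`; the column names and numbers here are hypothetical.

```python
from collections import defaultdict

def pivot_min(rows, index, values):
    """Group rows by the `index` keys and keep the minimum of each `values` key.

    A missing or None value is skipped, much as pandas skips NaN when aggregating.
    """
    groups = defaultdict(dict)
    for row in rows:
        key = tuple(row.get(k) for k in index)
        for v in values:
            x = row.get(v)
            if x is None:
                continue
            best = groups[key].get(v)
            groups[key][v] = x if best is None else min(best, x)
    return dict(groups)

# Hypothetical results: three runs of one model, one run of another (gpu_mem missing)
rows = [
    {"dataset": "pets", "family": "convnext", "model_name": "convnext_tiny",
     "error_rate": 0.052, "fit_time": 110, "gpu_mem": 5.1},
    {"dataset": "pets", "family": "convnext", "model_name": "convnext_tiny",
     "error_rate": 0.049, "fit_time": 115, "gpu_mem": 5.0},
    {"dataset": "pets", "family": "convnext", "model_name": "convnext_tiny",
     "error_rate": 0.055, "fit_time": 108, "gpu_mem": 5.2},
    {"dataset": "pets", "family": "resnet", "model_name": "resnet26d",
     "error_rate": 0.081, "fit_time": 70},
]
table = pivot_min(rows, ["dataset", "family", "model_name"],
                  ["error_rate", "fit_time", "gpu_mem"])
```

Each minimum is taken independently, so for one model the best error rate and the best fit time may come from different runs; pandas' `aggfunc="min"` behaves the same way.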
Correct, correct, correct. Exactly. And "best" involves two things: which ones have the lowest error rate, and which ones are the fastest. I created this somewhat arbitrary scoring function where I multiplied the error rate by the fit time plus 80, just because that particular value of the constant gave me an ordering I was reasonably comfortable with. But you can look through here and see, okay, ViT base has a much better error rate than ConvNeXt tiny, but it's also much slower, and you can decide where you want to make that trade-off for your needs. So the first thing I did was create this top 15. And it's interesting looking at the family, right? The family is each of these different architectures, and each family comes in a few different sizes, so there's ConvNeXt tiny, ConvNeXt base, ConvNeXt tiny in22k and so forth. So if you want to learn more about architectures, you can get a sense of which ones seem most interesting. For fine-tuning on Pets, it looks like ConvNeXt, ViT, Swin and ResNet are the main ones. The second thing I did was take those most interesting families, plus one called RegNetX, and create a scatter plot of them, coloured by family. So you can see, for example, that ConvNeXt, which I'm rather fond of, is these blue ones, right? And you can see that the very best error rate was actually a ConvNeXt, so they're pretty good. This one here, RegNetX, seems to have some pretty nice values and is super fast. And these tiny Swins seem to be pretty good.
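The scoring function is simple enough to write out. Assuming fit time is measured in seconds (the units aren't stated), adding 80 to the fit time dampens the speed term, so a very fast model can't win on speed alone. A minimal sketch with hypothetical numbers:

```python
def score(error_rate, fit_time, offset=80):
    """Ad hoc ranking score: error rate times (fit time + offset). Lower is better."""
    return error_rate * (fit_time + offset)

def top_n(results, n=15):
    """Return the n results with the lowest score."""
    return sorted(results, key=lambda r: score(r["error_rate"], r["fit_time"]))[:n]

# Hypothetical results: a slower, more accurate model vs a faster, less accurate one
results = [
    {"model_name": "slow_accurate", "error_rate": 0.04, "fit_time": 200},  # 0.04 * 280 = 11.2
    {"model_name": "fast_ok", "error_rate": 0.06, "fit_time": 40},         # 0.06 * 120 = 7.2
]
print([r["model_name"] for r in top_n(results)])
```

Raising the offset makes fit time matter less relative to error rate; as it grows, the ranking approaches a plain sort by error rate.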
So it gives you a sense, depending on how much time you've got to run or how accurate you want to be, of which families are likely to be most useful. The last thing I did for Pets was grab a subset: basically the ones which use less memory than the median and are faster than the median, because those are the ones I care about most of the time, since most of the time I'm going to be training quick iterations. Then I just ordered those by error rate. And ConvNeXt tiny has the best error rate of those that are in the better half on both speed and memory. What's GPU memory in this context? That's the maximum amount of GPU memory that was used. I can't remember what the units are, but they don't matter too much, because it'll be different for your dataset; all that matters is the relative usage. So if you try to use this and it uses too much GPU memory, you can try ResNet50d, for example. It's interesting that ResNet26 is really good for memory and speed. Or if you want something really lightweight on memory, there's RegNetY 004, but as you can see the error rates get much worse once you get out to here. Then I looked at Planet. As I said, Planet is, in one sense, about as different a dataset as you're going to get. So, not surprisingly, its top 15 is also very different. Interestingly, all of the top six are from the same family: the ViT family, which are vision transformer models. What this is basically showing is that these models are particularly good at rapidly identifying features of data types they haven't seen before.
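The "faster and smaller than the median" filter can be sketched like this. The field names and numbers are hypothetical, and "smaller" is taken to mean lower peak GPU memory.

```python
from statistics import median

def fast_and_small(results):
    """Keep models below the median fit time AND the median GPU memory, best error first."""
    t_med = median(r["fit_time"] for r in results)
    m_med = median(r["gpu_mem"] for r in results)
    keep = [r for r in results if r["fit_time"] < t_med and r["gpu_mem"] < m_med]
    return sorted(keep, key=lambda r: r["error_rate"])

# Hypothetical results for five models
results = [
    {"model_name": "a", "error_rate": 0.05, "fit_time": 60, "gpu_mem": 3.0},
    {"model_name": "b", "error_rate": 0.04, "fit_time": 50, "gpu_mem": 2.5},
    {"model_name": "c", "error_rate": 0.03, "fit_time": 300, "gpu_mem": 9.0},
    {"model_name": "d", "error_rate": 0.06, "fit_time": 90, "gpu_mem": 4.0},
    {"model_name": "e", "error_rate": 0.02, "fit_time": 400, "gpu_mem": 12.0},
]
print([r["model_name"] for r in fast_and_small(results)])
```

Note that the two most accurate models here ("c" and "e") drop out because they are slow and memory-hungry, which is exactly the trade-off this view is meant to expose.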
So if you're doing something like medical imaging or satellite imagery, these would probably be a good thing to try. And Swin, by the way, is another transformer-based model, which as you can see is actually the most accurate of all, but it's also the smallest. This is Swin V2. So I thought that was pretty interesting. And these ViT models: there are ones with pretty good error rates that also use very little memory and run very quickly. So I did the same scatter plot for Planet. Perhaps not surprisingly, but interestingly, for Planet these lines don't necessarily go down, which is to say that the really big, slow models don't necessarily have better error rates. And that makes sense, right? If they've got heaps of parameters but they're trying to learn something they've never seen before on very little data, it's unlikely they're going to be able to take advantage of those parameters. So when you're doing stuff that doesn't look much like ImageNet, you might want to be down more towards this end. So here's the ViT, for example, here's that really good Swin model, and there's ConvNeXt tiny. Then we can do the same thing again: take the top half, both in terms of speed and memory use. ConvNeXt tiny still looks good. These ViT models have 224 in their names because you can only run them on images of size 224 by 224; you can't use different sizes. Whereas with the ConvNeXt models, you can use any size. It's also interesting to see that the classic ResNets still do pretty well. So I'm pretty excited about this. It feels like exactly what we need to kick ass on this Paddy Doctor competition, or indeed any kind of computer vision classification task. And I ran this sweep on three consumer RTX GPUs in 12 hours or something. This doesn't require big institutional resources.
And one of the reasons why is that I didn't try every possible level of everything, right? Thomas did a quick learning rate sweep to get a sense of the broad range of learning rates that seemed pretty good. Then we just tried a couple of learning rates, a couple of the best resize methods and a couple of the best pooling types across a few broadly different kinds of models on the two datasets, to see if there were any common features. We found in every single case that the same learning rate, the same resize method and the same pooling type were the best. So we didn't need to try every possible combination of everything. This is where a lot of the stuff you see from Google and so on tends to involve hundreds of thousands of experiments, I guess because they have no need to do things efficiently, right? Yeah, but you don't have to do it the Google way. You can do it the fast.ai way. Quick question, Jeremy: which cards did you use? And another question is why do you keep... Which cards, did you say? Yeah, the GPU cards. Oh, RTX 3090s. Oh, okay. So were all three different? No, they're all RTX 3090s. Okay. And you reset the index after the query, why? Oh, just because otherwise the numeric ID shown here would be the one from the original dataset, and I wanted to be able to quickly say: what's number six? What's number 10? What's number three? That's all. Okay. Jeremy, getting back to the satellite images: when you say classification, what is it trying to classify? In this case, the Planet competition, we have some examples. Basically, they try to classify, for each area of the satellite imagery, what it's a picture of: is it forest or farmland or town or whatever, and what weather conditions are observed, if I remember correctly. Question: in this image space, is it just these two major datasets?
Or how do you find other models that are trained on something besides Planet and ImageNet? You mean besides Planet and Pets? Sorry, yeah. So what was your question? How do you find other pre-trained models that have been trained on different datasets? These are all using models pre-trained on ImageNet. So how do you find models pre-trained on other things? Mainly you don't; there aren't many. But you know, just Google; it depends what you're interested in, and academic papers. There was a Model Zoo, which I've never had much success with, to be honest. It has a range of pre-trained models that you can download, but as I say, I haven't found it particularly successful. You could also try Papers with Code. They have a link to the paper and the code, though that doesn't necessarily mean there's a pre-trained model; you can just click on the code and see. And of course, for NLP models there's the Hugging Face model hub, which we've seen before. That is an easy answer for NLP: lots of different pre-trained models are on that hub. Jeremy, since you touched on academic papers and Papers with Code, first question: do you or Thomas intend to publish this comparison? If you were to do that, what kind of journal would you go for? Yeah, so I'm not a good person to ask that question, because I very rarely publish anything, which is partly a philosophical thing. I find academia overly exclusive, I don't love PDFs as a publication format, and I don't love the writing style that's required if you're going to get published, which tends to be rather difficult to follow.
I have published a couple of papers, but only really one significant deep learning one. That was because a guy named Sebastian Ruder was doing his PhD at the time, and he said it would be really helpful to him if we could co-publish something and that he would take the lead on writing the paper. So that was good, because I'm always very happy to help students, and he did a good job and was a terrific researcher to work with. The other main time I've written a paper was when I wanted to get the message out about masks. I felt it probably wasn't going to be taken seriously unless it was in an academic paper, because medical people are very into exclusive things. So I don't know. I suspect this kind of thing would be quite hard to publish, because most deep learning academic venues are very focused on work with reasonably strong theoretical pieces. This kind of field of trying things and seeing what works, experiment-based work, is certainly a very important part of science in other areas, but in the deep learning world it hasn't really been recognised as a valid source of research yet, as far as I can tell. I concur; I feel the same quandary in my domain, to be honest. Fair enough. Fair enough. What's your domain? Hydrology, but more the computational science part of it. Okay. So then, and this was roughly at the same time, I went back to Paddy, because I wanted to try out a few of these interesting-looking models reasonably quickly.
So I took our standard... well, in this case three lines of code, because I'd already untarred the data earlier. I took those three lines of code so I could basically say train, pass in an architecture, pass in some per-item preprocessing, in this case resizing everything to the same square using squish, and some per-batch preprocessing, which in this case is fastai's standard data augmentation transforms targeting a final size of 224, which is what most models tend to be trained at. Then it trains a model using those parameters. And finally, it uses test time augmentation. I think we briefly mentioned test time augmentation last time. In this case, on the validation set, I run the fine-tuned model four times, using random data augmentation each time, and then one more time with no data augmentation at all, and basically take an average of all five predictions. That gives me some predictions, and then I calculate an error rate for the test time augmentation predictions. So that basically spits out a number, which is an error rate for Paddy. And I use a fixed random seed when picking out my validation set, so each time I run this it's going to be with the same validation set, which means I can compare runs. So I've run a few different ConvNeXt small models. First of all, squishing when I resize, and then cropping when I resize. So that was 0.0235, and this is also 0.0235. Then, instead of resizing to a square, I resized to a rectangle. In theory this wouldn't have been necessary; I thought the images were all 480 by 640. But when I ran this, I got an error, and when I looked back at the results of that parallel image-size check we ran, I realised there were actually three or four images with the opposite aspect ratio. So that's why. For the vast majority of the images, this resizing does nothing at all.
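Two pieces of that `train` function are easy to sketch without any library: a validation split that is reproducible because it uses a fixed seed, and test time augmentation as described (four augmented passes plus one plain pass, averaged). This is a toy illustration; the "model" and "augmentation" are hypothetical stand-ins. In fastai, `Learner.tta` does the real thing; its default of four augmented passes matches the description, but check its `beta` parameter for exactly how the unaugmented pass is weighted, since it is not necessarily a plain five-way mean.

```python
import random

def split_indices(n_items, valid_pct=0.2, seed=42):
    """Reproducible train/validation split: the same seed always gives the same split."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(n_items * valid_pct)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)

def tta_predict(model, x, augment, n_aug=4, rng=None):
    """Average class probabilities over n_aug augmented passes plus one plain pass."""
    rng = rng or random.Random()
    preds = [model(augment(x, rng)) for _ in range(n_aug)]
    preds.append(model(x))  # one more pass with no augmentation
    return [sum(p[c] for p in preds) / len(preds) for c in range(len(preds[0]))]

def error_rate(prob_rows, targets):
    """Fraction of items whose highest-probability class differs from the target."""
    wrong = sum(max(range(len(p)), key=p.__getitem__) != t
                for p, t in zip(prob_rows, targets))
    return wrong / len(targets)

# Toy stand-ins: a "model" that thresholds a number, and a jitter "augmentation"
def toy_model(x):
    return [0.7, 0.3] if x < 0.5 else [0.2, 0.8]

def jitter(x, rng):
    return x + rng.uniform(-0.1, 0.1)

train_idx, valid_idx = split_indices(10)
preds = [tta_predict(toy_model, x, jitter, rng=random.Random(0)) for x in (0.1, 0.9)]
print(error_rate(preds, [0, 1]))
```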
But there are three or four that have the opposite aspect ratio. Then for the augmentation, I pick a size based on 224 with a similar aspect ratio. What I'm actually aiming for here is something that is a multiple of 32 on both edges. We'll get into the reason for that later, when we learn how convolutional networks really work, but basically it turns out that a typical convnet downsamples its input by a factor of 32, so each cell of the final feature map corresponds to a 32 by 32 pixel patch. So you normally want both of your sides to be multiples of 32. This one got a pretty similar result again, 0.0240. Then, I wasn't sure about my contention that they need to be multiples of 32. I thought maybe it's better to do a really crisp resize by scaling by an exact fraction of the original size. So I tried that as well, and that, as I suspected, was a bit worse. And oh, what's this? I've got some... which ones are the right way around? Now I'm confused. Let's check some of these. Originally I had my aspect ratio backwards; that's why I've got both. It looks like I never got around to removing the ones that were unnecessary. Oops, wrong button. Leave those off. Set the method to pad, with a zeros pad mode: padding with black around the images makes it a bit easier to see what's going on. There we go. Okay, yeah, so you can clearly see this is the wrong way around, right? I tried to make them wide, but actually they were tall. So the right way around is actually 640 by 480. That's more like it. So 640 by 480 is best. Let's get rid of the ones that were the wrong way around. Okay. All right. So that was all various different transforms and preprocessing for ConvNeXt small. Then I did the same thing for one of the ViTs, ViT small. Now ViT, remember I mentioned, can only work on 224 by 224 images, so these rectangular approaches aren't going to be possible. So I've just got the squish and the crop versions.
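Here is a small sketch of the "similar aspect ratio, both sides a multiple of 32" rule. Sizes are assumed to be (height, width); the 224 base and rounding to the nearest multiple of 32 follow the description above, and the helper name is hypothetical. The base of 224 is itself 7 times 32, so only the longer side needs rounding. For the tall 640 by 480 Paddy images it gives 288 by 224, the size that comes up later in the session.

```python
def aug_size(orig_hw, short_side=224, multiple=32):
    """(height, width) with roughly the original aspect ratio, `short_side` on the
    shorter edge, and the longer edge rounded to the nearest multiple of `multiple`."""
    h, w = orig_hw
    long_side = short_side * max(h, w) / min(h, w)
    long_side = round(long_side / multiple) * multiple
    return (long_side, short_side) if h >= w else (short_side, long_side)

size = aug_size((640, 480))                   # tall images: 640 high, 480 wide
feature_map = tuple(s // 32 for s in size)    # cells left after downsampling by 32
print(size, feature_map)                      # (288, 224) (9, 7)
```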
The crop version doesn't look very good. The squish version looks pretty good. I also tried a pad version, which looks pretty good too. And then I also tried Swin. So here's Swin V2. This one is slow and memory intensive, so I had to go down to the 192 pixel version, but actually it seems to work very well. This is the first time we've had one that's better than 0.02. It is interesting. This one's also very good. So it's interesting that this slow, memory-intensive model works better even at the smaller 192 pixel size. And then there's one more Swin, which I was able to run at 224, so I included that; it had okay results. So I did that for all these different small models, and as you can see, they run pretty quickly, right? Five or 10 minutes. Then I picked out the ones that look pretty fast and pretty accurate, and created a copy of the notebook, which I called Paddy Large, and this time I just replaced small with large. And actually, I've made a mistake. I'm going to have to rerun this, because there should not be a seed=42: I actually want to run this on a different subset each time. And the reason why is my plan is to train... So basically what I did is delete the ones that were less good in Paddy Small, and now I'm just running the large ones. Some of these, particularly something like this one, which is 288 by 224, ran out of memory; they were too big for my graphics card. A lot of people at this point say, oh, I need to go and buy a more expensive graphics card. But that's not true. You don't. If you remember our training loop, we get the gradients, we subtract the gradients times the learning rate from the weights, and then we zero the gradients. What you could do is halve the batch size, for example from 64 to 32, and then only zero the gradients every two iterations.
And only do the update every two iterations. So basically you can calculate in two batches what you used to calculate in one batch, and it will be mathematically identical. That's called gradient accumulation. So for the ones which ran out of memory, I added this little accum equals True, which is here in my function. And I said, if accum equals True, then set the batch size to 32, because by default it's 64, and add this thing called a callback. Callbacks are basically things that change the behavior of the training, and there's one called GradientAccumulation, which does gradient accumulation. And this is just for people who are interested: this is not massively complex stuff. The entire GradientAccumulation callback is that many lines of code, right? These are not big things. All it does is keep count of how many iterations it's been, and as long as we haven't reached the number of accumulations we want, we skip the step and the zeroing of the gradients. So things like gradient accumulation sound like big complex things, but they turn out not to be, at least when you have a nice code base like fastai's. Jeremy, can I ask a question about this? Of course. How exactly do the mini-batch mechanics work? We will get into that in detail in the course, and certainly we get into it in detail in the book. But basically all that happens is we randomly shuffle the dataset, and if the batch size is 64, we grab the next 64 images. We resize them all to be the same size, and we stack them on top of each other. So if they're black and white images, for example, we would have 64 images of 640 by 480, and so we end up with a 64 by 640 by 480 rank-3 tensor.
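The mini-batch mechanics and gradient accumulation described above can be sketched with NumPy. This is a toy illustration, not fastai's DataLoader or its GradientAccumulation callback: `make_batches`, `mse_grad`, `step_full`, and `step_accumulated` are hypothetical names, and the accumulation part assumes a linear model with a mean-squared-error loss so the gradients can be written by hand. Each sub-batch gradient is weighted by its share of the full batch before being summed, which is what makes the accumulated update equal to the full-batch one.

```python
import numpy as np


def make_batches(images, batch_size, rng):
    """Shuffle the dataset, then yield mini-batches stacked along a new leading axis."""
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Same-sized (H, W) images stack into one (batch, H, W) array
        yield np.stack([images[i] for i in idx])


def mse_grad(w, xb, yb):
    """Gradient of mean squared error for the linear model xb @ w."""
    err = xb @ w - yb
    return 2 * xb.T @ err / len(xb)


def step_full(w, xb, yb, lr):
    """One SGD update computed from the whole batch at once."""
    return w - lr * mse_grad(w, xb, yb)


def step_accumulated(w, xb, yb, lr, n_acc):
    """The same update, computed from n_acc smaller sub-batches.

    The weights are not touched until every sub-batch has contributed, which
    mirrors skipping the optimizer step and the gradient zeroing in a training loop.
    """
    grad = np.zeros_like(w)
    for xs, ys in zip(np.array_split(xb, n_acc), np.array_split(yb, n_acc)):
        # Weight each sub-batch's mean gradient by its share of the full batch
        grad += mse_grad(w, xs, ys) * len(xs) / len(xb)
    return w - lr * grad


rng = np.random.default_rng(0)
images = [np.zeros((640, 480), dtype=np.uint8) for _ in range(64)]
first = next(make_batches(images, 64, rng))
print(first.shape)  # (64, 640, 480)

x, y, w0 = rng.normal(size=(64, 3)), rng.normal(size=64), np.zeros(3)
print(np.allclose(step_full(w0, x, y, 0.1), step_accumulated(w0, x, y, 0.1, n_acc=2)))  # True
```

In a PyTorch-style loop the same idea is usually written by dividing the loss by the number of accumulation steps, calling backward on every sub-batch, and calling the optimizer's step and zero_grad only once every n_acc sub-batches.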
And pretty much all the functionality provided by PyTorch will work fine for a mini-batch of things, just as it would for a single thing on its own. So in the grand scheme of things, you know, in this whole process we're trying to characterise, what role does the batch play? Well, it's just about trying to get the most out of your GPU. Your GPU can do 10,000 things at once, so if you just give it one image at a time, you can't make use of it. If you give it 64 things, it can do one thing on each image, and then on each channel in that image, and then there are a few other degrees of parallelisation it can exploit. That's what we saw with the nvidia-smi dmon command, which shows you the utilisation of your streaming multiprocessors (SMs). If you use a batch size of one, you'll see that SM utilisation will be like 1% or 2%, and everything will be very slow. It's a bit tricky at inference time, you know, in production or whatever, because most of the time you only get one thing to do at a time. So often inference is done on CPU rather than GPU, because we don't get to benefit from batching. Or, you know, people will queue a few of them up and stick them on the GPU at once, and stuff like that. But yeah, for training, it's pretty easy to take advantage of mini-batches. No worries. Jeremy, you've trained so many models. Would you consider using a majority vote or something like that? No, I wouldn't, because a majority vote throws away information: it throws away the probabilities. I pretty much always find I get better results by averaging the probabilities. So after I've trained each of the models, I'm exporting it to a uniquely named file, whose name is the name of the architecture, then an underscore, and then some description, which is just the thing I pass in.
And that way, when I'm done training, I can just have a little loop which opens each of those models up, grabs the TTA predictions, sticks them into a list, and then at the end averages those TTA predictions across the models, and that will be my ensembled prediction. So that's my next step; I'm not up to that yet. Okay. All right. Well, I think that's it. So that's really more of a little update on what I've been doing over my weekend, but hopefully it gives you some ideas for things to try, and hopefully you find the Kaggle notebook useful. So Jeremy, how many hours did you spend on all these experiments? Because you've done a lot of experiments here. So, you know, it's like a week or two of work to do the fine-tuning experiments, but that was like a few hours here and a few hours there. The final sweep was probably maybe six hours on three GPUs. The Paddy competition stuff was maybe four hours a day over the last four days since I last saw you guys, and writing the notebook was maybe another four hours. Thanks, that helps. All right. Bye, everybody. Nice to see you all. Bye. Thank you, Jeremy. Bye.
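The reasoning for averaging probabilities across models rather than taking a majority vote can be seen in a small NumPy sketch. The three prediction arrays below are made-up numbers standing in for each model's TTA predictions, not results from the notebook, and `ensemble_average` and `ensemble_majority` are hypothetical names. On the first sample, two models barely prefer class 0 while the third is very confident in class 1: the vote picks class 0, but the averaged probabilities pick class 1.

```python
import numpy as np


def ensemble_average(preds):
    """Average per-class probabilities across models; preds is a list of (n_samples, n_classes) arrays."""
    return np.stack(preds).mean(axis=0)


def ensemble_majority(preds):
    """Majority vote: each model contributes only its argmax class, so confidence is discarded."""
    votes = np.stack([p.argmax(axis=1) for p in preds])  # (n_models, n_samples)
    n_classes = preds[0].shape[1]
    # Count votes per class for every sample: result is (n_samples, n_classes)
    counts = (votes[..., None] == np.arange(n_classes)).sum(axis=0)
    return counts.argmax(axis=1)  # ties go to the lowest class index


# Made-up predictions from three hypothetical models on two samples
preds = [
    np.array([[0.51, 0.49], [0.9, 0.1]]),
    np.array([[0.51, 0.49], [0.8, 0.2]]),
    np.array([[0.01, 0.99], [0.7, 0.3]]),
]
print(ensemble_majority(preds))                # [0 0]
print(ensemble_average(preds).argmax(axis=1))  # [1 0]
```

Averaging keeps each model's confidence, so one very sure model can outweigh two that are nearly undecided; the vote treats a 0.51 and a 0.99 as the same single vote.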