I've followed you up to walkthrough nine, and I can get to number ten. I've never been this close to caught up before, so that's amazing. Thank you. Okay. Yes, I saw you on the leaderboard, Serada, you were 10th in the Paddy competition. That's very cool. The most recent news on the Paddy competition is I did two more entries, and I don't remember, I think I might have shown you one or both of them. But yeah, I ensembled the models that we had, and that improved the submission from 0.9876 to 0.988. And then the other thing I did was, since the ViT models are clearly better than the rest, I doubled their weights, and that got it from 0.9881 to 0.9884. And let's see: Serada, you're down to 11th, so you're going to have to put in another effort, my friend. Yep. Anybody else here on the leaderboard somewhere? I'm Dan, at, I don't know, was it 37 last time I checked? 37, that's not bad. What's your username? I think this is not you... Matt. Matt Rosinski, 45. Keep pushing; you just can't stop for a moment with these things or somebody will jump in ahead of you. I tried. Yeah, the 60s is pretty good. I've had bronzes.

Paperspace: I tried to train again. Oh no, not successful? Still just nothing at login, just an error. And I know I'm still subscribed to the paid version. I'm not sure, maybe they're restructuring something. Well, feel free to share it on the forum if it's an error that we might be able to help you with. I think it's just generic: when you try to set up a machine it just says "error". Yeah, that's annoying. They're quite receptive if you use that support email; I know I had an issue and they got right back to me. Another thing is, if the error is your fault, i.e. if you put something in pre-run.sh that breaks things, then just fire up a PyTorch instance rather than a fastai instance, because that doesn't run pre-run.sh, and so then you can fix it.

And I have to say thank you to Radek for setting things up for the competition to help us get started. What a great guy. He also shared on the forum how to set it all up locally for us. So thank you to him for getting me back onto Kaggle. Yes. Awesome. So now Radek's next job will be to become a Kaggle Notebooks Grandmaster. That's what I'm going to be watching out for; I think he's got what it takes, personally. You've got a gold, right, Radek? A gold medal on Kaggle for notebooks? No, I'm not sure what I have for notebooks; I haven't done that many notebooks, ever. What's your username on Kaggle? Let's find you. radek1. radek1, with the numeral, not written out as a word. Yeah, that's me. Two silvers. Okay. This one actually is on the way to being a gold; it's got so close. You need 50 votes from... regulars? I guess; I don't know what counts as a regular. Oh, that's how it works. So it's not in relative terms? No, it's just 50 votes, full stop. And, you know, I've definitely noticed it makes a big difference to put notebooks in popular competitions, because that's where people are looking. This one got 400 votes, and I'm not sure it's necessarily my best notebook, but it was part of the Paddy competition, which had a lot of people working on it. So that's one trick. Things which are not attached to any competition are much harder to get votes for.
Yeah, I'm getting pretty close to Notebooks Grandmaster, actually, so I'm excited about that. What's yours, something to do with loving science, I'm guessing? Well, the link is actually slightly different: it's "tan" and then l-i-k-e-s-m-a-t-h, tanlikesmath. Oh, math, not science. Okay, let's take a look. Oh, look at you, 74! Very nice. And you need two more golds; you've got these nine silvers. Well then, I'm going to upvote this stuff right now. Let's see. There we go. Yeah, let's channel our enthusiasm into getting Tanishq to Notebooks Grandmaster. That would be cool. Yeah, just have to get those silver ones over the line.

All right, somebody asked about where the gist-uploading thing is, so let me take that up. Actually, what I might do here is connect to my server. The question about the gist uploading was asked in the forum somewhere? Yeah, on the forum, exactly. And you'll see, when I connect to this computer, it's busy doing stuff. Specifically, this is what it looks like when you're busy training a model. So you can see I've got three windows here.

How do you get rid of the dots, always? Oh, those just mean that I've got another tmux session attached from a different computer, which has a smaller screen than this one. And there is surely some way to get rid of them by disconnecting the other sessions. To disconnect other clients, prefix then D gives you the connected clients, and it detaches whichever you select; let's just try that. No, that's not right. Oh, they probably mean capital D. There we go. Right, that's what I just wanted. So if I hit this... there we go. So prefix, shift-D, and then select the one to disconnect. Oh, nice. Okay, learned something new.

Oh, we've got another new face today. Hello, Sophie. I don't think you've joined us before, is that right? I've been here, just quietly in the background sometimes. Okay, thank you for joining. Whereabouts are you joining us from? Brisbane. Oh, good on you. And do you work with AI stuff, or are you just getting started? My background is in psychology; I'm doing a postdoc in psych and sort of trying to move over into data science. Okay, cool. Have you done a lot of the statistical side of psychology? Yeah, quite a bit, and quite a bit of coding in R, but I'm pretty new to Python. Okay, great. So, big learning curve. Well, you know, you're our target market, right? So if you have any questions along the way, please jump in, even things that you feel like everybody else must know. I guarantee not everybody else knows them. Definitely. These have been really helpful and really great so far. Awesome. Thanks for joining.

Okay, so I'm training three models in parallel right now. Yeah, I've got three GPUs in this machine. And one nice thing with Weights and Biases is... well, let me show you. Okay, so here's Weights and Biases, and you can see I don't use my Mac very much, because nothing's logged in. All right. You can see it's running this thing called a sweep; there are going to be 477 runs. I don't know why it says created 31 seconds ago, because that's certainly not true: it's currently running. And it's coming from this git repo. I feel like there's a sweep view, because this is a particular run... oh, this is a particular run, that's right. I'm terrible with their GUI, to be honest. Okay, so let's go to the project. Yes. And then a project has sweeps.
And then, okay, this one here I can kill. Okay. So basically you say, on the Linux side, `wandb sweep` or something like that, and then things are all grouped under this thing. Okay. So then, yeah, it runs lots of copies of your program, feeding in different configurations. And you can run the agent as many times as you like. So I've run it three times, and each time I've set it to a different CUDA device.

Did you turn your models into Python scripts to be able to do this? Yes, exactly. So this is fine_tune.py. It just calls parse_args, which is going to go through and check what batch size, etc., etc., you asked for, and stick them all into args, I think. And then it calls train, passing in those arguments. And so train is going to initialize Weights and Biases for this particular project, for this particular entity, which is fastai, using the configuration that you requested. And so then you can see it's got some particular dataset, some particular batch size and image size, etc. And then it creates a learner for some particular model name and some particular pooling type, fine-tunes it, and at the end logs how much GPU memory it used, what model it was, and how long it took. And you don't have to log much, because the fastai Weights and Biases integration automatically tracks everything in the learner. So you can see here there's all this stuff: the learner's architecture, the learner's loss function, etc., etc.

Out of curiosity, was this process of refactoring into a script painful? Actually, you can probably tell I didn't do it this time; Thomas Capelle did. If I had done it, I would have used fastcore.script instead of this stuff, I guess. But no, it wouldn't have been painful. I would have just chucked an nbdev export on the cells that I had in my notebook, and that would have become my script.

Hi Jeremy. Hi. I have a question: wouldn't it be interesting to track power consumption, for example? I mean, for some people it might be; not for me. As to how you would track power consumption, I have no idea; you'd have to have some kind of sensor connected to your power supply, I guess. They track a lot of system metrics in the runs, though. If you look on a run, they track GPU memory, CPU memory, that kind of stuff. Yeah, if you click on the thing on the left that looks like a CPU chip. Yeah, there's a lot here, so maybe there's power in here. I don't see how it can be right, because... well, unless... there you go: GPU power. So it tells you the GPU power usage, apparently. Although that won't tell you about your CPU's power, etc.

The thing that's useful about this, I think, is the memory graph. Yeah, well, the key thing is the maximum memory use, so we actually track that here, in the script. Yeah, we put it into GPU_mem. Okay, GPU memory, blah blah blah. So Thomas did that as well. I don't know why it's to the power of negative three, though. What's that about? I'm curious; I'll have to ask him what that's doing. Thomas works at Weights and Biases, right? Correct. Yeah, I had never used it before. So, yeah, probably most people have never heard of this, but fastai actually has a thing called fastgpu, which is what I've previously used for doing this kind of thing.
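A minimal sketch of what a script like that can look like. These names, the project, and the dataset are placeholders, not Thomas's actual fine_tune.py; it assumes fastai, timm, and the wandb client are installed:

```python
# Hypothetical sweep-driven training script: the wandb agent fills in
# wandb.config, fastai's WandbCallback logs the learner details
# automatically, and we add peak GPU memory and wall time at the end.
import time
import torch
import wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

def train():
    wandb.init(project="fine-tune-timm", entity="fastai")  # names assumed
    cfg = wandb.config                                      # set by the sweep
    path = untar_data(URLs.PETS)/'images'                   # placeholder dataset
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path),
        label_func=lambda n: n[0].isupper(),   # cat filenames are capitalized
        bs=cfg.batch_size, item_tfms=Resize(cfg.image_size))
    learn = vision_learner(dls, cfg.model_name, metrics=error_rate,
                           cbs=WandbCallback()).to_fp16()
    t0 = time.perf_counter()
    learn.fine_tune(cfg.epochs, cfg.learning_rate)
    wandb.log({"GPU_mem": torch.cuda.max_memory_allocated() / 1e9,  # GB
               "train_time": time.perf_counter() - t0})

if __name__ == "__main__":
    train()
```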
So in general, when you've got more than one GPU, or even if you've just got one GPU and a bunch of things you want to run, it's helpful to have some way to say: okay, here are the things to run, and then set a script off to go and run them all and check the results. fastgpu was the thing I built to do that. The way fastgpu works is that you have a whole directory of scripts, and it runs each script one at a time and moves them into a separate directory to say "this is completed", and it tracks the results. You can do it on as many or as few GPUs as you like, and it just goes ahead and runs them. This is fine, but it's very basic. I'd been planning to make it a bit more sophisticated, and, yeah, Weights and Biases takes it a lot further. I kind of want to redo, or add, something on top of fastgpu so it's fairly compatible with Weights and Biases but you could do everything locally.

So the key thing, the thing it's actually doing with that config file, is it goes through basically the Cartesian product of all the values in this YAML. So it's going to do each of these two datasets, planet and pets, for this one learning rate, 0.008, for every one of these models, and for every one of these poolings, for, okay, this is just the one resize method, and for every one of these experiment numbers. So, yeah, that's a lot of runs to get through at some point.

The sweep allows you to run arbitrary programs; it doesn't have to be a script. So you could just stay in the notebook and use, what's it called, nbclient or whatever. Yeah, exactly. Yeah, it'd be fun to work on this, to make the whole thing run with notebooks and stick stuff in a local SQLite database. Because all this web GUI stuff, honestly, I don't like it at all. The nice thing is it actually doesn't matter, because I don't have to use it: they provide an API. Before I realized they have a nice API, I kept sending Thomas these messages saying "how do I do this, how do I do that, why isn't this working", and he'd have to send me these pages of screenshots: click here, click there, turn this off, then you have to redo this three times. Oh, I hate this. And then he said, we do have an API, and I looked at the API, and it is so well documented, it's got examples. Yeah, it's really nice.

So, I've put all the stuff I'm working on into this git repo. And here's a tip, by the way: if you're in a git repo, the directory you cloned into, the information about your git repo all lives in a file called .git/config. So you can see here, this is the git repo. So if we now go to GitHub... One cool thing about these runs is that each one tracks your git commit, so from a run you can get back to the exact code version. Yeah, that is very cool, isn't it. Yeah, I mean, I do think we could pretty easily create a local-only version of this without all the fancy GUI, you know, which would also have benefits. And people who want the fancy GUI, or to run stuff from multiple sites, things like that, could still use Weights and Biases. Anyway, here's our repo. And this analysis.ipynb is the thing that I showed yesterday, if you want to check it out. I'll put that in the chat.
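To make that Cartesian-product behavior concrete, here's a rough sketch of an equivalent grid sweep defined from Python rather than YAML. The parameter values are illustrative, not the actual config:

```python
import wandb

# A "grid" sweep runs the Cartesian product of every `values` list below:
# 2 datasets x 1 learning rate x 2 models x 2 poolings x 1 resize method
# x 3 experiment numbers = 24 runs.
sweep_config = {
    "method": "grid",
    "program": "fine_tune.py",          # the training script sketched above
    "parameters": {
        "dataset":       {"values": ["planet", "pets"]},
        "learning_rate": {"values": [0.008]},
        "model_name":    {"values": ["convnext_tiny", "vit_small_patch16_224"]},
        "pool":          {"values": ["concat", "avg"]},
        "resize_method": {"values": ["crop"]},
        "experiment":    {"values": [0, 1, 2]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="fine-tune-timm")
# Then start one agent per GPU, e.g. in three tmux panes:
#   CUDA_VISIBLE_DEVICES=0 wandb agent <entity>/fine-tune-timm/<sweep_id>
```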
Oh, by the way, I think something else which would be good: we should start keeping a really good list, for every walkthrough, of all the key resources, key links, key commands, examples we wrote, and stuff like that. So I think to do that, we should turn all of the walkthrough topics into wikis. I don't know if you folks have used wiki topics before, but basically a wiki topic simply means that everybody ends up with an edit button. So if I just click... okay, this one already is a wiki, so everybody should find on walkthrough one that you can click edit. And one thing we'd put in an edit, for example: Daniel often has these really nice full walkthrough summaries, so we should have a link to his reply. Which you can get, by the way, by clicking on this little date here, I think. Yes, and that gives you a link directly to the post, which is handy. What about this one? Okay, make that a wiki. Sorry, this is going to be a little bit boring for you guys to watch, but better that we do it while I'm here. And if anybody else has any questions or comments while I do that...

Yeah. Jeremy, the fastgpu you did: is it possible to extend it to high-performance computing, to use it on a node? In a distributed environment, is it possible to track it as well? I mean, I don't know. Anything that's running in Python on a Linux computer should be fine. Some HPC things use their own weird job-scheduling systems and stuff. And it doesn't even have to be Nvidia, honestly. But as long as it's running in a normal Linux environment, it should be fine. So, pretty general.

Okay, they are now all wikis. And something I did the other day, for example, was in walkthrough four I added something saying, oh, this is the one where we actually had a bug and you need to add cd at the end, you know, and I tried to create a little list of what was covered. For example, maybe Matt's fantastic timestamps we could copy and paste as list items in here. Some of Radek's examples, maybe, or even just a link to them. And for this walkthrough we should certainly include the link to analysis.ipynb.

Anyway, you can see, with the API it was just so easy: you just go api.sweep, the runs come back, each one is basically a dictionary, and we can chuck that list of dictionaries into a DataFrame. Oh, I'm re-running the whole lot, by the way, because it turns out I made a mistake at some point. I thought Thomas had told me that squish was always better than crop for resizing, and he told me I had it exactly backwards: it's actually that crop is always better than squish. So I'm re-running the whole lot. It is annoying, but it shouldn't take too long.

Did you find that analyzing the sweep results like this was useful, relative to what you can see in the UI? Oh, you can do so much better, yes, so much. They've done a good job with that UI; it's very sophisticated and does lots of different stuff, but I just never got to be friends with it. And as soon as I turned it into a DataFrame it was just like: okay, now I can get exactly what I want, straight away. It was an absolute breath of fresh air, frankly. I really like their parallel coordinates chart, though. And I find it very difficult to reproduce that in any visualization library.
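That api.sweep-to-DataFrame move is only a few lines. A minimal sketch, with a placeholder sweep path and placeholder result columns:

```python
import pandas as pd
import wandb

api = wandb.Api()
sweep = api.sweep("fastai/fine-tune-timm/abc123")   # entity/project/sweep_id

# Each run's config and summary are dict-like, so merging them per run
# gives a list of dicts that drops straight into a DataFrame.
df = pd.DataFrame([{**run.config, **dict(run.summary)} for run in sweep.runs])

# From here it's ordinary pandas, e.g. (assuming these columns were logged):
print(df.groupby("model_name").error_rate.mean().sort_values())
```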
I don't like the parallel coordinates chart. But, I mean, there must be a parallel coordinates chart elsewhere? There is; there's like a Plotly one, but it's not that nice. Okay, so I just haven't wanted to bother with it. Can you hover over it and stuff and see what each run is? I think so. Yeah, impressive. And they kind of wrote their own DataFrame language, their own visualization library, in a sense, because those Weights and Biases reports have their own syntax. Isn't there one in Plotly or something? Yeah, there's one in Plotly for sure. Plotly things are normally interactive, so have you tried that? Do you know if it is? Yeah, it works. It's not as nice, but it works. When you hover over it, there's at least a version... this one doesn't. Yeah, that one. It's very fiddly; you might have to drag a box around a region to highlight it. Oh, yeah. Okay, so you just drag over it. That's not terrible. Okay, thanks for telling me about this, it's cool.

But you don't like this that much; it's not that useful for you? I haven't managed to... I mean, I know other people like it, so I don't doubt that it's useful for something; it's just apparently not useful for the things I've tried to use it for yet, somehow. How do you use it? Do you kind of drag over the end bit to see where the lines come from, or something? Yeah. I mean, it might be useful to look at the Weights and Biases one, because I think it renders one by default for the runs. Yeah, it does. Let's check it out, then: wandb.ai, slash... pick a sweeps thing, most likely. Okay. And then, yeah, pick a sweep. That one has zero runs, but maybe that one. Okay, and then... yeah, okay, so here we go.

And then you just hover over a section... See, I don't see how this is helping me. I guess it's saying there's not that much variance in the... well, what is the metric we're trying to optimize? It doesn't really seem like it's even on this chart. You know what, you probably have to tell it what your metric is, and we probably didn't. So the far right-hand thing is resize method, rather than the metric. Yeah. So is there some way to tell it what we care about? Yeah, there's an edit; there's like a little pencil. Let's see. Okay: add the column. Add loss or something; that's the error. Wait, no, let's do accuracy_multi. Okay. Okay, now we're talking. You probably want to get rid of pool and resize method, since they don't have any variance; they're not adding any information. All right. There we go. Now you can hover over... I actually want to do the drag thing. Oh, here we go. Can I do this drag? There we are. Yeah, this is definitely not going to tell me more than the DataFrame did, with this number of experiments. That's true. Anyway, there's a thing. Yeah, sometimes I learn something from that visualization and sometimes I don't; it's not always useful. Okay. So: D to detach.

Do you generally like to do the grid search thing or the Bayesian exploration? I'm very new to all this. But, in general, I don't do Bayesian hyperparameter stuff, ever. And that's kind of funny, because I was actually the one that taught Weights and Biases about the method they use for hyperparameter optimization.
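For reference, the Plotly parallel coordinates chart that came up a moment ago looks roughly like this. A minimal sketch with made-up sweep results standing in for the real DataFrame:

```python
import pandas as pd
import plotly.express as px

# Made-up results; in practice this would be the DataFrame pulled down
# through the wandb API.
df = pd.DataFrame({
    "learning_rate": [0.002, 0.008, 0.008, 0.02],
    "batch_size":    [32, 32, 64, 64],
    "accuracy":      [0.91, 0.95, 0.94, 0.89],
})

# One vertical axis per column, one line per run, colored by the metric
# we actually care about (the piece the default W&B chart was missing).
fig = px.parallel_coordinates(
    df, color="accuracy",
    dimensions=["learning_rate", "batch_size", "accuracy"])
fig.show()
```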
Actually, that's not quite true: I've used it once, and I used it specifically for finding a good set of dropouts for an AWD-LSTM, because there are like five of them. And I told Lukas about how I had created a random forest that tries to predict how accurate a configuration is going to be, and then used that random forest to target better sets of hyperparameters. And, yeah, that's what they ended up using for Weights and Biases, which is really cool.

But I like to use a much more human-driven approach: well, what's the hypothesis I'm trying to test, and how can I test it as fast as possible? Most hyperparameters are independent of most other hyperparameters, so you don't have to do a huge grid search over everything; you can figure things out. So for example, in this case: okay, a learning rate of 0.008 was basically always the best, so let's not try every learning rate for every model for every resize type, etc. Let's just use that learning rate. Same thing for resize method: crop was always better for the few things we tried it on, so we don't have to try every combination.

And also, I feel like I learn a lot more about deep learning when I ask: well, what do I want to know about this thing? Is that thing independent of that other thing, or are they connected? And so in the end, I come away feeling like: okay, I now know that for every model we tried, the optimal learning rate is basically the same; for every model we tried, the optimal resize method is basically the same. I've come away knowing that I don't have to try all these different things every time. And so next time I do another project, I can leverage my knowledge of what I've learned, rather than doing yet another huge hyperparameter sweep, if that makes sense. You are the Bayesian optimizer. Yeah, my brain is the thing that's learning. Exactly.

And I find that people at big companies who spend all their time doing these big hyperparameter optimizations, I always feel in talking to them that they don't seem to know much about the practice of deep learning. They don't seem to know what generally works and what generally doesn't, because they never bother trying to figure out the answers to those questions. Instead they just chuck a huge hyperparameter optimization at, you know, a thousand TPUs. Yeah, it's something I've observed. That's really interesting. I mean, do you feel like these hyperparameters generalize across different architectures, different models? Oh, totally. Yeah, totally. In fact, that was a piece of analysis we did, gosh, I don't know, four or five years ago, along with the fellowship.ai folks and the platform.ai folks. We were just trying as many different sets of hyperparameters across as many different datasets as possible, and the same sets of hyperparameters were the best, or close enough to the best, for everything we tried. That's a little bit scary. Yeah. Even across different architectures? I can somewhat imagine that the dataset maybe isn't that super important, but, you know, between transformers and CNNs? I mean, I'm not questioning this, because I don't have the experience to say that it's not correct. I think it's wonderful. It is. It's amazing.
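The random forest idea, sketched in code. This is just the shape of the technique with made-up hyperparameters and scores, not what Weights and Biases actually ships:

```python
# Fit a random forest on (hyperparameters -> score) pairs from completed
# runs, then use it to rank fresh candidates before spending GPU time.
import random
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

history = pd.DataFrame([                      # made-up completed runs
    {"lr": 0.001, "dropout": 0.1, "batch_size": 32, "accuracy": 0.91},
    {"lr": 0.008, "dropout": 0.3, "batch_size": 64, "accuracy": 0.95},
    {"lr": 0.030, "dropout": 0.5, "batch_size": 64, "accuracy": 0.88},
])

X, y = history.drop(columns="accuracy"), history["accuracy"]
surrogate = RandomForestRegressor(n_estimators=100).fit(X, y)

# Score 1,000 random candidate configs with the surrogate, and only
# actually train the most promising few.
candidates = pd.DataFrame({
    "lr":         [10 ** random.uniform(-3, -1) for _ in range(1000)],
    "dropout":    [random.uniform(0.0, 0.6) for _ in range(1000)],
    "batch_size": [random.choice([32, 64, 128]) for _ in range(1000)],
})
candidates["predicted"] = surrogate.predict(candidates)
print(candidates.sort_values("predicted", ascending=False).head())
```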
So, yeah, the fact that across 90 different models we tested, models that couldn't be more different, they all had basically the same best learning rate, or close enough... The very interesting aspect here is that the learning rate is something you usually dump a lot of time into. When you start working on a project or a competition, you'd be naturally inclined to say: hey, I'm using a different architecture, let me experiment to find the learning rate. But it's nice that you can instead focus on what really matters. This is true of computer vision, though, but not necessarily for tabular. I suspect all computer vision problems do look pretty similar; the data for them looks pretty similar. I suspect it's also true specifically of object recognition. I mean, these are things nobody seems to bother testing, which I find a bit crazy, but we should do similar tests for segmentation and, you know, bounding boxes and so forth. I'm pretty sure we'd find the same thing.

You have the learning rate finder, though. Doesn't that suggest maybe different learning rates are good in different places? Well, I built the learning rate finder before I had done any of this research, right. You might have noticed that I hardly ever use it nowadays in the course. Have we even mentioned it yet in this course? Maybe we did, in the last lesson. Does anybody remember: have we done the learning rate finder yet in the 2022 course? Yeah, I think we did.

Well, one of the things I've really taken away from the course is that you can sit there and play with parameters and schedules all your life and get nowhere, and the fact that you're talking about strategy instead. Which goes back to Renato Copiates' 2002 paper; he had a term called "strategy of analysis", and that's something that really stuck with me. It sort of transcends that idea of just mucking around with parameters. Yep. Exactly.

I suppose these magic parameters are the defaults in fastai? Yeah, pretty much, although with learning rate it's weird: the default is a bit lower than the optimal, just because I didn't want to push it. I'd rather it always worked pretty well than be pretty much the best, you know. Yeah.

Okay, I'm just going to go and disconnect my other computer, because it's connected to port 8888, which is going to mess things up. I'll be back in one tick. Actually, now I think about it, I don't quite know why this is connecting on port 8889. But part of this is learning how to debug problems, right. Normally, the Jupyter server uses port 8888, and I've only got my SSH connection forwarding port 8888, so it's currently not working. The fact that it's using a different port suggests it's already running somewhere. So, to find out where it's running, you can use ps, which lists all the processes running on your computer. And generally speaking, I find I get used to some standard set of options that I nearly always want, and then I forget what they mean. I have no idea what "waux" means; I just know it's the set of options I always use. So that lists all your processes, which obviously is a bit too many. So we want to filter down to the ones that mention Jupyter, and a pipe is how you do that in Linux: it sends the output of one program into the input of another program.
And a program that just prints out the matching lines is called grep. So we can grep for jupyter. Okay, there it is. So I'm kind of wondering where, and how, that's running. I wonder if we've got multiple sessions of tmux running. No, we don't; tmux ls lists all your tmux sessions. Oh, I've got a stopped version in the background. Okay, that's why. So I just have to foreground it. There we go. That was a bit weird. Okay, so now that should work. fg: foreground. Ctrl-Z to put it in the background, fg to bring it to the foreground. And when you Ctrl-Z something, it actually stops it, right. You can put it in the background and have it keep running by... actually, I'll show you. So if I press Ctrl-Z and type jobs, that's "stopped", right. So if I now try to refresh this window, it's going to sit there waiting forever and never finish, because it's stopped in the background. If you type bg, optionally followed by a job number, which would be number one here (it defaults to the last thing you put in the background), it will start it running in the background, even though you stopped it. Yeah. So it's now running in the background. So if I now type jobs: it's now running. Okay. And it's still attached to this console, so if I open this up, you'll see it's still printing things out, right. But I can also do other things. And I don't do this very much, because normally if I want something running at the same time, I'd just chuck it in another tmux pane. But, I don't know, it's kind of nice to know this exists.

Something else to point out: once I said bg, it added this ampersand after the job. That's because if you run something with an ampersand at the end, it always runs in the background. So if you want to fire off six processes to run in parallel, just put an ampersand at the end of each one and they'll all run in the background. I see. So for example, here's a script that runs ls six times. And if I run it, you can see the outputs are all interspersed with each other, because all six ran at the same time. I see. And let's say you create a process like this in the background, without tmux, and you want to kill it: what do you use? You could type fg to foreground it and then press Ctrl-C. Yeah, something like that would be fine. Or you can kill a single job. So in general, you'd probably want to search for "bash job control" to learn how to do these things. And as I mentioned, one of the key things to know is that a job number has a percent at the start, so it's actually `kill %1`; that would be how you do it. Knowing what to Google is definitely the key thing. Although often you can just put in a few examples. So, I'm guessing, if I type "ctrl-c bg fg jobs", which are the things you just learned about... there we go, that gets us pretty close. Now we know they're called job control commands.

All right. Now, when I iterate through notebooks, what I tend to do is: once I've got something vaguely working, I generally duplicate it, and then I try to get something else vaguely working. And once that starts vaguely working, I rename it to the thing that it actually is. So then, from time to time, I just clean up the duplicated versions that I didn't end up using. And I can tell which they are because I haven't renamed them yet. And so this is how you duplicate: you just make a copy? It looks like you're making copies of it.
Yeah, so you can just click File, Make a Copy. Yep. Or in here you can click it and click Duplicate. So what do you do after you duplicate it? I'll open up the duplicate and try something else: some different parameter, a different method, or whatever. So in this case, I started out here in paddy, and I just experimented with show_batch and lr_find and tried to get something running. And then, you know, after that I was like: okay, I've got something working, how do I make it better? And so I created paddy-small. Well, I actually made a copy, which would have been called something like paddy-Copy1.ipynb, and I was like, I wonder about different architectures. Basically I wanted to try different item transforms, different batch transforms, and different architectures. So I created a train function which takes those three things. It creates a set of ImageDataLoaders with those item transforms and those batch transforms, uses a fixed seed to get the same validation set each time, trains with that architecture, and then returns the TTA error rate.

So this is kind of like your Weights and Biases: this is how you keep track of your different experiment ideas? Yeah. So now you can see I've gone through and tried a few different sets of item and batch transforms for this architecture. And these are small architectures, so they run reasonably quickly; these ran in about six minutes or so. And this is very handy: if you go Cell, All Output, Toggle, you can quickly get an overview of what you're doing. And so from that, I got a sense of which things seem to work pretty well for this one. And then I replicated that for a different architecture, and these are very, very different ones, transformer-based versus ConvNeXt-based, to find the things which work pretty well consistently across very different architectures. And then I tried those on other ones: Swin V2 and Swin.

And yeah, then, so let's toggle the results back on. I'm looking at two things. The first is: what's the error rate at the end of training? The other is: what's the TTA error rate? So: squish worked pretty well for both; crop worked pretty well for both. This is all for ConvNeXt. This 640x480 resized to 288x224 didn't work so well. I mean, it's not terrible, but it's definitely worse than the 320x240 one.

Can you talk a little bit about what you're looking for in the TTA versus the final? I mean, the main thing I care about is TTA, because that's what I'm going to end up using. Yeah, that's the main one, but, let's see: in this case, this one's final error isn't really any better or worse than our best ConvNeXt, but the TTA is way better. So that's very encouraging, which is interesting. So this is now for ViT, right. Now, with ViT we can't do the rectangular ones, because ViT has a fixed input size, so the final transformation has to be 224x224. So if you pass an int instead of a tuple, it's going to create square final images. And, you know, on the other hand, this one looks crappy. Right, so you definitely want to use squish for ViT. And then this one looked pretty good; this was using padding. So for ViT, I probably wouldn't use crop.

Last time I looked, TTA was not really a thing that's given to you in other modeling frameworks. Is that still the case? As far as I know, that's still true.
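A minimal sketch of a train function along those lines, assuming fastai with timm installed and a trn_path pointing at the training images; the names here are illustrative rather than the exact notebook code:

```python
from fastai.vision.all import *

trn_path = Path("train_images")   # hypothetical path to the competition images

def train(arch, item, batch, epochs=12):
    "Train `arch` with the given item/batch transforms; return the TTA error."
    dls = ImageDataLoaders.from_folder(
        trn_path, valid_pct=0.2, seed=42,   # fixed seed -> same validation set
        item_tfms=item, batch_tfms=batch)
    learn = vision_learner(dls, arch, metrics=error_rate).to_fp16()
    learn.fine_tune(epochs, 0.01)
    preds, targs = learn.tta()              # averaged augmented predictions
    return learn, error_rate(preds, targs)

# e.g. squish to a square 224 for a ViT, which needs a fixed input size:
learn, err = train("vit_small_patch16_224",
                   item=Resize(480, method="squish"),
                   batch=aug_transforms(size=224, min_scale=0.75))
```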
Yeah, well, one group in particular has been copying everything they can from fastai without credit, so they might have done it. I won't mention their name.

So, Swin V2, Tanishq told me, is apparently what all the cool kids on Kaggle use nowadays. That has a fixed resolution too, and I found that for the larger sizes there was no 224; you had the choice of 192 or 256, and it got so slow I couldn't bear it. But interestingly, even going down to 192, Swin's TTA is actually nearly as good as the best ViT. So I thought that was pretty encouraging. This one, interestingly, like ViT, didn't do nearly as well with crop, and, again like ViT, did pretty well on pad. And then this is Swin V1, which does have a 224. And here the TTA is okay, but the final result's not great, so to me that's not fantastic. This one's again... it's interesting: with crop, none of them are doing well except for ConvNeXt. This one's not great either, right. So Swin V1: a little unimpressive.

So basically, here's what I did next. I picked the ones that looked good, made a duplicate of paddy-small, and just did a search-and-replace of "small" with "large", so we've now got ConvNeXt Large. And the other thing I did differently was I got rid of the fixed random seed; there's no seed=42 here. That means we get a different training set each time, and so these runs are not comparable, which is fine. You'd still see if one of them were totally crap, right, but they're not strictly comparable. The point is that each of these now trains a different architecture with a different resizing method, and I append the TTA predictions to a list: I start off with an empty list and append each model's TTA predictions. And I deleted the cells from the duplicate that hadn't done very well in paddy-small, so you'll see there's no crop any more, just squish and pad, for ViT and for Swin V2. I probably shouldn't have kept both of the Swin V1s, actually; they weren't so good.

And then what I did in the very last Kaggle entry was I took the two ViT ones, because they were the clear best, and I appended them to the list again, so they were in there twice. It's just a slightly clunky way of doing a weighted average, if you like. Then: stack them all together, take the mean of their predictions, take the argmax across the mean to get the predicted classes, and submit in the same way as before. So that was basically my process. It's not particularly thoughtful, you know; it's pretty mechanical, which is what I like about it. In fact, you could probably automate this whole thing.

Somebody was about to say something? I was going to ask: how critical is this model stacking on Kaggle? I'm just curious how you think about it. I mean, we should try it, right? We're kind of out of time, so how about next time we submit just the best ViT on its own and see how it goes. That will give us a sense of how much the ensembling matters. We kind of know ahead of time it's not going to matter hugely. On Kaggle specifically, it definitely matters, because on Kaggle you want to win. But in real life: my small ConvNeXt got 97-point-something percent, call it 98%, and my ensemble got 98.8%.
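The ensembling step itself is tiny. A minimal sketch, with random tensors standing in for each model's TTA test-set predictions:

```python
import torch

# After each training run you'd append that model's TTA predictions on
# the test set, e.g.:  preds, _ = learn.tta(dl=tst_dl); tta_res.append(preds)
# Here, three made-up (n_items, n_classes) probability tensors:
n_items, n_classes = 4, 10
tta_res = [torch.rand(n_items, n_classes).softmax(dim=1) for _ in range(3)]

# "Double-weight" the best model by appending its predictions a second time.
tta_res.append(tta_res[1])

avg  = torch.stack(tta_res).mean(0)   # average probabilities across models
idxs = avg.argmax(dim=1)              # predicted class index per test image
print(idxs)                           # map through dls.vocab for class names
```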
Going from 98% to 98.8%: in terms of error rate, that's nearly halving the error. So I guess that's actually pretty good.

Really important question: how do you keep track of which submissions are tied to which notebook? Oh, I just put in a description to remind me. But, you know, a better approach would actually be to write the notebook name there, which is what I normally do. In this case I wasn't taking it particularly seriously, I guess; I was only planning to do these ones and that was it. So it was basically: do one with a single small model, then do one with an ensemble of small models, and then do one with an ensemble of big models. And then it was after I submitted that, that I thought, oh, I should probably weight the ViTs a bit higher, so I ended up with a fourth one. So it's pretty easy for me: I did four significant submissions, so, easy to track. But now that I know I'm doing a little bit more, because I actually did want to try one more thing, I think what I'll probably do is go back (you can edit these) and put in the notebook name in each one. And then I wouldn't go back and change those notebooks later. Probably never. I would just duplicate them, make changes in the duplicate, and rename it to something sensible. And of course, this all ends up back in GitHub, so I can always see what's going on.

So this is like a lab notebook, in a way? Every "run", so to speak, is a notebook; it's the Weights-and-Biases-ish way of keeping track. Yeah. Exactly. But I mean, the only reason I can do this is because I had already done lots of runs of models to find out which ones to focus on, right. So I didn't have to try a hundred architectures. I mean, in a way, it forces you to really look at things closely, rather than just glancing at a dashboard like this. Right. My view is that with this approach, you will actually become a better deep learning practitioner. And I also believe almost nobody uses this approach, and I feel like there are very few people I come across who are actually good deep learning practitioners; not many people seem to know what works and what doesn't.

So, yeah. All right. Well, that's it, I think. Thanks for joining again, and see you all next time. Bye. Thank you. Thank you. Take care, everybody.