Okay, all right. So this is the repo: fastai Paperspace setup. I've started a machine. I'll cd to my home directory, git clone the repo, cd into the thing I git cloned, and run ./setup.sh. Okay, and it says install complete, please start a new instance. So then I'll stop the machine, and then I'll start a machine. And that's going to install a pre-run.sh script, which is going to set up all these things, and it's going to install a .bash.local script, which will set up our path. It also sets up aliases for installing software: pipi for pip install and mambai for mamba install. So we now have a machine running, and we should now be able to create a terminal. Just press terminal. Something's happening. Great. Okay, try creating a terminal here then. Okay, much better. All right. So in theory, if we look at our home directory — oh, look at that, all this stuff is now symlinked to /storage. So I should be able to pipi fastai and get the latest version. I wonder if I can add a -U to say upgrade. Yes, I can. So that's how I get the latest version. And so that should have installed it locally. There it is. Okay, so now if I create a notebook: import fastai, fastai.__version__. Oh look, that's a good start. Okay, next question: can we install binaries? For example, universal-ctags: mambai universal-ctags. There we go. So you see, the nice thing about this is that all this persistent stuff we're installing works on the free Paperspace tier as well. So we should now be able to check ctags. Tada, it works. And which ctags is it? That is actually in our storage. So I think we've done it. What do you guys think — is that simple enough? It's good. All right, good.
Okay, so the next step is I thought we might try to fix — I don't know if we're fixing a bug, or maybe, being generous, we could call it adding an enhancement — to fastai, which is to add normalization to timm models. So let's grab fastai. Now, when I git clone this — so let's go to /notebooks. /notebooks is persistent on a particular machine. And I think this will not work because I'm using SSH. Oh, it's already there. That's interesting. Oh, you know what, there's a bug in our script, which is that I didn't popd. So let's fix that in pre-run.sh: I did a pushd at the start, but no popd at the end. All right, no worries. That means, okay, yes, we're actually in here. No worries. All right, so let's restart this, and then I'll tell you about the bug we're fixing. So normalization is where we subtract the mean and divide by the standard deviation of each channel, for vision. That's a transform called Normalize. And we need to use the same standard deviation and mean that were used when the model was pretrained. Because, you know, some people will normalize so everything's between zero and one; some normalize so it's got a mean of zero and a standard deviation of one. So we need to make sure we subtract and divide by the same things they did. If you look at vision_learner, it has a normalize parameter, and if it's true, then it will attempt to add the correct normalization here. So if it's not a pretrained model, it doesn't do anything, because it doesn't know what to normalize by. Otherwise it's going to try and get the correct statistics from the model's metadata. So the model's metadata is here, model_meta, and it's just a list of models with metadata, and the metadata here is the stats: imagenet_stats.
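To make that concrete, here's a minimal sketch of what per-channel normalization does. The mean and std values are the standard published ImageNet statistics that fastai's imagenet_stats holds; the helper function is just illustrative.

```python
# The standard ImageNet per-channel statistics (RGB order).
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

def normalize_channel(values, mean, std):
    "Subtract the channel mean, divide by the channel std."
    return [(v - mean) / std for v in values]
```

So a pixel value equal to the channel mean maps to 0, and one a full standard deviation above the mean maps to 1 — which is why using different stats than the pretraining run shifts the meaning of the inputs.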
So imagenet_stats is the mean and standard deviation of ImageNet — which, I can't quite remember where that comes from, but it's something we import from somewhere. So none of these are timm models, and that means that currently timm models aren't normalized. Now, timm has its own stats. Not this... not this... There's a lot of stuff in timm I still haven't looked into. I actually haven't used this transforms factory. Maybe in fastai 3 we'll consider using more of this functionality from timm. There's like a configuration for them. Oh, I guess we can just try and find it. Oh, actually, we forgot to edit this. Let me start the machine. Here we go. Okay. So we can just do this locally now. All right. So this happens in vision_learner. And timm is optional — you don't have to use it. But if you do, then we have a create_timm_model, which you don't normally call yourself. Normally you just call vision_learner and pass in an architecture as a string, and if it's a string, it will create a timm model for you. So there's best models — for example, let's say convnext or something like that. I don't know what this convnext is, never tried that one; let's do the tiny one. So we can create a model using timm.create_model, passing in a string. And I have a feeling that's — yeah, that's got a config. Here we are. Yeah, see, it's got a mean and a standard deviation. So models = timm.list_models() — maybe we'll just do pretrained ones. So I wonder if they all have this: for m in models, let's create a model and look at m.default_cfg's mean and standard deviation. Yeah, so you can see a lot of them use 0.5, and then some of them use the ImageNet stats. I'm guessing those are the only two options. So, okay, so hopefully you get the idea. Just to know: usually the mean should be zero and the standard deviation should be one, right? I mean, not necessarily. Sometimes people make the minimum zero and the maximum one.
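That survey can be sketched like this. A default_cfg dict carrying 'mean' and 'std' keys is real timm behaviour, but the helper name is mine, and the loop at the bottom is only illustrative.

```python
def norm_stats_from_cfg(cfg):
    "Pull (mean, std) out of a timm default_cfg-style dict, or None if absent."
    mean, std = cfg.get('mean'), cfg.get('std')
    return None if mean is None or std is None else (mean, std)

# Illustrative survey over timm's pretrained models (assumes timm is installed):
# import timm
# for m in timm.list_models(pretrained=True):
#     cfg = timm.create_model(m, pretrained=False).default_cfg
#     print(m, norm_stats_from_cfg(cfg))
```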
But what we need to do is use the same stats it was pretrained with, because we want our range to be the same as the range it was pretrained with; otherwise our data has a different meaning. So let's go to _add_norm. Okay, so here's _add_norm, and it's being passed the meta's stats, so this only works for non-timm models. So how about we put this here, and we'll create an else — or I guess really an elif — and here, for timm, if normalize, we could have a _timm_norm. We can refactor out some duplicate code later, but basically for timm — we don't need to pass in the architecture, we can just pass in the model. And — to protect against, like, the future ability to pass in other types of strings that aren't timm — do you think there's any benefit in having a default normalization function you could pass through, so you can actually do your own normalization? No, because my answer to all of those questions is always: you ain't gonna need it. So I very intentionally don't deal with things that may happen in the future. It'd be simpler just to create your own vision_learner, because it looks like there's not much going on in there that you'd have to duplicate if you wanted to support a different kind of model. Yeah, exactly. I mean, this is just a small little wrapper, really — you can call create_timm_model or create_vision_model, you can call Learner, you can call create_head. Yep. Okay, so we'll call that m. So Normalize takes a mean and a standard deviation, so it should be just those two things, I guess, like so. All right. Okay: _timm_norm, using the model and pretrained. Okay, I see I already had an else there, so do that. There we go. Okay, so let's test this out. Yeah. So what happens when you add a transform? It adds a transform to each data loader in it. Okay. So what does that do? What did I do wrong? Oh, I see, it's part of...
Okay, that's a bit confusing. Right. Okay — sometimes it's just easiest to look at the code. I see, so it's just calling add for this particular event, and we're adding it to the after_batch event. So we should find there's an after_batch event. Here we are. I see, and there's our transforms. So if we call vision_learner, that should change our data loader. Yep — and it's now got Normalize, using the ImageNet stats. And if we now try it with a string version... what happened differently? Oh, I see, we need to recreate the data loaders for this test, so that it doesn't have Normalize anymore. And that gives us — okay, that gives us an error. And that's because it says we're passing a Sequential object. Okay, that makes sense, because create_timm_model actually — yeah, it modifies things, that's why. It creates a Sequential model because it's got the head and the body in it. So we need to change how we do this. This is TimmBody; here is the model. And, oh look, here we use default_cfg to get stuff. Interesting. So TimmBody is called from here. I guess it would be nice to know how timm does this exactly — where does that default_cfg come from? So when we call timm.create_model... set_layer_config... I wonder if we should take a look at default_cfg. It's probably going to be a lot. Here's data/config.py. So where does it get set? Maybe build_model_with_cfg will help us. Well, it seems like this part is being restructured — not surprising; it wasn't originally built expecting to be doing stuff with timm. create_vision_model calls create_body, and create_body here is where it creates the model. So maybe we should change how these work. So let's think about doing some redesign, maybe. And the idea of the redesign, I guess, would be that this doesn't instantiate the model, but assumes it's already instantiated. So we would remove that.
Okay, so that's now not going to work, of course. So then we're creating the body with a model. Okay. And so then we have to instantiate that, so we might as well just do that directly, right? We make this a function, so it's a new one each time. Okay. So in this refactoring, we're now passing around models, not architectures. create_head won't change; all the model_meta stuff doesn't change. Okay, so this changes: so now we say model = arch(pretrained=pretrained), and pass in the model. Okay, looks hopeful. So we're going to do the same thing for timm — you're going to pass in a model. Okay, so it's going to be the same here. Let's check vision_learner still works. Okay, it does. So maybe we should keep moving this back further and further. So to make timm work, we do that. And this is kind of like the body — or maybe we'll just call that the timm model. Our problem with that is the keyword arguments. This gets a bit crazy: there's a lot of keyword arguments when we create a model, and the ones we don't know about we pass on to timm. So I think actually what we'll do is we'll do it up here. Okay. And so TimmBody doesn't need kwargs anymore. And what we might do is we'll say this is the result, and we'll return those things — or even return those two things. So now we've got the config, and so we can pass the config to this, like so. Let's see how much we just broke. Okay, so create_timm_model — yes, we do pass it an architecture after all; we'll just change that back. Oh, that looks hopeful. So we should find that if we create a ViT and check its default_cfg — yep, that looks good. Now convnext_tiny, on the other hand, uses the ImageNet stats. Excellent, that looks very hopeful. So if somebody feels like an interesting and valuable problem to solve, making create_unet_model work with timm would be super helpful. All right. Now create_unet_model needs to do the same thing as create_vision_model, which is to actually instantiate the model.
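The shape of that refactor, with toy lists standing in for real modules — the names mirror fastai's, but this is a simplified illustration of the calling convention, not fastai's actual code:

```python
def create_body(model, cut=-1):
    "After the refactor: takes an already-instantiated model and slices off its head."
    return model[:cut]

def create_vision_model(arch, pretrained=True, cut=-1):
    "Callers instantiate first: arch is a callable returning a model, not the model itself."
    model = arch(pretrained=pretrained)
    return create_body(model, cut)

# Toy 'architecture': a callable producing a layers-as-list model each time.
toy_arch = lambda pretrained: ['stem', 'block1', 'block2', 'head']
```

So create_vision_model(toy_arch) yields the headless body; the point is that everything downstream of instantiation now receives a model, never an architecture.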
Is anybody potentially interested in having a go at doing unet models with timm? If so, do you want to talk about it? I'd be interested. Okay. So let's just get this working first. All right. Are you somewhat familiar with using unets in general, and DynamicUnet? A little bit — I'm training one at the moment; that's my maximum experience. And I've been through some of the walkthrough notebooks for fastai and everything. Great. So, okay. So you know the basic idea of a unet is that it has not just the usual kind of downsampling path, where the image is getting effectively smaller and smaller as it goes through convolutions with strides, and we end up with a kind of very small set of patches. Then, rather than averaging those to get a vector and using that as our features for our head, instead we go through reverse convolutions, which are things that make it bigger and bigger. And as we do that, we don't just take the input from the previous layer of the upsampling; we also take the input from the equivalently sized layer on the downsampling side. Before fastai, all unets could only handle a fixed size. What Karen did was create this thing called the DynamicUnet, which would look to see how big each size was on the downward path and automatically create an appropriately sized thing on the upward path. That's what the DynamicUnet does. fastai has been very aggressive in using pretrained models everywhere, so something we added to this idea is that the downsampling path can have a pretrained model — which is not rocket science; obviously, it's like this one line of code. So, to understand: at the moment I'm using, say, a ResNet 34. Does that mean the down path is a ResNet 34 backbone, and then there's a reverse ResNet 34 being automatically generated? It's not a reverse ResNet 34. It is a ResNet 34 backbone. So here's our DynamicUnet.
The upsampling path has a fixed architecture — they are indeed ResBlocks. But if you use, say, a ViT as the downsampling path, the upsampling is not going to be a reverse ViT. It's not a mirror? No, exactly. Would there be an advantage in doing that, or is it just not really helpful? I don't see why there would be; I also don't see why there wouldn't be. Nobody's tried it as far as I know. I don't even know if there's such a thing as an upsampling transformer block — there may well be. There's no need to worry about that. The key thing is that in the downsampling path — the downsampling bit we call the encoder — what we do is a dummy eval. A dummy eval basically takes, I can't remember, either a zero-length batch or a one-length batch — a very small batch — and passes it through at some image size. And we use, I believe, hooks, if I remember correctly. What's happened to my screen? My screen's gone crazy. Yeah. So we've got these hooks — PyTorch hooks? Yes. Okay. So we use fastai's hook_outputs function, which says: I want to use PyTorch hooks to grab the outputs of these layers. And so — what is sz_chg_idxs? So this is — yeah, okay, that's a great question. This is the key thing: it's the indices of the layers where the size changes. And that's where you want the cross connection, right? Either just before or just after that. So: get the indices where the size changes. So the sizes — here, model_sizes. We hook the outputs, we do a dummy_eval, and we find the shape of each thing. And yeah, here you can see dummy_eval is using just a single image. So yeah, this just returns the shape of the output of every layer. That's what's going to be in sizes.
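The size-change step can be sketched in isolation. This roughly mirrors what fastai's _get_sz_change_idxs does with the shapes model_sizes returns, though the details are simplified.

```python
def get_sz_change_idxs(sizes):
    "Indices i where layer i's spatial size differs from layer i+1's (roughly fastai's _get_sz_change_idxs)."
    feature_szs = [sz[-1] for sz in sizes]            # width of each (n, c, h, w) output
    return [i for i in range(len(feature_szs) - 1)
            if feature_szs[i] != feature_szs[i + 1]]

# e.g. shapes a dummy eval of a small CNN encoder might return:
sizes = [(1, 64, 64, 64), (1, 64, 64, 64), (1, 128, 32, 32), (1, 256, 16, 16)]
```

Here get_sz_change_idxs(sizes) gives [1, 2]: the spatial size halves after layers 1 and 2, so those are the layers whose outputs the cross connections will carry over.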
And so then this is just a very simple function which goes through and finds where the size changes. Okay, and this is the indices of those things. So now that we know where the size changes, we know where we want our cross connections to be. Now, for each of the cross connections, we need to store the output of the model at that point, because that's going to be an input to the upsampling block. So these sfs — for each UnetBlock we create, for each change in the index, for each upsampling block, you have to pass in those outputs from the downsampling side. So this is the index where it happened. And so this will be the actual — so if we go to the UnetBlock... it looks like it's the size of that list minus one. Is that how the UnetBlocks get created on the other side? So it's going to be passed the hook, right? And that's just the hook that was used on the downsampling side, and from that we can get the stored activations. And so those stored activations then — so this is the shape of those stored activations. And this is a minor tweak; let's just ignore this if block for a moment. Basically, all we then do is we take those activations, stick them through a batchnorm, concatenate them with the previous layer's upsampling, and chuck that through a ReLU. And then we do some convs — and the convs aren't just convs, they're fastai conv layers, which can include all kinds of things: batchnorm, activation, whatever. So it's some combination of batchnorm, activation, convolution; it can also do upsampling, so it can be a transposed conv; things can go first or last, whatever. So that's quite a rich convolutional layer. Okay. So then this if part here handles the case where things didn't quite round off nicely, so that the cross connection doesn't quite have the right size.
And if that happens, then we'll interpolate the cross connection to be the same shape as the upsampling connection. And again, I don't know if anybody else does this, but this is to try to make it so that the DynamicUnet always just works. That's the basic idea. Yeah. So to make this work for timm — you know, this encoder — it needs to know about the spots, right? Oh no, it would have to check the spots. So honestly, this might almost just work. Like, I don't think it does — I think somebody tried it and it didn't, right? But, you know, to figure out what doesn't work, you'd need to change this line to say: if it's a string, create_timm_model, otherwise do this. And then create_body would need to be TimmBody if it's a string. So, at minimum, do the same stuff that create_vision_model does, and then see if this works. And it might well. Now, I will say, if you do get it working: timm does have an API that actually tells you where the feature sizes change. So you could actually optimize out that dummy_eval stuff, but I don't even know if I'd bother, because it would make the code more complex for no particular benefit. Yeah, sure. So look, I think if you commit this as a PR, I'll definitely be looking at it. I was actually going to try ConvNeXt in my unet — I had no idea it wouldn't work, actually. So I would have noticed that already, but I just had no time. I'd love to, because I tried a ResNet 34 and got particular results, and I'd like to see if we can push it with a different model. Yeah. No, I mean, I think there'd be a lot of benefit to that. All right, so now we should run the tests. Just to know: would that all likely be in the same notebook that you're editing for the vision learner? Is that where most of the source code is for the unet learner, or is it a different one? I don't know.
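A shape-level sketch of that interpolation check, in pure Python rather than the real tensor code — the actual UnetBlock calls F.interpolate on the cross connection when this fires, and the helper name here is mine:

```python
def fix_cross_shape(up_shape, cross_shape):
    "If spatial dims didn't round off evenly, report the shape the cross connection must be interpolated to."
    if up_shape[-2:] == cross_shape[-2:]:
        return cross_shape                       # sizes already match; no interpolation needed
    return cross_shape[:-2] + up_shape[-2:]      # keep the channels, adopt the upsampled h, w
```

For example, fix_cross_shape((1, 256, 17, 17), (1, 128, 16, 16)) reports (1, 128, 17, 17): the stored 16x16 activations get resized to match the 17x17 upsampled features before concatenation.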
I was just using — that's all right, I'll find it. I jump to whatever, automatically, in vim. So I was using vim ctags to jump around, so I have no idea where it was. I mean, actually — so yeah, there's models/unet; that's where the unet lives. Okay. Is there anything unique about the fact that the timm model has that option to cut the tail and head off? Does that need to be done for the unet architecture? Oh, got an error here. Yeah. So yeah, you absolutely have to cut the head off, because it comes with a default classifier head. So, you know, once you get it working, you'll probably find you can factor out some duplicate code between the unet and the vision learner. But yeah, you basically have to cut off the classifier head in the same way that TimmBody does. And I don't think you'll need to change any input processing, as far as I know. create_vision_model handles the case where you've only got a one-, two-, or four-channel input and the model expects a three-channel input — it handles that automatically. But timm — actually, I think Ross and I independently invented this, as far as I know — we both automatically handle copying weights if necessary, or deleting weights if necessary, or whatever. So the same stuff as create_vision_model should work there as well. So, interestingly, the layers notebook doesn't work, because it is actually creating a model — which is curious. And that we easily fixed. Yeah, that's interesting. So the big question then is: can we still predict rice disease? So let's compare. I don't know if it's going to make much difference or not, you know, because we're pretty careful about fine-tuning the batchnorm layers. It'll actually be interesting to see whether normalization matters as much as it used to; it used to be absolutely critical. Is it possible to create, like, a layer that learns the normalization sort of thing?
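For the head-cutting itself, timm has documented support via num_classes=0 and global_pool='' in create_model, which is essentially what fastai's timm body relies on. A sketch — the helper name is mine:

```python
def timm_body_kwargs():
    "kwargs that ask timm.create_model for a headless, unpooled feature extractor."
    return dict(num_classes=0, global_pool='')

# e.g., assuming timm is installed:
# import timm
# body = timm.create_model('convnext_tiny', pretrained=True, **timm_body_kwargs())
```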
Yeah, I mean, that's basically what batchnorm does, you know. To understand: those weights in the batchnorm layer are basically learning the aggregate of that batch that optimally gives the best activations for the next layer? Yeah, exactly. Yeah, it's just, you know, multiply by something and add something. So it's finding the best thing to multiply by and add. So let's take a look. All right, so this got 47% error; this got 44% error. Yeah, so I mean, it's a bit disappointing after all that work. I mean, this is fascinating: when you fine-tune the way we do, it basically doesn't really matter, you know. And let's just double-check it actually is working. It'd be fair to say that the one advantage would be if you wanted to use pretrained models without fine-tuning — you'd definitely want the statistics in there, right? Yes, absolutely. I mean, I don't know if that's an actual thing people do, but yes, if you did. All right, so we did dls.train.after_batch — yep, there it is. Groovy. Yeah, it's funny, these things that we've been doing for years and, I guess, never questioned. I have a question relating to that, because one of the things I wanted to do is get this unet into a mobile app. So I used the latest TorchScript and it works with the demo app — I had to fiddle around a lot; it's broken in the latest PyTorch. But of course, in there you need to provide the normalization statistics for the app, so it's like inference mode. So I wonder — I know that at the moment the fastai idea is that you dump everything as, like, a pickle — but conceivably it would be helpful if you could maybe use those new fine-tuned statistics or something for your deployment in particular environments. How would I go about doing that? I mean, they're just parameters in batchnorm layers. They're just parameters, so they'll be in the parameters attribute of the model.
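What "multiply by something and add something" means concretely, for a single activation — a sketch of batchnorm's affine step, where gamma and beta are the learned weights and mean and var are the tracked batch statistics:

```python
def bn_affine(x, mean, var, gamma, beta, eps=1e-5):
    "Batchnorm per activation: normalize with tracked stats, then apply the learned scale and shift."
    x_hat = (x - mean) / (var + eps) ** 0.5   # normalize with batch statistics
    return gamma * x_hat + beta               # multiply by something, add something
```

Since gamma and beta are trained, the layer can learn whatever effective normalization suits the next layer — which is part of why the fixed input normalization stats matter less once you fine-tune the batchnorm layers.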
But they're not really parameters that make sense independently of all the other parameters that you use, so I don't think you would treat them any differently. If you use, say, ImageNet statistics when you're fine-tuning, then that's the result of your model, right? You're going to use that down the track as well. Well, yes and no — that's what you'd normalize with, but you've got batchnorm layers, which are then obviously dividing and subtracting. So yeah, I mean, those normalization stats aren't going to change, and there isn't really any reason for them to. It would only be if you trained a new model from scratch. I just want to have a look at this next one. So this is 27 to 18, 24. Yeah, this is actually kind of what I thought might happen: on a slightly better model, there's a little bit of a difference in error initially, and then as it trains a bit, it makes no difference. All right. So, yeah, I'd love people to try out fastai from master — and tell me if any of your models look substantially better or, even more important, substantially worse. Auto-normalize timm models; okay, fixes #3716. All right, anybody have any questions before we wrap it up? So, just with normalize, it's just that the initial error rate will be a bit lower than with the earlier approach? Yeah, so like — well, at first you have a random head, so at first it doesn't actually matter, right? Random is random, whether you normalize or not. So maybe, you know, after 10 batches it's better or something. But yeah, I don't know — it'd be interesting to see if anybody notices a difference. This just used to matter a lot, right, for a couple of reasons. The first reason is that most people didn't fine-tune models — most people trained most models from scratch until fastai came along, pretty much. And then secondly, well, we didn't have batchnorm, right? So it was totally critical. And then even when batchnorm came along, we didn't know how to fine-tune models with batchnorm.
So we just fine-tuned the head. At that point, we didn't realize we had to fine-tune the batchnorm layers as well. So I remember emailing François, the creator of Keras, saying to him, like: I'm trying to fine-tune your Keras model and it's, like, bizarrely bad — why is that? You're probably doing the wrong thing; here's the documentation, whatever. And, like, no, I'm pretty sure I'm doing the right thing. And I spent, like, three months trying to answer this question. Eventually I realized: holy shit, it's the batchnorm layers. I sent him an email and said, oh, we can't fine-tune Keras models like this — you have to fine-tune the batchnorm layers — which I don't think they changed for years, actually. Anyway, so those changes are why, I guess, this whole normalization thing is much less interesting than we thought, which is why we hadn't really noticed it wasn't working before: because our models were training fine. Anybody else have any questions before we wrap up? All right, gang. See you all. Good luck. Bye.