Basically, every time Sylvain and I found something that didn't quite work the way we wanted, at any part of the stack, we wrote our own. So it's kind of like building something with no particular deadline and trying to do everything the very, very best we can.
So the layered API of fastai v2 starts at the applications layer, which is where most beginners will start. It looks a lot like fastai v1, which is the released version of the software that people have seen before. But in v2, everything is rewritten from scratch. It's totally new. There's no code borrowed. But the top-level API looks quite similar. The idea is that in four lines of code you can create a state-of-the-art computer vision classifier, including transfer learning. With nearly the same four lines of code (five in this case, because we're also displaying the data) you can create a state-of-the-art segmentation model. And when I say state-of-the-art: this segmentation model is, to the best of my knowledge, still better than any published result on this particular CamVid dataset. So these five lines of code are super good five lines of code. And as you can see, it includes a line which, if you say show_batch, will display your data in an appropriate format, in this case showing you a picture with the color-coded segmentation pixels overlaid on top of it.
The same basic four lines of code will do text classification. So here's the basis of ULMFiT, which is a system that we developed and wrote up along with Sebastian Ruder for transfer learning in natural language processing. As you can see here, this is working on IMDB, a single epoch in four minutes. The accuracy here is basically what was the state of the art as of a couple of years ago. Tabular or time series analysis, same deal.
Basically, a few lines of code, nearly exactly the same lines of code, and you'll get a great result from your tabular data, and ditto for collaborative filtering. So the high-level API for fastai v2 is designed so that, regardless of what application you're working on, you can get a great result using sensible defaults and carefully selected hyperparameters, largely done for you automatically for the most common kinds of problems that people look at. That bit doesn't look that different to v1, but understanding how we get to it is kind of interesting and involves going deeper and deeper.
This approach, though, does work super well, partly because it's based on quite a few years of research to figure out the best ways to solve various problems along the way. When people actually try using fastai, they're often surprised. So this person posted on our forum that they'd been working in TF2 for a while, and they couldn't figure out why all of their models were suddenly working much better. The answer is basically that they're getting all these nice, curated best practices. Somebody else on Twitter saw that and said, yep, we found the same thing: we were trying TensorFlow, spent months tweaking, and then we switched to fastai, and a couple of days later we were getting better results. So these carefully curated defaults and algorithms, and high-level APIs that do things right for you the first time, can give you better results faster, even for experienced practitioners.
But it's actually the other pieces that are, I think, more interesting for a Swift conversation, because the deeper we go into how we make that work, the more stuff you'll see that will be a great fit, I think, with Swift. So the mid-layer API is something which is largely new to fastai v2. Well, actually, I guess the foundation layer is new. So the mid-layer, I guess I'd say, is more rewritten from v1.
And it contains some of the things that make those high-level APIs easy. One of the bits which is the most interesting is the training loop itself. And I thank Sylvain for the set of slides we have for the training loop.
This is what a training loop looks like in PyTorch. We calculate some predictions. We get a loss. We do a backward pass to get the gradients. We do an optimizer step. And then, from time to time, we zero the gradients, depending on whether we're accumulating gradients. So this is what that loop looks like: run the model, get the loss, do the gradients, step the optimizer, and do that a bunch of times.
But if you want to do something interesting, you'll need to add something to the loop. You might want to keep track of your training statistics in TensorBoard or in fastprogress or whatever. You might want to schedule various hyperparameters in various different ways. You might want to add various different types of regularization. You may want to do mixed precision training. You may want to do GANs. So this is a problem, because either you have to write a new training loop every time you want to add a different tweak, and making all those tweaks work together then becomes incredibly complicated. Or you try and write one training loop which does everything you can think of. This is the training loop for fastai 0.7, which only did a tiny subset of the things I just said, but was still getting ridiculous. Or you can add callbacks at each step.
Now, the idea of callbacks has been around in deep learning APIs for a long time. But what's very different about fastai is that every callback is actually a two-way callback. It can read absolutely everything: gradients, parameters, data, and so forth. And it can write them. So it can actually change anything at any time.
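To make the two-way callback idea concrete, here is a minimal sketch in plain Python. It is not fastai's actual implementation: the class names (`Loop`, `Callback`, `LinearScheduler`, `EarlyStopping`, `CancelFitException`) and the hook names are assumptions chosen for illustration, and the "model" is a single weight fitted to y = w * x. The point it shows is that each callback receives the whole loop object, so it can read state (the epoch loss) and also write state (the learning rate) or stop training.

```python
class CancelFitException(Exception):
    """Raised by a callback to stop training (illustrative name)."""


class Callback:
    """Base class: every hook receives the loop and may read or write its state."""
    def begin_batch(self, loop): pass
    def after_loss(self, loop): pass
    def after_epoch(self, loop): pass


class LinearScheduler(Callback):
    """Writes a new learning rate into the loop before every batch."""
    def __init__(self, start, end, total_steps):
        self.start, self.end, self.total_steps = start, end, total_steps

    def begin_batch(self, loop):
        pct = min(1.0, loop.step / max(1, self.total_steps - 1))
        loop.lr = self.start + pct * (self.end - self.start)


class EarlyStopping(Callback):
    """Reads the epoch loss and cancels training once it is small enough."""
    def __init__(self, threshold):
        self.threshold = threshold

    def after_epoch(self, loop):
        if loop.epoch_loss < self.threshold:
            raise CancelFitException()


class Loop:
    """Fits y = w * x with plain gradient descent on squared error."""
    def __init__(self, data, callbacks, lr=0.1):
        self.data, self.callbacks, self.lr = data, callbacks, lr
        self.w, self.step, self.epochs_run = 0.0, 0, 0

    def _call(self, hook):
        for cb in self.callbacks:
            getattr(cb, hook)(self)

    def fit(self, epochs):
        try:
            for _ in range(epochs):
                total = 0.0
                for x, y in self.data:
                    self.x, self.y = x, y
                    self._call("begin_batch")                       # callbacks may change lr, x, y
                    self.pred = self.w * self.x                     # forward pass
                    self.loss = (self.pred - self.y) ** 2           # loss
                    self._call("after_loss")                        # callbacks may change the loss
                    self.grad = 2 * (self.pred - self.y) * self.x   # backward pass
                    self.w -= self.lr * self.grad                   # optimizer step
                    self.step += 1
                    total += self.loss
                self.epoch_loss = total / len(self.data)
                self.epochs_run += 1
                self._call("after_epoch")
        except CancelFitException:
            pass  # a callback asked to stop; the loop itself was never modified


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
loop = Loop(data, [LinearScheduler(0.1, 0.01, total_steps=300), EarlyStopping(1e-6)])
loop.fit(100)
print(loop.epochs_run, round(loop.w, 3), loop.lr)
```

Adding a new behaviour here means adding another `Callback` subclass to the list; the body of `fit` stays the same, which is the property the talk is describing.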
So when we say the callbacks are infinitely flexible, we feel pretty confident in that, because the training loop in fastai has not needed to be modified to do any of the tweaks that I showed you before. Even the entirety of training GANs can be done in a callback. So basically, we switch out our basic training loop and replace it with one that has the same five steps, but with callbacks between every step.
That means, for example, if you want to do a scheduler, you can define a batch-begin callback that sets the optimizer's learning rate to some function. Or if you want to do early stopping, you can write an on_epoch_end that checks the metrics and stops training. Or you can do parallel training: set up data parallel when training begins and, at the end of training, take data parallel off again. For gradient clipping, you have access to the parameters themselves, so you can clip the gradient norms at the end of the backward step, and so forth.
So all of these different things have been written with fastai callbacks, including, for example, mixed precision. All of NVIDIA's recommendations for mixed precision training will be added automatically if you just add a to_fp16 at the end of your learner call. And really importantly, all of those mixed precision things can be combined with multi-GPU and one-cycle training and gradient accumulation and so forth. Trying to create a state-of-the-art model, which involves combining state-of-the-art regularization and mixed precision and distributed training and so forth, is a really, really hard job. But with this approach, it's just a single extra line of code to add each feature, and they are all explicitly designed to work with each other and are tested to work with each other.
So for instance, here is mixup data augmentation, which is an incredibly powerful data augmentation method that has powered lots of state-of-the-art results. And as you can see, it's well under a screen of code.
By comparison, here is the version of mixup from the paper. Not only is it far longer, but it only works with one particular dataset, one particular optimizer, and one particular kind of metric, and it is full of all kinds of assumptions. So that's an example of these mid-tier APIs.
Another one is the optimizer. It looks like there have been lots and lots of different optimizers appearing in the last year or two, but it actually turns out that they're all minor tweaks on each other. Most libraries don't write them this way. So for example, AdamW, also known as decoupled weight decay Adam, was added to PyTorch quite recently, in the last month or two, and it required writing a whole new class and a whole new step to implement. And that was something like two or three years after the paper was released. On the other hand, fastai's implementation, as you can see, involves a single extra function containing two lines of code, and this little bit of gray here. So it's kind of two and a half or three lines of code to implement the same thing.
That's because we realized: let's refactor the idea of an optimizer, see what's different for each of these state-of-the-art optimizers that have appeared recently, and make it so that each of those things can be added and removed by just changing two things: stats and steppers. A stat is something that you measure during training, such as the gradients or the gradients squared, where you might use dampening or momentum or whatever. And a stepper is something that uses those stats to change the weights in some way. You can combine those things together, and by combining them we've been able to implement all these different optimizers. So for instance, the LAMB optimizer, which came out of Google and was super cool at reducing pre-training time from three days to 76 minutes, we were able to implement in this tiny piece of code.
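The stats-and-steppers refactoring described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not fastai's optimizer code: the names `Optimizer`, `average_grad`, `weight_decay`, `sgd_step` and `momentum_step` are hypothetical, parameters are plain floats rather than tensors, and the running average uses no dampening. It shows how decoupled weight decay becomes one small stepper that can be added or removed without writing a new optimizer class.

```python
def average_grad(state, p, grad, mom=0.9, **hypers):
    """Stat: running (momentum) sum of the gradients, stored per parameter."""
    state["grad_avg"] = mom * state.get("grad_avg", 0.0) + grad
    return state


def weight_decay(p, grad, state, lr, wd=0.0, **hypers):
    """Stepper: decoupled weight decay shrinks the weight directly."""
    return p * (1 - lr * wd)


def sgd_step(p, grad, state, lr, **hypers):
    """Stepper: plain gradient descent using the raw gradient."""
    return p - lr * grad


def momentum_step(p, grad, state, lr, **hypers):
    """Stepper: gradient descent using the averaged-gradient stat."""
    return p - lr * state["grad_avg"]


class Optimizer:
    """Applies every stat, then every stepper, to each parameter in turn."""
    def __init__(self, params, stats, steppers, **hypers):
        self.params = list(params)
        self.stats, self.steppers, self.hypers = stats, steppers, hypers
        self.state = [dict() for _ in self.params]

    def step(self, grads):
        for i, grad in enumerate(grads):
            for stat in self.stats:
                self.state[i] = stat(self.state[i], self.params[i], grad, **self.hypers)
            for stepper in self.steppers:
                self.params[i] = stepper(self.params[i], grad, self.state[i], **self.hypers)


# SGD with decoupled weight decay: just two steppers, no stats.
sgd_wd = Optimizer([1.0], stats=[], steppers=[weight_decay, sgd_step], lr=0.1, wd=0.5)
sgd_wd.step([2.0])

# SGD with momentum: one stat plus the stepper that consumes it.
sgd_mom = Optimizer([1.0], stats=[average_grad], steppers=[momentum_step], lr=0.1, mom=0.9)
sgd_mom.step([1.0])
sgd_mom.step([1.0])

print(sgd_wd.params, sgd_mom.params)
```

In this sketch, switching between optimizers is a matter of choosing which functions go in the two lists, which is the design choice the talk attributes to the refactoring.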
And one of the nice things about that LAMB implementation is that when you compare it to the math, it really looks almost line-for-line identical, except ours is a little bit nicer because we refactored some of the math. So it makes it really easy to do research as well, because you can quite directly bring the equations across into your code.
Then the last of the mid-tier APIs is the data block API, which is something we had in version one as well. But when we were porting that to Swift, we had an opportunity to rethink it, and Alexis Gallagher in particular helped us to rethink it in a more idiomatically Swifty way. It came out really nicely. So then we took the result of that and ported it back into Python, and we ended up with something that was quite a bit nicer. So there's been a nice interplay between fastai in Python and SwiftAI in Swift in terms of helping each other's APIs.
Basically, the data block API is something where you define each of the key things that the program needs to know to flexibly get your data into a form you can put in a model. It needs to know what type of data you have, how to get that data, and how to split it into a training set and a validation set. Then it puts that all together into a data bunch, which is just a simple little class, essentially I think four lines of code, which has the validation set and the training set in one place.
So with a data block, you just say: OK, for my types, I want to create a black and white Pillow image for my x and a category for my y. To get the list of files for those, use this function. To split those files into training and validation, use this function, which looks at the grandparent directory name in the path. And to get the labels, use this function, which uses the parent directory name. That's enough to give you MNIST, for instance. And once you've done this, you end up with a data bunch.
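The three questions a data block answers (how to get the items, how to split them, how to label them) can be sketched with the standard library alone. The sketch below is an assumption-laden illustration, not the fastai data block API: `DataBlockSketch`, `grandparent_splitter` and `parent_label` are hypothetical names, the file paths are made up, and the x/y type declarations that the real API also takes are left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


def grandparent_splitter(path, valid_name="valid"):
    """True if the item's grandparent folder marks it as validation data."""
    return PurePosixPath(path).parent.parent.name == valid_name


def parent_label(path):
    """Label an item with the name of the folder that contains it."""
    return PurePosixPath(path).parent.name


@dataclass
class DataBlockSketch:
    get_items: Callable  # source -> list of items
    is_valid: Callable   # item -> bool (validation set?)
    get_y: Callable      # item -> label

    def datasets(self, source):
        """Return (train, valid) lists of (item, label) pairs."""
        train, valid = [], []
        for item in self.get_items(source):
            pair = (item, self.get_y(item))
            (valid if self.is_valid(item) else train).append(pair)
        return train, valid


# Made-up paths laid out like an MNIST-style folder tree.
files = ["mnist/train/3/a.png", "mnist/train/7/b.png", "mnist/valid/3/c.png"]
block = DataBlockSketch(get_items=list, is_valid=grandparent_splitter, get_y=parent_label)
train, valid = block.datasets(files)
print(train)
print(valid)
```

Swapping the labelling rule (for example, to a regular expression over the file name) would only mean passing a different `get_y` function; the rest of the block is unchanged.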
And as I mentioned before, everything has a show_batch. So one of the nice things is that it makes it very easy for you to look at your data, regardless of whether it's tabular or collaborative filtering or vision or text or even audio. If it was audio, it would show you a spectrogram and let you play the sound.
You can do custom labeling with data blocks by using, for example, a regular expression labeler. You can get your labels from an external file or data frame, and they could be multi-label. So this thing here knows it's a multi-label classification task, so it has automatically put a semicolon between each label. Again, it's still basically just three lines of code to define the data block.
Here's a data block for segmentation. You can see that really the only thing I had to change here was that my dependent variable has been changed from category to Pillow mask. And again, show_batch automatically works, and we can train a model from that straight away as well. You can do key points: here I've just changed my dependent variable to tensor point, and so now it knows how to behave with that. Object detection: now I change my dependent variable to bounding box, and you can see I've got my bounding boxes here. Text, and so forth.
So actually, going back, I have a couple of questions.
Yeah.
So in the code you've got the X's and Y's, and it sounds like these different data types roughly conform to a protocol?
Yep. We're going to get to that in a moment. Absolutely, that's an excellent way to think of it. And actually, this is the way it looked about three weeks ago. Now it looks even more like a protocol.
So yes, this is where it all comes from, which is the foundation APIs. And this is the bit that I think is the most relevant to Swift. A lot of this, I think, would be a lot easier to write in Swift. So the first thing that we added to PyTorch was object-oriented tensors.
For too long, we've all been satisfied with a data type called tensor, which has no semantics to it. Those tensors actually represent something, like a sentence or a picture of a cat or a recording of somebody saying something. So why can't I take one of those tensors and say .flip or .rotate or .resample or .translate to German? Well, the answer is you can't, because it's just a tensor without a type.
So we have added types to tensors. You can now have a tensor image, a tensor point, a tensor bounding box, and you can define a flip left-right for each. This is some of the source code from the computer vision library we've written ourselves, so that now you can say flip_lr and it flips the puppy. If it was key points, it would flip the key points. If it was a bounding box, it would flip the bounding boxes, and so forth. So this is an example of how tensors which carry around semantics are nice.
It's also nice that I could just say .show, right? So .show is something that's defined for all fastai v2 tensor types, and it will just display that tensor. It could even be a tuple containing a tensor and some bounding boxes and some bounding box classes. Whatever it is, it will be able to display it, it will be able to convert it into batches for modeling, and so forth.
So with that, we can now create, for example, a random transformation called flip item. We can say that the encoding of that random transformation is defined for a Pillow image or any tensor type, and in each case the implementation is simply to call x.flip_lr. Or we could do the dihedral symmetry transforms in the same way. Before the call, we grab a random number between 0 and 7 to decide which of the 8 transposes to do, and then encodes calls x.dihedral with that thing we just got. So now we can call that transform a bunch of times, and each time we'll get back a different random orientation. So a lot of these things become nice and easy.
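As a rough illustration of types that carry their own semantics, the sketch below gives two toy types their own `flip_lr`, and a random transform that simply calls whichever `flip_lr` the value has. All names here (`Image2D`, `Points2D`, `FlipItem`) are stand-ins invented for this example, nested Python lists take the place of real tensors, and key points are assumed to be pixel coordinates inside an image of known width.

```python
import random


class Image2D:
    """Toy stand-in for a typed image tensor: a list of pixel rows."""
    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def flip_lr(self):
        # Reverse every row; the result keeps the caller's type.
        return type(self)([row[::-1] for row in self.rows])


class Points2D:
    """Toy stand-in for key points: (x, y) pixel coordinates in an image of known width."""
    def __init__(self, pts, width):
        self.pts, self.width = list(pts), width

    def flip_lr(self):
        # Mirror only the x coordinate around the image's vertical axis.
        return type(self)([(self.width - 1 - x, y) for x, y in self.pts], self.width)


class FlipItem:
    """Random transform: with probability p, call the value's own flip_lr."""
    def __init__(self, p=0.5, rng=None):
        self.p, self.rng = p, rng or random.Random()

    def __call__(self, x):
        if self.rng.random() >= self.p:
            return x  # leave the item untouched this time
        return x.flip_lr()


img = Image2D([[1, 2, 3], [4, 5, 6]])
pts = Points2D([(0, 1)], width=3)
always_flip = FlipItem(p=1.0)

print(always_flip(img).rows)  # image rows reversed
print(always_flip(pts).pts)   # point mirrored to the other side
```

The transform itself contains no knowledge of images or points; the behaviour lives with each type, so adding a bounding-box type would only require giving it a `flip_lr` of its own.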
Hey Jeremy, Maxim asked: why isn't tensor a backing data structure for an image type? Tensor image is a tensor which is an image type. Why not, he says, have a different type named image, I guess, that has a tensor inside of it?
Do you mean why inherit rather than compose?
Apparently, yes, that.
Yeah. So, I mean, you can do both, and you can create identical APIs. Inheritance just has the benefit that all the normal stuff you can do with a tensor, you can do with a tensor that happens to be an image. Just because a tensor is an image doesn't mean you now don't want to be able to do fancy indexing on it, or do an LU decomposition of it, or stack it with other tensors across some axis. So basically, a tensor image ought to have all the behavior of a tensor plus additional behavior. That's why we used inheritance. We have a version that uses composition as well, and it uses Python's nice __getattr__ functionality to pass on all of the behavior of tensor. But it comes out more nicely in Python when you do inheritance. And actually, the PyTorch team has decided to officially implement semantic tensor subtypes now. So hopefully in the next version of PyTorch, you won't have to use the extremely ugly hacks that we had to use to make this work, and you'll be able to use the real ones. And hopefully you'll see some of these ideas brought over to torchvision.
Can I ask you, how does the type propagate? If you do arithmetic on an image tensor, do you get an image tensor back?
So Chris and I had a conversation about this a few months ago, and I said I'm banging my head against this issue of types not carrying around their behavior. And Chris casually mentioned, oh yes, that thing is called higher-kinded types. So I went home, and that was one of those phrases that I thought only functional programming dweebs talked about and that I would never care about. It turns out it actually matters a lot.
And it's basically the idea that if you have a tensor image and you add one to it, you want to get back a tensor image, because it should be an image that's a bit brighter rather than something that loses its type. So we implemented our own, again hacky and partial, higher-kinded type implementation in fastai v2. For any of these things that you do to a tensor of a subtype, you will nearly always get back the correctly subtyped tensor.
I saw that PyTorch recently started talking about their named indexing extensions for their tensors as well. They seem to have a similar kind of challenge there: when you start doing arithmetic and other things like that on a tensor that has named dimensions, you want to propagate those along.
Yeah, so we haven't started using that yet, because it hasn't quite landed as stable. But we talked to the PyTorch team at the DevCon, and we're certainly planning to bring these ideas together. They're all orthogonal but related concerns.
Yeah, I just mean that I assume that feature has the same problem, the same challenge.
I assume so, yeah.
It would be interesting to see what they do.
Yeah, it would. So it's kind of nice: not only do we get to be able to say .show_batch, but you can even say .show_results. In this case, it knows what the independent variable's type is, and it knows what the dependent variable's type is. It even knows things like: hey, for a classification task, the prediction and the target should be the same, and if they're not, by default I will highlight them. So these lower-level foundations are the things that drive our ability to easily add this higher-level functionality.
This is the kind of ugly stuff we wouldn't have to do in Swift. We had to write our own type dispatch system. We can annotate things with types, and those type annotations are actually semantic. And so we now have the joyfully modern idea of function overloading in Python, which has made life a lot easier.
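Both ideas in this passage, subtypes that survive arithmetic and functions selected by type annotation, can be approximated in a short plain-Python sketch. It makes several assumptions: `Tensor1D` and `TensorImage1D` are toy classes invented here rather than PyTorch or fastai types, and the standard library's `functools.singledispatch` stands in for fastai's own type dispatch system, which the talk describes but does not show.

```python
from functools import singledispatch


class Tensor1D:
    """Toy tensor: elementwise addition over a list of floats."""
    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __add__(self, other):
        if isinstance(other, Tensor1D):
            other_vals = other.values
        else:
            other_vals = [other] * len(self.values)  # broadcast a scalar
        # type(self) rather than Tensor1D: the result keeps the caller's subtype.
        return type(self)([a + b for a, b in zip(self.values, other_vals)])

    __radd__ = __add__  # so that `1 + tensor` also works


class TensorImage1D(Tensor1D):
    """Same data, extra meaning: the values are pixel intensities."""


@singledispatch
def describe(x):
    return "unknown"


@describe.register
def _(x: Tensor1D):
    # Chosen for plain tensors via the type annotation.
    return "a plain tensor"


@describe.register
def _(x: TensorImage1D):
    # The most specific registered type wins for images.
    return "an image"


img = TensorImage1D([0.25, 0.5])
brighter = img + 1  # still a TensorImage1D, just brighter

print(type(brighter).__name__, brighter.values)
print(describe(brighter), "/", describe(Tensor1D([1])))
```

The key line is `type(self)(...)` in `__add__`: because the result is built with the caller's class, `img + 1` is still an image, so the image overload of `describe` is the one that runs on it.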
And we already have that in Swift. Do you have many users who are using this?
Yeah. So it's still pre-release. It's not even alpha. But there is an enthusiastic early-adopter community who are using it. So, for example, the user-contributed audio library has already been ported to it. I've also built a medical imaging library on top of it and have written a series of five notebooks showing how to do CT scan analysis with it. So it's kind of like, it works.
And I was curious what your users think of it, because there's this very strongly held conception that Python folks hate types. And you're kind of providing a little bit of typing. I'm curious how they react to that.
The extremely biased subset of early-adopter fastai enthusiasts who are using it love it. And they tend to be people who have gone pretty deep in the past. So, for example, my friend Andrew Shaw wrote something called MusicAutobot, which is one of the coolest things in the world, in case you haven't seen it yet. It's something where you can generate music using a neural network. You can put in some melodies and some chords, and it will auto-complete some additional melodies and chords. Or you can put in a melody and it will automatically add chords. Or you can put in chords and it will create a melody. He had to write his own MIDI library, fastai.midi. He rewrote it in v2, and he said it's just so, so, so much easier thanks to those mid-tier APIs. So, yeah, at this stage, it's easy asking.
I was just going to jump in quick. I've been helping with some of the audio stuff and it's been really awesome. It makes things a lot more flexible than version one. That's probably my favorite thing about it: everything can be interchanged. Nothing is like, well, it's got to be this way because that's how it is.
Cool. Thanks.
Another piece of the foundation is the partially reversible composed function pipeline dispatched over collections, which really rolls off the tongue. We call them transform and pipeline. Basically, the idea is that the way you want function dispatch and function composition to work in deep learning is a little different to other places. There are a couple of things.
The first is that you often want to dispatch over tuples. What I mean by that is: if you have a function called flip left-right, and you have a tuple representing a mini-batch where your independent variable is a picture and your dependent variable is a set of bounding boxes, then if you say flip left-right on that tuple, you would expect both the X and the Y to be flipped, and to be flipped with the type-appropriate method. So our transforms will automatically send each element of a tuple to the function separately and will dispatch according to their types automatically. We've mentioned type retention, so that's the kind of basic type stuff we need.
One interesting thing is that besides encoding, in other words applying the function, you often need to be able to decode, which is to kind of de-apply the function. So for example, a categorization transform would take the word dog and convert it to the number one, perhaps, which is what you need for modeling. But then when your predictions come back, you need to know what one represents. So you need to reverse that transform and turn one back into dog.
Often, those transforms also need data-driven setup. In that example of dog becoming one, there needs to be something that actually creates that vocab automatically from the training set, recognizing all the possible classes, so it can create a different index for each one and then apply that to the validation set. And quite often, these transforms also have some kind of state, such as the vocab. So we built this bunch of stuff that builds on top of each other.
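The requirements listed above (encode, decode, data-driven setup, state, and element-wise handling of tuples) fit in a short plain-Python sketch. It is a simplification under stated assumptions: `Transform` and `Categorize` here are hypothetical toy classes written for this example, not the fastai classes of the same names, and the "dispatch by type" is reduced to an `isinstance` check that lets non-label elements of a tuple pass through unchanged.

```python
class Transform:
    """Callable with an optional inverse; tuples are processed element by element."""
    def setup(self, items): pass
    def encodes(self, x): return x
    def decodes(self, x): return x

    def _apply(self, fn, x):
        if isinstance(x, tuple):
            return tuple(self._apply(fn, item) for item in x)
        return fn(x)

    def __call__(self, x):
        return self._apply(self.encodes, x)

    def decode(self, x):
        return self._apply(self.decodes, x)


class Categorize(Transform):
    """Maps string labels to integer indices and back, using a learned vocab."""
    def setup(self, items):
        # Data-driven setup: the vocab is state built from the training labels.
        self.vocab = sorted(set(items))
        self.index = {label: i for i, label in enumerate(self.vocab)}

    def encodes(self, x):
        # Only strings are labels; anything else (e.g. the image) passes through.
        return self.index[x] if isinstance(x, str) else x

    def decodes(self, x):
        return self.vocab[x] if isinstance(x, int) else x


cat = Categorize()
cat.setup(["dog", "cat", "dog"])      # vocab becomes ["cat", "dog"]

sample = ([0.2, 0.3], "dog")          # (toy image, label)
encoded = cat(sample)                 # the label becomes an index, the image is untouched
decoded = cat.decode(encoded)         # the index is turned back into the label

print(cat.vocab, encoded, decoded)
```

Because the vocab is built once in `setup` and then reused, the same index mapping applies to validation items and to predictions being decoded.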
At the lowest level is a class called transform, which is a callable that also has a decode, does the type retention (the higher-kinded type thing), and does the dispatch over tuples by default. Then a pipeline is something that does function composition over transforms, and it knows about, for example, setting up transforms. Setting up transforms in a pipeline is a bit tricky, because you have to make sure that at each level of the pipeline, only the previous steps have been applied before you set up the next step. So it does little things like that. Then we have something that applies a pipeline to a collection to give you an indexable, lazily transformed collection. And then you can do those in parallel to get back an independent and a dependent variable, for instance. And finally, we've built a data loader, which will apply these things in parallel and create collated batches.
So in the end, all this stuff makes a lot of things much easier. For example, the language model data loader in fastai v1 was pages of code. In TensorFlow, it's pages of code. In fastai v2, it's less than a screen of code, by leveraging these powerful abstractions and foundations.
So then finally, and again this is something I think Swift will be great for, we worked really hard to make everything extremely well optimized. For example, for pre-processing in natural language processing, we created a parallel generator in Python, to which you can basically pass a class that defines some setup and a call, and it can automatically parallelize that. So for example, tokenization is done in parallel in a pretty memory-efficient way.
But perhaps the thing I'm most excited about, both in Python and Swift, is the optimized pipeline running on the GPU. Pretty much all of the transforms we've done can run on the GPU, and by default do.
So for example, the flip left-right I showed you earlier will actually run on the GPU, as will warp, as will zoom, as will even things like crop. One of the basics of this is the affine coordinate transform, which uses affine grid and grid sample, which are very powerful PyTorch functions. They would be great things to actually write with Swift for TensorFlow's new metaprogramming, because they don't exist in TensorFlow, or at least not in any very complete way. With these basic ideas, we can create this affine coordinate transform that lets us do a very wide range of data augmentations in parallel on the GPU.
For those of you that know about the DALI library that NVIDIA created, this provides a lot of the same benefits as DALI, and it's pretty similar in terms of its performance. But the nice thing is that all the stuff you write, you write in Python, not in CUDA. With DALI, if they don't have the exact transformation you want, and there's a pretty high chance that they won't, then you're stuck. Whereas with fastai v2, you can write your own in a few lines of Python and test it out in a Jupyter notebook. It makes life super easy.
So with this kind of stuff, because Swift is a much faster, more hackable language than Python (or at least hackable in the sense of performance; I guess not as hackable in terms of its type system, necessarily), I feel like we can build even more powerful foundations and pipelines. A real Swift for TensorFlow computer vision library, leveraging the metaprogramming and leveraging Swift Numerics and stuff like that, I think would be super cool.
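For readers unfamiliar with the affine coordinate idea mentioned above, here is a small CPU-only sketch in plain Python. It is not the fastai or PyTorch implementation and it does not run on a GPU: `affine_coords` and `sample_nearest` are hypothetical helpers written for this example, the pixel-centre coordinate convention is an assumption, and nearest-neighbour lookup stands in for proper interpolation. What it shows is the two-step structure: a 2x3 matrix maps every output pixel to a source coordinate in [-1, 1], and the image is then sampled at those coordinates, so flips, shifts, zooms and rotations all become a choice of matrix.

```python
import math


def affine_coords(theta, height, width):
    """For every output pixel, the source (x, y) coordinate in [-1, 1] given a 2x3 matrix."""
    grid = []
    for i in range(height):
        y = -1 + (2 * i + 1) / height  # centre of pixel row i
        row = []
        for j in range(width):
            x = -1 + (2 * j + 1) / width  # centre of pixel column j
            sx = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            sy = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((sx, sy))
        grid.append(row)
    return grid


def sample_nearest(image, grid, fill=0):
    """Look up the nearest source pixel for each coordinate; out-of-range gets `fill`."""
    height, width = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for sx, sy in row:
            j = math.floor((sx + 1) * width / 2)
            i = math.floor((sy + 1) * height / 2)
            inside = 0 <= i < height and 0 <= j < width
            out_row.append(image[i][j] if inside else fill)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6]]

identity = [[1, 0, 0], [0, 1, 0]]
flip_lr = [[-1, 0, 0], [0, 1, 0]]  # negate x: mirror around the vertical axis

same = sample_nearest(image, affine_coords(identity, 2, 3))
flipped = sample_nearest(image, affine_coords(flip_lr, 2, 3))
print(same)
print(flipped)
```

Since several augmentations can be multiplied into a single matrix before sampling, the image only needs to be resampled once, which is one reason this formulation is attractive for batched augmentation.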