to myself, to Paige, to Chris, to Ed, or others. Today we actually have a very short agenda and a very welcome guest. So with that, I'd like to hand it off to the man who needs no introduction, Jeremy, to talk about fastai v2. Thanks, Brandon. So this actually comes out of my enthusiasm for when Adam presented a little bit of Haskell Torch code a couple of weeks ago, which I thought was super cool. And so mainly my goal here is to encourage other people to present cool things in other languages and libraries, because I think it's a great way for us all to learn what cool stuff you can do. But as tends to happen when you say, can somebody please do X? Somebody else says, hey, why don't you do X first? So here I am doing X, where X is telling you about the library that Sylvain and I have been working on, basically since Chris Lattner and I finished our last Swift and fast.ai lesson, so for quite a while now. It's a library for PyTorch called fastai, and I think there are things we can learn from it regarding cool stuff we can do in Swift. But I'm gonna focus on trying to sell you on fastai rather than on the Swift bits, and where I think of Swifty things, I will mention them as we go. So fastai is a library, as I said, that sits on top of PyTorch. And a lot of people think that a higher-level API is this small little thing that you slap on top of the serious business of TensorFlow or PyTorch or whatever. But hopefully you'll be convinced, when I show you what's involved in a truly modern high-level API, that there's actually quite a lot going on. If you wanna check it out, I put a link to it in the meeting notes, and that will link you to the notebooks, the development notebooks. So that's the first weird thing. What the hell are development notebooks? Well, this is an example of what fastai v2 source code looks like. It's written, as you see, in notebooks.
We are just having a little trouble actually doing that seeing part. Okay, so that probably means I failed to present my screen. Shall I endeavor to do that? That would be great. You should present your entire screen. Yeah, that explains a lot. There you go. Nope, nope. We don't. There we go. There we go, victory. All right, sorry about that. So yeah, here is an example of what the fastai v2 source code looks like. It has links, it has titles, it has pictures, it has code. And this may seem like a painful way to develop, because notebooks are designed for interactive stuff, not for normal development, but actually you'll find that this pixel shuffle also appears here in a standard layers.py module, which you can import in the usual way. So we've developed a new literate programming system that allows you to write code and have it automatically turned into nice modules, which even do things that most people don't bother to do because they're annoying unless they're automatic, like setting __all__ so it only exports the things that you want. Also coming out of that is automatic documentation. So all that gets turned into hyperlinked documentation, including links directly to the source code and automatic docstrings and parameter lists. Also you'll see tests, and the tests are used both to document the expected behavior (so if you're not sure what pixel shuffle is, this test is a very good description of exactly what it is) and to ensure that our code is working, and those tests can all be put in continuous integration and so forth. So that's the first interesting thing about fastai v2: it's the first truly literate-programming-based system I've worked on, and it's been an absolute delight. Now, we've written our own framework for every part of this, which is kind of a theme for fastai v2.
Basically every time Sylvain and I found something that didn't quite work the way we wanted it at any part of the stack, we wrote our own. So it's kind of like building something with no particular deadline and trying to do everything the very, very best we can. So the layered API of fastai v2 starts at the applications layer, which is where most beginners will start, and it looks a lot like fastai v1, which is the released version of the software that people have seen before. But in v2, everything is rewritten from scratch. It's totally new, there's no code borrowed, but the top-level API looks quite similar. The idea is that in one, two, three, four lines of code you can create a state-of-the-art computer vision classifier, including transfer learning. With nearly the same one, two, three, four lines of code (five lines of code in this case, because we're also displaying), you can create a state-of-the-art segmentation model. And actually, when I say state-of-the-art, this segmentation model, for example, is to the best of my knowledge still better than any published result on this particular CamVid dataset. So these five lines of code are super good five lines of code. And as you can see, it includes a line of code where, if you say show_batch, it will display your data in an appropriate format, in this case showing you segmentation: a picture with the color-coded pixels overlaid on top of it. The same basic four lines of code will do text classification. So here's the basis of ULMFiT, which is a system that we developed and wrote up along with Sebastian Ruder for transfer learning in natural language processing. And as you can see in here, this is working on IMDb, on a single epoch, in four minutes. The accuracy here is basically what was the state of the art as of a couple of years ago. Tabular or time series analysis, same deal.
Basically a few lines of code, nearly exactly the same lines of code, and you'll get a great result from your tabular data, and ditto for collaborative filtering. So the high-level API for fastai v2 is designed to be something where, regardless of what application you're working on, you can get a great result from it using sensible defaults and carefully selected hyperparameters, largely done for you automatically for the most common kinds of problems that people look at. And that bit doesn't look that different to v1, but understanding how we get to that is kind of interesting and involves getting deeper and deeper. This approach does work super well, though, and partly it's because this is based on quite a few years of research to figure out what the best ways are to solve various problems along the way. And when people actually try using fastai v2, they're often surprised. So this person posted on our forum that they'd been working in TF2 for a while, and they couldn't figure out why all of their models were suddenly working much better. And the answer is basically that they're getting all these nice curated best practices. And somebody else on Twitter saw that and said, yep, we found the same thing. We were trying TensorFlow, spent months tweaking, and then we switched to fastai. A couple of days later, we were getting better results. So these carefully curated defaults and algorithms and high-level APIs that do things right for you the first time can give you better results faster, even for experienced practitioners. But it's actually the other pieces that are, I think, more interesting for a Swift conversation, because the deeper we go into how we make that work, the more stuff you'll see which will be a great fit, I think, with Swift. So the mid-layer API is something which is largely new to fastai... Actually, I guess the foundation layer is new.
So the mid-layer, I guess I'd say, is more rewritten from v1, and it contains some of the things that make those high-level APIs easy. One of the bits which is the most interesting is the training loop itself. And I thank Sylvain for the set of slides we have for the training loop. This is what a training loop looks like in PyTorch. We calculate some predictions. We get a loss. We do a backward pass to get the gradients. We do an optimizer step. And then, from time to time, we zero the gradients, depending on whether we're accumulating. So this is what that loop looks like. Run the model, get the loss, do the gradients, step the optimizer, do that a bunch of times. But if you wanna do something interesting, you'll need to add something to the loop, to keep track of your training statistics in TensorBoard or in fastprogress or whatever. You might wanna schedule various hyperparameters in various different ways. You might wanna add various kinds of regularization. You may wanna do mixed precision training. You may wanna do GANs. So this is a problem, because either you have to write a new training loop every time you wanna add a different tweak, and making all those tweaks work together then becomes incredibly complicated. Or you try and write one training loop which does everything you can think of. This is the training loop for fastai 0.7, which only did a tiny subset of the things I just said but was still getting ridiculous. Or you can add callbacks at each step. Now the idea of callbacks has been around in deep learning APIs for a long time. But what's very different about fastai is that every callback is actually a two-way callback. It can read absolutely everything. It can read gradients, parameters, data, and so forth. And it can write them. So it can actually change anything at any time. So the callbacks are, we say, infinitely flexible.
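To make the two-way callback idea concrete, here is a minimal pure-Python sketch, not fastai's actual code. The `Learner` here is a toy that fits y = w * x, and the event names (`begin_batch`, `after_loss`, `after_backward`, `after_step`, `after_epoch`) and callback classes are assumptions chosen for illustration. The point it tries to show is that the loop keeps the same five steps, and each callback receives the learner itself, so it can both read and overwrite the learning rate, the gradient, or a stop flag.

```python
class Callback:
    """Base class: every event is a no-op unless a subclass overrides it."""
    def begin_batch(self, learn): pass
    def after_loss(self, learn): pass
    def after_backward(self, learn): pass
    def after_step(self, learn): pass
    def after_epoch(self, learn): pass


class Learner:
    """Toy learner fitting y = w * x; a stand-in for a real model and optimizer."""
    def __init__(self, data, lr, cbs=()):
        self.data, self.lr, self.cbs = data, lr, list(cbs)
        self.w, self.grad, self.loss = 0.0, 0.0, None
        self.epoch, self.iteration, self.stop = 0, 0, False

    def _event(self, name):
        # Each callback gets the learner itself, so it can read AND write any state.
        for cb in self.cbs:
            getattr(cb, name)(self)

    def fit(self, epochs):
        for epoch in range(epochs):
            self.epoch = epoch
            for x, y in self.data:
                self._event("begin_batch")
                pred = self.w * x                   # 1. predictions
                self.loss = (pred - y) ** 2         # 2. loss
                self._event("after_loss")
                self.grad = 2 * (pred - y) * x      # 3. backward pass
                self._event("after_backward")
                self.w -= self.lr * self.grad       # 4. optimizer step
                self._event("after_step")
                self.grad = 0.0                     # 5. zero the gradients
                self.iteration += 1
            self._event("after_epoch")
            if self.stop:
                break


class LRScheduler(Callback):
    """Writes the learning rate before every batch."""
    def __init__(self, sched): self.sched = sched
    def begin_batch(self, learn): learn.lr = self.sched(learn.iteration)


class GradClip(Callback):
    """Overwrites the gradient after the backward pass."""
    def __init__(self, max_grad): self.max_grad = max_grad
    def after_backward(self, learn):
        learn.grad = max(-self.max_grad, min(self.max_grad, learn.grad))


class EarlyStop(Callback):
    """Reads the last loss and asks the loop to stop."""
    def __init__(self, tol): self.tol = tol
    def after_epoch(self, learn):
        if learn.loss < self.tol:
            learn.stop = True


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learn = Learner(data, lr=0.1,
                cbs=[LRScheduler(lambda i: 0.05), GradClip(5.0), EarlyStop(1e-6)])
learn.fit(200)
```

None of the three tweaks required touching `fit`, which is the property the talk is describing; a real implementation would of course pass tensors, an optimizer object, and many more events.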
We feel pretty confident in that because the training loop in fastai has not needed to be modified to do any of the tweaks that I showed you before. So even the entirety of training GANs can be done in a callback. So basically we switch out a basic training loop and replace it with one with the same five steps, but with callbacks between every step. So that means, for example, if you wanna do a scheduler, you can define a batch-begin callback that sets the optimizer's learning rate to some function. Or if you wanna do early stopping, you can write an on_epoch_end that checks the metrics and stops training. Or you can do parallel training: set up data parallel at the start of training and, at the end of training, take data parallel off again. Gradient clipping: you have access to the parameters themselves, so you can clip the gradient norms at the end of the backward step, and so forth. So all of these different things have been written with fastai callbacks, including, for example, mixed precision. All of NVIDIA's recommendations for mixed precision training will be added automatically if you just add a to_fp16 at the end of your learner call. And really importantly, for example, all of those mixed precision things can be combined with multi-GPU and one-cycle training and gradient accumulation and so forth. And so trying to create a state-of-the-art model, which involves combining state-of-the-art regularization and mixed precision and distributed training and so forth, is a really, really, really hard job. But with this approach, it's actually just a single extra line of code to add each feature, and they are all explicitly designed to work with each other and are tested to work with each other. So for instance, here is mixup data augmentation, which is an incredibly powerful data augmentation method that has powered lots of state-of-the-art results. And as you can see, it's well under a screen of code. By comparison, here is the version of mixup from the paper.
Not only is it far longer, but it only works with one particular dataset and one particular optimizer and only one particular kind of metric, and it is full of all kinds of assumptions and so forth. So that's an example of these mid-tier APIs. Another one is the optimizer. It looks like there have been lots and lots of different optimizers appearing in the last year or two. But actually it turns out that they're all minor tweaks on each other. Most libraries don't write them this way. So for example, AdamW, also known as decoupled weight decay Adam, was added to PyTorch quite recently, in the last month or two. And it required writing a whole new class and a whole new step to implement. And it was like two or three years after the paper was released. On the other hand, fastai's implementation, as you can see, involves a single extra function containing two lines of code and this little bit of gray here. So it's kind of like two and a half, three lines of code to implement the same thing. Because what we did was we realized, let's refactor the idea of an optimizer, see what's different for each of these state-of-the-art optimizers that have appeared recently, and make it so that each of those things can be added and removed by just changing two things: stats and steppers. A stat is something that you measure during training, such as the gradients or the gradients squared, or you might use dampening or momentum or whatever. And then a stepper is something that uses those stats to change the weights in some way. And you can combine those things together, and by combining these, we've been able to implement all these different optimizers. So for instance, the LAMB optimizer, which came out of Google and was super cool at reducing pre-training time from three days to 76 minutes, we were able to implement in this tiny piece of code.
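The stats-and-steppers refactoring described above can be sketched in plain Python roughly as follows. This is an illustrative sketch under stated assumptions, not fastai's optimizer: parameters are plain floats rather than tensors, and the names `Optimizer`, `average_grad`, `sgd_step`, `momentum_step` and `weight_decay` are chosen for the example. A stat updates a per-parameter state dictionary; a stepper uses the gradient and that state to return a new weight.

```python
def average_grad(state, grad, mom=0.9, **_):
    """Stat: running (momentum-style) average of the gradient."""
    state["grad_avg"] = mom * state.get("grad_avg", 0.0) + grad
    return state


def sgd_step(p, grad, state, lr, **_):
    """Stepper: plain gradient descent."""
    return p - lr * grad


def momentum_step(p, grad, state, lr, **_):
    """Stepper: step along the averaged gradient kept by `average_grad`."""
    return p - lr * state["grad_avg"]


def weight_decay(p, grad, state, lr, wd=0.0, **_):
    """Stepper: decoupled weight decay, shrinking the weight directly."""
    return p * (1 - lr * wd)


class Optimizer:
    """Runs every stat, then every stepper, for each parameter."""
    def __init__(self, params, steppers, stats=(), **hypers):
        self.params = list(params)
        self.steppers, self.stats, self.hypers = list(steppers), list(stats), hypers
        self.state = [{} for _ in self.params]

    def step(self, grads):
        for i, grad in enumerate(grads):
            for stat in self.stats:
                self.state[i] = stat(self.state[i], grad, **self.hypers)
            for stepper in self.steppers:
                self.params[i] = stepper(self.params[i], grad, self.state[i], **self.hypers)


# Three different optimizers from the same class, differing only in their lists.
sgd = Optimizer([1.0], [sgd_step], lr=0.1)
sgd_wd = Optimizer([1.0], [weight_decay, sgd_step], lr=0.1, wd=0.5)
sgd_mom = Optimizer([1.0], [momentum_step], stats=[average_grad], lr=0.1, mom=0.9)

for opt in (sgd, sgd_wd, sgd_mom):
    opt.step([2.0])
```

In this framing, adding decoupled weight decay to any optimizer is one more entry in the stepper list, which is the kind of saving the talk attributes to the design.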
And one of the nice things is that when you compare it to the math, it really looks almost line-for-line identical, except ours is a little bit nicer because we refactored some of the math. So it makes it really easy to do research as well, because you can quite directly bring the equations across into your code. Then the last of the mid-tier APIs is the data block API, which is something we had in version one as well. But when we were porting that to Swift, we had an opportunity to rethink it, and Alexis Gallagher in particular helped us to rethink it in a more idiomatically Swifty way. And it came out really nicely. And so then we took the result of that and ported it back into Python, and we ended up with something that was quite a bit nicer. So there's been a nice interplay between fastai in Python and SwiftAI in Swift in terms of helping each other's APIs. But basically the data block API is something where you define each of the key things that the program needs to know to flexibly get your data into a form you can put in a model. So it needs to know: what type of data do you have? How do you get that data? How do you split it into a training set and a validation set? And then it puts that all together into a DataBunch, which is just a simple little class. It's literally, I think, four lines of code, which just has the validation set and the training set in one place. So with a data block, you just say: okay, my types, I wanna create a black-and-white Pillow image for my X and a category for my Y; to get the list of files for those, I need to use this function; to split those files into training and validation, use this function, which looks at the grandparent directory name; and to get the labels, use this function, which uses the parent's path name. And so that's enough to give you MNIST, for instance. And so once you've done this, you end up with a DataBunch.
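The questions a data block answers (how to get the items, how to split them, how to label them) can be sketched in pure Python as below. This is a hedged illustration, not the fastai API: `DataBlock`, `DataBunch`, `grandparent_splitter` and `parent_label` are names assumed for the example, the "files" are an in-memory list of path strings rather than a directory on disk, and the type information (image for X, category for Y) is left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class DataBunch:
    """Just holds the training set and the validation set in one place."""
    train: list
    valid: list


def grandparent_splitter(items, valid_name="testing"):
    """Split by the grandparent directory name, e.g. mnist/testing/3/img.png."""
    is_valid = [PurePosixPath(p).parent.parent.name == valid_name for p in items]
    train_idx = [i for i, v in enumerate(is_valid) if not v]
    valid_idx = [i for i, v in enumerate(is_valid) if v]
    return train_idx, valid_idx


def parent_label(path):
    """Label an item with the name of the directory that contains it."""
    return PurePosixPath(path).parent.name


class DataBlock:
    """Bundles the functions that say how to get, split and label the data."""
    def __init__(self, get_items, splitter, get_y, get_x=lambda item: item):
        self.get_items, self.splitter = get_items, splitter
        self.get_x, self.get_y = get_x, get_y

    def databunch(self, source):
        items = self.get_items(source)
        train_idx, valid_idx = self.splitter(items)

        def build(idxs):
            return [(self.get_x(items[i]), self.get_y(items[i])) for i in idxs]

        return DataBunch(build(train_idx), build(valid_idx))


# Hypothetical file listing laid out like MNIST: split/label/filename.
files = ["mnist/training/3/a.png", "mnist/training/7/b.png", "mnist/testing/3/c.png"]
mnist = DataBlock(get_items=sorted, splitter=grandparent_splitter, get_y=parent_label)
bunch = mnist.databunch(files)
```

Swapping the labelling function (for instance for a regular-expression labeller) or the splitter changes the dataset definition without touching anything else, which is the flexibility the talk goes on to demonstrate.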
And as I mentioned before, everything has a show_batch. So one of the nice things is it makes it very easy for you to look at your data, regardless of whether it's tabular or collaborative filtering or vision or text or even audio. If it was audio, it would show you a spectrogram and let you play the sound. So you can do custom labeling with data blocks by using, for example, a regular expression labeler. You can get your labels from an external file or data frame, and they could be multi-labels. So this thing here knows it's a multi-label classification task, so it has automatically put a semicolon between each label. Again, it's still basically just three lines of code to define the data block. So here's a data block for segmentation. And you can see really the only thing I had to change here was that my dependent variable has been changed from category to Pillow mask. And again, our show_batch automatically works, and we can train a model from that straight away as well. You could do key points. So here I've just changed my dependent variable to tensor point, and so now it knows how to behave with that. Object detection: so now I change my dependent variable to bounding box, and you can see I've got my bounding boxes here. Text, and so forth. So actually, going back, I have a couple of questions, if that's okay. So in the code you've got the X's and Y's, and it sounds like these different data types roughly conform to a protocol. Yep, we're going to get to that in a moment. Absolutely. That's an excellent way to think of it. And actually this is the way it looked about three weeks ago. Now it looks even more like a protocol. So yes, this is where it all comes from, which is the foundation APIs. And this is the bit that I think is the most relevant to Swift. A lot of this, I think, would be a lot easier to write in Swift. So the first thing that we added to PyTorch was object-oriented tensors.
For too long, we've all been satisfied with a data type called tensor, which has no semantics to it. And yet those tensors actually represent something, like a sentence or a picture of a cat or a recording of somebody saying something. So why can't I take one of those tensors and say dot flip or dot rotate or dot resample or dot translate to German? Well, the answer is you can't, because it's just a tensor without a type. So we have added types to tensors. So you can now have a tensor image, a tensor point, a tensor bounding box, and you can define a flip left-right for each. And so this is some of the source code from our own computer vision library, which we've written so that now you can say flip_lr and it flips the puppy. And if it was key points or a bounding box, it would flip the key points or the bounding boxes, and so forth. So this is an example of how tensors which carry around semantics are nice. It's also nice that I can just say dot show, right? So dot show is something that's defined for all fastai v2 tensor types, and it will just display that tensor. It could even be a tuple containing a tensor and some bounding boxes and some bounding box classes; whatever it is, it will be able to display it. It'll be able to convert it into batches for modeling, and so forth. So with that, we can now create, for example, a random transformation called FlipItem, and we can say that the encoding of that random transformation is defined for a Pillow image or any tensor type. And in each case, the implementation is simply to call x.flip_lr. Or we could do the dihedral symmetry transforms in the same way. Before the call, we grab a random number between zero and seven to decide which of the eight transposes to do, and then encodes calls x.dihedral with that number we just got. And so now we can call that transform a bunch of times, and each time we'll get back a different random orientation. So a lot of these things become nice and easy.
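As a rough illustration of data that carries its own semantics, here is a small pure-Python sketch. It is not fastai's code: plain `list` subclasses stand in for tensor subclasses, and the names `Image`, `Points` and `FlipItem` (and its `p` and `rng` parameters) are assumptions made for the example. Each type defines what "flip left-right" means for it, and the random transform simply calls that method on whatever it is given.

```python
import random


class Image(list):
    """A grid of pixel rows that knows it is an image."""
    def flip_lr(self):
        # Flipping an image reverses every row of pixels.
        return type(self)(row[::-1] for row in self)


class Points(list):
    """(x, y) key points, with x and y scaled to the range [-1, 1]."""
    def flip_lr(self):
        # Flipping points mirrors the x coordinate and leaves y alone.
        return type(self)((-x, y) for x, y in self)


class FlipItem:
    """Random transform: with probability p, flip whatever semantic type it gets."""
    def __init__(self, p=0.5, rng=None):
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, x):
        return x.flip_lr() if self.rng.random() < self.p else x


img = Image([[1, 2, 3], [4, 5, 6]])
pts = Points([(0.5, 0.1), (-1.0, 0.0)])

always_flip = FlipItem(p=1.0)
flipped_img = always_flip(img)   # rows reversed, still an Image
flipped_pts = always_flip(pts)   # x negated, still Points
```

The transform itself contains no image-specific or point-specific logic; the behaviour lives with the type, which is the design being argued for in the talk.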
Hey Jeremy, Maxim asked, why isn't tensor a backing data structure for an image type? A tensor image is a tensor which is an image type. He says, why not have a different type named image, I guess, that has a tensor inside of it? Do you mean why inherit rather than compose? Apparently, yes, that. Yeah. So, I mean, you can do both, and you can create identical APIs. Inheritance just has the benefit that all the normal stuff you can do with a tensor, you can do with a tensor that happens to be an image. So just because a tensor is an image doesn't mean you now don't want to be able to do fancy indexing on it, or do an LU decomposition of it, or stack it with other tensors across some axis. So basically a tensor image ought to have all the behavior of a tensor plus additional behavior. So that's why we used inheritance. We have a version that uses composition as well, and it uses Python's nice __getattr__ functionality to pass on all of the behavior of tensor, but it comes out more nicely in Python when you do inheritance, and actually the PyTorch team has decided to officially implement semantic tensor subtypes now. And so hopefully in the next version of PyTorch you won't have to use the extremely ugly hacks that we had to use to make this work, and you'll be able to use the real ones. And hopefully you'll see some of these ideas brought over to torchvision. Can I ask you, how does the type propagate? So if you do arithmetic on an image tensor, do you get an image tensor back? So Chris and I had a conversation about this a few months ago, and I said I'm banging my head around this issue of types not carrying around their behavior, and Chris casually mentioned, oh yes, that thing is called higher-kinded types. So I went home, and that was one of these phrases that I thought only functional programming dweebs talked about and that I would never have to care about.
You have to care about it, because it actually matters a lot, and it's basically the idea that if you have a tensor image and you add one to it, you wanna get back a tensor image, because it should be an image that's a bit brighter rather than something that loses its type. So we implemented our own, again, hacky partial higher-kinded type implementation in fastai v2. So for any of these things that you do to a tensor of a subtype, you will nearly always get back the correctly subtyped tensor. Yeah, I mean, I saw that PyTorch recently started talking about their named indexing extensions for their tensors as well. And they seem to have a similar kind of challenge there, where when you start doing arithmetic and other things like that on a tensor that has named dimensions, you wanna propagate those along. And I don't know how they type that. Yeah, so we haven't started using that yet, because it hasn't quite landed as stable. But we talked to the PyTorch team at the DevCon, and we certainly are planning to bring these ideas together. They're orthogonal but related concerns. Yeah, I just mean that I assume that feature has the same problem, the same challenge. I assume so, yeah. It would be interesting to see what they do. Yeah, yeah, it would. Yeah, so it's kind of nice. Not only do we get to be able to say dot show_batch, but you can even go dot show_results. And in this case, it knows what the independent variable's type is, and it knows what the dependent variable's type is. And it even knows things like, hey, for a classification task, those two things should be the same, and if they're not, by default I will highlight them. So these lower-level foundations are the things that drive our ability to easily add this higher-level functionality. So, you know, this is the kind of ugly stuff we wouldn't have to do in Swift. We had to write our own type dispatch system.
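A type dispatch system of the general kind mentioned here can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not fastai's implementation: the `TypeDispatch` class and the `show_*` functions are hypothetical names, dispatch looks only at the annotation of the first parameter, and every registered function is assumed to take at least one argument.

```python
import inspect


class TypeDispatch:
    """Pick an implementation from the type annotation of its first parameter."""
    def __init__(self, *funcs):
        self.funcs = {}
        for f in funcs:
            self.add(f)

    def add(self, f):
        first = next(iter(inspect.signature(f).parameters.values()))
        ann = first.annotation
        # An unannotated first parameter acts as the catch-all for any object.
        key = object if ann is inspect.Parameter.empty else ann
        self.funcs[key] = f
        return f

    def __call__(self, x, *args, **kwargs):
        # Walk the method resolution order so the most specific match wins.
        for cls in type(x).__mro__:
            if cls in self.funcs:
                return self.funcs[cls](x, *args, **kwargs)
        raise TypeError(f"no implementation for {type(x).__name__}")


class Image(list): pass
class Mask(Image): pass


def show_image(x: Image): return f"image with {len(x)} rows"
def show_text(x: str): return f"text: {x}"
def show_any(x): return f"object: {x!r}"


show = TypeDispatch(show_image, show_text, show_any)

print(show(Mask([[0, 1]])))   # Mask has no entry, so it falls back to Image
print(show("hello"))
print(show(3))
```

The standard library's `functools.singledispatch` covers the single-argument case in a similar way; a hand-written version like this one is only a starting point for the extras the talk describes, such as dispatching over the elements of a tuple.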
And so we can annotate things with types, and those type annotations are actually semantic. And so we now have the joyfully modern idea of function overloading in Python, which has made life a lot easier. And we already have that. Do you have many users that are using this? Yeah. So it's still pre-release. It's not even alpha, but there is an enthusiastic early adopter community who is using it. So for example, the user-contributed audio library has already been ported to it. I've also built a medical imaging library on top of it, and I've written a series of five notebooks showing how to do CT scan analysis with it. So it's kind of like, it works. Well, I was curious what your users think of it, because there's this very strongly held conception that Python folks hate types, and you're kind of providing a little bit of typing. Yeah. And I'm curious how they react to that. The extremely biased subset of early adopter fastai enthusiasts who are using it love it. And they tend to be people who have gone pretty deep in the past. So for example, my friend Andrew Shaw, who wrote something called MusicAutobot, which is one of the coolest things in the world in case you haven't seen it yet, which is something where you can generate music using a neural network. You can put in some melodies and some chords and it will autocomplete some additional melodies and chords, or you can put in a melody and it will automatically add chords, or you can add chords and it will create a melody. And so he had to write his own MIDI library, fastai.midi. He rewrote it in v2, and he said it's just so, so, so much easier thanks to those mid-tier APIs. So yeah, at this stage, it's easy. I was just gonna jump in quick. I've been helping with some of the audio stuff and it's been really awesome. It makes things a lot more flexible than version one. So that's probably my favorite thing about it: everything can be interchanged.
Nothing is like, well, it's gotta be this way because that's how it is. Yep, that's cool. Cool, thanks. Another piece of the foundation is the partially reversible composed function pipeline dispatched over collections, which really rolls off the tongue, so we call them transform and pipeline. Basically the idea is that the way you want function dispatch and function composition to work in deep learning is a little different to other places. There's a couple of things. The first is you often want to dispatch over tuples. And what I mean by that is, if you have a function called flip left-right and you have a tuple representing a mini-batch where your independent variable is a picture and your dependent variable is a set of bounding boxes, then if you say flip left-right on that tuple, you would expect both the X and the Y to be flipped, and to be flipped with the type-appropriate method. So our transforms will automatically send each element of a tuple through the function separately, and dispatch according to their types automatically. We've mentioned type retention, so that's the kind of basic type stuff we need. One interesting thing is that you need not only encoding, in other words applying the function; you often need to be able to decode, which is to kind of de-apply the function. So for example, a categorization transform would take the word dog and convert it to the number one, perhaps, which is what you need for modeling. But then when your predictions come back, you need to know what one represents. So you need to reverse that transform and turn one back into dog. Often those transforms also need data-driven setup. For example, in that example of dog becoming one, there needs to be something that actually creates that vocab automatically, recognizing what all the possible classes are, so it can create a different index for each one and then apply that to the validation set.
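The three properties just described (applying element-wise over tuples, being reversible, and needing data-driven setup) can be sketched together in pure Python. This is a simplified illustration, not fastai's `Transform`: the attribute names `enc_type` and `dec_type` and the method names `setup`, `encodes` and `decodes` are assumptions for the example, and type matching is a plain `isinstance` check rather than full type dispatch.

```python
class Transform:
    """A callable with an inverse, applied to each element of a tuple separately."""
    enc_type = object   # types that encodes() should touch
    dec_type = object   # types that decodes() should touch

    def setup(self, items): pass
    def encodes(self, x): return x
    def decodes(self, x): return x

    def _apply(self, func, typ, x):
        if isinstance(x, tuple):
            return tuple(self._apply(func, typ, o) for o in x)
        # Elements of a non-matching type pass through unchanged.
        return func(x) if isinstance(x, typ) else x

    def __call__(self, x):
        return self._apply(self.encodes, self.enc_type, x)

    def decode(self, x):
        return self._apply(self.decodes, self.dec_type, x)


class Categorize(Transform):
    """Maps label strings to integer indices and back again."""
    enc_type, dec_type = str, int

    def setup(self, items):
        # Data-driven setup: build the vocab from the training labels.
        self.vocab = sorted(set(items))
        self.o2i = {label: i for i, label in enumerate(self.vocab)}

    def encodes(self, x): return self.o2i[x]
    def decodes(self, x): return self.vocab[x]


cat = Categorize()
cat.setup(["dog", "cat", "dog"])          # vocab becomes ["cat", "dog"]

encoded = cat(([0.1, 0.2], "dog"))        # only the label element is changed
decoded = cat.decode(encoded)             # and it can be turned back again
```

Because the vocab is state held on the transform after `setup`, the same object can then be applied to validation labels and used later to decode predictions.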
And quite often these transforms also have some kind of state, such as the vocab. So we built this bunch of stuff that builds on top of each other. At the lowest level is a class called transform, which is a callable which also has a decode. It does the type retention, higher-kinded type thing, and it does the dispatch over tuples by default. So then a pipeline is something that does function composition over transforms. And it knows about, for example, setting up transforms. And setting up transforms in a pipeline is a bit tricky, because you have to make sure that at each level of the pipeline, only the previous steps have been applied before you set up the next step. So it does little things like that. And then we have something that applies a pipeline to a collection to give you an indexable, lazily transformed collection. And then you can do those in parallel to get back an independent variable and a dependent variable, for instance. And then finally we've built a data loader which will apply these things in parallel and create collated batches. So in the end, all this stuff makes a lot of things much easier. For example, the language model data loader in fastai v1 was like pages of code. In TensorFlow, it's pages of code. In fastai v2, it's less than a screen of code, by leveraging these powerful abstractions and foundations. So then finally, and again this is something I think Swift will be great for, we worked really hard to make everything extremely well optimized. So for example, for pre-processing in natural language processing, we created a parallel generator in Python, which you can basically pass a class that defines some setup and a call, and it can automatically parallelize that. So for example, tokenization is done in parallel in a pretty memory-efficient way. Excuse me. But perhaps the thing I'm most excited about, both in Python and Swift, is the optimized pipeline running on the GPU. So pretty much all of the transforms we've done can, and by default do, run on the GPU.
So for example, the flip left-right I showed you earlier will actually run on the GPU. As will warp, as will zoom, as will even things like crop. So one of the basics of this is the affine coordinate transform, which uses affine grid and grid sample, which are very powerful PyTorch functions, and which would be great things to actually write with Swift for TensorFlow's new metaprogramming, because they don't exist in TensorFlow, or at least not in any very complete way. But with these basic ideas, we can create this affine coordinate transform that lets us do a very wide range of data augmentations in parallel on the GPU. For those of you that know about the DALI library that NVIDIA created, this provides a lot of the same benefits as DALI. It's pretty similar in terms of its performance. But the nice thing is, all the stuff you write, you write in Python, not in CUDA. So with DALI, if they don't have the exact transformation you want, and there's a pretty high chance that they won't, then you're stuck. Whereas with fastai v2, you can write your own in a few lines of Python. You can test it out in a Jupyter notebook. It makes life super easy. So this kind of stuff, I feel like, because Swift is a much faster, more hackable language than Python, or at least hackable in the sense of performance, I guess not as hackable in terms of its type system necessarily, I feel like we can build even more powerful foundations and pipelines, and a real Swift for TensorFlow computer vision library, leveraging the metaprogramming and leveraging Swift Numerics, stuff like that, I think would be super cool. So that is the end of that. That was great. That was excellent. Thank you very much, Jeremy. My pleasure. So just sort of thinking through, as you're propagating along this self type amongst the transformations, that seems relatively straightforward for Swift to handle.
Are there other sorts of things that you think we should start thinking about now? Yeah, the thing I really want you to think about, and we've kind of been nagging you on and off since March, is the way that tensors are represented. Having them as a value type the way they are now makes some things hard or impossible. So the generic optimizer is a thing that I really, really want you guys to look into and build properly. Currently it uses ugly key path hacks, and it's only partially doing what we need it to do. So I talked to Alexis about this idea quite a bit, and we thought maybe there could be some type that represents the actual block of GPU memory in a way where we can easily share that. In practice, we've realized that the vast majority of the time we want to refer to that exact piece of memory on the GPU, not this idea of a tensor which may magically copy itself if I change something, you know. And so for example, with the generic optimizer, we need to be able to say, oh, this layer is part of this layer group, and this layer group has these things that need to happen to it. So I actually said to Ed, hey, could you please have a look at the SwiftAI generic optimizer, because it's trying to be a similar design to the fastai v2 optimizer, but it's currently pretty unattractive. The second is, I feel like creating a really good computer vision library is something which could be done now-ish. When I tried to do it, I was getting race conditions and freezes inside Swift, and I don't have the Swift skills to know where they were coming from or how to fix them.
It'd be nice if folks could, like, I think all of my answers are: go back to the stuff that we all built together back in March, April, May and try to start using it in real life, and build models with it and put them in production and see the bits where you get stuck. Because you'll find things like, oh, there's no grid sample, and oh, there's race conditions in the interaction with OpenCV, and the optimizer doesn't quite work properly, and that stuff. That makes sense. I think we're also trying to figure out right now what the right path is with the runtime. So we've historically been building on top of the TensorFlow runtime, which is great for a lot of reasons. It has a lot of functionality in the box. It does pretty much everything. On the other hand, the performance, particularly in eager mode, is not great. I think one of the things we're kicking around is the idea of going more directly into XLA. Yeah. Well, I think that's a thing that's- And XLA being a stepping stone towards MLIR and the bigger future, which is also coming. I mean, I think that's the thing that's been stopping us all from using stuff like Swift AI to actually build models, because the autodiff has memory leaks and the TensorFlow runtime is, I don't know, slow as molasses, and implements everything in six different ways in six different places and so forth. So yeah, I think everybody's going to be thinking about these higher-level APIs a lot more once the foundations are working better. Yeah. And so, I mean, the trade-off there is if we go in that direction now, XLA doesn't provide all the things in the box, but I think that's probably fine. We haven't fast-forwarded something, I'm just looking for that stuff that we need it. And so I think we're talking about that, trying to decide what to do there. We're also investing a lot in AD and finishing that off. Yeah. I mean, all the right work's being done. It's just, you know, it's just early days.
Yes, yeah. I think the challenge that we're really struggling with is this decision to stick with the TensorFlow runtime or to move on to something else. And that I think is complicated, but I agree this is one of the major blockers for adoption and use. Yeah. I mean, especially if you want to take advantage of Swift, which we do, you need something where, you know, the kernel launch time is tiny or, better still, kind of non-existent because you can write everything in Swift. Otherwise, yeah, you don't really get the benefits. Yeah. And so I'll answer your question in a second, but one of the trade-offs there is that XLA doesn't have really fast kernel launch time, because it effectively JIT-compiles things before launching them. On the other hand, there are a lot of opportunities to do, for example, fusion and other things like that that can offset it. And one of the nice hybrid models you get is this combination of tracing plus compilation, which I think could be really interesting. Yeah. Which is good to explore. Said asked, what's going on with MLIR? There's tons of stuff going on, it's really exciting. Just yesterday there was a really fantastic talk from some folks at Intel talking about their code generation algorithms that they're bringing over to MLIR, which I'm really, really, really excited about. And so there's tons of stuff going on. Getting the ideal code gen for NVIDIA GPUs, for example, is probably still six-plus months away. And I don't know how much plus that is, but what I'm encouraging is that the community come together and collaborate, instead of the different teams and the different companies kind of being frenemies. And the Intel stuff that they presented yesterday is super, super impressive. And so we'll see what happens with that. The other thing I might mention, in terms of tales from the other side: what's life like in the Python world.
Things that are and aren't working well over there. Kind of the answer to Swift for TensorFlow in the PyTorch world is JIT. So it's basically to trace your Python code and attempt to figure out what it's doing and create what they call TorchScript, which is a dialect subset of Python. Or else, to actually parse your Python code is also an option, and turn it into TorchScript. It has reached the point now where it can actually be used for good. So a bunch of our students have actually been working on a thing called Mish, including a young researcher who designed the original thing. It's a very nice activation function that's outperforming everything else on whatever anybody's trying it on, and it was pretty slow. And it just took me half an hour to create a JIT version, and it ran at the same speed as somebody else's hand-created code. So for small things like that, where it's two or three lines of code, that's working pretty well. Although for bigger things, like a new batch norm implementation we tried to do during the last course, the performance wasn't there. Or, like, one of the big problems at the moment, not just for Python but for the whole world of non-Google people, is that the best computer vision models by far are largely those that have been coming out of Google, like EfficientNet and MixNet, from Quoc Le's team. They run very slowly and with a lot of memory on GPUs. And so we tried wrapping an entire EfficientNet and MixNet into a JITted thing so it wouldn't be so slow. The MixNet didn't work at all and the EfficientNet was a little bit slower. So that's kind of the status of JIT in PyTorch: you know, bits of it are useful. The way I look at this from the compiler side is that, for the code generation pieces, I think the MLIR pieces are all going in the right direction. They're just going to take a while to get here. XLA, as far as I know, is state-of-the-art in code generation.
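For reference, the Mish activation mentioned above is usually defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). The sketch below is a scalar, pure-Python version for illustration only; it is not the JIT version described in the talk. In PyTorch the same two-line formula would be written with tensor operations and could then be handed to the TorchScript compiler.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + exp(x)) for a single float."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation for a single float: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

# Evaluate the function at a few points.
samples = [mish(v) for v in (-2.0, 0.0, 1.0, 5.0)]
```

The function is a short chain of elementwise operations, which is the kind of code where fusing everything into one kernel pays off most.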
For the things it does, it does quite well. The challenge with it, though, is that it does have sort of limitations, like static shapes and the number of ops it supports. And so you kind of have to be within its world for it to be useful. But it has a large subset of the world that it covers very well. It has a pretty useful world. My understanding is that the base model of TorchScript and the interpreter they have, I understand that's quite nice. But the kernel fusion piece is still fairly early on; it's mostly elementwise operations, for example. I don't find it that nice. I mean, simple things, and they're partly a limitation of the Python type system. So if you want to be able to write things that can work with different numbers of channels, well, you're out of luck, because they use Python type annotations, which have no way of saying it's a tuple of size n. You have to say a tuple of size three, so you then have to hard-code all these assumptions into your code. Lots of stuff I find pretty frustrating. I see, interesting. Well, so I mean, I think there are other spaces that I'm eager to reevaluate. I mean, this isn't the highest priority at this moment, but in terms of our APIs, there are still very legit questions around, should we encode dtype in the static type system, or should we just say Tensor, right? And if you just say Tensor, then you get rid of all the generics everywhere, which cleans up tons of code at the cost of losing some of the checking. But then I think if you go with the more semantic tensor types that Jeremy was pushing forward, you actually really don't even want the dtype; what you want is the semantics, and then you're actually in a better spot. Right, like for mixed precision, we're switching stuff from one type to another all the time. Depending on whether you're doing a loss function or a gradient calculation or whatever, you need to be changing between half and single.
So if we went in that direction, I think that would be really interesting in terms of ergonomics, but also simplification, which I think would be great. And also, like your point about the optimizers, the key paths have all kinds of weirdness because you have multiple dtypes and you want to be generic over dtype. And so that's really unpleasant right now. I think also, for Swift, wanting to bring over the big world of Python-using data scientists, they're definitely not going to be wanting to put lots and lots of verbose generic type annotations in their Jupyter notebooks. So I don't know when we'll have cycles to re-evaluate those APIs, but I think we should go do a fresh take on this and combine it with an XLA-based approach that changes a lot of the trade-offs. Right. It's really interesting. Yeah, I mean, I think in my mind, right, so a couple of weeks ago I presented the layering proposal to separate out LibTensor from LibDeepLearning, so that we can get the freedom to iterate at that level and have multiple explorations on top. So the progress update there is that we have the two different packages now in swift-apis, so you can depend on only one as opposed to the other. And Dan helped fix all the issues that I caused while doing the initial move of the random number generators out of what will become LibDeepLearning. That said, it's still very early and I have a lot more code to move. Well, I think Jeremy's fundamentally right that we need to spend more time with Swift AI and the optimizer designs and re-evaluate the training with callback systems and things like that. Yeah. As each of these variables changes, it affects other parts of the system, and different trade-offs, I think, should be re-evaluated in light of that. But I think that getting AD, like, bulletproof is super important. And performance. Yeah. We have to get those two things right.
Well, and upstreamed and integrated in Swift, so that we can build on it and take it for granted. Yeah. Quick question about Tensor dtype. Yeah. I wonder if we would add any type assertions in any functions. I think the Python model is to not check things and to let things crash at runtime, if I understand. I don't know. I mean, I think that there are a couple of different options there. I don't know what the right answer is, but again, one of the things that PyTorch is doing is more coercions with dtypes. So if you take an int8 and add an int32, it will actually promote to an int32, for example. I mean, it's not rocket science, but you know, that's the kind of thing that's just very nice and it just eliminates a certain kind of error. On the other hand, it's kind of like broadcasting, where it makes certain things just work at the cost of potentially being surprising in some cases. I don't know about that. I think for a few things that don't make sense, like if you try to do a floating-point operation on an integer, then you would want it to be a runtime error. I think that our model is turning towards a much more runtime-centric approach. I think, ironically, Swift for TensorFlow started out very static, but is now, like, turning very dynamic. Yeah, for me, I'm realizing one of the major benefits of having a fast language is that dynamic is free. And so now you can have super dynamic abstractions, and you can do these things in a nice way. In PyTorch, you do get a pretty clear runtime error if there's a type mismatch. It doesn't just crash, it will tell you what it expected and what it got. Yeah. And one of the nice things about eager mode is that you then get a stack trace. I think there are other ways, instead of encoding things into the static type system that you have to adhere to, right?
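The two runtime behaviors discussed above, dtype promotion and a clear error that reports what was expected and what was received, can be sketched in plain Python. This is a simplified illustration, not PyTorch's or Swift for TensorFlow's actual promotion rules: the dtype names are strings, `promote` and `require_float` are hypothetical helpers, and the rule that any float dtype wins over any integer dtype is an assumption made for the sketch.

```python
# Hypothetical dtype names, ordered from narrowest to widest within each kind.
INT_ORDER = ["int8", "int16", "int32", "int64"]
FLOAT_ORDER = ["float16", "float32", "float64"]

def promote(a, b):
    """Return the dtype both operands are coerced to (simplified sketch)."""
    for name in (a, b):
        if name not in INT_ORDER and name not in FLOAT_ORDER:
            raise TypeError(f"unknown dtype: {name}")
    a_is_float, b_is_float = a in FLOAT_ORDER, b in FLOAT_ORDER
    if a_is_float and b_is_float:
        return max(a, b, key=FLOAT_ORDER.index)   # wider float wins
    if a_is_float:
        return a                                  # float beats integer
    if b_is_float:
        return b
    return max(a, b, key=INT_ORDER.index)         # wider integer wins

def require_float(dtype, op_name):
    """Fail at runtime with a message saying what was expected and received."""
    if dtype not in FLOAT_ORDER:
        raise TypeError(
            f"{op_name}: expected a floating-point dtype but got {dtype}"
        )

result = promote("int8", "int32")   # the int8 plus int32 case from the talk

try:
    require_float("int32", "sqrt")
    message = ""
except TypeError as err:
    message = str(err)
```

Checking at runtime keeps function signatures free of generic parameters, at the cost of finding the mismatch only when the offending line actually runs.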
I think Adam's work on just fitting perfectly shows that you can still get a lot of the benefits of static analysis without necessarily encoding things into the type system. That said, I think it's still an open question as to how far we can really push that and where we end up landing. Yeah. I think it's just a really great opportunity to re-evaluate these things as other pieces are coming together. Maxim asks, why is runtime tracking preferable over static analysis? I think it's more that we're still trying to figure out what dimensions you want to be flexible on. And so doing things dynamically is sort of the ultimate in flexibility. And so as we're trying to iterate on the programming model, making sure that things are as dynamic as you want them to be is sometimes nice. And then we should think about how static analysis can help catch errors sooner. Yeah, exactly. And so this is just a spectrum. And it's not that one end of the spectrum is better than the other. It's about where in the spectrum you end up. And then Nicholas's question: Nicholas asks, how are MLIR and XLA related? That is a super complicated question, because we're actively re-implementing pieces of XLA in terms of MLIR. So that's actually a lot more complicated than it sounds. I would just say that MLIR is a broad-scale compiler technology that solves lots of problems. XLA, as a name, is typically thought of as the thing that turns tensors into efficient code. And so I wouldn't over-index on the number of letters, I guess. And once Swift has one of those on top of MLIR, we'll still use XLA to target TPUs. Yeah, so, I mean, this is internal work, but we're doing a lot to change and enhance the TPU software stack in XLA. And things that are XLA are changing in their implementation as well. And so there's a big investment going on in all these pieces right now.
And I think that more generally, again, if you ignore which letters get attached to them, the effort here culminates in a much more flexible code generation stack, with support for dynamic shapes and custom ops and things like that. It's just that different pieces in this very complicated technology come together at different points in time. I don't know what the marketing, the crack compiler marketing team, will end up labeling the resultant thing. Excellent, we're slightly over time. So I just wanted to, you know, unless there are any pressing questions, thank everyone for joining, and see you all next week. I think next week, Mark will be up talking about some of his work on testing the autodiff system to ensure that it's really reliable. There are some pretty good things that Mark's been up to there. It's also exciting that AD is getting upstreamed to master too, which is really cool. Yeah. Thanks everyone, have a great week and see you all next week. Thank you.