Welcome to the final lesson of this section of 2019. Although I guess it depends on the videos, what order we end up doing the extra ones. This is the final one that we're recording live here in San Francisco. Anyway, lesson 14, lesson two of our special Swift episodes. This is what we'll be covering today. I won't read through it all, but basically we're going to be filling in the gap between matrix multiplication and training ImageNet with all the tricks. And along the way, we're going to be seeing a bunch of interesting Swift features and actually seeing how they make our code cleaner, safer, faster. I want to do a special shout-out to a couple of our San Francisco study group members who have been particularly helpful over the last couple of weeks, since I know nothing about Swift. It's been nice to have some folks who do, such as Alexis, who has been responsible for some of the most exciting material you're going to see today. And he is the CTO at Topology Eyewear. So if you need glasses, you should definitely go there and get algorithmically designed glasses, literally. So that's pretty cool. So thanks, Alexis, for your help. And thanks also to Pedro, who has almost single-handedly created this fantastic package cache that we have, so that in your Jupyter notebooks you can import all the other modules that we're using, and other modules exported from the notebooks, and it doesn't have to recompile it all. And so that's really thanks to Pedro. And I actually am a customer of his as well, or at least I was when I used an iPhone. He's the developer of Camera+, which is the most popular camera application on the App Store, literally. And back when I used an iPhone, I loved that program. So I'm sure version two is even better, but I haven't tried version two. So you can use his camera while looking through your Topology Eyewear glasses. All right. So thanks to both of you. And where we left off last week was that I made a grand claim.
Well, I pointed out a couple of things. I pointed out, through this fantastic Halide video, that actually running low-level, kind of CUDA-kernel-y stuff fast is actually much harder than just running a bunch of for loops in order. And I showed you some stuff based on Halide, which showed some ways you can write it that run slowly and some ways that run fast. And then I made the bold claim that being able to do this on the GPU through Swift is where we're heading. And so to find out how that's going to happen, let's hear it directly from Chris. Sure. Thanks, Jeremy. So we will briefly talk about this. So we went through this video, and the author of Halide gave a great talk about how, in image processing kernels, there's actually a lot of different ways to get the computer to run this, and they all have very different performance characteristics, and it's really hard to take even a two-dimensional blur and make it go fast. But we're doing something even harder. We're not talking about two-dimensional images. We're talking about 5D matrices and tensors, and lots of different operations that are composed together, and hundreds or thousands of ops, and trying to make that all go fast is really, really, really, really hard. So if you wanted to do that, what you'd do is you'd write a whole new compiler, and it would take years and years of time. But fortunately, there's a great team at Google called the XLA team that has done all this for us. And so what XLA is, is exactly one of those things. It's something that takes in this graph of tensor operations, things like convolutions and matmuls and adds and things like that. It does low-level optimizations to allocate buffers, to take these different kernels and fuse them together. And then it generates really high-performance code that runs on things like CPUs, GPUs, or TPUs, which are crazy-fast high-performance accelerators that Google has.
And so XLA does all this stuff for us now, which is really exciting. And if you take the running BatchNorm example that we left off with and were talking about, this is the graph that XLA will generate for you. And this is generated from Swift code, actually. And so you can see here, what these darker boxes are is they're fusion nodes, where it's taken a whole bunch of different operations, pushed them together, gotten rid of memory transfers, pushed all the loops together. And the cool thing about this is this is all existing, shipping technology that TensorFlow has now. There's a big question, though, and a big gotcha, which is this only works if you have graphs. And with TensorFlow 1, that was pretty straightforward, because TensorFlow 1 was all about graphs. Jeremy talked about the shipping, shipping, shipping, ship, ship, shipping thingy. Ship, ship, shipping, ship, ship. I don't know. My recursion's wrong. And so with TensorFlow 1, it was really natural. With TensorFlow 2, with PyTorch, there's a bigger problem, which is with eager mode, you don't have graphs. That's the whole point: you run one op at a time, and so you don't get these graphs. So what the entire world has figured out is that there's two basic approaches to getting graphs from eager mode. There's tracing, and there's different theories on tracing. And there's staging, taking code and turning it into a graph algorithmically. And PyTorch and TensorFlow both have similar but different approaches to both of these things. The problem with these things is they all have really weird side effects and they're very difficult to reason about. And so if Swift for TensorFlow is an airplane, we've taken off and we're just coming off the runway, but we're still building all this stuff into Swift for TensorFlow as the plane is flying. And so we don't actually have this today.
The team was working on the demo and it just didn't come together today, but this is really cool. And so one of the problems with tracing, for example, is that in PyTorch or in TensorFlow Python, when you trace, if you have control flow in your model, it will unroll the entire control flow. And so if you have an RNN, for example, it will unroll the entire RNN and make one gigantic thing. And some control flow you want to ignore, some control flow you want to keep in the graph, so having more control over that is something that we think is really important. So, Chris, this "nearly there": it is the end of April, and this video will be out somewhere around mid to late June. I suspect it will be up and running by then. And if it's not, you will personally go to the house of the person watching the video and fix it for them. So look, here's the deal. In two, three months, so that's July, look on the TensorFlow main page; there should be a Colab demo showing this. So we'll see. And there should be a notebook in the harebrain repo that will be called BatchNorm or something, and we'll have an XLA version of this running. And so Swift also has this thing called Graph Program Extraction. And the basic idea here is that Autograph and TorchScript are doing these things that are kind of like Python, kind of not. Jeremy was talking before about how you add a comment in the wrong place and TorchScript will fall over. It kind of looks like Python, but really, really is not. With Swift, we have a compiled, reasonable language. And so we can just use compiler techniques to form a graph and pull it out for you. And so a lot of things that are very magic and very weird become just very natural and plug into the system. So I'm very excited about where all this goes. But for right now, this doesn't exist. The airplane is being built.
One last thing that doesn't exist, because Jeremy wanted to talk about this, he's very excited, is this question about what is MLIR? And how does MLIR relate to XLA? What is all this stuff going on, and how does it make sense for TensorFlow? And the way I look at this is that XLA is really good if you want high performance with these common operators, like matrix multiplication, convolution, things like that. These operators can be combined in lots of different ways; running BatchNorm is a great example of this. And so these are the primitives that a lot of deep learning is built out of. XLA is really awesome for high performance, particularly on weird accelerators. But there's a catch with this, because one of the things that powers deep learning is the ability to innovate in many of these ways. And so depthwise convolutions came out, and suddenly, with many fewer parameters, you can get really good accuracy wins. And you couldn't do that if you just had convolution. Yeah, and on the other hand, depthwise convolutions are a specific case of grouped convolutions. The reason we haven't been talking about grouped convolutions in class is that so far no one's really got them running quickly. And so there's this whole thing where somebody wrote a paper about three years ago which basically says, hey, here's a way to get all the benefit of convolutions but much, much faster. And we're still, you know, the Practical Deep Learning for Coders course still doesn't teach them, because they're still not practical, because no one's got them running quickly yet. And so, as we've been saying this whole course, the goal with this whole platform is to make it an infinitely hackable platform. And if it's only hackable down to the convolution, or you have to give up all performance and run on a CPU, well, that's not good enough.
And so what MLIR is about, well, there are multiple different aspects of the project, but I think the one Jeremy's most excited about is: what about custom ops, right? How can we make it so you don't bottom out at matmul and convolution? And so you get that hackability to invent the next great convolution. So the cool thing about this is that this is a solved problem. The problem is, all the solutions are in these weird systems that don't talk to each other, and they don't work well together, and they're solving different slices of it. So Halide, for example, is a really awesome system if you're looking for 2D image processing algorithms, right? That doesn't really help us. Other people have built systems on top of Halide to try to adapt it, and things like that, but this is really not a perfect solution. There's other solutions. So PlaidML was recently acquired by Intel, and they have a lot of really cool compiler technology that is kind of in their own little space. TVM is a really exciting project, also building on Halide, pulling together its own secret sauce of different things. And it's not just the compiler technology. In each of these cases, they've also built some kind of domain-specific language to make it easier for you, the data scientist, to write what you want in a quick and easy way. Right. And often what happens here is that each of these plugs into the deep learning frameworks in different ways. And so what you end up having to do is you end up in a mode of saying TVM is really good for this set of stuff, and Tensor Comprehensions, which is another cool research project, is good at these kinds of things. And so I have to pick and choose the framework I want to use based on which one they happen to be built into, which is not a great place to be. And again, we don't teach this in Practical Deep Learning for Coders because it's not practical yet. These things are generally research-quality code.
They generally don't integrate with things like PyTorch. They generally require lots of complex build steps. The compile time is often really slow. They work really great on the algorithms in the paper, but they kind of fall apart on things that aren't, all those kinds of problems. So our goal and our vision here with TensorFlow, and with Swift for TensorFlow also, is to make it so that you can express things at the highest level of abstraction you can. So if you have a batch norm layer, totally go for that batch norm layer. If that's what you want, use it and you're good. If you want to implement your own running batch norm, you can do that in terms of matmuls and adds and things like that, fine. If you want to sink down farther, you can go down to one of these systems. If you want to go down farther still, you can write assembly code for your accelerator, if that's the thing you're into. But you should be able to get all the way down and pick the level of abstraction that allows you to do what you want to do. And so I just want to give Tensor Comprehensions as one random example of how cool this can be. So this is taken straight out of their paper. This is not integrated. But Tensor Comprehensions gives you what is basically Einstein notation on total steroids. It's like einsum. Yes, good point. Einsum, but taken to a crazy extreme level. And what Tensor Comprehensions is doing is, you write this very simple code. It's admittedly kind of weird and it has magic; the syntax isn't the important thing. But you write pretty simple code, and then it does all this really hardcore compiler stuff. So it starts out with your code. It then fuses the different loops, because these two things expand out to loops. It does inference on what the ranges for all the loops are, and what the variables that you're indexing into the arrays do. It then fuses and tiles these things, and then sinks the code to make it so the inner loops can be vectorized.
This is actually a particularly interesting example, because this thing here, gemm, is a generalized matrix-matrix multiply. This is actually the thing on which large amounts of deep learning and linear algebra and stuff are based. So a lot of the stuff we write ends up calling a gemm. And the fact that you can write this thing in two lines of code: if you look inside most linear algebra libraries, there will be hundreds or thousands of lines of code to implement something like this. So the fact that you can do this so concisely is super cool. And so the idea that we could then do nice little tweaks on convolutions or whatever in similar amounts of code is something that I get very excited about. Yeah, me too. And the other thing to consider: generating really good code for this is hard. But once you make it so that you separate out the code that gets compiled from the algorithms that get applied to it, now you can do search over those algorithms. Now you can apply machine learning to the compiler itself. And now you can do some really cool things that open up new doors. So, I mean, that's actually really interesting, because in the world of databases, which is a lot more mature than the world of deep learning, this is how it works, right? You have a DSL, normally called SQL, where you express what you want, not how to get there. And then there's a thing called a query analyzer or query compiler or query optimizer that figures out the best way to get there. And it'll do crazy stuff like genetic algorithms and all kinds of heuristics. And so what we're seeing here is we'll be able to do that for deep learning: our own DSLs and our own optimizers, not deep learning optimizers, but more like database optimizers. Yeah, so it's going to be really exciting. The MLIR part of this is a longer time horizon. This is not going to be done by the time this video comes out.
But this is all stuff that's getting built, and it's all open source, and it's super exciting. So to summarize all this TensorFlow infrastructure stuff: TensorFlow is deeply investing in the fundamental parts of the system. This includes the compiler stuff, and also the runtime, op dispatch, the kernels themselves. There's tons and tons and tons of stuff, and it's all super exciting. So let's stop talking about the future. Yeah, I mean, that's kind of boring. Yeah, this is very exciting, Chris, that sometime in the next year or two there'll be these really fast things. But I actually know about some really fast languages right now. Really? Yeah, they're called C, C++, and Swift. Seriously? Oh. Yeah, let me show you what I mean. Oh, okay. These are actually languages that we can make run really fast right now. And it's quite amazing, actually, how easy we can make this. Like, when you say to an average data scientist, hey, you can now integrate C libraries, the response is not likely to be, oh, awesome, right? Because data scientists don't generally work at the level of C libraries. But data scientists work in some domain, right? You work in neuroradiology image acquisition, or you work in astrophysics, or whatever. And in your domain, there will be many C libraries that do the exact thing that you want to do at great speed, right? And currently, you can only access the ones that have been wrapped in Python, and you can only access the bits that have been wrapped in Python. What if you could actually access the entire world of software that's been written in C, which is what most software has been written in, and it's easy enough that an average data scientist can do it? So here's what it looks like. Let's say we want to do audio processing, okay? And so for audio processing, I'm thinking, like, oh, how do I start doing audio processing? And in my quick look around, I couldn't see much in Swift that works on Linux for audio processing.
So you write an MP3 decoder from scratch, right? Yeah, I thought about doing an MP3 decoder from scratch, but then I figured, like, people have MP3 decoders already. What are they doing? Well, the internet, it turns out, says there's lots of C libraries that do it. And one popular one, apparently, is called SoX, right? And I'm a data scientist, I'm not an audio processing person, so this was my process last week. It was, like, search for "C library MP3 decode", and it says use SoX. So look at this. I've got something here that says import Sox, and then it says initSox, and then it says readSoxAudio. Where did this come from? Well, this comes from a library. Here it is: SoX, the Sound eXchange library. This is what C library homepages tend to look like; they tend to be very '90s. And basically, I looked at the documentation, and C library documentation tends to be less than obvious, kind of hard to see what's going on, but you just kind of have to learn to read it, just like you learned to read Python documentation. And so basically, it says you have to use this header, and then these are the various functions you can call. There's something called init, and there's something called open. So here's what I did. I jumped into Vim, and I created a directory, and I called it SwiftSox. And in that directory, I created a few things. I created a file called Package.swift, and this is the thing that defines a Swift package. A Swift package is something that you can import. And you can actually type swift package init, and it'll kind of create the skeleton for you. Personally, my approach to wrapping a new C library is to always copy an existing C library folder that I've created and then just change the name, because every one of them has the same three files, right? So this is file number one: you have to give it a name, and then you have to say, what's the name of the library in C? And for SoX, the name of the library is sox. Step two is you have to create a file called Sources/Sox/module.modulemap.
And it contains always these exact lines of code, again, where you just change the word Sox, and the word Sox, and the word Sox. So it's not rocket science. So what this is doing is saying what you want to call it in Swift: Sox. They called it soxUmbrellaHeader.h for some reason. Well, I actually called it soxUmbrellaHeader.h, which we'll see in a moment. And the library itself gets linked in as libsox. Yeah, exactly. So all these things in C can have different names. Yeah. But most of the time, we can make them look the same. Yep. And so then the final, third file that you have to create is the .h file. And so you put that in Sources/Sox, and I call it soxUmbrellaHeader.h, and that contains one line of code, which is the header file, which, as you saw from the documentation, you just copy and paste from there. So once you add these three files, you can then do that. Okay, and so now I can import Sox, and now this C function is available to Swift, right? And so this is kind of wild, right? Because suddenly, like, a lot of what this impractical deep learning for coders course is about is opening doors that weren't available to us as data scientists before, and thinking: what happens if you go through that door? So what happens if you go through the door where suddenly all of the world's C libraries are available to you? What can you do in your domain that nobody was doing before, because there weren't any Python libraries for it? So what I tend to do is write little Swift functions that wrap the C functions to make them look nice. So here's initSox, which checks for the value the docs told me to check for. And soxOpenRead: for some reason, you have to pass 0, 0, 0, so I just wrapped that. And so now I can say readSoxAudio. And so that's going to return some kind of structure. And so you have to read the documentation to find out what it is, or copy and paste somebody else's code. Very often, the thing that's returned to you is going to be a C pointer.
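For reference, the three files look roughly like this. This is a sketch based on the description in the lesson, not the exact contents of the SwiftSox repo: the Package.swift fields vary with the Swift tools version, and the module map and header contents are the standard system-library shape with the names Jeremy mentions, assuming libsox is installed on the system.

```swift
// Package.swift  (file 1: name the package and point at the C library target)
// swift-tools-version:5.0
import PackageDescription

let package = Package(
    name: "SwiftSox",
    targets: [.systemLibrary(name: "Sox")]
)

// Sources/Sox/module.modulemap  (file 2: the "same lines every time",
// with the word Sox changed for each library)
//
//   module Sox {
//       header "soxUmbrellaHeader.h"
//       link "sox"
//       export *
//   }

// Sources/Sox/soxUmbrellaHeader.h  (file 3: one line, copied from the docs)
//
//   #include <sox.h>
```

With those three files in place, `import Sox` in a notebook or Swift file exposes the raw C functions like `sox_init` directly.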
And that's no problem. Swift is perfectly happy with pointers. You just say .pointee to grab the thing that it's pointing at. And according to the documentation, there's going to be something called signal, which is going to contain things like sample rate, precision, channels, and length. And so I can... let's run those two. So I can run that, and I can see I've opened an audio file with a C library, without any extra stuff. One of the things you can do is you can type Sox and hit tab, and now here's all the stuff that's coming in from that header file. That's wild. Yeah. Super cool. So now I can go ahead and read that. And this is kind of somewhat similar to Python. In Python, you can open C libraries in theory and work with them, but I don't do it, I almost never do it, because I find that when I try to, the things you get back are these C structures and pointers which I can't work with in Python in a convenient way. Or, if I do use things like pybind11, which is something that helps with that, then I have to create all these make scripts and compile processes. And I just don't bother, right? None of us bother. But in Swift, it's totally fine. And then the nice thing is we can bring Python and C and Swift together by typing import Python. The unholy marriage. Yeah. Now we can just take our C array and say makeNumpyArray and plot it, right? So we're really bringing it all together now. And we can even use IPython.display, and we can hear some audio. Hi, everyone. I'm Chris. Hi, everyone. Hi, Chris. Hi, Jeremy. Thank you, Jeremy. My pleasure. All right, so... Why did I say yes to this again? Okay, so this is pretty great, right? We've got Swift libraries, C libraries, Python libraries. We can bring them all together. We can do stuff that our peers aren't doing yet. But what I want to know, Chris, is: how the hell is this even possible? Wow, okay. You're a guy that likes to look under the covers, or under the hood. Where's the...
Cool, so let's talk about this. C is really a very simple language, so it should be no problem to do this, right? So C is two things, actually, and it's really important. I think you were just talking about why it's actually very useful: there's tons of code available in C, and a lot of that C is really useful. But C is actually a terrible, crazy, gross language on its own, right? C has all these horrible things in it, like pointers, that are horribly unsafe. And we have a question. Oh, let's do it. Is it possible to achieve similar results in Python using something like Cython? Yeah, absolutely. So, Cython is a Python-like language which compiles to C. And I would generally rather write Cython than C for integrating C with Python. It's actually easier in a Jupyter notebook, because you can just say %%cython and kind of integrate it. But as soon as you want to start shipping a module with that thing, which presumably is the purpose, since you want to share it, you then have to deal with build scripts and stuff like that. So Cython has done an amazing job of making it as easy as possible. But I personally have tried to do quite a lot with Cython in the last few months and ended up swearing off it, because it's just still not convenient enough. I can't quite use a normal debugger and a normal profiler and just ship the code directly. It's great for Python, if that's what you're working with, but it's nowhere near as convenient as this. I've created Swift wrappers for C libraries; I created one within a week of starting to use Swift. It was just very natural. Cool. And so the thing I want to underscore here is that C is actually really complicated. C has macros. It's got this preprocessor thing going on. It's got bit fields and unions, and its weird notion of what arrays are. It's got volatiles. It's got all this crazy stuff, and the grammar is context-sensitive and gross.
And so it's just actually really hard to deal with. You sound like somebody who's been through the process of writing a C compiler and came out the other side. Well, the only thing worse than C is C++. Uh-huh. And it has this dual side to it. It's both more gross and huge, and it's also more important in some ways. And so Swift doesn't integrate with C++ today, but we want it to. We want to be able to provide the same level of integration that you just saw with C for C++. But how are we going to do that? Well, Swift loves C APIs, like Jeremy was just saying. And we love C APIs because we want you to be able to directly access all this cool functionality that exists in the world. And so the way it works, as you just saw, is we take the C ideas and remap them into Swift. And because of that, because they're native, pure Swift things, that's where you get the debugger integration. That's where you get code completion. That's where you get all the things that you expect to work in Swift, talking to dusty old grotty C code from the '80s, or whatever epoch you got it from. And we also don't want to have wrappers or overhead, because that's totally not what Swift's about. So, as Jeremy showed you, usually when you import a C API into Swift, it looks like a C API. But the nice thing about that is that you can build the APIs you want to wrap it, and you can build your abstractions and make that all good in Swift. So one of the ways this happens is that the Swift compiler can actually read C header files. And so we don't have a great way to plug this into workbooks quite yet, but Swift can actually take a C header file, like math.h, which has macros. Here's M_E, because M_E is a good name for e, apparently. Here's the old-school square root. Here's the sincos function, which of course returns sine and cosine through pointers, because C doesn't have tuples.
And so when you import all that stuff into Swift, you get M_E as a Double that you can use. You have sqrt, and you can totally call sqrt. You have sincos: you get this UnsafeMutablePointer<Double> thing, which we'll talk about later. Similarly, malloc, free, realloc, all this stuff exists. And so, just to show you how crazy this is, let's see if we can do the side-by-side thing. Can you make it do side-by-side? Is that a challenge? Yes. My window skills are dusty. Check it out. Okay, beautiful. So what we have here is the original header file, math.h, on the left. If you look at this, you'll see lots of horrible things in C that everybody forgets about, because you never write C like this. But this is what C looks like when you're talking about libraries. So we've got a whole bunch of #ifdefs. We've got macros. We've got, like, crazy macros. We've got conditionally enabled things. We've got these things that are also macros. We've got inline functions. We've got tons and tons and tons of stuff. We've got comments. We've got structures, like exception. Of course that's an exception, right? So when you import this into Swift, this is what the Swift compiler sees. You see something that looks very similar, but this is all Swift syntax. So you get the header, the comments. You get all the same functions. Here's your M_E. And you get your structures as well. And this all comes right in, and this is why Swift can see it. Now, how does this work? That's the big magic question. So if you want to get this to work, what you can do is build it into the Swift compiler: we can write a C parser, we can implement a C preprocessor, and we can implement all the weird rules in C. Someday we can extend it for C++ as well. And we can build this as a library, so the Swift compiler knows how to parse C code. And a C++ compiler is pretty easy to write, so we can hack that out on a weekend.
Or, yay, good news: we've already done this, many years ago. It's called Clang. So what Clang is, is a C++ compiler. Oh, and to talk even more about how horrible C is: you actually get inline functions. The insane thing about inline functions is that they don't exist anywhere in a program unless you use them; they get inlined. And so if you want to be able to call this function from C, you actually have to code-gen it. You have to be able to parse it, generate code for it, understand what unions are, understand all of this crazy stuff, just so you can get the sign bit out of a float. C also has things like function pointers and macros and tons of other stuff. It's just madness. And so the right way to do this is to use a real C compiler, and the C compiler we have is called Clang. And so what ends up happening is that when Jeremy says import Sox, Swift goes and says: aha, what's a Sox? Oh, it's a module. Okay, what is the module? Oh, it's C. Oh, it's got a header file. Fire up Clang. Go parse that header file. Go parse all the things the header file pulls in; that's what an umbrella header is. And go pull the entire universe of C together into that module, and then build what are called syntax trees to represent all the C stuff. Well, now we've got a perfect, pristine C view of the world, the exact same way a C compiler does. And so what we can do then is build this integration between Clang and Swift, where when you say, give me malloc, or give me sox_init, Swift says: whoa, what is that? Hey, Clang, do you know what this is? And Clang says: oh yeah, I know what sox_init is. It's this weird function that takes all these pointers. And Swift says: okay, cool. I will remap your pointers into my UnsafePointer. I will remap your int into my Int32, because the languages are a little bit different. And so that remapping happens. And then when you call that inline function, Swift doesn't want to know how unions work.
That's crazy. So what it does instead is it says: hey, Clang, you know how to do all this stuff. You know how to code-generate all these things. And they both talk to the LLVM compiler that we were talking about last time, and so they actually talk to each other. They share the code. Clang does all that heavy lifting, and now it's both correct and it just works. Two things we like. And so these two things plug together really well, and now Swift can talk directly to C APIs. It's very nice. If you want to geek out about this, there's a whole talk, like a half-hour or hour long, about how all this stuff works at a much lower level. We will add that link to the lesson notes. Yeah. So let's jump back to your example. Thanks. So one of the reasons I'm really interested in this description, Chris, and one of the reasons I really wanted to work with you, apart from the fact that you're very amusing and entertaining, is that this idea of what you did with Clang and Swift is like the kind of stuff that we're going to be seeing in how differentiation is getting added to Swift. This idea of being able to pull on this entire compiler infrastructure, as you'll see, is actually going to allow us to do some similarly exciting and surprisingly amazing things in the deep learning world. And I'll say, this all sounds simple now, but actually getting these two massive systems to talk to each other was kind of heroic. Getting Python integrated was comparatively easy, because Python is super dynamic and C is not dynamic. Yeah. And one thing I'll say about C libraries is that each time I come across one, many of them have used these weird edge-case features Chris described in surprising ways. And so I just wanted to give a couple of pointers as to how you can deal with these weird edge cases.
So when I started looking at how to create my own version of tf.data, where I need to be able to read JPEG files and do image processing, I was interested in trying this library called VIPS. And VIPS is a really interesting C library for image processing. And so I started looking at bringing in the C library, in exactly the way you've seen already. So let's do that. So you'll find, just like we have a SwiftSox in the repo, there's also a SwiftVips. And we'll start seeing some pretty familiar things. There's the same Package.swift that you've seen before, but now it's got some extra lines, which we'll describe in a moment. There's the Sources/vips module map with the exact three lines of code that you've seen before. There's the header in Sources/vips, which I called a different name in this case, with the one line of code connecting to the header. After you've done that, you can now import vips. Done. But it turns out that the VIPS documentation says that they actually added the ability to handle optional positional arguments. In C. In C. And so it turns out you can do that in C, even though it's not officially part of C, by using something called varargs, which basically means that in C you can say the number of arguments that go here is not defined ahead of time. And you can use something I'd never heard of before called a sentinel. And basically you end up with stuff which looks like this, where you basically say I want to do a resize, and it has some arguments that are specified, like horizontal scale. And by default, it keeps the aspect ratio the same. But if you want to also change the aspect ratio and have a vertical scale, you literally write the string "vscale", and that says, oh, the next argument is the vertical scale. And if you want to use some different interpolation kernel, you pass the word "kernel" to say there's some different interpolation kernel.
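Swift itself handles this kind of "optional positional argument" with default parameter values rather than varargs and sentinels. Here is a toy sketch of the idea; the `Image` type and the scaling math are invented for illustration and are not the real VIPS API:

```swift
// A stand-in image type, just for this example.
struct Image: Equatable {
    var width: Int
    var height: Int
}

// Omit vscale and the aspect ratio is preserved, which is the behavior
// the VIPS varargs interface provides in C.
func resized(_ img: Image, hscale: Double, vscale: Double? = nil) -> Image {
    let v = vscale ?? hscale
    return Image(width: Int(Double(img.width) * hscale),
                 height: Int(Double(img.height) * v))
}
```

For instance, `resized(Image(width: 100, height: 80), hscale: 0.5)` gives a 50 by 40 image, while passing `vscale: 0.25` as well gives 50 by 20.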
Now, this is tricky, because for all the magic that Swift does do, it doesn't currently know how to deal with varargs and sentinels. It's just an edge case of the C world that Swift hasn't handled. I think this might be the last edge case. Jeremy has this amazing knack for finding the breaking point of anything. That's what I do. Yes. But no problem, right? The trick is to provide a header file where the things that Swift needs to call look like the things that it expects. So in this case, you can see I've actually written my own C library, right? And I added that C library by literally just putting it into Sources. I just created another directory, and in there I just dumped a C header file. And here's the amazing thing: as soon as I do that, I can now add that C library, not pre-compiled but actual C code I've just written, to my Package.swift. And I can use that from Swift as well. And so that means I can wrap the weird VIPS varargs resize with a non-varargs resize where you always pass in a vertical scale, for instance. And so now I can just go ahead and say VIPS load image, and then I can say VIPS get, and then I can pass that to Swift for TensorFlow in order to display it through Matplotlib. Now, there's a really interesting thing here, which is that when you're working with C, you have to deal with C memory management. Swift has this fantastic reference counting system which nearly always handles memory for you, but every C library handles memory management differently. We're about to talk about OpenCV, which actually has its own reference counting system, believe it or not. But most of the time the library will tell you: hey, this thing is going to allocate some memory, and you have to free it later, right? And so here's a really cool trick. The VIPS get function says, hey, this memory, you're going to have to free it later. To free memory in C, you use the free function, and because we can use C functions from Swift...
We can use the free function. And I need to make sure that we call it when we're all done. And there's a super cool thing in Swift called defer. And defer says: run this piece of code before you finish doing whatever we're doing, which in this case would be before we exit from this function. Yeah, so if you throw an exception, if you return early, anything else, it will make sure to run that. Yeah, in this case I probably didn't need defer, because there aren't exceptions being thrown or lots of different return places. But that's my habit: if I need to clean up memory, I just chuck it in a defer block. At least, that's one of the two methods that I use. So that's that. So, because I like finding the edges of things and then doing it anyway, the next thing I looked at, and this gives you a good sense of how much I hate tf.data, is that I was trying to do anything I could to avoid tf.data. And so I thought, all right, let's try OpenCV. And for those of you that have been around fastai for a while, you'll remember OpenCV is what we used in fastai 0.7. And I loved it, because it was insanely, insanely fast. It's fast, reliable, high-quality code that covers a massive amount of computer vision. It's what everybody uses if they can. And much to my sadness, we had to throw it out, because it hates Python multiprocessing so much. It kept creating weird race conditions and crashes and stalls; literally the same code on the same operating system on two different AWS servers that are meant to be the same spec would give different results. So that was sad. So I was kind of hopeful: maybe it'll work in Swift. So we gave it a go. And unfortunately, since I last looked at it, they threw away their C API entirely, and now it's C++ only. And Chris just told you we can't use C++ from Swift. But here's the good news: you can disguise it so Swift doesn't know that it's C++.
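The defer pattern described above can be sketched on its own, with no C library involved. The event log here is purely illustrative; the point is that the deferred block runs when the enclosing scope exits, on every exit path:

```swift
var events: [String] = []

func useResource(failEarly: Bool) {
    events.append("alloc")
    defer { events.append("free") }   // runs even on the early return below
    if failEarly { return }
    events.append("use")
}

useResource(failEarly: false)
// events is now ["alloc", "use", "free"]
useResource(failEarly: true)
// events is now ["alloc", "use", "free", "alloc", "free"]
```

In real interop code, the deferred block would be the call to `free` (or the library's own deallocation function) on the pointer the C library handed back.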
And so the disguise needs to be a C header file that only contains C stuff, right? But what's in the C++ file behind the header file? It can be anything. It can be anything at all. It could be Pascal. Clang knows the Pascal calling conventions, and Swift can call them. I didn't know that. Pascal strings, too. So here's SwiftCV. And SwiftCV has a very familiar-looking Package.swift that contains the stuff that we're used to. And it contains a very familiar-looking OpenCV4 module map. Now, OpenCV actually has more than one library, so we just have to list all the libraries. It has a very familiar-looking... actually, we don't even have a... oh, sorry, that's right. We didn't use the header file here, because we're actually going to do it all from our own custom C++-slash-C code. So I created a COpenCV, and inside here you'll find C++ code. And we actually largely stole this from the Go OpenCV wrapper, because Go also doesn't know how to talk to C++, but it does know how to talk to C. So that was a convenient way to get started. And you can see that, for example, we can't call new, because new is C++. But we can create a function called mat new that calls it for us. And then we can create a header that has mat new, and that's not C++, right? It's actually a plain C pointer to a struct. And so I can call that. And even C++ generics we can handle this way. So OpenCV actually has a full-on multi-dimensional generic array, like NumPy, with matrix multiplication and that kind of stuff in it. And the way its generic stuff works is that you can ask for a particular pixel, and you say what data type it is using C++ generics. So we just create lots and lots of different versions for all the different generic versions, which in the header file look like C. So once we've done all that, we can then say import SwiftCV and start using OpenCV stuff. So what does that look like? Well, now that we can use that, we can read an image. We can have a look at its size.
We can get the underlying C pointer. And we can start doing timing things and see: is it actually looking hopeful in terms of performance, and so forth. I was very scared when I started seeing in Swift all these unsafe mutable pointers and whatnot. They're designed to make you scared. It starts with unsafe. Fair enough. But this is C. Right. And so C is inherently unsafe. Yeah. The theory on that is that it does not prevent you from using it; it just makes it so you know that you're in that world. But there's actually this great table I saw from Ray Wenderlich, from the raywenderlich.com website, and I've stolen it here. And basically what he pointed out is that the names of all of these crazy pointer types actually have this very well-structured thing. They all start with Unsafe. They all end with Pointer. And in the middle, there's this little mini-language: can you change them or not? Are they typed or not? Do we know the count of the number of things in there or not? And what type of thing do they point to, if they're typed? So once you realize all of these names have that structure, suddenly things start seeming more reasonable again. We have two questions. All right, let's go with the two questions. One is: are the C libraries dynamically linked or statically linked, or compiled from source to be available in Swift? Sure. By default, if you import them, they are linked with the normal linker flags. And so if the library is a .a file, then it will get statically linked directly into your Swift code. If it's a .so file, then you'll dynamically link it, but it's still linked to your executable. All the linker stuff, so dlopen, is a C API. And so if you want to, you can dynamically load C libraries, you can look up their symbols dynamically, you can do all that kind of stuff too. Another question is: how much C do you have to know to do all these C-related imports? Almost none.
So like, I don't really know any C at all. So I kind of randomly press buttons until things start working, or copy and paste other people's C code. Yeah. The internet, Stack Overflow, has a lot of helpful stuff. Yeah. You need to know there's a thing called a header file, and that it contains a list of the functions that you can call. And you need to know that you type #include <header file>. But you can just copy and paste the SwiftSox library that I've already shown you, which has the three files already there. And so really, you don't need to know any C. You just need to replace the word sox with the name of the library you're using. And then you need to kind of work through the documentation, which is in C. And that's the bit where it gets... you know, I find the tab completion stuff is the best way to handle that. It's like: hit tab. You say let x equal, and then you call some function, and then you say x dot, and you see what's inside it, and things kind of start working. And for all the hard time we give sox, it has a pretty well-structured API. And so if you have a well-structured API like this, then using it is pretty straightforward. If you have something somebody hacked together and didn't think about, then it's probably going to be weird, and it may require you to understand a lot of C. But those are the APIs that you probably won't end up using, because if they haven't given a lot of love to their API, people aren't usually using it. My impression is that almost all of the examples of the future power of Swift seem to rely not on abstraction to higher levels, but on diving into lower-level details. As a data scientist, I try to avoid doing this. I only go low if I know there's a big performance gain to be had. So let me, you know, set out my perspective as a data scientist, and maybe we can hear from you all.
Well, and I was just going to interject: we're starting at the bottom, so we'll be getting much higher level soon. Yeah. But I mean, there's a reason that I'm wanting to teach this stuff, which is that I actually think, as data scientists, this is our opportunity to be far more awesome. Something I've noticed for the last 25 years is that everybody I know in... I mean, it didn't used to be called data science, we used to call it industrial mathematics or whatever... operated within the world that was accessible to them. Right? So at the moment, for example, there's a huge world of something called sparse convolutions. I know they're amazing. I've seen competition-winning solutions; they get state-of-the-art results. But there are like two people in the world doing it, because it all requires custom CUDA kernels, you know. For years, for decades, almost nobody was doing differentiable programming, because we had to calculate the derivatives by hand. So it's absolutely not about wanting an extra 5% of performance. It's about being able to do whatever's in your head. I used to be a management consultant, I'm afraid to say, and I didn't know how to program; I only knew Excel. And the day that I learnt Visual Basic, it was like: oh, now I'm not limited to the things I can do in a spreadsheet. I can program. And then when I learnt Delphi, it was like: oh, now I'm not even limited to what I could do in Visual Basic. I can do the things that are in my head. So that's where I want us all to get to. Hey, and some people are feeling overwhelmed with Swift, C, C++, Python, PyTorch, TensorFlow, Swift for TensorFlow. Do we need to become experts on all these different languages? No. No, we don't. But can I show why this is super interesting? Let me show you why I started going down this path, right? Which is that I was using tf.data.
And I found that it took me 33 seconds to iterate through ImageNet. And I know that in Python we have a notebook, which Sylvain created to compare, called timing. And the exact same thing takes 11.5 seconds. And this is not an insignificant difference. Waiting more than three times as long just to load the data is just not okay for me. So I thought, well, I bet OpenCV can do it fast. So I created this little OpenCV thing, and then I created a little test program. So this is the entirety of my test program, right? It's something that downloads ImageNet and reads and resizes images, and does it with four threads. And so you go swift run. Okay, so when I run this, check this out: 7.2 seconds, right? And this was like half a day's work. And in half a day's work, I have something that can give me an image processing pipeline that's even faster than PyTorch. And so it's not just like, oh, we can now do things a bit faster. It's now like: anytime I get stuck, where I can't do something, or it's not in the library I want, or it's so slow as to be unusable, this whole world's open. So I'd say we don't really touch this stuff until you get to a point where you have no choice but to, and at that point, you're just really glad it's there. Well, and to me, the capability is important even if you don't use it. So keep in mind, this is all code that's in a workbook. So you can get code into the workbook from anywhere, and now you can share that workbook. You don't have to share this tangled web of dependencies that has to go with the workbook. And so the fact that you can do this in Swift doesn't mean that you yourself have to write the code, but it means you can build on code that other people wrote. And if you haven't seen Swift at all, if this is your first exposure to it, this is definitely not the place you start. The data APIs that we're about to look at would be a much more reasonable place to start.
You've had a month or two months' worth of hacking-with-Swift time, and that's Jeremy months. That's like a year for normal people. So this being super powerful, and the ability to do this, is, I think, really great, and I agree with you. Yeah, and I am totally not a C programmer at all, and honestly it's been more like two weeks, because before that I was actually teaching a Python course, believe it or not. But Sylvain has been doing this for a month. Yeah, so this is all there, and I would definitely recommend ignoring all of this stuff for now; we're about to start zooming up the levels of the stack. But the fact that it's there, I think, is reassuring, because one of the challenges that we have with Python is that you get this ceiling. And if you get up to the ceiling, then there's no going further without a crazy amount of complexity. And whether that be concurrency, or C APIs, or other things, that prevents the next steps and the next levels of innovation, and the industry moving forward. And this is meant to be giving you enough to go on with until next year's course as well. So hopefully this is something where you can pick and choose which bits you want to dig into, and whichever bit you pick, we're showing you all the depth you need to begin digging in over the next 12 months. So I was really excited to discover that we can use OpenCV, which is something I've wanted ever since we had to throw it away from fastai. And so I thought, you know, what would it take to create a data blocks API with OpenCV? And thanks to Alexis Gallagher, who gave us the great starting point of saying, well, here is what a Swifty-style data blocks would look like, we were able to flesh it out into this complete thing. And when Chris described Alexis to me as the world leader on value types, I was like: wait, I thought you created them. Okay, I guess we can listen to Alexis's code for this.
I will say I'm terrified about presenting these slides because Alexis is sitting right there, and if you start scowling, then... We have a handheld mic. Come and correct us anytime. So there's a notebook here called 08c_data_block_generic, where you'll find that what we've actually done is we've got the entire data blocks API in this really interesting Swifty style. And what you'll see, when we compare it to the Python version, is that this is on every axis very significantly better. So let's talk about some of the problems with the data block API in Python. I love the data block API, but lots of you have rightly complained that we have to run all the dot-blah, dot-blah, dot-blah in a particular order. If we get the wrong order, we get inscrutable errors. We have to make sure that we have certain steps in there; if we miss a step, we get inscrutable errors. It's difficult to deal with at that level. And then the code inside the data blocks API... I hate changing it now, because I find it too confusing to remember why it all fits together. But check out this Swift code that does the same thing. The download is just the same get-files stuff we've seen before. All we need to do is say: if you're going to create some new DataBunch, you need some way to get the data; you need some way, and let's assume they're just paths for now, to turn all of those paths into items, so something like our ItemList in Python; you need some way to split that between training and validation; and you need some way to label the items. So for example, for ImageNet: download calls that; the thing that grabs the whole list of paths is that collect-files function; the thing that decides whether they're training or validation is whether the parent's parent is train or not; and the thing that creates the label is the parent.
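The collect, split, and label pieces just described can be sketched as three small functions over paths. All names here are invented for illustration; the real notebook's API is richer and uses proper Path types:

```swift
// A toy "package of information" for an ImageNet-style directory layout,
// e.g. "imagenet/train/n01440764/img.jpg". Invented names, not the real API.
enum ImageNetPaths {
    // Turn a directory listing into items (here: keep only JPEGs).
    static func collectItems(_ paths: [String]) -> [String] {
        paths.filter { $0.hasSuffix(".jpg") }
    }
    // Split: is the grandparent directory "train"?
    static func isTraining(_ path: String) -> Bool {
        path.split(separator: "/").dropLast(2).last == "train"
    }
    // Label: the parent directory name.
    static func label(_ path: String) -> String {
        String(path.split(separator: "/").dropLast().last ?? "")
    }
}
```

The payoff described next is that the compiler checks this bundle: forget one of the pieces, or give it the wrong type, and Swift complains at compile time rather than failing with an inscrutable runtime error.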
And so we can basically just define this one neat little package of information, and we're done. And Swift will actually tell us if we forgot something. Or if one of the things that we provided, the one that's meant to return true or false for whether it's training or validation, accidentally returns the word train instead, it'll complain and let us know. So I just love this so much. But to understand what's going on here, we need to learn a bit more about how Swift works, and this idea of protocols. Yeah. So this is something that is actually useful if you are doing deep learning stuff. So let's talk about... sorry, go for it. So let's talk about what protocols are in Swift. We've seen structs. We will see classes later. But right now we want to talk about what protocols are. And if you've worked in other languages, you may be familiar with things like interfaces in Java, or abstract classes, which are often used this way; other, weirder languages have things called type classes. All these things are related to protocols in Swift. And what protocols do is, they're all about splitting the interface of a type from the implementation. And so, we'll talk about Layer later: Layer is a protocol. And it says that to define a layer, you have this call method. So layers are callable, just like in PyTorch. And so then you can define a dense layer and say how to call a dense layer. You can define a Conv2D and show how to implement a Conv2D layer. And so there's a contract here between what a type is supposed to have (all layers must be callable) and then the implementations, which are different. So this is pretty straightforward stuff. Even that's quite nice. In PyTorch, you kind of have to know that there's something called forward, and if you misspell it, or forget to put it there, or put dunder-call instead of forward, you get kind of weird and inscrutable errors. Whereas with this approach... you get tab completion for the signatures.
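The contract-versus-implementation split can be shown with a toy version of the idea. The real Swift for TensorFlow `Layer` protocol is generic over differentiable tensor types; this sketch shrinks it to plain `[Float]` arrays just to show the shape of it (`ToyLayer` and both structs are invented names):

```swift
// The contract: every layer must be callable on an input.
protocol ToyLayer {
    func callAsFunction(_ input: [Float]) -> [Float]
}

// One implementation of the contract.
struct ToyDense: ToyLayer {
    var weight: Float
    var bias: Float
    func callAsFunction(_ input: [Float]) -> [Float] {
        input.map { $0 * weight + bias }
    }
}

// Another implementation, with a completely different body.
struct ToyReLU: ToyLayer {
    func callAsFunction(_ input: [Float]) -> [Float] {
        input.map { max($0, 0) }
    }
}
```

Because the requirement is named `callAsFunction`, a layer value is called like a function: `ToyDense(weight: 2, bias: 1)([1, 2])` yields `[3, 5]`. Misspell the method name and the compiler reports that the conformance is unsatisfied, rather than failing silently at runtime.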
Yeah, and since Swift will tell you if you get it wrong, it'll say this is what the function should have been called. Yeah, that's great. And so what this is really doing is defining behaviors for groups of types. We're saying Layer is the commonality between a whole group of types that behave like a layer. And what does that behavior mean? Well, it's a list of what are called requirements. These are all the methods that the type has to have, the signatures of the types. And these things often have invariants. And so one of the things that Swift has in its library is the notion of Equatable. What is Equatable? Well, an Equatable is any type that has this == operator. And then it says what equatability is, and all that kind of stuff. Now, the cool thing about this is that you can build up towers of types. And so you can say, well, Equatable gets refined by AdditiveArithmetic. An AdditiveArithmetic is something that supports addition and subtraction. If you also have multiplication, you can be Numeric. And if you also have negation, then you can be signed. And then you can have integers, and you can have floating point. And now you can have all these different things that exist in the ecosystem of your program, and you can describe them. These ways to reason about groups of types give you the ability to get the abstractions that we all want. And these types already exist in Swift. These all exist, and you can go see them in the standard library. And so why do you want this? Well, the cool thing is that now you can define behavior that applies to all of the members of that group. And so what we're doing here is we're saying not-equal. Well, for all things that are Equatable (and this T colon Equatable says: I work with any type that is Equatable), what this is doing is defining a function, not-equal, on any type that's Equatable.
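Written out in full, the three examples in this part of the talk look like this. Note that the standard library already ships `!=` and `abs`; these re-derivations (with invented names to avoid clashing) are purely to show behavior being broadcast onto whole groups of types:

```swift
// One function gives != behavior to every Equatable type at once.
func notEqual<T: Equatable>(_ a: T, _ b: T) -> Bool {
    !(a == b)   // every Equatable has ==, so every Equatable gets this for free
}

// Absolute value needs a type that is signed, numeric, and comparable.
func absoluteValue<T: SignedNumeric & Comparable>(_ x: T) -> T {
    x < 0 ? -x : x
}

// One extension puts isOdd on Int, Int16, UInt8, and every other integer type.
extension BinaryInteger {
    var isOdd: Bool { self % 2 != 0 }
}
```

So `notEqual("a", "b")` works on strings, `absoluteValue(-3.5)` on doubles, and `(7 as Int16).isOdd` on any integer width, all from single definitions.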
And it takes two of these things and returns a Bool. And we can implement not-equal by just calling ==, which all Equatable things have, and then inverting it. So, to be clear, what Chris just did here was write one function that is now going to add behavior to every single thing that defines equals, automatically. Which is pretty magic. Just like that: everywhere, boom, one place. Super, super abstract. But this also works for lots of other things. This works for absolute value. What does absolute value need? Well, it needs any type that is signed and numeric and that's comparable. And how do you implement absolute value? Well, you compare the thing against zero; if it's less than zero, you negate it; otherwise you return it. Simple, but now everything that is a number that can be compared is now abs-able. Types, same thing. All these things work the same way. And so with Dictionary, what you want is that the keys in a dictionary all have to be Hashable. This is how the dictionary does its efficient lookups and things like that. The value can be anything, though. And so all these things stack together and fit together like building blocks. One of the really cool things is that we can start taking this further. It's not just not-equal building on top of ==. In the last lesson we defined this isOdd function, and we defined it on Int. Well, because this protocol exists, we can actually add it as a method to all things that are binary integers. And so we can say, hey, put this on all BinaryIntegers and give them all an isOdd method. And now I don't have to put isOdd on Int and Int16 and Int32 and the weird C things. You can just do it in one place, and now everybody gets this method. On layers, and this is something that's closer to home, here we can say, hey, I want an inferring(from:) method that does some learning-phase-switching magic nonsense.
But now, because I put this on all layers, I can use it on my model, because your model is a layer, and my dense layer is a layer. And so this one simple idea, defining groups of types and then broadcasting behavior and functionality onto all of them at once, is really powerful. Yeah. I mean, it's like Python's monkey patching, which we use all the time. But A, it's not this kind of hacky thing with weird, sometimes-undefined behavior. And B, we don't have to monkey patch lots and lots of different places to add functionality to lots of things. We just put it in one place, and the functionality gets sucked in by everything that can use it. Yeah. And remember, extensions are really cool because they work even if you didn't define the type. And so what you can literally do is pull in some C library and add things to its structs. Or have things automatically added to those structs because they already support those operations. Yeah. So all this stuff composes together in a really nice way, which allows you to do very powerful and very simple and beautiful things. So mixins show up, and you can control where they go. And so here's an example. This is something that Jeremy wrote. He defined his own protocol, Countable. And he says things are Countable if they have a variable named count. And the only thing I care about for Countable things is that I can read it. I don't have to write it. That's what the get means. And so he says Array is Countable. His OpenCV Mat thingy is Countable; it's the number of pixels in it. Yeah, all these things are Countable. And then Jeremy says, hey, take any Sequence, and let's add a method, or a property, called totalCount, to anything that's a Sequence. So a Sequence is the same as Python's iterable, anything you can iterate through. Exactly. So this is things like dictionaries and arrays and ranges; all these things are sequences.
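Jeremy's Countable example, written out in full. Since OpenCV isn't available here, a stand-in `FakeMat` struct plays the role of the Mat type; that substitution aside, this follows the structure just described:

```swift
// The protocol: all I require is a readable count.
protocol Countable {
    var count: Int { get }
}

// A stand-in for OpenCV's Mat: its count is its number of pixels.
struct FakeMat { var pixels: Int }
extension FakeMat: Countable {
    var count: Int { pixels }
}

// Array already has count, so the conformance body is empty.
extension Array: Countable {}

// Any sequence of countable things gets totalCount, in one place.
extension Sequence where Element: Countable {
    var totalCount: Int {
        map { $0.count }.reduce(0, +)
    }
}
```

With that in place, `[[1, 2], [3]].totalCount` is 3, and an array of mats sums their pixel counts the same way.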
And it says: so long as the element is Countable, so I have an array of countable things, then I get a totalCount method, or a property. And the way this is implemented is, it just says: hey, map over myself, get the count, add all the counts up together, and then I have a total count of all the things in my array. And now if I have an array of arrays, an array of Mats, a lazy mapped collection sequence thingy of Mats, whatever it is, I can just ask for its totalCount. Chris, so this functionality you're describing is basically the same as what Haskell calls type classes. Yes. Is that right? Yeah. So is this kind of stolen from Haskell? Inspired. I mean, so the interesting thing for me here is that... we let them play with it too. Well, the reason I ask is because I've tried to use Haskell before, many times, and have always failed. I'm clearly not somebody who's smart enough to use Haskell. Yet I wrote the code that's on the screen right now a couple of days ago, and I didn't spend a moment even thinking about the fact that I was writing the code. It was only the next day that I looked back at this code and thought: wow, I just did something which no other language I've used could do, and I was smart enough to do it. It kind of makes what I think of as super-genius functionality available to normal people. Yeah. And so back at the very, very beginning of this, we talked about Swift's goals: to be able to take good ideas wherever they are, assemble them in a tasteful way, and then be not weird. Being not weird is a pretty hard but important goal. So the way I look at programming languages is that programming languages in the world have these gems in them. And Haskell has a lot of gems, incidentally. It's a really cool functional language. It's very academic. It's super powerful in lots of ways. But then the gems get buried in weird syntax, and it's purely functional.
It has a very opinionated worldview of how you're supposed to write code. And so it appeals to a group of people, which is great, but then it gets ignored by the masses. And to me, it's really sad that great technology in programming languages, invented over decades and decades, gets forgotten just because it happened to be put in the wrong place. Yeah. And it's not just that. The whole way things are described is all about monoids and monads and whatever. Existentials and things like that. Yeah, exactly. And so a lot of what Swift is trying to do is take those ideas, re-explain them in a way that actually makes sense, stack them up in a very nice, consistent way, and design it. A lot of this was: pull these things together, really polish, really push, and make sure that the core is really solid. Okay, we have a question. How does the Swift protocol approach avoid the inheritance-tree hell problem in languages like C#, where you end up with enormous trees that are impossible to refactor? Yes. And similarly, what are your thoughts around using the mix-in pattern, which has been found to be an anti-pattern in other contexts? Yeah, so the way that Swift implements this is completely different from the way that subclasses work in C# or Java or other object-oriented languages. There, what you get is something called a vtable. And so your type has to have one set of mappings for all these different methods, and then you get very deep inheritance hierarchies. In Swift, you end up adding methods to Int. So on the last slide we added a method isOdd to all the integers, and integers don't have a vtable. That would be a very inefficient thing to do. And so the implementation is completely different. The trade-offs are completely different.
I will, at the end of this, I think in a couple of slides, have a good pointer that will give you a very nice deep dive on all that kind of stuff. Also, there's the binary method problem, and there's a whole bunch of other things that are very cleanly solved by Swift protocols. Okay, and then there was also a question: out of curiosity, could you give an estimate of how long it would take someone to go from a fair level of knowledge in Python, TensorFlow, and deep learning to being a competent contributor to Swift for TensorFlow? Yeah, so we've designed Swift in general to be really easy to learn, so that you can learn as you go. And this course is very bottom-up, but a lot of Swift, just like Python, was designed to be taught. And what you start with, when you come at it from that perspective, is a very top-down kind of view. What I would do is Google for "A Swift Tour", and you get a very nice top-down view of the language, and it's very approachable. And then just pick something that's in some fast.ai notebook now that we haven't implemented yet, and pop it into a notebook, right? And the first time you try to do that, you'll get all kinds of weird errors and abstractions and you won't know what's going on, but that's what the forum is for, and that's what the community is about. Yeah, lots of help from the forum; Chris and I are both on the forum, and the Swift for TensorFlow team is on the forum. We'll help you out, and in a few weeks' time you'll be writing stuff from scratch and finding it a much smoother process. Yeah. I want to address one weird thing here and give you something to think about. You might wonder: okay, Jeremy wants to count all the countable things. We have Array and we have Mat, and we have to say that they are Countable. But the compiler knows whether something is countable or not.
Like, if you try to make something countable and it doesn't have a count method, the compiler will complain to you. So why do we have to do this? Well, let's talk about a different example. The answer is that with protocols and methods (and this is also related to the C# question), the behavior of those methods also matters. So here we're going to switch domains and talk about shapes. All shapes have to have a draw method. This is super easy. And what I can do is define an octagon and tell it how to draw; I can define a diamond and tell it how to draw, using exactly the same stuff that we just saw before. Really easy. And the cool thing about this is that now I can define a method, refresh, which clears the canvas and draws the shape. And so all shapes will get a refresh method. So if you do a tab completion on your octagon, it all just works. But what happens if you have something else with a draw method? Cowboys know how to draw. It's a very different notion of what drawing is. We don't want cowboys to have a refresh method. It doesn't make sense to clear the screen and then pull out a gun. That's not what we're talking about here. And so the idea of protocols is really, again, to categorize and describe groups of types. And one of the things you'll see, which is kind of cool, is that you can define a protocol with nothing in it. So it's a protocol that doesn't require anything. And then you go and say: I want that type, that type, that type, that type to be in this group. And now I have a way to describe that group of types. It can be totally random, whatever makes sense for you. And then you can do reflection on it; you can do lots of different things that now apply to exactly that group of types. I actually found, I still find, that this kind of protocol-based programming approach is like the exact upside-down opposite of how I've always thought about things.
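A minimal sketch of the shapes example above, with `draw` returning a string instead of painting to a real canvas (that simplification, and the exact type names, are my assumptions). The key point survives: the protocol extension gives every `Shape` a `refresh` for free, while `Cowboy`, which has its own unrelated `draw`, gets nothing.

```swift
// Every Shape knows how to draw.
protocol Shape {
    func draw() -> String
}

// A protocol extension: all conforming types get refresh for free.
extension Shape {
    func refresh() -> String { return "clear screen; " + draw() }
}

struct Octagon: Shape {
    func draw() -> String { return "octagon" }
}

struct Diamond: Shape {
    func draw() -> String { return "diamond" }
}

// Cowboys also "draw", but it's a different notion of drawing. Since
// Cowboy does not conform to Shape, it does not get refresh().
struct Cowboy {
    func draw() -> String { return "pulls out a gun" }
}

print(Octagon().refresh())  // clear screen; octagon
print(Diamond().refresh())  // clear screen; diamond
```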
It's kind of like, you don't create something that contains these things, but you kind of, I don't know, somehow shove things in. And the more I've looked at code that works this way, the more I realize it tends to be clearer and more concise. But I still find it a struggle to have that sense of how to go about creating these kinds of APIs. And one of the things you'll notice is that we added this protocol to Array in an extension. So unlike interfaces in a Java or C# type of language, we can take somebody else's type and make it work with the protocol after the fact. And so I think that's a superpower here that allows you to work with these values in different ways. So this is a super-brief, high-level view of protocols. Protocols are really cool in Swift, and they've drawn on a lot of great work in the Haskell and other communities. There are a bunch of talks, and even Jeremy wrote a blog post that's really cool and talks about some of the fun things you can do. Question: don't extensions make code hard to read? Because once the functionality of a particular API or class is extended in this way, you won't know if the functionality is coming from the original class or from somewhere else. Yeah, so that's something you let go of when you write Swift code. There are a couple of reasons for that, one of which is that you get good IDE support. And so, again, we're kind of building some parts of this airplane as we fly it, but in Xcode, for example, you can click on a method and jump to the definition. Right, and so you can say, well, okay, here's a map on array. Where does map come from? Well, map isn't defined on Array. Map, filter, reduce: those aren't defined on Array. Those are actually defined on Sequence. And so all sequences have map, filter, reduce, and a whole bunch of other stuff. And arrays are, of course, sequences, and so they get all that behavior.
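The retroactive conformance described above can be sketched in a few lines: we take the standard library's `Array`, which we didn't write, and make it conform to our own protocol in an extension. The protocol name `Describable` and its `summary` property are illustrative, not from the notebook.

```swift
// Our own protocol.
protocol Describable {
    var summary: String { get }
}

// Retroactive conformance: somebody else's type (Array) is made to
// conform to our protocol after the fact, via an extension.
extension Array: Describable {
    var summary: String { return "array of \(count) elements" }
}

print([1, 2, 3].summary)  // array of 3 elements
```

This is exactly what Java- or C#-style interfaces can't do: there, the type's author must opt in at definition time.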
And so the fact that it's coming out of Sequence, as a Swift programmer, particularly when you're starting, doesn't really matter. It's just good functionality. We've had this same discussion around Python, which is like: oh, Jeremy, import star, and therefore I don't know where things come from, because the way I used to know where things come from is that I looked at the top of a file and it would say from blah import foo, and so I know foo comes from blah. And we had that whole discussion in an earlier lesson: that's not how you figure out where things come from. You learn to use jump-to-symbol in your IDE, or you learn to use Jupyter Notebook's ability to show you where things come from. That's just the way to go. Thank you. Question: I feel that Scala is a very nicely designed language that, to my knowledge, doesn't lack any of the features I've seen so far in Swift. Is that true? And if so, is the choice of Swift more about JVM as opposed to non-JVM runtimes and compilers? Yeah, so Scala is a great language. The way I explain Scala is that they are very generous in the features they accept. They're undergoing a big redesign of the language to kind of cut it down and try to make the features more sensible and stack up nicely together. Swift and Scala have a lot of similarities in some places, and they diverge wildly in other places. I mean, I would say, I feel like anybody doing this course understands the value of tasteful curation, because PyTorch is very tastefully curated and TensorFlow might not be. And so using a tastefully curated, carefully put-together API like Swift has, and like PyTorch has, I think makes life easier for us as data scientists and programmers. Yeah, but I think the other point is also very good. So Scala is very strong in the JVM (Java virtual machine) ecosystem, and it works very well with all the Java APIs, and it's great in that space.
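To make the Sequence point above concrete, here is a small sketch: `map`, `filter`, and `reduce` are defined once on `Sequence`, so arrays, ranges, and every other sequence type get them for free.

```swift
let xs = [1, 2, 3, 4]

// These three methods come from Sequence, not from Array itself.
let doubled = xs.map { $0 * 2 }        // [2, 4, 6, 8]
let evens = xs.filter { $0 % 2 == 0 }  // [2, 4]
let total = xs.reduce(0, +)            // 10

// Ranges are sequences too, so the same methods just work on them.
let squares = (1...3).map { $0 * $0 }  // [1, 4, 9]

print(doubled, evens, total, squares)
```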
Swift is really great if you don't want to wait for the JVM to start up so you can run a script. And so they're duals in a sense; they have different strengths and weaknesses. So do we have time before our break for me to quickly show how this all goes together? I probably can't stop you even if I wanted to. So just to come back to this: you can basically see what's happened here. We defined this protocol saying these are the things that we want to have in a data blocks API. And then we said: here is a specific example of a data blocks API. Now at this point, we are missing one important thing, which is that we've never actually created the bit that says how you open an image and resize it and stuff like that, right? So we just go through it, and we can say let's call download, let's call getItems. We can create nice simple little functions now. We don't have to create complex class hierarchies to do things like "tell me about some sample" and have it print it out, right? And we can create a single little function which creates a train and a valid. This is neat, right? This is something I really like about this style of programming: this is a named tuple. And I really like this idea that we don't have to create our own struct or class all the time. It's a very functional style of programming where you just say: I'm just going to define my type as soon as I need it. And this type is defined as being a thing with a train and a thing with a valid. So as soon as I wrap parentheses around this, it's both a type and a thing now. And so now I can partition into train and valid, and that returns something where I can grab a random element from valid and a random element from train. We can create a processor. Again, it's just a protocol, right? So remember, a processor is a thing for categories, creating a vocab of all of the possible categories. And so a processor is something where there's some way to say, like, what is the vocab?
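The "define the type as soon as you need it" idea above can be sketched with a named tuple return type. The helper name `splitTrainValid` and the `Int` items are my own stand-ins, not the notebook's actual code; the point is that `(train:, valid:)` is declared inline, with no struct or class ceremony.

```swift
// The return type is a named tuple, declared right in the signature.
func splitTrainValid(_ items: [Int], validFraction: Double = 0.2)
    -> (train: [Int], valid: [Int]) {
    let nValid = Int(Double(items.count) * validFraction)
    return (train: Array(items.dropLast(nValid)),
            valid: Array(items.suffix(nValid)))
}

let split = splitTrainValid(Array(1...10))
print(split.train.count, split.valid.count)  // 8 2
```

The caller accesses the parts by name (`split.train`, `split.valid`), which reads like a tiny ad-hoc struct without having to define one.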
And if you have a vocab, then you process things from text into numbers, or deprocess things from numbers into text. And so we can now go ahead and create a category processor, right? So here's, like, grab all the unique values, and here's label-to-int, and here's int-to-label. Why are you using parentheses on your map, Jeremy? I didn't write that. There we go. Yeah. So now that we have a category processor, we can try using it to make sure that it looks sensible. And we can now label and process our data. So we first have to call label, and then we have to call process. Now, given that we have to do those things in a row, rather than creating whole new API functions, we can actually just use function composition. In PyTorch, we've often used a thing called compose. It actually turns out to be much easier, as you'll see, if you don't create a function called compose, but instead create an operator. And so here's an operator, which we will call compose, which is just defined as: first call this function, f, and then call this function, g, on whatever the first thing you passed it is. So now we've defined a new function composition, which first labels and then processes. And so now here's something which does both. And so we can map it. So we don't have to create, again, all these classes and special-purpose functions. We're just putting together function composition and map to label all of our training data and all of our validation data. And so then finally, we can say, well, this is the final structure we want. We want a training set. We want a validation set. And let's, again, create our own little type inline, right? So that's an array of tuples. Yeah, so our training set is an array of named tuples, and our validation set is an array of named tuples. And we're going to initialize it by passing both in. And so this basically is now our data blocks API. There's a function called makeSplitLabeledData.
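A sketch of the composition operator described above. The spelling `>|` and the helper closures are my assumptions (the notebook may name the operator differently); the shape is the point: "first call f, then call g on the result".

```swift
// Declare a custom infix operator for left-to-right composition.
precedencegroup CompositionPrecedence {
    associativity: left
}
infix operator >| : CompositionPrecedence

// (A) -> B composed with (B) -> C gives (A) -> C.
func >| <A, B, C>(_ f: @escaping (A) -> B,
                  _ g: @escaping (B) -> C) -> (A) -> C {
    return { x in g(f(x)) }
}

// Hypothetical stand-ins for "label" and "process":
let addOne = { (x: Int) in x + 1 }
let double = { (x: Int) in x * 2 }

let addThenDouble = addOne >| double
print(addThenDouble(3))  // 8
```

Because the result is just another function, it drops straight into `map`, e.g. `[1, 2, 3].map(addThenDouble)`.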
And we're just going to pass in one of those configuration protocols we saw. So we're going to be passing in the ImageNet configuration, the thing that conforms to that protocol. And we're going to be passing in some processor, right? Which is going to be a category processor. And it's going to call download, get the items, partition, map label-of, and then initialize the processor state. And then label-of followed by process is our processing function, and we map that, right? And so that's it. So now, to use this with OpenCV, we define how to open an image. There it is. We define how to convert BGR to RGB, because OpenCV uses BGR; that's how old it is. We define the thing that resizes to 224 by 224 with bilinear interpolation. And so the process of opening an image is to open, then BGR-to-RGB, and then resize. And we compose them all together. And that's it, right? So now that we've got that, we then need to convert it to a tensor. So the entire process is to go through all those transforms and then convert to a tensor. And then I'll skip over the bit that does the mini-batches. There's a thing that does the mini-batches with that split labeled data we created, and we then just pass in the transforms that we want. And we're done, right? So the data blocks API in kind of functional-ish, protocol-ish Swift, you know, ends up being a lot less code to write and a lot easier for the end user. Because now, for the end user, there's a lot less they have to learn to use this data blocks API. It's really just the normal kind of maps and function composition that hopefully they're familiar with as Swift programmers. So I'm really excited to see how this came out, because it solves the problems that I've been battling with for the last year with the Python data blocks API, and it's been, you know, really just a couple of days of work to get to this point.
And one of the things this points to is that a big focus in Swift is on building APIs. And so, again, we've been talking about this idea of being able to take an API and use it without knowing how it works. It could be in C or Python or whatever. But it's about building these things that compose together and fit together in very nice ways. And with Swift, you get these clean abstractions. So once you pass in the right things, it works. You don't get a stack trace coming out of the middle of somebody else's library, where you now have to figure out what you did somewhere along the way that caused it to break. At least not nearly as often. So to see what this ends up looking like, I've created a package called DataBlock. It contains two files: it's got a Package.swift and it's got a main.swift. And main.swift is that, right? So all of that, in the end: to actually use it, that's how much code it takes to use your data blocks API and grab all the batches. It comes out super pretty. So let's take a five-minute break and see you back here at 8:05. Okay, so we're gradually working our way back to what we briefly saw last week in notebook 11, training ImageNet, and we're gradually making our way back up to hit that point again. It's a bit of a slow process, because along the way we've had to kind of invent float and learn about a new language and stuff like that. But we are actually finally up to notebook 02, a fully connected model, believe it or not. And the nice thing is that at this point things are going to start looking more and more familiar. One thing I will say, though, that can look quite unfamiliar is the amount of types that you have to type with Swift. But there's actually a trick, which is: you don't have to type all these types. You don't have to type types. What you can actually do is say, oh, you know, here's a type I use all the time, Tensor<Float>. And I don't like writing angle brackets either. So let's just create a type alias called TF.
And now I just use TF everywhere. Now to be clear, a lot of real Swift programmers, in their production code, might not like doing that a lot. I mean, personally, I do do that a lot, even not in notebooks. But you might want to be careful if you're doing actual Swift programming. The way I would look at it is: if you're building something for somebody else to use, if you're publishing an API, you probably don't want to do that. But if you're hacking things together and you're playing and having fun, it's no problem at all. Yeah. I mean, different strokes. Personally, I would say that if I'm giving somebody something where the whole thing is Tensor<Float>s, I would do it. But anyway, in a notebook, I definitely don't want to be typing that. So in a notebook, make it easier for your interactive programming by knowing about things like typealias. Yeah. That's something we also want to make better just in general, so that these things all just default to Float. Yeah. You don't have to worry about it. That would be nice. So then we can write a normalize function that looks exactly the same as our Python normalize function. And we can use mean and standard deviation just like in Python. And we can define tests with asserts just like in Python. So this all looks identical. We can calculate n and m and c, the same constants and variables that we used in Python, in exactly the same way as Python. We can create our weights and biases just like in Python, except there's a nice kind of rule of thumb in the Swift world, which is that any time you have a function that's going to create some new thing for you, you always use the init constructor for that. So for example, generating random numbers and dumping them into a tensor: that's constructing a new tensor for you. So you're actually calling Tensor<Float>.init here. And so if you're trying to find where in an API you get to create something in this way, you should generally look in the init section.
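The typealias trick above looks like this. Since `Tensor<Float>` needs Swift for TensorFlow, this sketch uses a standard-library generic (`Dictionary<String, Int>`) to show the same idea: alias a verbose generic type once, then use the short name everywhere.

```swift
// Same idea as `typealias TF = Tensor<Float>`, but with a stdlib type
// so the sketch runs anywhere.
typealias Counts = Dictionary<String, Int>

var c: Counts = [:]
c["swift", default: 0] += 1
c["swift", default: 0] += 1
print(c["swift"]!)  // 2
```

A typealias introduces no new type: `Counts` and `Dictionary<String, Int>` are fully interchangeable, so there is no runtime cost and no API mismatch.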
So this is how you create random numbers in Swift for TensorFlow. This is how you create a tensor of zeros in Swift for TensorFlow. So here are our weights and biases. This is all the same stuff. We basically copied and pasted it from the PyTorch version with some very, very minor changes. We create our linear function, except rather than @, we use the • (dot) operator, because that's what they use in Swift for TensorFlow. If you're on a Mac, that's Option-8. If you're on anything else, it's compose key, dot, equals. And so now we can go ahead and calculate linear functions. We can calculate ReLU exactly the same as PyTorch. We can do a proper Kaiming init, exactly like PyTorch. And so now we're at the point where we can define the forward pass of a model. And this looks, basically, again, identical to PyTorch. A model can just be something that returns some value. So the forward pass of a model really just builds on stuff that we already know about, and it looks almost identical to PyTorch, as does a loss function. It looks a little bit different, because it's not called squeeze, it's called squeezingShape. Other than that, mean squared error is the same as PyTorch as well. And so now here's our entire forward pass. So hopefully that all looks very familiar. If it doesn't, go back to notebook 02 in the Python notebooks. And actually, this is one of the tricks, like, this is why we've done it this way for you all: we have literally these parallel texts. You know, there's a Python version and there's a Swift version, so you can see how they translate and see exactly how you can go from one language and one framework to another. That's all very well, but we also need to do a backward pass. So to do a backward pass, we can do it exactly the same way as, again, we did in PyTorch. There's one trick, a kind of Python hack, we used in PyTorch. And so this is doing it the hard way. This is doing it all manually, because we have to build it.
Doing it all manually, yep, because we have to build everything from scratch. In the PyTorch version, we actually added a .grad attribute to all of our tensors. You're not allowed to just throw attributes in arbitrary places in Swift, so we have to define a class which has the actual value and the gradient. But once we've done that, the rest of this looks exactly the same as the PyTorch version did. Here's our MSE grad, our ReLU grad. That's all exactly the same. In fact, you can compare here, right? Here's the Python version we created for lin_grad; here's the Swift version of lin grad. It's almost identical, okay? So now that we've done all that, we can go ahead and do our entire forward and backward pass, and we're good to go. But it could be so much better. Well, you skipped past the big flashing red lights that say: don't do this. Did you miss that part? Tell me about it. Oh, okay. So let's talk about this. So this is defining a class and putting things in classes. We haven't seen classes yet, at least not very much. That's true, because before, we've used things that looked like classes, but they didn't say class on them; they said struct on them. Yes. And so what is that? So let's play a little game, and let's talk about this idea of values and references, because that's what struct versus class really means in Swift. A value is a struct thing, and a reference is a class thing. So let's talk about Python. Here's some really simple Python code, and there are no tricks here. What we're doing is assigning 4 into a, copying a into b, then incrementing a and printing them out. And so when you do this, you see that a gets incremented. B does not, of course. This all makes perfect sense. In Swift, you do the same thing, and you get the same thing out. This is how math works, right? All very straightforward. Let's talk about arrays. So here I have an array, or a list in Python, and I put it into x, and then I copy x into y.
I add something to x, and I print it out, and it has the extra item. That makes perfect sense, right? But what happens to y? What? What just happened here? I just added something to x. And now y changed? What is going on here? Well, we learn that there's this thing called a reference, and we learn that it does things like this, and we learn when it bites us. What happens in Swift? Well, Swift has arrays. It doesn't have lists in the same way. And so here we have, again, identical code except for var. We put a 1 and a 2 into x. We copy x into y. We add something to x. We print it out. We get the extra element. But y is correct. What just happened? So this is something called value versus reference semantics. And in Swift, arrays, dictionaries, tensors: all these things have what's known as value semantics. And let's dive in a little bit on what that is. So what is a value, in something that has value semantics? Sorry, this is self-referential. When you declare something in your code, you're declaring a name. And if it's a name for a value, that name stands for that value, right? X stands for the array of elements that it contains. This is how math always works. This is what you expect out of basic integers. This is what you expect out of the basic things that you interact with on a daily basis. Reference semantics are weird, if you think about it. What we're doing is saying that x is a name for something else. And we usually don't think about this until it comes around to bite us. And so this is kind of a problem. Let's dive in a little bit to understand why this causes problems. So here's a function. It's doThing. It's something that Jeremy wrote with a very descriptive name. And it takes t, and then it goes and updates it, and that's fine, right? It's super fast. Everything is good. You move on and put it in a notebook, and then you build the next notebook.
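The array experiment described above, in runnable form: copying a Swift array copies the value, so mutating `x` never changes `y`. (In Python, the equivalent list code would print the extra element through both names.)

```swift
var x = [1, 2]
let y = x        // a value copy, not a shared reference
x.append(3)      // mutate only x
print(x)  // [1, 2, 3]
print(y)  // [1, 2]  (y is unaffected: value semantics)
```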
The next notebook calls doThing, and you find out: oh, well, it changed the tensor I passed in, but I was using that tensor for something else. And now I've got a problem, because it's changing a tensor that I wanted to use. And now I've got this bug. I have to debug it, and I find out that doThing is causing the problem. And so what do I do? I go put a clone in there. I don't know who here adds clones in a principled way, or who here... I do this in a principled way. What we do in fast.ai is we kind of don't have clones, and then when things start breaking, I add more until things stop breaking, and then we're done. That sounds great. Yeah, so there's a lot of clone in fast.ai. That's a good principle. Possibly there's the right number, or possibly a few too many, or possibly a few too few. Well, so now think about this. What we have in the first case is a footgun: something that's ready to explode if I use it wrong. Now I added clone, and so, good news, it's correct but slow. It's going to do that copy even if I don't need it, which is really sad. And in Swift, things just work. You pass in a tensor, you can update it, you can return it, and it leaves the original one alone. Arguments in Swift actually even default to constants, which makes it so that you can't do that. If you do actually want to modify something in the caller, you can do that too. You just have to be a little bit more explicit about it and use this thing called inout. And so now, if you want to update the thing somebody passed to you, that's fine: just mark it inout, and everything works fine. And on the call side, you pass it with this ampersand thing so that they know that it can change. Now what is going on here? Right, so this is good math. This is the correct behavior, but how does this work? Well, we're talking about names; we're talking about values. And so here I have a struct. This is a value-y thing.
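The `inout` mechanism above can be sketched in a few lines. Function arguments default to constants; to mutate the caller's value, the parameter is marked `inout`, and the caller passes it with `&` so the mutation is visible at the call site. (The function name `addBias` is just an illustrative stand-in.)

```swift
// The inout marking makes the mutation explicit in the signature.
func addBias(_ t: inout [Float], _ bias: Float) {
    for i in t.indices { t[i] += bias }
}

var activations: [Float] = [1, 2, 3]
addBias(&activations, 0.5)   // & makes the mutation explicit at the call
print(activations)  // [1.5, 2.5, 3.5]
```

Without `inout`, the same body would fail to compile ("left side of mutating operator isn't mutable"), which is exactly the footgun protection being described.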
And so I say it has two fields, a real and an imaginary, and I define an instance of my complex number here named x. And so this is saying: I have x, and it's a name for the value that has 1 and 2 in it. And so if I introduce y, y is another instance of this struct, and so it also has a 1 and a 2. And if I copy it, I get another copy, and if I change one, then I update just y's. This is, again, the way things should work. And so this works with structs, this works with tuples, this works with arrays and dictionaries and all that kind of stuff. How do references work? Well, here the reference is the name. Here I have a class, and the class has a string, and it has an integer. And so somewhere in memory there is a string and there is an integer, and they're stuck together, just like with a struct. But this x is actually a reference, or a pointer, or an indirection to that. The reason for that is because you wrote class instead of struct. So by writing class, you're saying: when you create one of these things, please create a reference, not a value. Yes, that's exactly right. And now what happens with references is that you get copies of that reference. And so when I copy x into y, just like in PyTorch or Python, I have another reference, another pointer, to the same data. And so that's why, when you go and update it through y, it also changes the data that you see through x. And so in Swift, you have a choice. You can declare things as classes, and classes are good for certain things; they're important and valuable, and you can subclass them, and classes are nice in various ways. But you have a choice, and a lot of the things that you've seen are defined as structs, because they have much more predictable behavior and they stack up more correctly.
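A side-by-side sketch of the struct-versus-class distinction just described, using a complex-number type in both flavors (the field values are my own, chosen to make the difference obvious):

```swift
// Value type: copies are independent.
struct ComplexS {
    var real: Float
    var imag: Float
}

// Reference type: copies share the same underlying object.
final class ComplexC {
    var real: Float
    var imag: Float
    init(real: Float, imag: Float) {
        self.real = real
        self.imag = imag
    }
}

let s1 = ComplexS(real: 1, imag: 2)
var s2 = s1
s2.real = 99
print(s1.real)  // 1.0  (s1 untouched: the struct was copied)

let c1 = ComplexC(real: 1, imag: 2)
let c2 = c1
c2.real = 99
print(c1.real)  // 99.0 (c1 changed too: both names point at one object)
```

Note that `c2` is a `let`, yet the mutation compiles: only the reference is constant, not the object it points to, which is exactly the var/let subtlety discussed next.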
So in this case, you know, I was trying to literally duplicate a Python/PyTorch API, and so I just found I wasn't able to unless I used class. Yes. But then you kind of said, okay, well, that's how you do it, but... Yeah, and we'll get back to autodiff in a second. But don't do it that way. Yeah, and so you can absolutely do that. And again, when you're learning Swift, it's fine. Just reach for the things that are familiar, and then you can go. That's perfectly acceptable. But here we're trying to talk about things Swift is doing to help save you and make your code more efficient, and things like that. And I still reach for class a lot, but then every time a real Swift programmer takes my thing that had class in it and replaces it with something more Swift-y, it ends up being shorter and easier to understand. So I agree: go for it, get things working with class, but when it comes time, work with this, look at it, and figure out how it works. Now, there's one thing that's really weird here. If you remember, last time, the first thing I told you about was var and let, right? And what is going on here? This does not seem to make any sense. We've got y, and now we are updating (if this animation will go away) a thing in y, even though y is a constant. What does that even mean? Well, the reason is that the thing that is constant is this reference. And so we've made a new copy of the reference, but we're allowed to change the thing it points to, because we're not changing x or y itself, right? So this does make sense. But how does var work? Well, this comes back to the mutation model in Swift, and I'll go through this pretty quickly; this is not something you have to know. But let's say I have a complex number and it's a struct, and I say, hey, this thing is a constant, and I want to go change it, right? That's not supposed to work. What happens?
Well, if you try to do that, Swift will tell you: haha, you can't do that. You can't use += on the real that's in c1, because c1 is a let. And Swift is helpful, and so it tries to lead you to solving the problem. It says: hey, by the way, if you want to fix this, if you want to make it go away, just change let to var, and then everything is good. Now, okay, fine. Well, maybe I really do want to change it. And so what I'm going to do is get a little bit trickier: I'm going to define this extension. I'm going to add a method, increment, to my complex number. I'm going to increment it inside the method and then call the method. Can I get away with that? Well, these things may be in different files. The compiler may only be able to see one or the other. And so if you run this, it has no idea whether increment is going to change that thing. And so what the compiler says is: ah, well, you can't increment real inside of this increment method either, because self is immutable. And it says: mark the method mutating to make self mutable. Now, the thing to think about with methods, both in Python and in Swift, is that they have a self. In Python, you have to declare it. Swift has it too; it's just not making you write it all the time, because that would be annoying. And so when you declare a method on a struct, what you're getting is self, and it's a copy of the struct. Now, what this is saying is: hey, you're actually changing self.real. Self is constant, and so you can't do that here. But what you can do is mark it mutating. And what that does is it says: our self is now one of these inout things, the inout thing that allows us to change it in the caller. And because it's now mutating, it's totally fine to change it. That's no big deal. And the compiler leads you to this and shows you what to do.
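The mutating-method example above can be sketched like this. The method changes `self`, so it must be marked `mutating`, and it can then only be called on a `var`; calling it on a `let` is the compile error described next. (The exact type and parameter names here are my own.)

```swift
struct Complex {
    var real: Float
    var imag: Float
}

extension Complex {
    // Without `mutating`, assigning to self.real would not compile:
    // "self is immutable".
    mutating func increment(by amount: Float) {
        real += amount
    }
}

var c = Complex(real: 1, imag: 2)
c.increment(by: 1)   // fine: c is a var
print(c.real)        // 2.0

// let c1 = Complex(real: 1, imag: 2)
// c1.increment(by: 1)  // error: cannot use mutating member on a let
```

Under the hood, `mutating` makes `self` behave like an `inout` parameter, which is why the change is visible in the caller.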
But now we come back to this problem over here. We say: well, we have a constant, and we're calling increment. How does that work? Well, it still doesn't. The compiler will tell you: hey, you can't do that. You can't mutate c1. Now it knows that increment can change it. And so it says: really, really, really, if you want to do this, go mark c1 as a var. And Jeremy would say: just mark everything as a var. Pretty much. That's how it is. And so the nice thing about this, though, is that it all stacks up nicely and it all works. This is the underlying mechanics that allow the value stuff to work. Now you may be wondering: how is this efficient? So we were talking about how, in the PyTorch world, you end up copying all the time when you don't need it. In Swift, we don't want to do all those copies; on the other hand, we don't want to be always copying either. So where do the copies go, and how does that work? Well, if you're using arrays, or arrays of arrays of arrays of dictionaries of arrays, like super-nested things, what ends up happening is: arrays are structs. You might be surprised. And inside of that struct, there is a pointer, a reference. And so the elements of an array are actually implemented with a class. And so what I have here is a1, which is some array, and I copied it to a2, and I copied it to a3, and I copied it to a4, because I'm passing it all around. I'm just passing this array around, no big deal. And what happens is I'm just copying this reference, and it happens to be buried inside of a struct. And so passing around arrays, with full value semantics, is super cheap, no problem. It's not copying any data; it's just passing the pointer around, right? Just like you do in C or even in Python. The magic happens when you go and you say: okay, well, I've now got a4, and all these things are all sharing this data, and I'm going to add one element to a4. Well, what happens? Well, the first thing that happens is that append is a mutating method.
And so it says: hey, I'm this thing called a copy-on-write type. And so I want to check to see if I'm the only user of this data. And it turns out: no, lots of other things are pointing to our data here. And so lazily, because it's shared, I'll make a copy of this array. And so I only get a copy of the data if it's shared and if it changes. So that should be 1, 2, 3, 92? Yeah, that should be 1, 2, 3, 92. My slides are buggier than Swift. Now, the interesting thing about this is that, because of the way this all works out, if you go and you change a4 again, it just updates it in place. There's no extra copy. And so the cool thing about this is that you get exactly the right number of copies, and it just works. You as a programmer don't have to think about this. This is one of the things that Swift is just subtracting from your consciousness, from the things that you have to worry about, which is really nice. And so a really nice aspect of this is that you get algebra: values work the way values are supposed to work. You get super high performance. We get to use more emojis, which I always appreciate. And if you want to learn more about this, because this is also a really cool, deep topic that you can geek out about, particularly if you've done object-oriented programming before, there's a lot that's really nice here, and there's a video where you can see more. So let's go back to that autodiff thing. Let's actually talk about autodiff from a different perspective. So this was the autodiff system implemented the same way as the manually done PyTorch version, and we didn't like it because it was using references. Let's implement it again the very low-level, manual way in Swift. But before we do, let's talk about where we want to get to. Swift for TensorFlow has built-in automatic differentiation for your code. So you don't have to write gradients manually. You don't have to worry about all this stuff. And the way it works is really simple.
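The copy-on-write behavior just described can be observed directly: copying the array is cheap (both names share one buffer), and the real copy only happens at the first mutation of a shared copy.

```swift
var a1 = [1, 2, 3]
var a4 = a1        // no element data copied yet; storage is shared
a4.append(92)      // a4's storage is shared, so append lazily copies first
print(a1)  // [1, 2, 3]
print(a4)  // [1, 2, 3, 92]
a4.append(5)       // a4 is now the sole owner: updated in place, no copy
print(a4)  // [1, 2, 3, 92, 5]
```

So you get exactly as many copies as correctness requires and no more, without writing any `clone` calls yourself.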
There are functions like gradient, and you call gradient, and you pass it a closure, and you say, what is the gradient of x times x? And it gives you a new function. Calling that function on a bunch of numbers that we're striding over and printing them out just gives you the gradient of this random little function we wrote. Now, one of the interesting things about this is I wrote this out. It takes just doubles or floats or other things like that. Autodiff in Swift works on any differentiable type, anything that's continuous, anything that's not like integers, anything that has a gradient. So you can't do this in just a library. This has to be built into the language itself, because you're literally compiling something that's multiplying doubles together, and it has to figure out how to get gradients out of that. You can do things as a library, and that's what PyTorch and other frameworks do in Python, but it doesn't work the same way at all. PyTorch will not do that on doubles. Oh, yes. Yes. And so this doesn't just work on doubles. If you want to define quaternions, those cool numeric, scientific-y things that are continuous, those are differentiable too, and that all stacks up and works. So there's a bunch of cool stuff that works this way. You can define a function. You can get the gradient of the function at some point. You can pass in closures. All this stuff is really nice. Instead of talking about that, we're going to do the from-the-bottom-up thing. And so I'm going to pretend I understand calculus for a minute, which is sad. So if you think about what differentiation is, computing the derivative of a function, there are two basic things you have to do. You have to know the axioms of the universe, like what is the derivative of plus, or multiply, or sine, or cosine, or, for tensors, matmul. Then you have to compose these things together. And the way you compose them together is this thing called the chain rule.
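The gradient-of-a-closure API described above can be sketched in plain Swift. The real Swift for TensorFlow `gradient(of:)` is compiler-derived; as a stand-in that runs anywhere, this sketch uses numeric differentiation, and the helper name `numericGradient` is an assumption:

```swift
// A stand-in for the gradient(of:) idea: take a closure, return a new
// closure approximating its derivative (central finite differences).
func numericGradient(of f: @escaping (Double) -> Double) -> (Double) -> Double {
    { x in
        let eps = 1e-6
        return (f(x + eps) - f(x - eps)) / (2 * eps)
    }
}

let grad = numericGradient(of: { x in x * x })
// Stride over a bunch of numbers and print the gradient at each one.
for x in stride(from: 0.0, through: 3.0, by: 1.0) {
    print(grad(x))     // approximately 2x at each point
}
```

The compiler-based version produces exact derivatives rather than approximations, but the call shape is the same: pass a closure, get a gradient function back.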
And this is something that I relearned, sadly, over the last couple of weeks. And that we did in the Python part of this course. Yes. And we wrote it a different way. We had dy/dx equals dy/du times du/dx. Apparently there's some ancient feud between the people who invented calculus independently, and they could not agree on notation. So what this is saying is, if you want the derivative of f calling g, the derivative of f calling g is the derivative of f applied to the forward version of g, multiplied by the derivative of g; that is, (f ∘ g)′(x) = f′(g(x)) · g′(x). And this is important because this is actually computing the forward version of g in order to get the derivative. Which we kind of hid away in our dy/du du/dx version. Do you want me to do the following? Oh, sure. I don't know how to do it on your machine. There you go. So how are we going to do this? What we're going to do is we're going to look at defining the forward function of this. And so we'll use the mean squared error as the example function. This is a little bit more complicated than I want. And so what I'm going to do is I'm going to actually just look at this piece here. And so I'm going to define this function, mseInner. And all it is is the .squared.mean. So it's conceptually this thing: mseInner just gets the square of x and then takes the mean, just because that's simpler. And then we'll come back to MSE at the end. And so in order to understand what's going on, I'm going to define this little helper function called trace. And all trace does is, you can put it in your function, and it uses this little magic thingy called pound function, #function. And when you call trace, it just prints out the function that it's called from. And so here we call foo and it prints out, hey, I'm in foo a b, and I'm in bar x. And so we'll use that to understand what's happening in these cells. So here I can define, just like you did in the PyTorch version, the forward and the derivative versions of these things.
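The trace helper relies on how `#function` behaves as a default argument. A minimal sketch, where `trace`, `foo`, and `bar` follow the transcript but their bodies are made up:

```swift
// #function used as a default argument expands at the *call site*,
// so trace() prints the name of whichever function called it.
func trace(_ caller: String = #function) {
    print("in \(caller)")
}

func bar(_ x: Int) -> Int {
    trace()              // prints "in bar(_:)"
    return x + 1
}

func foo(_ a: Int, _ b: Int) -> Int {
    trace()              // prints "in foo(_:_:)"
    return bar(a) + b
}

_ = foo(1, 2)
```

Dropping `trace()` into each function is all it takes to see the order in which the forward and backward pieces actually run.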
And so x times x is the forward. The gradient version is two times x. x.mean is the forward. This weird thing of doing a divide is apparently the gradient of mean (each element contributes 1/n). And I checked it, it apparently works. I don't know why. So then when you define the forward function of this mseInner function, it's just saying give me the square and take the mean. Super simple. And then we can use the chain rule, and this is literally where we use the chain rule, to say okay, we want the gradient of one function on another function, just like the syntax shows. And the way we do that is we get the gradient of mean, pass it the forward of the inner thing, and multiply it by the gradient of the other thing. And so this is really literally the math interpretation of this stuff. And given that we have this, we can now wrap it up into more functions, and we can say let's compute the forward and the backwards version of MSE. We just call the forward version, we call the backward version. And then we can run it on some example data: one, two, three, four. Just to be clear, the upside-down triangle thing is not an operator here, it's just using Unicode as part of the name of that function. That's the gradient delta symbol thingy. I found that on Wikipedia. So when you run this, what you'll see is it computes the forward version of this thing. It runs square and then it runs mean. And then it runs square again, and then it runs the backward version of mean and square. And this makes sense given the chain rule, right? You have to recompute the forward version of square to do this. And for this simple example, that's fine. Square is just one multiply. But consider that it might be a multiply of megabytes worth of stuff. It's not necessarily cheap. And when you start composing these things together, this recomputation can really come back to bite you. So let's look at what we can do to factor that out. So there's this pattern called chainers, and what we call the value and chainer pattern.
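The forward/derivative pairs and the chain-rule composition above can be sketched on `[Double]` instead of tensors. All names here are assumptions standing in for the notebook's versions; the point is that the backward pass recomputes `square(x)`:

```swift
// Forward functions and their hand-written derivatives.
func square(_ x: [Double]) -> [Double] { x.map { $0 * $0 } }
func squareGrad(_ x: [Double]) -> [Double] { x.map { 2 * $0 } }
func mean(_ x: [Double]) -> Double { x.reduce(0, +) / Double(x.count) }
// The "weird divide": d(mean)/dx_i = 1/n for every element.
func meanGrad(_ x: [Double]) -> [Double] {
    x.map { _ in 1 / Double(x.count) }
}

// mseInner forward: square, then mean.
func mseInner(_ x: [Double]) -> Double { mean(square(x)) }
// Chain rule: meanGrad applied to the forward of square, times squareGrad.
// Note square(x) is recomputed here, which is the cost we want to factor out.
func mseInnerGrad(_ x: [Double]) -> [Double] {
    zip(meanGrad(square(x)), squareGrad(x)).map { $0 * $1 }
}

print(mseInner([1, 2, 3, 4]))      // 7.5
print(mseInnerGrad([1, 2, 3, 4]))  // [0.5, 1.0, 1.5, 2.0]
```

Tracing this the way the lesson does would show square running twice: once in the forward pass and again inside the gradient.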
And what we want to do is we want to define each of these functions, like square or mean or your model, as one function that returns two things. And so what we're going to do is we're going to look at the other version of calculus's notation for this. And so when you say that the derivative of x squared is 2x, you actually have to move the dx over with it. And this matters because the functions we just defined are only valid if you're looking at a given point. They're not valid if you compose them with another function. It's just another way of writing the chain rule. It's the exact same thing. And so we're going to call this the gradient chain. And all it is is an extra multiply. And Chris, I just need to warn you, in one of the earlier courses, I got my upside-down triangles mixed up, as you just did. So the other way around is delta. And this one is called nabla. And I only know that because I got in trouble for screwing it up last time. Thank you, Jeremy, for saving me. So all this is the same thing we saw before. It just has an extra multiplication there, because that's what the chain rule apparently really says. So what we can do now is, now that we have this, we can actually define this value-with-chain function, and check this out. What this is doing is it's wrapping up both of these things into one thing. So here we're returning the value when you call this. We're also returning this chain function. Can you just explain this TF arrow TF, like what does that have to do with that? Yeah, sure. TF arrow TF, (TF) -> TF. So what this is doing is this is saying we're defining a function, squareVWC. It takes x. It returns a tuple, right? We know what tuples are. These are fancy tuples like you were showing before, where the two things are labeled. So there's a value member of the tuple and there's a chain label of the tuple. The value is just a TF, a tensor of floats. The chain is actually going to be a closure.
And so this says it is a closure that takes a tensor of float and returns a tensor of float. So that's just the way of defining a type in Swift where the type is itself a function. A function, yep. And so what squareVWC is going to be is two things. It's the forward thing, the multiply x times x. And that's the backwards thing, the thing we showed just up above that does ddx times two times x. And the forward thing is the actual value of the forward thing. The backward thing is a function that will calculate the backward thing. Yep. And the chain here is returning a closure. And so it's not actually doing that computation. So we can do the same thing with mean, and there's the same computation. And so now what this is doing is it's a little abstraction that allows us to pull together the forward function and the backward function into one little unit. And the reason why this is interesting is we can start composing these things. And so this mseInner thing that we were talking about before, which is square followed by mean, we can define: we just call squareVWC, and then we pass the value that it returns into meanVWC. And then the result of calling this thing is mean.value. And the derivative is those two chains stuck together. And so if we run this, now we get this really interesting behavior where, when we call it, we're only calling the forward functions once, and the backward functions once as well. And we also get the ability to separate this out. And so here what we're doing is we're calling the VWC for the whole computation, which gives us the two things. And here we're using the value. So we got the forward version of the value. And if that's all we want, that's cool. We can stop there. But we don't. We want the backward version too. And so here we call the chain function to get that derivative.
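The value-with-chainer pattern can be sketched on `[Double]` in place of the notebook's `TF` tensors. The names `squareVWC`, `meanVWC`, and `mseInnerVWC` follow the transcript; the rest is an assumed plain-Swift stand-in:

```swift
typealias Vec = [Double]

// Each function returns its value plus a "chain" closure for the backward pass.
func squareVWC(_ x: Vec) -> (value: Vec, chain: (Vec) -> Vec) {
    (value: x.map { $0 * $0 },
     // The closure captures x, so nothing needs recomputing later.
     chain: { ddx in zip(ddx, x).map { $0 * 2 * $1 } })
}

func meanVWC(_ x: Vec) -> (value: Double, chain: (Double) -> Vec) {
    let n = Double(x.count)
    return (value: x.reduce(0, +) / n,
            chain: { ddx in x.map { _ in ddx / n } })
}

// Composition: the forward runs exactly once; the two chains are stuck together.
func mseInnerVWC(_ x: Vec) -> (value: Double, chain: (Double) -> Vec) {
    let sq = squareVWC(x)
    let mn = meanVWC(sq.value)
    return (value: mn.value, chain: { ddx in sq.chain(mn.chain(ddx)) })
}

let (value, chain) = mseInnerVWC([1, 2, 3, 4])
print(value)      // 7.5
print(chain(1))   // [0.5, 1.0, 1.5, 2.0]
```

Because the chain is a stored closure over the already-computed forward values, you can take just the value and stop, or call the chain later for the derivative, with no recomputation either way.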
And so that's what gives us both the ability to get the forward and the backward separately, which we need. But also it makes it so we're not getting the recomputation, because we're reusing the same values within these closures. So given that we have these infinitesimally tiny little things, let's talk about applying this pattern. I'll go pretty quickly because the details aren't really important. So relu is just max with zero. And so we're using the same thing as reluGrad from before. Here's the linGrad using the PyTorch style of doing this. And so all we're doing is we're pulling together the forward computation in the value thing here. And then we're doing this backward computation here. And we're doing this with closures. So can I just talk about this difference, because it's really interesting. This is the version that Sylvain and I wrote when we just pushed it over from PyTorch. And we actually did the same thing that Chris just did, which is we avoided calculating the forward pass twice. And the way we did it was to cache away the intermediate values, in things like in.grad and out.grad, so that we could then use them again later without recalculating them. Now what Chris is showing you here is doing the exact same thing but in a much more automated way, right? It's a very mechanical process. Yeah, so rather than having to use this very heuristic, hacky, one-at-a-time approach of saying, what do I need at each point? Let's save it away in something, or give it a name, and then we'll use it again later. It's kind of interesting, and also without any mutation, this functional approach is basically saying, let's package up everything we need and hand it over to everything that needs it. And so that way we never had to say, what are we going to need later? It just works. So you'll see all the steps are here: out times blah dot transposed, out times blah dot transposed. But we never had to think about what to cache away.
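Applying the same pattern to relu looks like this. This is a plain-Swift sketch on `[Double]`, with the name `reluVWC` assumed by analogy with the others:

```swift
// relu in the value-with-chainer pattern: forward is max with zero,
// and the chain passes the gradient through only where the input was positive.
func reluVWC(_ x: [Double]) -> (value: [Double], chain: ([Double]) -> [Double]) {
    (value: x.map { max($0, 0) },
     chain: { ddx in zip(ddx, x).map { $1 > 0 ? $0 : 0 } })
}

let (out, chain) = reluVWC([-1.0, 2.0, -3.0, 4.0])
print(out)                          // [0.0, 2.0, 0.0, 4.0]
print(chain([1.0, 1.0, 1.0, 1.0]))  // [0.0, 1.0, 0.0, 1.0]
```

The closure captures the input x, which plays exactly the role of the cached `.grad` intermediates in the PyTorch-style version, without any explicit caching.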
And so this is not something I would want to write ever again manually, personally. But the advantage of this is it's really mechanical and it's very structured. And so when you write the full MSE, what we can do is we can say, well, it's that subtraction, then it's that .squared.mean. And then on the backwards pass, we have to undo the squeeze and the subtraction thingy. And so it's very mechanical how it plugs together. Now we can write that forward and backward function. And it looks very similar to what the manual version, the PyTorch thing, looked like, where you're calling these functions. And then in the backward version, you start out with one, because the gradient of the loss with respect to itself is one, which now I understand. Thanks, Jeremy. And then they all chain together and you get the gradients. And through all of this work, again, what we've ended up with is we've gotten the forward and backwards pass. We get the gradients of the thing. And now we can do optimizers and apply the updates. Now the... I've got to mention, with what Chris was saying about this one thing here and so forth: for Chris and I, it took a really long time to get to this point, and we found it extremely difficult. And at every point up until the point where it was done, we were totally sure we weren't smart enough to do it. Yes. And so, please don't worry if there's a lot here and you might be feeling the same way Chris and I did. But yeah, you'll get there, right? This was a harrowing journey. Yeah. It's okay if this seems tricky. But just go through each step one at a time. Yeah. So again, this is talking about the low-level mathy stuff that underlies calculus. And so the cool thing about this, though, from the Swift perspective, is this is mechanical. And compilers are good at mechanical things. And so one of the things that we've talked about a lot in this course is the idea of primitives.
They're the atoms of the universe, and then there are things you build out of them. And so the atoms of the universe for tensor, the atoms of the universe for float, we've seen, right? We've seen multiply and we've seen add on floats. Well, if you look at the primitives of the universe for tensor, they're just methods, and they call the raw ops that we showed you last time, right? And so if you go look at the TensorFlow API, what you'll see is those atoms have these things that Swift calls VJPs, for weird reasons (it stands for vector-Jacobian product). This defines exactly the mechanical thingy that we showed you. And so the atoms know what their derivatives are. And the compiler doesn't have to know about the atoms, but that means that if you want to, you can introduce new atoms, and that's fine. The payoff of this now, though, is you don't have to deal with any of this stuff. So that's the upshot. What I can do is I can define a function. So here's mseInner, and it just does .squared.mean, and I say make it differentiable, and I can actually get that weird thing, that chainer thing, directly out of it, and I can get direct low-level access if for some reason I ever wanted to. Generally you don't, and that's why you say give me the gradient, or give me the value and the gradient. And so this stuff just works. And the cool thing about this is this all stacks up from very simple things and it composes in a very nice way. And if you want to, you can now go hack up the internals of the system and play around with the guts, and it's exposed and open for you. But if you're like me, at least, you would stay away from it and just write math. Well, I mean, sometimes we do need it, right? You'll remember when we did the heat maps, right? To calculate those heat maps, we actually had to dive into registering a backward callback in PyTorch and grab the gradients and then use those in our calculations. And so there's plenty of stuff we come across where you actually need to work with this.
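The "atoms know their own derivatives" idea can be illustrated in plain Swift: each primitive carries a VJP that returns the value plus a pullback closure. Swift for TensorFlow registers these via attributes on the real ops; this `Atom` struct is purely an illustration, not the real mechanism:

```swift
import Foundation  // for sin/cos

// Each atom packages its forward value together with a pullback closure,
// which is exactly the value-with-chainer shape from before.
struct Atom {
    let name: String
    let vjp: (Double) -> (value: Double, pullback: (Double) -> Double)
}

let squareAtom = Atom(
    name: "square",
    vjp: { x in (value: x * x, pullback: { seed in seed * 2 * x }) }
)
let sinAtom = Atom(
    name: "sin",
    vjp: { x in (value: sin(x), pullback: { seed in seed * cos(x) }) }
)

let (v, pb) = squareAtom.vjp(3)
print(v)      // 9.0
print(pb(1))  // 6.0
```

The compiler's job is then just the mechanical part: chaining pullbacks together through your code, which is why you can add new atoms without the compiler knowing anything about them.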
Yeah, and there are some really cool things you can do too. So now, we ended up with the model. And so this is something that I'd never got around to doing with FixMe. So here's our forward function. Here we're implementing it with matmuls and with the lin function, the ReLUs, and things like that. The bad thing about defining a forward function like this is you get tons of arguments to your function. And so some of these arguments are things that you want to feed into the model. Some of these things are parameters. And so as we're refactoring, what we can do is we can introduce a struct, you might be surprised, that puts all of our parameters into it. And so here we have MyModel, and we're saying it is Differentiable. And what Differentiable means is it has a whole bunch of floating-point stuff in it, and I want to get the gradient with respect to all of these. So now I can shove all those arguments into the struct. It gives me a nice capsule to deal with. And now I can use the forward function on my model. I can declare it as a method. This is starting to look nicer. This is more familiar. And I can just do math. And I can use w1 and b1. And these are just values defined on our struct. Now I can get the gradient with respect to the whole model in our loss. And all of this is building up on top of all those different primitives that we saw before and the chain rule and all these things. Now we can say, hey, give us the gradient of the model with respect to xTrain and yTrain. And we get all the gradients of w1, b1, w2, b2. And all this stuff works. You can see it all calling the little functions that we wrote. And it's all pretty fast. Now, again, like we were just talking about, this is not something you should do for matmul or convolution. But there are reasons why this is cool. And so there are good reasons and there are annoying reasons, I guess. So sometimes the gradients that you get out of any autodiff system will be slow.
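Gathering parameters into a struct looks roughly like this. Swift for TensorFlow derives the gradient automatically for any `Differentiable` struct; in this plain-Swift sketch the model is shrunk to a 1-D linear layer and the gradient is written out by hand, so everything (the names included) is illustrative:

```swift
// All the parameters live in one struct instead of being loose arguments.
struct MyModel {
    var w = 2.0
    var b = 1.0
    // The forward function becomes a method that just does math with w and b.
    func forward(_ x: Double) -> Double { w * x + b }
    // Hand-written gradient of the loss (forward(x) - y)^2 / 2
    // with respect to *every* parameter in the struct.
    func grad(_ x: Double, _ y: Double) -> (w: Double, b: Double) {
        let err = forward(x) - y
        return (w: err * x, b: err)
    }
}

let model = MyModel()
print(model.forward(3))   // 7.0
print(model.grad(3, 10))  // (w: -9.0, b: -3.0)
```

The payoff of the struct is that "gradient with respect to the whole model" becomes one call: you get back a parallel bundle of gradients, one per parameter, instead of juggling w1, b1, w2, b2 by hand.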
Because you do a ton of computation, and it turns out the gradient ends up being more complicated. And sometimes you want to do an approximation. And so it's actually really nice that you can say, hey, here's the forward version of this big complicated computation, and I'm going to have an approximation that just runs faster. Sometimes you'll get numerical instabilities in your gradients. And so you can define, again, a different implementation of the backwards pass, which can be useful for exotic cases. There are some people on the far research side of things that want to use learning and things like that to learn gradients, which is cool. And so having a system where everything is simple and composes but is hackable is really nice. There are also always going to be limitations of the system. Now, one of the limitations that we currently have today, which will hopefully be fixed by the time the video comes out, is we don't support control flow in autodiff. And so if you do an if or a loop, like an RNN, autodiff will say, eh, I don't support that yet. But that's okay, because you can do it yourself. So we'll go see an example of that in notebook 11. There we go. And so what we have implemented here, and we'll talk about layers more in a second, is this thing called SwitchableLayer. And what SwitchableLayer is, is it's just a layer that allows us to have a Boolean to turn it on and off. And the on and off needs an if. And Swift autodiff doesn't currently support if. And so when we define the forward function, it's super easy. We just check to see if it's on, and if so, we run the forward. Otherwise, we don't. Because it doesn't support that control flow yet, we have to write the backwards pass manually. And we can do that using exactly all the stuff that we just showed. We implement the value, and we implement the chainer thing. And we can implement it by returning the right magic set of closures and stuff like that.
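A SwitchableLayer sketch shows the workaround: because the forward has an `if`, the backward is written by hand in the same value-with-chainer style. This is plain Swift on Double, and the inner "layer" is square purely for illustration:

```swift
struct SwitchableLayer {
    var on = true
    // The `if` here is what autodiff can't yet differentiate through,
    // so we return the value *and* a hand-written chain closure.
    func appliedVWC(_ x: Double) -> (value: Double, chain: (Double) -> Double) {
        if on {
            // Run the inner layer and return its chain.
            return (value: x * x, chain: { ddx in ddx * 2 * x })
        } else {
            // Switched off: identity forward, identity chain.
            return (value: x, chain: { ddx in ddx })
        }
    }
}

var layer = SwitchableLayer()
print(layer.appliedVWC(3).value)   // 9.0
layer.on = false
print(layer.appliedVWC(3).value)   // 3.0
```

Each branch returns a consistent (value, chain) pair, so the rest of the system composes with it exactly as if the compiler had generated the derivative.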
And so, you know, it sucks that Swift doesn't support this yet. But it's an infinitely hackable system. And so for this or anything else, you can go and customize it to your heart's content. Yeah. And I mean, one of the key things here is that Chris was talking about the atoms. And at the moment, the atom is TensorFlow, which is way too big an atom. It's a very large atom. But at the point when we're in MLIR world, the atoms are the things going on inside your little kernel DSL that you've written. And so this ability to actually differentiate on float directly suddenly becomes super important, because, I mean, for decades, people weren't doing much researchy stuff with deep learning. And one of the reasons was that none of us could be bothered implementing an accelerated version of every damned, you know, CUDA operation that we needed to do the backward pass of and do the calculus and blah, blah, blah. Nowadays, we only work with the subset of things that PyTorch and friends already support. So this is the thing about why we're doing this stuff with Swift now: it's the foundations of something that in the next year or two will give us an all-the-way-down, infinitely hackable, fully differentiable system. Yep. Can we jump to the layer really quick? So we've talked about matmul, we've talked about autodiff. Now let's talk about other stuff. So layers are now super easy. It just uses all the same stuff you've seen. And so if you go look at Layer, it's a protocol, just like we were talking about before. And layers are differentiable. They contain bags of parameters, just like we just saw. The requirement inside of a layer is you have to have a call. So layers in Swift are callable, just like you'd expect. And they work with any type that's an input or output. And what Layer says is the input and output types just have to be differentiable. And so Layer itself is really simple. Yeah.
And so underneath here you can see us defining a few different layers. So for example, here is the definition of a Dense layer, right? And so now that we've got our layers and we've got our forward pass, that's enough to actually allow us to do mini-batch training. And I'm not going to go through all this in any detail, other than just to point out that you can see here it's defining a model, and it's just a layer, because it's just a differentiable thing that has a call function. And you can call the model layer. We can define log softmax. We can define negative log likelihood. Logsumexp. Once we've done all that, we're allowed to use the Swift for TensorFlow version, because we've done it ourselves. And at that point, we can create a training loop. So we define accuracy just like we did in PyTorch. Set up our mini-batches just like we did in PyTorch. And at this point, we can create a training loop. So we just go through and grab our x and y and update all of our things. You'll notice that there's no torch.no_grad here. And that's because in Swift, you opt into gradients, not out of gradients. So you wrap the stuff that wants gradients inside valueWithGradient. And there we go. So we've got a training loop. Now, one really cool thing is that all of these things end up packaged up together, thanks to the Layer protocol, into a thing called variables. Because Layer is differentiable? Yeah, so Differentiable is also a protocol. Protocols have lots of cool stuff on them. So thanks to that, we don't have to write anything else. We can just say model.variables minus equals lr times grad, and it just works. So that's the basics of protocol extensions. Our model got that for free because we said it's a layer. Okay, so I think that's about all we wanted to show there. So now that we've got that, we're actually allowed to use optimizers.
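The shape of that training loop can be sketched on a toy 1-D linear fit. In Swift for TensorFlow you opt in to gradients with `valueWithGradient`; here the gradient of the loss is written out by hand so the sketch runs in plain Swift, and all the data and names are made up:

```swift
// Fit y = w*x to targets generated with w = 3, by plain SGD.
var w = 0.0
let xs: [Double] = [1, 2, 3, 4]
let ys = xs.map { 3 * $0 }          // the "true" model is w = 3
let lr = 0.05

for _ in 0..<100 {
    // loss = mean((w*x - y)^2); dloss/dw = mean(2*(w*x - y)*x)
    let grad = zip(xs, ys).map { 2 * (w * $0 - $1) * $0 }
                          .reduce(0, +) / Double(xs.count)
    // The model.variables -= lr * grad step from the lesson.
    w -= lr * grad
}
print(w)                            // converges to about 3.0
```

Swapping in `valueWithGradient` and a multi-parameter model changes the gradient computation, not the loop: grab a batch, get the loss and gradients, subtract lr times grad from the parameters.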
So we can just use that instead. And that gives us a standard training loop which we can use. And then on top of that, we can add callbacks, which I won't go into the details of, but you can check it out in notebook 04. And you will find... let's find them. Here we go. We'll find a Learner class which has the same callbacks that we're used to. And then eventually, we'll get to the point where we've actually written a stateful optimizer with hyperparameters, again, just like we saw in PyTorch. All of this will now look very familiar. We won't look at the dictionaries now, but they're almost identical to the PyTorch dictionaries, and we use them in almost the same way. So you see we've got states and steppers and stats, just like in PyTorch. And so eventually, you'll see we have things like the LAMB optimizer written in Swift, which is pretty great. And it's the same amount of code. And things like squared gradients. We can use our nice little helper code to make it easy. And so now we have a function to create an SGD optimizer, a function to create an Adam optimizer. We have a function to do one-cycle scheduling. And thanks to matplotlib, we can check that it all works. So this is really the power of the abstraction. Coming back to one of the earlier questions from earlier today: we started in C, and we're talking about very abstract things like protocols and how things fit together. You get those basic things, and this is one of the reasons why learning Swift goes pretty quickly: you get the basic idea and now it applies everywhere. Yeah, and here we are doing mixup. And so now we're in notebook 10. And here we are doing label smoothing. And again, it's really very similar-looking code to what we have in PyTorch. So then by the time we get to notebook 11, other than this hacky workaround for the fact that we can't do control-flow differentiation yet, coming very soon, our XResNet, as you've seen, looks very similar. And we can train it in the usual way, and there we go.
So we've kind of started with nothing. And Chris spent a couple of decades for us, first of all building a compiler framework, and then a compiler, and then a C compiler, and then a C++ compiler, and then a new language, and then a compiler for the language. And then we came in and... let me correct you on one minor detail. Some people helped. Yeah. I did not build all this stuff. Amazing people that I got to work with built all of this stuff. And likewise, all of these workbooks were built by amazing people that we were lucky enough to work with. Yeah, absolutely. So that's all happened. And then, thanks to all that work, we got to a point where 18 months ago you and I met. You'd just joined Google. We were at the TensorFlow symposium, and I said, what are you doing here? I thought you're a compiler guy. And you said, oh, well, now I'm going to be a deep learning guy. Well, deep learning sounds really cool. Yeah. He hadn't told me it was uncool yet. Yeah, so then I complained about how terrible everything was. And Chris said, I'm going to create a new framework. I was like, we need a lot more than a new framework. I described the problems that we've talked about, like where Python's up to. Chris said, well, I might actually be creating a new language for deep learning, which I was very excited about, because I'm totally not happy with the current languages we have for deep learning. So then 12 months ago, I guess, we started asking this question of: what if high-level API design actually influenced the creation of a differentiable programming language? What would that mean? And to me, one of the dreams is when you connect the building of a thing with the teaching of a thing with the using of a thing in reality. And one of the beautiful things about fastai is pulling together building the framework, teaching the framework, and doing research with the framework. Yeah.
So next time we caught up I said, maybe we should try writing fastai in Swift. And you're like, we could do that, I guess. I was like, great. I think the one thing before this was, I'm like, hey Jeremy, it's starting to work. Yeah. And he says, oh cool, can we ship it yet? I'm like, it's starting to work. It needs a high-level API. So let's announce the course where we teach people to use this thing that doesn't exist yet. And I think I said, naively, I like deadlines. Deadlines are a good thing. They force progress to happen. So then one month ago we created a GitHub repo and we put a notebook in it, at the last TensorFlow Dev Summit. We sat in a room with the Swift for TensorFlow team and we wrote the first line of the first notebook, and you told your team, hey, we're going to write all of the Python notebooks from scratch. And they basically said, what have you got us into? And I think we've learned a lot. Yeah. So, I mean, to me, the question is still this, which is: what if high-level API design was able to influence the creation of a differentiable programming language? And I guess we've started answering that question. Yeah. I don't think we're there yet. I mean, I think that what we've learned even over the last month is that there's still a really long way to go. And I think this is the kind of thing that really benefits from different kinds of people and perspectives and a different set of challenges. And just today and yesterday, working on data blocks, a breakthrough happened where there's an entirely new way to reimagine it as this functional composition that solves a lot of problems. Yeah. A lot of those kinds of breakthroughs, I think, are still just waiting to happen. I mean, it's been an interesting process for me, Chris, because we decided to go back and redo the Python library from scratch. And as we did it, we were thinking, what would this look like when we get to Swift?
And so, even as we did the Python library, we created the idea of stateful optimizers. We created the new callbacks, version 2. So that was interesting. But it's also been interesting, as I've seen as an outsider from a distance, that Swift syntax seems to be changing thanks to some of this. Yeah, absolutely. So there are new features in Swift, including callables. That's a thing that exists because of Swift for TensorFlow. The Python interoperability, believe it or not, we drove that, because it's really important for what we're doing. There's a bunch of stuff like that that's already being driven by this project, and I think there's going to be more. And so, like, making it so float can default to weight and nothing. That's really important. We have to do that. And otherwise, it wouldn't have been a priority. So, I mean, it's still really, really early days. And I think the question in my mind now is: what will happen when data scientists in lots of different domains have access to an infinitely hackable differentiable language, along with the world of all of the C libraries? You know, what do we end up with? Because we're starting from very little in terms of ecosystem, right? But there are things in Swift we haven't covered, for example, something called key paths. There's this thing called key paths which might let us write little query-language DSLs in Swift with type safety. Yeah, give me all the parameters out of this thing, and let me do something interesting to them. Yeah. That's really cool.
And so, you know, I guess at this point, what I'm saying to people is: pick some piece of this that might be interesting in your domain, and over the next 12 to 24 months, explore with us. So that, as Chris said, while we're putting this airplane together whilst it's flying, by the time all the pieces are together, you'll have your domain-specific pieces together, and I think it'll be super exciting. And one of the things that's also really important about this project is it's not cast in concrete. So we can, and we will, change it to make it great. And to me, we're very much in the phase of: let's focus on making the basic ingredients that everybody puts things together with. Let's talk about what the core of Layer is. Let's talk about what data blocks should be. Let's talk about what all these basic things are. Let's not mess with float anymore. Let's go up a few levels. Yeah, we can consider float done. But let's actually really focus on getting these right, so that then we can build amazing things on top of them. And to me, the thing I'm looking forward to is just innovation. Innovation happens when you make things that were previously hard accessible to more people. And that's what I would just love to see. So the thing I keep hearing is: how do I get involved? I think there are many places you can get involved. But to me, the best way to get involved is by trying to start using little bits of this in work that you're doing, or utilities you're building, or hobbies you have. Just try. It's not so much how do I add some new custom derivative thing into Swift for TensorFlow. It's more: implement some notebook that didn't exist before, or take some Python library that you've liked using and try to create a Swift version. Try something like a blog post. One of the things when Swift first came out is that a lot of people were blogging about their experiences and what they learned and what they liked and what they didn't like.
And that's an amazing communication channel, because the team listens to that. And that's a huge feedback loop, because we can see somebody was struggling with something. And even over the last couple of weeks, when Jeremy complains about something, we're like, oh, that is really hard. Maybe we should fix that, and we do change it. And then progress happens. So we want that feedback loop, in blogs and other kinds of forums. It's a very receptive community, a very receptive team, for sure. Were there any highlight questions that you wanted to ask before we wrapped up, Rachel? Really? Okay. It's been an absolute honor and absolute pleasure to get to work with you and with your team. It's like a dream come true for me to see what is being built here. And you're always super humble about your influence, but you've been such an extraordinary influence in all the things that you've helped make happen. And I'm super thrilled for our little community that you've let us piggyback on yours a little bit. Thank you so much. And from my perspective, as a tool builder, tool builders exist because of users. And I want to build a beautiful thing, and I think everybody working on the project wants to build something that is really beautiful, really profound, that enables people to do things they've never done before. I'm really excited to see that. I think we're already seeing that starting to happen. So thank you so much, and thanks everybody for joining us. See you on the forums.