But instead of telling you all this now, I'd love to introduce Jeremy Howard, who will show you how Mojo works in practice.

Thanks, Chris. You know, I realized it's been 30 years since I first trained a neural network. And to be honest, I haven't been that satisfied with any of the languages I've been able to use in all that time. In fact, I complained to Chris about this when I first met him years ago, and Chris has been telling me ever since: don't worry, Jeremy, one day we are going to fix this. The thing I really want to fix is to have a language where I can write performant, flexible, hardcore code, but it should also be concise, readable, understandable code. And I think Chris and his team here have actually done it with this new language called Mojo. Mojo is a superset of Python, so I can use my Python code. Check this out; I'll show you what I mean.

So here is a notebook. But this is no normal notebook: this is a Mojo notebook. And by way of demonstration, because it's the most fundamental, foundational algorithm in deep learning, we're going to look at matrix multiplication. Now, of course, Mojo has got its own; we don't need to write our own. But we're going to show that we can actually write our own high-performance matrix multiplication.

Let's start by comparing to Python. That's very easy to do, because we can just type %%python in a Mojo notebook cell, and it will run on the CPython interpreter. So here's our basic matrix multiplication: go across the rows and the columns, multiply together, add it up. Let's create a little matrix and a little benchmark and try it out. And, oh dear, 0.005 gigaflops. That's not great. How do we speed it up? Well, believe it or not, we just take that code and copy and paste it into a new cell without the %%python. And because Mojo is a superset of Python, this runs too, but this time it runs in Mojo, not in Python.
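The transcript doesn't reproduce the notebook's code, but the naive algorithm it describes is the classic triple loop. A minimal sketch in plain Python (the function and variable names here are my own, not the notebook's):

```python
def matmul_naive(C, A, B):
    """Naive matrix multiply: for each output cell, walk across a row of A
    and down a column of B, multiply elementwise, and accumulate into C."""
    for m in range(len(A)):            # rows of A
        for n in range(len(B[0])):     # columns of B
            for k in range(len(B)):    # inner (shared) dimension
                C[m][n] += A[m][k] * B[k][n]
```

This is the version that runs at a fraction of a gigaflop under CPython; everything that follows in the demo is about making the same computation faster.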
And immediately we get an eight-and-a-half-times speedup. Now, there's a lot of performance left on the table here. To go faster, we're going to want a nice, fast, compact matrix type. Of course, we could use the one Mojo provides for us, but just to show that we can, here we've implemented one from scratch. We're actually creating a struct here, so it's nice and compact in memory, and it's got the normal things we're used to, like __getitem__ and __setitem__, as well as things you don't expect to see in Python, like alloc and SIMD. And as you can see, the whole thing fits in about a page of code, a screen of code. So that's our matrix.

To use it, we copy and paste the code again, but this time we just add type annotations saying these are matrices. And now it's a 300-times speedup. Suddenly things are looking pretty amazing. But there's a lot more we can do. If our CPU supports it, we can process, say, eight elements at a time using SIMD instructions. It's a bit of a mess to do that manually, and there's quite a bit of code, but we can do it, and we get a 570-times speedup. Better still, we can just call vectorize: write the product operation, call vectorize, and it will automatically handle the SIMD for us, with the same performance speedup.

So that's what's happening in the innermost loop: we're using SIMD. And in the outermost loop, what if we just call parallelize? This is something we can do. And now the rows are computed on separate cores, for a 2,000-times speedup. We've only got four cores going here, so the gain isn't huge; if you've got more cores, it'll be much bigger. This is something you absolutely can't do with Python. You can do some very, very basic parallel processing in Python, but it literally means creating separate processes and moving memory around, and it's pretty nasty. And there are all kinds of complexities around the global interpreter lock and so forth as well.
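The Matrix struct itself isn't shown in the transcript. As a rough analogue of its indexing interface, here is a plain-Python class with a flat, row-major buffer and the same dunder methods the talk mentions; the real Mojo struct also manages raw memory with alloc and does SIMD loads and stores, which ordinary Python cannot express:

```python
class Matrix:
    """Sketch of a compact matrix type: one contiguous row-major buffer,
    indexed with m[row, col] via __getitem__/__setitem__."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = [0.0] * (rows * cols)  # single flat buffer

    def __getitem__(self, idx):
        r, c = idx                          # idx arrives as a (row, col) tuple
        return self.data[r * self.cols + c]

    def __setitem__(self, idx, value):
        r, c = idx
        self.data[r * self.cols + c] = value
```

Keeping the data in one flat buffer, rather than a list of lists, is what makes the later SIMD and tiling tricks possible: neighboring elements really are neighbors in memory.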
This is how easy it is in Mojo. And so suddenly we've got a 2,000-times-faster matrix multiplication written from scratch. We can also make sure we're using the cache really effectively by tiling: working on a few pieces of memory that are close to each other at a time and reusing them. Tiling is as easy as creating this little tiling function and then calling it to tile our function. So now we've got something that is parallelized, tiled, and vectorized, for a 2,170-times speedup over Python. We can also add unrolling for a 2,200-times speedup; vectorize_unroll is already built into Mojo, so we don't even have to write that.

Now, there's a lot of complexity here, though. What tile size do we use? How many cores? What SIMD width? There's all this mess to worry about, and every different machine you deploy to is going to have different values for these: different memory, different CPUs, and so forth. No worries, look at what you can do. We can create an auto-tuned version by simply calling autotune. If we want an auto-tuned tile size, we just say: hey, Mojo, try these different tile sizes for us, figure out which one's the fastest, compile the fastest version, cache it for this individual computer, and then use that. Parallelized, tiled, unrolled, vectorized, for a 4,164-times speedup. So this is pretty remarkable, right?

Now, it's not just linear algebra. We can do really iterative stuff too, like calculating the Mandelbrot set. We can create our own complex number type, and again it's going to be a struct, so it's compact in memory. It looks like absolutely standard Python, as you can see: multiplying, subtracting, using the operators. And to create the Mandelbrot kernel, we just take the classic iterative Mandelbrot set equation and pop it in here. Then we can call it a bunch of times in a loop, returning at the appropriate point, to compute the Mandelbrot set.
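To make the tiling idea concrete, here is a sketch of cache blocking in plain Python for square matrices. The blocking scheme and the tile size of 2 are illustrative choices of mine, not the notebook's; in the talk the tile size is precisely what autotune picks per machine:

```python
def matmul_tiled(C, A, B, tile=2):
    """Tiled (cache-blocked) matrix multiply for square matrices.
    Work proceeds in tile x tile blocks of B's columns and the inner
    dimension, so a small region of B is reused while it is still hot
    in cache. Every (m, k, c) triple is still visited exactly once."""
    n = len(A)
    for n0 in range(0, n, tile):          # block of output columns
        for k0 in range(0, n, tile):      # block of the inner dimension
            for m in range(n):
                for k in range(k0, min(k0 + tile, n)):
                    for c in range(n0, min(n0 + tile, n)):
                        C[m][c] += A[m][k] * B[k][c]
```

In pure Python this reordering buys nothing, of course; the point is only to show the loop structure that, in Mojo, lets the cached block be reused before being evicted.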
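The kernel the demo describes is the standard escape-time iteration z → z² + c. A minimal plain-Python sketch (the function name and iteration cap are my own choices):

```python
def mandelbrot_kernel(c, max_iter=200):
    """Classic Mandelbrot escape-time kernel: iterate z -> z*z + c and
    return how many iterations pass before |z| exceeds 2, or max_iter
    if the point never escapes (i.e. it is in the set)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return i
    return max_iter
```

Calling this over a grid of complex values gives the familiar picture; the data-dependent early return in the loop is exactly the kind of control flow that the SIMD version in Mojo handles but that numpy-style array programming cannot express directly.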
That's all very well and good, but did it work? Well, it'd be nice to look at it. So how would you look at it? It'd be nice to use Matplotlib. No worries: every single Python library works in Mojo, and you can import it. Check this out: plt is the imported Python module matplotlib, np is the imported module numpy, and the rest of this is actually Mojo code. But it's also Python code. And it works.

And I don't know if you remember, but Chris actually said the Mandelbrot set runs 35,000 times faster than Python. That's because we can also write an even faster version that handles it with SIMD, and we can create the kind of iterative algorithm that you just can't write in Python, even with the help of libraries like numpy. This is something that's really unique to Mojo.

So we now have something that is incredibly flexible, incredibly fast, can utilize whatever hardware you have, and is really understandable to Python programmers like you and me. I think we're finally at a point where I'm actually going to enjoy writing neural networks. Wow. How awesome was that?