What better way to start a recording after, I don't know, two months since the last time we did this, and change up everything? What could possibly go wrong? Should I talk a bit about WebAssembly threads, Jake? What do you think? Well, have you written slides about that? Because if that's what you've written all the slides about, then I think that is what you should talk about. Otherwise it would get very confusing. I wrote the title, and I'm like, well, now I've written the title, I've got to write the rest around it. And so I did. Well, so the thing I really like about WebAssembly, and this is very much, I think, a Surma thing that doesn't necessarily apply to everyone, is that WebAssembly has very little surface in itself. WebAssembly can't do a lot of things by itself. Very tiny things, and I'm going to talk about what it can do. And yet we end up with capabilities like threads and SIMD, all these things that JavaScript can't provide. And so I wanted to talk about that: what WebAssembly can and cannot do by itself, and how interactions with JavaScript can open up all these possibilities. And for that, I thought I'd start with a small introduction to WebAssembly at the low level. Before I start, it's probably important to say: this is not what you need to know to use WebAssembly. This is a bit like looking under the hood. So before we continue, you said JavaScript doesn't have threads, correct? But the web has workers, Node has workers. I'll talk about that. Oh, okay, fair enough. Yeah, because that's exactly the interesting bit: from one perspective yes, from another perspective you would say no, and then you combine these two and suddenly magic happens. And that's what I want to talk about.
So I'm going to give a bit of an incomplete overview, enough so that you hopefully understand what WebAssembly can and cannot do, but probably not enough to cover every variant in which it can do these things. Just a peek under the hood of WebAssembly, something you don't necessarily need to know to use WebAssembly, but that could be interesting or useful to know every now and then. So this is WebAssembly, kind of. This is the human-readable assembly language. The human-readable... I'm going to have to take issue with "human-readable". Is this human? I can't read it. Yes, I hear you. It's assembly. I mean, people who have seen assembly or any form of machine code know the human-readable versions are not readable in an "oh, I understand what's happening" way, but at least you can decipher individual words, compared to the binary representation of the file. So this text representation is called WAT, for WebAssembly Text. Or "what", I guess. This language is literally a text representation of what is in the file in binary. So you can define a module, and one module will basically end up as one WebAssembly file. A module can contain multiple functions, and functions can take numbers as parameters. Numbers in WebAssembly can be 32- and 64-bit integers and 32- and 64-bit floats. And you can do math with these numbers and then return a number as a result. And you can export... What more could you possibly want? Exactly, it's all you need, right? And you can export some of these functions to be callable from the outside. I'm going to talk about "outside" in a bit. So JavaScript is one of the host systems for WebAssembly. There are now actually multiple host systems out there, some for PHP, some standalone like Wasmtime, which runs as a standalone app on your desktop machine. But we're going to talk about JavaScript, because we are a web show.
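A module like the one being described might look roughly like this in WAT (names are illustrative, not the exact ones from the slides):

```wat
;; A minimal module: one function taking two 32-bit integers,
;; adding them, and exported to the outside under the name "add".
(module
  (func $add (param i32 i32) (result i32)
    local.get 0   ;; push the first parameter
    local.get 1   ;; push the second parameter
    i32.add)      ;; pop both, push their sum (the implicit return value)
  (export "add" (func $add)))
```

The parenthesized text maps one-to-one onto sections and instructions in the binary .wasm file.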
And so in JavaScript land, you would fetch the .wasm module and instantiate it, which means that it will compile the module and instantiate it, and then you can call these exported functions. That is what the "outside" is. And here I declare a function that takes two parameters and a return type; a 32-bit integer is the return type of this function. Then I use these two parameters, add them, and that is also implicitly the return value of this function. At that point it returns to JavaScript, and the JavaScript environment knows how to convert between JavaScript types and WebAssembly types. Pretty much all the wasm number types just turn into 64-bit floats, because that's the only number type JavaScript has. Recently there's been an addition to WebAssembly where 64-bit integers are now going to map to BigInts, because a 64-bit float can't represent all the values a 64-bit integer can assume. So that will address that problem. But does that mean JavaScript has a number type that WebAssembly doesn't support as well? Because if you get an arbitrarily big number, it will get to a point where WebAssembly can't represent it, once it's beyond 64 bits. Yeah, that can happen currently. So yeah, if the BigInt grows too big, it cannot be represented as an i64. I actually don't quite remember how that is handled, if it throws or if it gets clamped or something like that. I'd have to look into that. You see here that JavaScript can call WebAssembly, but WebAssembly has access to absolutely nothing. You can pass in numbers and use these numbers to do arithmetic within WebAssembly, but you don't have access to any of the APIs you might be used to. The WebAssembly is completely isolated. And that alone is actually surprisingly powerful, but we need a bit more to make it actually useful. So the next step is that you can declare imports.
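The JavaScript side of the instantiate-and-call flow might look roughly like this. To keep the sketch self-contained and runnable, the module bytes are inlined instead of fetched; they encode the same two-parameter add function described above:

```javascript
// Hand-encoded binary of a module exporting add(i32, i32) -> i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm" + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

WebAssembly.instantiate(bytes).then(({ instance }) => {
  // The host converts between JS numbers and wasm's i32 at the boundary.
  console.log(instance.exports.add(2, 3)); // 5
});
```

In a real app you would use `WebAssembly.instantiateStreaming(fetch("module.wasm"))` rather than inlining bytes.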
And here I'm saying that I'm expecting an import in the surma's-imports namespace, and that the import is called alert. And I expect it to be a function that takes one 32-bit integer as a parameter. Later, I call that function with the result of our computation instead of returning the result of the computation. The instantiation for this WebAssembly module remains largely unchanged, except now we have to provide these imports. That has to happen at instantiation, and instantiation will fail if I don't provide all the imports the module requires. So this here is the so-called import object, and the alert is obviously good old alert, the function that I hope we all remember from our start of JavaScript debugging. I certainly used it a lot for debugging. So this shows that you can not only export WebAssembly functions to JavaScript, but also expose individual JavaScript functions to WebAssembly. Still, only number types can be passed back and forth, because WebAssembly, for example, has no built-in understanding of strings. Now, so far these WebAssembly modules have worked with just parameters and did arithmetic on those parameters. For any more complex kind of work, you actually need a bit of memory. And that's why WebAssembly also has a way to handle chunks of memory that you might already know as array buffers. So here we declare that this WebAssembly module expects a memory in our import object, and that it needs to be at least one page big. WebAssembly measures memory in pages, each page being 64 kilobytes; that has security and operating system integration reasons. It doesn't really matter, but basically the smallest unit of memory is 64 kilobytes, and every memory has to be a multiple of that size. And now we can use load and store to manipulate the values in that memory. So instead of adding the two values from the function parameters, we are now adding two values that we find in memory.
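Put together, the import and memory declarations from this stretch might be sketched in WAT like this (the namespace and names are illustrative):

```wat
(module
  ;; expect a JS function under importObject.imports.alert
  (import "imports" "alert" (func $alert (param i32)))
  ;; expect a memory of at least 1 page (64 KiB)
  (import "imports" "memory" (memory 1))
  (func (export "addAndAlert")
    ;; load two i32s from byte offsets 0 and 4, add them,
    ;; and hand the result to the imported JS function
    (call $alert
      (i32.add
        (i32.load (i32.const 0))
        (i32.load (i32.const 4))))))
```

If the import object is missing either `alert` or `memory`, instantiation fails.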
WebAssembly memories are, as I mentioned, a lot like array buffers, but not quite the same. They have their own type because they can grow, they have a different unit of measurement, and they need a bit of special setup for security under the hood. But in a way, they behave exactly the same. So here I create a new WebAssembly memory. We can create a typed-array view on its array buffer, just like with normal array buffers, and then use this view to put values into memory and use our wasm module to add them up. This is obviously a somewhat contrived example, just putting two values in memory and adding them up, but it shows how the interaction with the memory works. So sending memory into wasm is just like sending a function into wasm, like you did with alert? It's pretty much the same. You have to declare what an import is supposed to be, because WebAssembly is strongly typed. So at compile time it is known that this import needs to be a function and this import needs to be a memory. But it's the same in that you, as the host system, have to make the conscious choice to give something to WebAssembly. WebAssembly cannot just grab anything by itself, which is one of its security properties. That speaks to what you said before about how WebAssembly is so lightweight. I always assumed there was some deep integration for how the memory in WebAssembly works, but it is just chucking a JavaScript object in there, and then you're performing operations on it in WebAssembly land. So for complete transparency, a WebAssembly module can actually declare its own memory and export it instead, but it still functions the same. And because the WebAssembly module declares its own memory, you also know it doesn't get access to anything it shouldn't get access to, because the memory is created at instantiation time. So it doesn't get random access to unknown data.
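The JavaScript side of this memory setup, as a sketch (offsets and values are arbitrary):

```javascript
// One wasm page is 64 KiB; this is the smallest memory you can create.
const memory = new WebAssembly.Memory({ initial: 1 });
console.log(memory.buffer.byteLength); // 65536

// A typed-array view works on memory.buffer just like on a plain ArrayBuffer.
const view = new Int32Array(memory.buffer);
view[0] = 40; // the values the wasm module would i32.load from offsets 0 and 4
view[1] = 2;

// This memory would then go into the import object at instantiation, e.g.:
// WebAssembly.instantiate(bytes, { imports: { memory, alert } });
```

The `imports` namespace and `alert` function here are the hypothetical names from the sketch above, not a fixed convention.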
So it's all about the security and the primitives that are being exposed here. That covers all the things WebAssembly can do that we need to know about to talk about this. This is pretty much what is called the wasm MVP, the Minimum Viable Product, which was the synchronized launch between all the browsers. There are proposals, obviously, in WebAssembly land to augment what WebAssembly can do, but almost all of them are just syntactic sugar on top of these things. Very few of the proposals actually expose new capabilities, and if they do, they're often limited to arithmetic, which I think is very interesting. So let's talk about threads, because JavaScript is a bit weird on this topic, as it is by design single-threaded. JavaScript, however, supports parallelism, at least on the web and in Node recently, with workers, which run a JavaScript file in a truly parallel fashion to your main thread. However, you can only send messages back and forth between the worker and the main thread with postMessage. There's no way to share a variable between those two threads like you might be used to from other languages that support threads, nor any form of threading primitive. And since... Oh, well, except SharedArrayBuffer, right? Well, Jake, that's what I'm getting to. Okay. So if you just added shared memory to JavaScript, things would break, because many of the primitives are designed around the fact that they cannot get interrupted or that there could be race conditions. So instead, the shared memory concept has to be isolated to a specific type, which, as you already spoiled, Jake, is the SharedArrayBuffer. I'm sorry, I'm sorry. I'll just shut up and sit here and let you talk, because clearly everything I'm thinking of you've already got covered. I'm just glad you asked this question exactly here, because that was my next slide. So I did something right at least.
So, yeah, a SharedArrayBuffer is pretty much just like an array buffer in the end, but you can get shared access to the same array buffer from across threads. And both of these threads will see the memory manipulation under the hood in real time. So here what I'm doing is, on the main thread, I basically have a while-true loop to wait until the first cell in the memory is greater than or equal to 100. This basically means the main thread is blocked. And in the worker, I get access to the same SharedArrayBuffer and I increment the first cell's value. So even though the main thread is blocked, at some point it's going to get unblocked, because the worker is running in parallel for real and can work on the exact same memory chunk as the main thread. This is called a spin lock, where you just keep spinning in an endless loop and keep checking your condition to continue. You know, they're a thing, but they're obviously quite bad, because you just lock your CPU at 100% for this thread, because that's all the process is doing. Are you going to mention Atomics in a little bit? I am now going to mention Atomics. Aren't you on top of things? I actually think that Atomics are not that well known, because very few people work with shared array buffers, and Atomics only work with shared array buffers. Basically, Atomics are just a couple of operations that make operating on these shared array buffers more reliable and predictable. For example, in a worker we can block on a memory cell and wait for other threads to notify us that this memory cell is now ready to use. It's a form of a mutex. So in the worker we can use Atomics.wait to wait on a certain cell. The first value here is the index and the second parameter is the... actually, I should read my own code. The first parameter is the actual memory view.
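The spin lock being described, compressed into one runnable sketch. In the episode's version a worker increments the cell in parallel; here the "worker" step is simulated inline so the loop exits immediately:

```javascript
const sab = new SharedArrayBuffer(4); // 4 bytes = one Int32 cell
const view = new Int32Array(sab);

// Stand-in for the worker that would be counting up in parallel:
Atomics.store(view, 0, 100);

// The spin lock: keep the thread busy until the first cell reaches 100.
while (Atomics.load(view, 0) < 100) { /* burns a CPU core while spinning */ }

console.log(Atomics.load(view, 0)); // 100
```

With a real worker the loop would genuinely spin, pinning this thread at 100% CPU until the other thread catches up, which is exactly why the Atomics wait/notify approach below is preferable.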
The second parameter is the index, and the third parameter is the expected value that needs to be in the cell before we start blocking. That is a typical mutex programming parameter, to check if the value had already been changed before you started waiting. On the main thread we can basically just wait for the user to click a button, and once they do, we use Atomics.notify, and that will wake up all the threads that are waiting on this memory cell. This way the CPU is not locked at 100%. This is not a spin lock; it actually works in cooperation with the operating system and will put the thread to sleep and save system resources. Now, that was basically all the prelude, all the building blocks we need to understand the WebAssembly threads proposal that is now in Chrome stable. So the WebAssembly threads proposal is actually much less than I thought it was, because when you think about threads in C or Rust, you think about calling a function and having it run as a thread. WebAssembly threads is not that. What it really is: it allows you to declare a memory as shared, which makes this WebAssembly memory behave exactly like a normal memory, but also like a SharedArrayBuffer, in that you can have multiple views in real time onto the same memory from different threads. And additionally, it exposes these Atomics as WebAssembly instructions. Now, this is interesting because it doesn't actually allow you to spawn a thread. It just gives you these Atomics, and the actual spawning is solved by the language compiler or the runtime that you use to write your WebAssembly. So I did a diagram. I know people find these kinds of UML diagrams scary, but that's actually what I found interesting: it is more complicated outside of WebAssembly than it is inside of WebAssembly.
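So the parameter order being described is (view, index, expectedValue). Atomics.wait only blocks if the cell currently holds the expected value, which is why this sketch returns immediately instead of hanging:

```javascript
// Atomics.wait requires an Int32Array backed by a SharedArrayBuffer.
const view = new Int32Array(new SharedArrayBuffer(4));
Atomics.store(view, 0, 1);

// view[0] is 1, we "expect" 0, so there is nothing to wait for:
const result = Atomics.wait(view, 0, 0);
console.log(result); // "not-equal"
```

One caveat: on the web, Atomics.wait throws on the main thread (only workers may block); Node's main thread is allowed to block, which is what makes this snippet runnable as-is.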
So basically you write your JavaScript, and whenever you compile something from C to WebAssembly with Emscripten, Emscripten will not only generate a .wasm file, but also JavaScript glue code, as it's called. That piece of JavaScript takes care of loading the WebAssembly module for you, populating the memory with all the values that need to be in there. And it provides the integration with the host system that the C language expects to be in place. So for example, when in C you call pthread_create, which is the function to create a thread, that is actually a JavaScript function that is imported into the WebAssembly. So when you call pthread_create, the call goes into JavaScript, which will spawn a worker and send the module and the SharedArrayBuffer over to the worker. The worker will also load the glue code and instantiate the same module on top of the exact same memory. And now we have the main thread and the worker running on the same memory with the same WebAssembly module, and they can use the atomics to synchronize. So from here on in, it actually behaves like a real C program, but all the magic really happens in... So what's the new bit then? Because before this wasm feature became a thing, we could still give WebAssembly a shared array buffer as memory, couldn't we? So there'd be nothing to stop us instantiating two bits of wasm that are actually using the same bit of memory in different workers, and they could be instructed to work on separate things at separate times. Is that the case? And if so, what's the new bit? Before, you couldn't instantiate WebAssembly on a SharedArrayBuffer; that just wasn't possible. And you didn't have the atomics instructions inside WebAssembly. So that's what I mean: those really are just the new additions, and they already exist in JavaScript. They're not new capabilities on the web, really.
The difference is that in JavaScript, we don't have any way of expressing complex objects on top of a SharedArrayBuffer. But any normal compiled language does exactly that: you can build complex classes and structs, and they all somehow get represented in linear memory. And so it is this combination of old, actually kind of old, JavaScript features, shared array buffers and atomics, and WebAssembly's capability to bring high-level constructs to a low-level virtual machine, through which we now have real threads on the web. Which kind of was possible before with workers and shared array buffers, but not in a comfortable way. Yeah, we've been using it with Squoosh now, and it worked surprisingly well for some of the codecs. Yeah, and so I looked into it, and I was surprised at how small, to an extent, the proposal really is. And yet the combination of these things turns it into something very powerful. So I'll put you on the spot a little bit, because I haven't looked in detail at what we've done with Squoosh, because it wasn't me doing that work. So it's going to be spinning up these workers. When does it get rid of those workers? Does it generate them per thread and then destroy them per thread, or does it have a thread pool? I think it has to... so actually, no, that's not true. I think different compilers can handle it differently. I know that Emscripten takes a worker pool parameter. So I wonder if that is, to an extent, how it works, because a computer, well, you can spawn as many threads as you like technically, but you have a limited number of cores that can actually run any of these threads at any given time. My hunch is that, naively, it will probably spin up as many workers as threads you create, and it will kill the worker when the thread is done. But there are probably smarter approaches out there, where you create a worker pool and the workers get recycled and reused.
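This is what "complex objects on top of linear memory" means in practice: a compiled language just picks byte offsets for each field. A hypothetical struct with an i32 and an f32 might be laid out like this (field names and offsets are illustrative):

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });

// struct Score { id: i32; value: f32; } starting at byte offset 0:
const dv = new DataView(memory.buffer);
dv.setInt32(0, 7, true);     // .id     (wasm memory is little-endian)
dv.setFloat32(4, 9.5, true); // .value

console.log(dv.getInt32(0, true), dv.getFloat32(4, true)); // 7 9.5
```

A compiler emits exactly this kind of offset arithmetic for every struct access, which is why compiled languages get "shared objects" for free once the underlying memory is shared.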
I mean, I think these threads are still so new, to an extent, that there are so many things to measure and to optimize, to see how actual usage patterns affect performance. I know that Google Earth has been using them for a while, and I know they've been talking about it. So I'm wondering how much feedback has been flowing back between the Emscripten engineering team and the Google Earth engineering team. But so far, threads have been, quote unquote, good enough to run all these use cases with good performance results in the wild. And that makes me kind of hopeful, at least. So is there any proposal to put in, like, genuine threads? I don't know, I say "genuine threads" and that was kind of like, what? You want to spawn something without having to write the glue code, right? That's what I thought, that that would be the capability to somehow spawn a thread. No, I don't think so. I think what that would be is WASI, where you have a standardized systems interface. So instead of reinventing, for each language, how to mock out a thread-creation call, there would be a standardized interface to the host system, which is what WASI is. Where you can say: open a file, create a network connection, but probably also spawn a thread. And then the WASI implementation, be that a desktop runtime, or maybe a JavaScript layer on the web, or wherever, would just have this generic implementation. But that is still, I think, a bit out; it's still being worked on. There are experiments, and that's really, really good, but nothing that I would say you could settle on for production right now. But yeah, that was basically the WebAssembly, handwritten version, speedrun with Surma. And threads. Did you mention something about SIMD along the way as well? What's happening there?
SIMD is an even smaller proposal, because it just adds one new type, which is 128 bits of something. And you can interpret these 128 bits either as four 32-bit integers, two 64-bit integers, or eight 16-bit integers, and just see them as vectors, and add them, and multiply them. They settled on that because that seems to be what will compile on most actual CPUs. Because WebAssembly by itself is just an intermediate format, right? When you download WebAssembly, it will get compiled to real machine code that runs on your actual processor. So you need to find a SIMD equivalent that will compile to as many processors as possible. And all it does is add this new type and a couple of new instructions. Then the compiler can decide whether the CPU supports them or not, and either run those instructions in one instruction cycle, or just do it in series and pretend that it did it in one cycle. And that's the other thing that we need to look into, but we haven't done it, have we? That's still one of the things we want to look into. Actually, so the status of the threads, that's shipped, right? Well, so Ingvar, our colleague, has done some experiments and has got it to work with both Rust and C++, and we have seen some pretty good performance improvements on those. SIMD is harder, because often SIMD needs to be handwritten. The compiler can only figure out very clear cases to auto-vectorize, as it's called, to automatically turn a loop into a SIMD instruction, and that only works in very few cases. The problem is that many codecs that have SIMD code paths use handwritten assembly, not handwritten WebAssembly but assembly for other CPUs, and those don't compile to WebAssembly. So we're kind of in a niche where we don't know how we can make use of the SIMD from these codecs to make use of WebAssembly SIMD. But I think we'll play around a bit and maybe we'll find something.
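A sketch of what the proposal's instructions look like in WAT, assuming an engine with the SIMD proposal enabled: one v128 value treated as four 32-bit lanes, added lane-wise in a single instruction.

```wat
(module
  ;; Lane-wise add: result is [a0+b0, a1+b1, a2+b2, a3+b3].
  (func $add4 (export "add4") (param $a v128) (param $b v128) (result v128)
    (i32x4.add (local.get $a) (local.get $b))))
```

On a CPU with 128-bit vector units this compiles to one machine instruction; on one without, the engine falls back to doing the four additions in series.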
But in terms of the thread stuff, the new thread stuff, what browsers has that landed in? I know it's in desktop Chrome. I know it's in Firefox desktop if you have COOP and COEP enabled. And I think it's even coming to Android Chrome soon if you have COOP and COEP enabled, which we have on Squoosh. So we also have feature detection: if your browser has support, Squoosh will just use threads; if not, you will get the old codec version without threads. So I think the reason for this is that on desktop Chrome you get process isolation out of the box, whereas you don't get that in Firefox and you don't get that on Android, because there's more of a memory concern. And that's when you need to sort of... Exactly. Yeah, close all the doors to outside content, which is the COOP and COEP stuff. We've got another episode on that. We can link to that. Oh, do we? Yeah, the one with all the acronyms. Yeah, I mean, we lost SharedArrayBuffer for a while due to Spectre and Meltdown, and now we have found mechanisms for how we can bring it back without it being a risk, which is COOP and COEP. And we should link that episode, because it was good. Thank you. Well done, Noah. All right, that's it. That was my WebAssembly threads speedrun. Cool, bye-bye. Bye. Not good, right? Oh, no.