So my name is Kate Goldenring, and I'm here today virtually with Joel Dice. Today we're going to be talking about serverless WebAssembly, so what Oscar just mentioned, and how with WebAssembly we're able to go from zero WebAssembly applications all the way up to 10,000 and back down to zero in under 10 seconds. We'll talk about the characteristics of WebAssembly that let us do this.

To start off, like I mentioned, I'm Kate Goldenring. Joel and I are both software engineers at Fermyon. Joel is very active in the WebAssembly ecosystem, with the WASI subgroup and with many of the language ecosystems around WebAssembly. I personally come from a Kubernetes background. I am co-chair of the CNCF IoT Edge Working Group, and I've moved into the WebAssembly space in part because of what Oscar mentioned: I'm really excited about the sustainability impacts of switching to serverless with WebAssembly.

As for what we'll cover today: we'll first level-set on our definition of what serverless is. Then we'll talk about why you would use it, what characteristics we need from an ideal unit of compute for serverless, and where WebAssembly falls among those characteristics. Then I'll pass it off to Joel to talk about what isolation looks like in the context of serverless and how to optimize the performance of serverless WebAssembly. Finally, we'll do the demo I mentioned of scaling up and down from none to many with WebAssembly.

To start off: what is serverless? You can think of serverless as a two-part definition. On one hand, it's a type of application: when we talk about serverless applications, we're not talking about long-running daemon services; rather, we're talking about short-lived, event-driven functions. On the other hand, it's a development model, one wherein I don't have to think about the servers. I don't have to manage infrastructure. All I think about is my business logic, and that's what I can focus on as a developer. All of this comes together in the definition of functions as a service, or FaaS.

Why should I care about serverless? The first reason I mentioned briefly: you really only need to think about the business logic, and that leads to a lot of developer productivity. Beyond that, we get faster scaling. The idea is that with serverless, you can scale up as your events or requests happen and then scale back down, and you expect the underlying unit of serverless to handle that fast scaling. What does this lead to? Better utilization of hardware. If I'm able to execute my functions only when I need them, I'm able to use hardware only when I need to, and that leads to cost savings. You're using fewer of your resources at any given time, and only when you need them, and that leads to the environmental benefits of more sustainable software.

If you're not aware of the Green Software Foundation, they actually have a formula for calculating software carbon intensity, and it is a function of operational emissions and embodied emissions. What I just described is a clear reduction in operational emissions: I'm only operating or executing my application when needed, and when I'm not using it, it's not there. It's not running. It's not using any resources. The embodied emissions are the emissions of the actual physical resources.
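For reference, the formula the Green Software Foundation publishes for the software carbon intensity (SCI) score is, roughly:

```latex
% Green Software Foundation software carbon intensity (SCI) score
\[
  \mathrm{SCI} = \frac{(E \times I) + M}{R}
\]
```

where E is the energy the software consumes, I is the carbon intensity of that energy (so E × I is the operational emissions), M is the embodied emissions of the hardware, and R is the functional unit, such as per request. Scaling to zero attacks the E × I term; the multi-tenancy discussed next attacks M.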
The extension with serverless is that I can actually do more with less physical hardware, and part of the reason for that is the concept of multi-tenancy and the assumptions we get from it. Multi-tenancy means I'm able to have multiple independent applications running in a shared environment. Multiple tenants are using the same physical hardware, and what this does is bring the cost of a system closer to the value you get from that system. Think about it this way: if I'm a platform provider, the cost for me to run that platform is the largest amount of resources used at any short-term peak. If everyone starts using their application on a Wednesday, I need to make sure the infrastructure I provide has enough CPU and memory to handle all those requests at once. That is the physical infrastructure, the cost of running that platform. The value someone gets from that platform is the long-term average of requests over time, the long-term average usage of those applications. So a platform provider may be charging based on monthly average requests.

To put this in a picture: I've read about this through Marc Brooker's blog. He is a distinguished engineer at AWS and one of the original authors of AWS Lambda and Firecracker, and he pointed out this idea of how multi-tenancy helps us close the peak-to-average gap. As you can see here, I basically created a random set of numbers between one and ten, representing a tenant's load at any given time. The thing to focus on in these graphs is the two lines: one representing the peak usage of resources at any given point, and the other representing the average. You can see that the two lines are farther apart on the right and closer together on the left, and the only difference is that we've added a couple more tenants. The more tenants we add to a physical hardware system, the closer our peak traffic gets to our average traffic. That's because we can expect that each of these tenants is uncorrelated: they're being used at different times, to the point where their combined usage flattens out.

We can also think about the waves of cloud computing in terms of multi-tenancy. We started off with the invention of virtual machines, which kicked off the cloud computing paradigm: now I can take the same set of physical hardware, split it up into virtual machines, and have applications running isolated on each machine. Then we moved to the world of containers, which allowed microservices to take hold: now we can have a container that is not only portable but isolated, and we can fit more of them on the same physical hardware. And you can think of the next wave as a wave of better serverless, with WebAssembly as its unit. Now we can have thousands of WebAssembly functions on the same physical hardware, and it's just as isolated and just as portable. You may look at this and think something's missing: if you're familiar with AWS Lambda, its unit is a micro-VM, the Firecracker micro-VM, and we could slot that in right next to containers and WebAssembly.
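To make that peak-to-average intuition concrete, here is a minimal, dependency-free Rust sketch (ours, not from the talk) that draws uncorrelated per-tenant loads between one and ten, like the slide's random numbers, and reports how the peak-to-average ratio of the aggregate load shrinks as tenants are added:

```rust
// Uncorrelated tenant loads in 1..=10, aggregated over many time samples.
// As tenant count grows, peak aggregate load converges toward the average.
fn xorshift(seed: &mut u64) -> u64 {
    // Tiny PRNG so the example needs no external crates.
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    *seed
}

fn peak_to_average(tenants: usize, samples: usize, seed: &mut u64) -> f64 {
    let mut peak = f64::MIN;
    let mut sum = 0.0;
    for _ in 0..samples {
        let mut total = 0.0;
        for _ in 0..tenants {
            total += (xorshift(seed) % 10 + 1) as f64; // load in 1..=10
        }
        peak = peak.max(total);
        sum += total;
    }
    peak / (sum / samples as f64)
}

fn main() {
    let mut seed = 0x5eed;
    for tenants in [2, 10, 100, 1000] {
        let ratio = peak_to_average(tenants, 1_000, &mut seed);
        println!("{tenants:>5} tenants: peak/average = {ratio:.2}");
    }
}
```

With a couple of tenants the peak can sit near twice the average; with a thousand it lands within a few percent of it, which is exactly the gap a provider otherwise has to pay for in idle hardware.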
Actually, I want to talk a little bit about how Firecracker micro-VMs came to be. There was a 2020 paper called "Firecracker: Lightweight Virtualization for Serverless Applications" in which the Firecracker team discussed how they came to invent the Firecracker micro-VM. It started with a discussion along the lines of: what does an ideal serverless unit look like? What are the characteristics we need to select for? They came down to six characteristics. It needs to be isolatable, so it can live next to other tenants. We need to be able to run a high density of them on a single node. We need as near-to-native performance as possible, because serverless is event-driven and needs to feel instant. It needs to be compatible: you shouldn't have to change your application to switch to this serverless paradigm. Fast switching: I should be able to start, run, and bring down this serverless unit as quickly as possible. And soft allocation: I should be able to over-commit resources on a machine, because I have that multi-tenant assumption that not all of the tenants are running at once.

In that same paper, they assess Firecracker micro-VMs against these characteristics. I went through and did the same for WebAssembly, and there are a few things I want to point out. For one, you can see both are sandboxed, so in both cases we have isolated, secure units. Furthermore, while we can put thousands on a single node with both, we can get more out of the same hardware with WebAssembly: with much beefier hardware, they put 1,000 on a single node, while with just 8 cores and 32 gigabytes of RAM, when we were load testing Fermyon Cloud, we were able to run thousands of WebAssembly applications. And the final, most important one to notice: while it takes 125 milliseconds for a Firecracker micro-VM to go from start to ready, it takes less than a millisecond to instantiate and run a WebAssembly module. This is why I was saying WebAssembly really is the ideal unit of serverless we've been waiting for.

And when we think about serverless, we don't just want to think about the functional characteristics it should have; we also want to think about what makes it adoptable. What characteristics are going to make you, as a developer, want to switch to serverless and use it? First of all, you want to be able to use your language of choice. You should be able to run on any operating system and platform, and it should be portable from one cloud to another: I should be able to take the same serverless function and run it on one cloud and then on another. And it should have an amazing developer experience. These are all places where WebAssembly really shines, as many people have probably heard today.

So with that, I want to pass it off to Joel and see if I can properly switch this video. Oh, wonderful, I was nervous about that. Joel's going to continue by talking about isolation and performance in the context of serverless and WebAssembly.

Hi, everyone. Sorry I couldn't make it to join you in person, but I will be on Slack if you have any questions or want to chat about any of this. The two things I'd like to talk about today are the isolation model we pursue for serverless types of workloads, and the performance implications of implementing that isolation model. So to start, let's look at the two types of isolation that we would like to see in a serverless system.
One is per-tenant isolation, which is to say that applications deployed by a given tenant should have no way of interfering, whether deliberately or accidentally, with another tenant's applications, be that in terms of using too many resources or accessing data that should not be available to that application. That's pretty much table stakes for any multi-tenant system, so it's a given, and certainly the baseline for any such system.

The other kind of isolation is a bit more fine-grained: each request to a given application should also be isolated from other requests to that application, and of course from other applications. What this gives us is guaranteed statelessness across requests, so that we're able to handle them in parallel and concurrently across a wide range of hardware without breaking the semantics or requiring state to be propagated in a distributed environment. Another advantage is fault containment: a request that triggers a bug in an application should at most affect that request, and not any other request happening concurrently or subsequently for that application. And finally, this contributes to defense-in-depth security: a security vulnerability in an application should at most be exploitable in terms of the data available to a single request, rather than to other requests.

This sort of fine-grained isolation is not just useful for functions as a service. We've also found it useful, and no surprise, in web browsers; indeed, that's what WebAssembly was originally invented for. In particular, we don't expect that when we load a new website, the application code downloaded from that website will have access to other browser tabs or to previously visited websites. Another example is a multi-tenant database, or even a single-tenant database with multiple users, where we want to scope the data available to a user-defined function, such as an aggregate, to what that user has permission to access. Another is blockchain-based smart contracts. And finally, any application that involves a third-party dependency can benefit from this type of isolation: if you have a library that needs to encode some of your application's data, it doesn't necessarily need access to your entire application state, or to the network, or to the file system, and a strong isolation guarantee will ensure that it only has access to what it needs to do its job.

When we implement these sorts of fine-grained isolation scenarios, it does come at a performance cost, and some of the most exciting research in systems design these days is about reducing that cost to a minimum. We've done some of our own benchmarks and optimizations, which I'd like to share with you here. In order to understand some of those optimizations, it's worth looking at the pipeline that is invoked for a serverless function based on WebAssembly and using Wasmtime, the WebAssembly runtime. To start with, at build time we do as much work as possible, especially for interpreted languages such as Python, where there's a non-trivial amount of work the interpreter needs to do up front before running any application code. As much as possible, we want to use tools like Wizer to do that work at build time rather than each time the function or application is invoked.
The next step is compilation of WebAssembly bytecode to native machine code, which is an essential step for performance but also very expensive; it's something we want to do once and cache the results of if possible, and it turns out that's quite easy to do with WebAssembly and Wasmtime. The next step is pre-instantiation, which involves linking guest functions and host functions together so that they can call back and forth. Then we do instantiation, which actually allocates the memory needed for the invocation. And finally, we invoke the function, which involves some context switching from the native host to the guest environment. To the right here you see some timings for a simple Rust-based HTTP handler, just to give you a ballpark idea of how long each step takes.

As for the optimizations we found particularly useful: I already mentioned Wizer, especially for interpreted languages. In Wasmtime, you also have the opportunity to pre-compile a component and either cache it on disk for later efficient use or keep it in memory. In addition, you can pre-instantiate a module or component and keep that in memory as well, which saves time on linking. Allocation pooling reduces the cost of instantiation by reusing memory across instances, taking care to zero out the pages so that no information is leaked; this is particularly useful for amortizing the cost of the system calls needed to do that allocation. And finally, what we found particularly useful for large .NET apps is tweaking the heuristics Wasmtime uses to decide how to handle statically initialized memory, whether via memory mapping or via an iterative process that takes quite a bit longer. We had to tweak those settings to ensure that even large .NET apps benefited from memory mapping.

Here, as a comparison between WebAssembly-based isolation and coarser-grained forms of isolation such as containers and VMs, is a table that gives you a sense of the order-of-magnitude differences between isolation models, going from VMs to containers to Wasmtime with its various optimizations, and finally to native, where there's no isolation at all. Ideally, we'd like to continue to reduce the overhead as close as possible to what we see with no isolation.

One particular proposal that has been made recently is something called Hardware Fault Isolation, a proposed extension to instruction set architectures such as x86 and ARM which would achieve hardware-enforced user-space sandboxing. What that would mean is that when invoking a function within a process, the invoker, the caller of that function, would have the opportunity to restrict the parts of that process's memory the function has access to. It could grant access to, say, three regions, if those three regions were the only things the function was expected to need. The switches from one set of permissions on these regions to another, from caller to callee and back again, are called context switches, very much analogous to the traditional context switch from user space into kernel space and back. However, in this case, what's being proposed is that the context switch could be reduced to almost zero, or actually zero in many cases, which is exciting and could definitely narrow the gap between a non-isolated and a fully isolated invocation.
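As a rough illustration of how a few of those optimizations fit together, here is a minimal sketch using the wasmtime and anyhow crates (assuming a 1.x-era Wasmtime API; the app.wasm path and the run export are placeholders, not from the talk). It pre-compiles a module once, enables the pooling instance allocator, pre-instantiates to resolve linking ahead of time, and then pays only for a fresh store and instantiation per request:

```rust
use wasmtime::{
    Config, Engine, InstanceAllocationStrategy, Linker, Module,
    PoolingAllocationConfig, Store,
};

fn main() -> anyhow::Result<()> {
    // Pooling allocator: reuse (zeroed) memory slots across instances
    // instead of paying fresh allocation syscalls on every invocation.
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(
        PoolingAllocationConfig::default(),
    ));
    let engine = Engine::new(&config)?;

    // Compile once (the expensive step) and keep the serialized artifact;
    // a real platform would cache this on disk or in memory.
    let precompiled = engine.precompile_module(&std::fs::read("app.wasm")?)?;
    // SAFETY: we only deserialize bytes we just produced with this engine.
    let module = unsafe { Module::deserialize(&engine, &precompiled)? };

    // Pre-instantiate: resolve imports and linking once, ahead of requests.
    let linker: Linker<()> = Linker::new(&engine);
    let instance_pre = linker.instantiate_pre(&module)?;

    // Per request: a fresh store and instance, so no state leaks between
    // requests; with pooling this is typically well under a millisecond.
    let mut store = Store::new(&engine, ());
    let instance = instance_pre.instantiate(&mut store)?;
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;
    Ok(())
}
```

The key design point is the split between per-application work (compile, link) done once and per-request work (store, instantiate, invoke) done on every event, which is what makes the per-request isolation affordable.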
And with that, I'll hand it back to Kate. Thanks so much for your time. And again, I'll be on Slack if you have any questions or just want to chat. Thanks a lot.

Great. A lot of those optimizations Joel mentioned we've started to apply not only to Spin but also to Fermyon Cloud, our hosted platform for Spin applications. So I want to share a demo of basically everything we've talked about. In the demo, we have a single-node Nomad cluster that I've set up on this MacBook Pro, and I've deployed Fermyon Cloud to it. What that consists of is, like I said, a single-node Nomad cluster plus standard cloud services, and then I deployed 500 Spin applications to it. Spin is not only a developer tool for building applications but also a runtime with Wasmtime embedded, so Spin is also running on the cluster so that you can then invoke those applications. I've gone ahead and pre-recorded this, because 500 applications on this laptop is definitely a bit beefy to present live.

As you'll see, this is the Nomad portal right here, with those cloud services: we've got Traefik for networking, Bindle as the registry for our WebAssembly applications, Journal for logging, and then Spin-MT, the multi-tenant version of Spin we have, so that one instance of Spin serves all of the applications we're deploying. There's also a garbage-collection cron job to clean up applications when they're deleted. At this point, I go ahead and deploy 500 Spin applications, each one deployed as a Nomad job that applies a lot of those optimizations to it. So now we've optimized and prepared every single application. All of these are Rust HTTP applications, just a simple Hello World application, just two megabytes each.

Now you can see the resource usage of my MacBook Pro as I'm doing this demo. The CPU is around 10%, and that's what we want to watch as we scale up to 10,000 invocations of these apps and then back down to zero. What you see here is a text file listing the endpoints of all the applications I've deployed, and we'll briefly test one to make sure it's live. Then, for the load testing of Fermyon Cloud that we did, we created our own load-testing tool. That's because we needed to test a whole suite of endpoints and also wanted to record the latency of every single request, so that we could track improvements better. So we're going to use that tool. It's very similar to other load-testing tools like Siege: you say how many concurrent requesters there are, 25 in this case; how many requests for each one of those tasks, 400, for a total of 10,000 requests; and then a small delay of 0.01 seconds between each request for some jitter. Then we just point it at the file with the list of all the endpoints we want to round-robin test. Now we hit enter and the test runs. You can see the CPU spiking up to almost 90% from that 9 to 10%, and here you see that we did 10,000 total requests in just 5.83 seconds. So there is our demo.
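For a sense of scale, each of those 500 demo apps is on the order of Spin's Rust hello-world template, something like the following sketch (assuming the 1.x-era spin-sdk, anyhow, and http crates; the handler name and greeting are placeholders):

```rust
use anyhow::Result;
use spin_sdk::{
    http::{Request, Response},
    http_component,
};

/// A Spin HTTP component. Spin itself is the listening HTTP server;
/// it invokes this handler only when a request matches the
/// component's route, and the instance lives only for that request.
#[http_component]
fn handle_hello(_req: Request) -> Result<Response> {
    Ok(http::Response::builder()
        .status(200)
        .body(Some("Hello, world!".into()))?)
}
```

Built with spin build for the wasm32-wasi target, a handler like this compiles down to roughly the two-megabyte module size mentioned above.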
With that, I just wanted to say: we've mentioned this previously, but we have our happy hour in conjunction with Docker. So if you have any questions and want to talk more about this, that's a good place to do it, or just to talk about Wasm in general with everyone. And are there any questions?

Let's give Kate a huge round of applause, everyone. Questions? So introduce yourself, please.

Hi, I'm Frank. Like most servers I come across, these need to reach out to something: a database, another server, something. Is there some way to keep connections hot? Because doing a new handshake, especially if it's TLS, takes a lot of time.

Okay, so I think the question was: I need to reach out to something, so I have to make that request, make that handshake; how do I keep all of these instances hot? Was that your question? My answer to that is that it doesn't need to be hot. The something that you're hitting, that application, that Spin app endpoint, what you're hitting is Spin itself. Spin is listening, and when you hit it with a certain endpoint, it knows which WebAssembly application to invoke based on the endpoint URL. So Spin is what is listening. What is that HTTP server? That's Spin, which is just a native binary. And when it gets a request, a triggered event for that HTTP endpoint for that application, that's what then spins up and runs that WebAssembly module. Does that get to your question? Okay. Another one in the back here.

Larry Corvallo with Robus Cloud. You compared Firecracker and Wasm. What are other companies doing, besides Amazon, who uses Firecracker? What are Microsoft and Google doing? And in that context, why can't they start offering Wasm on their platforms instead of Firecracker?

Yeah. So I cannot speak to why or whether there will be support in some of the big cloud providers for switching to FaaS with WebAssembly. I would expect, given the criteria that Firecracker had, with these being the criteria and seeing where WebAssembly stands, that supporting WebAssembly is an obvious next step. As for what other cloud providers are using besides micro-VMs, I'm actually not positive on that one. What Azure Functions uses, I'm not sure, but I know we have some Microsoft folks here who might be able to answer that.

Hi, I'm Evin. I might get this wrong, but since you're saying Spin is the listener, do you have any measurements of how much latency that adds per request, so you can still maintain low-millisecond times when you spin things up?

Sorry, I didn't fully hear that. Were you asking: if Spin is the listener, how long does it take for Spin to then invoke?

Yeah, because there has to be some latency, since Spin is listening and then it starts the modules, and what the measurements are for that.

Great, that's a great question. Just to repeat it: how long does it take for Spin to invoke the WebAssembly module? What is the latency there? From Spin executing the module, that is a call to a component using the WebAssembly component interface, so that itself is microseconds. From calling Spin to running the application is still in the millisecond and sub-millisecond range. As for how long it takes me as a user of Fermyon Cloud to call a Spin application and get the response, that's in the 52-millisecond range, and that's because when we bring in cloud services like Traefik and TLS handshakes, that's when it starts to get heavier. That's where we personally are looking to optimize.

I'm William. Today we learned a lot about Wasm and the runtime and the isolation. My question is more about the ecosystem. For example, in a cloud function I might use a library for, say, sending an email, or a PostgreSQL client.
What is the state of the art for integrating such a library that we would usually use?

Okay, to make sure I understand the question: are you asking, I have my favorite libraries, can I go ahead and use them with WebAssembly today?

Yeah, that's it.

So I think that ties into the six criteria, the compatibility one, for serverless. Right now, the WebAssembly System Interface is definitely under development. There are some capabilities that aren't there yet, for example multi-threading, so your favorite libraries might not yet be supported by WASI. That is definitely something to check on. The ability to port your everyday application to WebAssembly might not be there, especially if it's a stateful, long-running application that does a lot of things. However, if you're already in a serverless, functions-as-a-service, event-driven handler paradigm, the likelihood that you're able to port it is probably higher, just because what those applications tend to do is much simpler. That's a great question.

Anyone else? Oh, wait, one up front here. I'm sorry.

Kate, you're doing awesome. Thank you. Yeah, I mean, you're getting really great questions today.

Hello, Kate. Thanks for the talk. I really loved the reference to the Green Software Foundation's equation at the beginning. You mentioned that scaling to zero will reduce the operational emissions, but what about the embodied emissions for the hardware I need? Do they also reduce, because I need less hardware in the end?

Yeah, so the question was: we saw that calculation for software carbon intensity, and operational emissions are obvious, that's scaling to zero, but what about embodied emissions? The point I was trying to make there is that when we have high multi-tenancy with serverless, we don't need as much hardware, because each one of those workloads is uncorrelated. We can have that soft allocation, where we over-commit resources, because the assumption is we're only hitting certain things at certain times, and those aren't correlated time periods. Does that answer the question?