Hey everyone, welcome to this webinar on a greener, cost-effective cloud with serverless WebAssembly. My name is Sohan Maheshwar. I'm a dev advocate at Fermyon, and with me is Kate. Hi everyone, my name is Kate. I'm a software engineer here at Fermyon. So today we're going to talk to you about a greener, cost-effective cloud with serverless WebAssembly. Now, if you're not familiar with the concept of either serverless or WebAssembly, well, that's what we're here to talk about today. First things first, let's talk about sustainability in tech. The Green Software Foundation has defined a metric called software carbon intensity, and they define it as C per R. What is C? C is carbon, and note that this is a rate: it's not the total carbon footprint of your workload in the cloud, but the rate at which carbon is emitted per unit of work. So for instance, if you're a company that has an API, this is the rate at which carbon is emitted by your workload per request. Now the carbon term itself can be expanded, giving (O + M) per R. O stands for operational emissions: these are the emissions caused by the energy consumption of your workload. And M is the embodied emissions: the emissions from the hardware needed to operate a software system such as the one your workload is running on. This is an important formula, because when it comes to sustainability in technology, we need metrics to determine whether our software is sustainable, and whether the carbon footprint of our software is going down over time. Why does this matter? Well, a recent study showed that the software industry is responsible for about 3% of the world's carbon emissions, which is roughly equivalent to that of the airline industry.
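Just as a sketch, that SCI formula can be written out in a few lines of code. The numbers here are made up purely for illustration; in practice O, M, and R would come from your own measurements.

```python
def software_carbon_intensity(operational_g: float, embodied_g: float, requests: float) -> float:
    """SCI = (O + M) / R: grams of CO2-equivalent per unit of work (here, per API request)."""
    return (operational_g + embodied_g) / requests

# Hypothetical month: 12 kg operational + 3 kg embodied emissions over 1 million API requests.
sci = software_carbon_intensity(operational_g=12_000, embodied_g=3_000, requests=1_000_000)
print(sci)  # 0.015 g CO2e per request
```

The point of the formula being a rate is visible here: halving the energy per request (O), or serving the same requests on less hardware (M), both push the SCI down even if total traffic grows.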
So it is a fairly large number, and in today's world, with things like climate change, sustainability in tech plays such an important role. So today we're here to talk to you about how serverless and WebAssembly together can help make your workload a little greener and more cost effective as well. Now, the Green Software Foundation actually defines three actions that you and I, as people who create software, can take to make our software a little more sustainable. The first one is energy efficiency, which essentially means making your software use less electricity to perform the same function. The second one is hardware efficiency, where you make your software use fewer physical resources, i.e. hardware, to perform the same function. And the third one is something called carbon awareness, where you're essentially time-shifting or region-shifting software computation to take advantage of cleaner, more renewable, or lower-carbon sources of electricity. What this essentially means is that when you're choosing a cloud provider, the sources of power for a data center in the US might differ from the ones in, say, a part of Asia. So just by shifting the region where your workload is deployed, you can be less carbon intensive and more carbon aware. Now, the reason we're showing you this particular slide is that we want you to look at everything after this slide through the lens of either energy efficiency or hardware efficiency, and we'll talk about how serverless WebAssembly makes you both energy efficient and hardware efficient. Now, if you're curious about what WebAssembly actually is, there are a few things you need to know. One, and this is the boring answer, is that it's just another bytecode format, designed as a portable compilation target.
Essentially what happened was, sometime in the mid-2010s, a bunch of companies got together and came up with this portable format that could run in the browser. The idea was that you could write code in any language and it would compile to WebAssembly, which would run in the browser. The idea also was that you would compile it once, and that compiled program could run on any number of targets. Now, this could mean different architectures, like ARM or Intel; it could mean different OSs, like Windows, Mac, Linux; it could also mean things like Kubernetes and the cloud. So the idea really was to compile once and then run that code anywhere. And you might hear me refer to this as Wasm from time to time; WebAssembly and Wasm refer to the same thing. So how does WebAssembly really work? Well, you write code, like I said, in any language, this could be Python, Rust, TypeScript, and it compiles to the Wasm format. You get a .wasm file, and this .wasm file can be run anywhere that has a runtime that supports it, in a sense, quote unquote, a virtual machine that can run Wasm. And conceptually, yes, this is similar to the idea of the JVM, the Java virtual machine. The one thing that you do need to know is that it's security sandboxed by default, unlike the original idea for the JVM. So by default, a Wasm module runs in a security sandbox, which means it can't access anything outside itself unless you give it explicit permission to do so. Now, the whole idea behind WebAssembly was that you could write it in multiple languages, it was a portable compilation target, and it was security sandboxed. All of this made it really good to run in the browser. But it also means it's actually great to run outside the browser, potentially on the server side.
So in around 2018 or 2019, there was this new thing called WASI that was developed, the WebAssembly System Interface. For something to run outside the browser, you need access to things like files, operating system features, sockets, clocks, and random number generators. This didn't exist prior to 2019, but once WASI was implemented, you could actually run WebAssembly on the server side, and this opened up a whole new world of possibilities, because suddenly WebAssembly was independent of browsers. It didn't need a JavaScript API or a web API, and WASI essentially extended Wasm's sandboxing to include input and output. What this means is that when you run WebAssembly on the server side, there are things that make it really effective. And these are the four things we think make WebAssembly ideal to run on the server side. The first one is binary size. And again, through the lens of energy efficiency and sustainability, things like binary size make such a big difference when you're doing software at scale, when you're deploying software to millions of users or thousands of applications. Reducing your binary size makes a huge difference. Just to give you a benchmark, a simple Rust hello world compiled to WebAssembly is only about 2 MB. And if you compile it ahead of time, when you know the operating system and the other details beforehand, you can actually reduce that to around 300 KB, which is a very small file. If you extend that to Spin, which is an open source framework for writing WebAssembly applications, a simple HTTP API app is 2.3 MB with just-in-time compilation, or 1.1 MB with ahead-of-time compilation. So again, really tiny binary sizes, which is more energy efficient. And the tradeoff, startup time, remains comparable with code that is compiled natively.
In a recent benchmark, we saw that startup time is only about 2 to 2.3x slower than natively compiled code. The other good thing is that it's completely portable. You can build the code once into a Wasm file and run that anywhere, and that same build works across different operating systems and different platform architectures as well. And like I mentioned, the default security sandbox applies to any Wasm app. It's a capability-based security model, where you explicitly have to give your app permission to access files or anything else. So even if you're writing an HTTP API, you can say that your WebAssembly component is only allowed to talk to this particular URL or this particular HTTP API. Just to give you a visual comparison of how a WebAssembly file looks next to, say, a Docker image, we did a benchmark where we took a simple Python Flask app. This is what it looked like in terms of size as a Docker image, and this is what it looked like as a Spin app. And it is a fair difference: essentially, we saw a difference from 23 MB down to 550 KB. And that is a significant difference, again, through the lens of being more energy efficient with your code. I can maybe just show you a quick demo of how Spin works. So I'm going to open my terminal here. Spin, like I said, is a developer tool for building WebAssembly microservices and applications. It's hosted on GitHub, at github.com/fermyon/spin, completely open source, so feel free to contribute and join in the discussions. I can show you a quick demo of how Spin actually works. The developer experience basically boils down to three main commands. I can do a spin new, and essentially this gives me a list of pre-existing templates that I can use to get started very quickly. And as you can see, there are different languages that you can write code in. I will choose Rust.
And I will call it webinar-demo, because, you know, it is a webinar and this is a demo. For the description, you can choose a path that triggers this particular Spin app. In this case, I'm just going to give it slash, which means any request to this app will trigger it. And yeah, let me just open webinar-demo. This opens up in VS Code, and I can quickly show you how it looks. Essentially, every Spin app has a manifest file, which lists all the components in it, and it has source code. I can actually say hello, CNCF here, and we can run the app. All I have to do is a spin build. I did say there were three commands. Sorry, I need to cd into the directory first, and then spin build. And this essentially builds my app. Then I'll use my third command, spin up, which runs it on localhost for me to test the app out. So as soon as this is built, I'm just going to do spin up. And oh, I already have something running on this port; I'm just going to do it again, spin up. And, as you can see, I'm just going to open this in a browser window. Yeah, you can see it here: it says hello, CNCF. Right. And that was pretty much how Spin works. And we can actually look at this file, which is, yeah, again, fairly small. You can see the source code. The Wasm file is in the target folder here. I think it is in this folder right here. There we go. Oh no, it's in the release folder, my bad. Let me do an ls -l. Yeah, you can see the .wasm file is just about 2 MB. And again, looking at it through the lens of being more energy efficient, that's a small binary size for our file. So we spoke about the WebAssembly side of things and why WebAssembly is ideal for serverless. Serverless in itself is both more energy efficient and hardware efficient, and thereby also cost effective. And to talk to you a bit about that is Kate. Hi, everyone. So we just talked a little bit about WebAssembly.
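For reference, the manifest mentioned in the demo is a small TOML file. This is roughly the shape of what `spin new` generated for a Rust HTTP template at the time of Spin 1.x; exact field names vary between Spin versions, so treat it as illustrative:

```toml
spin_manifest_version = "1"
name = "webinar-demo"
version = "0.1.0"
trigger = { type = "http", base = "/" }

[[component]]
id = "webinar-demo"
# The compiled WebAssembly binary this component runs.
source = "target/wasm32-wasi/release/webinar_demo.wasm"
[component.trigger]
# The path chosen during `spin new`; a route like "/..." would match all sub-paths too.
route = "/"
[component.build]
command = "cargo build --target wasm32-wasi --release"
```

`spin build` runs the `[component.build]` command for each component, and `spin up` serves every component listed here on localhost.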
And so I want to take a couple steps back and talk about other ways that you may have thought of packaging or isolating your applications. So if we go back to the slide, we can see that one way we're really familiar with for isolating applications is virtual machines, and virtual machines kind of initiated cloud computing. They started this concept of being able to have multiple applications on the same physical hardware, and the way they did that was by isolating the entire operating system. With virtual machines, there was some amount of scalability; you were able to bring up a new virtual machine in a few minutes. But it wasn't something where you would bring up a bunch of virtual machines instantly. And this resulted in more monolithic types of applications that were isolated in their virtual machine, stayed there, and continued to hold on to those physical hardware resources. Then we moved on to a new way of isolating our applications, which was containers. This created a smaller isolatable unit: you would just package your application along with its dependencies. And this created a situation where you could start up new instances of your application in a few seconds, so you had a greater ability to scale up your applications, and the microservice paradigm came out of this. And now we've moved on to a WebAssembly world, where it takes milliseconds to start, and you're able to isolate applications in these small WebAssembly units with linear memory isolation. And this creates a world where we can start up applications instantaneously, right when they're needed. And this brings us to the world of serverless, or functions as a service. So if we go to the next slide, we can talk a little bit about what that serverless word means.
So serverless can be one of two things, in a way. It can be a type of application, like I was just explaining, with short-lived, event-driven functions that run upon a request. Or it can be thought of as a development model, where you don't need to manage any servers or any of the back-end infrastructure, and therefore it is serverless. And these two definitions, a type of application and a development model, come together to describe functions as a service, or FaaS, a term you may also be familiar with. So whether you've heard of it as FaaS or serverless, we're talking about the same thing. So why should I care about serverless? We talked about it a little earlier: it's moving toward that smaller, isolatable, faster, event-driven unit. But also, with serverless you can be more focused on your business logic. You don't need to think about all the infrastructure behind it; you just think about the business logic you want to execute on a certain event. Beyond that, serverless leads to faster scaling. Like we were saying, it can start instantly, and therefore, if you need to scale up the number of units you have, you can do that instantly. And finally, you get better utilization of hardware: the more units you can put on a machine, the better utilization you have of that hardware. And all that comes together to lead to cost savings, this wonderful piggy bank we have, and environmental benefits. It reduces both parts of our formula: operational emissions are reduced because you're only running things when you need them, and embodied emissions are reduced because you need less hardware to do more. So let's dig into that less hardware to do more. One way to achieve that is through greater multi-tenancy.
So multi-tenancy is the concept where you can have multiple independent, and often independently owned, applications, so they might be owned by different tenants, running in a shared environment. And what multi-tenancy does is bring the cost of running a system closer to the value that you get out of the system. You can think of the value of a system as its long-term average traffic, or long-term average use. And the cost of running a system is the peak use at any given time, because your system needs to support the maximum load at any given moment, even though what people get out of the system is about the average amount it's used over time. There's a great blog post by Marc Brooker on this idea of bringing the peak closer to the average, which we've linked here if you want to dive into that concept. So to visualize this idea of bringing the cost of the system closer to the average use of the system, bringing that peak closer to the average, we generated some random numbers here to simulate what this might look like. Basically, what I did is use a random number generator to generate a load between one and ten for each tenant, and then plot the sum of all those loads, which is the yellow line, the uppermost line on the graph, along with the average. So on these graphs you have two horizontal dotted lines: the top line is the peak of the summed use of the system at any given point, and the lower line is the average use of the system over time. And the goal is to bring these two lines as close together as possible, so that the peak use at any given time is as close as possible to the average use.
And you can see, when we have two tenants, just two randomly generated loads, that the peak is about 1.7 times the average. However, when we throw two more tenants into the system, you can see that the use starts to balance out. Those peaks and dips kind of form one line, and that creates a situation where the peak and the average are closer together; the ratio is only 1.26. So just this mock scenario shows that the more independent users you add to a system, the more likely they are to balance out, to the point where what you're getting out of the system is really close to what you're spending on the system, or what you need to provision to support it. And all this leads up to the idea of how we can get as many tenants on a system as possible. We talked about this briefly when comparing VMs versus containers versus WebAssembly, but to see it visually, you can think of each wave of cloud computing as also increasing multi-tenancy. We can have a few virtual machines on our hardware with virtual machine isolation. We can have dozens of containers on one unit of hardware with containerization. And with WebAssembly, we can have thousands of WebAssembly applications or functions on a single unit of hardware, because it supports a higher level of multi-tenancy. Now, if you're looking at this image, you may think that something is missing. You might be familiar with micro VMs, which aimed to be the next wave of multi-tenancy beyond containers. So let's talk a little bit about micro VMs, and you may be most familiar with AWS's Firecracker VMs. When the Firecracker team at AWS was conceptualizing the Firecracker micro VM, they actually wrote a paper to discuss the process they went through, called "Firecracker: Lightweight Virtualization for Serverless Applications".
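A minimal sketch of that simulation: generate random per-tenant loads, sum them, and compare the peak of the total to its average. The exact ratios depend on the random draw (a seed is fixed here so the run is repeatable), but the peak-to-average ratio reliably shrinks as independent tenants are added.

```python
import random

def peak_to_average(n_tenants: int, steps: int = 1_000, seed: int = 42) -> float:
    """Simulate n_tenants, each with an independent random load in [1, 10] at every
    time step, and return peak(total load) / average(total load)."""
    rng = random.Random(seed)
    totals = [sum(rng.uniform(1, 10) for _ in range(n_tenants)) for _ in range(steps)]
    return max(totals) / (sum(totals) / len(totals))

# More independent tenants -> total load concentrates around its mean,
# so the peak you must provision for gets closer to the average you actually use.
for n in (2, 4, 50):
    print(n, round(peak_to_average(n), 2))
```

With two tenants the ratio comes out well above 1.5; with fifty it drops close to 1, which is exactly the "bring the two dotted lines together" effect on the slide.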
The Firecracker team at AWS released that paper in 2020; Firecracker micro VMs came out around 2018. And basically, what they walk through in this paper is how they went about defining what the ideal serverless unit would look like: a unit that could be instantly started up and achieve a high level of multi-tenancy. And they defined six characteristics of an ideal serverless unit. So I want to talk through each of these, and then maybe you'll recognize how WebAssembly could satisfy these as well. First of all, an ideal serverless unit needs to be isolatable. We've talked about that: VMs are isolatable, containers are isolatable, and WebAssembly modules are also sandboxed. The key thing about a serverless unit is that it needs to be able to run next to all these other applications and be fully isolated from them. It needs to support a high level of density on a single machine, so you could run thousands on a machine with minimal waste, using all of those hardware resources. You want as near-native performance as possible; you don't want the fact that you're isolating this unit to be too much of a detriment to its performance. You want it to be compatible: if I'm switching to the serverless paradigm, I shouldn't need to change the way I build applications too much; I should be able to use my favorite libraries and the same operating systems and platforms I've always loved using. It should be fast switching: the idea that I should be able to start, execute, and bring down this function or serverless unit as fast as possible. And that's part of hardware utilization; I need to be able to run it and bring it down fast so that other applications also get their share of resources and I'm not hogging them all. And then, soft allocation.
So this is the idea that you should be able to overcommit your resources. I should be able to schedule 1,000 applications to a node under the assumption that, even if the node can only handle, say, 250 at once, these applications are all independent enough that we can assume only 250 are being used at any one time. And that also assumes the other 750 are taking up very minimal resources when they're not being executed. So if we go to the next slide, we can look at this comparison explicitly between Firecracker micro VMs and WebAssembly. The team at Firecracker made a unit that satisfied all these characteristics, and WebAssembly uniquely also satisfies all of them, making it a very good serverless unit. Both are isolated, just with different mechanisms: WebAssembly with its linear memory and the capability-based security that Sohan mentioned, and Firecracker micro VMs with a virtual machine monitor and KVM. Both can support great density. The paper shows they were able to support 1,000 on a node, and so were we when we did stress testing, but one thing to note is that with WebAssembly we were able to do just as much with a lot fewer resources. In our load tests, we only had a machine with 32 GB of RAM and eight cores, while they had much beefier machines for their tests. For performance, both are near native. And fast switching is another area where WebAssembly jumps ahead. A Firecracker micro VM takes about 125 milliseconds to start, which is why they'll often do some prewarming to bring that down a little, while with WebAssembly you don't need any prewarming: startup is sometimes below one millisecond, and often around one to five milliseconds.
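To make that soft-allocation assumption concrete, here's a toy Monte Carlo. The 1,000-apps and 250-slot figures are from the example above; the 15% duty cycle is a number we've picked purely for illustration. If each app is active independently a fraction of the time, the number active at once clusters tightly around the mean and essentially never approaches the full 1,000 scheduled.

```python
import random

def max_concurrent(apps: int = 1_000, duty_cycle: float = 0.15,
                   steps: int = 2_000, seed: int = 1) -> int:
    """At each time step, count how many of the scheduled apps are active,
    assuming each is independently active duty_cycle of the time; return the peak."""
    rng = random.Random(seed)
    return max(sum(rng.random() < duty_cycle for _ in range(apps)) for _ in range(steps))

# 1,000 apps scheduled, each active ~15% of the time: a node sized for 250
# concurrent apps is comfortably enough, even though it's 4x oversubscribed.
print(max_concurrent())
```

This is the statistical bet behind overcommitting: independence keeps the observed peak near mean activity (about 150 here) rather than near the scheduled total, so the 250-slot node rarely, if ever, saturates.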
Soft allocation we didn't test in our load testing, but with Firecracker micro VMs, they were able to oversubscribe the hardware resources as desired. And then compatibility: while Firecracker micro VMs are limited to Linux and KVM, WebAssembly is OS and platform agnostic. However, library support is where it maybe falls down a bit: libraries need to support WASI, and more and more do, but there are still some limitations there. So we talked about the characteristics of an ideal serverless unit, what needs to be built into the technology for us to want to adopt it. But what makes a unit adoptable? What makes me, as a developer, want to use a completely different way of building my applications? There are four things we can think of. One, it needs to be language agnostic: I shouldn't need to learn a different language, or limit my language choice, in order to use a new type of software. It should be cross platform: I should be able to run it anywhere I want without needing to switch up my environment. It should be portable: I need to be able to distribute my application from one location to another. And finally, I should have as good a developer experience as possible. What comes with a good developer experience is good tooling, ideally a robust ecosystem of libraries, and it needs to be very debuggable and easy to test locally before I put it in my production environment. And we're seeing more and more developer tooling come out for WebAssembly, and the portability really does make for an incredibly good developer experience for serverless that we haven't had in the past. So to bring this all together, we wanted to show a demo of how we were able to create high levels of multi-tenancy with Spin applications. A lot of this came from building out a cloud: we wanted to know how many Spin applications we could place on a single node in our cloud.
How great of a unit is this WebAssembly unit? So in this demo, I want to show how, on my MacBook Pro here, and we prerecorded this, I basically set up Fermyon Cloud, which is our environment for hosting Spin applications. What that entailed was a single-node Nomad cluster, your typical cloud services, logging, storage, networking, and Spin. Spin is both a tool for creating applications and a runtime: inside Spin is the Wasmtime runtime, which executes and isolates our WebAssembly applications. And then we deployed 500 Spin applications to this one machine. And in just 10 seconds, we brought up and down 10,000 instances, or invocations, of those Spin applications, to show how fast it scales up and down. We'll see CPU spike up and then immediately drop all the way back down, to show how instantaneously the resource usage jumps and comes back. So you can see here, this is the Nomad portal brought up locally on my machine, and we have those typical cloud services: Traefik for networking, Bindle for application storage, which is specific to WebAssembly, a journal for logging. And then you can see spin-mt, which is our multi-tenant Spin: this is the one Spin binary executing all of our applications. And then we have Spin garbage collection, which is basically just cleaning up your Spin applications after you've deleted them, another way of removing any extra resources that aren't needed at the moment. So I'm deploying 500 Spin applications to our Nomad cluster. You can see now we have 500 Nomad jobs, and each one of these jobs applied those optimizations, the ahead-of-time compilation that Sohan mentioned, to make sure they have even faster startup times. So now we're looking at our resource usage portal for Nomad, and we can see CPU on the left and memory on the right.
And CPU is pretty low right now, around the 15% mark. And just on the right, we have our terminal with the endpoints of all of our applications. You can see that there are 500, all in the format of an HTTP Rust application, and when we ping one, we expect a simple hello Fermyon response, like the hello CNCF we had earlier. For load testing our cloud, we built our own load testing tool; we wanted to be able to hit a bunch of endpoints, which a lot of tools didn't support. But you can see that it has the same flags as many load testers you may be familiar with. So here we're saying that we have 25 concurrent users sending requests, and each concurrent user sends 400 requests. And we're adding a 0.01-second jitter to throw the timing off a little. In total it's going to send 10,000 requests, spread across those 500 endpoints. And you'll see CPU now spikes to almost 90%, and then comes all the way back down. In that roughly ten-second interval, actually it ended up being under six seconds, 5.8 seconds, we sent 10,000-some requests to our 500 applications. So that just shows how fast WebAssembly is and how scalable: we only used all that CPU when we needed it, and it scaled up and down as we expected. Very dynamic, very fast, and very resource efficient. And so now I'm going to pass it off to Sohan again, to talk about another resource that you can optimize with WebAssembly. Yeah, thanks, Kate. So I mentioned earlier in the talk those two lenses through which you can look at your software carbon intensity: being energy efficient and being hardware efficient. Serverless AI inferencing is potentially one way to look at this as well. AI is the talk of the town; everyone is talking about generative AI and large language models.
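Our load tester is internal, but the shape of what it does is simple: 25 workers, 400 requests each, a small jitter between sends, round-robin over the endpoints. Here is a sketch with the actual network call stubbed out; the `app-N.example.test` URLs and the `send_request` stub are placeholders you would swap for a real HTTP GET against your own endpoints.

```python
import itertools
import random
import time
from concurrent.futures import ThreadPoolExecutor

USERS = 25              # concurrent users
REQUESTS_PER_USER = 400
JITTER_S = 0.01         # up to 10 ms of jitter between sends
# Hypothetical endpoints standing in for the 500 deployed apps.
ENDPOINTS = [f"https://app-{i}.example.test/" for i in range(500)]

def send_request(url: str) -> int:
    """Stub for an HTTP GET; a real version might use urllib.request or httpx."""
    return 200

def user(worker_id: int) -> int:
    """One simulated user: send its share of requests, cycling over the endpoints."""
    ok = 0
    urls = itertools.cycle(ENDPOINTS)
    for _ in range(REQUESTS_PER_USER):
        if send_request(next(urls)) == 200:
            ok += 1
        time.sleep(random.uniform(0, JITTER_S))  # the jitter flag from the demo
    return ok

with ThreadPoolExecutor(max_workers=USERS) as pool:
    total_ok = sum(pool.map(user, range(USERS)))
print(total_ok)  # 25 users x 400 requests = 10000
```

The arithmetic is the same as in the demo: 25 × 400 = 10,000 requests, which works out to about 20 requests landing on each of the 500 endpoints.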
For you to run your own large language models, you have to use powerful GPUs. So typically, companies and programmers are renting powerful GPUs that are running in the cloud, and these run for long periods of time while requests are made for inferencing or image generation. Now, regardless of whether you make one request a month or a million requests a month, these GPUs are running in the cloud the whole time. So another way of looking at this is to do it serverlessly as well. Instead of having a GPU, or containers, running in the cloud for a long period of time, we were actually able to time-slice a GPU that's running in the cloud and use it for AI inferencing, because a dedicated GPU in the cloud is both carbon intensive and expensive. How this works, essentially, is that through time slicing, for any request, you can make an AI inferencing request to a large language model that's running in the cloud, and you get back whatever you asked for. And we at Fermyon actually built this along with a partner called Civo. Civo is a hardware partner, and they work with a company called Deepgreen. Now, what Deepgreen does is quite interesting, if you ask me. They turn unwanted waste heat from one system into desired warmth for another system. What this means is they're running cloud GPUs, and the actual GPUs are immersed in a particular kind of oil. You can see the image there. The heat from these GPUs is captured and transferred to other things that need heat. Now, this is a startup, and they're in production right now: they're using the heat from these GPUs to warm a swimming pool somewhere in or near London.
So it's a very real-world use case of taking energy, in this case heat, from running something on a GPU to a place that actually needs warmth, in this case a swimming pool. And what this could potentially unlock is AI inferencing that is carbon neutral. When you make an inferencing request to a large language model, not only are you using serverless computing to be energy and hardware efficient, you're also transferring some of the heat generated from running that hardware into something useful. So the cost, or the carbon intensity, of that LLM request could be either carbon neutral or very close to it. So this is a great real-world use case of how serverless and WebAssembly came together to make software a little greener. We have thrown a lot of interesting and hopefully new, informative concepts at you today. For some resources, you can look at building your first WebAssembly app using Spin; it's very easy to get started, just go to github.com/fermyon/spin. Also check out serverless AI inferencing at fermyon.com slash serverless AI. And if you're interested in reading more about carbon-neutral AI inferencing, we also have a blog post that goes in depth into the nerdy stuff around GPUs, energy, the cloud, and all of that, so check that out. Anyway, that brings us to the end of this session. Please join us on our Discord server; you'll find it at fermyon.com slash blog slash fermyon hyphen discord. Feel free to connect with both Kate and myself on LinkedIn, and reach out to us if you have questions about serverless, WebAssembly, or software sustainability. That's it. Thank you, and have a good rest of the day. Bye.