Welcome to "Wasm, Kubernetes, or What Else?", an enterprise architecture debrief. My name is Sean Isom. Today we're going to talk a little bit about Kubernetes, a little bit about WebAssembly, and a lot about how to build better applications through a more flexible architecture. This is the latest installment in my series of talks about Wasm and Kubernetes, but it will also likely be my last talk on that topic, as I don't really work with Kubernetes anymore. Last month I left my role at Adobe to go work with WebAssembly full-time, on some really interesting and unique applications. But there's still plenty of knowledge, many years of experience, to draw on here. So let's dig right in. I always like to start my talks with a quote to form a thesis statement, and for the first time ever, this is actually my own quote: all software architectures are bad, and that's okay. We can go home now, right? Problem solved. What do I mean by this, other than it being a potentially inflammatory statement? Simply that there is no perfect architecture, and that architectures are inherently ephemeral: they decay, and they don't fix everything. Rigid architectures prevent you from being able to adapt to changing business requirements, and adherence to some supposedly optimal architecture for consistency's sake might ultimately cost more in the long run. What do I mean by all of this? Am I trying to put software architects out of a job? No, not at all. Quick show of hands, this is a small crowd, but we'll see: who here would say that they have a good architecture? Okay, that's one, two, yeah. Not very many people, right? We tend to remember the bad parts and not the good parts. If you had a good architecture, how long did that architecture last? How long did it stay good? Okay, so yeah, not a lot of confidence yet in that. And if you have seen good architectures, how many examples of them were there? Do you repeatedly see good examples of architecture, or do things tend to decay over time? I think we all typically know the answer to that. Let's shift gears. How many of you would say you are working under, maybe bad's not the right word, but a suboptimal architecture for your use case right now? Yeah, that's what I would expect. Okay, what makes your architecture bad? Anyone want to throw anything out? Don't be shy. Age, age is a good one. Did it start as a bad architecture? How has it evolved? These are all questions we should be asking ourselves as we think through whether we can build things that last. This is pretty pessimistic, right? Do good architectures even exist? It would be easy to write this all off and say don't do architecture at all, if it's all ultimately going to be bad. But my point is simply that most architectures aren't flexible enough to adapt to changing business or user requirements over time, which forces you to either redo them or deal with the peril of watching them devolve and do things they were never designed to do in the first place. So, okay, let's take another stab at that quote, one that may be a little more applicable: your architecture probably isn't as flexible as you think it is. Architecture itself is not the problem so much as rigidly adhering to an architecture that is designed around a certain system, a certain platform, certain bundled capabilities.
The problem with this is that as business requirements change, if you're tethered to a specific runtime, a specific hardware capability, what happens when you need to run that on another platform? Do you port all of your, let's say, bundled dependencies? Do you rewrite it? Do you settle for a hacked version of it? We build architectures to bring some sort of stability to the entropy and chaos of business, and we don't always know a whole lot about the execution environment at the time that we write them. So architecture itself isn't bad, but we should invert our traditional thinking about systems architecture, because it probably isn't as flexible as you think. Here's an example I like to use. Who here has written a serverless function? I'll use the canonical example of an AWS Lambda. Have you ever tried to run that code somewhere else, other than on your local box under some sort of, you know, shim? Yeah, it's hard, right? Did you run the artifact directly? Have you been able to lift and shift it to other cloud providers? Did you have to rip apart the code and structure one module that holds your interface with the raw cloud provider and a different module for your actual business logic? You throw the dependencies in some sort of bucket and let the system deal with it. There are all sorts of strategies for this, but it's not easy to reuse that code. So, now, this is a Wasm conference, obviously. Let's shift gears to talk a little bit about how Wasm can lead to that flexibility, because you should still strive for that flexibility. And this is my thesis statement: Wasm components represent a more flexible approach for adapting your application's architecture to changing business requirements. I'm not here to convince you to not do architecture, or to build for Kubernetes, or even necessarily to build for an architecture around WebAssembly. My goal is to convince you to build your code for Wasm by default, ship it as components, and let it run in the environment that makes the most sense. What is a typical application? You've got code, which is hopefully mostly business logic and not boilerplate. You've got the dependencies that do the heavy lifting, which your code is orchestrating. You've got data that those two things are intersecting and acting on. You're likely processing or transferring input, pulling data, joining, calling processing functions, storing, sending results. Essentially you're brokering where that data is going. This leads to dependency maintenance problems. Is anybody familiar with the graph on the right? This is from the component model repo. It actually isn't 100% up to date; some of this has changed in the last couple of weeks, but it shows something called shared-everything dynamic linking. The basic idea is that when you've got an application DAG of dependencies that relies on all these different components, typically you'd have to statically compile all of those together. If you've got dependencies of dependencies, and if you watched the earlier talk about dependency management, you see these huge tree-like structures where maybe you've got 10 different versions of libc pulled in; that's bad, obviously. And so this is really re-architecting WebAssembly around the concept of being able to share these modules, each with their own linear memory, but within specific components, within specific capabilities for your application. This allows us to untangle our dependency graph, better isolate our business logic, and allow the system to take care of providing those dependencies at runtime.
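To make that Lambda example concrete, here's a minimal sketch. All the names (`BlobStore`, `invert_pixels`, `InMemoryStore`) are hypothetical; the point is only that the business logic compiles against a capability interface, and each environment supplies its own shim:

```rust
// Minimal sketch (all names hypothetical): business logic written against
// a capability trait, with the provider shim swapped per target.
use std::collections::HashMap;

/// The capability the code needs, not the provider that supplies it.
trait BlobStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, String>;
    fn put(&mut self, key: &str, bytes: &[u8]) -> Result<(), String>;
}

/// Pure business logic: no Lambda, no S3, no Kubernetes in sight.
fn invert_pixels(store: &mut dyn BlobStore, key: &str) -> Result<(), String> {
    let mut img = store.get(key)?;
    for byte in img.iter_mut() {
        *byte = 255 - *byte; // toy transform standing in for real work
    }
    store.put(key, &img)
}

/// One shim per environment; the logic above never changes. A real shim
/// might wrap S3, wasi-blobstore, or a local filesystem instead.
struct InMemoryStore(HashMap<String, Vec<u8>>);

impl BlobStore for InMemoryStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, String> {
        self.0.get(key).cloned().ok_or_else(|| format!("missing {key}"))
    }
    fn put(&mut self, key: &str, bytes: &[u8]) -> Result<(), String> {
        self.0.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }
}
```

A Lambda handler, a Kubernetes service, or a Wasm host would each provide its own `BlobStore` implementation; the transform itself never changes.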
That shifts the dynamic and lessens the burden of application maintenance. A container is kind of the opposite of this, to reuse that analogy: you take all of your dependencies and throw them in there, and you as the application developer are responsible for providing a package, a bundle of all of those dependencies, that you ship out at runtime. With this model, the idea is that you're focusing more on your actual application logic. And that's the key goal of the more flexible architectures we'd like to examine. And so this is where I get to this, because obviously the title of my talk involves Kubernetes. You hear a lot, hopefully a lot less now, but especially earlier this year and late last year: do I build for WebAssembly or do I build for Kubernetes? I think the whole Wasm versus Kubernetes versus native versus what else is kind of a silly question. What angle are you looking at this from? Are you looking at it from a platform engineering perspective? Are you looking at it as an SRE? Are you a pure engineer? I could see an argument where, if you were part of a platform engineering organization whose job is providing compute as a product, for example, this becomes a more relevant question: what offering am I going to provide to my users? But as someone whose goal is just writing said business logic and stitching these components together to form an application that hopefully does something greater than the sum of its components, why should you care? Unless you're super tied into the Kubernetes ecosystem, and I'm picking on Kubernetes here, or super tied into serverless, or into whatever ecosystem you're building for, why not abstract those dependencies away and view this through the lens of just wanting to write that code faster and better? Here's a good way to put it: you shouldn't be building Kubernetes-native. You shouldn't be building Apple Silicon-native. You probably shouldn't be building for an AWS Lambda package if you're writing, let's say, a filter that's acting on data. You should be thinking in terms of streams: what the format of that data looks like coming in, and the format of the data coming out. Providers are then the back-end shims that hook into that data. Basic clean architecture; see the sketch below. If you're Google, or if you're Meta, to pick on anybody from there if they're in the room, maybe sometimes native makes sense. If you control your own hardware environment, you build everything from source continuously, you have a mature build pipeline, and you can replatform continuously to adapt to changing requirements, that's fine. You can get that extra 10% of juice out of your resources at that kind of scale. The abstraction will add to the engineering complexity, but at the same time, ask yourself: are you adding architecture just for the sake of architecting? That's bad. Or maybe you're a small company, a bootstrapped startup, that doesn't want to introduce additional complexity you think will lead to waste. You'll notice a common theme here: flexibility is the key.
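Going back to that filter example, here's what stream-first thinking can look like, as a minimal sketch; `keep_errors` and the choice of a line-oriented log filter are hypothetical. The filter knows only about bytes in and bytes out; the host shim decides what those streams actually are:

```rust
use std::io::{BufRead, BufReader, Read, Result, Write};

/// Platform-agnostic filter: bytes in, bytes out. Whether `input` is a
/// Lambda event body, a Kubernetes service request, or WASI stdin is the
/// host shim's problem, not the filter's.
fn keep_errors(input: impl Read, mut output: impl Write) -> Result<()> {
    for line in BufReader::new(input).lines() {
        let line = line?;
        if line.contains("ERROR") {
            writeln!(output, "{line}")?;
        }
    }
    Ok(())
}

fn main() -> Result<()> {
    // On a plain host the shim is just stdin/stdout; compiled to
    // wasm32-wasip1, the exact same code reads the streams WASI provides.
    keep_errors(std::io::stdin(), std::io::stdout())
}
```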
Where do we want to ship our code, assuming we build it as WebAssembly? There's this concept, and I want to say Luke Wagner came up with this phrasing: write once, run anywhere. I don't have to explain to this crowd what WebAssembly is, so I'll keep this very focused, but obviously WebAssembly started in the browser. So if you've got a chunk of code, why not run it in a browser? This is something that a lot of companies, including Adobe, have done to enable new use cases for their products, reach a larger market share, or do things in a different way. Over time, and this is a picture of a desktop, the idea of running on a generic machine outside of a web browser also came to make sense. You've got things like plug-in models, you've got things like filters. You've got a lot of use cases where those same guarantees of isolated code, of being able to run this kind of native-like code in a performant way, make a lot of sense, and that carries over to things like IoT devices. If you've got code running in a web browser and code running on desktop, why not run that same code on your smart fridge? I don't know, but there are a lot of really interesting use cases for those IoT-type devices. Same code, same package, so why not run it on the edge? And this is where we start running into some really interesting value that we might unlock for the business. If you can run the same code client-side, it might also make sense, under different contexts, to run it on the back end, or on an intermediate edge device closer to the user. Of course, this leads to the ultimate question: if I've already got a cloud, if I've already got a large amount of Kubernetes, should I just use Kubernetes? Well, yes, you can do that, but why not run WebAssembly within Kubernetes itself as well? That's another means to an end, leveraging existing infrastructure to run these same chunks of code everywhere. And that's because these Wasm components enable this interoperability. From my viewpoint, and others might argue with this, I view WebAssembly as an isolation technology, and I don't think any other isolation technology really compares with that level of flexibility. You can work across form factors. I didn't call it out here, but you can work across languages with components. You can work across devices and hardware to focus on that business logic. And that's why I think we should invert our thinking about where our code runs. We should not be architecting something directly for Kubernetes, or for whatever the most convenient platform is, but for wherever we're actually able to take best advantage of our users and our data. If you can move that code, move that compute, closest to the use case for it, closest to the end-user device, closest to where the data is most resident, you can increase the performance and decrease the overall cost of ownership of the application, while also decreasing the maintenance burden by not having to write specific little chunks of code for all of these different form factors. Okay, quick aside: running Wasm in Kubernetes is something I mentioned. There are a bunch of different ways to do this; this is a way that I have used before that works. I want to be crystal clear, this is not the only way to do it. The basic idea is that you've got an existing Kubernetes cluster with existing namespaces, and you can leverage those namespaces for service-level isolation.
And so let's say service A is a log parser, and service B is processing image formats, whatever. Obviously this is still mostly trusted first-party code in most circumstances, but you want to provide isolation guarantees, to treat it like a multi-tenant environment. What's interesting is that now we're breaking out of that namespace-level isolation to provide a common pool of, in this case, wasmCloud hosts that are going to process the actual WebAssembly logic itself. The way this is made safe for multi-tenancy, and I'm not a security person myself, I don't want to provide security guarantees, there are a bunch of excellent talks about that, is by leveraging NATS topics in wasmCloud and locking them down to a specific message prefix. Obviously WebAssembly just being a sandboxed format is not the only security concern you have to worry about, but like I said, not my forte; this is a secure-enough platform for first-party code. We've got this common pool of hosts, which allows us to reach economies of scale. So if you've got, let's say, a set of Kubernetes nodes with larger pods running on them, backed by, let's say, a cloud provider autoscaling group, this allows you to horizontally scale as Wasm requests come in. What's key, and where I would say a lot of platforms that operate WebAssembly within Kubernetes diverge from this, is that they want you to build something new. In this model, we've already got existing Kubernetes services that are container-based inside the namespace. We've got paths out to other, let's say, horizontal cross-plane dependencies. If you've got calls service-to-service, or calls out to external parts of your network, to different providers, you can still leverage those within the namespace. The idea is simply that you hook up a service, route traffic over that NATS topic, and run the WebAssembly code. If you need to go back out to another dependency in the namespace, you can bidirectionally connect to a provider that's already present in that namespace, which can then utilize those existing network pairings. I'll put some links up there if you want to screenshot them. Using wasmCloud, there are a couple of different ways to do this. The top one is, I would say, the quickest way; it does involve the Cosmonic product, I think. If you want to DIY it yourself with just wasmCloud, you can take a look at the kubernetes.sh script and hook it up using wadm to actually deploy the raw applications onto it as well. That was a bit of an aside, but the point is: yes, there are tools out there that allow us to run the same logic in Kubernetes, using WebAssembly, that would run in other places.
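For a flavor of what locking a tenant down to a topic prefix looks like, here's a minimal sketch using the async-nats crate. The tenant name, subject prefix, and server address are all assumptions on my part; real wasmCloud hosts manage their own lattice subjects, and NATS account permissions would enforce the prefix server-side:

```rust
// Assumed Cargo.toml deps: tokio = { version = "1", features = ["full"] },
// async-nats = "0.33", futures = "0.3"
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    // One prefix per tenant/namespace; a host only ever sees its own.
    let tenant = "team-imaging";
    let client = async_nats::connect("nats://127.0.0.1:4222").await?;

    // Subscribe strictly under the tenant prefix. Server-side account
    // permissions would additionally deny this client every other subject.
    let mut sub = client
        .subscribe(format!("wasmbus.rpc.{tenant}.>"))
        .await?;

    while let Some(msg) = sub.next().await {
        println!("tenant-scoped message, {} bytes", msg.payload.len());
    }
    Ok(())
}
```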
Switching gears, let's look at architecture. Now, where do you decide to run code? Take an image processing function, which is what I know very well, and say you want to run it on device. You've got a user, and that user isn't making this decision themselves; your user just wants to do a thing. They want to, I don't know, blur their background or something. You've got to make a split-second runtime decision: on this mobile device, can I run this quickly? I've got, say, a Snapdragon processor and maybe not a super-powerful GPU. Other concerns are: if I push this out to the back end, does that mean I have to move giant raster images around? Do I have to upload them very quickly to some more powerful device? Then we've got this concept of edge. Edge is a huge topic, and I'm not an edge expert myself, but as you see here, it's a viable option for running a lot of WebAssembly. The idea, from my perspective, is to get a more powerful device into that chain, one that can run some of this code as the back end, but a back end that's closer to the user. The challenge with edge is really more about caching and data residency. Edges are great if you need to request an operation and then request it again, whether for the same user or for a bunch of different users who are geographically clustered. If you don't have the data for that dependency, if you always have to call back to a data center to get the data to run on the edge, why would you run the compute on the edge in the first place? And so the data center, and when I say data center I also mean cloud, basically traditional back-end computing environments, still has a large place here. The key decision factor is that you've got a taxonomy from fast to slow, typically, and you've got to run that compute unit where the data makes the most sense. Okay, so we talked about data. What about dependencies? I actually tried to get an updated version of this graphic from Joe before this talk and could not. But the idea is programming your dependencies to interfaces as well. If you look at some of the history of WASI and what it's done over the years, providing core system capabilities, you see at the very top there something called WASI Cloud, now WASI Cloud Core. With this we're providing higher-level, common interfaces for the things your code might depend on. Same idea, right? I don't want to code against an interface provided by one particular cloud provider; I want a shim, a standardized way to plug in our WebAssembly module, so we can run it against a plethora of backing stores. And this leads us to the concept of dynamic scheduling. Now that we've got a chunk of WebAssembly code that we can run on any device using pluggable dependencies, why not ship your code everywhere by default? If WebAssembly modules are so small, and with components we can do dynamic linking, we can actually start outsourcing some of these scheduling decisions to our runtimes. And you can model this algorithmically. This is a framework that I have used before, and I want to be clear, it is not the only way to do this; just like all things in architecture, there are many ways. But here are some metrics that make sense to me, each with a custom weight factor. First, CPU cost. Say your cloud provider is giving you a huge discount on CPUs; maybe you weight the CPU cost lower. It's the sum of the CPU time, times that weight. If you want to run it on an edge, the cost is roughly the resources times the rate the edge provider is charging, adjusted by the accuracy of the edge, which I define in terms of cache misses, the requests that force you to go back to the backing store. Then the cost of storage, both at the edge and replicated on the back end; on the right side, let's assume it's free. Storage on the edge versus storage on the back end will sometimes carry a duplication factor, so account for being able to de-dupe across that, multiplied by the bytes, multiplied by the rate. Latency: if you're making more synchronous network calls in a chain, you're summing across the different providers. I use a fudge factor here because some latency is always expected; nothing happens truly in real time, and if something lives in a database, you always have to call the database. I model that as a static factor. Lastly, I call this Kubernetes compute, but really it's the data center, cloud, or back-end cost; same thing, summing the resources. Kubernetes is kind of interesting here. I've given a few other talks about overhead in Kubernetes: you shouldn't just consider the cost of your code running, but also, if you're running your own clusters, the back-end cloud provider resources necessary to run that compute. And then egress, which is also a huge factor. With most cloud providers, moving data in, or around within your own network topology, is free or relatively cheap, while egress is usually significantly expensive, especially over the public internet. Being able to minimize that matters more and more as applications get more data-intensive.
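Here's one way you might encode that weighted model in code. This is a minimal sketch of the idea, not a canonical formula; every name, weight, and the exact shape of the accuracy adjustment are assumptions for illustration:

```rust
/// Hypothetical weighted cost model for one placement option
/// (client, edge, or data center).
struct Placement {
    cpu_seconds: f64,
    cpu_rate: f64,        // $ per CPU-second (0.0 on the user's own device)
    cache_hit_ratio: f64, // "accuracy": misses force a trip to the backing store
    storage_bytes: f64,
    storage_rate: f64,    // $ per byte stored, including any duplication factor
    latency_ms: f64,      // summed across synchronous hops
    egress_bytes: f64,
    egress_rate: f64,     // $ per byte leaving the network
}

/// Weights encode business priorities (e.g. discounted CPU => lower weight).
struct Weights {
    cpu: f64,
    storage: f64,
    latency: f64, // converts milliseconds into comparable "cost"
    egress: f64,
}

const LATENCY_FLOOR_MS: f64 = 5.0; // fudge factor: some latency always exists

fn score(p: &Placement, w: &Weights) -> f64 {
    // Misses mean re-running work against the distant backing store, so
    // penalize compute by the inverse hit ratio (clamped to avoid div-by-zero).
    let miss_penalty = 1.0 / p.cache_hit_ratio.max(0.01);
    w.cpu * p.cpu_seconds * p.cpu_rate * miss_penalty
        + w.storage * p.storage_bytes * p.storage_rate
        + w.latency * (p.latency_ms + LATENCY_FLOOR_MS)
        + w.egress * p.egress_bytes * p.egress_rate
}

/// Pick the cheapest place to run this unit of compute, right now.
fn choose<'a>(options: &'a [(&'a str, Placement)], w: &Weights) -> &'a str {
    options
        .iter()
        .min_by(|a, b| score(&a.1, w).total_cmp(&score(&b.1, w)))
        .map(|(name, _)| *name)
        .unwrap_or("client")
}
```

You'd populate one `Placement` per candidate (client, edge, data center) from live measurements, ping times, current rates, cache statistics, and re-run `choose` as conditions change.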
So why not write a little chunk of code that can measure latency, let's say, and decide: do I run this WebAssembly module locally, do I make a call to the local edge, if I've got a ping time or a graph of different edge nodes I can interrogate in real time, or do I run it back in the data center? I think this is something that's fully practical today. Okay, I'd be remiss not to give actual examples; we've talked a lot of theory, so let's dive into three specific ones. First is image processing, image transforms. Obviously, coming from Adobe, this is a huge thing that we do. Here's the concept of a separable image transform. This is a convolutional blur, for example, where the kernel separates, so instead of applying the full M-by-N matrix at every pixel you can apply two cheap one-dimensional passes, and that means you can run the processing in parallel. Can I split the processing of this blur up? To be clear, if this is a 128-by-128-pixel image, this is going to take no time at all. But what if you have a 30-megapixel image? A 300-megapixel image? What if you're doing this as a batch process over thousands and thousands of, let's say, stock images that you're trying to catalog? That's where you start to see this add up. When you've got those larger workloads, where is the data most resident? Also, what if you're trying to run a non-separable transform, like the one on the right? If you want to know more about image transforms, there's a great blog post where I stole these images from, because I did not want to draw my own.
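As a minimal CPU sketch of why separability matters (border handling and the kernel itself are simplified; a real implementation would use SIMD, the GPU, or tiling):

```rust
/// A separable blur: convolve rows with a 1-D kernel, then columns.
/// For a k x k kernel this is O(2k) per pixel instead of O(k^2),
/// and each pass parallelizes trivially across rows or columns.
fn convolve_rows(img: &[f32], w: usize, h: usize, k: &[f32]) -> Vec<f32> {
    let r = k.len() as isize / 2;
    let mut out = vec![0.0; img.len()];
    for y in 0..h {
        for x in 0..w {
            let mut acc = 0.0;
            for (i, kv) in k.iter().enumerate() {
                // Clamp at the borders rather than reading out of bounds.
                let sx = (x as isize + i as isize - r).clamp(0, w as isize - 1);
                acc += kv * img[y * w + sx as usize];
            }
            out[y * w + x] = acc;
        }
    }
    out
}

fn transpose(img: &[f32], w: usize, h: usize) -> Vec<f32> {
    let mut out = vec![0.0; img.len()];
    for y in 0..h {
        for x in 0..w {
            out[x * h + y] = img[y * w + x];
        }
    }
    out
}

fn separable_blur(img: &[f32], w: usize, h: usize, k: &[f32]) -> Vec<f32> {
    let horizontal = convolve_rows(img, w, h, k);
    // Transpose, reuse the row pass for the columns, transpose back.
    let t = transpose(&horizontal, w, h);
    transpose(&convolve_rows(&t, h, w, k), h, w)
}
```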
How do we build this? The answer is: it depends. It depends on all of those metrics I just described. If you have the smaller image, let's say it's a smart device with a camera filter app running on it, you've got cheap compute locally and a local cache where the data is resident. If your GPU load is low, if you've got a high-powered device with plenty of battery, typically this can be run most efficiently on the client. Whereas if it's an image upload flow, if you've got a stack of images that you just took and you need to operate on that data, that's where pushing out to a constrained edge device to run some of that processing can make the most sense, especially if processing on that edge device is cheaper than running all of it back in your core data center region. Or let's say you're doing batch processing on a stack of images that's already resident in cloud storage. In that case, maybe you run it in Kubernetes in the cloud region where that backing storage already lives; that networking is cheap and quick. And what happens if you have to send it all back to the user's device? Well, that inverts the entire equation, and now it might make more sense to run it close to the device itself. Another really unique example: procedural content generation. Lots of times in geometry and in machine learning models, you've got a bunch of different data sets, and sometimes you can stream in the code that generates those data sets instead of streaming in the data sets themselves, running the generation code on device or on edge. You exchange bandwidth and latency for runtime cost. So in this example, you've got a client device with a GPU and a graphics engine, and it's going to go out to an edge device that has a graphics service. First of all, do I have this data resident in the cache? If not, I hydrate my cache; the process of hydrating the cache is not really relevant here. At this point, let's say I've got the formal grammar, the set of instructions, the code to generate this data. Do I run this generation step on the edge, or do I send the procedural grammar back to the client side and let it run the generation there? A few factors come in here. Do I have heuristics about whether I'm going to need to run this generation step again? Have end users accessed the same data within the same edge location within the last N hours, for example? Or is this fresh data? Is this expensive data? I keep using GPUs as an example because that's what I know, but do I know that I've got a more powerful computational framework at this edge device than I do on the client side? The answer is that you can send either back; you make that decision programmatically at runtime. At the end of the day, you either generate it on the edge or generate it on the client side, but the end result is you still get that pretty picture on the right, uploaded to the GPU on the client device. It's just a matter of which business metrics make the most sense for that generation. Is it latency? Is it egress cost? Is it compute? Et cetera.
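To make "ship the grammar, not the data set" concrete, here's a minimal, hypothetical sketch: an L-system rule is a few bytes to transmit, but it expands into orders of magnitude more geometry-driving instructions wherever you choose to run it:

```rust
use std::collections::HashMap;

/// Expand an L-system: each iteration rewrites every symbol by its rule,
/// or keeps the symbol if no rule exists.
fn expand(axiom: &str, rules: &HashMap<char, &str>, iterations: u32) -> String {
    let mut current = axiom.to_string();
    for _ in 0..iterations {
        current = current
            .chars()
            .map(|c| rules.get(&c).map_or_else(|| c.to_string(), |s| s.to_string()))
            .collect();
    }
    current
}

fn main() {
    // A Koch-curve-style rule: nine bytes of grammar...
    let rules = HashMap::from([('F', "F+F-F-F+F")]);
    // ...expands into over a kilobyte of drawing instructions, generated
    // wherever the cheapest capable device is: client, edge, or data center.
    let instructions = expand("F", &rules, 4);
    println!("{} bytes of geometry from a 9-byte rule", instructions.len());
}
```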
Last example: if you've been paying any attention at this conference at all, ML inferencing is a huge thing now, whereas until recently it kind of wasn't. An example that resonates with a lot of people is user impressions. Let's say you've got a web-based advertising system. If you can bring the processing of some of those impressions closer to where the actual user is, you can increase performance and decrease latency. You can also leverage edge caching for better data locality, especially if you've got a user coming back repeatedly, maybe through a multitude of devices behind the same network: you're shopping on your smartphone, you're shopping on your computer. Maybe you can run that inferencing step nearby on the edge instead of paying the extra latency. Okay, so we've got a user, and the basic request path here is that you hit an edge device. I've hit a button on the web page, which sends a token from a cookie to, let's call it a profile service. The profile service looks up the user and says: I can identify the user on this device, and I want to run some personalization for this user based off of this token. First of all, do I know who this user is? If so, do I have data for them in the cache? If so, do I have a model resident? Those are three different questions. In the happy path you can answer all three at the edge, and then you can furthermore send the result back asynchronously, over a topic, to the data center for further processing. A lot of these real-time inference setups do back-end retraining over time, or the model gets updated over time; you've got your friendly neighborhood data scientists over there constantly changing your model. But the idea is that if it's not present on the edge already, you can go to the data center, hydrate that data, get it out of the database, and put it back on the edge, where it stays warm for a specific period of time, which might be tuned to an average user session.
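Here's a minimal sketch of that edge-side decision; every name (`EdgeNode`, `SESSION_TTL`, the cache shapes) is hypothetical, and the actual inference and hydration calls are stubbed out:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Entries stay warm on the edge for roughly one user session.
const SESSION_TTL: Duration = Duration::from_secs(30 * 60);

#[allow(dead_code)] // sketch: `value` would feed the real model
struct Cached<T> {
    value: T,
    fetched_at: Instant,
}

fn fresh<T>(c: &Cached<T>) -> bool {
    c.fetched_at.elapsed() < SESSION_TTL
}

struct EdgeNode {
    profiles: HashMap<String, Cached<Vec<f32>>>, // per-user feature vectors
    model: Option<Cached<Vec<u8>>>,              // serialized model weights
}

impl EdgeNode {
    fn infer(&mut self, user_token: &str) -> Vec<f32> {
        // The three separate questions from the request path above:
        // do I know the user, do I have their data, is the model resident?
        let have_profile = self.profiles.get(user_token).map_or(false, fresh);
        let have_model = self.model.as_ref().map_or(false, fresh);

        if !(have_profile && have_model) {
            // Miss: hydrate from the data center, then serve warm from the
            // edge until the TTL lapses. (Stubbed out in this sketch.)
            self.hydrate_from_datacenter(user_token);
        }
        // Run the model locally and, in a real system, publish the
        // impression asynchronously on a topic for back-end retraining.
        vec![0.0]
    }

    fn hydrate_from_datacenter(&mut self, _user_token: &str) { /* ... */ }
}
```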
Okay. That was a lot. Are these architectures bad? We just looked through a bunch of different architectures, and I started the talk by saying architecture is bad, so we need to refine that definition. I'd like to think that, at the core, these are architectures that stay flexible at runtime, because we've abstracted away the dependencies, separated the business concerns, and split up the business logic to run wherever it can be closest to the data and closest to the use case itself, independent of the runtime or the underlying hardware. But when it comes down to nuts-and-bolts implementation, this really is an exercise left to the reader. No architecture is perfect; they will all decay over time, no matter how flexible you think your architecture is to start. But I think WebAssembly represents a unique opportunity to ship this code by default to a variety of form factors, and to let you slow that decay and increase the flexibility. Did we validate our thesis? It depends; there's always some ambiguity, but let's go back to our initial criteria, which were admittedly subjective. Did we build a good architecture? I think we showed that more flexible architectures exist. Did we pick the right runtime environment? Did we solve the Kubernetes-versus-WebAssembly question? I maintain that that's not the right question to be asking. We brought our code closer to our data, but like I said, it's an exercise left to the reader, and this is often ambiguous: data processing often goes through multiple steps, and where you run your processing of that data, where you run things like image filters, can change over time. Actually, here's an example. How many of you have been asked to add AI into your product at some point in the last year? A year ago, would you have thought that would be a thing? Probably not; it all sprang up overnight. The point being: requirements change over time, where data is most resident changes over time, and here we've made ourselves more flexible in the face of that. I'm also not going to claim that WebAssembly is necessarily cheaper or more performant than native. There are people who make those claims, but what I can say is that I think it decreases overhead and allows you to be more flexible about where you ship your code, and specifically more flexible in the face of business entropy, of chaos. So: all architectures decay over time, and flexibility is key. That's it. Here are a few architectural nuggets you can take away from this. Actually, we've got a little bit of time left, and I have a quick demo I threw together the other day. I implemented example number two, the content generation step, so you can see what I'm talking about in real time. Is it going to work? Yes. Okay, cool. So that building right there is a chunk of WebAssembly code that I just compiled from AssemblyScript. The code to generate that building actually lives on an edge device. I won't say where, but it lives on an edge device, and it was pulled down at the start of this program. You saw it took about one second to generate that building. What you just watched over the last 10 or 15 seconds is Bing Maps data, for example, the streaming in of all of the rest of that data, and I'm pretty sure that's backed by a Kubernetes server. The idea here is that by shifting that compute to where it makes the most sense, I was able to increase my application's performance. So yes, if you're curious, I'm now working with WebAssembly and graphics. If you're interested in the intersection of WebAssembly and graphics, please come talk to me. And that's it. Thank you.