All right, hello everyone, thanks. My name's Luke, I work at Fastly, and I'm going to be talking about the path to components. There's a big new standards proposal we've been working on called the component model. It sits in between WASI and core WebAssembly and provides a way to compose code together like LEGO bricks. This talk will cover the path leading up to the component model, the problems the component model is working to solve, what the developer experience is shaping up to look like, and what's next. Our path starts in 2017 with the release of WebAssembly 1.0 in four browsers. We called it the MVP, short for minimum viable product, because there was still a lot of work left to do; it was definitely minimal. So we collected a lot of these remaining features and published a post-MVP roadmap. Lin Clark made a really cool video-game-like skill-tree diagram showing how the different features unlock different use cases. And indeed, a whole bunch of this work has been in progress in the WebAssembly CG, extending the core of WebAssembly with new instructions and value types. But if we look at one branch of the skill tree and zoom in, it says "portable interface." So what's that about? In general, if we want portable programs, we need a portable instruction set and a portable system interface. In the browser, of course, WebAssembly is designed to be that portable instruction set, and JavaScript and the web platform are a natural portable system interface. But if we want to run WebAssembly everywhere, what takes that role when we don't necessarily have JavaScript or the web platform? To answer this question, we started the WASI subgroup, WASI standing for WebAssembly System Interface. But that immediately raised the question: what sort of system are we even interfacing with here? One does not just generically interface with systems.
And from experience, we had two high-level goals: we wanted capability-based security and modularity. So we went about doing background research, seeing what work had already been done that we could learn from. In the direction of capability-based security, a great resource is Mark Miller's dissertation on robust composition. The takeaway for me was that capabilities buy us not just security, which is kind of like eating our vegetables, something that's good for us and that we'll eventually grow to like, but also composability, which is like the fire flower that makes us fire Mario and productive. So that was exciting. We also did some research in the modularity direction, around the ML module system and the work on units. The high-level takeaway there is that complex modular applications need what I'm going to call parametric linking, which I'll talk more about later. Combining these was a bit of research on the Wyvern language, which really nicely illustrated how capabilities can go well with parametric linking. Excited by all this, we collected these ideas and published the nanoprocess vision blog post, which painted a picture of a bunch of really fine-grained nanoprocesses that encapsulate their memory and trade fine-grained capabilities between each other. And then there was a bunch of quiet. So sorry about the delay there; there was a whole pandemic, and it didn't help that some of us had to switch employment situations. But in general, digging into this nanoprocess model and asking what, concretely, this box is turns out to be a hard question. A reasonable reply is: why not just use a wasmified POSIX process? And indeed, if you look at the picture, it says POSIX, because that's where we started, with CloudABI, and because POSIX has a lot of what we want.
It has encapsulated low-level state, cross-language interop, unforgeable capabilities, and lots of code that we want to reuse. And spoiler alert, don't worry, we will. So is that just it? Do we take POSIX, stuff it into WASI, and call it mission accomplished? Well, let's take a step back and first ask why people are putting Wasm into production. There are some obvious reasons: language independence, an open, formally defined, portable standard, and strong sandbox-based security. Then we could ask: are we trying to run the same CLI tools the same way, just with Wasm as our instruction set? Or run the same containers the same way, just with Wasm as the instruction set? You could, and people have definitely played around with it, but that's not what's motivating us as much as taking the next step smaller in the progression from VMs to containers to WebAssembly, or serverless execution with ultra-fast cold start and ephemeral lifetimes. So getting back to our previous question: just-do-POSIX would work pretty well for the traditional use cases, but for these more futuristic ones we ran into trouble. I'll summarize this with four high-level pain points and then four proposed design alternatives that we worked on. Our first pain point is linking via the file system. A common experience here, I think, is that you have some software components that seem like they should be interface-compatible, so you should be able to snap them together and build the tower of beautiful software you have in your mind. But the reality is less pretty and involves a lot more glue code. The reason is that, in addition to these explicit interfaces, there's this file system that everything is colliding with, and a lot of that glue code is working around it. So what we're proposing instead is called parametric linking.
This comes out of that background research, and I'm going to describe it in three steps. First, we distinguish units of code, which are static and immutable, like the stuff you'd store in a file, from instances of that code, which are dynamic and stateful, the kind of thing you'd load into memory and run. Next, we say that a unit of code declares imports that must be supplied when the code is instantiated. So, for example, my a.wasm can have an import x of type string. When I instantiate it the first time, I have to supply a value for that import x; when I instantiate it a second time, I can supply a different value and get a different instance. And lastly, we allow units of code to instantiate other units of code by supplying their imports. So I've got my a.wasm, and I can have a b.wasm that imports an instance of a. Then c.wasm has the flexibility, if it wants, to say: when you instantiate c, first make one instance of a and an instance of b linked to it, then a second instance of a and a second instance of b linked to that. While many module and package systems have something like imports, often called dependencies, fewer have this ability for code to instantiate other code, although increasingly some do, like React components or Terraform parameterized modules. So we're going with that trend. Concretely, this enables things like mocking and local testing without needing a dependency injection framework, dependency resolution at runtime without lock files, and service chaining without an intervening network stack. Our second pain point is complex data passed as unstructured text. The common experience here, I think, is: I've got a program A and it spits out some JSON. I'd like to feed it into a second program, and ideally it would take the same JSON, but in practice it often takes its own thing, because it can parse its own arbitrary input format.
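To make those three steps concrete, here's a rough sketch of the c.wasm example in the component model's text format. This is simplified and approximate, not syntax any current tool accepts verbatim; a, b, and the import x are the hypothetical names from the example.

```wat
;; Sketch of c: a unit of code that instantiates other units of code.
(component $C
  ;; c imports the *code* of a and b, not instances of them.
  (import "a" (component $A))
  (import "b" (component $B))

  ;; First pair: supply a value for a's import x, then link
  ;; b to that particular instance of a.
  (instance $a1 (instantiate $A (with "x" (value "first"))))
  (instance $b1 (instantiate $B (with "a" (instance $a1))))

  ;; Second pair: a different x yields a different, independent
  ;; instance of a, with its own b linked to it.
  (instance $a2 (instantiate $A (with "x" (value "second"))))
  (instance $b2 (instantiate $B (with "a" (instance $a2))))
)
```

The key move is that instantiation is expressed *inside* a unit of code, so c, not a file system or a loader, decides how many instances exist and how they're wired together.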
So I have to write glue code to adapt between the two, and that's a composability pain point. Next, program A has to do all this work stringifying the JSON and program B has to parse its input format, and that costs performance. In this nanoprocess model, where we have lots of itty-bitty nanoprocesses, this can add up to a significant percentage of the overall time. And lastly, let's say program A has a file descriptor pointing to a resource and wants to pass it over to program B. It could use an out-of-band Unix domain socket and sendmsg, but that's kind of a pain, so mostly we don't do that. Instead, we stick the resource in a whole file system, give it a file name, have both programs import the whole file system, and that allows us to pass the file by its file name in the JSON. This is an example of a pattern my colleague Dan Gohman calls "ghosts," and he has a great blog post explaining the problems and pain points that arise when you have ghosts in your programs. More generally, the fact that programs A and B import a whole file system just to pass a file is an example of excess authority. So instead we're proposing high-level value types. We start with value types like strings, lists, records, and variants, and add subtyping. My program A can put out an abstract record with two fields, and program B can take the same record or a subtype, for example here with an optional extra field, which is allowed by subtyping and lets us evolve our interfaces over time. Next, we make reading and writing these values in and out of linear memory configurable yet optimizable. Program A gets to specify how its bits turn into the abstract record, for example choosing the string encoding, whether it's UTF-8 or UTF-16, and program B gets to make the opposite decision independently.
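As a sketch in WIT, the IDL I'll properly introduce later in the talk: A's output type and B's evolved input type might look like this. The interface and field names are made up for illustration, and treating the two records as subtype-related is the design being described here, not something today's tooling expresses this way.

```wit
interface exchange {
  // What program A produces: an abstract record with two fields.
  record entry {
    name: string,
    size: u64,
  }

  // What program B accepts: the same record plus an optional extra
  // field, which subtyping is intended to allow so interfaces can
  // evolve without breaking existing producers.
  record entry-v2 {
    name: string,
    size: u64,
    picture: option<string>,
  }
}
```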
Then, because an ahead-of-time compiler can see how these things are configured, it can compile this down to a direct copy-and-transform from A's linear memory into B's linear memory. And lastly, we allow value types to contain opaque handles. So, for example, program B can accept a picture field via a handle to a blob, not copying the contents of the blob but passing just a handle to it. But that raises the question: a handle to what? Where did "blob" come from? Is it magically built into the system? This gets to our next pain point, which is that in POSIX, everything is a file descriptor. If I have two programs and a bunch of high-level interfaces they might want to use, say program A uses these interfaces and program B uses these other ones, then when I compile it all down to POSIX executables, they look the same: they both just use file systems and sockets. Looking at them from an external perspective, I'd say they have the same interface; I can't really tell the difference from the outside. And that external perspective matters: to a dependency auditor trying to understand supply-chain attack risks, to virtualization tooling trying to run programs in environments other than the ones they were compiled for, and to a runtime orchestrator over a distributed, heterogeneous network asking where it can run this program. So instead we're proposing resource and handle types. We do this by first distinguishing resources, which are non-copyable things with lifetimes, from handles, which are values that point to resources, and then functions that take and return handles and operate on resources. Next, we say that resources have a resource type, and handles have a handle type that specifies a resource type.
So, for example, I can have a resource type blob and a function that takes a handle to a blob, or two resource types, request and fields, and a function that takes a handle to a request and returns a handle to its fields. And lastly, we say that resource types are runtime type tags that can be imported, created, and exported. So my program A can import the resource type blob, and program B can import the request and fields resource types. Importantly, they now look different from the outside: I can say, without looking at the code, that program A works on blobs and program B works on HTTP requests. What this enables is things like layer-7 interfaces that avoid sidecar protocol-parsing overhead. For example, if two programs want to pass a request between them, they don't have to serialize it to a socket just to deserialize it on the other side; they just pass the handle across the boundary. We can have resources that are directly usable without manual glue code. And we can have virtual platform layering, which is something I'll return to at the end. Our last pain point is low-level concurrency. If I have two processes, they both start with a thread, because every process starts with a thread in POSIX. That means for them to communicate with each other, they have to do thread synchronization. Maybe that's not a lot of overhead with a few monolithic processes, but in this nanoprocess vision, where I have lots of tiny little processes, it can add a lot of scheduling overhead. So that's a performance pain point. Next, if I want to do concurrent IO, I can use multiple threads to make multiple blocking requests. But now I'm doing multi-threading, which is hard. If you can't see it, this is a picture of a famous Mozilla hacker, David Baron, who famously had a sign, visible at the top left, that says "must be this tall to write multi-threaded code," because it's hard and ultimately leads to less composable components.
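Those two examples, sketched in WIT; the interface names and the particular operations are illustrative, not the real WASI definitions.

```wit
interface blob-store {
  resource blob;                    // a runtime type tag for blobs
  // Takes a handle to a blob without copying its contents.
  size: func(b: borrow<blob>) -> u64;
}

interface http-types {
  resource request;
  resource fields;
  // Takes a handle to a request and returns a handle to its fields;
  // neither resource is ever copied or serialized.
  headers: func(r: borrow<request>) -> fields;
}
```

From the outside, a component importing blob-store visibly works on blobs and one importing http-types visibly works on HTTP requests, which is exactly what the auditor, virtualizer, and orchestrator scenarios need.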
Now, I don't have to do multi-threading; I can instead use concurrency mechanisms like select and epoll. But these are rather coupled to the built-in resource types of the operating system and harder to make work with the user-defined resource types we were just talking about. Additionally, if I want to use these from a source language, that means writing a lot of glue code to manually bind high-level language features to these low-level concurrency mechanisms, which is a language-interop pain point. So instead we're proposing future and stream types, where we say that calls between instances can share a stack. For example, if I have two instances linked together and the host calls g, frames get pushed on the stack, and when I2 calls into I1 through f, it just keeps pushing frames on the same stack. In the operating systems literature this is called thread migration, and it's an optimization to avoid cross-process overhead, which is effectively what we want here. Additionally, a callee can suspend itself, allowing the caller to make progress. So I1 can suspend itself, returning control flow to I2, which keeps running while holding a handle to the suspended f task. Next, we add future and stream types for passing or returning these handles to suspended tasks. So I could have a fetch function that returns a future response, or, more generally, a merge function that takes a list of futures and returns a stream. Lastly, we define a low-level control-flow protocol for futures and streams that language runtimes can integrate with. So if I have three instances linked together, compiled from three different languages, they can have three separate async runtimes compiled inside them, and they work together because they're all speaking the same low-level control-flow protocol, with the async runtimes integrating with it kind of like they integrate with an operating system.
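The two example functions, sketched in WIT with the proposed future and stream types. The async types are part of the design being described here rather than settled syntax, and the resource names are just placeholders to keep the sketch self-contained.

```wit
interface async-example {
  resource request;
  resource response;

  // Returns immediately with a future that later resolves
  // to the response; the caller keeps running in the meantime.
  fetch: func(r: request) -> future<response>;

  // More generally, merge a list of pending futures into a single
  // stream that yields each response as it completes.
  merge: func(pending: list<future<response>>) -> stream<response>;
}
```

Because these types are high-level, each language's own async runtime (Rust futures, JS promises, and so on) can bind to them through the shared control-flow protocol instead of hand-written epoll glue.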
What this enables is efficient bulk data streaming into and out of Wasm linear memory, async interfaces that are directly usable without manual glue code, and defining and enforcing cross-language structured concurrency, which is a whole separate topic I don't have time to get into, but it's enabled by the high-level nature of futures and streams. So, putting this all together: we started with POSIX, which defines IO with the outside world and also a process model. We identified four pain points with the process model and four proposed design alternatives, and performing that substitution gives us the component model. It's layered on top of core WebAssembly, with WebAssembly providing the portable instruction set that we can now bundle up into these abstract, black-box, reusable units of code. And this frees up WASI to do what it originally set out to do, which is define IO with the outside world, layered on top of the component model and defining modular interfaces for all sorts of things like logging, config, file systems, sockets, gRPC, and so on. So the component model is what we've been spending a lot of time factoring out of WASI. Now, if you've been around the block a few times, you might be saying right now: components? Oh no, not CORBA again, or Java EE, or DCOM. We've seen this before. There are a lot of differences that could be pointed out, enough to fill a whole talk, but the big one I'd like to point out is that distributed computing is out of scope for the component model. This means partial failure, persistence, live upgrade, scalability, and availability. These are definitely important problems, but they're not something we can automatically solve in the component model, so we're leaving them up to a higher layer: the platform, the embedding, and future higher-level specifications. So what does the developer experience look like for all this?
A good starting point is to say that components are sort of the love child of two things we're familiar with. On one side, source-language modules, like JavaScript modules, where we're doing fast sync and async calls to other modules through imports and exports. On the other, microservices defined via an IDL like OpenAPI or gRPC, where we define shared-nothing interfaces in an IDL and derive our language-specific bindings from that. So that raises the question: what's the IDL for components? That's something the WASI subgroup has been working on for a while, and it's called WIT. Let me show WIT by way of abridged WASI examples. WASI could define a CLI main interface with a main function that takes an argv list of strings and a stdin stream of chars, and returns a stdout stream of chars. I could define a logging backend interface with a log function that takes a log level and two strings. I could define a WASI filesystem interface with two resource types, for directories and files, and the operations that are appropriate on them. Then, totally separately, we can define a WASI HTTP types interface with a totally disjoint set of resource types appropriate for HTTP things, like fields, bodies, requests, and responses. And building on that HTTP types interface, we can define a handler interface with a handle function that takes a request and returns a future response. And a whole bunch more interfaces like that. Hopefully the picture you're getting is that these interfaces can be small and modular and focused on just one task. But then, from a component author's perspective, I have to ask: which interfaces can I use? All of them? That sounds nice, but I'm a little skeptical that I'm going to get a GPU or neural-network hardware everywhere. And from a platform vendor's perspective, I ask the dual question: do I have to implement all these interfaces?
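Those abridged examples, sketched in WIT; the names and exact signatures are illustrative, and the real WASI definitions differ in detail (the future and stream types in particular are still being designed).

```wit
interface logging {
  enum level { debug, info, warn, error }
  log: func(lvl: level, context: string, message: string);
}

interface main {
  // argv in, a stdin stream in, a stdout stream out.
  main: func(argv: list<string>, stdin: stream<char>) -> stream<char>;
}

interface filesystem {
  resource directory;
  resource file;
  open-file: func(d: borrow<directory>, name: string) -> file;
  read: func(f: borrow<file>, len: u64) -> list<u8>;
}

interface http-types {
  resource fields;
  resource body;
  resource request;
  resource response;
}

// Builds on http-types rather than redefining its resources.
interface http-handler {
  use http-types.{request, response};
  handle: func(r: request) -> future<response>;
}
```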
And that's a much scarier proposition, because there are a lot of them, they're going to keep growing over time, and some aren't even possible to implement in all places. So we're missing one more thing to link these together and answer these questions. That thing is being actively worked on, and it's called worlds. Let me show worlds by way of example. I can define a world called WASI CLI command that captures a traditional POSIX environment: I get to import a logging backend as my console, plus a file system and sockets, and then I export a main function. That's a traditional POSIX view of the world. But separately, I could have a WASI HTTP proxy world where I get to import a console, and here we're using the same logging interface, so interfaces don't care which world they're used in; they can be used by multiple worlds. But instead of a file system and sockets, I'm importing an upstream handler and configuration values, and instead of exporting main, I'm exporting an HTTP handler interface. And here we can see the same WASI HTTP handler interface used as both an import and an export, because interfaces don't care: they can be imported or exported. This is critical for virtualization, which we'll get to in a bit. So in general, we're going to have bunches of interfaces growing over time, and then worlds select subsets of these interfaces that are imported or exported in a certain context. I showed these two, but you can imagine a whole bunch more, like a database user-defined-function world or a cloud service world. And because worlds are set-like, we can say things like: the HTTP proxy world is included in the cloud service world. Or we could define an HTTP caching proxy world that extends an HTTP proxy with a cache. We could define unions and intersections on worlds. So, from a platform vendor's perspective, I support a world by implementing its imports and calling its exports.
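The two example worlds, sketched in WIT; this assumes interfaces like logging, filesystem, sockets, config, main, and handler are defined elsewhere in the package, and the names and exact syntax are approximate.

```wit
// A traditional POSIX-style environment.
world command {
  import logging;      // the console
  import filesystem;
  import sockets;
  export main;
}

// An HTTP proxy environment: same logging interface, but no file
// system or sockets, and the handler interface appears as both an
// import (the upstream) and an export (what this component serves).
world proxy {
  import logging;
  import config;
  import handler;
  export handler;
}
```

A world is just a selection of interfaces marked as imports or exports, which is why the same interface can show up in multiple worlds, or on both sides of one.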
And then, dually, from a component author's perspective, I target a world by calling its imports and implementing its exports. Thus a world is sort of a higher-order interface that sits between platform vendors and component authors. OK, so I have a world; now what can I do with it? Let's take our example HTTP proxy world. There's a new tool being worked on in the Bytecode Alliance called wit-bindgen, and wit-bindgen gives you a set of choices. The first is: are you making host bindings or guest bindings? If I go the guest direction, my next question is which guest language I'm using. There are a bunch of languages in progress, more than I'm showing here, including Python, Go, Java, and C#, but I'm going to show JS and Rust. If I go the Rust direction, wit-bindgen will generate some Rust glue code, and this glue code lets me write idiomatic, nice-looking Rust code, like in this white box: a handle function that takes a request, calls log, and then passes the request on to the upstream. I compile all this Rust code to core Wasm and feed that into a second tool being developed in the Bytecode Alliance called wit-component. wit-component will bundle up this core Wasm along with the high-level types in the original world to create a self-contained, self-describing component in a binary format defined by the component model. Now, if I back up and go the JS direction, that will generate some JS engine glue code that lets me write idiomatic-looking JS, where I use JS imports to import the world's imports and write JS exports for the world's exports. When I compile this to core Wasm and feed it to wit-component, I get a component of the same shape and type as the Rust component, just with different guts. So, from a guest perspective, what the WIT and wit-component tooling gives us is SDKs made easy. Now, if I go the host direction, I again have the same language choice.
If I go the Rust direction, the tooling will wrap Wasmtime with glue code that gives me safe, high-level traits that I implement for imports and call for exports. Into this I can plug reusable implementations of common standard Wasm interfaces, or my own custom implementations. And then this wrapped Wasmtime will be able to run the components I compiled from the same world, or any compatible world. Now, in theory I shouldn't need the host JS direction, because the goal is that components will be natively implemented by JS runtimes, just like core Wasm is today. But until then, the host JS direction gives us a polyfill that lets us run components in today's JS engines and browsers as Wasm 1.0 plus JS glue code. So, from a host perspective, what WIT gives us is embedding made easy. And I get all of this for the low, low cost of one world. So that's the tooling in progress now. The next step in tooling is around virtualization. To motivate this: we have tons of POSIX programs already written today, and they use libc and file systems and sockets, so they implicitly target the WASI CLI command world that I showed. So that's what I have. But let's say I want to run this code in a cloud service world. How do I do that? Well, I can start by compiling my code to a component, and because it targets the CLI command world, I get a component that imports a file system and exports a main interface. Now I want to wrap this in a component that does target the cloud service world. To do that, I'll use a tool I'm going to call virtualize-command. It doesn't exist today, but the idea is not new: this is effectively what Emscripten's virtual file system and wasi-vfs do today. Its inputs would be a configuration file that says how to virtualize the file system, where I could say things like: in the blobs directory, I'd like to mount a blob store.
And in the assets directory, I'd like to stick some assets stored as data segments. The virtualization tool will then synthesize some adapter modules that get linked with my original component to implement the cloud service world, and all of this is linked together using that parametric linking feature I talked about earlier. What this means is I can ultimately produce a single component that runs on a cloud service, where the cloud service doesn't know anything about the CLI command world; it just works. This specific example is an instance of a more general pattern that I think we're going to see over and over, called virtual platform layering, where we define a guest world, which is what we expose to our customers. They'll be deploying components that run on our platform targeting this guest world. But then, as an implementation detail, we can have a host world with a totally different set of imports, and we implement the guest world in terms of the host world. When we link all this together with parametric linking, it can run on a host runtime that only knows about the host world. This lets us reduce the size of the trusted computing base and also decouple our guest world from our host world, allowing the host world to evolve as our platform evolves. Even wilder, this host world could itself be a guest world of some platform a layer down, which is what you see when large enterprises have separate application and platform teams, or when companies build platforms layered on top of other companies' platforms. So it's important that the component model supports all of this directly and efficiently. Zooming way back out, I think the big picture here is that we're bootstrapping a Wasm meta-ecosystem, where we start with platforms that are choosing Wasm for its unique qualities.
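A WIT-level sketch of that layering, with every name hypothetical: the guest world is what customers target, the host world is what the runtime actually provides, and an adapter component (synthesized by a virtualization tool like the one in the example) implements the guest's imports in terms of the host's.

```wit
// What we expose to customers deploying components on our platform.
world guest {
  import filesystem;
  export main;
}

// What our host runtime actually implements. Note it knows nothing
// about the file system the guest thinks it's talking to.
world host {
  import blob-store;
  import config;
}
```

Parametric linking wires the customer's component to the adapter and the adapter to the host's imports, so the host runtime only ever sees the host world, and each world can evolve independently.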
And this attracts an initial set of developers who are choosing Wasm to run code where they couldn't otherwise, and that causes us to build a set of reusable tools that enable a first wave of languages and APIs. But because these tools aren't coupled to just one platform, because they're general, using the component model, we can now attract a new wave of platforms that tap into this meta-ecosystem and get a bunch of stuff they didn't have to build themselves. That can attract a bigger, newer wave of developers who are now choosing Wasm for the ease of reuse and productivity of all these reusable components. And that can lead us to build a whole new wave of tooling that lets us compose applications out of components, and whole new kinds of developer programming models. At that point, the ecosystem will be fully booted and having a good time. So what's the status of all this? Well, there's a formal specification in progress, with an operational semantics, a reference interpreter, and a reference test suite. Large parts are already implemented in Wasmtime, wit-bindgen, and wasm-tools. There's an upcoming milestone called "sync components," with full parametric linking plus value, resource, and handle types, aimed at Q1. After that, the big milestone is adding futures and streams, the async support in the component model, which is going to be hard, but it's going to factor out a bunch of otherwise duplicated, necessary work. And then some final bits close out a component model MVP, for example optional imports and exports. So that's it. Here are a bunch of links if you're excited and want to get involved. You can get involved in the standards: there are regular meetings, and here's a list of links where you can track the standards proposals. There are also regular meetings around the Wasmtime and wit-bindgen tooling, and there's the Bytecode Alliance Zulip to chat on. So thanks a lot.