Thanks for attending. Welcome, fellow WebAssemblers. Today I'm going to talk a little bit about Python and how we can integrate Python into the component ecosystem that Luke talked so eloquently about yesterday. My name is Joel Dice. I'm a principal engineer at Fermyon. One of my big passions, even before I started at Fermyon, was better polyglot language support in the WebAssembly ecosystem.

I actually showed this slide in a presentation last year, but I still like to look at it as the long-term vision: to really fulfill the polyglot promise that WebAssembly offers, so that you can take source code written in any of these languages and more, compile it to WebAssembly, and target a wide variety of hosts, whether they be embedded, browser, serverless, database stored procedures, or all of the above. And frankly, we're not there yet. We're getting closer year by year, day by day. In particular, we'll talk about Python today, but at Fermyon we want to see all these languages succeed, and I'll use this opportunity to plug Kyle Brown's awesome special interest group for guest languages in the Bytecode Alliance. I'll have a link at the end of the presentation if you want to get involved in that. We're spinning up subgroups for Go, for JavaScript, for Python, and maybe Java in the near future. So if you're interested in this topic, please don't hesitate to join us.

So, the agenda today. First, we'll talk about some of the prior art: some of the solutions for targeting WebAssembly using Python that exist already, and the underlying technologies that they use. We'll talk about composing WebAssembly modules. There's a variety of ways to do that, each with their pros and cons. We'll look in detail at what componentize-py does behind the scenes. Hopefully we'll have time for a nice little demo that showcases, in particular, the sandboxing capabilities that this provides for running untrusted Python code in an otherwise trusted environment.
And finally, we'll look at some of the next steps that are required to make this really ready for prime time, for everyday development, and how you can help out with that effort.

So, some of the prior art here. It's built on a foundation laid by the CPython folks, particularly Christian Heimes, Brett Cannon, and others, who have seen to it to provide first-class WASI support in CPython, supported in large part by WASI SDK and other critical projects. Christian actually gave a great presentation at Wasm Day at KubeCon North America last year that you can look up to get some details on that. And I just had lunch with Hood Chatham, who is leading the Pyodide project, which combines CPython, Emscripten, and a rather impressive library of ported packages. Pure Python packages aren't that difficult to run on CPython in Emscripten or WASI. However, packages that have native extensions written in some combination of C, C++, and Fortran can be quite a bit more difficult, and I'll allude to some of those challenges a little bit later. Some of the cool sub-projects that came out of that project are pyodide-build, which is responsible for cross-compiling from whatever native architecture you're running on to WASI (or in this case Emscripten, rather), and micropip, which is responsible for resolving packages, the dependencies in your pyproject.toml for example, and finding them either on PyPI or, if they have native extensions, possibly on an alternative index.

The last thing I'll mention about the CPython WASI support is that currently it's at Tier 3 status. If you're familiar with the CPython platform support tiers, the most popular platforms are Tier 1, Tier 2 has somewhat relaxed criteria, and Tier 3 is for the less common, less supported platforms.
Right now WASI is considered Tier 3, but Brett is working hard to graduate that to Tier 2, which involves more maintainership coverage and better support in CI, official releases, that sort of thing.

I'm doing this a bit out of order, but I should also mention Emscripten. I'll talk a little bit more about Emscripten shortly, but they've really paved the way with some tooling conventions, for dynamic linking and for static linking, that have made supporting native extensions in WASI even possible. So we're definitely standing on the shoulders of giants in building this stuff.

Some people might wonder: Emscripten's doing stuff, WASI's doing stuff; where's the overlap, and what are the different priorities and goals? The way I would sum it up is that for the Emscripten folks, it's really about targeting Wasm with existing software, fulfilling the expectations of existing software that was probably written not for WebAssembly originally but for native systems such as POSIX and Win32. WASI wants to do all that as well, but we also want to advance the ecosystem and create a better ecosystem. Brendan Burns talked about this 1970s legacy that we have with POSIX; we want to move beyond that. There are some definite warts in POSIX. It was designed for a different era, a different sort of software development environment. And of course there are advantages to both of these. The great thing with Emscripten is that they're laser-focused on a very specific goal, and they've been able to move very quickly with that, and thus pave the way for some of the work that I'm going to talk about today. There are also advantages to the WASI approach, of course: jettisoning some of the legacy baggage that we find in some of these native systems.
Another way of looking at it is that Emscripten is about the browser. It's focused on the browser and on running all types of software there, things from Figma to Photoshop, a lot of traditional desktop applications. So we want to give all kinds of software a home in the browser. With WASI, yes, we want to do that, and we want to run it everywhere else. That may prompt a question: doesn't traditional software already have a home? After all, it was written for these native systems, these desktop systems. And the answer, of course, is yes, it does, but that home needs a remodel. If POSIX were a house, I would say it's got this 1970s stained linoleum countertop and it's got carpet on the toilet seats, and we want to update it. It was a different era back then. These non-reentrant APIs, this global state that's hard to keep track of, errno, all these things: this is like carpet on a toilet seat.

All right, so let's talk about composition. WebAssembly has a few ways of composing code. The first one is static linking, and it's not really module composition at this point; this is before you even have a module. This is where you use your toolchain: if it's a compiled language, you're separately compiling a bunch of .o files and linking them together into a module. This is a foundational way of doing things. Even if you're using a high-level language, the interpreter or VM for that language is going to be built this way. The pro is that it's very widely supported. It's the original way to compose code, and it's very optimal: there's no indirection, and there are no global offset tables that you have to call through or use to reference other data. But there's a long list of cons as well. There's code duplication. It's harder to patch a binary: once you've linked in a library statically, you're not really going to tear that stuff apart again, and so auditing and addressing
security issues can be challenging. There's no sandboxing; it's all one shared memory space, one set of global variables. And polyglot composition is awkward. You're basically using C as the lingua franca, passing pointers around. Most languages can do this, but it's never fun to compose a polyglot app this way.

If you're familiar with native development, you've heard of shared libraries or DLLs, and there is an equivalent of such things in WebAssembly. It wasn't a part of WASI until very recently; that was part of the work that I had to do here. But Emscripten, again, paved the way for this. Some of the pros are less code duplication: now you can actually have multiple modules link against a single version of your Java interpreter, or your C library, or your OpenSSL. It matches the expectations of existing software, since this is a paradigm that's been around for native developers for decades. And it's also auditable and patchable. There are still some cons here. It's all shared memory still at runtime, so there's not really any isolation. It's hard to virtualize APIs, since this still uses the C API or ABI to pass pointers around and whatnot. So some of the really cool use cases, where you want to virtualize a file system or virtualize a random number generator for testing, become rather difficult. It's hard to insert layers between these; it can be done, but it's not easy. For polyglot composition, you're still using the C ABI, which is not great; it's hard to do high-level things. And then versioning: if you've been around in the Windows world, you know DLL hell is a thing. Trying to get the right version of the right DLL and upgrade them in place isn't easy.

And then the third way, which Luke went into in great detail yesterday, is component linking, where you have a shared-nothing boundary and you're passing data almost in an RPC style between components, but those components each have their own memory, their own global state,
and there are some big pros there. It's very virtualizable: it's easy to insert shims between a guest and a host. It's very polyglot-friendly, because each component can have its own runtime in its own sandbox. Versioning should be easier, especially with the component model, because we have these high-level functions that pass application-level types between components, so it's easy to audit and figure out whether two components are compatible with each other. And then sandboxing for security: isolating third-party dependencies in their own sandbox and not giving them access to sensitive data can be a real boon, and can help address the supply chain issues that have become top of mind for a lot of developers. But the con is that it's a new paradigm. It's not what developers are used to dealing with when composing software, and so it can be hard to port existing software. And since we're still building the foundations of this ecosystem, we don't have established package registries the same way we have PyPI or crates.io, etc.

So, seeing these pros and cons, the question that might come to mind is: which one should I use? It sounds like there's no clear winner. And I would say the winner is all of them. I borrowed this diagram from the shared-everything linking page in the component model repository on GitHub. What we see on the left is the static dependencies, where we have the application depending on a couple of libraries, one that does something with images using ImageMagick and another that handles zip files. Those in turn depend on some C libraries, each of which depends on the C standard library. But dynamically, we see that when these modules are actually instantiated within a component, we have these red lines around strictly enforced sandbox boundaries, with very clear interfaces between them. And we see each one has a libc instantiation, so for the actual memory, the malloc heap, etc., and the global variables, each
one gets their own version. The code on disk is all shared, but each one gets its own instantiation. And so here we get the best of all three worlds. We're using static linking within each of these modules, we're combining these modules within a component using shared-everything linking, or dynamic library linking, and then we're composing the components using shared-nothing linking. So as an application developer, you can isolate third-party dependencies, get the strengths of each of these composition models, and mitigate the drawbacks by choosing the right tool for the job.

Okay, so componentize-py. If I say so myself, I think it's a pretty straightforward flow of inputs and outputs. At a high level, you basically have to provide two things, maybe three if you have a native extension. In this example, we have a world that we want to target. Maybe it's a WASI world that's been standardized, or maybe I just want to target a particular platform and it's got a WIT file that I can target. And I've got my app.py that targets that world: it imports the interfaces that the host will provide and exports the functionality that the host will call. In this particular example, we also have a native extension that we know we're going to need at runtime, so we're going to use WASI SDK to compile that from C into a .so, which is really just a .wasm module with a special custom section that provides metadata about how to link this thing with other modules. Then we take all that input and give it to componentize-py, and it spits out a component that contains everything we need to run that module, including some stuff that we didn't provide that was actually built into componentize-py, such as the CPython interpreter, libc, and so on. We'll see that in detail in a moment, but hopefully it's pretty straightforward. As a user, this
is all you really need to know. You provide these things, put them into componentize-py, and you'll get out a component that you can then run on the host you were targeting with that world. If that's all you want to know and you just want to use it, you could tune out the rest of this presentation, because now we're going to dive deep into how it actually works behind the scenes. There are quite a few steps. It's a bit complicated as a matter of necessity, because of the legacy we have with Python, how packages are structured, and so on. We need to accommodate all that.

I'm going to split this up into roughly three stages, just because fitting it on one slide turned it into an eye exam, and I didn't want to subject you to that. The first stage is what I call pre-linking, and the goal here is to get all the modules that we're going to pack into a component ready and staged to be linked together. In the native extension case, some of those modules were provided by the application developer, either explicitly or, if they depended on a third-party package, because we found that .so in a wheel, which is Python's package format. But we also have a few other Wasm modules that, for convenience, are built into componentize-py, which it will inject into the produced component. That includes a runtime.so, which interfaces between the Python world and the WebAssembly code that we're going to produce here (I'll talk about that in a moment), and which uses PyO3, an excellent Rust library for interacting with CPython.
CPython itself is packaged as libpython3.11. We've got libc, which pretty much all these other modules will depend on; that includes the allocator and the interface to the host for things like file systems, environment variables, and so on. And then we also have this WASI Preview 1 adapter. Because the WASI SDK libc currently targets WASI Preview 1 and components need to target WASI Preview 2, this allows us to transparently adapt between the two.

We've also taken that WIT world and used a feature in componentize-py that generates bindings from it. On this particular slide we're just showing one part of those bindings, but there are two parts. One is Python code, which you're going to import into your guest application to access that WIT functionality and call those imported functions; if there are any exports, it will also provide an abstract base class that you can inherit from to implement the exported functions. The other part, which we see on this slide, is some WebAssembly code that we synthesize from scratch based on that WIT world, and that handles the canonical ABI translations between the Python world and the host, or WIT, world. As a refresher, the canonical ABI has concepts of integers, booleans, lists, strings, records, variants, all these high-level types, so we need to generate code that will handle that conversion. We could have generated Python code, but it turns out to be more efficient in the long run to generate WebAssembly code directly. It works in tandem with this runtime.so, which handles the bridge: on one end it creates the Python objects corresponding to the WIT objects, and vice versa, when lifting and lowering. That's the terminology for converting between low-level WebAssembly bytes in memory and high-level Python objects.

So we feed all that stuff above the wit
component prelink into this prelink step; those are all our input .so files internally. The prelink step then synthesizes two more WebAssembly modules. One is the main module, and this is the one that has a memory. All the other .so files, all the other WebAssembly modules, import a memory, so it's the main module that will export the memory, and that will be sized appropriately for the total amount of memory that the .so files above the line have declared they need in the metadata I mentioned earlier. And then finally we have an init.wasm, which is responsible for doing runtime fix-ups for the global offset tables. For global variables, sometimes you want to take the address of a global in the C code, and we need to fix up position-independent code so that it actually points to the right place at runtime. So that's what init does, plus a few other things that I won't go into.

So now we have all our core modules. All of these are core modules; there are no components in the picture yet, but that's what we'll get to in the next step, which is actually linking. We take all those .so and .wasm files, which are all core modules, and feed them into this wit-component link step. That's going to do things like topologically sort all these modules, so that the ones that have fewer dependencies come first, and the ones that depend on the earlier ones can then have their imports hooked up to the required exports. In some cases, though, we have cycles, so we have to break those cycles via an indirect function table, and that's part of what this step does. Finally we do the link, and the output of all this is what I'm calling the uninitialized component, which is not quite done yet. This component is ready to have Python code injected into it, because you'll notice we haven't actually used any of the Python code yet. So it
generates this Wasm file, and this file could actually be used at this stage if you provided a mount on the host file system so that it could find your Python code. But we want the interpreter, your application code, and all the dependencies to be bundled into a single component, and so that's the next step, which is this initialization, or pre-initialization, which, to be clear, all happens at build time. This is the point where our Python code comes in. That includes the Python code you provided, any dependencies, and a synthesized set of Python code. I called it world.py; it's actually multiple files, but let's just pretend it's a single file here. This is the second part of the binding generation process that I mentioned earlier, which generates the Python code that you're actually importing into your application. All that gets fed, along with our uninitialized component and the Python system libraries (all the batteries-included Python features like JSON, all that stuff that you expect to be built into Python), into a component init step.

You may have heard of Wizer; it's been mentioned in a few presentations so far. This is the component equivalent. Wizer only operates on modules currently. I've been talking with Nick Fitzgerald about maybe integrating this component support, but for now it's a separate project, and it does component pre-initialization. It takes all this stuff and essentially runs the Python interpreter at build time, running the top level of your application, so it's satisfying any imports, and then, once that init function returns, taking a snapshot of the linear memory that was produced and creating a new component that represents that snapshot. And that snapshot is ready to run, which is close to my heart at Fermyon for serverless applications. It's already resolved all the imports, it's already run some
code, possibly loaded stuff from the file system at build time, and it's ready to go and handle an incoming HTTP request or an incoming Kafka message or something like that in under a millisecond, because we've already warmed up the interpreter. It's ready to run. Before I move on, any questions about that? That was a lot of stuff, I know.

Okay, it's demo time. The demo I'm going to show you is based on a simple world. This was actually prompted by a gentleman on the Bytecode Alliance Zulip who had asked: what are my options for sandboxing untrusted Python code? Say I'm reading it off an HTTP request; it came from somewhere that I don't trust, but I want to run it in some safe environment. So I got to thinking: how could we use what was then an in-progress componentize-py with wasmtime-py, which is essentially a Python wrapper around the Wasmtime runtime, to actually do what this gentleman was asking for? What I came up with was what I felt was the simplest world that would express this. If you're familiar with Python, it has a distinction between statements and expressions. If you want to execute a statement, you have to use exec; if you want to evaluate an expression, you use eval; but they're otherwise kind of the same thing. So I thought, okay, we'll have functions for both, in case you want to do both. Each one really just takes in a string, which is some arbitrary Python code, your untrusted Python code. So we're going to run this in a sandbox, and we'll see in a moment that we can also restrict how long it gets to run, we can restrict how much memory it can use, and we can basically cut it off from all host access to the file system and so on. So that'll be fun.

So let's look at what this looks like at the guest level. This is importing sandbox, the code that was generated by componentize-py from that WIT file we just looked at, and then we're using that here. It
produced an abstract base class called Sandbox, with a capital S, and we're inheriting from that and overriding the abstract methods to implement the exports for this particular world. This particular example doesn't use any imports, I just realized, but suffice it to say that imports work fine too. I should've added a log statement or something. It does basically what you would expect: it's evaluating the code using the built-in eval or exec functions, and for expressions it's returning the result as a JSON string, which the host can then deserialize at its leisure. It also catches errors and can raise errors. There are some details there about how we translate Python exceptions into the WIT contract, which just wants strings, but those are simple details. Hopefully it's pretty straightforward.

Then we can look at the host, which is using wasmtime-py. wasmtime-py has an equivalent generator to generate the host view of that WIT world, in which the imports and the exports are sort of reversed in terms of who's calling what, but it will look very similar. There's more in here, because I've implemented a timeout, a memory limit, and so on. I'm not going to go through this line by line; we also have to parse command-line arguments and stuff like that, which is not super interesting. The one thing I will draw your attention to is that right now wasmtime-py doesn't support WASI Preview 2, even though Wasmtime itself does. There are technical reasons; you can ask me later why that is. So we have to provide our own stub implementation, and that's what we're doing here, most of which is just saying you can't do this: you can't access standard in or standard out, you can't access the file system, etc. And then finally, based on what was passed in, we run the guest, call the exec or eval functions, and print out the results
all in a very sandboxed, secure way.

So let's go ahead and run this. I'm going to peek at my cheat sheet here. I'll have a link to all of this in the presentation, so you can play with it yourself. I've already installed componentize-py and wasmtime. I also have a WASI build of NumPy. There's no official build yet, so I built one for your convenience. The demo does use NumPy behind the scenes, and we'll exercise that. So I did all that stuff, and now we can run the demo.

The first step is to run componentize-py. I'm telling it what directory to look in for the WIT files, and I'm telling it which WIT world I want to target, in case there are multiple worlds involved. The subcommand is called componentize, and then we tell it which module we want to be the entry point for this application, which is the guest code we just looked at, and that we want the output to go to sandbox.wasm. It takes a few moments. There we go.

The next step is to generate the bindings on the Wasmtime side. We're using wasmtime.bindgen, a submodule of the wasmtime-py project, and we're giving it not the WIT file but the sandbox.wasm we produced in the previous step. This will extract the component type and generate host Python bindings from it. It also does some other stuff that I won't go into, but it basically prepares this component for running in wasmtime-py. So we'll do that.

And then finally we can run this. host.py is what I just showed you, and we're going to give it a single expression which uses NumPy (and of course this is going to use the C extension capabilities that we saw in the diagram) to multiply a couple of matrices, which we just hard-coded, and convert the result from a NumPy array, which can't really be serialized to JSON, to a Python list. And there's the result. So that all ran in a sandbox.
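For readers following along without the slides, here is a minimal sketch of the guest-side logic described above: evaluate an expression with the built-in eval, return the result as a JSON string, and turn any Python exception into a plain string error. This is not the demo's actual source. The class below is a hypothetical stand-in that does not inherit from the base class componentize-py generates, and using RuntimeError to carry the error string is an assumption made only to keep the sketch self-contained.

```python
import json


class Sandbox:
    """Hypothetical stand-in for the guest class. In the real demo this
    would inherit from the abstract base class generated from the WIT world."""

    def eval(self, expression: str) -> str:
        # Inside a method body the bare name `eval` still refers to the
        # built-in, because class attributes are not in a method's scope.
        try:
            return json.dumps(eval(expression))
        except Exception as e:
            # Assumption: report failures as a plain string, since the WIT
            # contract described in the talk only carries strings.
            raise RuntimeError(f"{type(e).__name__}: {e}") from e

    def exec(self, statements: str) -> None:
        # Statements produce no value, so there is nothing to serialize.
        try:
            exec(statements)
        except Exception as e:
            raise RuntimeError(f"{type(e).__name__}: {e}") from e


guest = Sandbox()
print(guest.eval("[x * 2 for x in [1, 2, 3]]"))  # prints [2, 4, 6]
```

Note that nothing in this snippet is sandboxed on its own. In the demo, the isolation comes from running code like this inside a WebAssembly component under a host that withholds file system and standard I/O access.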
Another thing we can do is run something that deliberately triggers the timeout, the five-second timeout that we've enforced here. So it's an infinite loop. Well, this is untrusted code; it shouldn't be misbehaving like that. We'll let that run for five seconds, and eventually... it's a big ugly exception trace, ignore that, but it timed out. So we're secured that way.

Let me see if I can find it. Oh yeah, I've also got a little thing that tries to allocate more memory than we're going to give it access to. We limited it to 20 megabytes. So we're going to try to run that, and we get a memory error. I don't know why we don't get a stack trace there. I should have given a disclaimer earlier: I'm very new to Python, actually. I've done about... oh, stop? Okay, people are telling me to stop, so I've reached the end of this. Feel free to catch me in the hallway with any questions. Thank you very much.