Hello, Detroit. Welcome to my prerecorded talk about how I ported Python to WebAssembly. It's an honor to present this topic to you today. Thank you very much, Liam, for inviting me and giving me the chance to talk about Python and WebAssembly.
Hi, I'm Christian Heimes. I'm a Python core developer and principal software engineer at Red Hat. In the past, I usually worked on security-related topics for Python, but a year ago WebAssembly really piqued my interest, and I started to dig into it. A year ago, Python 3.10 did not have any upstream support for WebAssembly platforms at all. This year, with the upcoming release of Python 3.11, we have official support for the WebAssembly platforms Emscripten and WASI, and we even have most of our tests running on these platforms, except for some features that are just not supported.
This talk is divided into three chapters. In the first chapter, I'm going to talk about Emscripten support, followed by how WebAssembly System Interface (WASI) support was implemented, and finally some words on how I think the future of Python and WebAssembly will look and some of the problems we're still facing with Emscripten and WASI.
A big shout-out to the many people who contributed to this effort to get Python working on WebAssembly. It wouldn't have been possible without several core developers helping out, along with people from the Emscripten community, the Pyodide team, and people from the Bytecode Alliance who helped with WASI and wasmtime problems.
So, first part: Emscripten. If you're not familiar with Emscripten, it's a toolchain and SDK to compile C and C++ code to WebAssembly, targeting browsers or Node.js. There's usually some kind of JavaScript glue layer that provides a syscall layer, a virtual file system, and other features. The base implementation of Python just kind of works with Emscripten, because we have a very mature and portable code base.
We use C11 without many fancy features because we target a variety of compilers. We use Autoconf and Makefiles, and we have an extensive test suite. But some limitations of the platforms, and some problems in our build system, made it a bit harder. For example, we require pthreads, which are mostly not available: not in WASI at all, and in Emscripten only under some circumstances. Our cross-compiling system was broken at the point I started looking into it. And the way we bootstrap Python, where Python is needed to compile Python, makes it a bit tricky to get Python working.
But with some effort, and with help from Ethan Smith, last year in November 2021 we got the main branch of Python compiled to WebAssembly and running in Emscripten's default HTML interface in the browser. This required lots of hackery, tooling, scripts, and shell code just to get it bootstrapped. While Ethan already had that working, I was still facing some problems with my own attempt. You see here a screenshot of a web console that shows a fatal Python error: it can't import the encodings package, which is the first Python code that gets imported by Python. That was very exciting for me. It was the first time I got Python working in the browser, at least to the point where it needed the standard library. Ethan, meanwhile, had already figured out how to bundle the standard library, package it as a virtual file system, and distribute that to the browser.
So, the build system of Python. As I mentioned before, we use Autoconf and configure, which detects things like the compiler, available header files, available functions, and other properties of the system. Once we have these things detected, we compile a very minimal core to bootstrap the import system. importlib is Python's import system, the machinery behind the import statement, but it's written in Python, and we don't have anything to import the import system.
So we compile the Python code to bytecode, inject the bytecode into a header file, and then just load the bytecode from C code. As the next step, once we have the import system working, we can create a bootstrap interpreter that can do a bit more and can actually run Python code. We use that to deep-freeze the standard library. That's a new feature introduced in 3.11 by the Faster CPython team, which makes Python start faster by having common packages and modules frozen into the C code. Once we have that done, we can compile the actual Python interpreter with the static built-in modules, and then use the Python interpreter to compile the shared extension modules.
To get WebAssembly support and to get cross-compiling working again, I had to change several aspects. For example, I introduced new WebAssembly cross-compiling targets, so we have wasm32-emscripten and wasm32-wasi. For Emscripten, I also introduced the idea of different targets, or flavors: on the one hand we want to target browsers, but for running the tests we also wanted raw file system access from Node. So there are different ways and targets to compile Python. I also added an option to specify and use a build Python interpreter. In order to cross-compile, we first have to compile a Python interpreter for the current platform and then use that to bootstrap the freezing of importlib and the deep-frozen files.
Also, running setup.py and detecting features is kind of awkward and doesn't work properly. So we replaced setup.py with additional checks in configure and moved the building of shared extension modules into the Makefile. This is not completely done in 3.11, but will probably be done for 3.12, because we also want to get rid of setup.py and distutils in the core. There are also improvements for the config cache, just to speed up development, and some fixes for VPATH and out-of-tree builds.
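The freezing step described above (compile Python source to bytecode, embed the bytes in a header, load them from C) can be illustrated with a small Python sketch. This is not CPython's actual freeze tooling; the function names, the `frozen_demo` array name, and the demo source are hypothetical, and the real build emits C that the interpreter loads at startup rather than calling `exec` from Python.

```python
import marshal


def freeze_source(source: str, module_name: str) -> bytes:
    """Compile Python source to a code object and serialize it to bytes."""
    code = compile(source, f"<frozen {module_name}>", "exec")
    return marshal.dumps(code)


def render_c_header(data: bytes, array_name: str) -> str:
    """Render serialized bytecode as a C array definition for a header file."""
    lines = [f"static const unsigned char {array_name}[] = {{"]
    for offset in range(0, len(data), 12):
        row = ", ".join(str(byte) for byte in data[offset:offset + 12])
        lines.append(f"    {row},")
    lines.append("};")
    return "\n".join(lines)


def load_frozen(data: bytes) -> dict:
    """Run frozen bytecode without using the import system or the file system."""
    namespace: dict = {}
    exec(marshal.loads(data), namespace)
    return namespace


# Hypothetical module source, used only for this demonstration.
SOURCE = "def greet():\n    return 'hello from frozen code'\n"

frozen = freeze_source(SOURCE, "demo")
header = render_c_header(frozen, "frozen_demo")  # hypothetical array name
module_namespace = load_frozen(frozen)
```

The point of the sketch is that the bytes in `frozen` are enough to get running code without any importer, which is why the import system itself can be bootstrapped this way. Note that marshal data is specific to the Python version that produced it, which is one reason cross-compiling needs a matching build interpreter.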
With that, we can use the same checkout to build both the build Python interpreter and the final WebAssembly builds.
While we were working on that, we were also collaborating with the Pyodide team. Pyodide is an existing, working, stable distribution of Python for the web browser. They have everything figured out, but they also had to carry and maintain lots of patches. They had something like a dozen downstream patches to work around the same problems we were having. Slowly, we migrated and upstreamed several of these patches into CPython or replaced them with better approaches, like using an Autoconf feature called config.site instead of hackish pyconfig.h patches.
Next, when we had everything more or less working, we wanted to make sure that our tests were passing. For that, I used Node with raw file system access, because I couldn't figure out how to run the tests in the browser, and Node was more akin to how we normally run things on the command line. So for each failing or crashing test, and there were failing and crashing tests, I had to check: is this caused by a missing or unsupported API? Is there some wrong assumption in the test? Maybe a bug in Python, maybe a bug in Emscripten? There were even some known bugs in musl libc.
Some of the unsupported or unavailable features are things like starting a new process. Browsers are sandboxed, and you can't just run a new process from a browser; that would be insecure. So for all tests that use fork, exec, or other process-related features, we added markers and decorators to skip them. But we also used subprocesses in our internal test system: Python's regression test suite was using subprocess for isolation and for running tests in parallel. For that, Ethan implemented a host runner, so we can use the build Python interpreter to drive the test system, which then spawns a new Node.js process for every test case.
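A skip decorator of the kind described above might look like the following minimal sketch. It assumes that `sys.platform` reports `"emscripten"` or `"wasi"` on these builds, as it does in Python 3.11; the helper names and the example test case are illustrative and not necessarily the ones used in CPython's test suite.

```python
import sys
import unittest

# Platforms on which new processes cannot be started.
NO_SUBPROCESS_PLATFORMS = {"emscripten", "wasi"}


def has_subprocess_support(platform: str = sys.platform) -> bool:
    """Return True if the given platform can spawn child processes."""
    return platform not in NO_SUBPROCESS_PLATFORMS


def requires_subprocess(platform: str = sys.platform):
    """Decorator that skips a test when processes cannot be spawned."""
    return unittest.skipUnless(
        has_subprocess_support(platform), "requires subprocess support"
    )


class SpawnTests(unittest.TestCase):
    @requires_subprocess()
    def test_child_process(self):
        # Imported lazily so that merely loading the test module never
        # depends on process support.
        import subprocess

        result = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            capture_output=True,
            text=True,
            check=True,
        )
        self.assertEqual(result.stdout.strip(), "ok")
```

Marking tests this way keeps the reason for the skip visible in the test report, which makes it easier to tell platform limitations apart from real failures.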
Pthreads are not supported unless you build with pthread support, so we needed to add decorators and skip tests. Sockets, signals, and other APIs just weren't available, because they're not implemented in the browser for good reasons. There were also some invalid assumptions that no longer hold true on Emscripten. For example, tests assumed that if they're running with effective user ID 0, so root, they also have CAP_SYS_ADMIN, that is, additional privileges to do things that a normal user can't do on the system. Or that we can read argv[0], so read the WebAssembly file itself; it is not accessible. Non-UTF-8 file names and inaccessible files are also not supported by Emscripten, and neither is calling fstat on pipe file descriptors.
So, the interesting bugs. The most annoying bug was a runtime error from Node complaining about a function signature mismatch while the interpreter was shutting down, so in Py_FinalizeEx. This happened because one of the modules had an invalid function signature: it didn't accept one of the arguments. By default, when a module is freed, the free function gets a reference to the module so it can do any additional cleanup. But the zoneinfo implementation accepted no argument, so just void. This is normally not a problem for ABIs like those on Linux or Windows, because missing or additional arguments are just passed in registers or on the stack and ignored by the function. But WebAssembly has much stricter calling rules: the call_indirect instruction has strict checks, and if the function signature mismatches, it fails and crashes.
I also ran into interesting bugs in Emscripten. The way to fix or address these bugs was interesting too, because I had to rewrite tests or implement the reproducer in C code, and the fixes usually involved modifying a JavaScript layer that looks a bit like a syscall interceptor in kernel code.
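The signature mismatch described above can be made concrete with a toy model. The sketch below is not WebAssembly; it is a small Python simulation, with hypothetical class and function names, of the rule that an indirect call traps when the type recorded for the table entry differs from the type the call site expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncType:
    """A function signature: parameter types and result types."""
    params: tuple
    results: tuple = ()


class IndirectCallTrap(RuntimeError):
    """Raised when an indirect call fails its signature check."""


class FunctionTable:
    """Toy model of a function table with type-checked indirect calls."""

    def __init__(self):
        self._entries = []

    def add(self, func_type, func):
        """Register a function with its declared type; return its index."""
        self._entries.append((func_type, func))
        return len(self._entries) - 1

    def call_indirect(self, index, expected, *args):
        """Call entry `index`, trapping if its type is not `expected`."""
        actual, func = self._entries[index]
        if actual != expected:
            raise IndirectCallTrap(
                f"function signature mismatch: expected {expected}, got {actual}"
            )
        return func(*args)


def free_without_argument():
    """Stands in for a free function declared as taking no arguments."""
    return None


table = FunctionTable()
bad_slot = table.add(FuncType(params=()), free_without_argument)
good_slot = table.add(FuncType(params=("i32",)), lambda module_ptr: None)

# The caller always passes one pointer-sized argument to the free function.
FREE_FUNC = FuncType(params=("i32",))
```

In this model, calling `good_slot` with `FREE_FUNC` succeeds, while calling `bad_slot` raises `IndirectCallTrap` even though the callee would have been happy to ignore the extra argument. That is the difference from native ABIs, where the mismatch goes unnoticed.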
So here's a list of all the bugs I found. I'll put the slides up, so you can follow up on any of them yourself if you're interested. I also contributed several improvements to Emscripten, like an SQLite port for the Emscripten build environment and several fixes for wasm64. Since the beginning of this month, September, we have all Python tests passing on wasm64 too, plus several other fixes. And by April 2022, we finally got all tests passing. As you see here, we're still skipping about 92 tests. These are mostly network-related tests, because networking doesn't work here like in normal environments: browsers are tightly secured and sandboxed, so you can't open raw TCP sockets. And this is unmodified CPython 3.11 source code, so very cool.
At this point, Brett and I were considering moving WebAssembly on Emscripten to tier 3. This is a new concept we also introduced this year: different levels of support for environments like CPU architectures and operating systems. It's a bit like the Rust concept of tier 1, 2, and 3 target platforms. For that, we need to have a stable buildbot; that's our CI system. Microsoft and Ralph, thank you for that, contributed a virtual machine where I installed buildbot and the SDKs. The Steering Council also wanted end-user documentation, to explain to end users which features are missing on Emscripten. I wrote core developer documentation to give core developers more details on how the platform works and how they can compile and debug problems. I also set up things like container images with the SDKs, and another host, hosted by Microsoft, with preinstalled SDKs just to test things.
Finally, there's a new automation script that makes it very, very simple to compile Python. It is in main and will be in 3.11.1; it's not going to be in 3.11.0. If you run this script, it will first tell you how to install the SDKs, give you some tips, and point you to the right documentation.
Then, if everything is installed, it will configure and compile a build of Python for the browser, run a local web server, open the right files in your browser, and just give you a prompt. This is also implemented for other targets, like Emscripten with Node, or WASI. I currently use it on my own to run unofficial builds and also smoke tests against the latest version of the Emscripten SDK and upstream tip-of-tree, and for WASI with different versions of wasmtime.
So, speaking of WASI, the WebAssembly System Interface: after I had Emscripten working, that helped me look into WASI. It turned out that most of my work for Emscripten had already prepared the Python code, but there were still several other things missing. Lots of features from the socket API weren't implemented. There are no pthreads and no pthread stubs, whereas Emscripten provides stubs for the pthread APIs. There's no user API, no dup syscall, and some other restrictions. Thankfully, SingleStore Labs has written an additional library layer called wasix, which provides stubs for most of these APIs. And like with Emscripten, I also ran into problems and bugs with the WASI SDK and wasmtime during development. Here are some links you can follow up on.
After I started to understand how WASI works and how wasix works, I came to the conclusion that wasix helped us a lot to bootstrap and start the WASI support, but it had some downsides. All the missing functions were replaced by stubs that return an error. Python developers would usually check whether a function exists and then do something. So functions that are present but error out are not very user friendly. So I wrote my own threading stubs and replaced all the missing functions with additional Autoconf checks and #ifdefs. These are the BSD socket APIs, the network database (netdb) APIs like host lookups, waitpid, dup, and missing constants.
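The "check whether a function exists, then act" habit mentioned above looks roughly like this in Python code. This is a minimal sketch with hypothetical helper names. It relies on the fact that functions CPython could not find at build time are simply absent from the `os` module, so `hasattr` and `getattr` are reliable probes, whereas a stub that exists but always fails would defeat them.

```python
import os


def available_os_features(names):
    """Map each name to whether the os module provides it on this platform."""
    return {name: hasattr(os, name) for name in names}


def call_if_available(name, *args, default=None):
    """Call os.<name>(*args) if it exists; otherwise return `default`."""
    func = getattr(os, name, None)
    if func is None:
        return default
    return func(*args)


# Probe a few of the APIs named in the talk; results depend on the platform.
features = available_os_features(["fork", "waitpid", "dup"])
```

With stubs that always return an error, `features` would report everything as available and the failure would only surface later at call time, which is the usability problem the Autoconf checks avoid.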
dup was especially tricky, because we required dup in one place, in the Python parser for error reporting. But I was able to replace that with another function called fopencookie. We only used dup because we convert a file descriptor to a file handle, and when we close the file handle, we don't want to close the file descriptor.
So, from __future__ import wasm: what does the future hold for Python on WebAssembly? There are different things going on. The first person who actually adopted my CPython upstream work was Trey. He's running a pastebin where you can just copy and paste Python code and then have the code run on your system. That's cool, and it is something we want to include in the CPython documentation in the future, so that we have executable documentation examples in the Python world.
But what else? How things are currently structured, and how I envision they will work in the future, is this: on the very low level, we will have basic support for WASI and Emscripten in CPython. But since we lack a good understanding and also the resources in the core developer team, lots of additional work will be provided by Pyodide. Pyodide will have the installer, will provide packages, and will have a glue layer to the JavaScript API. I won't talk about that here. On top of that, there will be efforts like running Jupyter notebooks, PyScript, pygame, NumPy, and other things.
For WASI, we are not sure yet, because it's very new and there is no production use of Python on WASI. There are still lots of things to do. For example, we have not figured out how to do proper deployment and distribution of Python on WASI. I've been playing around with Wizer and wasi-vfs to create single-file distributions. Socket support does not work correctly yet. I'm also facing some issues with wasmtime, and I have not had the time to test WebAssembly runtimes other than wasmtime. And we're also missing popular Python extensions that rely on third-party libraries.
One other thing: I hope that WebAssembly may help us in the future to get out of the Python binary extension hell. If you have Python extensions with binary code, like C code, Fortran code, C++ code, whatever, they have to be compiled for a specific combination of CPU, platform, and Python version. But maybe in the future we could just compile them to WebAssembly, have a WebAssembly runtime embedded in Python as an extension, and then have only one file to distribute.
There are things from Emscripten and browsers that would help us run Python on these platforms. For example, Emscripten doesn't have a stable ABI. When Pyodide distributes Python and extensions, they have to make sure that every extension is compiled with the same version; otherwise it will just crash or fail. Debugging is also tricky, especially for core developers, who are C hackers and use GDB or other low-level debuggers. That doesn't work well at the moment. The C/C++ DevTools support in Chrome doesn't give the same experience and the same readable information. Compile-time checks for invalid function pointer casts or function signature mismatches would also help us a lot. There's some effort going on in Chrome to at least improve error reporting here.
For the WASI SDK, it's the same problem. There is an option in wasmtime to provide DWARF debug symbols, but that feature is currently also broken. Dynamic linking would help us a lot to distribute and add more Python extensions. Some tooling like embuilder or the Emscripten ports system, to create distributions for libraries that extensions need, like zlib and bzip2, would help us too. Maybe having stubs for pthreads and other missing features in the WASI SDK would be helpful.
But it's not just that: we also want to help the community. I think one thing Python could help you with really well is serving as a smoke test, because Python is very, very good at finding compiler bugs, even kernel crashes and kernel bugs. Most recently, we found a bug in QEMU on ppc64.
That could help you. In general, I think that Python on WebAssembly will provide lots of opportunities, but there's still lots of work to do. And Python loves WebAssembly. Thank you very much for listening, and I hope to see you in the chat. I hope that I'll be able to attend the talk remotely and answer your questions there.