All right, wow. It's amazing to have so many people here. I've been to a couple of these awesome days before, and the amount of people just keeps growing, which is really exciting. My name is Saul Cabrera, and I'm excited to be here in Amsterdam; it's actually my first time, and I'm excited to see many of you again. Thanks for having me. You can find me on the internet as @saulecabrera, or on my website, where there's more information about me.

I'm a staff engineer at Shopify, where I lead the Wasm Foundations team, which is the team responsible for our open source contributions to WebAssembly. We're focused on improving the ecosystem, both developer experience and performance, and this talk is mostly about performance. I think I'm mostly known for creating Javy, a JavaScript-to-WebAssembly toolchain, but I'd like to change that today by talking about a project I've been working on for the past couple of months: Winch.

Here is today's itinerary: we're going to learn what Winch is, why we decided to build it, the anatomy of Winch, the progress to date, and what's next. My real hope for this talk is to show you why and how you could use Winch in your application if you wanted to make it faster, in case that's something that interests you. I'm not going to go too deep into compilers, but I will give a really high-level view of what happens when you compile your WebAssembly into machine code.

So, Winch is a baseline, single-pass (or one-pass) compiler built exclusively for Wasmtime. I say this because Wasmtime already has a compiler, Cranelift, which is an optimizing compiler. That type of compiler works to make your code as efficient as possible at runtime, so it does a lot of work ahead of time, at startup, so that you get good runtime performance. But what the literature says about baseline
compilers, if you look that up on the internet, is that they are compilers that pass through their compilation unit's parts exactly once and immediately emit machine code. This distinction, passing through the compilation unit's parts exactly once and immediately emitting machine code, is really important, and we'll see why later. What this definition means for Winch is that if we have a source with multiple functions, which are going to be the compilation units, then as we go through the parts of each compilation unit, which are the instructions, we immediately create machine code for those instructions.

I've talked about what a baseline compiler is, but I haven't said what Winch stands for. The people who have worked with me on this project, mainly the Cranelift and Wasmtime members, can agree that coming up with a name for this project has been one of the most difficult parts, more difficult than the project itself, I would say. There was a lot of back and forth on how to name this, but essentially Winch stands for "WebAssembly INtentionally-Non-optimizing Compiler and Host". The emphasis on non-optimizing is really important, because Winch's objective is the contrary of Cranelift's: not being intelligent about the code that it produces, but being intelligent about producing that code in a fast way.

I like to draw an analogy between Winch and Cranelift with this statement: a winch is to a cable what Cranelift is to a crane boom, in the sense that you can get things done very fast with a cable and a winch, but if you want to do more difficult, higher-level things, you can rely on a crane to do them.

I want to give a special shout-out to Chris Fallin, Alex Crichton, and Nick Fitzgerald for their help in coming up with the name and all
the help they have given me to make Winch the project that it is today.

To wrap up this section, here's a TL;DR on Winch's objective: the idea of Winch is to significantly improve the startup time of your WebAssembly.

When we talk about compilers, the question that comes to everyone's mind is: why do we need another just-in-time compiler? It's a very valid question, and I think I've already hinted at the answer, but I want to make it explicit through a series of slides. The first point is that we want to optimize for cold startup time. When you look up the concept of a JIT on the internet, many of the statements that appear are about startup delay: the more optimizations a JIT performs, the better the code it will generate, but the initial delay will also increase. You get more startup time; your application takes longer to reach that first execution.

The idea of optimizing for startup time is not new. Browsers have been doing this for a long time; they have very specific tiers for very specific purposes. For example, browsers like Firefox have two tiers for WebAssembly, as far as I know: the first is a baseline tier, which optimizes for getting your application running as fast as possible, and then they have an optimizing tier, which is the tier that generates better code for your application.

In the case of Wasmtime and Cranelift, Chris Fallin, one of the maintainers of Cranelift, puts it way better and more succinctly in the latest Cranelift progress report, where he says that the baseline compiler sprang out of compile-time discussions in which we realized that, fundamentally, Cranelift's optimizing design, with its heavyweight register allocator and real SSA IR, is too slow for some applications.

Still, I think the question stands:
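Before answering that, it may help to make the single-pass idea from earlier concrete. Here is a minimal sketch in Rust of the "visit each instruction exactly once, emit machine code immediately" loop; the three-instruction toy ISA and the byte encodings are invented for illustration and are not Winch's actual internals.

```rust
// A toy single-pass "baseline compiler": one walk over the
// instructions, emitting (fake) machine bytes immediately.
// The instruction set and opcodes are invented for illustration.
#[derive(Debug)]
enum Inst {
    I32Const(i32), // push a constant
    I32Add,        // pop two values, push their sum
    End,           // end of function
}

fn compile(insts: &[Inst]) -> Vec<u8> {
    let mut code = Vec::new();
    for inst in insts {
        // Exactly one visit per instruction: no IR, no second pass.
        match inst {
            Inst::I32Const(v) => {
                code.push(0x01); // fake "push immediate" opcode
                code.extend_from_slice(&v.to_le_bytes());
            }
            Inst::I32Add => code.push(0x02), // fake "add" opcode
            Inst::End => code.push(0xFF),    // fake "return" opcode
        }
    }
    code
}

fn main() {
    let code = compile(&[Inst::I32Const(1), Inst::I32Add, Inst::End]);
    println!("{:02x?}", code);
}
```

The key property is what is absent: there is no intermediate representation to build, optimize, lower, or register-allocate, which is where an optimizing compiler spends its startup time.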
Do I need a baseline compiler, or do I need a new JIT? And I think the answer is: it depends. Radu and Matt from Fermyon have written an article about six ways of optimizing WebAssembly, and one of the ways they explain is to ahead-of-time compile your code. The general gist of the idea is that you take some WebAssembly, ahead-of-time compile it to your target architecture, and then load that into your runtime, effectively skipping compilation when you load your code and start executing it. That way you optimize startup time. So one could say, well, that's easy, right? Why don't you just use that instead? Again, it depends; you could, depending on your architecture and your needs, but in Shopify's case we have identified several downsides to this approach.

The first is dealing with machine code. Dealing with machine code is inherently unsafe; you have to treat it with care. It's also important to note that when you compile your WebAssembly to machine code, you get a three to four times size increase, which makes your application or your code more difficult to transfer and more difficult to cache.

The second con that's important to take into account is the maintenance of machine code. Let's assume you have a wide range of modules that you have already compiled and stored. What happens if your runtime doesn't
maintain compatibility with those older versions of machine code in its new releases? It means you potentially have to recompile everything, and that's introducing another process into your architecture and more overhead.

We're now getting to the part of the talk where we look a bit more at what happens when you compile your WebAssembly into machine code. Before getting into how Winch does these things, it's important to look at how things work right now in Wasmtime when you create a new module. This section is a Module::new 101 on how these things are done at the Wasmtime level today.

Let's assume we have a WebAssembly module with multiple functions. What's going to happen is that Wasmtime delegates all of those functions, in parallel, to Cranelift, calling compile_function for each of them. The important piece happens inside the compile_function box, so I'm going to highlight the intermediate representations and the steps that transform a WebAssembly function into its particular machine code function.

First we have a WebAssembly function, and from it we create the Cranelift intermediate representation. Compilers use intermediate representations to be able to transform and optimize code: an intermediate representation is just a data structure that represents your code in some way, and these IRs have a specific shape that allows for manipulation.

The next step is to perform middle-end optimizations. A good example of a middle-end optimization is dead code elimination: with the Cranelift IR, Cranelift can say, well, this piece of code is not reachable, I'm just going to get rid of it, and as long as the program keeps its functionality, that code is now optimized away.

Then we perform a process called lowering, in which you go from the machine-independent, high-level IR to a more machine-dependent IR called VCode, which stands for virtual-register code. This IR is much more dependent on the machine code we're going to generate, so we need one of these per target architecture.

After we have the virtual-register code, we perform register allocation, which is the problem of assigning values to the physical registers of your machine. This is crucial, because the compiler applies a lot of heuristics to try to fit those values into the available registers. There are many trade-offs in this process, and that's why, in the Cranelift report I showed earlier, Chris says this part is really heavyweight.

Once we have allocated all our registers, we can perform code generation, which is emitting the instructions in their binary format. After all that, Wasmtime takes all the compiled functions that were sent to Cranelift in parallel and creates an object file, which is the thing that is given back when you create a new module, or when you ahead-of-time compile a module to machine code.

Now it's time to present how Winch does things. There isn't a lot of difference in the first and last steps, in the sense that we compile the functions in parallel and create an object file at the end; the main difference is how we compile each function into machine code. Before getting into that, I'm going to present how Winch is structured, the individual pieces it has to make that compilation possible.

First, Winch relies on a crate called wasmparser, which is the crate responsible for validating and parsing WebAssembly. Then we have a very naive register allocator. The guarantee this register allocator gives us is that whenever you
request a register, it's going to give one back to you. How does it do that? It just spills everything onto the stack in order to free all the registers that have been used, and hands one back for that particular operation.

Then we have a shadow stack, which is a data structure that mimics the WebAssembly value stack and serves to keep track of all the values that are live at a particular point in the program.

Last but not least, we have a macro assembler, which is an implementation of instruction emission per target that we support. In Winch's case that's x86_64 and aarch64, built on top of Cranelift's x86_64 and aarch64 code-generation libraries, which are in charge of encoding the binary representation of all the instructions on those targets.

So now we get to the compile_function piece of what happens in Winch, and here it is just a one-step process: we take the Wasm, we register-allocate, and we generate code in the same step, and out comes the machine code.

Now, someone could ask: can I get the best of both worlds? I guess it depends on what you're trying to do, but speaking solely from our experience at Shopify, we think it's possible if you have a model like this. Say you have a functions-as-a-service application: you get a request, you're caching your pre-compiled modules in a key-value store or something, and you get a cache miss. You get a cache miss, but you want to serve your application as fast as possible. You can compile with Winch, which gives you faster startup, and at the same time create a background job that compiles with Cranelift, so that on your next request, when you get a cache hit, you can serve that request fast and also have good runtime performance.
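To make the anatomy above a bit more concrete, here is a toy Rust model of the naive "spill everything" register allocator together with the shadow stack; every name and shape here is a hypothetical stand-in for illustration, not Winch's real types.

```rust
// Toy model: a shadow stack mirroring the Wasm value stack, plus a
// naive allocator whose guarantee is that a register request always
// succeeds, because it can spill every stacked value to free them all.
// All names are invented for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Reg(usize),  // value currently lives in register N
    Slot(usize), // value was spilled to stack slot N
}

struct Masm {
    free_regs: Vec<usize>, // registers currently available
    shadow: Vec<Value>,    // mirrors the Wasm value stack
    next_slot: usize,
}

impl Masm {
    fn new(num_regs: usize) -> Self {
        Masm { free_regs: (0..num_regs).collect(), shadow: Vec::new(), next_slot: 0 }
    }

    // If no register is free, spill every live value to a stack slot
    // (a real compiler would emit store instructions here), which
    // frees all registers at once, then hand one back.
    fn request_reg(&mut self) -> usize {
        if self.free_regs.is_empty() {
            for v in self.shadow.iter_mut() {
                if let Value::Reg(r) = *v {
                    *v = Value::Slot(self.next_slot);
                    self.next_slot += 1;
                    self.free_regs.push(r);
                }
            }
        }
        self.free_regs.pop().expect("spilling freed at least one register")
    }

    fn push(&mut self, v: Value) {
        self.shadow.push(v);
    }
}

fn main() {
    let mut masm = Masm::new(2);
    let r0 = masm.request_reg();
    masm.push(Value::Reg(r0));
    let r1 = masm.request_reg();
    masm.push(Value::Reg(r1));
    // Both registers are live, so the next request forces a spill.
    let r2 = masm.request_reg();
    println!("got register {:?}, shadow stack = {:?}", r2, masm.shadow);
}
```

The trade-off is exactly the one described in the talk: this allocation policy takes almost no compile time, at the cost of extra loads and stores in the generated code.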
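The cache-miss model just described can be sketched roughly like this; the cache, the two compiler stand-ins, and the module name are all hypothetical placeholders, and real code would invoke Wasmtime's Winch and Cranelift backends rather than these string functions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-ins for the two tiers; in reality these would be Winch and
// Cranelift producing machine code from the same Wasm module.
fn compile_baseline(wasm: &str) -> String {
    format!("baseline({wasm})") // fast to produce, slower to run
}
fn compile_optimized(wasm: &str) -> String {
    format!("optimized({wasm})") // slow to produce, fast to run
}

fn handle_request(cache: &Arc<Mutex<HashMap<String, String>>>, wasm: &str) -> String {
    if let Some(hit) = cache.lock().unwrap().get(wasm) {
        return hit.clone(); // cache hit: serve the optimized build
    }
    // Cache miss: answer quickly with the baseline tier...
    let quick = compile_baseline(wasm);
    // ...and kick off the optimizing tier for future requests.
    let cache = Arc::clone(cache);
    let wasm = wasm.to_string();
    let bg = thread::spawn(move || {
        let optimized = compile_optimized(&wasm);
        cache.lock().unwrap().insert(wasm, optimized);
    });
    bg.join().unwrap(); // joined here only so the demo is deterministic
    quick
}

fn main() {
    let cache = Arc::new(Mutex::new(HashMap::new()));
    let first = handle_request(&cache, "checkout.wasm");
    let second = handle_request(&cache, "checkout.wasm");
    println!("first: {first}, second: {second}");
}
```

The first request is served from the fast baseline tier; by the second request the optimized artifact is in the cache, so you pay the Cranelift compile time off the critical path.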
You could do this while there is no automatic tiering in Wasmtime, which is a thing we haven't planned yet. The idea of automatic tiering is that all of this would be done by the runtime itself, which is what happens in browsers today.

Other nice properties of Winch's simplicity: it's potentially easier to integrate Wasm debugging. All the transformations that happen in Cranelift make it really difficult to see where a given piece of machine code came from, to track it back to the original source code, which makes debugging harder. It's also easier to iterate on WebAssembly proposals: Winch is very simple, so supporting a new proposal is faster than doing all the necessary steps in Cranelift to support it.

Now I want to highlight the progress to date. This project has required a lot of design thought and research, and there are several milestones worth highlighting in this talk. The first is that the integration between Wasmtime and Winch landed as of last week, so you'll be able to use Winch in Wasmtime version 9 under a feature flag, if that's something you want to try out. Just keep in mind that this project is still very early and the set of supported instructions is just getting there.

That takes me to the next thing I wanted to highlight, which is core Wasm support. Before supporting all the instructions, we needed to make sure the compiler foundations were solid enough, and I think they are right now, which is giving us more and more traction to support core WebAssembly's instructions. We're giving priority to x86_64, because that's where most of the code runs, but we'd like to give the same priority to the aarch64 backend too, because, you know, some people use M1 machines to develop.

So, what's next?
So I think the main thing, the top priority, is to finalize support for core WebAssembly. Ideally, we want Winch to support everything Cranelift supports, so that switching between the two compilers can be seamless. We also want to invest in fuzzing while finalizing core Wasm support. It's important to mention that Wasmtime is very well fuzzed; it has fuzzing targets for all of its features, and we'd like to do the same with Winch, given that it's integrated tightly with Wasmtime. And then we'd like to have a concrete benchmark suite, which we don't have right now.

I cannot finish this talk without giving a performance sneak peek on how things are going. When I started this project, I did some research on what the gains of using a baseline compiler versus Cranelift would be. I took two other production-ready baseline compilers: the SpiderMonkey baseline compiler, which is the one used in Firefox, and Liftoff, which is V8's baseline compiler for Wasm. The estimate I arrived at is that by using a baseline compiler we can get 10 to 15 times better compilation times than with an optimizing compiler. So, where are we with Winch?
Right now we are at the point where functions that don't require any optimization by Cranelift already compile twice as fast as with Cranelift. As we add support for more Wasm instructions, we're hoping to get to where the average program compiles 10 times faster, and ideally, once we support all of Wasm's instruction set, we'd like to get to where more complex programs, like the games Bailey was showing earlier, see even bigger startup-time benefits, for example 15 times.

I guess that's mostly it for the talk. If you want to contribute to Winch, just join the conversation: we meet bi-weekly, and there is a repository under the Bytecode Alliance called bytecodealliance/meetings with a winch subdirectory, which has the instructions on how to join if you're interested. So yeah, that's it for me. Thank you very much.

[Host] If one of the developers wouldn't mind linking that repository in the WebAssembly chat on the CNCF Slack, I know everybody here would appreciate it. That was a phenomenal talk. I've got a couple of questions, but first let's throw it out to the audience. Does anyone have any questions? Please introduce yourself with your question.

[Audience] Hi, my name is Rasmus. My main question is: why a compiler versus just a straight-up interpreter for this?

[Saul] Yeah, that's a good question, and it's something we discussed with the Wasmtime and Cranelift teams. An interpreter is not out of scope, but the baseline compiler is the one with the better trade-off between startup time and runtime. For Shopify's case at least, interpreting was going to give us probably the same or better startup time, but worse runtime, which is something we wanted to avoid.

Other questions?

[Audience] What about the benchmarks? What kind of benchmarks will you use to gauge the performance?
You mentioned that they're coming in the future. Do you have a timeline for that, and how will that be communicated?

[Saul] The benchmarks, or what kind of benchmarks?

[Audience] Yeah, and when; you know, what's the timeline and how will that be communicated?

[Saul] So, for the benchmarks: under the Bytecode Alliance we have a set of benchmarks that are used, which are programs that have a wide range of variation, so I think we're going to just use those. As for when: once we get into supporting more of the core Wasm instructions. I think that will be the timeline, which should be later this year; around Q3 we might see some more complete benchmarks. Does that answer your question?

[Host] Bruce will follow up offline with that question; I know we've got a tight schedule to keep. I've got a quick question for you, and this may be a naive one, but are there any use cases here for streaming the payloads in, or partial payloads, with compilation? Do you have to have everything already downloaded to compile, or can you start execution with, you know, half the payload downloaded?

[Saul] Oh, like doing something like streaming compilation? I think that should be possible, but there are some other crates, some other pieces, that we might need to modify for this to work, for example wasmparser: we would need to be able to parse while we stream, and then start compiling each of the functions as they come in. But it's not something we have looked into as part of this project.

[Host] Okay, this is really incredible work, and I'm excited to see where it goes. Thank you so much for all the hard work and the great talk today.

[Saul] Thank you.