So, hi, I'm David, aka Yorick, and I'm going to talk about the JavaScript Binary AST, BinAST. The previous talk was about furniture, the one before that about components. I haven't heard any talk this weekend about startup performance, and the sad fact about startup performance is that we are not very good at it. I just realized that I forgot to give the credits, so: this is joint work between Mozilla, both the company and the community, with the help of Facebook, Bloomberg, and Cloudflare. I am at Mozilla, part of the JavaScript team.

So I mentioned startup. We as a web developer community are not very good at that. How many people in this room know how long it takes for their application to start? A few. Good. How many fit within three seconds? Great, I see two of them over there.

A few years ago, Google, both as DoubleClick and as actual Google, ran tests on the web to find out how fast web applications start. DoubleClick is an ad agency, so they have all sorts of interesting numbers. In particular, they know whether you're looking at a page or not. That doesn't sound good, but it means they have numbers. If your page takes more than three seconds to load, you have already lost 53% of your visitors, at least on mobile. And the Google web performance team ran tests too; these are median values. On desktop, at least half of the applications took at least eight seconds to start. On mobile, it was 16 seconds. That's not good. That's not good for our users, and that's not good for our applications.

So the question is, what can we do about that? The problem is code. Well, okay, we are in the dev room, so that's a good thing: we know a few things about code. But the problem, in this case, is not code itself. It's that we have lots of code.
So maybe the people who know exactly how many seconds it takes for their application to start know how much code they have; I personally generally don't have a clue. When you create an application, you typically pull in lots of dependencies. If it's a simple application, you can probably get away without dependencies, but most applications tend to pull in lots of them. And lots of dependencies have a cost, even if you don't use them. There is a cost just to the code itself, not its execution: just downloading and preparing the code.

Part of this cost is parsing. Parsing is when the computer reads the source code and tries to turn it into something that it is eventually going to be able to execute. These are numbers from the same study, of how long it takes to parse one megabyte of JavaScript code. On a MacBook Pro, that was just a few milliseconds. And on some platforms, that's more than six seconds just to parse your first megabyte of JavaScript source code. I'm not going to give names, but I know some web applications that have more than 40 megabytes of JavaScript source code, so multiply accordingly; it's mostly proportional.

Before we try to solve the problem, let's take a very quick look at how JavaScript starts up. The server doesn't do much. I'm assuming a web application, and I don't care whether the server is running JavaScript; I only care about the browser here. So the server doesn't do much: it just sends the file. The file is text; it's the code that you have written. The JavaScript virtual machine, any virtual machine, is going to do a full parse, even if it's not called that, of all your files, and turn them into a data structure. Then it's going to run static analysis. It's not widely known, but all the VMs need to perform static analysis before they can execute JavaScript.
Not a lot of static analysis, but enough that it takes time and that it prevents lots of optimizations. This gives you the source code as a state in memory, and this is then compiled to bytecode. The bytecode can now be interpreted. I'm not going into JITing; that's just what happens before you can start execution. Other versions of this pipeline had more operations, but that's as far as browser and VM vendors have managed to reduce it. It still takes a lot of time.

So what can a web developer do to improve the situation? There are a few things. I didn't include lazy loading, but the typical optimizations that you can add automatically to your tool chain are uglification and optimize.js or something like it. Uglification is going to reduce the size of your source code. Optimize.js or something similar is going to add IIFEs, which are a bit faster to parse. And then you still get this entire chain. Thanks to uglification, this thing is a bit smaller, not much. And thanks to this one, this part is a little bit faster. We still have lots of work to do.

Now let's look at something completely different: this is a rough outline of how .NET applications start. They start much faster. Of course, they don't need to download; you have installed them already. But also, most of the work that we do on the browser side happens ahead of time in .NET. All the static analysis is performed early. They have lots of other analyses. They compile. And then when you load, this is not a server anymore; that's your binary on your hard drive. When you load, you don't even need to read the entire application into memory, which is pretty quick. If your application is large, you only read bits of it. There is still a little bit of analysis that needs to be done, but you only need to do it when you load the application, on the part that you actually load, and most of the work has already been done ahead of time.
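The optimize.js trick mentioned above can be sketched concretely. This is a hedged illustration of the idea, not the tool's actual output, and the heuristic that parenthesized functions get parsed eagerly is engine-dependent:

```javascript
// Sketch of an optimize.js-style transform (simplified, illustrative).
// Many engines parse function bodies lazily, then re-parse them on the
// first call. Wrapping a function in parentheses is a common heuristic
// hint meaning "this runs immediately, parse it eagerly", which avoids
// the double parse for code that executes right away.

// Before: the body of this immediately-invoked function may be parsed twice.
var a = function () { return 21; }();

// After the transform: the added parentheses hint eager parsing.
var b = (function () { return 21; })();

console.log(a + b); // 42 -- behavior is unchanged, only parse timing differs
```

Note that the transform changes nothing about what the code computes; it only nudges when the engine does the parsing work.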
So the end result is that this is way faster. So we set out to see if there is a way to change how JavaScript is loaded, to try and get something close to that. This is the BinAST, the JavaScript binary abstract syntax tree. And you actually don't care about the fact that it's a binary abstract syntax tree. Don't panic. You just need to remember that she looks cute. Her name is Bast, by the way. It's a rendition of an Egyptian goddess.

So with BinAST, we try to fix a number of the points that we have seen in the earlier pipeline and try to make them faster. Is everybody clear on what parsing is? Does anyone need a reminder of what parsing is about? Apparently not, good.

Parsing is slow. Again, that was the large graph that sometimes took six seconds on some platforms for one megabyte. Parsing is slow in many languages, and parsing is even slower in JavaScript, because, let's face it, the syntax of JavaScript is a bit weird. If you see a slash, is it a division? Is it a comment? Is it a regex? It could be anything. If you see `for`, is it a keyword? Is it an identifier? It's probably not an identifier, but it could be a property name. So it's not as easy as in many languages.

And it's even worse: strings are complicated. JavaScript tries to represent strings for multiple languages, and you do not store ASCII and Korean in memory with the same memory model, because using the same model for everything would bloat some applications. So even strings need to be verified and then optimized. Every time you see a string in the source code, the parser is going to spend precious nanoseconds trying to fit the string as best as possible into its model: verifying that it's a valid string, processing the escapes, et cetera.

But all of this is nothing, in terms of complication, compared to eval. I assume that everybody knows about eval.
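As a tiny illustration of the slash ambiguity described above (this is ordinary JavaScript, nothing BinAST-specific):

```javascript
// The tokenizer cannot classify '/' without knowing the surrounding
// context: the same character plays three different roles below.
let x = 10, y = 2;
let quotient = x / y;   // here '/' is the division operator
let pattern = /y/;      // here '/' opens a regular expression literal
// ...and '//' starts a comment, as on every line above.

console.log(quotient);            // 5
console.log(pattern.test("xyz")); // true: "xyz" contains "y"
```

To pick the right meaning, the tokenizer has to know what the parser expected at that point, which is part of why JavaScript is slower to parse than languages with a context-free lexical grammar.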
I assume that most people don't know how evil eval is. You don't want to know. Fun fact: there are actually four different eval functions. You really don't want to know. But eval itself complicates parsing. Just the fact that there could be an eval somewhere in the file means that your parser has lots of work to do, and lots of optimizations that cannot be done. That, by the way, is one of the static analyses that we need to run when we parse JavaScript.

Also, in .NET again, you can parse just bits of the executable. In JavaScript, we cannot. We have to throw a syntax error as early as possible, which means that we cannot simply skip parts of the code. Lots of things get in our way. Closures complicate things too: static analysis is needed for closures.

We could simplify all this. Let's assume that our hands are free and that we can do anything. Everything is permitted. All is fair in love, war, and optimizing JavaScript startup. If we could redo the syntax of JavaScript, just the syntax, not the language, just the way it's stored on the hard drive, we could simplify tokens. We could simplify strings. And we could preprocess all of the things that need static analysis. That's exactly what we do with the Binary AST.

So instead of having a function foo, and, hey, look, there is no eval, it's an empty function, we're going to store a binary representation that looks a bit more like this. We have a table of names. There is a name foo. There are probably plenty of other names, but I'm not going to put them on this slide. And then we have a function declaration. Its name is the first name in the list, so name number zero. It doesn't have any eval. It has a body, et cetera. Just by the fact that we have written here "there is no eval", instead of letting static analysis look for an eval in the entire code, we make things much faster. And we don't just think we make things faster.
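A rough sketch, in plain JavaScript objects, of the idea just described. This is not the real BinAST wire format; the field names (`hasDirectEval` and so on) are made up for illustration:

```javascript
// Hypothetical sketch of the representation: names live once in a
// table, nodes reference them by index, and facts that static analysis
// would otherwise have to discover -- like "this scope contains no
// direct eval" -- are stored up front in the file.
const names = ["foo"];          // name #0

const tree = {
  type: "FunctionDeclaration",
  name: 0,                      // index into the name table
  hasDirectEval: false,         // precomputed: no need to scan the body
  params: [],
  body: [],                     // empty function body
};

// A consumer can now skip the eval analysis entirely when the file
// already asserts there is none:
function needsEvalAnalysis(node) {
  return node.hasDirectEval !== false; // only analyze when unknown or true
}

console.log(needsEvalAnalysis(tree)); // false
```

The point of the sketch is the shape of the trade: the tool chain pays once to compute these facts, and the browser no longer pays on every load.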
We actually have numbers. We have a version of Firefox that has, hidden behind a preference, the ability to use the Binary AST in addition to the usual source code. And parsing plus verification goes down by one third. That's one third of the six seconds you saw in one of the first slides. And we have lab experiments suggesting that we can hopefully do twice as well, just with this simple change.

So what we have done so far is basically change the order of information inside the file and use numbers instead of words. But it's basically the same thing. It's basically a compression format, a weird compression format that changes the order in which the information is presented, but it's still the same programming language.

Now, that was for parsing speed. But we talked about downloads, and I just mentioned compression formats, so let's talk about compression. We'd like to do something better than minification. Actually, we would like to completely get rid of minification. I don't know if you've already faced minification bugs, but, yeah, I see someone who has; minification is an inherently unsafe operation. There are a few things that you know you can always do, such as removing comments. But many of the operations that, say, UglifyJS or other tools perform can sometimes break things. And you're not really happy when it breaks, because someone somewhere in the tool chain decided to apply some optimizing tool that changed the behavior of the code. So we don't want minification. But if we lose minification, we're going to increase our download size. So, while we are changing how JavaScript is represented, let's see if we can make it more efficient in terms of bandwidth.

This is a random example of things that happen in quite a few JavaScript files. You import things with require. It doesn't have to be require, but that's an example.
Or you have `if (constants.debug)`, do something, probably some logging or some additional tests, et cetera. The actual details of this code don't matter; what matters is that you have probably written code that looks like this at some point in your life. It may not have been called log, it may not have been myLogger and myModule, but it kind of looked like it.

More generally, when we look at source code, we can learn many things. There was an entire session here dedicated to machine learning on source code, and I think it's still on. As I mentioned, you can learn many things about source code by looking at it. It doesn't have to be deep learning. For instance, strings tend to repeat often, both within and across files. "Hello world" is something that happens quite often. More generally, it's very rare to find a JavaScript file that does not call console or, if you're in Node, require. prototype is a property that comes up very often. And if you have declared a variable x in your code, there are pretty good chances that you're going to use it at some point, so you're going to have several instances of the same identifier, and patterns of usage of it. Similarly, the structure console.log, or console.whatever, is a dot applied to console and to log, and that's something that happens a lot. Function.prototype.something: again, patterns.

And once we know what kind of patterns to look for, we can ask the computer to look for them. So we sat our computers in front of a corpus, and the computers learned how to predict the code that we are writing. With this kind of thing, we can ask the computer to predict the code. It's of course going to be wrong quite often. But in the cases where it's right, we don't even have to write the code in the file.
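The prediction idea can be sketched with a toy frequency model. This is an assumed model for illustration only, far simpler than what the project actually uses:

```javascript
// Toy sketch: learn, from a training corpus, which property most often
// follows each object, then predict the most frequent continuation.
// When the prediction is right, an entropy coder can encode that case
// in very few bits; only the misses cost more.
const corpus = [
  ["console", "log"], ["console", "log"], ["console", "warn"],
  ["Function", "prototype"],
];

// model: object name -> (property name -> count)
const model = new Map();
for (const [obj, prop] of corpus) {
  if (!model.has(obj)) model.set(obj, new Map());
  const counts = model.get(obj);
  counts.set(prop, (counts.get(prop) || 0) + 1);
}

function predict(obj) {
  const counts = model.get(obj);
  if (!counts) return null; // never seen: no prediction
  // Pick the property seen most often after this object.
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

console.log(predict("console")); // "log" -- the common case is cheap to encode
```

The real system predicts AST constructions rather than raw property strings, but the principle is the same: the more predictable the code, the fewer bits it needs in the file.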
So with this kind of thing, for the actual constructions in the code, such as expressions, arithmetic, calling toString, things like that, we managed to get down to 1.2 bits per construction, on average, within our tests. We still need to check how good our tests are; it's still a work in progress, but it's pretty encouraging. It's a bit more for strings, identifiers, and property uses, but it's still two to six bits for each use of "hello world", which is much better than writing out "hello world".

This compression is very much a work in progress; I landed a patch on the compression this morning. But with a good dictionary, obtained by learning on a large enough corpus, we get compression performance that's basically the same as minification plus Brotli, except we did not minify the code. We removed the comments, okay, fair enough. But besides the comments, we did not minify the code. And we are pretty sure that we are going to be able to improve it further.

So far, we have a startup that's much faster and a download that's comparable to Brotli plus minification, so we have not lost anything, and we have gained performance. But we're not done yet; we have a few more things in store.

One thing that happens quite often in source code is, big surprise, you have code that's executed now and code that's executed later. Yeah, probably not such a big surprise. As I mentioned at the start of the slides, right now, with JavaScript source code, we need to parse everything, run static analysis on everything, and so on. But it turns out that with the information we already ship in BinAST, we actually don't need to run the static analysis on everything. In fact, we actually don't need to parse everything. So instead, we're going to store these separately, or at least not in the same place in the file.
This is something we need at startup, and that is something that we're going to use later. So, again, we have reordered things in the file. And with the additional information that we have stored, we have the ability to restrict what we are doing to the code that we are actually using. We don't need to run the static analysis on code that we're not using. Our static analysis is faster, but in many cases we don't even need to run it at all, or at least not yet. We don't need to parse, and we don't need to compile, things that we don't use yet.

Ideally, we could parse, compile, and execute while we are still receiving the file. There are a few issues here that we have not solved yet; some of it works, some of it not yet. Imagine a world in which we can start executing JavaScript while we're still downloading it. In this world, we would effectively be streaming source code. And it's not that surprising if you look at how native or almost-native executables work: they only load what's needed from the disk. Ideally, we'd like to do this from the network. The same way we stream video or audio, we would be streaming code and executing it as we receive it. There are a few complications that we have not solved yet, but some subset of it we already have, hidden somewhere in an experimental version of Firefox.

So, with numbers that still need to be confirmed (the other numbers so far were checked; these ones are not), we expect that we can divide by four the time spent parsing during startup, and basically get rid of the time spent compiling entirely, because it's going to happen at the same time on a different thread.

So, hello again, Bast. This is the new scheme, in this ideal world in which everything works. Again, we have not finished everything. It's still a work in progress. Very encouraging, but still a work in progress.
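The laziness idea, storing enough information up front to skip over code that isn't needed yet, can be sketched like this (an assumed layout for illustration, not the real file format):

```javascript
// Sketch: if each function body's encoded length is recorded up front,
// a loader can seek past bodies it does not need at startup and come
// back to them on first call, instead of parsing the whole file.
const file = [
  { name: "main",    bodyOffset: 0,   bodyLength: 120  }, // needed at startup
  { name: "onClick", bodyOffset: 120, bodyLength: 4000 }, // needed later
];

function startupWork(entries, needed) {
  let parsedBytes = 0;
  for (const fn of entries) {
    if (needed.has(fn.name)) {
      parsedBytes += fn.bodyLength; // parse (and analyze) this body now
    }
    // otherwise: skip bodyLength bytes; parse lazily on first call
  }
  return parsedBytes;
}

console.log(startupWork(file, new Set(["main"]))); // 120 of 4120 bytes parsed
```

Plain JavaScript source cannot be skipped this way, because the engine must report syntax errors early and cannot know where a function body ends without parsing it; recording the lengths in the format is what makes the skip legal.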
But with Bast, much of the work that we were doing in the browser during the startup of the application is now done on the server, or actually in the tool chain. We run the full parsing, the static analysis, the laziness analysis, and the compression as part of the tool chain, and send small chunks. Once we have a large enough chunk, we can start getting our partial representation: a little parsing, a little analysis, a little compilation, and then execute. And we are very hopeful that this is going to make the web a much nicer place.

An important point is that this requires no coding. It's not like refactoring a big application to have lazy loading, which is often complicated and often backfires, because you have to be exactly sure of what is lazily loaded in which order; otherwise, you just spent your last two months making your application slower to load when you wanted to make it faster. This is something that just replaces whatever is the last part of your tool chain, say webpack.

We have not changed the language. It's not a competitor to wasm. It's not a new programming language. It's just a compression format, a very, very dedicated compression format that tries very hard to be good for JavaScript. And with this, just with the 30% that we can already confirm, that's already a pretty good gain. We reduce the total work and, hopefully, also the energy used.

Here's the code. Thank you for listening.