Hello, everyone. This session is called Master the Art of the AST. But before we explain what an AST, or abstract syntax tree, is, let's talk about why it's an art form that we would want to master. If you're using JavaScript today, you're obviously using a lot of tools. And maybe you've even heard the expression "JavaScript fatigue", which is about having to learn all these tools. But I would argue that the real frustration comes once you've begun to rely on these tools and you find that they fail you. There is some breaking point. But we're going to see that, more often than not, there's something you can do to solve the problem yourself. And we're going to demo that by solving three problems together today. We're going to see how you can write your own linting rules to enforce your team's conventions, how you can write your own transpiling logic, and how you can write codemods, or code modification scripts, that allow you to do large-scale refactoring very quickly and with confidence.

These are the three tools that solve these problems today in the JavaScript ecosystem: ESLint as a linter, Babel as a transpiler, and Facebook's jscodeshift as a codemod runner. Now, other than the fact that these are open source tools, which means, of course, that you can go on GitHub and read their source code, what's unique about these three tools is that they were all developed around a plugin-based architecture. That means the original tool creators went to a lot of trouble to let you extend these tools and give them new capabilities without having to understand all of the internals of how they work.

Personally, the one that I got started with is ESLint. ESLint comes with about 250 built-in, very powerful, very well-tested rules. But the behavior that I was getting from one of them I considered to be a bug. So I did what you normally do when you encounter a bug.
I went on GitHub. I opened up an issue. And I waited very patiently for one of the maintainers, at their earliest convenience, to give me a response. I was able to wait for about 10 minutes. And then I decided to go into a chat room and start hunting down one of the maintainers to give me an answer. Despite the fact that I was pretty pushy, they were very friendly. What they explained to me is that the behavior I was describing, what I considered to be a bug, is not really a bug as they see it. But what's unique about ESLint, and what makes it different from the JavaScript linters that came before it, is that they really encourage you to write your own rules. It's meant to be extensible. So they pointed me to the docs explaining exactly how to create your own rule. And I learned that in order to create an ESLint rule, I have to learn two concepts: ASTs, abstract syntax trees, and the visitor pattern. It wasn't very difficult. After learning the basics, I was able to write my own ESLint rule, which became a part of our build chain. From then on, whenever somebody violated this convention, the build would break. We would catch it early and fix it.

Fast forward a few months, and I wanted to write my own Babel plugin. To my surprise, I didn't have to learn any new concepts, because Babel plugins also rely on ASTs and the visitor pattern. And as you can guess by now, the same thing happened when I wanted to write my own codemod as well. That's why, with only 30 minutes left in this talk, we're going to be able to get all this done and learn the basics of extending these tools, so that they're no longer a black box and instead something that you can open up and tinker with. But ASTs are not unique to just these three tools. Pretty much any tool in the JavaScript ecosystem that you're using today that needs to read JavaScript or output JavaScript does so by analyzing and manipulating ASTs.
So it's a worthy skill to have.

A bit about me. My name is Yonatan Mevorach. I'm a front end tech lead at Sears Israel, where our mission is to reinvent the online and mobile shopping experience.

OK, so what are abstract syntax trees? Simply put, an abstract syntax tree is your original source code represented as a tree data structure. But let's see an example that will clarify this. Here I have pretty much the shortest piece of JavaScript code I could think of and its matching AST representation. Let's take a look at the code and the AST and figure out why they match. You have an assignment: var foo equals the string bar. Your AST always starts out with a root node called the Program node, which has a body, which is just a list of all the statements in your code. Each statement has its own node. In this case, we only have one statement, a VariableDeclaration, which has a kind. That's the left-hand part, var. And you have the VariableDeclarator, which holds the name of the variable, in this case the Identifier foo, and the value of the assignment. Since the value here is a hard-coded string, and anything that's hard-coded works this way, the node that represents it is called a Literal node. This should make it a bit clearer, but let's see if we can improve on it. This is the exact same AST, but represented as a JSON object, and as developers, I'm sure this is a lot nicer to look at. One thing to notice about the JSON representation is that each of the nodes always has a type property. There is a set list of available types that you can have in a valid AST that describes valid JavaScript. You do have to learn the properties of each one, but you never really have to memorize them, and we're going to see why in a minute.
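
For reference, the tree just described can be written out as a plain JavaScript object. This is only a sketch in the ESTree shape that most JavaScript parsers emit. Location fields such as `start`, `end`, and `loc` are left out, and some parsers (Babel's, for instance) call the string node `StringLiteral` rather than `Literal`.

```javascript
// AST for: var foo = "bar";
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "foo" },
          init: { type: "Literal", value: "bar", raw: '"bar"' },
        },
      ],
    },
  ],
};

// Knowing a node's type tells you which other properties to expect on it.
const declaration = ast.body[0];
console.log(declaration.type, declaration.kind); // VariableDeclaration var
console.log(declaration.declarations[0].id.name); // foo
```
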
But do notice that if you know the type of the node that you're on, then you know which other properties it's going to have, and then you can understand how to read that node.

There are four steps when we're working with ASTs, and we're going to go over each one. Actually, if you're only writing a tool that needs to read JavaScript and doesn't have to output JavaScript, you can get away with using just the first two. So, the parse step. The parse step is all about taking your original source code and converting it into an AST. There are several open source parsers available, and what they all share is that they expose a parse function that takes a string, which is your original source code, and returns a JavaScript object, which is the AST. For the examples that we're going to look at today, you don't have to choose a parser, because all the tools that I mentioned already chose one as a default, which just works. But in most cases you can swap it out for a different parser if you think you need to.

This is a great resource. If you only remember one resource from this talk, make it this one for getting started with AST tinkering. It's called astexplorer.net. It's similar to JS Bin or CodePen in that it lets you play with code online. You type JavaScript on the left, and it shows you the matching AST on the right, and it's interactive. If I have a lot of code here, I can click on a line on the left-hand side, and it'll take me to the matching node on the right-hand side. I can also hover over nodes on the right, and it'll highlight the relevant part of the code on the left. This is why you never have to memorize the set list of available types: you can always paste in some code similar to the problem that you're trying to solve and remind yourself what the node type is called and what properties it has.
So this is just a parser online, ready for you to play with.

Now let's talk about the traverse step. Why do we need to traverse our AST? If we're reading JavaScript, we're trying to analyze something about that JavaScript, and we only care about some elements or some attributes of our code, not all of them. That means that we first need to find the nodes that we care about in order to solve whatever problem it is we're trying to solve. So let's talk about the traverse step in the context of the first problem that we're going to solve together today, and that's writing your own ESLint rule. Let's see the example that I told you about earlier, the one that I got started with. First the problem, and then the rule that solves it.

We're going to be talking about something called code piggybacking, and don't worry if you haven't heard of it, because I just made it up recently. This is piggybacking, if you're not familiar with it. It's when you ride on somebody's shoulders. You don't have to have an actual pig involved. But let's see what piggybacking looks like in code. Here I have ESLint hooked up to my editor, so I get this nice integration, and it's showing me that something is wrong. This is one of the built-in ESLint rules, which catches any variable that's not defined. There was no var statement declaring a variable called ACME. It's not a function parameter. It wasn't imported via an import statement. It's just something that's assumed to be globally available, even though it's not. It's clearly something that we just made up. So this works as expected. But what surprised me is what happens when I do this. I changed ACME to be a property of the window object. And because window is whitelisted, because ESLint knows that I'm writing browser code, so console is whitelisted and window is whitelisted, it doesn't check anything beyond that point.
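
The traverse step described above can be sketched in a few lines of plain JavaScript. This is only an illustration, not how ESLint walks the tree internally: `traverse` and `handlers` are hypothetical names, and the AST is hand-built in the ESTree shape for the single statement `window.ACME;` instead of coming from a parser.

```javascript
// Hand-built ESTree-style AST for the statement: window.ACME;
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "MemberExpression",
        computed: false,
        object: { type: "Identifier", name: "window" },
        property: { type: "Identifier", name: "ACME" },
      },
    },
  ],
};

// Hypothetical helper: depth-first walk that calls handlers[node.type] if present.
function traverse(node, handlers) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, handlers));
    return;
  }
  if (typeof node.type === "string" && typeof handlers[node.type] === "function") {
    handlers[node.type](node);
  }
  // Recurse into every property; child nodes and arrays of nodes are found this way.
  Object.keys(node).forEach((key) => traverse(node[key], handlers));
}

// Only the node types we care about need a handler.
const found = [];
traverse(ast, {
  MemberExpression(node) {
    found.push(`${node.object.name}.${node.property.name}`);
  },
});
console.log(found); // [ 'window.ACME' ]
```
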
But if you're writing browser code, you know that anything that's a global variable is also a property of window. So in terms of the runtime implications, it's exactly the same. And yet ESLint ignores this completely once ACME starts to piggyback on window. That's the problem we're trying to solve, because without a rule for this, people will be able to commit code like this, and at runtime this variable might be undefined, et cetera.

So let's see how the rule that solves this works. This applies to this rule, but also to any ESLint rule. The structure is the same. An ESLint rule is a Node script that returns an object. The name of each property of that object has to match one of the known valid AST node types. And the function that we pass in as the value will be called for each node of that type as ESLint traverses the tree. But let's figure out what we need to put here for our own rule. We care about cases where it's window.something, where that something is not available. But if we zoom out a bit, we first need to find all nodes that are of the form something.something, right, foo.bar, a.b. So we go on AST Explorer, and we paste in something like that. And we wait until both are highlighted. We can see that the name of the type is MemberExpression. A MemberExpression will always have an object property, which is the left-hand part, and you can see it being highlighted. The right-hand part is referred to as the property. This is why, in the rule that solves this problem, we're going to write MemberExpression here. Now, this function will be called for any MemberExpression in our code. But we don't care about all member expressions. We just care about the ones where the object is window and the property, the right-hand part, is something that's not a valid global. We're going to see how that helper function is implemented in a minute.
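
A rule with the structure described here might be sketched as follows. This is not the speaker's actual rule: `noWindowPiggybacking`, `isInvalidGlobal`, and `getGlobalScope` are hypothetical names, and the helper simply checks the property name against the variables ESLint has collected for the global scope. `context.report` is ESLint's real reporting method. `context.getScope()` is the scope accessor from ESLint versions of the talk's era, and newer versions expose `context.sourceCode.getScope(node)` instead.

```javascript
// Hypothetical helper: true when `name` is not among the global scope's variables.
// `globalVariables` is an array of { name } objects, the shape of a scope's
// `variables` list in ESLint's scope analysis.
function isInvalidGlobal(name, globalVariables) {
  return !globalVariables.some((variable) => variable.name === name);
}

// Hypothetical helper: walk up from any scope to the outermost (global) one.
function getGlobalScope(scope) {
  let current = scope;
  while (current.upper) current = current.upper;
  return current;
}

// In a real plugin this object would be exported with module.exports.
const noWindowPiggybacking = {
  meta: { type: "problem", schema: [] },
  create(context) {
    return {
      // Called for every MemberExpression node (foo.bar, a.b, ...).
      MemberExpression(node) {
        const isWindow =
          node.object.type === "Identifier" && node.object.name === "window";
        // Skip anything that is not a plain window.someName access.
        if (!isWindow || node.computed || node.property.type !== "Identifier") return;

        // Assumption: talk-era scope API; see the note above for newer versions.
        const globals = getGlobalScope(context.getScope()).variables;
        if (isInvalidGlobal(node.property.name, globals)) {
          context.report({
            node,
            message: "'" + node.property.name + "' is piggybacking on window.",
          });
        }
      },
    };
  },
};
```

Because the handler only receives nodes and a `context`, it can be exercised with hand-built nodes and a stub context before it is wired into a real ESLint setup.
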
But assuming that this condition is true, then I've found a piece of code that essentially means that somebody is piggybacking on top of window. And this is something that I want to report, to catch it early. ESLint exposes this context.report method, and I can pass in whatever error message I want. And this is how the helper function is implemented. ESLint also analyzes scope on your behalf. It exposes this variables array and prepopulates it with anything that's in scope in the environment you're running under. When you configure ESLint, you tell it: are you writing browser code? Are you writing Node? And it will prepopulate the array with the known globals in that environment. For a browser, this would be document and window and console, et cetera. And as the helper iterates through this array, it will see that in our particular case there is no ACME variable. That's why it will tell us that this is, in fact, an invalid global.

So let's see this in action. This is, again, ESLint integrated with the editor. And my silly little piggybacking error message gets the seal of approval, because it looks very nice when you get the ESLint integration like that. So with custom ESLint rules, you can do anything that one of the built-in ESLint rules does. Now, without knowing it, we used something called the visitor pattern here. I didn't have to tell ESLint how to traverse a valid tree, how to go from the root node all the way down to the bottom nodes. I only had to tell it which kind of nodes I care about and give it a callback function to be called for each node of that type. That's making use of the visitor pattern. And it's going to come in handy in the next examples as well.

Now let's talk about a tool that needs to output JavaScript as well, and does so by manipulating the AST of the original JavaScript file. The most well-known tool that does this is Babel. Babel started out as a project called 6to5.
And its main objective was to let you write ES6 and compile it down to ES5, which can run on any browser. But they realized that the problem they had solved is more generic. So they rebranded as Babel. And they went to a lot of trouble so that it's now based on this plugin-based architecture, and you can extend it by writing different plugins. The example that I'm going to show now is actually one of the built-in Babel plugins. I chose it just because it's very simple but helps get the message across. A third-party plugin would behave the same way.

So first, the problem that we're trying to solve. You might be familiar with the debugger statement in JavaScript. If not, it's just a keyword that lets you trigger a breakpoint immediately from code. It's great for development, but it's not something that you would want to ship to production, because under some circumstances it could actually be triggered for your users, and then your entire app would freeze. So as part of preparing our code for production, as part of using Babel, we can choose to just get rid of it completely. If I run Babel, it will produce the distribution version of this file. And it would be the exact same code, only the debugger statement is gone.

Now let's see how it's implemented. This should look familiar from the ESLint example. A Babel plugin is just a Node script where we return an object. And here it's even wrapped inside this visitor object, which reminds us that we're using the visitor pattern. But the same rule applies. We need to figure out the name of the type of node that we care about, which, again, we can use AST Explorer for. If I just type debugger, I see that since debugger is a reserved keyword, it also has an AST node type all of its own, which is called DebuggerStatement.
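
A plugin of the shape described here might look like the sketch below. The names `removeDebuggerPlugin` and `remove-debugger-sketch` are hypothetical. The `visitor` object keyed by node type and `path.remove()`, which deletes the visited node from the AST, are part of Babel's plugin API.

```javascript
// A Babel plugin is a function that returns an object with a `visitor` property.
function removeDebuggerPlugin() {
  return {
    name: "remove-debugger-sketch", // hypothetical plugin name
    visitor: {
      // Called for every `debugger;` statement Babel encounters while traversing.
      DebuggerStatement(path) {
        path.remove(); // delete this node from the AST
      },
    },
  };
}
```

In a real project the function would be exported with `module.exports` and listed under `plugins` in the Babel configuration. Any extra condition, such as only removing the statement in certain files, would go inside the `DebuggerStatement` handler before the call to `path.remove()`.
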
And for any DebuggerStatement in my code, I can call path.remove, which is essentially modifying the AST and removing that node from the code. Of course, you can apply some custom logic here and do some other conditional checks. But for our example, if we find a debugger statement, we just want to get rid of it completely, regardless.

Now let's talk about the generate code step. The generate code step is kind of the reverse of the parse step, right? The parse step was all about converting our code from a source string into an AST. Now we have our modified AST, and we want to regenerate it back into code. You implicitly have to make some choices when you're generating code from the AST. Are you using tabs or are you using spaces? Windows line endings or Linux-style line endings? When we're using something like Babel, we don't really care which one we use, because Babel's output is typically just throwaway code, right? You regenerate it every time you push to production, and during development maybe even every time you hit Save. But now we're going to see a case where we need to be a bit smarter about the mechanism we use when we generate code.

That's the case with jscodeshift. jscodeshift is a framework by Facebook that tries to solve the problem of doing very large-scale, very tedious refactoring. Doing so manually would be very time consuming and potentially error prone. So what it lets you do is write something called a codemod, a code modification script, that describes how you want to change your original source code, and it will iterate through your files and change them accordingly. It essentially means that you're going to let machines write code for you. Is there anyone here who has already dismissed this because they think it's a terrible idea, that it would be super buggy, and that you should never let machines write code for you? That was my initial reaction too.
So what I would say to that is that you should be skeptical. You should definitely understand the implications of letting a machine write code for you. You should test it very well, and you should understand all kinds of edge cases that you might have missed initially. It's not something you just run, push to production, and go home. So skepticism in this case is healthy, but this has merit. It's all a trade-off: is it more time consuming to test the codemod very thoroughly, compared to doing the refactoring manually?

So what are the types of things we can use jscodeshift for? The most well-known thing people use it for is kind of the reverse of what we use Babel for. When we're using Babel, we typically do it to convert newer JavaScript syntax down to older syntax that can run on any browser. But what if we have a large code base that was written long before the days of ES6, ES7, ES8, and we want to start adopting these latest best practices? For example, what if we want to get rid of all our boring old var statements and turn them into shiny new let statements? Let's see one option for how you can try to tackle this with a codemod.

This codemod is, again, a Node script. Here we need to explicitly parse our code, but jscodeshift does it for us. j is just a wrapper around the jscodeshift API. We pass in file.source, and we get root, which is the root node of our AST and contains the entire AST. We're doing something similar to the visitor pattern, but no longer with the object syntax. Here we just call root.find, and we tell it which kind of node we want to find. Because we're trying to do something to all variable declarations defined with var, we pass in VariableDeclaration and kind var. Again, VariableDeclaration is something you can find out on AST Explorer.
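
The codemod being walked through can be sketched roughly like this. It is a deliberately naive sketch rather than the exact script from the slides. The function name `transformer` is arbitrary. `api.jscodeshift`, `find`, `replaceWith`, the `variableDeclaration` builder, and `toSource` are jscodeshift API calls, and the last three are what turn the search into an actual rewrite.

```javascript
// Naive codemod sketch: rewrites every `var` declaration as `let`.
// jscodeshift calls the transform with the file info and its API object.
function transformer(file, api) {
  const j = api.jscodeshift;
  const root = j(file.source); // parse the source into an AST collection

  root
    // Find every VariableDeclaration node whose kind is "var".
    .find(j.VariableDeclaration, { kind: "var" })
    // Build a new declaration with the same declarators, but declared with let.
    .replaceWith((path) => j.variableDeclaration("let", path.node.declarations));

  // Regenerate source code from the modified AST.
  return root.toSource();
}
```

In a real codemod file the function would be exported with `module.exports = transformer;` so that the jscodeshift command line tool can load it and run it against each file it is pointed at.
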
Then we pass in the function that will be called for each node of that type. Inside it, what we want to do is create a new variable declaration, pretty similar to the previous one, that's the p parameter, but different in that it's now defined as a let statement. This is just an object in memory, so the next thing we do to actually change the AST is swap them using the replaceWith method. And the last thing we do is call root.toSource. Remember, we're talking about this in the context of the generate code step. What jscodeshift does when it regenerates our AST into code is try to understand everything about the original code style that we were using: are we using spaces or are we using tabs? And it will try to change as little as possible about the source file, because it's not writing a new file, it's actually overwriting your original file. What this means is that when we go into our Git, source control, whatever, diff window before we commit this, we're going to be able to see exactly what changed, and that makes it a lot easier to pick up on potential bugs introduced by this code.

Speaking of bugs introduced by this codemod, does anybody have a problem with this code? I'm not sure if there are no hands up or I'm just not seeing them, but you might have noticed that what I did here turns any var statement into a let statement. Now, var and let aren't interchangeable. There's a reason that we have this new let keyword, right? It means something different. var declarations are function scoped and let declarations are block scoped. That means that we can't just change it like that. We have to check how the variable is used, because if we previously defined the variable with var inside a block and used it outside of that block, it might have been smelly and frowned upon, but it would still work.
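
The scoping difference just described can be checked in plain JavaScript. In this small sketch (the function names are made up for illustration), the `var` version keeps working when the variable is read outside the block, while the `let` version fails when that line runs, with a `ReferenceError`.

```javascript
function withVar() {
  if (true) {
    var message = "declared with var";
  }
  // var is function scoped, so `message` is still visible here.
  return message;
}

function withLet() {
  if (true) {
    let message = "declared with let";
  }
  // let is block scoped, so `message` does not exist outside the if block.
  return message;
}

console.log(withVar()); // declared with var

try {
  withLet();
} catch (error) {
  console.log(error.name); // ReferenceError
}
```

This is the kind of usage a codemod has to detect before it can safely replace `var` with `let`.
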
And now it would actually be an error, a ReferenceError, because let enforces that you can't use the variable outside of the block where it was defined. The good news is that there is a better codemod that solves this. It's a bit longer, more lines of code. It's open source, created by Christoph Nakazawa, and it's called the no-vars codemod, and it does this switch intelligently. It checks how the variable is used, and it will even promote it all the way up from var to const if it can.

Let's try to see it in action. Here I'm just calling the jscodeshift command line tool, and I'm telling it to use the no-vars transformation on this repo. It will spawn several child processes so that, if your code base is indeed very large, it still finishes very quickly. And, oops, sorry. If I go in and look at the changes, I can see that of all my original var statements, most are now const statements. But in some cases, like here, we can see that one of these variables, I guess, was being reassigned. So it was only able to promote it up to a let statement.

This repo isn't one of mine, actually. What I did is I went on the global issue search on GitHub and looked for people who maybe aren't familiar with the fact that you can do this type of refactoring very easily with codemods. I looked for any open issues asking to replace var with let, and I found this one. This person wrote, "It's 2017, use let instead of var," or at least that's the tone I'm imagining in my head when I'm reading this. And what I love about this is that it was opened in 2016. So now we'll pray to the Wi-Fi gods and try to commit this, and we can open up a pull request. So it's really that easy. Now, I trust this codemod because I've used it before, I trust the person who wrote it, and it's well tested. But it's definitely not something that you would normally do as quickly as that.
But once you have confidence in your codemod, it can be that easy. Thank you very much. The repo up here at the top contains a bunch of links related to getting started with ASTs. So if you want to get started, this is something that I would recommend. Also, feel free to reach out to me on Twitter. I have open DMs. You can ask me questions or tell me if you're getting stuck, and I'll be more than happy to help you out, as well as here at the conference. Thank you very much.

Thank you so much, Yonatan. I have a few questions for you before we break for lunch. Can you be more specific about the edge cases and testing that you mentioned for these codemods?

So the codemod that I use in production isn't the one that I showed you, but one that I wrote. First of all, I got started with one that I found online, which served as the basis, and then it was a matter of testing it across the code base. When you're working on a codemod, you never really want to run it and commit the files that it changed, because every time you try to merge back from master it would create a conflict. So what you do is you always test it out and throw the code away until you gain confidence. And sometimes you encounter little edge cases. For example, the codemod that I was writing was converting define/require, RequireJS AMD-style module syntax into ES6 module syntax, and I found that people were doing crazy things. A handful of files would return a different module depending on some condition, or something like that. For those cases, what I would do is fix them manually and write an ESLint rule so that nobody introduces any more files like that while I'm working on the codemod. Then eventually the whole branch is green, and then you can push to production.

Nice. We have time for a couple more questions before we break for lunch. It seems like the AST doesn't have any style information like spaces and semicolons.
How do you write rules for ESLint to check for those?

Right, so the A in AST does stand for abstract, because you would normally use ASTs when you're working with compilers and things like that, where you don't exactly depend on whitespace. But in fact, all the parsers that these tools use produce something that I guess you could call a CST, a concrete AST, which retains everything about the original source code. So it knows the location of every character, and it knows about whitespace.

Nice, a concrete abstract syntax tree. How clever is ESLint at analyzing scope? Does it know about closures? How does it collect scope information?

So underneath, ESLint uses a library called escope, which you can use yourself if you're writing something from scratch. But they really did do all the heavy lifting. For the example that I showed, they're using escope to understand scope, and they're also using a different package to prepopulate that variables array according to your environment. They do a lot of the heavy lifting for you, which lets you be very powerful. If you take a look at the source code of most of the built-in ESLint rules, most of them are really, really short, because a lot of the harder work is done internally inside ESLint.

Awesome, thank you so much, Yonatan. Another round of applause.