Hi there, and thank you for joining my talk, "What the AST?". I like starting my talks off with a question, and I know raising your hand in a virtual talk is really difficult, but I still want you to either participate in the chat or at least think about this question: have you ever used one of these tools? Angular, webpack, ESLint, Prettier, Parcel, Vue, TypeScript, Babel. Because in the next 30 minutes, I want to talk to you about what these tools have in common, what they're using under the hood, why these concepts are useful for you to know even if you're not trying to build the next ESLint, webpack, or similar tool, and why it's just generally useful to have this in your tool belt.

Before we start, I want to briefly introduce myself. My name is Dominik. I work as a developer evangelist at a company called Twilio. If you haven't heard of Twilio, we're a cloud communications platform, meaning we have APIs and SDKs that allow you to integrate different means of communication, such as voice, video, SMS, email, two-factor authentication, or similar, into your applications. My pronouns are he and him, and you can find me anywhere on the internet under dkundel, so feel free to send me an email or a DM on Twitter.

So I want you to think about this: how do we modify and find code on a regular basis?
Typically, at least the way it works for me, if I'm trying to find something in my project and batch fix it, the first thing I'm going to try is Command or Control+F. That works pretty well if you're trying to find, say, one variable once. The moment you're trying to use it more often, and you're trying to replace it in a bunch of files, it gets more tricky. You start to fiddle around with settings, hoping that you're not modifying too much or too little, and that your tests will hopefully catch anything you might be breaking. That can be incredibly frustrating, especially if you have to do the same thing multiple times.

So the way you might resort to solving that, at least I personally do, is to go the route of writing a Bash script or something similar that uses tools like grep or sed to fix things on a regular basis. But that means you need to use regular expressions and similar things to kind of duct tape things together, and at least I don't feel that confident with regular expressions, as much as I've used them in the past. There were always cases where the expression was either too flexible or too strict, and I kept fiddling around with it, and it wasn't catching the right things. That means we're building tools that are unreliable and hard to maintain to deal with our code bases, and that's not a great feeling, at least for me personally.

But that also gets me back to the tools that we use on a regular basis, because all of these have something in common with find and replace: they take some code, run some magic on it, and produce some output. I want to talk to you about exactly that magic, and in order to do that, we first need to talk about how a compiler works, because effectively all of these tools use the same concepts under the hood. If you've never worked with a compiler or never looked at how a compiler works under the hood, that's okay. We'll still cover it step by step.
So you should be able to keep up. The first thing here is that we have some input, in this case for example new Symbol(1), and the compiler will perform what is called a lexical analysis on it. That means it will turn that text into a list of tokens representing the different parts of the code that we have. You can imagine this being the equivalent of taking a text written in English and turning it into a list of words and punctuation marks.

The next step is to perform syntax analysis. That means we're taking the tokens that we already have and creating some structure around them in the shape of an AST, an abstract syntax tree, that represents how the program is actually structured. Imagine this as turning words and punctuation into sentences that are part of paragraphs, where those sentences carry information about which part is the subject, which part is a verb, and so on, giving you all of that detailed information rather than just a list of words and punctuation. You can then perform actions on those ASTs.

The last step is code generation. That means we're taking the AST and turning it into some output. This can be machine code, this can be another programming language, but generally we're turning it into some sort of output.

Now that we've talked about how those general steps work in the grander scheme of things, let's dive into each of these steps in a bit more detail. The first step is lexical analysis.
You might also hear it called tokenization. That means we're taking some example code, and I don't want to use the classic hello world here, so instead I'm going to use a function called isPanda, because I love pandas. It receives one argument, in this case a string with a panda emoji in it, and we want to turn this into tokens. You could build your own tokenizer, but there are plenty of tokenizers you can use, especially for JavaScript. In my case I'm going to use Esprima and call tokenize on it, which gives us a list of tokens.

That will look roughly like this. I dropped a couple of pieces of information to fit it on the slide, like the line numbers and things like that, but roughly this is what you would get. Four tokens: an identifier, isPanda; two punctuators, an opening and a closing parenthesis; and in between, a string that has the string value of a panda.

If you've ever written a code editor theme, or generally a syntax highlighting theme, you might be familiar with these words, identifier, punctuator, string, because that's typically what is being used in real, practical examples such as syntax highlighting. Most code editors use tokenization at some point to do syntax highlighting. Some do additional work on top of that for more advanced highlighting, but under the hood at least the basic syntax highlighting is often done by tokenization. The token formats might differ, though.
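To make the token list from the slide a bit more concrete, here is a minimal sketch of a tokenizer for a tiny subset of JavaScript. It is not how Esprima is implemented; the `tokenize` function and its rule table are made up for illustration, and only the token type names (Identifier, Punctuator, String) are borrowed from what the talk describes.

```javascript
// Hypothetical, dependency-free tokenizer for a tiny subset of JavaScript.
// It only knows identifiers, a few punctuators, and simple quoted strings.
function tokenize(source) {
  const rules = [
    ['Whitespace', /^\s+/],
    ['Identifier', /^[A-Za-z_$][A-Za-z0-9_$]*/],
    ['Punctuator', /^[()[\]{};,.]/],
    ['String', /^'[^']*'|^"[^"]*"/],
  ];
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    // Pick the first rule whose pattern matches at the start of the remaining text.
    const rule = rules.find(([, pattern]) => pattern.test(rest));
    if (!rule) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    const [type, pattern] = rule;
    const value = pattern.exec(rest)[0];
    // Whitespace is consumed but not reported as a token.
    if (type !== 'Whitespace') tokens.push({ type, value });
    rest = rest.slice(value.length);
  }
  return tokens;
}

const tokens = tokenize("isPanda('🐼')");
console.log(tokens);
```

The result is a flat list: one identifier, an opening parenthesis, a string, and a closing parenthesis. Nothing in it says that the string is an argument of a call, which is the gap the syntax analysis step fills.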
VS Code, for example, did some extra work on its token format to get important performance improvements in the syntax highlighting and tokenization part of things, and if you want to read about that, you can check out this blog post.

Once we have those tokens, the next step is to turn them into an abstract syntax tree. That is the syntax parsing side of things. We take this list of tokens, a list of objects, and turn it into a nested object that represents, as a tree, how our program is structured. At the top we have a program, and that has a body, which in this case only has one expression statement. That statement has a call expression in it, which consists of a callee, which is what is being called, and a set of arguments, in this case only one argument, which is a literal of a panda emoji.

The format that you've seen there is called ESTree. It's a spec that was derived from the AST format that Mozilla uses internally in their SpiderMonkey engine, but it has been altered and extended from there. So if you're using tools like Esprima or Babel, they use derivatives of said spec, with some additions that help them with their functionality and goals. Facebook and others have also extended it to add functionality like JSX to the language.

To give you some practical examples of why you would want to have an AST in the first place: one is visiting the different nodes to do things such as linting. If you build a linter, or use one such as ESLint, TSLint, or JSHint, you often want to do analysis not purely on the format of your code. You want more detailed rules that could, for example, be around how you name your classes. Rather than having to look through an entire array to find the class keyword somewhere and then see what follows, you can navigate a tree, look for class declarations, and see how they fit into the grander scheme. That lets you do much more advanced linting.

Another example in that same realm is code analysis. Angular, for example, uses this to build a language service that you can use in your code editor to get better autocomplete. The way they do that is by turning their HTML templates into ASTs, so that they can tell you, based on where your cursor is and how your template is connected to your component, what things are available, and improve your autocomplete there.

Another example is bundling. If you've used a bundler like webpack or Parcel before, those use ASTs under the hood. Bundling in the past might have been just concatenating JavaScript files, but these days, in the world of components and modules, you need to understand how things fit together. On the one hand you can do optimizations with this, but you can also fit things together better without adding them in the wrong order and things like that. That is exactly where ASTs come into play.

The nice thing about ASTs, though, is that they're not only helpful when you're traversing the code. They're also incredibly helpful for modifying it, because if you've ever had to modify a text or a list versus modifying an object, you might know that it's easier to modify the object, especially if you have to take things out of the middle of a program, the middle of an object, versus taking things out of the middle of an array or the middle of a text.

So let's look at our AST again. In this case, we can actually do an optimization that is a good use for modifying an AST. If we look at the expression statement, we have one expression in it.
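Written out by hand, the AST in question looks roughly like the object below. This is a sketch in the ESTree style with location data left out, not the exact output of any particular parser. The `isKnownPandaCall` helper is hypothetical, and the replacement only makes sense under the talk's assumption that isPanda always returns true for a panda emoji.

```javascript
// ESTree-style AST for the program `isPanda('🐼')`, written out by hand.
// Location data (line and column ranges) is omitted for brevity.
const ast = {
  type: 'Program',
  body: [
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'isPanda' },
        arguments: [{ type: 'Literal', value: '🐼', raw: "'🐼'" }],
      },
    },
  ],
};

// Hypothetical helper: true when the node is a call to isPanda
// with exactly one argument, the panda emoji string literal.
function isKnownPandaCall(node) {
  return (
    node.type === 'CallExpression' &&
    node.callee.type === 'Identifier' &&
    node.callee.name === 'isPanda' &&
    node.arguments.length === 1 &&
    node.arguments[0].type === 'Literal' &&
    node.arguments[0].value === '🐼'
  );
}

// Swap the whole call expression for a `true` literal, in place.
for (const statement of ast.body) {
  if (statement.type === 'ExpressionStatement' && isKnownPandaCall(statement.expression)) {
    statement.expression = { type: 'Literal', value: true, raw: 'true' };
  }
}

console.log(JSON.stringify(ast, null, 2));
```

Because the tree is a plain nested object, the change is a single property assignment. There is no need to work out how many tokens or characters to cut out of a list or a string.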
That expression is a call expression, and really we can see that if you call isPanda and give it the value of a panda emoji, it will always be true. So we can do an improvement here: take out that expression and replace it with just a literal that has the value true, significantly shrinking our code. If you did that same thing across your entire code base, you would eliminate a bunch of code.

There are a bunch of practical examples, but the one that might be most familiar to you if you're writing JavaScript these days is Babel. Babel uses that concept in all of its transformations, whether you're down-transpiling or using some of the additional plugins that people have built. If you're not familiar with Babel, it's a down-transpiler that lets you take code written in more modern JavaScript and transpile it to ES5- or ES3-compatible JavaScript, or to JavaScript compatible with exactly the browsers that you're targeting.

But you can also do things like extending a language. As I said, JSX does this, where really we just have an AST that has these JSX tags in it, and then we modify it to turn those JSX tags into function calls. That's really how JSX is implemented: all of these tags are turned into function calls to things like React.createElement or Preact's h function.

The third example is language transpiling. That means we're taking a language such as TypeScript and removing all of the TypeScript-specific syntax, so we can turn it into JavaScript code that we can then execute. Now obviously TypeScript does a bunch of other things as well, but that's, for example, how the Babel transform for TypeScript works.

The one that I thought was really cool is code coverage. If you've used a code coverage tool in the past, such as Istanbul, you might have wondered how it actually works under the hood and how it plugs into things. We can actually look under the hood using the nyc CLI from Istanbul, by calling nyc instrument, giving it a file and an output directory, and looking at the output.

Let's say we have a sum.js file that we want to unit test, and we want to see how good our test coverage is. It has a function in it called sum that takes two parameters and adds them up, and then we export that function. If we run nyc instrument on this, this is the code that we get out of it. We have a variable declaration at the top that we'll look at in a second, and then we have three counter increments added to it: two that end with s[0] and s[1], and one with f[0]. Those are statement and function counters. That means if we run our unit tests against this file instead, we're able to tell how often functions have been called and how often statements have been executed, because we can then look into that object at the top. There's a coverage data object as part of it that contains all of these counters for each of the statements, functions, and branches. It also contains a statement map, a function map, and a branch map, so that later we can look at which statements, functions, and branches have zero counters and then inform the user exactly which ones haven't been called and where they are.

If you want to learn more about the code coverage aspect, there's a great blog post that I recommend you read that dives a bit more into that AST transformation.

Another example is running other languages in browsers.
I'm not talking about WebAssembly here. Instead, I'm talking about a pretty concrete example, CodeCombat. I worked on CodeCombat a couple of years ago as part of Google Summer of Code, and one thing they do is aim at teaching kids how to code using different programming languages such as Python, JavaScript, CoffeeScript, or Lua. They execute all of the code inside the browser, but they do that without shipping an entire Python runtime or an entire CoffeeScript runtime. Instead, they use a tool they built called Esper, which takes some Python code, for example the code on the left, and converts it into an AST that looks similar to the JavaScript one. It then modifies that AST to fix function calls that might be different in JavaScript. For example, we have a print function call here, for which the equivalent is console.log. On the right side, in the AST, you can see there's a call expression to console.log. It also gives you some additional information so that they can do frame execution and kind of step through things without having to teach kids how a debugger works. All of that is powered by ASTs, and that's super powerful.
It's not necessarily performant. You wouldn't want to do this to execute production code in the browser, but I still think it's a very cool application.

Once you've modified your AST, the last step is rendering your AST. That means we're turning it into output. For example, we have our optimized AST here, and we want to turn it into some code. So we'll use a library such as escodegen that knows how to turn an AST into code, and then we just get the statement true, because that's how we optimized it.

There are some really practical examples that I personally love using. One of them is called Prettier. If you haven't used Prettier, it's a code formatting tool. If you have used it, you might be as excited about it as I am. The reason why it works so well is that it takes your code, converts it into an AST, and then reprints it based on its own instructions. That means it doesn't really care in which way you wrote your initial code, and it doesn't screw up your code by writing it in the wrong way. It doesn't just randomly insert new lines. Instead, it fully understands your code and just prints it based on the rules you gave it, and that's really cool.

Minification works in a very similar way. It takes an AST, might do some modifications on it to change variable names or similar things, and then just prints it out in a minified way, rather than just removing whitespace, because that could cause trouble in some situations.

But let's say you're not planning to build the next webpack or Babel or similar. Why would you care about this?
One of the good examples that I think we might all be able to leverage is Babel transform plugins. You don't necessarily have to build one for a new cutting-edge feature. They can be useful for just improving your own day-to-day developer experience without having to use grep or sed, and without having to use regular expressions, or at least with fewer regular expressions. A good, similar example is what React does. React has a project called react-codemod that is powered by another tool that Facebook built called jscodeshift, which works similarly to Babel in terms of building transform plugins. They have an entire library of different scripts that you can run to update your code base for the latest React APIs and similar things. That is super useful, because if you feel like you're doing something on a regular basis, or you want to have certain macros as part of your code base as it grows, these tools can be hugely useful for you.

So I figured, rather than talking more about this, we would actually build a plugin together. This is a website called AST Explorer that I personally love using to understand how ASTs work and to play around with them. Let's clean up some things here. The top left corner is where we put our input code, the bottom right corner is where our output code is, the top right is where we see the AST, and on the bottom left we can change or build our transform plugin. We've already chosen the babylon7 parser here, which is the one that Babel uses, and then we have Babel v7 as the transform. You could, for example, choose jscodeshift here instead.

If you're a JavaScript developer like me, chances are high you use console.log to debug your things. But I was thinking the other day: if console.log is sort of the general way of debugging, then I would say the equivalent of a breakpoint is probably alert. So let's use this alert statement here.
Sorry for that. What we want to do is make sure we're not actually shipping any alert statements to our customers, because while they might be useful for us during development, we definitely don't want them to break the experience for customers. So we're going to build a Babel plugin that will remove or change all alert statements to console.error calls, so we don't have to worry about them and they just pop up in the logs.

The way we do this is by defining a visitor. Visitors are different functions, based on the different types of nodes that Babel might encounter, that let us tell it what to do when it reaches them. If we click on alert here, we can see it's an identifier, but it's wrapped in a call expression, and this call expression is really what we're interested in. So we're going to create a visitor here called CallExpression. It's a function that receives a path every time a call expression is encountered.

The first thing we want to do here is rename. We take path.node, and if we look at the AST, the callee is what we want to change. So we're going to create a new identifier here, and I'm just going to call it console.error. Now we can see on the right that console.error has been applied to both of these calls, not just to alert, and that means we first need to filter what we're actually changing. I'm going to do what is called an early exit: I'm going to check whether we have an identifier as the callee and whether it has the name alert. And because I said we're going to do an early exit, I actually want to return if this is not the case. So if we don't have an identifier with the name alert (I forgot a closing parenthesis here), we return, and otherwise we change it to console.error.
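The plugin at this point in the demo can be sketched roughly as follows. This is a reconstruction from the description, not the code typed on screen. `t.isIdentifier`, `t.identifier`, and `t.memberExpression` are helpers from `@babel/types`, and the sketch builds a proper `console.error` member expression where the demo simply named a new identifier. The `fakeTypes` object and the hand-made path objects are hypothetical stand-ins so the visitor can be tried without installing Babel.

```javascript
// Sketch of the plugin so far: rewrite alert(...) calls to console.error(...).
// Babel (and AST Explorer) call this function with the Babel API object;
// `t` is its `types` helper for checking and building nodes.
function alertToConsoleError({ types: t }) {
  return {
    visitor: {
      CallExpression(path) {
        const { callee } = path.node;
        // Early exit: only calls whose callee is the identifier `alert` matter.
        if (!t.isIdentifier(callee, { name: 'alert' })) return;
        path.node.callee = t.memberExpression(t.identifier('console'), t.identifier('error'));
      },
    },
  };
}

// Hypothetical minimal stand-in for @babel/types, for illustration only.
const fakeTypes = {
  identifier: (name) => ({ type: 'Identifier', name }),
  memberExpression: (object, property) => ({
    type: 'MemberExpression',
    object,
    property,
    computed: false,
  }),
  isIdentifier: (node, opts = {}) =>
    Boolean(node) &&
    node.type === 'Identifier' &&
    (opts.name === undefined || node.name === opts.name),
};

// Hand-made "paths" for alert('hi') and console.log('hi').
const alertPath = {
  node: { type: 'CallExpression', callee: fakeTypes.identifier('alert'), arguments: [] },
};
const logPath = {
  node: {
    type: 'CallExpression',
    callee: fakeTypes.memberExpression(fakeTypes.identifier('console'), fakeTypes.identifier('log')),
    arguments: [],
  },
};

const { visitor } = alertToConsoleError({ types: fakeTypes });
visitor.CallExpression(alertPath);
visitor.CallExpression(logPath);

console.log(alertPath.node.callee.property.name);
console.log(logPath.node.callee.property.name);
```

In a real setup the function would be exported from a module and listed in the Babel configuration, or pasted into the transform pane of AST Explorer, and Babel itself would supply `types` and the paths.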
So now you can see that console.log stayed, but alert did change to console.error. This works even if we have this inside a function, for example. If we call alert here, we can see it's still adjusting that, so that's great. Similarly, if we break this up over multiple lines, it still works. We didn't have to work around any formatting things like you would have to when you write a regular expression, for example.

The problem comes when we do things such as redefining alert as a local function here that just throws an error, for example, or let's do return alert. This is just a local function that we defined, meaning that this alert call here is actually right. We want to keep it and not replace it with console.error. The cool thing with tools like Babel and others that are AST-powered is that we have a much better understanding of the code. This would be really hard to do with just regular expressions or tokenization, for example, but in the case of Babel we have access to the scope, and that means we can check for bindings: is there a binding for the name alert here? If there is, we just want to return. That means the alert inside this run function is never changed, while the global one is still being changed.

We can take this one step further. I don't like the fact that we still have an alert here, because that could cause some confusion in the output. So what we're going to do is rename alert inside the scope. What this does is change alert to a unique name that is still as close to alert as possible.
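The scope check and the rename can be sketched like this. Again this is a reconstruction, not the code from the demo. `path.scope.hasBinding` and `path.scope.rename` are methods on Babel's scope object, and calling `rename` with only the old name leaves the choice of a unique new name to Babel. The `fakeTypes` and `fakePath` helpers are hypothetical stand-ins, and the stubbed `rename` only touches the single call node, whereas Babel renames the declaration and every reference.

```javascript
// Sketch: leave locally defined `alert` functions alone (but rename them),
// and rewrite only calls to the global alert.
function alertToConsoleError({ types: t }) {
  return {
    visitor: {
      CallExpression(path) {
        if (!t.isIdentifier(path.node.callee, { name: 'alert' })) return;
        if (path.scope.hasBinding('alert')) {
          // A local binding shadows the global: keep the call, avoid confusion.
          path.scope.rename('alert');
          return;
        }
        path.node.callee = t.memberExpression(t.identifier('console'), t.identifier('error'));
      },
    },
  };
}

// Hypothetical minimal stand-in for @babel/types, for illustration only.
const fakeTypes = {
  identifier: (name) => ({ type: 'Identifier', name }),
  memberExpression: (object, property) => ({
    type: 'MemberExpression',
    object,
    property,
    computed: false,
  }),
  isIdentifier: (node, opts = {}) =>
    Boolean(node) &&
    node.type === 'Identifier' &&
    (opts.name === undefined || node.name === opts.name),
};

// Hypothetical path factory: `localNames` lists the bindings visible in scope.
function fakePath(calleeName, localNames = []) {
  const node = {
    type: 'CallExpression',
    callee: fakeTypes.identifier(calleeName),
    arguments: [],
  };
  const scope = {
    hasBinding: (name) => localNames.includes(name),
    // Stub only: mimics the simplest case, where the new name is `_` + old name.
    rename: (name) => {
      if (node.callee.name === name) node.callee.name = `_${name}`;
    },
  };
  return { node, scope };
}

const globalCall = fakePath('alert');
const shadowedCall = fakePath('alert', ['alert']);

const { visitor } = alertToConsoleError({ types: fakeTypes });
visitor.CallExpression(globalCall);
visitor.CallExpression(shadowedCall);

console.log(globalCall.node.callee.type);
console.log(shadowedCall.node.callee.name);
```

The ordering matters: the identifier check runs first so that unrelated calls exit cheaply, and the binding check runs before any rewriting so that a shadowed alert is never turned into console.error.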
The rename starts by putting an underscore in front. But if we, for example, define a variable here that is already called _alert, it would change it to _alert2, and similarly, if we had another one called _alert2, it would change it to _alert3, and so on. It will always try to avoid a clash in the respective scope in which we rename this. That's super useful, because we don't have to worry about creating clashing variable names as we're changing our code.

Similarly, we can modify the code entirely: if we see the global ones, we could just remove the node. There are a lot of different things you can do. You can insert other things, and you can modify things based on various rules. If you want to learn about all of the different things you can do, I recommend you check out the Babel plugin handbook, because even though it hasn't been edited for a while, there's a lot of really useful information in there on what you can do: how you can replace one node with multiple nodes, or how you can replace it with a source string if you don't want to build an entire tree by creating and combining additional nodes. So there are a lot of things you can do, and I recommend checking out that handbook if you're interested in exploring more.

All right, let's get back to the slides to wrap things up. In summary, there are a few things I want you to take away. One: you already use tokens and ASTs daily. They're in basically all of the tools that make your life easier. Tokens represent format, meaning how your code is written, and ASTs represent the structure. They don't care whether you used curly braces around your if statement or wrote it on one line. That also makes ASTs easier to manipulate, because you don't have to worry about which parts are code syntax and which parts are actual statements, and it makes them safer for code alterations, because you can just take out a node and put a new node in, rather than having to check whether you're removing the right number of elements from an array or the right parts of a string.

ESTree is there for interoperability. A lot of different tools use ESTree as the foundation of their AST, and that means you don't have to reinvent the wheel. There are a lot of tools to create ASTs, a lot of tools to walk ASTs, and a lot of tools to render ASTs, and you can use all of them to build either your own tools that you might share with the community, or just internal things that are going to be useful for you as you're building your apps. That means you can replace things like grep or sed when you're modifying your code.

If you want to learn more about ASTs, I recommend you check out AST Explorer and just play around with it. Put some code in, try to figure out what you're trying to change, click on things, because it will highlight the different parts of the AST, and play around with different parsers to see what kind of information they give you. There are a lot of fun things to do. And then read the Babel plugin handbook, even if you're not planning to build your own Babel plugin, if you want to learn more about ASTs.
It's a great thing to read. The nice thing is also that all of the things I showed you are open source, which means that now that you know what ASTs are, you can look for them in the different tools that you're using, or in different code bases, to see what's actually happening under the hood. I also wrote a blog post about this topic, if you'd rather read up on it in that format. It covers a lot of the same things that I covered in this talk.

With that, if you want to check out the slides, I uploaded them at the URL on the left, and I'll also tweet about them later. If you have any questions, feel free to reach out to me. As I said, you can reach me anywhere on the internet at dkundel, so feel free to send me an email or a DM on Twitter. With that, I'd like to thank you all for your attention. Have a great day.