to the compiler, specifically so that I can tell you about how you can use the compiler to make even better tools.

So why do I want to do this? Well, I think tools are really important. They're really important for any language's ecosystem, because why do you use a new language? Well, you can use it because it's cool and it's fun and it's something new. But after a while you get down to the essentials, and you use the language because it makes you more productive. How productive a language is isn't just about the language itself, or the libraries it has and the ecosystem and so forth. The tools really help. You can have the best language in the world, but if you have to compile it by hand, you're going to be really slow writing it. So good tools can make a good language super productive, lots of people will want to use it, and that's great.

Rust has some good tools at the moment. Things like Cargo and debugger support are great, but I think there's scope for making lots more tools and making some really good ones. I think it's a really interesting thing to hack on, and I think it's somewhere you can have a really big impact on a lot of people's lives by doing this work.

So to start with, I want to give you a little demo of a few tools. This is going to be fun, after I switch to a different window and a totally different resolution and so forth. The first tool I want to demonstrate is rustfmt. So here's some code. It's perfectly good code, but it's kind of ugly. There are a few more newlines that you might like to have. This is confusing, having it all on one line; I can't read that very well. So let's run rustfmt. It runs and, hey, now it's beautiful. I could have done that by hand, but it's kind of boring and I probably would have missed a bit, and that's no fun at all. So we have a tool, it does it for you, it's fantastic.
The next tool is a lint. It's an extra lint: there are some lints in the compiler, but you can have extra lints too. A lint is a kind of check, like the checks the compiler does, but it's not essential for the correctness of your program or the soundness of the language. It might point out where you've done something that might be right but is probably a mistake, or maybe it's not a mistake but you could have done it better. The lint I'm going to show you is part of rust-clippy, which is a big suite of lints, and this one is going to look at this if statement. This is a typical beginner's mistake; new programmers might do this a lot. It's not necessary, right? It's perfectly legal Rust code, but you don't really need it. And we've plugged in the lint, which means we've plugged in the whole suite of lints. So when I build this, it's going to tell me I don't need that: I can just use the predicate, I don't need the whole if statement. I'm going to apologize for this resolution, this is really, okay, that's too much, okay.

So the final tool I want to show you just now is a tool called DXR. This is a tool for navigating your code and searching your code, but it can do a much better job than you can with, say, grep, or searching on GitHub or whatever. So, okay, I'm going to search for some string. Let's search for this. It's just going to find all the uses. Okay, not very impressive, I can do the same with grep or something, right?
But why don't I find all the implementations of a trait? Well, here are the three implementations. Okay, let's go for one of these. Let's find something interesting down here. Well, what's this search? Let's hover over it and it will tell me the type. Let's jump to the definition of that, and it's going to find, yeah, the highlighting seems to be broken at the moment, that's fine, but it's gone to the right place, right? Line 1048, there it's going to show the field search. So I can just jump to definitions like this. Hopefully this is something that's going to make understanding a new code base, or a large code base that you work with often, much easier.

Okay, so hopefully that has whetted your appetite a bit about tools. But I promised you an introduction to the compiler, and I'm going to do that right now. It's going to be quite high level. The compiler is a big topic and I can't cover much of it, so if you've already hacked on the compiler, you're going to find this very dull, and I don't know what to do about that.

So at the highest level, a compiler is just a bit of software which takes your source code and translates it to machine code, which is just code that can be executed on your machine. Of course, this being Rust, most of the time it translates it into a whole bunch of borrow check errors, but sometimes you'll get machine code. The Rust compiler, like most compilers, can be broken at a very high level into these three phases. First of all we have parsing and expansion. This is basically how you get from source code that you understand to a representation that the rest of the compiler can understand. Then we have the analysis phase, and this is the bulk of the compiler. It analyzes the code, builds up a whole bunch of information about it, checks it for errors, and reports any errors it finds.
Then finally, code generation uses the information we built up during analysis and generates that machine code I was talking about earlier. I'm going to drill down a bit into each of these phases, starting with parsing and expansion.

So parsing: like I say, you start with the source code, and then we gradually refine what we have with more and more information until we end up with what's on the right side of the slide. We call it the abstract syntax tree, the AST. This is a representation of the source code in a tree format, with the crate root at the top, and every item, every expression and sub-expression is a node in that tree. Information about each node is there, such as the name of a function, and then we have child nodes for things like the arguments and so forth. This is the representation used through the rest of the compiler.

The expansion bit of parsing and expansion is about manipulating that AST that we got from your source code. So for example, if you've got these configuration attributes, then if you don't meet the predicate in this cfg attribute, we're just going to lop off a whole sub-tree of the AST and ignore it forevermore. If you've got macros in your code, then we're going to take the bit of the AST which is the use of the macro and expand it using the definition of the macro. Likewise with procedural macros, or syntax extensions, whatever you want to call them. And there's also some desugaring of language constructs, where we translate at the AST level from some higher-level concepts into more primitive ones. So for example, with if let, we translate from an AST node that looks like an if let expression into a match expression. And there are a few other places where we do that.

Okay, analysis. Analysis is where the compiler gets somewhat complicated.
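Stepping back to expansion for a moment, the two transformations mentioned above can be illustrated at the source level. This is only a sketch: the function names are made up for illustration, and the `match` shown is the conceptual equivalent of the `if let`, not the exact output the compiler produces internally.

```rust
// The same logic written twice: once with `if let`, and once with the
// `match` expression it is conceptually equivalent to.
// Both function names are hypothetical.
fn describe_if_let(value: Option<u32>) -> String {
    if let Some(n) = value {
        format!("got {}", n)
    } else {
        String::from("nothing")
    }
}

fn describe_match(value: Option<u32>) -> String {
    match value {
        Some(n) => format!("got {}", n),
        _ => String::from("nothing"),
    }
}

// Configuration attributes: only one of these two definitions survives
// expansion. The other one is removed from the AST before analysis runs,
// so later phases never see it.
#[cfg(target_os = "linux")]
fn platform_name() -> &'static str {
    "linux"
}

#[cfg(not(target_os = "linux"))]
fn platform_name() -> &'static str {
    "something other than linux"
}
```

Calling `describe_if_let` and `describe_match` with the same argument gives the same string, and `platform_name` compiles without a duplicate-definition error because only one copy is left after the cfg predicates are evaluated.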
And so I'm just going to highlight some of the more interesting parts of analysis. First of all is name resolution. We've got this AST, but we've still just got chunks of text. You remember we had the name of that function, but what does that really mean? Well, name resolution is about matching the uses of names with the declarations of names. So for example, we use bar on the second line here, so we match that with where we first declared bar. This gets a bit more complicated because we've got scopes, we've got use statements, imports, and so forth, but this is essentially all name resolution does. And it happens very early on in the analysis phase.

After that, type checking is another important step in the analysis phase. Type checking is about checking that where you expect to get a value of some type, you will in fact get a value of that type. So if you write a function that takes a string, you're only ever going to pass strings to that function. Type checking includes inference. What that means is that for every expression and every sub-expression in Rust, we infer a type, even when it has not been annotated, which luckily we don't have to do very often in Rust.

Part of type checking is trait resolution. It's probably one of the more interesting parts of type checking, and something that's quite unique to the Rust compiler compared with a lot of other languages. Trait resolution is essentially about answering the question: if I have a method call, like I want to call foo here, what's the actual concrete implementation of foo that gets called? The reason this is different from name resolution, which I talked about earlier, is that it doesn't just depend on scopes and imports and stuff. It actually depends on the type of X, which we only know about during type checking.
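As a small illustration of the inference described above, and of why picking a method implementation has to wait for it, here is a minimal sketch. The trait `Describe` and both functions are hypothetical names chosen for this example, not anything from the talk's slides.

```rust
// No local variable below carries a type annotation; the compiler infers
// every one of them during type checking.
fn sum_bytes() -> u32 {
    let mut values = Vec::new(); // element type is not known yet
    values.push(1u8); // now the vector is inferred to be Vec<u8>
    values.push(2); // so this literal is inferred to be a u8
    // The result type of `sum` is inferred from the function's return type.
    let total = values.iter().map(|&b| u32::from(b)).sum();
    total
}

// A hypothetical trait with two implementations that share a method name.
trait Describe {
    fn describe(&self) -> &'static str;
}

impl Describe for u8 {
    fn describe(&self) -> &'static str {
        "a byte"
    }
}

impl Describe for String {
    fn describe(&self) -> &'static str {
        "a string"
    }
}

fn describe_first() -> &'static str {
    let values = vec![7u8];
    let first = values[0]; // inferred to be u8
    // Which `describe` runs depends on the inferred type of `first`,
    // not just on which names are in scope.
    first.describe()
}
```

Here `sum_bytes` returns 3 and `describe_first` returns "a byte"; changing the literal in `vec![7u8]` to a `String` value would select the other implementation without any change to the call itself.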
So for example, if we know the fully concrete type of X, then we might be able to find exactly one concrete implementation, and we can just make a really simple static call to this implementation of foo right here. More interestingly, we might only have a bound on the type of X. So for example, we have a trait, and if you remember, a trait is a bit like an interface: it usually doesn't define an implementation for any of its methods. Then at this point in the analysis phase we can't actually decide which implementation is going to be used; we only know the more abstract declaration of the method. So we remember this stuff, and later on, in the code generation phase, we'll be able to work it out. Similarly, if you've got a trait object, again you only know the trait, and therefore only the abstract declaration of a method. But this time we're not going to know until runtime what we're going to call, so we have to generate code, a vtable, which allows us to dynamically dispatch to the right method at runtime.

Okay, the final chunk of analysis that I want to call attention to is borrow checking. This is really important because it ensures memory safety in Rust, which is something we all care about, and is why you might be using Rust in the first place. It's also a really interesting, subtle bit of code, which I have no time to explain, so we'll just have to continue. Just believe it's magic and you'll be pretty close to the truth, right?

Okay, code generation, the final phase of the compiler.
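Before moving on, the three trait resolution cases described above can be sketched in a few lines. The trait `Shape`, the two structs and the function names are all assumptions made for illustration; the talk's slides used a method called foo on a value X.

```rust
// A hypothetical trait with two implementations.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Case 1: the concrete type is known, so the call can be resolved to
// `<Square as Shape>::area` during analysis and becomes a plain static call.
fn call_static(square: &Square) -> f64 {
    square.area()
}

// Case 2: only a bound is known. Analysis sees the abstract declaration;
// the concrete implementation is picked later, once per type the function
// is instantiated with.
fn call_generic<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Case 3: a trait object. The implementation is only known at runtime, so
// the call goes through a vtable (dynamic dispatch).
fn call_dynamic(shape: &dyn Shape) -> f64 {
    shape.area()
}

// Trait objects allow mixing different concrete types in one collection.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```

For example, `call_generic(&Rect(2.0, 3.0))` and `call_dynamic(&Rect(2.0, 3.0))` both return 6.0; the difference is only in when the compiler learns which `area` to call.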
Most of code generation is actually handled by LLVM. So as far as we're concerned, we can think of code generation as a translation phase where we take the AST again, and the information we acquired during analysis, and translate it into the representation that LLVM wants, the LLVM IR. That's the LLVM dragon there. Then LLVM does all the rest of the hard work to turn out the noughts and ones that you can actually execute on the machine. And that's the compiler, right? So you all understand the compiler, and I expect your PRs to start tomorrow.

So, back to tools. I want to define what I mean by tooling, and I want to really focus on what I think of as productivity tools. I mean that in contrast to the essential build tools that you have to use in order to use Rust, such as the compiler and the linker, basically your toolchain. Take the debugger as an example of a productivity tool. You could work without the debugger, right? You have a bug and you can solve it just by staring intently at your code, or hitting your head on the keyboard, or with print statements, whatever. But a lot of the time using a debugger makes you much more efficient. You can solve your bugs much quicker, so it's a boost in productivity. The lints that I showed you earlier are another way to help debug your code. Another kind of tool is one that helps you understand the code. I showed you DXR, which helps you navigate and search the code. Visualizations are also in this category: if you want to visualize, say, the dependency graph of your program, or the call graph, tools can help you do that and help you understand the code.
Another category is automating trivial or tedious tasks, like the rustfmt that I showed you, or refactoring tools. These are all things that you can do yourself, but they're boring and they take a long time, and the machine can do them for you very quickly and usually just as well.

Another way to classify these tools is by how they're used or how they're implemented. One thing you can do is extend the compiler, and there are a few ways you can extend the Rust compiler. Adding a lint is one way, which I showed you earlier. More familiar for a lot of you will be adding a syntax extension to extend the language at the syntax level. You can also plug in extra LLVM passes. And at the limit you can fork the compiler: you can write your own code that deeply integrates with it and produce a new tool that way.

Alternatively, you can make a standalone tool. It could be really standalone. For example, if you just want to count the number of lines of code in your project, you don't need to talk to the compiler. You just open all the files and count the number of lines, right? Or sometimes you want to ask the compiler for some more output. For example, a debugger is going to ask the compiler to generate debug info as well as the executable. DXR asks for a whole bunch of metadata from the compiler. These kinds of tools can then do some processing offline and do their job, whatever that job is.

Also in this category, and probably the least well-known approach to implementing a tool, is to use parts of the compiler as a library. We want information from the compiler. The compiler knows a lot about your program, right? So we want to extract that information without having to compute it ourselves, because there's no point in duplicating all the subtle and complex machinery in the compiler.
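The fully standalone end of that spectrum, the line counter mentioned above, needs nothing beyond the standard library. The sketch below is one possible shape for it, with made-up function names; it counts every line in every `.rs` file under a directory, including blank lines and comments, and makes no attempt to handle symlink cycles.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Counts the lines in one piece of source text.
fn count_lines(source: &str) -> usize {
    source.lines().count()
}

// Recursively sums the line counts of all `.rs` files below `dir`.
// No compiler involvement at all: just open the files and count.
fn count_lines_in_dir(dir: &Path) -> io::Result<usize> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            total += count_lines_in_dir(&path)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            total += count_lines(&fs::read_to_string(&path)?);
        }
    }
    Ok(total)
}
```

A caller would pass the project's source directory, for example `count_lines_in_dir(Path::new("src"))`, and handle the `io::Result`. Anything smarter than this, such as skipping comments correctly or counting functions, is where asking the compiler for help starts to pay off.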
And I want to demonstrate to you how easy this is, okay? To do so, I'm going to walk through the implementation of another tool, which is for drawing a call graph. It's going to draw a call graph like this. This is the regex crate, and I'm going to zoom in. Each node in this graph is just a function or a method, and we draw an arrow for every function call, from the caller to the callee. The dotted arrows are virtual method calls, where we don't know exactly what method is called while we're compiling, but we know at least one of these methods is going to get called at runtime. I hope that implementing this is going to be surprisingly easy for you, okay? It turns out that implementing quite a nice, useful tool is actually quite easy. So let me show you how it's done, if I can remember. Okay, so how's that?

Okay, so at the very highest level, the way this tool works is that we basically want to emulate what the compiler does. We want to set up with all the command line options and environment variables the compiler would usually run with, so we're running in the same kind of environment, and then we want to do what the compiler does. We want to do the parsing and expansion and the analysis phase just like the compiler does, and then we want to stop and do our thing. Now, you could write all this code out yourself, but that would be a lot of work and you'd probably get it wrong. Luckily the compiler lets us get it to do this stuff as a service for us. There's a little bit of code called the driver, which coordinates setting up the environment for compilation, then running each phase of compilation in turn and sending the right data between phases, and there are a few APIs you can use for interacting with this process. So at the highest level, we've got this: there's the driver, and we want to run the compiler.
It does exactly what it says on the tin. The args are just the command line arguments, and this call graph calls value is just the way that we interact with it at the top level. The compiler has this trait, CompilerCalls, which we're going to implement. Like I say, we want to emulate the compiler. We want to do pretty much what the compiler does, so we don't actually need to override most of the functionality that the trait already provides for us. The only thing we're going to do is override this build_controller method, which, there it is, creates a compile controller, which gives us more fine-grained control over compilation.

So I'm going to show you here what we're going to do. What did I say we want to do? We're going to run up until the end of analysis and then we want to stop. So this is how we tell the compiler that we want to stop at this stage; we don't need to carry on to code generation. Then we want to do our thing, and we get an opportunity to do our thing here, where we set a callback that gets called after analysis. To that callback the driver is going to send as much state about the compiler's internals as it possibly can at that stage.

Again at a high level, what we want to do is find every function that's defined in the project we're compiling, and every function call, and record some information about these things. Because of the methods, we're going to need to do a little bit of post-processing, and then we're just going to dump it out in a format that Graphviz will understand. The way we're going to find every single function definition and every single function call is that we're going to walk the AST, which I talked about as being the product of parsing, and whenever we find a function call or a function, we're going to record it.
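The shape of that interaction with the driver, stopping after analysis and getting a callback at that point, can be modelled in a few lines. To be clear, this is a toy model and not the rustc driver API: `Phase`, `Controller`, `CompileState` and `run_phases` are all made-up names, and the "phases" here do no work at all.

```rust
// The compilation phases from the overview, in order.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Parse,
    Expand,
    Analyse,
    CodeGen,
}

// Whatever state the (toy) driver is willing to show to a callback.
struct CompileState {
    completed: Vec<Phase>,
}

// Hypothetical stand-in for a compile controller: where to stop, and what
// to call once analysis has finished.
struct Controller<'a> {
    stop_after: Option<Phase>,
    after_analysis: Option<Box<dyn FnMut(&CompileState) + 'a>>,
}

// Toy driver: runs each phase in turn, invokes the callback after analysis,
// and stops early if the controller asks for it.
fn run_phases(controller: &mut Controller<'_>) -> Vec<Phase> {
    let mut state = CompileState { completed: Vec::new() };
    for phase in [Phase::Parse, Phase::Expand, Phase::Analyse, Phase::CodeGen] {
        state.completed.push(phase); // a real driver would do the work here
        if phase == Phase::Analyse {
            if let Some(callback) = controller.after_analysis.as_mut() {
                (*callback)(&state);
            }
        }
        if controller.stop_after == Some(phase) {
            break; // never reach code generation
        }
    }
    state.completed
}
```

A tool in this model sets `stop_after` to `Some(Phase::Analyse)` and puts its own logic in the `after_analysis` closure, which mirrors the description above: let the compiler do everything up to the end of analysis, then take over.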
So this is what it looks like. This is us walking the AST here, and I'm going to go into that in a second. Then we do our little bit of post-processing, and then we dump it out in a format that Graphviz can understand.

So this is how we walk the AST. The libsyntax crate gives us a Visitor trait that we can implement. This is a classic visitor pattern, which just walks down the AST and gives us an opportunity, if we want, to do something at each node in the AST, and if not it just keeps on walking. That's exactly what we want: we want to walk the whole AST and just stop in certain circumstances. So for example, if we get to an expression then we might want to do something, and in fact we want to check if it's a method call, and if it is then we're going to record that. I'll admit at this stage that you need to know a little bit about what the AST looks like to know what you want to do here. But hopefully understanding the AST is not too bad. You don't need to understand how source code is parsed to make that AST, or how it's processed later in the compiler.

So once we know we're at a node which is a method call, how do we get the information out of the compiler? Well, there's an API in the compiler called the save-analysis API. It's called that because you can pass -Z save-analysis to get all the information it knows, but we're not going to do that. We're going to say, give me all the data you have about this expression. So we pass the AST node to that, and it gives us back a whole bunch of data.
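The visitor pattern described above can be shown on a miniature AST. Everything in this sketch is hypothetical: the `Expr` enum is a tiny stand-in for the compiler's real expression nodes, and `Visitor`, `walk_expr` and `MethodCallRecorder` only imitate the general shape of such an API, not its actual definitions.

```rust
// A miniature expression tree; the real AST has many more node kinds.
enum Expr {
    Literal(i64),
    Call { function: String, args: Vec<Expr> },
    MethodCall { receiver: Box<Expr>, method: String, args: Vec<Expr> },
}

// The default method just keeps walking, so an implementor only overrides
// the node kinds it cares about.
trait Visitor {
    fn visit_expr(&mut self, expr: &Expr) {
        walk_expr(self, expr);
    }
}

// Visits every child expression of `expr`.
fn walk_expr<V: Visitor + ?Sized>(visitor: &mut V, expr: &Expr) {
    match expr {
        Expr::Literal(_) => {}
        Expr::Call { args, .. } => {
            for arg in args {
                visitor.visit_expr(arg);
            }
        }
        Expr::MethodCall { receiver, args, .. } => {
            visitor.visit_expr(receiver);
            for arg in args {
                visitor.visit_expr(arg);
            }
        }
    }
}

// Records the name of every method call it encounters.
struct MethodCallRecorder {
    methods: Vec<String>,
}

impl Visitor for MethodCallRecorder {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::MethodCall { method, .. } = expr {
            self.methods.push(method.clone());
        }
        // Keep walking so that nested calls are found as well.
        walk_expr(self, expr);
    }
}
```

Running a `MethodCallRecorder` over the tree for something like `1.foo(bar(2.baz()))` records `foo` and then `baz`, skipping the plain function call `bar`. In the real tool the recorder would ask the compiler for data about each node instead of just storing a name.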
We expect method call data, because we know it's a method call, and we record that method call data for later. I'm just going to show you what this data looks like. These are just structs; it's not very interesting. You get integer IDs which you can cross-reference, and you get strings for names and types and stuff. It doesn't expose the compiler's internals, but it gives you data that's hopefully relatively easy to work with. And we're just going to record it. We record all the static calls and we record all the dynamic calls. The data tells us, perhaps not particularly obviously, but you can work out which one's which. Then we do the post-processing, and then we dump it out, Graphviz does its thing, and you end up with a call graph.

Okay, so I don't expect you to have followed that entirely, but hopefully it gives you an idea that it's not rocket science to do this. You don't need to understand the compiler's internals too deeply, and this is a useful tool. It's not perfect, it's kind of a toy, but it's something like three or four hundred lines of code. So if you want to know more, have a look through that. I'll put my slides up to download so you can find the URLs for all this stuff.

So, okay, hopefully that's got you a little bit excited about tooling, and the things that are possible to do in Rust, and how we can make the world a better place with all these tools, and you now want to get involved. I've selected a few tools that are around; I'm sure they'd appreciate contributions. I'd also encourage you to think of tools that would help you day to day, to create new tools and come up with exciting things. That would be fantastic. If you've got ideas, or you want more help or more information about how to use the APIs I've talked about today, how to interact with the compiler in general, or good projects to get involved with, anything, please just ping me on whatever medium you prefer and hopefully I can help you out. Thanks for staying awake till the end of my talk. That's all, thanks.
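As a closing sketch of the last step of the call graph tool described above, this is roughly what dumping recorded calls in a format Graphviz understands could look like. The `Call` struct and `to_dot` function are assumptions made for illustration and are not taken from the tool itself; names are written out unescaped, so a name containing a double quote would need extra handling.

```rust
use std::fmt::Write;

// One recorded call edge; `dynamic` marks calls made through a trait object.
struct Call {
    caller: String,
    callee: String,
    dynamic: bool,
}

// Renders the recorded calls as a Graphviz DOT digraph: one arrow per call,
// dotted for dynamically dispatched calls.
fn to_dot(calls: &[Call]) -> String {
    let mut out = String::from("digraph callgraph {\n");
    for call in calls {
        let style = if call.dynamic { " [style=dotted]" } else { "" };
        // Writing into a String cannot fail, so unwrap is fine here.
        writeln!(out, "    \"{}\" -> \"{}\"{};", call.caller, call.callee, style).unwrap();
    }
    out.push_str("}\n");
    out
}
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tsvg callgraph.dot -o callgraph.svg`.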