So, thank you for being here. I'm Federico Tomassetti, and I build parsers and languages for a living. Today I'm going to talk about JavaParser, which you can use to generate, analyze and refactor Java code. We also have the maintainer in the first row, so if you have a very difficult question, ask him.
JavaParser is a set of libraries that you can use for all sorts of things related to code: code generation, code analysis and also code refactoring. We have a growing user base, but we have problems getting feedback from our users to understand which kinds of usage we should focus on. So the idea is that we are trying to talk with the community and get feedback. If you have ideas or suggestions, please talk to us here, on GitHub.
Please silence all of your phones. Yeah.
Okay, so what does JavaParser do? Surprise, surprise, JavaParser is a parser. That means you can throw some code at it, in a file, in a string, whatever, and it gives you back an abstract syntax tree. Probably no one is surprised. But it can also do the opposite: you give it an abstract syntax tree and it gives you back code. If you combine these two features, you can start with some code, get the abstract syntax tree, work on it, change it, and then go back to code. So you can refactor existing code.
Now, a parser is not enough to solve complex problems, and this is why we also have a symbol solver. Here are some examples of why you need this kind of thing. If you look at these two statements, the abstract syntax tree for them looks exactly the same, because basically you are assigning the value one to something that is called foo. But the abstract syntax tree doesn't know what foo is. It has no idea that in one case this is a parameter and in the other case it is a field.
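The slide with the two statements is not part of this transcript. A minimal sketch of the situation being described, using a made-up class `FooDemo` (the class and method names are assumptions for illustration), could look like this. Both method bodies contain the same statement, `foo = 1;`, yet one writes to a parameter and the other to a field.

```java
// Hypothetical class, invented only to illustrate the parameter-versus-field case.
class FooDemo {
    int foo = 0; // a field named foo

    // Here `foo` is the parameter, so the assignment only changes the local copy.
    void assignToParameter(int foo) {
        foo = 1;
    }

    // Here no parameter or local variable is called foo, so `foo` is the field.
    void assignToField() {
        foo = 1;
    }

    public static void main(String[] args) {
        FooDemo demo = new FooDemo();
        demo.assignToParameter(42);
        System.out.println(demo.foo); // prints 0: the field was not touched
        demo.assignToField();
        System.out.println(demo.foo); // prints 1: the field was updated
    }
}
```

The text of the two assignments is identical, so a tree built from syntax alone cannot tell them apart. Only the surrounding declarations decide what `foo` means.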
So these are two different foo, and the two statements do different things. Or, for example, if you have two separate method calls that call two different methods, two methods that happen to have the same name but are different, the abstract syntax tree looks exactly the same. So you need some additional logic, symbol resolution, to distinguish these two cases. Finally, you could have the same problem with types. Here I'm instantiating two different classes, an inner class or a local class, but the statement per se and the corresponding abstract syntax tree are exactly the same.
So what does the symbol solver do? You parse a piece of code, you get the abstract syntax tree, and the symbol solver is able to associate each reference with the corresponding declaration. Here I'm referring to type D: the symbol solver will be able to examine this reference and figure out that it refers to the class D in the same package, somewhere else, in another file.
The integration of the symbol solver with JavaParser works in this way. You take nodes of the abstract syntax tree, and you have some methods that require symbol resolution and some methods that work all the time. For example, if I have a node of the abstract syntax tree that represents a method call, I can always get the name, because the name is in the code itself. I have this information directly in the abstract syntax tree, so this also works if the JavaSymbolSolver is not enabled.
Well, if you want, stay closer to your laptop. Okay.
If instead you want to do something more advanced, like calculating the type resulting from the expression, you need to use the JavaSymbolSolver, because to understand the resulting type of the method call you need to find the definition of the method that you are calling and look at the return type over there. If you want to use symbol resolution, basically you need to give it one piece of information: where to look for classes.
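As a rough sketch of the difference between the two kinds of methods, the following assumes a JavaParser 3.x release with the `javaparser-symbol-solver-core` artifact on the classpath. The class `ResolveDemo` and the sample source are made up for illustration. Only classes from the standard library are looked up here, through reflection.

```java
import com.github.javaparser.JavaParser;
import com.github.javaparser.ParserConfiguration;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.MethodCallExpr;
import com.github.javaparser.symbolsolver.JavaSymbolSolver;
import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver;
import com.github.javaparser.symbolsolver.resolution.typesolvers.ReflectionTypeSolver;

// Hypothetical helper class, not part of JavaParser.
class ResolveDemo {
    // Returns {method name, resolved result type} for the first method call in the source.
    static String[] describeFirstCall(String source) {
        // Tell the symbol solver where to look for classes.
        // Here: only the standard library, via reflection.
        CombinedTypeSolver typeSolver = new CombinedTypeSolver();
        typeSolver.add(new ReflectionTypeSolver());

        ParserConfiguration config = new ParserConfiguration()
                .setSymbolResolver(new JavaSymbolSolver(typeSolver));
        CompilationUnit cu = new JavaParser(config).parse(source).getResult()
                .orElseThrow(() -> new IllegalArgumentException("source does not parse"));

        MethodCallExpr call = cu.findFirst(MethodCallExpr.class)
                .orElseThrow(() -> new IllegalArgumentException("no method call found"));

        // Always available: the name is written in the code itself.
        String name = call.getNameAsString();
        // Needs symbol resolution: the declaration of the called method must be found.
        String type = call.calculateResolvedType().describe();
        return new String[] {name, type};
    }

    public static void main(String[] args) {
        String source = "class A { int size(String s) { return s.length(); } }";
        String[] result = describeFirstCall(source);
        System.out.println(result[0] + " returns " + result[1]);
    }
}
```

Calling `calculateResolvedType()` on a tree parsed without a configured symbol resolver fails, while `getNameAsString()` keeps working.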
So you can tell it to look into some source directories or some jar files, or, for classes from the standard library, you may want to use reflection. This was a brief tour of the main features. Now let's look at some use cases.
The first use case is supporting code generation. People may want to generate code because they have to write a lot of boilerplate code. Let's imagine you have to write one class for each of your database tables, something pretty boring, or maybe you want to build some transpiler. For example, I build DSLs for a living, so I build these higher-level languages and then translate them to Java. I could use JavaParser to generate the Java code.
The way you can use JavaParser for this goal is pretty simple. You create a compilation unit, which represents an entire file. You set the package declaration, then for example you create a class inside this file. You add a couple of fields to this class. Then, if you want to add a constructor, you specify that it's public and that it takes a couple of parameters, and then basically you assign these parameters to the corresponding fields. So it's pretty easy, maybe a bit long to write, but pretty easy. And then maybe you add a couple of getters. So again, you create a method named, for example, getTitle, and this method will contain just one return statement, so you just return the value of the field. Then in the end you call toString on the whole compilation unit and you get your source code. So this use case is very simple: you just use the API to build an abstract syntax tree, call toString, and you get the code. There is not much more to say about this.
There are more advanced usages, like for example code analysis. You may want to do this because you want to calculate some metrics on your code, or you want to ensure that some quality standards are met, or maybe you want to do some simple one-off queries, maybe to familiarize yourself with your code base. I have a few examples of this.
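Going back to code generation for a moment: the steps listed above (compilation unit, package, class, fields, constructor, getter, toString) can be sketched roughly as follows. This assumes a recent JavaParser 3.x release, where modifiers are written as `Modifier.Keyword`; the release shown in the talk may have differed. The package `library`, the class `Book` and its fields are invented for illustration.

```java
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.Modifier;
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration;
import com.github.javaparser.ast.expr.AssignExpr;
import com.github.javaparser.ast.expr.FieldAccessExpr;
import com.github.javaparser.ast.expr.NameExpr;
import com.github.javaparser.ast.expr.ThisExpr;
import com.github.javaparser.ast.stmt.BlockStmt;
import com.github.javaparser.ast.stmt.ReturnStmt;

// Hypothetical generator class, not part of JavaParser.
class GenerateBook {
    static String generate() {
        // A compilation unit represents an entire file.
        CompilationUnit cu = new CompilationUnit();
        cu.setPackageDeclaration("library");

        // A class with a couple of private fields.
        ClassOrInterfaceDeclaration book = cu.addClass("Book");
        book.addField("String", "title", Modifier.Keyword.PRIVATE);
        book.addField("String", "author", Modifier.Keyword.PRIVATE);

        // A public constructor that assigns its parameters to the fields.
        book.addConstructor(Modifier.Keyword.PUBLIC)
                .addParameter("String", "title")
                .addParameter("String", "author")
                .setBody(new BlockStmt()
                        .addStatement(assignField("title"))
                        .addStatement(assignField("author")));

        // A getter containing a single return statement.
        book.addMethod("getTitle", Modifier.Keyword.PUBLIC)
                .setType("String")
                .setBody(new BlockStmt()
                        .addStatement(new ReturnStmt(new NameExpr("title"))));

        // Printing the tree gives back source code.
        return cu.toString();
    }

    // Builds the expression `this.<name> = <name>`.
    static AssignExpr assignField(String name) {
        return new AssignExpr(
                new FieldAccessExpr(new ThisExpr(), name),
                new NameExpr(name),
                AssignExpr.Operator.ASSIGN);
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

The verbosity mentioned in the talk is visible here: each statement of the generated class becomes an explicit node in the builder code.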
Suppose that you want to figure out how many methods take more than three parameters in your libraries. I have parsed all the source code, so I have a bunch of compilation units contained in this list, allCus, and then I am using these helper methods to find all the method declarations among all these compilation units. Once I have this list of method declarations, I just start a stream, filter them by the number of parameters and then print the result. I have run these queries on a simple project called Hamcrest, and in this case I can figure out that in Hamcrest there are eleven methods taking more than three parameters. So maybe I want to take a look at them and refactor them.
Another query, a bit more complex: maybe I want to figure out the top three classes with the most methods. So again, I take all the class or interface declarations. In JavaParser we have one type of node that can represent classes or interfaces, and we distinguish them by looking at a flag. So in this case we take all the nodes of this type, we throw away the interfaces so we have just the classes, we look at the number of methods, we sort them in descending order and we take the first three. And so in this case we figure out that CoreMatchers has 35 methods, Description has 13 and IsEqual has 9.
Now, in this case we didn't use symbol resolution. So getMethods here means only the methods that are declared on the class itself. We are not considering the methods that were inherited, because to do that we would need to jump to the parent, jump to the implemented interfaces and look over there.
We use symbol resolution in this next example instead. In this case we are asking: what is the class with the most ancestors? That is, the most superclasses and interfaces that were extended or implemented, recursively. So in this case I again take all the classes, then I call resolve to get the resolved version of the class declaration.
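The first two queries above, the ones that need no symbol resolution, might look roughly like the sketch below. It assumes a JavaParser 3.x release that provides `StaticJavaParser`. The class `Queries`, its method names and the sample source are invented for illustration, and the slides may have used different helper methods.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration;
import com.github.javaparser.ast.body.MethodDeclaration;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical query helpers, not part of JavaParser.
class Queries {
    // How many methods take more than `limit` parameters?
    static long methodsWithMoreParamsThan(List<CompilationUnit> allCus, int limit) {
        return allCus.stream()
                .flatMap(cu -> cu.findAll(MethodDeclaration.class).stream())
                .filter(m -> m.getParameters().size() > limit)
                .count();
    }

    // Names of the `top` classes (interfaces excluded) declaring the most methods.
    static List<String> classesWithMostMethods(List<CompilationUnit> allCus, int top) {
        return allCus.stream()
                .flatMap(cu -> cu.findAll(ClassOrInterfaceDeclaration.class).stream())
                .filter(c -> !c.isInterface())
                // getMethods() only sees methods declared on the class itself.
                .sorted(Comparator.comparingInt(
                        (ClassOrInterfaceDeclaration c) -> c.getMethods().size()).reversed())
                .limit(top)
                .map(c -> c.getNameAsString())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up source standing in for a real code base.
        String source = "interface Shape { double area(); double perimeter(); double x(); }\n"
                + "class Small { void a() {} }\n"
                + "class Big { void a() {} void b(int p, int q, int r, int s) {} void c() {} }";
        List<CompilationUnit> allCus = List.of(StaticJavaParser.parse(source));
        System.out.println(methodsWithMoreParamsThan(allCus, 3));
        System.out.println(classesWithMostMethods(allCus, 3));
    }
}
```

Because the whole query is an ordinary Java stream, changing the threshold or the ranking criterion is a one-line edit.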
That resolved declaration comes with references to the parents, so I can jump into those declarations and navigate them. In this case I sort by the number of all ancestors, again in descending order. And for example I can figure out that StringContains is the one with the most ancestors, because it extends or implements a bunch of classes. So this is just to give you a rough idea of the kind of queries you can write.
Now, another use case that I'm very interested in is code refactoring. The idea is that you might have to modernize a large code base: you inherit a project written with Java 1.4 and you want to start taking advantage of streams, or maybe just for-each loops, or stuff like that. Or maybe you want to update some dependencies, so the API of the library you're using has changed. You want to refactor this automatically because maybe you're using it in a gazillion places in your code. Or perhaps you want to change some usage patterns. Maybe you're using the singleton pattern somewhere and you want to initialize all these fields in a lazy way. You can do this refactoring automatically.
Just to give you an idea, consider this very simple example. You're using a library with a method that is called old method, which is a very suspicious name for a method. The new version of the library comes out, old method is gone, and now you should use new method instead. New method takes three parameters while old method used to take two. The first two parameters are the same ones I used to pass to old method; we just need to invert them. And then I need to pass a Boolean as a third parameter, which should be true if I want to get the same behavior I used to get with old method.
So how do I implement this? Well, I find all the method calls, then I resolve the invoked method, so I figure out which method is actually called. I look at its signature and check that it matches the signature I'm interested in.
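A rough sketch of this rewrite is shown below, with one loudly stated simplification: instead of resolving the invoked method with the symbol solver, as the talk describes, it matches calls by name and argument count only, so it would also rewrite an unrelated method that happens to share the name and arity. It assumes JavaParser 3.x; the class `MigrateOldMethod` and the spellings `oldMethod` and `newMethod` are assumptions for illustration.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.BooleanLiteralExpr;
import com.github.javaparser.ast.expr.MethodCallExpr;

// Hypothetical migration helper, not part of JavaParser.
class MigrateOldMethod {
    static String migrate(String source) {
        CompilationUnit cu = StaticJavaParser.parse(source);

        for (MethodCallExpr call : cu.findAll(MethodCallExpr.class)) {
            // Simplified match: name and arity only, no symbol resolution.
            // Calls nested inside the arguments of another oldMethod call are not handled.
            if (!call.getNameAsString().equals("oldMethod")
                    || call.getArguments().size() != 2) {
                continue;
            }

            MethodCallExpr replacement = new MethodCallExpr();
            // Reuse the old scope: the part that comes before the dot.
            call.getScope().ifPresent(scope -> replacement.setScope(scope.clone()));
            replacement.setName("newMethod");
            // Same two arguments as before, but inverted.
            replacement.addArgument(call.getArgument(1).clone());
            replacement.addArgument(call.getArgument(0).clone());
            // `true` keeps the behavior of the old method.
            replacement.addArgument(new BooleanLiteralExpr(true));

            call.replace(replacement);
        }
        return cu.toString();
    }

    public static void main(String[] args) {
        String source = "class A { void f(Lib foo) { foo.oldMethod(1, \"x\"); } }";
        System.out.println(migrate(source));
    }
}
```

Cloning the scope and the arguments keeps the new call from sharing child nodes with the call it replaces.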
Checking the signature is important because maybe I have calls to another old method that is not the one that changed, and I want to be sure to change only the correct old method. So I find all these method calls, and for each of them I replace the call with a new method call that I will show in the next slide.
And this is how I build the new method call. I just create a new method call expression node. I reuse the old scope. The scope is the stuff that comes before the dot: before I was calling foo dot old method, now I want to call foo dot new method. Once I've done that, of course I set the correct name for the method call. Then I add the first two parameters. They are the same parameters I had before, but inverted. And then I pass a Boolean literal with value true and I'm done. So with these few lines I am able to refactor this call to use the new method, and you can imagine that if I had a ton of calls to the old method I would save a lot of time, and I would be sure to avoid errors, because with this kind of boring, repetitive task it's very easy to get things wrong.
So, very quickly, some other features we have in JavaParser that we think are interesting. One is comment attribution: we are able to understand which piece of code a comment refers to. This is important because when you delete a statement, you probably want to delete the associated comment, and if you move a statement, you want to move the associated comment with it. So these kinds of things are useful.
Another feature, one that still has some bugs, is lexical preservation. If you start by parsing a piece of code and you change the abstract syntax tree, when you get the code back you would like to get the original code with some adjustments, preserving the original layout.
To see what lexical preservation involves, imagine that you have a declaration of three variables on one line, you parse it, you get the abstract syntax tree, you navigate it and you remove B. Now you need to adapt the code by removing B but also one of the commas, otherwise the code doesn't make sense. Or if you are parsing a method that for some reason has a very weird indentation and you add new statements to this method, you want to preserve the same style and so add the new statement with the same indentation as the statement above. So there are a lot of things to consider to get lexical preservation right, and we think it's an important feature to have and not easy for our competitors to replicate.
Now, there are other things that we are planning for the future. You are probably not going to be able to read this, but it doesn't matter, because this is just to show an example of a matcher library we are experimenting with. This piece of code is able to identify properties in Java beans: it finds the triplets of a field declaration, a getter and a setter, and returns all the matches, giving you the name and the type of each property. So this kind of library could be very useful to recognize patterns in code.
Another thing that we are considering working on is Java templates. The idea is that you could define a piece of Java code with some placeholders, and we would be able to validate these templates syntactically. For example, the first example is correct while the second is wrong, because you cannot use an expression there. So this way you can have templates that you can validate before using them.
The final use case that we would love to support is making Java extensible. This comes from some discussions we had with some researchers and with a very large company involved with open source, and what they need to do is take Java and add new statements to it.
And the problem is that currently the only way they can do that is to take an original compiler or a tool like JavaParser and fork it, because there is no way to allow for extension of the grammar itself. So we are looking for better ideas and suggestions, and if you have ideas, feel free to share them with us.
And this is the last slide. I just wanted to share this book that we have been writing, which is available for free. I mean, it's possible to make a donation, but it's possible to take it for free. We have had over 1,000 readers of this book, so it's also a signal that there is some interest in the project. I encourage you to look for the project on GitHub and open an issue if there is something that you would like to see improved. So thank you for your attention.
I have two questions. The first one is whether I can parse two source files and do a diff between the ASTs, to see what changes. Say I look at Git commits and I want to scan one and the other and then see what changed.
I should repeat the question: is it possible to parse two source files and calculate the difference automatically? The answer is no. I can parse them and get two ASTs, but we don't have any mechanism to do the diff automatically. It would be an interesting use case, though.
My second question would be: can you parse partially broken files that are not compilable?
We have some error tolerance. We support parsing single pieces of code, so you can parse a single statement or a single expression; this is done. And we have some error tolerance, so the first syntactical error shouldn't just make it explode, but this is something that could be improved.
Do you handle generics or lambdas and maybe dynamic code?
We... Do we handle generics, lambdas and metadata?
Dynamic code.
At the parsing level, yes, 100%. At the symbol resolution level we kind of support it. You are not scared by a few bugs here and there, right? If you are not scared, then we support it, yes.
So you have a parser for generics?
Yeah, we parse it. That's easy.
Do a bold parsing then. Symbol resolution is very complex with lambdas and generics. Yeah. Time is up.