Thank you. Hi everyone. So, I'm Pierre-Marie and this is Raphaël. We are both software engineers at AdaCore, and today we'll talk to you about Langkit, which is about to bring source code analysis to the masses.

So, Langkit is a meta-compiler. Okay, what? In this project, we created a DSL to make it easy to implement front ends, like compiler front ends. Langkit generates libraries which can be used as the first half of compilers, but which can also be used in debuggers, to parse an expression and evaluate it; in interactive code browsers, where you can click on an identifier and it leads you to the definition of that identifier; to write static analyzers, like linters or style checkers, this kind of thing. And we also want to make it easy to write refactoring tools, where for instance you ask, on your whole code base: please rename this type definition and all its usages.

We created this originally because at AdaCore we needed exactly this kind of library to improve our tooling in general. So Langkit was built for that, but it can be used for other languages as well.

So, let's have a brief tour of the DSL: how you express things, how you can actually use Langkit. First of all, for now, Langkit is a compiler that you feed a Python-based DSL. You're not really writing Python; you're just using the Python syntax to express something else. One day we'll probably have a dedicated syntax.

So, first of all, this should look familiar: when you create a language, the first step is to create a lexer for it, so you can split a character stream into a sequence of tokens. The first thing to do is to define the list of tokens that your lexer is supposed to create. Then you provide rules, which start from regexes and are compiled into automata, to actually implement the lexer. This should be familiar if you have already written a specification for flex, for instance: it's the same mechanism.

The next layer is to define the list of tree nodes that your parsers are going to create. Here you can see that, using the Python syntax, you define a class hierarchy where a node can be abstract and so on, and each node can have a list of fields. For instance, for a function definition, you have a field that gives the name of the function, the list of arguments, the body, etc. So you define the tree.

And for the next step, you write parsing rules. Like in Bison, if you know it, or yacc, you write parsers, but in a recursive descent fashion: you say, okay, for this rule, if you find this token and this token, you will produce this node, and so on recursively. And this is compiled into a parsing library that is based on packrat parsing.

So, Langkit will take all the definitions I showed and will compile them. In particular, it will analyze the grammar, and from it, it will say: okay, we are creating this kind of node with this kind of field and that kind of field, so it will perform type inference. And if you remember, here I can define fields without giving them a type; well, Langkit will infer a type for them if they are not specified. You can specify types to be more explicit, and Langkit will check for consistency.

Okay, so now we have the bottom layers of your front end: lexing and parsing. And the next step is semantic analysis.
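To make this concrete, here is a condensed sketch of what these three layers can look like in the Python-based DSL. It loosely follows the Langkit tutorial, but Langkit's API has been moving, so the class and rule names here (LexerToken, WithText, token_node, and so on) should be taken as an approximation rather than the exact current API:

    from langkit.dsl import ASTNode, Field, abstract
    from langkit.lexer import (Lexer, LexerToken, Literal, Pattern,
                               WithText, WithSymbol)
    from langkit.parsers import Grammar, List

    # Layer 1: the list of tokens the lexer can produce, plus the
    # lexing rules (literals/regexes compiled into an automaton).
    class Token(LexerToken):
        Def = WithText()
        LPar = WithText()
        RPar = WithText()
        Comma = WithText()
        Identifier = WithSymbol()
        Number = WithText()

    my_lexer = Lexer(Token)
    my_lexer.add_rules(
        (Literal('def'), Token.Def),
        (Literal('('), Token.LPar),
        (Literal(')'), Token.RPar),
        (Literal(','), Token.Comma),
        (Pattern('[a-zA-Z_][a-zA-Z0-9_]*'), Token.Identifier),
        (Pattern('[0-9]+'), Token.Number),
    )

    # Layer 2: the tree nodes, a class hierarchy with fields. Field
    # types can be omitted and inferred from the grammar.
    @abstract
    class MyNode(ASTNode):
        pass

    class Identifier(MyNode):
        token_node = True

    class FunctionDef(MyNode):
        name = Field()
        args = Field()
        body = Field()

    # Layer 3: recursive descent parsing rules; each rule says which
    # tokens to consume and which node to build from them.
    my_grammar = Grammar('main_rule')
    my_grammar.add_rules(
        main_rule=List(my_grammar.func_def),
        func_def=FunctionDef(
            'def', my_grammar.identifier,
            '(', List(my_grammar.identifier, sep=',', empty_valid=True), ')',
            my_grammar.identifier,  # toy body: a single identifier
        ),
        identifier=Identifier(Token.Identifier),
    )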
So, semantic analysis is the part that will answer questions such as: what is this identifier referring to? What is the type of this expression? And so on. The basis for that is what is called in the literature scoping, or lexical environments. So here, on AST nodes, we provide special annotations that describe: okay, this function creates a scope, and everything that happens inside it will act on this scope, again in a recursive fashion. These special annotations enable you to create mappings, kinds of dictionaries, that map identifiers to nodes. So this is the first step.

And then you implement semantic analysis by adding yet another kind of annotation on nodes. These annotations are methods; we call them properties in Langkit. The public ones implement the semantic analysis API that you, as the library author, want to provide to your users, such as here: we have a public property that, on a variable reference, will fetch the corresponding variable declaration. And you can also have private properties that are just implementation details; users won't see them.

And you write these properties using a functional programming language that we created: this part here. Here, for instance, you take the current node, which is a variable reference, you take the environment that is associated with it, and you perform a lookup using the variable's name. If the variable's definition is in scope, the lookup will return it. We don't see it here, but there are types in this language: it's a typed functional language with type inference. So Langkit, when it compiles it, makes sure that your expressions are actually meaningful, and we reject incorrect ones.

Okay, so with just this base, we can already do semantic analysis for a great deal of languages, and this is cool. All you have to do is write all the properties you need to answer the requests that you want users to have access to.
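As an illustration, here is a minimal sketch of a scope annotation plus a public property, again modeled loosely on the Langkit documentation; the exact construct names (EnvSpec, add_to_env_kv, get_first) may differ between Langkit versions:

    from langkit.dsl import ASTNode, Field, abstract
    from langkit.envs import EnvSpec, add_env, add_to_env_kv
    from langkit.expressions import Self, langkit_property

    @abstract
    class MyNode(ASTNode):
        pass

    class FunctionDef(MyNode):
        name = Field()
        args = Field()
        body = Field()

        # Scoping annotations: register this function in the enclosing
        # environment, then open a new environment for its contents.
        env_spec = EnvSpec(
            add_to_env_kv(Self.name.symbol, Self),
            add_env(),
        )

    class VarRef(MyNode):
        name = Field()

        # Public property: part of the semantic API exposed to library
        # users. It looks the identifier up in the lexical environments.
        @langkit_property(public=True)
        def referenced_defn():
            return Self.node_env.get_first(Self.name.symbol)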
But there is still a missing piece, which we needed for Ada: overload resolution, and type inference. These are problems that are difficult to solve using a mere functional language. For example, here we have several functions that have equal names, but they accept arguments and have return types that are not exactly the same. For simple cases, such as this one or this one, well, here we have a call to F1, so one of these two, that takes an integer literal, so it has to be a call to the first one. Here we have a floating point number, so it has to be this one. Here we call F2 with an integer literal, and there is no F2 function accepting integers, so this call resolves to nothing.

And then you have complex cases such as this one. We may be able to resolve, with some fancy algorithm, which F2 is called here: you see, it takes a character literal, but both F2 functions take a character literal. The difficult thing here is that doing the resolution of names is a problem that requires non-local knowledge: resolving which function is called by this F1 call can depend on something that is quite far away.

Just to add something about that: this problem is not Ada-specific. In Ada, you can overload on the return type of a function, but in functional programming languages such as OCaml, for example, you have non-local type inference, and you get the same kinds of problems. So what we are trying to do is have a general solution for these kinds of problems.

And this general solution is, well, if we take the example that I quoted earlier, this is the complex call that we want to resolve. We can collect a set of constraints. For the purpose of discussion, let's call this function call C, this function call B, and this one A. We see that A takes a character argument, so we have this constraint: the type of A's first argument must be character. We pass the result of A as an argument of B, so the return type of A must be the type of B's first argument; the same applies for the outer call. And because A is a call to F2, well, A must be one of these two declarations (yes, they are on the same line); the same goes for B and C.

So you collect a set of constraints, and then, well, in Langkit, we provide, along with the functional language, a satisfiability... what's the name? A solver. Yeah, a solver, let's not say satisfiability, blah, blah, blah. So: something that takes a set of constraints and gives you, hopefully, the unique solution to the problem.

And so, using all these building blocks, you should already be able to express semantic analysis for quite a large corpus of languages.
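The Langkit solver itself is not shown here, but the idea can be sketched in ordinary Python: give each call a set of candidate declarations, encode the typing constraints that link the calls, and keep only the globally consistent assignments. This brute-force toy stands in for the real solver, which works on logic variables and equations rather than string-typed tuples:

    from itertools import product

    # Candidate overloads for the inner call A, the middle call B and
    # the outer call C: each candidate is (argument type, return type).
    # Types are plain strings here, purely for illustration.
    candidates = {
        'A': [('character', 'integer'), ('character', 'float')],
        'B': [('integer', 'character'), ('float', 'boolean')],
        'C': [('character', 'integer')],
    }

    def consistent(a, b, c):
        # Constraint 1: A's argument is a character literal.
        # Constraint 2: A's return type feeds B's argument.
        # Constraint 3: B's return type feeds C's argument.
        return (a[0] == 'character'
                and a[1] == b[0]
                and b[1] == c[0])

    solutions = [
        (a, b, c)
        for a, b, c in product(candidates['A'], candidates['B'],
                               candidates['C'])
        if consistent(a, b, c)
    ]
    # Exactly one solution means the call resolves; several means it is
    # ambiguous; none means it is erroneous.
    print(solutions)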
And now Raphaël is going to tell you what's next. Let's be quick, we don't have much time. Thank you.

Okay. So, Pierre-Marie described to you the mechanism by which you describe your language: the language specification. This is what you will do if you want to write a new language front end. But then, you will have users for your library: people who want to, for example, write refactorings for Ada, or whatever language. For the moment, we only have Ada, so: Ada, or Ada. But we expect to have a lot more. So, I'm going to talk about what we generate.

If it's not obvious yet, because it's a bit complicated: a meta-compiler is not a very simple project. This is the pipeline of Langkit. Basically, you provide your language specification, which is what Pierre-Marie has been talking about so far, you feed that to Langkit, and it generates a library, something you can call from your own code, that will give you knowledge about source code. This library is in Ada, a language that you may never have heard about. But fear not, because we provide bindings to other languages, like C and Python, and it's really easy to generate bindings for more. And on top of that, the hope of the whole project is to be able, from this language specification, to generate generic tools, which I'm going to get into.

So, the base library is written in Ada. It was an easy choice for us; I'm just going to point at the name of the company so that you understand: AdaCore. Basically, we needed a low-level language that can produce fast code and that doesn't have a built-in memory management policy that would be complicated for us to handle in IDEs. For example, we didn't want to ship a GC along with the libraries, because when you need results fast, that is maybe not a very good idea, et cetera, et cetera. We could have chosen another language, but Ada was an obvious choice for us, because we have a lot of Ada knowledge and it's right on target for this use.

So, since not a lot of people use Ada, we provide bindings to C and to Python, and Python is the de facto scripting language of the ecosystem. So, once you have generated your library, you can right away use it from Python, launching an interactive shell and, for example, parsing some source code and asking for knowledge about it. And it's easy to generate bindings to new languages, because Langkit has, from the language specification, the knowledge about everything: the data types, the fields, the functions that are available. So it can very easily generate bindings to other languages.

An interesting point about Langkit-generated libraries is that they are crafted for incremental analysis. The reason we express, for example, the properties in a functional language, and the reason everything is declarative, is so that we have the freedom, in the implementation, to take the semantics and generate an incremental analyzer for them: as little state as possible in the spec, so that we can analyze incrementally. For Ada, what this means is that if you change a file, you don't need to recompute everything for every file that depends on it, like most compilers would do. I know some compilers are more clever than that, but GNAT, which is the main compiler for Ada, is not, for example. So this was very important for us for integration into IDEs.

So, this is an example of something we did with Libadalang, which is an engine generated with Langkit: a static analyzer for Ada. It's very small; this code is the whole static analyzer, using the Libadalang library. What it does is something very simple: it looks at binary operators, like additions and subtractions, and checks whether, syntactically, the two operands are the same. So it's a very simple static analysis check. And surprisingly, we ran it on our code base and we found a lot of bugs, despite extensive testing and other, more powerful static analyzers being run on it. So, once you have a syntactic analyzer for your language at your fingertips, you can do a lot of stuff that is really interesting. And this could be adapted to another Langkit front end: we have a Python parser, for example, and we have this check running on Python too. It's very easy to adapt, because the API is exactly the same.
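The actual analyzer is shown on a slide; as a stand-in, here is roughly what such a check looks like with Libadalang's Python binding. The calls used here (AnalysisContext, get_from_file, findall, f_left, f_right) exist in Libadalang's API, but treat the snippet as a sketch of the slide's code, not a verbatim copy:

    import sys
    import libadalang as lal

    # Parse each Ada file given on the command line and flag binary
    # operations whose two operands are syntactically identical,
    # e.g. "X - X" or "A.B = A.B".
    ctx = lal.AnalysisContext()
    for filename in sys.argv[1:]:
        unit = ctx.get_from_file(filename)
        for node in unit.root.findall(lal.BinOp):
            if node.f_left.text == node.f_right.text:
                print('{}:{}: operands of "{}" are identical'.format(
                    filename, node.sloc_range.start, node.text
                ))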
Something else we generate automatically, from your grammar definition and your tree definition, is an unparser. So, we have a means to parse code, but we also have a means to take a tree and produce a new source file from it, without using any source information. This is useful because, if you remember, Pierre-Marie told you that we want to be able to do refactorings with these libraries. It means that you can modify the tree, then call an unparse primitive, and it will generate new source. So this is a very neat abstraction to build refactorings upon. It uses the grammar and the AST definition; you don't need to write anything else, and it will generate the unparser automatically.

And you can use it to do this kind of stuff. This is an Ada source file, okay? A simple little hello world program. Very simple. And then we want to change it, change the string literal to "Bye world", but not using sed or whatever: using the tree directly. So, this is an example program that you can write using the Ada front end: you look for the call, then you start a diff and rewrite the string literal, putting "Bye world" instead of "Hello world". This is all done at the tree level, not at the source level. Then you apply the diff and you obtain the new program. So, this is very convenient.

I don't know how much time I have left. Very little, actually. So, we ship a number of tools along with the libraries, and the hope for the Langkit project is that the further we go, the more generic tools we have that you can use. So, once you have a language specification, you have generic tools working on it. We have simple command line tools: Playground is an interactive shell based on IPython; Parse is just an AST dumper.

But we have more interesting stuff, like a prototype for a generic code indenter, for example. This one is for Ada. What you do is provide, for the tree, a declarative description of how to indent your code. So, you say: for a package declaration, for example, I want the public and the private part to be indented by a block rule, which says: indent by three columns. And then the engine will do that automatically. And if you have a new language front end, all you need to do is write this map, and you have an indenter for your language.
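No API for this indenter prototype is documented, so the following is a purely hypothetical sketch of what such a declarative indentation map could look like; every name in it is made up for illustration:

    # Hypothetical declarative indentation map: node kind -> per-field
    # rules. "Block(3)" would mean: indent this sub-tree by three
    # columns relative to its parent. None of these names come from an
    # actual Langkit API.
    class Block:
        def __init__(self, columns):
            self.columns = columns

    indent_map = {
        'PackageDecl': {
            'public_part': Block(3),
            'private_part': Block(3),
        },
        'SubpBody': {
            'declarations': Block(3),
            'statements': Block(3),
        },
    }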
A syntax highlighter could be produced the same way: we have all the information we need to produce a syntax highlighter. And going this way, we could imagine, at some point, being able to provide a whole server for the Language Server Protocol. So, you write your language specification, you write how to do go-to-definition, autocompletion, et cetera, and it will generate the engine that does it, and you can directly plug it into your editor. For the moment, this is not done at all; it's just a dream. But we have a prototype for it, which is just a NeoVim plugin, and it already does quite a lot of stuff, like indentation and go-to-definition for Ada. And it does a subset of that for Python too, because we have a Python front end.

So, we already have a few prototype front ends based on Langkit. We have Ada, which is the most complete. And we have Python: for Python, we mostly have the parser, but we also have a very simple model of how the scoping works. Python is a dynamic language, so you don't really have static scoping, but you can model it well enough to use it in an IDE; we have that. And we played with other simple languages, like JSON and Kconfig, to prove that we can actually generate front ends for more languages.

And for the moment, that's all. But we hope that, by raising awareness about the project, people will try to write new front ends, maybe, and tell us that it doesn't work at all and that we need to fix everything. So, thank you for listening. You can find the sources for Langkit on GitHub, here. And we have a tutorial: if you want to write a new front end with Langkit, you can find it here. It's still a work in progress; APIs are moving, and for the moment, apart from the tutorial, we don't have much more documentation. But we will gladly accept any feedback: issues, pull requests, etc. Thank you very much.

Questions? Yes, go ahead.

Do you plan on, for example, creating code generation support, so somebody could maybe use it to prototype a simple back end for his language, or something like this?

So, the question was: do we have plans to be able to generate a simple back end for the languages? I imagine you mean declaratively too. Of course, you can already write a back end by hand using the generated front end, but that's not what you're talking about. We had some ideas about that. It's really not our target at work, but nothing prevents us from doing it; it's a very interesting topic. So, yeah, I have a few ideas of how to do that, but nothing in the works for the moment. Maybe we'll do that in our spare time. Yes?

Does your tool preserve comments?

Yes. So, we have a concept called trivia: when you lex the tokens, you can say, this is a token that is relevant for the tree, and this is a token that you need to store in the tree but that won't influence the parsing. Yes, thank you very much. So, in that case, the comment will be stored in the node. What we don't do for the moment is provide a rewriting API that allows you to insert comments, but we plan to add that at some point too. I think there was a question. It was that question? Okay, cool. Yes?

What's the expressiveness of your constraints?

So, the question was: what is the expressiveness of our constraints? Since, despite appearances, I'm not very advanced with constraint solvers, could you make your question a bit more precise?

For instance, you said that you can implement type inference like in ML. I think you need to be able to convey the type environment inside the constraints; I'm not sure that you can.

So, the precision was: if you want to type ML with it, you need to convey the type environment in the constraints. We have some ways of doing that. I don't know yet whether they are going to be sufficient for ML. I'm working on a prototype, and at some point I would like to publish a small example slash paper about that. For the moment, we really crafted it for Ada, and then we are going to extend the semantics so that it is useful for more than that.

Okay, thank you very much.