So, yeah, I'm John Brant, and for about 25 years now I've been building tools to refactor, migrate, and transform code. I started out building tools in Smalltalk for refactoring Smalltalk, then I moved on to migrating Smalltalk code with the same tools, and now I migrate pretty much any code to other code. There are many ways to migrate code; this is my strategy. First, you define the parser for whatever language you're migrating from. A lot of times we already have the parser, but sometimes you don't, so: define the parser. Next, you create your transformation program, which basically runs over all your code and converts it to the new language. Finally, a lot of times we need a compatibility layer, because there are features in the old language that aren't available in the new language, and we need to support them. SmaCC itself handles only the first two steps; the compatibility layer is all up to you. The good thing about this migration strategy is that the developers can continue their normal development in the old system, so it really reduces risk. While they keep developing, I can go in, define the rules, and do all that work, so you're not stopping development; if I fail, all that means is that they can still use the old system. The bad part, and the reason most people don't like these migrations, is that it keeps the same design: if you had a poor program before, you'll still have a poor program after. This strategy works fairly well for programs of roughly a hundred thousand lines and up. For anything smaller, the overhead probably isn't worth it, and you might just rewrite it by hand or use some other method. Most of the projects I've done have been in the range of one million lines.
I've done them from about a hundred thousand lines up to six million or so. SmaCC itself is basically a standard parser generator, along the lines of yacc or Bison. It's an LALR(1) and LR(1) parser generator. It supports GLR, generalized LR, so it can handle ambiguous grammars. It can also generate ASTs, and it does some pattern matching.

So here's a basic grammar for a simple expression language. It's just addition: you can have a number plus a number, and you can put parentheses around an expression. If you remove all the underlined code, what's left is just the grammar; the underlined code defines the AST. The first thing, %root Expression, defines the root of the hierarchy for your AST nodes. %suffix is the suffix we put on the names of all the node classes. Then in each production we can name the variables. In the first production, the addition expression, we have a left variable that holds the left expression, an operator variable that holds the operator token (which is just the plus), and a right variable that holds the right expression, and we create a Binary node for it. Similarly, on the second line we have a left paren and a right paren, but we do not name the expression in that production. In that case, SmaCC figures out that what you really want is to add the left paren and right paren to whatever node that expression returned. Since it sees that the type of that node is going to be an expression, it figures out that you want a collection of left parens and a collection of right parens on each expression node. So from that grammar we generate essentially three classes, with ExpressionNode as the root. Every node has leftParens, a collection of left-parenthesis tokens, and rightParens, a collection of right-parenthesis tokens. BinaryNode adds left, operator, and right, and NumberNode holds the value.

The transformation program is really where all the work in the project takes place, because that's where you write the rules to convert everything. In SmaCC it's basically a set of rules; actually, not a set but an ordered list of rules, applied in order. You can also define methods and properties to use in those rules. There are essentially two types of rules: declarative pattern rules and imperative code rules. Imperative code rules basically give you Smalltalk, so you can write whatever you want. It doesn't have to do anything for the migration; you can write code that loads files or whatever, pretty much anything. Generally we use those rules for general syntax: if you're converting a method or a class, that's the rule that converts all the syntax for the method or class. Imperative rules can also do control flow, so if you want to process one section of the AST before another section, that's what they handle. Pattern rules, on the other hand, we often use for one-off cases. You're in the middle of a file, and you notice that this one expression needs to be treated differently than everywhere else.
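As a rough illustration of what the classes generated from the addition grammar look like, here is a Python sketch of the three node classes plus a small hand-written recursive-descent parser that fills them in. It is only an analogue under stated assumptions: SmaCC generates Smalltalk classes and a table-driven LR parser, tokens here are plain strings, and the snake_case names (left_parens, right_parens, parse) are made up for the example. What it shows is the unnamed parenthesized expression: no new node is created, and the paren tokens are appended to the collections of the node inside.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExpressionNode:
    # Root of the (hypothetical) node hierarchy. Every node carries the paren
    # collections, because "( Expression )" adds its tokens to the inner node.
    left_parens: List[str] = field(default_factory=list)
    right_parens: List[str] = field(default_factory=list)


@dataclass
class BinaryNode(ExpressionNode):
    left: Optional[ExpressionNode] = None
    operator: Optional[str] = None
    right: Optional[ExpressionNode] = None


@dataclass
class NumberNode(ExpressionNode):
    value: Optional[str] = None


def parse(text: str) -> ExpressionNode:
    """Hand-written stand-in for a generated parser of the addition grammar."""
    # Spaces (and any other stray characters) are simply skipped in this sketch.
    tokens = re.findall(r"\d+|[()+]", text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def primary() -> ExpressionNode:
        if peek() == "(":
            left = take("(")
            node = expression()
            # Unnamed expression: attach the parens to the node it returned.
            node.left_parens.append(left)
            node.right_parens.append(take(")"))
            return node
        token = take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return NumberNode(value=token)

    def expression() -> ExpressionNode:
        node = primary()
        while peek() == "+":  # left-associative addition
            operator = take("+")
            node = BinaryNode(left=node, operator=operator, right=primary())
        return node

    result = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result
```

For example, parse("(3 + 4)") returns a BinaryNode whose left_parens is ["("], and parse("((5))") returns a NumberNode carrying two left parens rather than two wrapper nodes.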
With a pattern rule, if there's something special in the context, you can write a rule just for that one location. The good thing about pattern rules is that they're fairly quick to write, because they look exactly like the code you're transforming. In a pattern rule, the search expression is just a normal text string, like you'd have in your program, except that it has pattern variables in the middle of it. On the search side, that string gets parsed into an AST; on the replace side, the whole thing is just a string that gets macro-expanded. To support pattern matching in our parser, we first have to declare that it's a GLR parser. Everywhere there's a pattern variable, we have to parse all possible trees with the pattern in that position: in our example, a pattern variable could match a Binary node or a Number node, so we may have to parse every interpretation, and that requires a GLR parser. We also need a pattern token. Since every language has a different grammar, we need a token that doesn't conflict with the existing grammar. Most languages don't use the backquote, so that's what we normally use.
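A minimal sketch of a scanner that adds such a pattern token, assuming a toy token set for the expression language (numbers, parentheses, + and *). The regular expression for the pattern token follows the description in this talk: a backquote, any run of non-backquote characters, and a closing backquote. Because no ordinary token contains a backquote, the pattern alternative never competes with them. This is Python for illustration, not SmaCC's scanner syntax; scan and TOKEN_RE are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical token definitions for the toy expression language, plus the
# pattern token: a backquote, anything that is not a backquote, a backquote.
TOKEN_RE = re.compile(
    r"(?P<pattern>`[^`]*`)"
    r"|(?P<number>\d+)"
    r"|(?P<symbol>[()+*])"
)


def scan(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, value) pairs; pattern tokens lose their backquotes."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = match.lastgroup, match.group()
        if kind == "pattern":
            value = value[1:-1]  # `a` -> a
        tokens.append((kind, value))
        pos = match.end()
    return tokens
```

For a language that already uses backquotes, such as JavaScript with its template literals, you would swap the two backquotes in TOKEN_RE for some other delimiter that the grammar leaves unused.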
Here, the pattern token is a backquote, followed by anything that's not a backquote, ended with a backquote. Some languages, such as JavaScript, now use the backquote, so you might have to do something different for those, but the idea is that you can specify whatever you want. Using that, we can write expressions like the one you see there, `a` + `a`: the pattern variable a plus the pattern variable a gets rewritten to whatever the pattern variable a matched, times two. A pattern variable can match any AST node, so it can match a single number or a binary expression. Here's an example. The original code is 3 + 3, and we search for the pattern `a` + `a`. We parse both of them: on the left we get the standard AST, and on the right we get an AST with the pattern variables in it, so it contains "anything" nodes. We run unification across the two and get the binding a = 3. If we had 3 + 4, unification would fail, so it wouldn't match. Replacement works a little differently. We don't parse the replacement expression; we treat it as a string macro. Whatever got matched, we delete and replace with whatever that string macro expanded to. I think this works fairly well when you're converting from one language to another; otherwise you'd end up with parse nodes from two different languages in the same tree. That's why, on the rewrite side, we don't rewrite trees.
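The 3 + 3 example can be sketched in Python with ASTs written as plain tuples, an assumption made to keep the code short. match performs one-way unification (only the pattern side has variables): a pattern variable binds to any subtree, and a repeated variable must bind to an equal subtree, which is why 3 + 4 fails. expand treats the replacement as a string macro rather than a tree, as described above. The function names are hypothetical, not SmaCC's API.

```python
import re
from typing import Dict, Tuple

# Tuple ASTs: ("number", "3"), ("binary", left, "+", right), ("pattern", "a")
Node = Tuple


def match(pattern: Node, node: Node, bindings: Dict[str, Node]) -> bool:
    """One-way unification: bind pattern variables to subtrees of `node`."""
    if pattern[0] == "pattern":
        name = pattern[1]
        if name in bindings:
            # A repeated variable must match an equal subtree (3 + 3, not 3 + 4).
            return bindings[name] == node
        bindings[name] = node
        return True
    if pattern[0] != node[0] or len(pattern) != len(node):
        return False
    for p, n in zip(pattern[1:], node[1:]):
        if isinstance(p, tuple):
            if not isinstance(n, tuple) or not match(p, n, bindings):
                return False
        elif p != n:  # plain values such as the operator token
            return False
    return True


def source(node: Node) -> str:
    """Print a node back as text (no parentheses, for brevity)."""
    if node[0] == "number":
        return node[1]
    _, left, operator, right = node
    return f"{source(left)} {operator} {source(right)}"


def expand(template: str, bindings: Dict[str, Node]) -> str:
    """The replacement is a string macro: `name` becomes the bound subtree's text."""
    return re.sub(r"`([^`]*)`", lambda m: source(bindings[m.group(1)]), template)


three_plus_three = ("binary", ("number", "3"), "+", ("number", "3"))
three_plus_four = ("binary", ("number", "3"), "+", ("number", "4"))
pattern = ("binary", ("pattern", "a"), "+", ("pattern", "a"))
```

With b = {}, match(pattern, three_plus_three, b) succeeds and leaves b as {"a": ("number", "3")}, after which expand("`a` * 2", b) gives "3 * 2"; match(pattern, three_plus_four, {}) fails.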
Rewriting with strings means we take the matched text, delete it, and put in the expanded replacement macro. Wherever the macro contains a pattern variable, we process that variable's subtree and put whatever that returns into the macro. So in our example, the replacement was `a` * 2, the pattern variable a times two, and we matched with a = 3, so everything gets replaced with 3 * 2.

Here are a couple of examples from a Delphi migration we did. The first is one of those one-off expressions. You go through and start migrating the code, and you notice that in some places they use a for loop whose ending value is something minus one. The normal for-loop rewrite would have produced a condition of less than or equal to c - 1, but we can change that to less than c and make the code a little nicer. That's the type of one-off rule we write with pattern matching to make the converted code nicer. The second one deals with the fact that in .NET you can't set the minimum height of a window by itself; you have to set the minimum size, which includes both the height and the width. In that rule, `a`/Forms.TCustomForm says we'll match something and call it a, but it has to be of type TCustomForm. If it's some other kind of object, it won't match: the type of whatever a is has to be TCustomForm.

Then there are code rules. In SmaCC, the search part of a code rule is essentially any Smalltalk expression: we have an AST node that must match, then some code, and if that code returns true, the rule matches.
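Returning to the TCustomForm rule, here is a sketch of two ideas from this part: a pattern variable constrained by type, and a replacement macro whose variables are filled in by recursively transforming the bound subtrees. Everything specific is hypothetical: the Delphi statement shape (Form1.Constraints.MinHeight := 200), the small type tables standing in for whatever type information a real migration computes, and the C# replacement text using the WinForms MinimumSize property. It is a Python illustration, not the rule as written in SmaCC.

```python
import re
from typing import Dict, Optional, Tuple

Node = Tuple  # ("var", name), ("number", text), ("field", obj, name), ("assign", target, value)

# Hypothetical stand-ins for the type information a migration would compute.
VARIABLE_TYPES = {"Form1": "TForm1", "Panel1": "TPanel"}
SUPERCLASS = {"TForm1": "TForm", "TForm": "TCustomForm", "TPanel": "TCustomPanel"}


def is_kind_of(type_name: Optional[str], required: str) -> bool:
    """Walk up the superclass chain looking for the required type."""
    while type_name is not None:
        if type_name == required:
            return True
        type_name = SUPERCLASS.get(type_name)
    return False


def static_type(node: Node) -> Optional[str]:
    return VARIABLE_TYPES.get(node[1]) if node[0] == "var" else None


def match(pattern: Node, node: Node, bindings: Dict[str, Node]) -> bool:
    if pattern[0] == "pattern":
        _, name, required = pattern
        if required is not None and not is_kind_of(static_type(node), required):
            return False  # typed variable: the subtree must have the right type
        if name in bindings:
            return bindings[name] == node
        bindings[name] = node
        return True
    if pattern[0] != node[0] or len(pattern) != len(node):
        return False
    for p, n in zip(pattern[1:], node[1:]):
        if isinstance(p, tuple):
            if not isinstance(n, tuple) or not match(p, n, bindings):
                return False
        elif p != n:
            return False
    return True


# Hypothetical rule: Form.Constraints.MinHeight := h  ->  C# MinimumSize assignment.
RULES = [(
    ("assign",
     ("field", ("field", ("pattern", "a", "TCustomForm"), "Constraints"), "MinHeight"),
     ("pattern", "b", None)),
    "`a`.MinimumSize = new Size(`a`.MinimumSize.Width, `b`);",
)]


def transform(node: Node) -> str:
    for pattern, template in RULES:
        bindings: Dict[str, Node] = {}
        if match(pattern, node, bindings):
            # Each variable's subtree is transformed first, then spliced in as text.
            return re.sub(r"`([^`]*)`",
                          lambda m: transform(bindings[m.group(1)]), template)
    return default_source(node)


def default_source(node: Node) -> str:
    """Fallback printing (C#-style assignment); children still go through transform."""
    kind = node[0]
    if kind in ("var", "number"):
        return node[1]
    if kind == "field":
        return f"{transform(node[1])}.{node[2]}"
    if kind == "assign":
        return f"{transform(node[1])} = {transform(node[2])};"
    raise ValueError(f"unknown node kind {kind!r}")
```

Panel1 has the same Constraints property but is not a TCustomForm in this type table, so the rule does not fire for it and the default printing is used instead.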
The replacement part of a code rule takes one of three forms: edit expressions, which let you edit the source code; control flow, where you tell it which nodes to process; and generic Smalltalk code, where you can load files or do whatever you want. With edit expressions, we tried to create a language that looks like what you'd say if you were just describing what you wanted to do. So you write things like self replace: aNode with: 'some text'. If you want to move one node before another, you write self move: thisNode before: anotherNode. We tried to make it read like what you would tell somebody to do if they were sitting at an editor, and inserting and deleting work the same way. As for control flow: normally, SmaCC starts at the root of the tree and works down until it finds a match, and with code rules, once it finds a match it stops right there unless you tell it to do something else. You can change that by telling it to process the children of this node, or to continue processing this node.
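The control-flow behaviour can be sketched as a small Python engine, under simplifying assumptions: every rule is just a predicate plus an action, continue_ only tries the later rules on the same node, and a node that no rule claims is handled by descending into its children. Node, Rule, Match, and Engine are hypothetical names, not SmaCC classes. The three runs show stopping at the first match, explicit top-down descent, and the processChildren-then-continue rule that turns the traversal bottom-up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


class Match:
    """Passed to a rule's action; offers the two control-flow requests."""

    def __init__(self, engine: "Engine", node: Node, rule_index: int):
        self.engine, self.node, self.rule_index = engine, node, rule_index

    def process_children(self) -> None:
        for child in self.node.children:
            self.engine.process(child)

    def continue_(self) -> None:  # `continue` is a Python keyword
        # Let later rules in the ordered list try this same node.
        self.engine.try_rules(self.node, self.rule_index + 1)


@dataclass
class Rule:
    matches: Callable[[Node], bool]
    action: Callable[[Match], None]


class Engine:
    def __init__(self, rules: List[Rule]):
        self.rules = rules

    def try_rules(self, node: Node, start: int) -> bool:
        """Run the first rule at or after `start` that matches `node`."""
        for index in range(start, len(self.rules)):
            if self.rules[index].matches(node):
                self.rules[index].action(Match(self, node, index))
                return True
        return False

    def process(self, node: Node) -> None:
        # Top-down: if no rule claims the node, descend into its children.
        # A matching rule stops the descent unless its action asks for more.
        if not self.try_rules(node, 0):
            for child in node.children:
                self.process(child)


tree = Node("block", [Node("a"), Node("b", [Node("c")])])
visited: List[str] = []


def always(node: Node) -> bool:
    return True


def record(m: Match) -> None:
    visited.append(m.node.name)


def record_then_children(m: Match) -> None:
    record(m)
    m.process_children()


def bottom_up(m: Match) -> None:
    m.process_children()
    m.continue_()


def visit(rules: List[Rule]) -> List[str]:
    visited.clear()
    Engine(rules).process(tree)
    return list(visited)


print(visit([Rule(always, record)]))                           # ['block']
print(visit([Rule(always, record_then_children)]))             # ['block', 'a', 'b', 'c']
print(visit([Rule(always, bottom_up), Rule(always, record)]))  # ['a', 'c', 'b', 'block']
```

The bottom_up rule sits first in the list, so it claims every node, processes the children, and only then lets the later record rule see the node itself.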
With continue, if some later rule also matches the same node, that rule gets to process it too. So if you wanted to process the tree bottom-up instead, all you'd have to do is write a rule that says self processChildren. self continue. That changes the traversal from top-down to bottom-up.

Here are a couple of examples of those code rules. The first is from Delphi: we take a statement block, a begin ... end, replace the begin with an opening curly brace, replace the end with a closing curly brace, and continue processing the rest of that tree. The second is from a PowerBuilder migration to C#. PowerBuilder has function objects, and for those we check the type node; but the type node can be any PowerBuilder type, so we also need to check whether it equals function object. When it does, we set a couple of properties and then replace the whole match with the beginning of a partial class. That's pretty much it for the types of rules you can write in SmaCC.

One of the things I have done, though, is add some custom tools. Here's a custom debugger we have for debugging grammars. If you debug a table-generated parser, a lot of what you see is just numbers in a table, so it's really hard to debug. What we do is store the meaning of those table entries and give them symbolic names. Here we're parsing some JavaScript code. In the top left we have the stack of what's being parsed: at the bottom is the module list, then the var token, then the variable declarations, and the comma right before that, so we're right in the middle of parsing. Let's see if I can zoom in on that.
I can't zoom in, but we're parsing what's selected there in the middle, and that's our lookahead. In the top right we have all the possible actions for the state we're currently in; if you could read it, it says that on this identifier the action is a shift, so next we're going to shift that identifier. A lot of times, if you have a parse error, you can open up the debugger and see immediately: we have this type of token, there's no action for that token, and that's the error. So you can see right away where there should have been an action; maybe you missed a semicolon or something earlier on. That's the kind of thing we look for. For GLR, it lists all the states, so if it's doing multiple parses, we can see that the grammar is actually ambiguous and there are multiple paths through it. This section shows the input we're parsing. You can select someplace in the middle of the input, step to the cursor, and it will parse up to that location. The scanner state is shown here; it always displays the scope, since SmaCC scanners can have multiple scopes they scan in, and if you're in the middle of scanning a token, it shows more detail. At the bottom is the standard debugger information you'd normally see; it's not that interesting.

When we match, we also have preview support. You run a rule, say preview, and it brings up two views: the input on the left and the resulting code on the right. You can put your cursor anywhere in there or select some text. Here I put the cursor in here, and when I do that, it tells me which rules changed that piece of code. If you then select a rule, it highlights on the right every little piece that rule changed. Here I selected this one, and it told me this event declaration node rule changed that text: it added a semicolon there and added the ending paren. On the left, it just highlights the node that actually matched. From here I can go directly to the rule, and I can also bring up the debugger. I have a rule debugger, which has the stack of rules at the top left, the rule being executed, the original code, and the resulting code, and we can step through it. We can also scroll down here and run to the cursor or whatever. This part down here is just some inspector stuff. All these tools make it a lot easier to get where you need to be to find out who changed this and why when it's not performing correctly.

So with that, I'll open it up for questions. Yes?

[Audience question, partly inaudible]

So what he's asking is how I start from any place in the grammar, because you can't just start from the start symbol: these patterns can be any node further down. The way I did it is to start from all of them. Most of them fail immediately, because they can't shift what you're looking for, and the patterns are fairly small, so it takes essentially milliseconds.

Yes? [Audience question about conflicts]

So the question is how I resolve conflicts between overlapping patterns. What I do is take all the possibilities: if this pattern expression could potentially be five different subtrees, I try them all for matching. Any other questions?
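The first answer, start from every nonterminal and keep whatever succeeds, can be sketched with a toy grammar. This is a swapped-in technique: instead of SmaCC's GLR tables, it uses a brute-force, memoized check over token spans, and it assumes space-separated tokens and that a lone pattern token can stand for any nonterminal. GRAMMAR, interpretations, and the <id>/<number> terminals are hypothetical. A result with several entries corresponds to the second answer: each interpretation is a candidate subtree to try during matching.

```python
import re
from functools import lru_cache
from typing import List

# Hypothetical toy grammar; each production is a sequence of symbols.
GRAMMAR = {
    "Statement": [["<id>", ":=", "Expression"]],
    "Expression": [["Expression", "+", "Number"], ["Number"]],
    "Number": [["<number>"]],
}


def is_pattern(token: str) -> bool:
    return re.fullmatch(r"`[^`]*`", token) is not None


def terminal_matches(terminal: str, token: str) -> bool:
    if terminal == "<id>":
        return re.fullmatch(r"[A-Za-z_]\w*", token) is not None
    if terminal == "<number>":
        return token.isdigit()
    return terminal == token


def interpretations(pattern_text: str) -> List[str]:
    """Try every nonterminal as the start symbol; return those that parse."""
    tokens = tuple(pattern_text.split())  # assumes space-separated tokens

    @lru_cache(maxsize=None)
    def derives(symbol: str, i: int, j: int) -> bool:
        if symbol in GRAMMAR:
            # A lone pattern token may stand for a whole subtree of any nonterminal.
            if j == i + 1 and is_pattern(tokens[i]):
                return True
            return any(sequence(tuple(p), i, j) for p in GRAMMAR[symbol])
        return j == i + 1 and terminal_matches(symbol, tokens[i])

    @lru_cache(maxsize=None)
    def sequence(symbols: tuple, i: int, j: int) -> bool:
        if not symbols:
            return i == j
        first, rest = symbols[0], symbols[1:]
        # Every symbol consumes at least one token, so the left-recursive call in
        # Expression -> Expression + Number covers a strictly shorter span.
        for k in range(i + 1, j - len(rest) + 1):
            if derives(first, i, k) and sequence(rest, k, j):
                return True
        return False

    return [nt for nt in GRAMMAR if derives(nt, 0, len(tokens))]
```

Here interpretations("`a` + 1") keeps only Expression, while a bare "`a`" could be any of the three nonterminals, so all three would be tried.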