All right, so I'll go ahead and start. My name is Brian Duggan, and I'll be talking today about informal domain-specific languages in Perl 6. Thank you all for coming and attending this talk; I hope you like it. A little bit about my company: I'd like to thank PromptWorks for sending me here. We're a consulting company in Philadelphia, and we do coding in a lot of different languages for large and small companies.

Okay, so this talk is about domain-specific languages, or DSLs, but first I want to clarify the "informal" in the title. If you look up the definition of DSL on Wikipedia, you'll see languages like HTML or SQL. These are all DSLs: languages that are focused on a particular domain. But DSL has also come to be used in a lot of other ways. These days, when you're writing an application that generates HTML, you'll often have your own little language that generates these other DSLs. They don't have specifications, even though they're languages. They're restricted to small communities, and they change a lot, but they're very practical, and they're the things that you type by hand when you're writing programs.

A few examples: templating languages, and wiki markup like Markdown or Wikipedia's wikitext. For SQL generation, we have ORMs, and there are a lot of them in different languages: in Perl, there's DBIx::Class or Rose::DB; in Python, SQLAlchemy; in Ruby, there's Arel. These are all ways of avoiding writing SQL, but you still have your own little isolated informal DSL to generate these languages. Web microframeworks are another example: the cute syntax that you use to declare your routes, where you say get this, post that, and it generates routes. Those are also informal DSLs.

In 2010, Martin Fowler wrote a book called Domain-Specific Languages. He divided them into two categories, internal and external. Internal DSLs are languages that are a subset of a more general programming language.
External ones are languages that you actually parse: you have some sort of parser, and the language is not just a subset of Python or a subset of Ruby. I'm adding a third category for this talk, which I'm calling variant DSLs: languages that start off as a more common language, which you then modify a little bit. In Perl 6, there's a notion of slangs, which refers to this concept. For each of these categories — internal, external, and variant — I'm going to talk about a Perl 6 technique that can be used to create informal DSLs in that category, pick an example from each one, and then show an application of the technique to generate a language.

We'll start with internal informal domain-specific languages — a mouthful. What we're going to talk about are Perl 6's custom operator facilities. First, a quick review of the different types of operators in Perl 6. You have infix operators, which have their arguments on either side. You have prefix operators, like the minus in -$a, which go before what they operate on. You also have postfix, like $a++, which operates on the thing before it. Circumfix operators go around something, and postcircumfix come after one thing and around another, like taking an element out of an array. You can see that some of them take one argument and some take two: prefix, postfix, and circumfix each take one argument, while infix and postcircumfix take two. After all, they're really just functions.

And even without defining anything new, you can use operators in what's called noun form: if you take the plus, put brackets around it, and put an ampersand in front of it — &[+] — then instantly you've turned your operator into a function that takes two arguments. Conversely, you can go the other way.
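As a quick illustration, the noun form in action (a small sketch; the slides themselves aren't reproduced here):

```raku
# "Noun form": wrapping an infix operator in &[ ] turns it into an
# ordinary two-argument function that you can pass around.
say &[+](1, 2);                 # 3
say reduce &[+], 1, 2, 3, 4;    # 10
```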
If you have any subroutine that takes two arguments, you automatically have an infix operator. Here we have an example: we make a subroutine called plus-twice, which takes the first argument and adds two times the second argument. You can call it with the arguments afterwards, but you can also put an ampersand in front of it and brackets around it, and then you can say 1 [&plus-twice] 2. So out of the box, without doing anything crazy, you already have some interesting constructs for any subroutine that takes two arguments.

Of course, what you really want is to use symbols directly as operators — you don't want brackets and ampersands everywhere. To do that, you can define an operator using this syntax: sub infix:<...>, with the operator between the angle brackets. If we want to use the word plus to add two things together, we say sub infix:<plus> and return $x + $y. Then you can say 1 plus 2 and you get 3.

You can do other things too. Let's say you make a prefix operator @@ and a postfix operator +++, which modifies its argument by adding three. Here we have $z = @@ 10, and then we say $z+++. What's it going to come out to? I heard somebody say 23. Right: first @@ takes the ten and multiplies it by two, then +++ adds three. This is not making things easier. Using operators like this, you're going to quickly run out of symbols, and you'll probably end up with obfuscated code.

Luckily, we're not restricted to just ASCII operators — we can use any character from Unicode. Let's say we want to define the dot product. As you may remember from math, for the dot product of two vectors or two arrays, you multiply the corresponding elements together and then take the sum: the sum from 1 to n of the products of the corresponding elements from each array. That's the dot product. And there's a Unicode operator for it: the dot, ⋅.
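The two examples above might look like this in code (my sketch, following the talk's names):

```raku
# An ordinary sub used as an infix operator via the [&...] bracket form.
sub plus-twice($x, $y) { $x + 2 * $y }
say plus-twice(1, 2);    # 5
say 1 [&plus-twice] 2;   # 5

# Defining a named infix operator directly.
sub infix:<plus>($x, $y) { $x + $y }
say 1 plus 2;            # 3
```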
We can define the dot product: sub infix:<⋅> — I won't go into the body too much, but essentially we're multiplying the corresponding elements and adding them all together. And then if we say (1, 2) ⋅ (3, 4), we get — let's see if people are paying attention, since it's a lunchtime talk — eleven. Okay, eleven. All right.

What about this? There are also Unicode floor characters. Don't you hate when you have decimals in your code and you have to deal with floating points or rational numbers? So let's define a circumfix floor operator: ⌊2.4⌋. This one is two. Okay, come on.

Okay, so what happens if we have several operators and we start using them together? Let's say we have a plus operator — the word plus — and a times operator — the word times — and we say 1 plus 2 times 3. What do we get? Trouble. (Somebody said COBOL.) Okay, I heard a nine and a six. So 1 plus 2 is 3, 3 times 3 is 9. We wanted to get seven, but we didn't. What's wrong? Precedence. Right.

Luckily, you can specify the precedence of your operators. If you want to say that times has tighter precedence than plus, you add is tighter afterwards. is tighter takes an argument, which is another operator — so precedence is all relative. Now 1 plus 2 times 3 is 2 times 3, which is 6, plus 1: seven. You get seven. In addition to is tighter, you could do it the other way around and use is looser — say plus is looser than times. You can also say is equiv if two things should have the same precedence. (Audience question: do you get an error at construction time if you have a non-transitive precedence? Good question. I don't know.)

Okay, what about chaining operators? Let's say we have this operator to-the — I don't know COBOL, but it seems like it could be a COBOL operator — which raises the first argument to the power of the second.
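Sketching the dot-product, floor, and precedence examples (my reconstruction of the slides):

```raku
# The dot product with the Unicode dot operator (⋅, U+22C5):
# zip-multiply corresponding elements, then sum.
sub infix:<⋅>(@a, @b) { [+] @a Z* @b }
say (1, 2) ⋅ (3, 4);     # 11

# A circumfix floor operator using the Unicode floor brackets.
sub circumfix:<⌊ ⌋>($x) { $x.floor }
say ⌊2.4⌋;               # 2

# Relative precedence: make times bind tighter than plus.
sub infix:<plus>($x, $y) { $x + $y }
sub infix:<times>($x, $y) is tighter(&infix:<plus>) { $x * $y }
say 1 plus 2 times 3;    # 7
```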
So 2 to-the 3 to-the 2 — what are we going to get? 64, because it groups as (2 to-the 3) to-the 2: eight squared. But really, if we're mathematicians, it should group the other way: 2 to the 3 squared. So in this case, what is it that we want to change? The associativity.

Luckily, you can change the associativity of your operators too. If it should be right associative, you say is assoc<right>. And now 2 to-the 3 to-the 2 is 512. Yes, okay. For associativities, you have right associativity, left associativity, and non-associative, which will throw an error if you chain the operator. You also have this thing called chain associativity, which is really cool: the less-than operator has chain associativity, so you can say 1 < 2 < 3, which means 1 < 2 AND 2 < 3. You also have list associativity, which says take all of these operands and make a multi-valued function that operates on all of them at once; the cross product is list associative.

Okay, so let's look back at something simple: subtraction. Let's say we want to define what it means to subtract one string from another string, and we say it's substituting all occurrences of the second string with the empty string. So you take a string and you subtract a letter, and it's like deleting all of those letters from the string: 'house' minus 'u' is 'hose'. All right. So then what happens when we say 32 minus 2? Uh-oh, not good, right? Now we get 3, because both operands get treated as strings. That's unfortunate, because we don't want 32 minus 2 to be 3.

Luckily, we can fix that: we can give types to our operators' arguments. We define the arguments and say they are strings.
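The associativity fix, sketched (the operator name is mine; the talk used a word operator for exponentiation):

```raku
# Right associativity makes exponentiation group the way mathematicians
# expect: 2 to-the 3 to-the 2 is 2 ** (3 ** 2), not (2 ** 3) ** 2.
sub infix:<to-the>($x, $y) is assoc<right> { $x ** $y }
say 2 to-the 3 to-the 2;   # 512
```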
Str $x and Str $y — and instead of a sub, it's now a multi. So we have multiple dispatch for operators just as for functions. Now when we have 'house' and 'u', which are both strings, and we have 32 minus 3, we get the correct thing in both cases: 'hose' for the strings, but still 29 for the numbers.

(Audience: do you have any problems with ambiguous types? In Perl 5 it's often not clear until you use a value whether 32 is a string or a number.) Well, luckily it's optional — they call it gradual typing, or optional typing. If you don't put the type in, you could run into problems, that's right. So it's up to you whether you want to put the restrictions on or not. (Audience: can you use gradual typing with multis — if I don't type a signature, will it be the fallback for the more specific types?) That's right. The signature in this case has types, but you can have multiple dispatch without types; it will look at the number of arguments, for instance, or named arguments. And you can mix them: candidates with typed signatures alongside candidates with untyped ones.

(Another audience question, about literal values in signatures.) We'll get to that in a second. Okay, so what if we put in things like constants? I think this is what you were asking, right? So we have two string arguments here, and that works fine. What if we want to say that subtracting an Int from a string means taking that many characters off the end? And if we put a constant — a literal value — in a signature, that candidate takes precedence too, because it's a more narrow type than the string type. So here we can say 'escalator' minus 'electricity' and we get 'stairs'. Even though they're strings, the literal is narrower than Str. So we can take the a's out of 'catamaran', we can subtract six letters from 'catamaran', and 10 minus 5 still works. Okay, here's an example.
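The typed candidates described above, sketched in one runnable piece (the escalator and catamaran examples follow the talk; the exact bodies are my guesses):

```raku
# Typed multi candidates: dispatch picks the narrowest matching signature.
multi sub infix:<minus>(Str $x, Str $y) { $x.subst($y, '', :g) }   # delete occurrences
multi sub infix:<minus>(Str $x, Int $y) { $x.substr(0, $x.chars - $y) }  # chop off the end
multi sub infix:<minus>($x, $y) { $x - $y }                        # numeric fallback

# A literal value in a signature is narrower still than Str.
multi sub infix:<minus>(Str $x, 'electricity') { 'stairs' }

say 'house' minus 'u';                 # hose
say 'catamaran' minus 6;               # cat
say 10 minus 5;                        # 5
say 'escalator' minus 'electricity';   # stairs
```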
So Python has this really nice operator, the percent operator. There are two ways to format strings in Python — one with format and one with percent — and the percent one is very nice: it takes a string and then either a single value or a list, and it works like sprintf. We can make percent work like that in Perl 6 if we want to. We say: when you have a string and a number, just do an sprintf; when you have a string and a list, do the same thing — flatten your list and pass it as arguments to sprintf. Then we can say 'this is %d' % 40, and it'll put the 40 where the %d is. We can say 'pi is about %.2f, e is about %.2f' with the symbols pi and e, and it works just like that: this is 40, pi is about 3.14, e is about 2.72.

So let's look at an application of this to some of the examples from the beginning: generating SQL. Some of the techniques that you see in various DSLs, in various ORMs and SQL generators: you'll see method chaining, where you keep calling join or something like it and at the end of the day you get your SQL. You'll sometimes see operator overloading. And then, more often than not, you see what I'm calling data structure abuse, where, you know, arrays mean OR and hashes mean AND, or you have nested data structures which somehow get turned into SQL. How many people have worked with ORMs, or use ORMs sometimes? Okay, so you know what I'm talking about with some of these constructs.

So we have new techniques with the things we just talked about — all those operators could be used to generate SQL. We could write something like user + address, with name == 'ed' AND full_name == 'Ed Jones', where we redefine +, ==, and AND depending on the type of what's coming in.
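A sketch of that % operator (this assumes it's safe to add Str candidates alongside the core numeric %, which remains available for numbers):

```raku
# A Python-style % formatting operator, built on sprintf.
multi sub infix:<%>(Str $fmt, @args) { sprintf $fmt, |@args }
multi sub infix:<%>(Str $fmt, $arg)  { sprintf $fmt, $arg }

say 'this is %d' % 40;                               # this is 40
say 'pi is about %.2f, e is about %.2f' % (pi, e);   # pi is about 3.14, e is about 2.72
say 10 % 3;                                          # 1 (numeric % still works)
```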
We can use the circumfix and postcircumfix operators to filter things. The way we would do that is, for instance, we'd have a table class, and the infix + would take two tables — user and address — and generate something else. Similarly with == and AND: == could take two columns, or maybe a column and a value, and return a filter of some sort. So you have a lot of flexibility if you're designing an ORM or something to generate SQL in Perl 6.

But what I think is interesting is that although operators can be used to generate SQL, operators have already been used to express SQL — namely, before there was SQL, there was relational algebra. In the 70s, E. F. Codd worked out the mathematics behind querying data and how you would express queries in this algebra, and he had definitions for things like projection, selection, rename, natural join, semi-join — all the things that we now use SQL to express. So in Perl 6, instead of inventing something new, we could reuse some of these operators that already have well-defined semantics for querying data. For instance, we could have the infix bowtie operator, ⋈, make a natural join, and then you could say users ⋈ addresses — and you wouldn't be making up something new, because there's already precedent for the way these operators behave. You could do the same thing with the projection operator, π, and turn it into a SELECT: projecting name and age would turn into a select statement.

So the conclusion of this section is that custom operators are very cool, and when you're making a subset of a language into a DSL, they're a good way to do that in Perl 6.

Okay, so part two is external informal domain-specific languages. For this, Perl 6 has a feature a lot of you have probably heard about already, called grammars.
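A toy sketch of the relational-algebra idea — not a real ORM; the ⋈ operator and the project helper here just emit SQL text, and the table names are made up:

```raku
# Natural join with the Unicode bowtie operator (⋈, U+22C8), and a
# projection helper that wraps a SELECT around whatever it's given.
sub infix:<⋈>(Str $a, Str $b) { "$a NATURAL JOIN $b" }
sub project(Str $from, *@cols) { "SELECT @cols.join(', ') FROM $from" }

say project('users' ⋈ 'addresses', 'name', 'age');
# SELECT name, age FROM users NATURAL JOIN addresses
```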
So some typical examples of external DSLs include templating languages and wikis. For the next section, I'm going to take an example of one of these languages, and we'll go through what it's like to write a grammar to parse it.

The language we're going to look at is called Slim. How many people have heard of Slim? Nobody, wow — no Ruby programmers in the room, I guess. Slim is kind of a cool language, popular among Ruby programmers, and this is what it looks like. It's indentation based: it starts with html, then you indent a little bit and you get head, indent a little more for title, then go back out for body. It's very terse, which is actually kind of nice: you just indent, and instantly this nested HTML comes out of it, without typing any angle brackets.

Okay, so let's take a look. First we're going to look at how we could parse this language, and then we're going to look at how, given the structure we just saw as input, we can generate a data structure that's the DOM — an HTML DOM. It sounds hard, but it's actually not too bad.

So, parsing a language. This is going to be a bit of an overview; you can refer to the documentation for more details about how some of these things work. Essentially, Perl 6 has grammars, and they're first-class objects. Declaring a grammar is kind of like declaring a class, except that instead of declaring methods you have regular expressions associated with the grammar. So we have a collection of regular expressions here. Certain kinds of regular expressions are called rules and some are called tokens: a token is a regex that doesn't do any backtracking, and a rule is one where, additionally, the whitespace in your regular expression is significant.
Those are the only differences between rules, tokens, and regexes — they're really all a type of regular expression with additional constraints. So in this Slim code, where we have html, head, and so on, you can see the pattern: you have some indentation, then a tag, and then you may have some text after the tag. That's basically it for the language. The top-level rule is one or more lines separated by an end-of-line character, where end-of-line is a series of carriage returns.

If we parse this particular code with this grammar, we end up with a match object. This is a nested data structure whose shape mirrors the rules we just had in our grammar: a sequence of lines, where each line has a tag and maybe some indentation. These match objects can be very cumbersome to work with, so generally, instead of parsing something and getting back a big match object, the way to do something interesting with a grammar is to set up things that happen during the parse. The way you do that: a grammar can have an object associated with it, and every time a particular rule is reached, the method of the same name on that object gets called. So if you have an object with a method called line, then while the grammar is parsing and it hits a line, it calls your method and sends you the current match.

So here we have our grammar, which had tokens for tag, text, and indentation, and we're going to make a class with methods of the same names — one called tag, one called text, one called indentation — and basically what we have to do is say: here's what you do when you see a tag, here's what you do when you see indentation. Then we create a new object, my $dom = DOM.new, call Slim.parse, and the :actions parameter passes the object whose methods are going to be called. So here is our algorithm for parsing Slim.
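A minimal Slim-like grammar along the lines described (token names and details are my reconstruction, not the speaker's exact code):

```raku
# Each line is indentation (two spaces per level), a tag, and optional text.
grammar Slim {
    token TOP    { <line>+ % \n \n* }
    token line   { <indent> <tag> [ ' ' <text> ]? }
    token indent { [ '  ' ]* }     # two spaces per level of nesting
    token tag    { \w+ }
    token text   { \N+ }
}

my $m = Slim.parse("html\n  head\n    title Hello");
say ~$m<line>[2]<tag>;    # title
say ~$m<line>[2]<text>;   # Hello
```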
We're going to go through this indentation-based format keeping a stack, and we'll have a little node class with parent nodes and child nodes — a basic tree data structure. When you see a tag, you push a new node onto the stack; when you see text, you add the text to the top node; and when you see indentation, you pop nodes off the stack. This sounds a little confusing, so what we're going to do now is walk through an example of the algorithm at work before I show you the Perl 6 code, so that you believe it works.

Okay, so here's our text on the right that we're parsing — the Slim source with html, head, title, body — and on the left we've got two things that are going to keep evolving: a stack that we push to and pop from, and a tree that we connect nodes into when we hit the indentation and text on the right.

Okay, let's go. The first line is html: indentation is at level zero and there's nothing on the stack, so we push an html node onto the stack. Then we move on to the next line, where we have indentation level one and stack depth one. A reminder of the algorithm: we want the level of indentation to match the depth of the stack. If they're the same — stack is one, indentation is one — you're good; it's when the indentation gets smaller that you change things. So: indent one, stack one, we push head onto the stack and move to the next line.
Okay, indentation level two, stack depth two — the same, so far so good — so we push title onto the stack. Then we run into text and set the title's text; I'm going to skip over that for this demo. Then we move on to the next line, and now something changed: the indent is now one, which is less than the stack depth of three. Because the indentation is less than the stack, we pop from the stack twice. Each time we pop a node, we connect it to the node beneath it in the tree: title is connected to head, and head is connected to html. So our stack is down to one, and our tree looks like a linked list at this point.

We move on and push body onto the stack, then move to the next tag, which is h1. Again, indent and stack are the same, so all we have to do is push the h1 on; then we run into the text and set the text of the h1. Finally we get back to level zero. Once again the indent is below the stack depth, so we pop from the stack twice, attaching as we go: h1 connects to body, and when we pop the final time, body connects to the html node we already have. So we end up with a DOM tree: html with head and body, title under head, h1 under body. It's a little tricky to walk through, but what's kind of cool is that you can apply the same approach to building an abstract syntax tree when you're parsing a language. I'm not going to do that here.

So what does this look like in Perl 6? Now that you know the algorithm, it's actually really straightforward. Here's our node class. The node has a tag, which is a string — again, the types are optional, but you can put them there if you want to constrain your attributes. So we have a tag attribute and a text attribute; the is rw means there will be a read-write accessor, and
we need that for text because we're setting the text after we put the node on the stack. Then we have a parent, which is another node, and an array of children. So this is a tree data structure with pointers in both directions: a pointer to your parent and an array of children.

Then the DOM class. We have the stack here, which we keep track of while we're parsing, and the top of the tree, which is a node. The my declares a lexical variable, and has declares an attribute. Just to refresh your memory, the Slim grammar has tag, text, and indentation, so we need to make three methods — one called tag, one called text, one called indentation — which say what happens when you reach those particular points in the grammar.

Okay, back to the algorithm. Rule number one: when you see a tag, push a new node onto the stack. So here we have a method tag; the argument, $/, is conventionally used to represent the match object. So what comes in is what was captured — not exactly a string, but a match — so if you want to set the tag from it, you use the tilde prefix, ~, which stringifies it. The colon here is another way of passing arguments: if you don't like wrapping your arguments in parentheses, you can use a colon instead. So this makes a new node for the tag and just pushes it onto the stack.

When you see text, set the text of the top node. Here we have a method called text, and we just say the top node — you say [*-1] to reference the top of the stack — dot text equals the incoming text. Not that much to it.

The trickiest part is what happens when you see indentation, because now we have to pop until the size of the stack matches the level of indentation. But basically, once you say it explicitly, it's not that bad: while @!stack is greater than $<indent>. There's a little bit of magic here, because
$<indent> refers to the named capture of the match object called indent — before we started this, you saw that there was an indent in some of the matches — and that tells you how many levels of indentation you have. So $node is popped off the top of the stack, and the with here is essentially saying make sure it's defined — it's like an if that only runs the assignment when the value is defined. So we take the top of the stack, assign it as the parent of the current node, and push our node onto that node's children. Okay, so that's about it. There's also a way to dump the tree, which you can do recursively by printing out different levels of indentation; I'll skip through that, but all in all the code is not too bad for doing all that parsing. And here's what you get.

Okay, so for a couple of minutes here I'm going to say a few words about slangs. Essentially, slangs are a structured way to modify the grammar of Perl 6 itself. Rakudo, the implementation of Perl 6, uses NQP, which has Perl6::Grammar — just like the grammars you just saw, except it's actually parsing Perl 6 — and you can mess with it. So this is what you can do right now with slangs. In a BEGIN block, at compile time, if you print out the keys of %*LANG, you'll see the Perl 6 grammar, the Perl 6 actions, and all of the things being used to parse your current program. They're also available in variables with a dollar-tilde sigil, named after the different languages: $~MAIN refers to the parser of the main program, and you can get at its grammar and its actions. This, by the way, is called the language braid — all these different languages being used together to parse your Perl 6 program.

So if you want to change one of them: first, to see what they look like, you can run with --target=parse and you'll see the parse tree that's generated
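Putting the whole thing together — grammar, node class, and action methods — here's a condensed, self-contained sketch of the parser (my reconstruction; names follow the talk where possible, and the collapse helper is my own factoring of the pop loop):

```raku
grammar Slim {
    token TOP    { <line>+ % \n \n* }
    token line   { <indent> <tag> [ ' ' <text> ]? }
    token indent { [ '  ' ]* }     # two spaces per level
    token tag    { \w+ }
    token text   { \N+ }
}

class Node {
    has Str  $.tag;
    has Str  $.text is rw;      # set after the node is on the stack
    has Node $.parent is rw;
    has Node @.children;
}

class DOM {
    has Node @.stack;
    has Node $.root;

    # Pop until the stack is no deeper than $level, attaching each
    # popped node to the node beneath it; the last pop sets the root.
    method !collapse(Int $level) {
        while @!stack.elems > $level {
            my $node = @!stack.pop;
            if @!stack {
                $node.parent = @!stack[*-1];
                @!stack[*-1].children.push: $node;
            }
            else { $!root = $node }
        }
    }
    method indent($/) { self!collapse($/.chars div 2) }
    method tag($/)    { @!stack.push: Node.new(tag => ~$/) }
    method text($/)   { @!stack[*-1].text = ~$/ }
    method TOP($/)    { self!collapse(0) }   # flush at end of parse
}

my $dom = DOM.new;
Slim.parse("html\n  head\n    title Hello\n  body\n    h1 Hi", :actions($dom));
say $dom.root.tag;                             # html
say $dom.root.children[0].children[0].text;    # Hello
```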
while your Perl 6 program is being parsed. Then if you look in the source for Perl 6, you can see the definitions of these, and they look just like the grammar we made up: they have tokens like statementlist, name, identifier. If you want to modify one — let's say instead of sub we want to use the word lambda — then essentially you redefine the token: instead of matching the string sub, this token now matches the string lambda.

Very quickly, there's a handy operator for this. Just as you can take a little piece of a role and apply it to a class, Perl 6 has an operator called but, which makes a little anonymous role with some methods and overrides those methods in a particular object. You can do the same thing with grammars, overriding a few tokens in a grammar. Long story short, you say the MAIN language gets the MAIN grammar but this is what my subs look like: they start with lambda instead of starting with sub. That's a long explanation for something that makes sense at the end: the but just changes one little piece of the grammar.

I'm going to finish real quick — I'm almost out of time — and then I'll see if I can take a question. After that change, you can say lambda foo instead of sub foo, and it will declare a sub for you. And by the way, these changes to the grammar are lexically scoped: inside a scope, you can use the grammar you just made up. There are a few examples of slangs in the ecosystem already; I won't talk too much about those, but you can look them up. There's still some work to be done on the syntax for using slangs, but all in all, they're there; they exist.

In conclusion, we've seen some examples of creating informal DSLs: by changing the syntax with custom operators, by parsing something externally with grammars, or by modifying Perl 6 itself. Thank you very much.
I'm out of time but come talk to me afterwards if you have questions