So, today I will talk about... do you know Papers We Love? Ever heard of it? Papers We Love? This has nothing to do with that. I just stole the title. Perfect. Because I liked it. We're just going to talk about papers that we use in Elixir, existing research that we use to implement stuff that we have in the language.

My username on the internet is that. My name is Andrea. I work at a company called Community, which does messaging. This is an article that came out recently: "100 Music Stars are giving the fans their phone numbers." That's what we're doing. We are the phone numbers. If you're interested, just talk to me later. We're hiring remotely as well, so just come talk to me.

And I come from Italy. You've probably heard of it. My favorite thing about Italy in the world... has anyone been at my talk the day before yesterday? A few of you. OK, we'll recycle a bunch of jokes. I'm sorry. But my favorite thing about Italy is Italians that are mad at food. Italians get very mad when other countries make Italian food, because they don't do it right, apparently. So there are these beautiful Twitter accounts, just go look at them if you want, with screenshots of Italians that are getting angry, especially at Americans, but generally at people that are doing Italian food. I will show you a bunch of them. "Obviously, there is no spinach chicken pasta recipe in Italy. In Italy, a recipe like this is a crime worse than a murder." Then there's a bunch of vomit stuff that they like, like "a very common recipe used in Italy to cause what we call conato di vomito, translation: a huge uncontrolled vomit explosion." Very passionate people. "Is it a hangover meal, isn't it, to vomit the last crap out?" Yes, very. "We're not offended. We're actually vomiting." Told you, lots of vomit stuff. This one is for Americans: "We're here to watch you burn, one chicken parmesan four cheese mozzarella taco pasta alfredo at a time."
It's beautiful. Then we have defeated people: "I really don't understand why you do this." Just people that are defeated. "So happy that there is an ocean between us, guys." Positive thinking, right? And then when you really can't bring anything more to an argument, you just go full-on world war and say "a world war won't be enough to forget this." So, lots of very passionate people in Italy.

Anyways, we'll talk about Elixir. Has anyone here used Elixir before? Perfect. Elixir is a functional, concurrent, fault-tolerant, distributed programming language that runs on the Erlang virtual machine. It's a relatively young language: it started coming out in 2012, 2013, roughly. It has gained a bunch of traction, and I'm part of the Elixir core team. That's why I'm talking about this. I'm the guy in the bottom middle. We're six people and none of us are academics in any way. Some of us have computer science degrees, but we're not academics. But we absolutely love existing research. We love finding out about things through existing research because, as you may know, there's a lot of existing research that's very interesting and that solves a lot of problems. So when we have to solve problems in our language, we often reach for existing research even if we're not academics, right? You just go read papers, and most of the time they make a lot of sense.

We're gonna discuss three problems where we used research to help solve them: formatting code, diffing data structures, and property-based testing.

So let's dive right into it, starting with formatting code. Formatting code means that you run a tool that takes code written in whatever style you have and turns it into a uniform, automatically formatted layout, right? We really wanted this in the language because it's very welcoming to newcomers.
It eliminates a whole category of discussions about style, which are horrible and boring, and also I always want to be right in those discussions. So if we build a formatter into the language, I can be right more times. It also gives benefits for consistency, right? It's easier to read code that's written in the same style across the community, across companies, across teams. So we really wanted this to be in the language.

What is the hard thing about writing a code formatter? You could just take the AST and turn it into some string, and it makes sense, right? The hard thing is that you usually want to format code with a line length limit. You want it to fit in a given line length. Without a limit it would be much easier, because you could make every line as long as you want. With a line length limit this becomes harder. Say you have a piece of code like this. This is an Elixir map. It's like a hash or dictionary in other languages, and it has two keys with different values. If you have, let's say, a line length of 30, this fits, so the formatter can just take the AST for this map and print it as a single line. But what if (nice transition) the maximum line length gets set to 25? Now we need to change how this is formatted in order to fit, so it could become something like this, split across lines. If we shorten the line length even more, you can split even more, and you can see how the formatter needs to make a bunch of decisions in order to figure out how to print code in a way that fits the line length. And that's the main problem, basically.

We started out with this paper from John Hughes, The Design of a Pretty-printing Library. This paper introduces the concept of algebra documents.
An algebra document is a thing that you can use to introduce some logic into rendering pieces of text. A few examples of documents: there's simple text, which is something you can't break. For example, if you have a variable name, you can't really break that into multiple pieces or lines. Then you have concat, which is a way to concatenate multiple documents, possibly breaking them in the middle. You have something like nest, where you say: if this breaks, nest it by a few columns, and that's used for indentation when formatting. And then you have something like line, which forces a line break. So you have a few documents that you can compose, and then logic to handle them based on the line length.

If we wanna see an example: if you wanna write a list, you can say, write an opening paren, then concatenate it with the first element and a comma, the second element and a comma, the third element, and then the closing paren. If you print it without any line limit, it's gonna print like that: it all fits on one line. We can use line to force a line break, so we have the paren and then a line for each of the elements. We can nest the previous thing so that when there's a break, the break is nested, and you can see it's indented. So that forces the indentation and it forces the breaks.

Another paper, by Philip Wadler, called A Prettier Printer, also introduces the concept of groups, which is what we use as well. A group is a unit that can either be split or not split. In this case the unit is the whole list. This whole thing can be split or not, but if it gets split, it should be split all together. Otherwise you can end up with something where you have a paren, one, two, and then three on a new line.
But we wanna make sure that either you print on a single line or you split the whole thing on multiple lines, and that's what a group is for, right?

When you are dealing with these algebra documents, the most important thing is, again, having a hard line length. When you are rendering, you have to decide whether to render the document as a single line, or replace the concatenations with breaks and render the document on multiple lines. So you have to choose. The problem is that Elixir is an eager language. It's not a lazy language, it's a strict language. So when you call something like this, it evaluates the arguments before evaluating the choose function. In this case, it would have to render doc, and then it would have to render the replace-concat-with-line-break doc, that is, a document where all concatenations have been replaced with line breaks. It has to evaluate both of them and see: if the first one fits, then it's okay, otherwise it has to go with the second one. The problem is that this is exponential, because every nested choice renders both alternatives, so you explore all possible layouts before you can actually choose one. It doesn't scale very well, and it makes it impossible to write practical things.

Fortunately, we found another paper called Strictly Pretty, by Christian Lindig, and this paper explains how to do strict pretty printing. It explains a different approach to dealing with algebra documents that is strict, and it was implemented for OCaml in the paper. It has a very similar approach to the original paper, which was implemented for Haskell, but with it we were able to turn our formatting into strict formatting so that it works for our language. We also ended up adding a bunch of documents.
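To make the algebra documents and the strict rendering a bit more concrete, here is a minimal sketch in Python. It is not the Elixir formatter's code: the class names (Text, Break, Concat, Nest, Group) and the functions concat, fits, and render are made up for illustration. The algorithm follows the general shape of the strict approach in Strictly Pretty, where a group is tried flat first and is broken only if it doesn't fit in the remaining width, so only one layout is ever rendered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    s: str  # unbreakable text, e.g. a variable name


@dataclass(frozen=True)
class Break:
    s: str = " "  # printed as `s` when flat, as a newline when broken


@dataclass(frozen=True)
class Concat:
    left: object
    right: object


@dataclass(frozen=True)
class Nest:
    indent: int  # extra indentation applied after a newline
    doc: object


@dataclass(frozen=True)
class Group:
    doc: object  # all breaks inside are flat together or broken together


def concat(*docs):
    """Concatenate several documents (hypothetical helper)."""
    result = docs[0]
    for doc in docs[1:]:
        result = Concat(result, doc)
    return result


def fits(width, items):
    """Check whether the pending items fit in `width` columns on this line."""
    stack = list(items)  # next item to process is at the end
    while stack:
        if width < 0:
            return False
        indent, mode, doc = stack.pop()
        if isinstance(doc, Text):
            width -= len(doc.s)
        elif isinstance(doc, Break):
            if mode == "break":
                return True  # a real newline ends the current line
            width -= len(doc.s)
        elif isinstance(doc, Concat):
            stack.append((indent, mode, doc.right))
            stack.append((indent, mode, doc.left))
        elif isinstance(doc, Nest):
            stack.append((indent + doc.indent, mode, doc.doc))
        elif isinstance(doc, Group):
            stack.append((indent, "flat", doc.doc))
    return width >= 0


def render(doc, width):
    """Render `doc` to a string, trying to stay within `width` columns."""
    out, column = [], 0
    stack = [(0, "flat", Group(doc))]
    while stack:
        indent, mode, d = stack.pop()
        if isinstance(d, Text):
            out.append(d.s)
            column += len(d.s)
        elif isinstance(d, Break):
            if mode == "flat":
                out.append(d.s)
                column += len(d.s)
            else:
                out.append("\n" + " " * indent)
                column = indent
        elif isinstance(d, Concat):
            stack.append((indent, mode, d.right))
            stack.append((indent, mode, d.left))
        elif isinstance(d, Nest):
            stack.append((indent + d.indent, mode, d.doc))
        elif isinstance(d, Group):
            flat = (indent, "flat", d.doc)
            # Decide once per group: flat if it fits, broken otherwise.
            if fits(width - column, stack + [flat]):
                stack.append(flat)
            else:
                stack.append((indent, "break", d.doc))
    return "".join(out)


# The three-element list from the talk, as one group.
doc = Group(concat(
    Text("["),
    Nest(2, concat(Break(""), Text("1,"), Break(" "), Text("2,"), Break(" "), Text("3"))),
    Break(""),
    Text("]"),
))

print(render(doc, 80))
print(render(doc, 5))
```

With a width of 80 the list stays on one line. With a width of 5 the group breaks as a unit, so every break inside it becomes a newline and the elements are indented by the nest.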
Among the documents we added there's color, for example, to have stuff that's colored in the terminal. Or nesting with a cursor, so you nest as far as the cursor is instead of by a fixed number of spaces. But the idea is that, based off of this research, we started with a problem, we found a solution, it didn't apply to our language, and then we found someone that had already solved the problem for a strict language like ours as well. So we applied that, and then with the abstraction that we found in the research we were able to extend it and create our own things that we need for our own purposes. So it was very, very useful.

Diffing data structures. We wanted this because when you have a test and you're asserting that two things are equal and the test fails, we want to know why it failed, right? Here it's easy to see, but when you have a big piece of text, for example, it's harder to see why it failed. So we wanted something like this, where you have a diff of the things that are different. It's much easier to see, it's like a normal GitHub diff, and when you have a large data structure this is unbelievably useful, to be able to point out where the differences are.

To do this, we started with this paper called An O(ND) Difference Algorithm and Its Variations, by Eugene Myers. This paper is about finding the shortest edit script to turn a sequence into another sequence, and basically it's like finding the shortest path in a graph. That's the problem it's solving. An edit script is a sequence of commands that can bring you from one sequence to another, and the commands are: add this, leave this alone, or delete this.
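An edit script is easier to see in code. Here's a small sketch in Python that produces one. Note the swap: it uses a plain quadratic longest-common-subsequence table, not the faster search from the Myers paper, and the names edit_script and apply_script are hypothetical, not Elixir's API. The output uses the same three commands: eq for leave alone, del for delete, ins for add.

```python
def edit_script(old, new):
    """Return a shortest edit script turning `old` into `new`.

    Each entry is ("eq", x), ("del", x), or ("ins", x).
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    script = []
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and old[i] == new[j]:
            script.append(("eq", old[i]))  # leave this element alone
            i += 1
            j += 1
        elif i < n and (j == m or lcs[i + 1][j] >= lcs[i][j + 1]):
            script.append(("del", old[i]))  # drop an element of `old`
            i += 1
        else:
            script.append(("ins", new[j]))  # add an element of `new`
            j += 1
    return script


def apply_script(script):
    """Replay an edit script: keep eq and ins, skip del."""
    return [item for op, item in script if op != "del"]


script = edit_script([1, 4, 2, 3], [1, 2, 3, 4])
print(script)
print(apply_script(script))
```

Replaying the script gives back the second list, and coloring the del entries red and the ins entries green is all it takes to turn the script into a diff.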
So with these three commands you can turn, for example, any string into another string: add this letter here, leave this part alone, remove that letter there, and you've transformed one string into the other.

The O(ND) complexity is important here, because it means this is useful for things that are not very different. If you have two sequences that are pretty similar, like the two strings that we saw over here, this algorithm works very well. The D in O(ND) is the size of the shortest edit script, so the cost of the algorithm depends on how different the two sequences are. So it's good for things like source code changes, because you usually don't make big changes all the time, you usually make small changes. It's useful for things like DNA strand mutation, which is, I have no idea even what it is, but it's mentioned in the paper.

We were able to basically turn this paper directly into an implementation. So we have here List.myers_difference, and as you can see it takes two lists and it gives you an edit script. The edit script for us is just eq, that's the command for leave this part alone, then del to delete an element, then eq again for this other part, then ins to insert another element. And once you have this, it's really just a short journey to a diff: you just colorize this, right? You colorize the deletes as red and the adds as green and leave the rest alone. We did no modifications to the paper this time, so we were able to just take the paper and implement it in Elixir and it worked, and it was very nice that we didn't have to think about this and figure it out.

We also improved the situation a little bit by finding this other paper, String Matching with Metric Trees Using an Approximate Distance.
This gives you an idea of how similar two strings are, in a very fast way, and now we use this to decide whether we're gonna do diffing or not. If we have strings that are relatively small and similar, we do diffing. If you have huge strings in the test failures, for example, we just don't do diffing because it would take too much time. That's a property of the algorithm: if the strings are very different, it takes time. So we were able to improve the situation a little bit by not doing the diff in that case at all. That was very nice. We could basically take the research and apply it to our language, and we didn't have to think about anything.

And then property-based testing. This is a more complex problem. Actually, we ended up not having this feature in the language. It's now shipped as a separate library, but we researched this a lot. If you don't know what property-based testing is, it's a way of testing where, instead of giving exact values to the things that you're testing, you come up with the shapes of values that should work with your code. For example, a list of integers: that's a shape. And then you find properties of the output of your code. For example, if you have a sort function, a property of the sort function is that the length of the list it returns is the same as the length of the original list. That's a property that holds for every possible input you give to the sort function, every list of things, so that's the shape. Property-based testing takes the shape definition, takes the properties of the output, generates a lot of random data of that shape, and verifies that the property holds. That's property-based testing.

This is an example of what I just mentioned. We generate a bunch of random lists with the shape list of term.
So any list of any term. In Elixir, all terms are comparable, so it doesn't matter what you're sorting. Then we sort, and then we assert that the sorted list is a list and that the length is the same. That's a property that's valid for all lists, right?

To implement this, we started from QuickCheck. This is the original paper, called QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. It gives you a good idea of property-based testing, but it relies very heavily on types to generate things. It's easier in something like Haskell, because you can have a type Color, which is red, blue, or green, and then you can infer what values you can generate from the type. So if you have a function that takes a Color, you get easy generators for types: a Color is just gonna be either red or blue or green.

In Elixir we don't have this, because it's a dynamic language that has no static types. But we were able to use, well, this is not existing research as much, it's existing code, actually. If you write Clojure, you might know test.check, which is a property-based testing framework for Clojure. We were able to have a very, very similar implementation thanks to the fact that Clojure is very similar to Elixir in semantics: it doesn't have static types, it has a very similar set of data structures, and they're immutable just as well. So we were able to take the ideas from property-based testing and the implementation from test.check and make a library in Elixir called StreamData that allows you to (this is actually running code) write property-based tests, generate data, and verify the properties of the data. So, again, we didn't end up adding this to Elixir, but the experience was very similar to the other cases, where we studied research and came out with something practical.
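The shape-plus-property idea fits in a few lines. This is a hand-rolled sketch in Python using only the standard library, not StreamData or test.check; the names list_of_integers, find_counterexample, keeps_length, and buggy_sort are all made up for illustration, and unlike the real libraries it does no shrinking of failing inputs.

```python
import random


def list_of_integers(rng, max_length=20):
    """Generator for the shape "list of integers"."""
    length = rng.randint(0, max_length)
    return [rng.randint(-10, 10) for _ in range(length)]


def find_counterexample(generator, prop, runs=100, seed=0):
    """Generate `runs` random values and return the first one that breaks `prop`."""
    rng = random.Random(seed)  # seeded so that a failure can be reproduced
    for _ in range(runs):
        value = generator(rng)
        if not prop(value):
            return value
    return None  # the property held for every generated value


def keeps_length(sort):
    """Property: sorting returns a list of the same length as the input."""
    def prop(xs):
        result = sort(xs)
        return isinstance(result, list) and len(result) == len(xs)
    return prop


def buggy_sort(xs):
    """A deliberately broken sort that silently drops duplicates."""
    return sorted(set(xs))


# The property holds for the built-in sort, so no counterexample is found.
print(find_counterexample(list_of_integers, keeps_length(sorted)))

# The buggy sort is caught by a generated list that contains duplicates.
print(find_counterexample(list_of_integers, keeps_length(buggy_sort)))
```

Nobody had to think of "a list with a repeated element" as a test case: the random data found it from the shape and the property alone.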
Some honorable mentions of papers that we really enjoyed. This one is called How to Make Ad-hoc Polymorphism Less Ad Hoc, and we used it as the base for implementing protocols in Elixir, which is a feature where you have polymorphism based on the data structure. You define a set of functions that you can call, and then you can define implementations of those functions in different places for different data structures. For example, you can have collections as a generic idea, and then a module that works on all collections, with a bunch of functions like map or reduce or filter that work on all collections based on this protocol.

This is a paper for Lisp: Recursive Functions of Symbolic Expressions and Their Computation by Machine. We didn't take a lot from this, but Elixir is a homoiconic language that has macros, and the AST of Elixir is itself Elixir data, so it has a lot of ideas that come originally from this paper.

This is one that I like. It is Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, by Matthew Jaro. This paper describes another algorithm for string distance, Jaro distance for short, and it's very performant. We actually have a function called String.jaro_distance in the Elixir standard library. If you wanna get the distance between two strings in your production code, this is what you should use, not the diffing algorithm.
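For a feel of what the Jaro measure computes, here is a textbook version sketched in Python. It is not Elixir's implementation, and jaro_similarity is a hypothetical name. The score is between 0.0 and 1.0, where 1.0 means the strings are equal: it counts characters that match within a small window, then discounts matches that appear in a different order.

```python
def jaro_similarity(a, b):
    """Textbook Jaro similarity between two strings, from 0.0 to 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0

    # Characters only count as matching if they are at most `window` apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up out of order count as half a transposition each.
    a_seq = [ch for ch, hit in zip(a, a_matched) if hit]
    b_seq = [ch for ch, hit in zip(b, b_matched) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2

    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))  # prints 0.9444
print(round(jaro_similarity("dwayne", "duane"), 4))   # prints 0.8222
```

The work is bounded by the string lengths and the window, which is why a measure like this is a better fit for production code than running a full diff.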
And then there's one called Iteratee: Teaching an Old Fold New Tricks, by John Lato. This is not a paper, I believe it's an article somewhere, in The Monad.Reader or something, I can't even remember, but it's still research. We used this when designing the language as the base for enumerables, which are our polymorphic collections. It's a generic way of implementing something that can be enumerated, where the enumeration can pause and continue. You can implement this for multiple things: we implemented it for lists, we implemented it for lazy streams, where the data is lazy, and we implemented it for things like reading a file, which can be a stream that emits lines. So we have a very good abstraction, and it was heavily inspired by this.

So the conclusion is that existing research is very nice to have. I recommend this to everyone: when you're trying to solve something that sounds like a computer science problem and doesn't sound like adding a route to a web application, when you have to figure out a problem that sounds complicated, just go look at the research, because there's a good chance that you're gonna be surprised by how many things have already been solved by other people. And solved probably better than I would, that's the part that I like. So it's very nice to be able to have this existing research, and the thing that I like the most is that research really tends to solve problems in a very good way. When you read papers, they're usually very focused on finding a general solution, or a very simple solution, or, for example, a solution where you have a very simple core of things that you can compose, very simple primitives and composition. There's a lot of focus on that, so that you can build more complex abstractions starting from very simple components. That's very, very nice.
And then what we usually did is just modify this lightly so that it can be applied to the real world, or to real needs. But all the base ideas and the base algorithms that we use often come from existing research, right? And that's all from me. Yep.