I'm surprisingly nervous today. Ganbarimasu. For some errata: I didn't make this esoteric language, I'm just implementing it. And you can see the JRuby, what do you call it, logo, or masthead, of the Keynote, but it's not a JRuby talk. It's about an esoteric programming language.

I do not look like this, but it is my avatar. I co-lead the JRuby project, I work at Red Hat, I obviously know Ruby, and I'm a craft beer drinker, so if anyone knows a good brewery in the Austin area, I'll be here till Sunday morning, let me know. I'm sure there's a lot of opinions, I can already tell. I know I don't.

So what's an esoteric language? If we look at Wikipedia, it has a lot to say, and it's very, very squishy. I'll just boil it down to one sentence: it's a language which is intended to be hard to understand or read. The most famous one is Perl. I used Perl for 10 years, I do like the language, so I'm allowed to make this joke. But I wanted to make an additional point. If you've never seen Perl before, you can figure out that this prints something out five times. If you look at this, this is creating a space for an object, it's an OO thing. If you know Perl, it's totally natural, it doesn't look strange at all. You start looking at some of the Perl 6 features, then you start losing some of the Perl programmers. But I just wanted to qualify that something seems esoteric if you have no experience with it. So the philosophical part of this talk is done now. But something to think about.

So I'm gonna talk about Piet for the rest of this talk. It's named after Piet Mondrian. He's responsible for the artwork on the right. God, he looks so much cooler when he's younger. Like, I don't know how you get here. When he was young, he was somewhere between Clinton and Time Traveling Hipster. But I don't know, when you get old, you just make bad choices. Anyways, I'm gonna talk about the language Piet. It's image-based, just like Smalltalk. Well, actually not like Smalltalk.
Your source code is literally an image. This is a PNG image. If you go and execute this program, it prints Piet to standard out. It's also known as a funge language, so it works with coordinate systems. That makes sense since the program is an image. It's stack-based. Actually, most of the esoteric languages are based on stacks and being very primitive machines.

It's based on codels. Codels are just an n by n pixel square. Codels make up groups. So in the upper left-hand corner, we have a three-by-three group that's size nine. Next to it is the dark blue group, and it's eight because green's taking a little chunk out of it.

There's some runtime state called a direction pointer. It initially points right. It moves around in a clockwise way and it can change in other ways. There's a codel chooser, which is either left or right. And execution of Piet is moving from one group to the next group, based on this navigation rule: you follow the direction pointer to the furthest extreme of the group, and then depending on whether the codel chooser's left or right, you go to the leftmost or rightmost codel at that extreme. Oftentimes, that's the same codel.

So just to solidify this, when we start the program, we enter this normal blue group. The direction pointer is right. Codel chooser's left, so we enter at the top. But if the codel chooser had been right, then we'd just enter at the bottom. Pretty simple. Let's go for a slightly more tricky example. In this group, if the direction pointer's right and the codel chooser's left, we go all the way to the right and we pick the top one and we'd go across there. And obviously, if it's right, we'd go there.

All right, so simple language, right? But it gets easier. So there are six colors and each color has three different shades. When you move from one group to the next group, you look up in this table what operation you should perform. So we've got stack-based operations, math, comparators, input and output.
And then pointer and switch, which just take the top value and spin those values around. These three transitions of blue show you going one lightness darker, and this would be a push. You'll notice that it's just a cycle, so the darkest value getting to the lightest value is one darker. Here's a difference of two hue changes. This is a divide operation. Again, you just follow the circle around.

So now let's step through the program a little bit. We enter from the upper left-hand side, we're in normal blue, and we need to transition to dark blue. I explained all that work earlier. So we go and look up in the table. It's one darker, so we're gonna push a value. And what we end up pushing is nine because that was the size of the normal blue group. So this is the only use of group size. It's only for push.

So now if we go to step two, we go to black, and I didn't explain what black was. Black is the equivalent of going off the image: you just can't pass through it. When you encounter either of those conditions, you use this algorithm where you toggle between the codel chooser and the direction pointer up to eight times. If you find a valid place to go, you continue executing. If you can't find a place in eight tries, the program terminates. This is how you normally terminate a Piet program. So going back to step two, the codel chooser shoots us from left to right and we drop down to the bottom and navigate on hot air.

So white is a no-op. You slide right through it. In fact, you can put in other colors that it can't recognize and they're considered to be white. I think I need to take a break. Too soon? Okay, I'll do anything for humor.

So I made rpiet. It's a gem. I just realized this morning that I hadn't actually re-released it since I did all this work, so I'll try to do that afterwards. Here's an example program, cowsay. With rpiet, you can just say download this image off the internet and execute it. This is pretty cool.
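As an aside, the color-table lookup described earlier (count the hue steps and the lightness steps between two groups, then read off the operation) can be sketched in a few lines of Ruby. The table layout follows the published Piet command table. The names `HUES`, `LIGHTNESS`, `OPERATIONS`, `operation_for` and the operation symbols are made up for this illustration and are not taken from rpiet.

```ruby
# Hue and lightness cycles, in the order Piet steps through them.
HUES      = %i[red yellow green cyan blue magenta].freeze
LIGHTNESS = %i[light normal dark].freeze

# Rows are hue steps (0..5); columns are steps darker (0..2).
OPERATIONS = [
  %i[noop    push    pop],
  %i[add     sub     mul],
  %i[div     mod     not],
  %i[gtr     pointer switch],
  %i[dup     roll    in_num],
  %i[in_char out_num out_char]
].freeze

# Colors are [hue, lightness] pairs, e.g. [:blue, :normal].
def operation_for(from, to)
  # Both cycles wrap around, so dark -> light counts as one step darker.
  hue_steps  = (HUES.index(to[0]) - HUES.index(from[0])) % HUES.size
  dark_steps = (LIGHTNESS.index(to[1]) - LIGHTNESS.index(from[1])) % LIGHTNESS.size
  OPERATIONS[hue_steps][dark_steps]
end

p operation_for(%i[blue normal], %i[blue dark])   # one step darker
p operation_for(%i[blue dark],   %i[blue light])  # wraps around the cycle
p operation_for(%i[blue normal], %i[red normal])  # two hue steps
```

The first two calls both give `:push`, matching the "it's just a cycle" remark, and the third gives `:div` for a two-hue-step transition.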
When you execute this program, what happens is you get a prompt and you can type something in and the cow repeats it in full ASCII glory. But this is truly mind-bending for me. Like, who the hell would have spent the time to figure out how to do this? I have no idea. I wrote a simple program and I find this impossible.

Another one that's really cool: you can pass in what codel size you wanna use. So if you pass a two by two codel in, it prints a pie, but if you pass in one, it prints hello world. So that's pretty weird.

This is a conceptually cool one. If you go into the upper left (you always start in the upper left-hand corner), the first thing that you push is something that's half the image in height, which is the radius. And then a little bit later on, you push this jaggy circle as the area, and then you do the area divided by r squared to calculate a really bad value of pi. If you kept making the image larger and larger and larger, it would get better, but never be perfect.

Okay. So the talk is actually about implementing rpiet. So what I did is I ported Python to Ruby. I'm kidding. The talk's not over, but I did port Python to Ruby. As it turns out, I'm not so good at porting. I don't know Python particularly well and I made some errors. The file I started with was literally a single .py file.

So what do you do when you first make a program that doesn't work? You start debugging it. It turns out all the Piet interpreters have a very similar debugging output. And this was useful. I started to make some of the example programs run, but then I decided to take a little step further and I made a graphical debugger. So I'm gonna give a little demo. Oh my God. I'm not here. Does anyone see my, oh no. Oh, there we go, there we go. All right. This is going great.

So this is the program I wrote, which counts down from a million, or it counts down from whatever I enter. Here I entered. And now you can see I'm doing a NIN, which is a numeric input.
And I'm just adding three to it or echoing it. And so you can step through and you can look at the stack. Look, you can restart. I actually cat a file of three so I could do this. So this is nice. You can set a breakpoint. And now you can see that the stack went from three down to two. And you can see that it was pushing some values because it had to subtract a value. Now it's essentially at a conditional. It's at the pointer command. And I probably should have pushed two instead of three because you'll have to do this a couple of times. But it'll give you some time to look at the stack. So now zero is on there. And when we resume, we go into this dead end thing and you can see it trying the eight different ways. And then for whatever reason, I just had to kill the debugger.

So after I got some stuff working, I decided I should start writing some tests. I had to go and make an ASCII image format, which made things a lot simpler than going into the GIMP or something. And then I ended up having to make an image generator application. Oh, I should point out two things. That debugger uses JRuby and it uses a library called JRubyFX, based on JavaFX. And it's really, really slick. And this particular image gen application was written with another library for JRuby called image_voodoo. So it's not the last mention of JRuby, but use JRuby.

So once you generate an image, then you can go and run it against other implementations and make sure it's running the same way. When I went to write the program that I showed you in the graphical debugger, I started just writing an ASCII image format of that. And then I realized, after I was working on it five minutes, that I wanted to add a second dup in the middle. And then I realized that every color after that second dup is completely gonna change. So then I had to go and write a tool that allowed me to regenerate all these color sequences. It's only one-dimensional, but it was pretty helpful.
And I can, of course, chain these things together to generate images or just pipe it to rpiet itself.

So I started with the Python implementation and I was pretty unhappy with it. I don't expect you to read this code, but it's what I call code chowder. It mixes a whole bunch of stuff together. There are multiple loops, nested conditionals. And pretty much every Piet implementation has the same code, just in a different language. And so there are really two problems. One, I can't make it any faster. All those out-of-bounds checks where I go up to eight times are constantly being done. I can't do any more advanced optimizations. And really this is because we're mixing two concerns. The parsing and execution are mixed together. So let's fix it. Does anyone remember McDLTs?

So normally when you go and process a language and parse it, you'll generate an abstract syntax tree. So you'll have a while loop, and the syntax for the test will be one child and the syntax for the body will be another. But what does that look like with Piet? It's two-dimensional, so you can have these natural cycles. Well, based on that, let's create an abstract syntax graph, because we wanna reflect the syntax. Every operation we perform, like push or add, will become a node in that graph. And we'll add a couple of other nodes for the codel chooser and the direction pointer. So whenever we hit the edge of the image or the black, we'll just push a node saying we need to change that, because this runtime state needs to continue to exist.

And here's one of the hardest slides of the morning. I'm gonna use the original implementation as the parser, and it's gonna trace execution. So every time I add a node to, let's say, do a push, the next place it's gonna go, I put that onto a work list. But if it's the switch or pointer instructions, there are either two or four different directions I can go. I push those two directions or four directions onto the work list.
Then I just pop off each item and keep processing the work list until it's empty, and I keep a visited list to make sure I don't visit the same place twice. So when I'm done with the parser, I have a nice graph, and all this code on the left basically becomes this line on the right in execution. It's much, much simpler now. But let's actually look at the performance of counting down from a million. It got about seven times faster. So there was a lot of invariant work happening.

But let's not stop there. Let's try to make it even faster. So I'm gonna convert the abstract syntax graph into a set of virtual machine instructions, and it'll be something that a compiler backend can munch on and make something fast. But we have to get rid of these cycles, because these cycles are an unnatural thing for a compiler backend. So all the cycles will be replaced with either jumps or branches, and the rest of the operations will map to a series of more fine-grained virtual machine instructions.

So I'll try to give you an example. This is another hard slide. At the very top is the not operation's implementation with the graph interpreter. What it does is it pops a value off the stack, compares it against zero, and then either pushes one or zero onto the stack. So this first line just becomes a pop instruction. This is sort of a one-to-one mapping, and it saves the popped value into a variable. And then all the rest is this line. I don't know how many people have had assembly, but we'll just leave it at: this is a very simple if-else statement. We check to see if the variable we popped is equal to zero, and if it's not equal to that we jump to false, otherwise we fall through to true. But by the time we process this graph, what we end up having is a single contiguous list of these simple instructions.
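To make the lowering of `not` concrete, here is a minimal Ruby sketch of what such an instruction list could look like. The slide itself is not reproduced in the transcript, so the instruction names (`:pop`, `:bne`, `:push`, `:jump`, `:label`), the array encoding, and the `lower_not` helper are all assumptions for illustration, not rpiet's actual instruction set.

```ruby
# Lower one `not` graph node into a flat list of simple instructions.
# Each instruction is a plain array: [opcode, *operands].
# `id` is a hypothetical unique number that keeps label names distinct
# when several `not` nodes are lowered into the same list.
def lower_not(id)
  false_label = :"not_false_#{id}"
  end_label   = :"not_end_#{id}"
  [
    [:pop, :v],                  # v = value popped off the stack
    [:bne, :v, 0, false_label],  # if v != 0, branch to the false path
    [:push, 1],                  # fall through: v was 0, so push 1
    [:jump, end_label],          # skip over the false path
    [:label, false_label],
    [:push, 0],                  # v was non-zero, so push 0
    [:label, end_label]
  ]
end

lower_not(1).each { |instr| puts instr.inspect }
```

The branch and the jump only name a label, so the list stays flat and contains no object references back into the graph.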
And for those places where you need to jump or branch, they'll record what label they wanna go to in this instruction list, and those labels will exist exactly once somewhere else in the instruction list.

Here's the complete interpreter for executing these instructions. You start at the top of the instruction list, you retrieve the instruction, you execute it. If it happens to be a branch, then its result is gonna be a label that it wants to go to. So then it looks up in a jump table, saying hey, what index in the instruction list is that label at, and then it just changes the program counter to that. Otherwise it just increments and continues executing instructions.

So if we go and look at how fast that is, you'll see that it's slowed down. This makes sense. I mean, we actually are implementing a virtual machine now. We have this program counter, we have to go and look up jumps, so we're adding some extra work here. But we're trying to do this because we wanna compile it. So now we have to come up with a back end.

Now things are gonna get weird. So I work on JRuby and I'm really familiar with one internal instruction set, which is our own. In JRuby, when we parse Ruby code, we generate a set of virtual machine instructions that represent Ruby semantics. And I thought, oh, this is cool. I'll take the virtual machine instructions I made for Piet (I think I keep saying "P-it", don't I? I can't break that habit). The virtual machine instructions for Piet, I'll just do one transformation and convert them to equivalent Ruby virtual machine instructions. And then I'll wrap that in a lambda. Am I blowing anyone's mind? So now within JRuby you can do a Piet require, load your image, and when it returns there's a lambda that you can then execute. So now JRuby can run Piet and Ruby.

So if we look at the results, there are two new numbers here. The first one is running JRuby in interpreted mode.
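As an aside, the instruction interpreter described a little earlier (fetch, execute, look a label up in the jump table if one comes back, otherwise increment the program counter) can be sketched in Ruby as follows. The instruction names, the array encoding and the `run` and `execute` helpers are assumptions for illustration and are not rpiet's real code.

```ruby
# Execute a single instruction. Returns a label to jump to, or nil.
def execute(op, args, stack, vars)
  case op
  when :push
    stack.push(args[0])
    nil
  when :pop
    vars[args[0]] = stack.pop
    nil
  when :bne   # branch if the variable is not equal to the constant
    vars[args[0]] != args[1] ? args[2] : nil
  when :jump
    args[0]
  when :label
    nil
  else
    raise ArgumentError, "unknown instruction #{op}"
  end
end

def run(instructions, stack = [])
  # Jump table: label name => index in the instruction list.
  jump_table = {}
  instructions.each_with_index do |(op, *args), i|
    jump_table[args[0]] = i if op == :label
  end

  vars = {}
  pc = 0
  while pc < instructions.size
    op, *args = instructions[pc]
    target = execute(op, args, stack, vars)
    pc = target ? jump_table.fetch(target) : pc + 1
  end
  stack
end

# The `not` operation written out by hand as a flat instruction list.
NOT_PROGRAM = [
  [:pop, :v],
  [:bne, :v, 0, :not_false],
  [:push, 1],
  [:jump, :not_end],
  [:label, :not_false],
  [:push, 0],
  [:label, :not_end]
].freeze

p run(NOT_PROGRAM, [0])
p run(NOT_PROGRAM, [7])
```

With a stack of `[0]` the program leaves `[1]`, and with `[7]` it leaves `[0]`. The hash lookup on every taken branch is the kind of extra bookkeeping that makes this step slower than walking the graph directly.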
You'll see that it's about 17 times faster than us running the Piet instructions through its interpreter, and that's just because we're doing this in Java now. And we're doing some additional optimizations, because JRuby is able to do some extra stuff to make things faster. If I actually JIT compile it, it ends up being five times faster again, and that's because after we make it into our IR instructions, we compile that to Java bytecode, and the JVM churns on that and generates native code and optimizes it more. So what I was able to get for this talk was a 305 times speedup.

A couple other things. I really wanted to target this against LLVM, counting down from a million to zero and not using the value. Most compilers are smart enough to realize that you're performing no work, and it would have just returned zero and it would have been infinitely faster. And I thought that would have been a much more entertaining result, but I ran out of time. I also thought it would have been really interesting to talk about some of the optimizations that JRuby did, but it's a 25 minute talk. I already put way too much material in here, as we all know now.

So I think it's really fun studying different languages. Most people go and study other languages because they're useful, you might get a job with it, but esoteric languages stretch your mind in a different way. Using a language where there's two-dimensional flow, that's just truly weird and it's fun. So check out some of the esoteric languages.

Hopefully this talk demonstrated, even if you didn't follow everything, that you can take something that's rather chowdery and convert it into something that's a lot simpler and ends up performing a lot better. You can always just keep transforming your problem into something that's much more workable. And never ever submit a talk proposal for work that you haven't done yet. I did have a lot of fun doing it, but it was a lot more work than I thought it would be. Thank you.