How's everybody doing today? Three days of Clojure/conj? We've seen a lot of really cool modern systems handling new, big problems facing the internet. And what I have for you today is some really old computer science: we have the problems of the 1970s.

Somebody wanted me to share the history of this work. It actually came out of Clojure/conj last year, when Dan Friedman and Will Byrd came for the first time. Dan was having a conversation with Rich about Clojure as a language, because he didn't know anything about it. I still don't think he knows anything about it today. But he was asking about different features of the language, and he asked if it had tail call optimization. Rich said no, it doesn't, and it's not really possible because of the JVM. And Dan said, "I think there is a way to do it." So he came back and he recruited me, and that's what we're talking about today.

Just to give you a little outline — for some reason my clicker's not working — in just a second. There we go. Okay, here's a little outline of what we're gonna talk about today. We're gonna talk about the history of tail call optimization and the ideas behind constant-space tail calls. We're gonna talk about the built-in tail call optimization support that Clojure has, because it's got just a little bit. And then we'll talk about the system that I wrote, which I very, very creatively named Clojure TCO, and I call it CTCO. We're gonna spend the bulk of the time talking about the transformations that it does, and the three of them are listed there: continuation-passing style — yes, we're gonna talk about CPS today — plus thunkification and trampolining, which may be familiar to some of you. At the end, if we get some time, we're gonna talk about optimizations that I've been working on the last couple of weeks to make things a little faster, a little peppier, a little better. And at the very end, we'll talk about work that I hope to do related to this and directly using this work.

So I've got a couple of goals today, and I want you to be in the same headspace: what I want you to know, and what I don't care whether you retain after this. I want you to have a high-level understanding of what goes on in CTCO — how it does these transformations in order to enable tail call optimization in a language that doesn't have tail call optimization. When somebody asks, "How can you get tail call optimization out of Clojure? It runs on the JVM; you don't have tail calls" — I want you to be able to give two or three sentences about why the three transformations that CTCO does enable tail call optimization.

And I don't think it's any secret to anybody who's talked to me at the Conj very much that I think about continuations, like, constantly. I want everybody to think about continuations. I want you all to understand continuations. I think that we've turned them into a monster that's really scary, and I hope to take the monster out of them today and hopefully give you a little bit more understanding of continuations if you don't already have a good one. I was gonna say: Dan's not gonna learn anything today.

And I've got a couple of things — oh, excuse me, sorry. I also wanted to discuss how there are some benefits to using CTCO. Like I said, you can write tail-recursive programs and they don't blow the stack like they normally do in Clojure. But there are also some costs associated with the transformations that it's doing.
And we'll kind of talk about those a little bit toward the end. As I was saying, there are also some un-goals that I have — things I don't care if you retain, things we'll talk about or might not talk about, but that I don't want you to really dwell on. There's the internals of CTCO, the implementation: we're not even gonna look at it, and I don't care if you understand the software engineering effort that went into it at all. And the other thing: we're gonna see a fair amount of CPS today. Now, I said I want you to understand a little bit about continuations, but I don't want you to be a CPS expert. You don't even have to understand the little bits of CPS that I'm gonna be showing off today. Okay, so are we all on the same page? Everybody good? Everybody ready?

Okay, so the first thing that I need to do — it's good the doors are closed — we're gonna imagine that this whole big room is a time machine, and we're going back to 1977. Unfortunately, I've brought you back to a time where everybody's using imperative languages. We've got a lot of Fortran, a lot of PL/I, a lot of COBOL. But it's not all bad: we've got Lisp, although it's for academics, and nobody would ever build any big web-facing, database-backed program in Lisp. That's just crazy.

But that's not the real problem. The real issue we have right now is a big debate: the debate between structured programming — which you and I and everybody in this room think of as the right way to program, using procedures and function calls and proper abstraction in your code — and the alternative that everybody thinks is better, which is goto. There is this horrible, horrible, awful rumor that goto is inherently faster than procedure calls. That procedure calls are stupidly slow, and yeah, sure, if I write my code using abstraction and function calls and everything, it's more modular, it's more readable, but my manager would kill me if I wrote a slow program with procedures and procedure calls instead of goto.

Well, it's also a good thing that we came back to 1977, because Guy Steele at MIT has just published "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO." He's truly a man after my own heart in terms of ridiculous names for things. And the main point of this whole paper was: compiler writers, you're making procedure calls slow, and it's your own fault. You're making people not wanna use procedure calls because you're doing dumb things. You're saving a whole bunch of values on every procedure call that you don't have to, and they can be a lot faster.

The keystone of his whole argument is tail calls, and this is very relevant to what we're talking about today. So he gives this little example — everybody can see this, right? It's a little Lisp example. He's defining a function bar that takes two values, x and y, and it makes a call to g with x and a call to h with y — presumably these are defined somewhere else. And then finally — notice I say "finally" — it makes a call to f with the results of those calls.

So if you have any sort of background in computer architecture, then you're familiar with the idea of a stack. And if you're not: a stack is essentially the place where we save values for the current procedure that we're running. So we have our little stack off to the right here.
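Roughly, Steele's example looks like this in Clojure — the paper's version is in Lisp, and the stub definitions of g, h, and f here are just stand-ins for the functions he assumes are "defined somewhere else":

```clojure
;; Stand-ins for the paper's g, h, and f (defined "somewhere else").
(defn g [x] (inc x))
(defn h [y] (dec y))
(defn f [a b] (+ a b))

(defn bar [x y]
  (f (g x)    ; non-tail call: bar must come back here to call h
     (h y)))  ; non-tail call: bar must come back here to call f
;; The call to f is in tail position: once g and h have returned,
;; calling f is the very last thing bar does.

(bar 3 4) ;=> 7
```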
And we come into a call to bar, and we've got x and y, and we're gonna use this little space. We get into the body of the procedure, and we see that we need to make a call to g to get a value out of it. Okay, so we're gonna save our stack frame — we're gonna call it a frame — we save our information, and we let g work above us. G's up there, it's gonna do its thing, it's calculating a value, it's all cool. It gets done, and we've got a value from it. We can erase its stack frame, and we come back down into our original stack frame. Oh wait, but we still have h to work on. Okay, so h goes up there and spins around, does its little computation thing, comes back, and we've got a value.

And now this is the key point. We had values we had to save while g and h were executing, because we had to come back to bar — we had work left to do in bar. But now all we have to do is make the call to f. So if we were silly, we would say, "Okay, f, go do your thing above me in a new stack frame." But what do we have to do once we get back from the call to f? What does bar have to do when f returns? Think about it for just a second. Nothing. There's no work to be done. So why are we saving information about bar?

So let's do just that: let's reuse the stack frame and put f in there. Let f work in there, and then we don't have to do the stack manipulations. All of a sudden, we see we don't need a new stack frame, we don't need to do the manipulations, and all we've done is set up some arguments to go to the next function, and we jump. It's just a goto. Not only do procedure calls not have to be slower than goto — they can be goto. This is a really just awesome idea. And luckily it seems like most of the compiler-writing community got the message, at least for the most part, and procedure calls aren't painful anymore.

So we learned: tail calls are super awesome. In languages that include constant-space tail calls, they're an efficient and natural way to do iteration. You can do recursion on lists and all sorts of fun things. If you've ever written a finite state machine or a parser, you know that writing it in terms of natural recursion is awesome. And it's so great that Guy Steele — who was also working on the Scheme programming language with Gerry Sussman at MIT — put it into the Scheme standard: you're not a proper Scheme implementation unless you have constant-space tail calls. And the Standard ML folks came along a little bit later and said, hey, that's a good idea, let's do that too.

So we're still in 1977, and it looks great for the future. Everybody ever is gonna have constant-space tail calls, right? Everybody in the room, 1977: everybody in the future is gonna have constant-space tail calls, right? Awesome. Wait. Not every language uses constant-space tail calls. There are a whole bunch that don't, but what we care about today is that the Java security model requires a stack frame for every single procedure call. And therefore our favorite little Lisp implemented on the JVM doesn't have full constant-space tail calls. Sad days.

But it's not all hopeless. We do have a tiny bit of built-in tail call optimization in Clojure. You're probably familiar with the recur form. We can use it for self-recursion in function bodies, or with the loop form — which, if you squint and turn your head, kind of looks like a while loop. There's a reason for that, or so I've heard.
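As a quick sketch of what that looks like in practice — this is the standard accumulator-passing factorial, not code from the talk's slides:

```clojure
;; Tail-recursive factorial with loop/recur. recur re-binds n and acc
;; and jumps back to the loop head, reusing one stack frame instead of
;; pushing a new frame for every call.
(defn fact [n]
  (loop [n n, acc 1]
    (if (zero? n)
      acc
      (recur (dec n) (* n acc)))))

(fact 20) ;=> 2432902008176640000
```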
The loop form uses the underlying mechanism for while loops to do that goto-with-arguments that Guy Steele told us about way back when. And by the way, we are in 2012 again. It's sad times that we don't have tail call optimization everywhere, but we don't have to treat the room like a time machine anymore.

In addition to the recur form, we also have this function built into clojure.core called trampoline. I've personally never used it. You have to do some funky little code transformations in order to get it to work. And actually, if you were paying attention at the very beginning of the talk, you would notice in the outline that we're gonna talk about trampolining as a transformation that CTCO does, so we'll get into it later. It is important. CTCO won't use the built-in trampoline — it actually rolls its own. But these are the things that we have to play with.

So that brings us up to speed. We know about tail calls, and we know the story about tail calls and Clojure. And so now we can get to the system that I wrote, called Clojure TCO. The idea is that the way Clojure stands now, if you want constant-space tail calls, you have to roll them yourself. You have to know when to insert recur. You have to know when to do that funky trampoline transformation that we're gonna talk about later. And I'm a compiler hacker, and I like to write things that write code for people. So the whole idea behind CTCO is: you just write your code the way you want it, and let the compiler handle it for you. And of course, the main idea is that, like with the goto-with-arguments, the compiler will make that transformation as well as or better than you could do it by hand, even if you were an expert in how to do all these things.

So with that, we're ready to start talking about CTCO's transformations, which you know means continuation-passing style. So what in the world is a continuation? It's this weird, abstract, mathy thing — why are we talking about it in terms of writing real software well? I hope to give you a real-software explanation of what it is. But first, let's think about what a continuation does. You may not know this; it's fine. It's a thing that takes a value, does some work — we'll think about that as some computation that's waiting to happen — and then returns a value. If we were doing things in Typed Clojure (I was actually thinking about this with respect to Ambrose), this would be an arrow type: value arrow value. And you can think of it as a function. That's a very convenient way to represent them; in fact, that's how we're going to represent them in CTCO. And you can even think of the smallest continuation you can possibly construct. We call it the empty continuation; it's better known as the identity function. It takes a value, does no work, and then returns the value.

Okay, so we've got a little bit of a picture of how continuations do their job. So now I want to make a really big claim. It took me a really long time to figure this out, and I think that this is the point that helps you think about continuations with respect to when real people — not mathy, abstract people — write software. Continuations are stacks. It is just the stack. When we apply the CPS transformation to code in CTCO, what we're doing is taking that stack that the JVM handles for us — and sometimes we don't like the job that it does, because it creates stack frames where we don't want stack frames — we take that stack and we turn it into a function. We've got it in our hot little hands.
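In code, that picture is tiny. A minimal sketch, assuming we represent continuations as plain Clojure functions the way CTCO does (the names empty-k and waiting-k are mine, just for illustration):

```clojure
;; A continuation is just a function from a value to a value.
;; The empty continuation has no work waiting to happen: identity.
(def empty-k identity)

;; A continuation representing "multiply whatever comes back by 10,
;; then add 1" -- the computation that's still waiting to happen.
(def waiting-k (fn [v] (+ 1 (* 10 v))))

(empty-k 42)  ;=> 42
(waiting-k 4) ;=> 41
```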
And so we don't care about the JVM stack anymore. We can clear it out however we want, because all the information that was being recorded in that JVM stack is now recorded in this function that we have. That's super cool. So from now on — literally, even in the middle of this talk — if you hear me say the word "continuation," you can yell "stack" at me. So I need some audience participation right now. When I say continuation, you say... Stack. When I say continuation... Stack. Continuation... Stack. All right. Okay.

So in order to illustrate this point — you know, at this point I've just said continuations are stacks; I love you guys — let's take a look at an example of... why is my laptop not changing? Okay. Apparently we've had a technology failure. You're gonna have to take my word for it, or not. I have not tried turning it off and on again. Okay, well, unfortunately it doesn't look like I'm gonna be able to show you part of my big claim.

I was going to take the intuitive recursive version of factorial — of course, factorial of zero is one, and factorial of n is n times factorial of n minus one, right? And I had a picture, this wonderful picture of the stack being built up, where you see these big multiplications coming out. You have to save this information on the JVM stack when you do it in the intuitive recursive fashion. And it blows up — it blows up in most systems — because it's a non-tail call. It's not like f; it's like g and h in our original example from Guy Steele's paper. What you see from the example of CPSing it — and now you'll just have to come and ask me afterwards, which is bumming me out, because then I get to do CPS for you, which I really like doing — is that the stack that gets built up on the JVM becomes the actual continuation function that we're passing around. Like I said, we don't have to worry about the JVM stack. It's not holding values for us anymore. We've taken all those values, we've put them into our function, and we use it directly.

And the other illustration of how this is awesome: we take a tail-recursive version of factorial, and doing CPS on it, you get to see how every iteration doesn't manipulate the continuation. It doesn't manipulate the stack. (You guys missed a continuation — I just said "continuation." I need two more. Okay.) But the stack doesn't get manipulated. You see it's the same continuation flowing through the — thank you — the same k, the same stack, flowing through the program the whole time. And it shows the point Guy Steele was trying to make: when you make a tail call, you don't need to record information on the stack. You can reuse the stack frame. You can reuse the continuation. Awesome.

Okay, so unfortunately, back to my slides. So now we've done the continuation-passing style transformation on our code — thank you, stack-passing style; I can't, I've committed now. So we have the stack in our program now, and we can clear out the JVM stack, like I said. So the next thing we need to do is the thunkification transformation. This was the weird little transformation I was talking about with the built-in trampoline function. The idea behind thunkification is that every time we make a function call in our procedure, we wrap it in a function of no arguments — an fn of no arguments. We call that a thunk. It suspends computation. And the cool thing here is that it breaks the computation into steps.
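A minimal sketch of what "suspends computation" means here — this is just an illustration, not CTCO output:

```clojure
;; A thunk is just a function of no arguments that suspends a call.
(def step #(* 6 20)) ; builds a thunk; the multiplication hasn't run yet
(step)               ;=> 120, only when we decide to take the step
```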
And what I would love to show you, if I could get a terminal running, is that in the definition of factorial you just wrap the tail-recursive call in a thunk, and then you can run that computation one step at a time. And it goes until it completes, and because factorial is a function that accepts the number five and returns 120, you see that that eventually happens. Actually, I'm gonna try this one more time. It's not alt-tabbing, though... oh, hey. Hey! Here we go.

Okay, so here we go. This was the point about factorial I was trying to make. We can take a step back now. So you can see we have our natural recursive version of factorial here: if you give it zero, it returns one; otherwise it takes n times factorial of n minus one. And this is our call trace — this is kind of our stack. And like I said, you can see the stack building up through the multiplications. And so if I had time, I would have shown you the CPS version in detail, but instead you can take a look at it. You see that instead of building up the stack with a non-tail call, we have a tail call now, and we're passing along this function that does the same thing the non-tail call did before. And so you can see that weird thing that I said about continuations — that they perform the part of the computation that's waiting to happen. That's what's going on here: this multiplication with n is the computation that is waiting to happen. So we've stuck it in a function, we have it in our hot little hands, and we can run around with the stack now. And you can see that here: if you look at the call trace, look at that second argument to each call of factorial. It's getting bigger and bigger, with bigger and bigger multiplications, in exactly the same way that the original stack got bigger and bigger. So you kind of believe me now, right? We're using a Clojure data structure to hold the information that the JVM stack was holding for us before. I'm getting some nodding heads — yeah, we've got a stack. Our continuation is a stack.

And now, hopefully, I can show you this: the tail-recursive version. It's in accumulator-passing style — that's just a big fancy name; we don't need to worry about it — but it passes around the final value that we're going to return, so there's no work to be done when we finish the recursive call. So we should see here, when we CPS it, we have this call trace where we've handed in the original empty continuation — thank you — and it just gets passed along over and over and over again. We never change it, because the stack never changes, so the continuation never changes.

Okay, so this is the CPS transformation on the tail-recursive version of factorial using accumulator-passing style. And like I said, the next thing we need to do is thunkification. And to do that, all we've done is put this little hash here. Now, you know hash is a little reader macro that turns the thing into a function, and because we don't use any arguments, it's a function of no arguments. So now I should be able to show you — this is exciting, now I can show you things. Oh, and one other thing to note here: I've left an entry point, this overload, so that you can call factorial with the original number of arguments, and it just sets up the empty continuation for you. So I can take this old function definition and I can paste it in here. And you can see, when I call fact with five and one... well, what's this? We got a function back. Well, what if I invoke it? We got another function back.
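What's on screen at this point is roughly something like this — a sketch of the CPS'd, thunkified, accumulator-passing factorial with its entry point (illustrative only; CTCO's actual output is generated, not hand-written like this):

```clojure
(defn fact
  ;; Entry point: original arity, sets up the empty continuation.
  ([n acc] (fact n acc identity))
  ;; CPS'd body: k is the continuation, and every call -- including
  ;; the final call to k -- is wrapped in a #(...) thunk.
  ([n acc k]
   (if (zero? n)
     #(k acc)
     #(fact (dec n) (* n acc) k))))

(fact 5 1)   ;=> a thunk (the first suspended step)
((fact 5 1)) ;=> another thunk, and so on, until 120 falls out
```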
Well, let's invoke it some more times. Hey, we got a value out. So each one of those sets of parens that I just threw in there is us invoking one of those thunks. Now, when you make a function call, you don't wanna wrap it in a whole bunch of parens. You don't wanna guess how many steps it's gonna take. That's annoying. So that brings us to the last transformation that we have to do in CTCO. (Mac, you're killing me.)

So yes: wrapping in a thunk breaks the computation into steps — you just saw those steps. And this prevents the runaway stack growth, because at each point where we return a thunk, we would otherwise have made a function call, which would have made a new stack frame, and if that continued, our stack would get huge and we'd overflow and things would be bad and we would be really unhappy. So this breaks it into steps where, instead of making that function call, we return a value. And that's the point where — you know how I said we could clear out that JVM stack? That's exactly where we do it.

Which lets us move on to trampolining, which is where we do the running of that computation step by step. And in order to do it, we need to have some sort of looping construct in our language that runs in constant space. Well, you remember toward the beginning I said we have at least one: we've got loop/recur. So we're gonna use that. We're gonna run this little computation in a loop/recur, and it's gonna go and go and go and go until it's done. And an important thing here is to figure out when we dismount this trampoline. Now, there are a few ways to do it, and I actually have to give Alan Dipert credit, wherever he is in the room, for coming up with a very clever way of doing it, and I can show it to you now — now that I have mastered my own computer. Okay, that's not what I wanted. There we go.

So now we can look at trampolining, and you can see that all we've done is take our factorial function definition and bind it within the scope of this trampoline function. And the trampoline function is simply asking: is there a piece of metadata attached to the thing that we were passed that says it's a thunk? Of course, we're not doing that right now... but now we are. And of course, when we make this entry-point call here, we need to load ourselves onto the trampoline. So I should, fingers crossed, be able to just grab this definition, copy, paste... I saw what it did there. Oh man, bumming me out. You're gonna have to believe me.

So at this point we understand all of the different transformations that go on in CTCO. And just in summary — like I said, my major point in this talk was for you to understand the transformations that are going on here. So when somebody asks you, "How do you get tail call optimization out of a language that doesn't have it?", you tell them: we CPS, because that makes the stack an explicit thing that we have in our hands. It's a function. It's a continuation, which is a stack. Awesome. We thunkify our expression so that we can run it step by step and keep the stack from getting too big. And then finally, we throw it on a trampoline, so it runs bit by bit by bit until it's done. Okay, cool.

So actually, at this point, although my manual trampoline failed because of copy-paste errors, I can show you how CTCO does it directly. You may have noticed I've loaded up a ctco.core REPL. And we can just define our tail-recursive factorial — that's the same definition we had before.
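Since the copy-paste demo fought back, here's a rough sketch of the manual trampoline just described — hypothetical names, not CTCO's actual internals — using Alan Dipert's trick of tagging thunks with metadata:

```clojure
;; Tag each thunk with metadata so the trampoline can tell suspended
;; steps apart from real return values.
(defn make-thunk [f]
  (with-meta f {:thunk true}))

;; The thunkified factorial again, with its thunks tagged this time.
(defn fact [n acc]
  (if (zero? n)
    acc
    (make-thunk #(fact (dec n) (* n acc)))))

;; The trampoline: a constant-space loop/recur that keeps invoking
;; thunks until something that isn't a thunk comes back.
(defn tramp [v]
  (loop [v v]
    (if (:thunk (meta v))
      (recur (v)) ; still a thunk: take one more step
      v)))        ; a real value: dismount the trampoline

(tramp (fact 5 1)) ;=> 120
```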
Actually, before I do that, let's take a look at how this blows up. So we'd like to see how — you're killing me... there we go — we'd like to see things break, right? So this is a normal call to factorial, starting the accumulator off at one. Oh: stack overflow. So let's go back to our original definition and let's just throw it through CTCO. And underneath, it's doing all the CPS, the thunkification, and the trampolining for you. Let's try that again. We got a big old number.

Now, one thing you may be thinking: "Well, I could have done that just using recur, because it's just calling back to itself." So a more interesting example to look at is these mutually recursive even and odd functions. The idea here is that zero is even and it's not odd. So we have these little functions that call each other: if it's not zero, when we pass it to even, we subtract one and pass it off to odd, and ask if it's odd. So — because I don't trust anything — I'm just gonna write it directly. And I think I still have time. Yes, I do want true there; thank you, Dan. We have to change this to false, because zero is not odd, and our recursive call is to even. Okay. Everybody think that's a pretty reasonable definition for my-even? and my-odd?

We can ask my-even? of 12. This is gonna be really embarrassing if it doesn't work... hey, 12 is even: true. If we ask my-odd?, we see 12 is not odd. Well, let's crank this up a little bit. Uh-oh: stack overflow error. Let's go back to our definitions. Let's do it through CTCO. Let's do the same thing to my-odd? — bad things would happen if I forgot to do it to my-odd?. Let's try that again. Bam.

Okay. So now we're all good. We all understand the basics. We have CPS, we have thunkification, we have trampolining, and we can tell people why it does what it does. But it's not all roses. There are some costs associated with this, because of the code transformations that we're making. When we do CPS, we create continuations out of functions, and it takes some time to build new functions, and we have to pass them around where we weren't passing arguments around before. We have all these thunks that we created; when we're running the computation step by step, the idea is we're doing it in a really tight loop, just invoking, invoking, invoking, invoking. But that's still something that we didn't have to do in the original code, so it slows it down a little bit.

Plus there's this issue of using stack versus heap memory. If you look real hard at the CPS transformation, you see that every single function definition, whether or not it was tail recursive, becomes tail recursive once you've CPSed it. And so it looks like, "Oh, I just got tail calls for free." Well, no: you've traded the stack memory that you were using for your non-tail calls for heap memory, because your continuation functions live on the heap. And so eventually, instead of getting a stack overflow, you will run out of heap if your programs are written in a primarily non-tail-recursive style. So that's why CTCO is intended for programs written in a tail-recursive manner — and you understand "tail-recursive manner" if you've read Dan Friedman's books, The Little Schemer and The Seasoned Schemer and all that. Recursion is awesome, and if you use tail recursion, CTCO will be your friend. But if you've written a non-tail-recursive program, it'll help you out a little bit, but it will eventually blow up.

So, some stuff that I've been working on — let's just go over these kind of quickly so we can possibly have some questions.
In the last couple of weeks, I added an auto-recurify pass. So like I said, with the version of tail-recursive factorial that we looked at, you could achieve the same thing by just sticking a recur in there instead of doing the thunkification and trampolining. And so CTCO now just looks through and figures out, "Oh, you're just calling back to yourself," and inserts a recur instead. And I've seen a fair amount of speedup from that.

There was also this problem that I had for a long time. You may have noticed that when I did my function definition, I overloaded it, because when you do the CPS transformation, you go from whatever number of arguments you were passing around to that number plus one, for the continuation that you're now passing around. And so you have this clash when you try to call into the original version. So let's take a look at something really silly. The problem that we have here is that when we CPS the first arity, we've suddenly got two bodies: one is trying to take a continuation in the second spot, and the other is trying to take an actual value there. So the solution I came up with was inspired by Alan Dipert's trick of attaching metadata to all the thunks: every continuation that's created in CTCO is tagged as a continuation. And so what you end up doing — I heard one "stack," that's hilarious — what you end up doing, and I won't go through the full transformation, is you ask: is that second argument a continuation? If so, you run the code for the CPS version of the body; otherwise, you call into the original version. And it took me a little while to come up with this solution, but I was pretty proud of it. And like I said, it was inspired by Alan Dipert.

So I think that's most of the content that I have. Just finishing up, there's some future work that I wanna talk about. I focused on getting this correct, and so there are some glaring omissions in the expressions that it'll accept. So that's a small piece of work that I wanna get done soon. There's also some talk of getting this ported to ClojureScript, because people would like to be able to play with it there, and there are some issues with the current implementation that only work on the JVM version of Clojure. I'm also signed up to write a CPS library, because apparently people think I'm the CPS expert in the Clojure community, so that should be coming around sometime early next year. And finally, I've got some other ideas for optimizations to try to make CTCO faster, including coming up with a way to do minimal thunkification, which I don't really wanna get into now — if you wanna ask me about it afterward, that's fine.

The basis for a lot of this work was the first-order, one-pass CPS algorithm by Olivier Danvy, and I've heard reports that if you look at the CTCO code and the paper side by side, you kinda see how the CPS transformation falls out of it. And the other thing, of course: I pulled most of this work from "Using Parentheses to Transform Scheme Programs to C, or How to Write Interesting Recursive Programs in a Stack-Based, Imperative, Inhospitable Host," which was written by Ron Garcia and a whole bunch of people including Dan Friedman — and that was the reason he was able to say to Rich last year, "I think I know how to get constant-space tail calls out of Clojure."

So, just a few thank-yous, too. Thank you to Dan Friedman for giving me all the ideas and all the foundations for this.
Thanks to Alan Dipert for being the only contributor thus far on CTCO and for inspiring some really interesting solutions in there. Thanks, of course, to David Nolen for helping shore up my obstinate Schemer-ness into the Clojure realm. And thanks to all the Conj organizers for letting me infect your world with continuations. And what kind of compiler writer would I be if I didn't have a slide full of numbers at the end? Well, I'm gonna go ahead and finish up. Thank you guys for listening.