Welcome to Emerging Languages Camp 2010. Parrot, by Allison Randal. I'm the chief architect of the Parrot virtual machine, but I've been known to moonlight in other things, like running publishing companies or open source conferences. So you've probably heard of Parrot. It's one of the earliest implementations, at least in the modern era, of a dynamic virtual machine intended to target multiple languages. We do monthly developer releases and quarterly stable releases; the most recent was the 2.6 release, just yesterday. Now, when we started the Parrot virtual machine, people told us we were absolutely crazy. Dynamic languages didn't need a dedicated virtual machine. They didn't need to be optimized; they couldn't be optimized. Since that time, and it's been about 10 years now, it's actually become quite common. You'll see major virtual machines like the JVM or .NET now proudly supporting multiple dynamic languages, and other players are entering the game as well, adding support for multiple languages. So it's a good feeling to think that we inspired a movement by going out there and doing something completely crazy. So I say that dynamic multilingual virtual machines are common, and it's true. But Parrot has some fairly unique features. One of them is that it has no stack. It is instead register based, but not in the sense of a single global register set like you find on hardware. Instead it has local register sets scoped roughly to the size of a subroutine: each subroutine has its own local register sets, which is good for safety and good for concurrency. And instead of using a stack for control flow, it uses something known as continuation-passing style.
So where you would ordinarily push a return address onto the stack and then pop it off again when you're ready to return, in this case you capture a continuation at the point where you make the call, actually just after you make the call, you pass that continuation object into your call, and when you're ready to return, you invoke that continuation. This also has some advantages for stability, avoiding stack smashing, and a certain amount of encapsulation, in that your return is an object rather than a raw address. Parrot has a native assembly language, which is object oriented. Perhaps its best feature is the compiler tools, which include a parsing expression grammar engine, sort of the modern take on parsing engines. It does not include packrat optimizations for caching the parsed elements. From the parser, it then has a series of tree transformations to produce the actual code that runs on the virtual machine. We've found this is a very easy-to-use dynamic system. The syntax is very regular-expression-like, so it's familiar to anyone who's used regular expressions. The tree transformations were heavily inspired by Don Knuth's attribute grammars, although many generations later, so you can imagine there are significant changes to the idea. But that's not actually what I want to talk to you about. What I want to talk to you about is the next step. In the next two quarterly cycles, 2.9 and 3.0 (3.0 will be next January), we're focusing on something we've codenamed Lorito. Lorito means little parrot in Spanish. And the idea is a smaller, lighter, faster virtual machine.
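The calling convention described above can be sketched in a few lines. This is illustrative Python, not Parrot bytecode; the function names are hypothetical. The point is that the callee never returns in the conventional sense: it receives a continuation object and invokes it with the result.

```python
# A minimal sketch of continuation-passing style: instead of pushing a
# return address, the caller passes a continuation into the call, and
# the callee "returns" by invoking that continuation.

def add(a, b, k):
    # Instead of `return a + b`, invoke the continuation with the result.
    k(a + b)

def caller():
    results = []
    # The continuation captures "the rest of the computation" that
    # should run after the call completes.
    add(2, 3, lambda result: results.append(result * 10))
    return results

print(caller())  # [50]
```

Because the return point is a first-class object rather than a raw address on a stack, there is no stack frame to smash, which is the stability and encapsulation benefit mentioned above.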
So in the past 10 years we've had substantial opportunities to learn from our implementation and our early design goals, and we're taking a step back now to reconsider our earliest decisions: how, in light of what we've learned, they might change, and how we might refactor the existing virtual machine, either piecewise or by some substantial replacements, to be closer to what our ideal would be now that we've learned 10 years of lessons. One of those things is fast startup time. It's good now, but we could be substantially better. Resource consumption: lower memory, with an idea of targeting cloud architectures and mobile architectures, where you need very small, very fast, very light implementations. And then to provide that fast core to all the languages we currently support, and to other languages that could eventually be implemented on the machine. So I'm just going to go through a few of the changes. One of them is a microcode approach. Currently, and this was based on the original architect's design, I won't take the blame for that one, we have a monolithic opcode approach: essentially the idea that opcodes are cheap, we can have as many of them as we want, no problem. So we have over 1200 opcodes, and those are just the static opcodes, not the dynamically loadable opcode libraries. Yeah, it's a little bit painful, especially when you go to write a JIT. The new virtual machine, which is currently implemented as a prototype (sorry, I wasn't planning on presenting on this), has 20 opcodes. A substantial jump. That's the lowest level of the microcode, microcode zero, or M0. The higher levels of microcode are all composed from that lower level. This is an advantage for JITting, in that you only need C templates for the lowest level, the M0 level, and then you compose the templates for the higher levels out of those C templates, or LLVM templates, or whatever you decide to use for your JIT.
You only have to write those templates for the 20, maybe 30 once we're finished, M0 opcodes, and everything else gets built up from there. The highest level of the composed opcodes is our current assembly language. So ultimately the entire assembly language, the native bytecode of the system, will be composed from a very small set of opcodes. Another thing we're re-examining is garbage collection. We're going for concurrent, copying, compacting garbage collection from the start. The original implementation did not have relocatable objects, which is a substantial problem once you start to move to more advanced, modern garbage collection systems. Our object system is another thing we're going to be re-examining. Through an accident of history, our current object system has two completely separate branches: objects implemented in C, and objects implemented in the assembly language (or in some higher-level language and compiled down to assembly). And then there's what we call the inferior runloop problem. It shows up when you cross the barrier from the opcode-level runloop down to the C-level runloop, because the virtual machine is written in C underneath. You end up having substantial problems crossing back and forth, especially if you cross multiple times and end up messing up your C stack in the low-level virtual machine. So the movement in Lorito is toward unifying these object systems, so that we have only one object system, and it lives only in the superior runloop, only at the assembly-language level. Therefore all objects are JITtable automatically, and we don't pay the cost of crossing back and forth over that C/virtual-machine boundary. From there, the possibilities are open: we will re-examine every single subsystem.
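The microcode layering described above, a handful of primitives with every higher-level opcode defined as a composition of them, can be sketched roughly like this. This is a toy illustration in Python, not real M0 code, and the opcode names are invented for the example.

```python
# Toy sketch of the microcode idea: M0-level primitives operate only
# on a register file (a dict here), and higher-level opcodes are pure
# compositions of those primitives.

def m0_set_imm(regs, dst, value):
    # Load an immediate value into a register.
    regs[dst] = value

def m0_add(regs, dst, a, b):
    regs[dst] = regs[a] + regs[b]

def m0_sub(regs, dst, a, b):
    regs[dst] = regs[a] - regs[b]

# A higher-level opcode built only from primitives. A JIT that has
# templates for the primitives gets this opcode "for free" by
# stitching those templates together.
def op_increment(regs, dst):
    m0_set_imm(regs, "tmp", 1)
    m0_add(regs, dst, dst, "tmp")

regs = {"r0": 41}
op_increment(regs, "r0")
print(regs["r0"])  # 42
```

With roughly 20 primitives at the bottom, the JIT author writes 20 native-code templates once, and the entire opcode set above them composes automatically.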
There are scattered lessons we've learned throughout the entire virtual machine, ways we can improve it, and this is a fantastic opportunity to do that without disrupting anyone who's using the current stable, supported Parrot. That's it. Any questions? Oh, sorry, the question is: what dynamic languages currently run on Parrot? Python; Perl 6, though not Perl 5; PHP. The Ruby implementation needs help; I actually just got an interesting idea from Charles Nutter that I'm going to try exploring next month. Then a bunch of other languages, like BF and OOC, this whole long tail of languages: Common Lisp, Scheme. Basically any language where someone has thought, hey, it'd be fun to try out doing an implementation of this, we have that. We don't have Java, although we do have a bytecode translator for Java. So it's a mixed bag. The ones we're really targeting are those first four, the P languages. The best illustration I can give you is Patrick Michaud, who right now is giving a talk on Perl 6 in another room. He decided to give a Python talk at PyCon a few years back, and for that talk he wanted to demonstrate Python running on Parrot. But as often happens, he didn't actually get around to writing the talk until the night before. So he sat down, and in four hours he knocked out about 80% of the core syntax of Python. Now, that's not the harder bits, but still. A lot of that has to do with the parsing expression grammar, the regular-expression-like syntax, and the other parts of the compiler tools. Partly it depends on how similar your object model is to Parrot's. Perl, Python, PHP, Ruby: their object models are pretty similar, and Parrot's object model is designed to provide that. From there, you end up subclassing the core object model and adding features, or masking features that you don't want.
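The parsing-expression-grammar style mentioned above can be sketched with a few combinators. This is plain Python for illustration, not Parrot's actual grammar engine; the rule names are invented. Each rule tries to match at a position and either returns a value plus the new position, or fails.

```python
# A minimal sketch of PEG-style parsing. Note that PEG choice is
# *ordered*: the first alternative that matches wins, which is what
# makes the grammars deterministic.

def literal(s):
    def rule(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return rule

def choice(*rules):
    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None
    return rule

def sequence(*rules):
    def rule(text, pos):
        values = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return rule

# A toy grammar: greeting = ("hi" / "hello") " world"
greeting = sequence(choice(literal("hi"), literal("hello")), literal(" world"))
print(greeting("hello world", 0))  # (['hello', ' world'], 11)
```

In a real grammar engine the matched values would feed a tree of parse nodes, which the compiler tools then rewrite through successive tree transformations down to bytecode.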
So you can get going fairly rapidly, but the farther you diverge from those languages, the more work it is. If you want, you can just create your own object model and drop it in. I can't predict how much work that is, because it depends on things like how many class methods your system supports and how unusual your dispatch mechanism is. But we do provide the overriding hooks for all of that. For example, if you want to override dispatch, there's just one vtable function, find_method: you override that in your class object and it's taken care of for the whole system. So you're not really implementing from scratch; you're just extending. At the moment, it's a simple C switch, because we did a fast prototype. Oh, what kind of instructions? Okay. So: add, subtract, the basic math instructions, basic comparisons. Yeah, that's pretty much all of it. And then you have things like allocating a block of memory and accessing a value at a particular offset in a block of memory, which is sort of the base level of the object system. So the object system is composed from those primitives. I don't know who's had their hand up longest. Just that the Ruby implementation's implementers abandoned it, so it's not actively maintained at the moment. But Charles is working on something, which I think he's going to talk about later, that we might use to jumpstart it again. There's an implementation of Common Lisp and Scheme. Now, part of it is a bit of an emulation: your lambdas are objects in Parrot. But a lot of it has to do with simply not exposing a lot of the features of Parrot. And you can do that because you only expose your own syntax; if your syntax doesn't allow access to certain features of Parrot, then they're completely hidden. And yet your functional language can interact smoothly with object-oriented and procedural languages as well. So the core virtual machine is quite fast.
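The single-hook dispatch idea above, overriding one find_method function to change method lookup for a whole class, can be sketched like this. This is illustrative Python, not Parrot's vtable mechanism; only the hook name find_method comes from the talk, and the classes here are hypothetical.

```python
# Sketch: all method calls funnel through one dispatch hook, so
# overriding that hook in a subclass changes dispatch everywhere.

class BaseObject:
    def find_method(self, name):
        # Default dispatch: ordinary attribute lookup.
        return getattr(self, name)

    def call_method(self, name, *args):
        return self.find_method(name)(*args)

class CaseInsensitiveObject(BaseObject):
    # Override just the dispatch hook; every call made through
    # call_method now uses case-insensitive lookup.
    def find_method(self, name):
        for attr in dir(self):
            if attr.lower() == name.lower():
                return getattr(self, attr)
        raise AttributeError(name)

    def Greet(self):
        return "hello"

obj = CaseInsensitiveObject()
print(obj.call_method("greet"))  # hello
```

A language with an unusual dispatch mechanism (multimethods, message-passing, and so on) only needs to supply its own version of that one hook rather than reimplement the object system.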
We benchmarked it at about 10 times faster than CPython or CRuby. But we don't provide any kind of JITting currently; we ripped out the old JIT as a precursor to our refactors. User code on top of it can be slow; the compiler tools in particular are slow, at least what I would call slow, much slower than we want them to be. So that's one of the reasons for going for this refactor: not so much the core virtual machine, but giving users as much automated assistance as we possibly can in having fast code themselves. It does provide support for concurrency. It currently just uses POSIX threads and Windows threads under the hood, but it provides an abstraction layer above that. It's a very simple concurrency model, and there's a lot I would like to do in terms of thread-level speculation that we haven't gotten to yet. No, I have philosophical objections to CSP. But someone could, if you want to, go for it. So we've actually talked about combining Parrot with LLVM; Chris Lattner is a friend of mine from Portland. The difference is that LLVM is static-language, low-level tools, and Parrot is dynamic-language tools. So the combination would be using Parrot for the dynamic compilation layer, with LLVM underneath for the raw implementation and JITting. Probably one more question. They do talk now: there are tests in the system to make sure that our implementations are unified to the extent that you can call a Python object from within Perl code. How much that continues to hold will depend on the language implementers. We've come to the conclusion that it may not be as much of an advantage as we thought, but to the extent that it doesn't cost us anything, we'll continue to maintain it. I think that's all the time we have, sorry.