My name is Vlad Ureche. I'm a PhD student in the Scala lab at EPFL, under Martin Odersky's supervision. I like to call myself the miniboxing guy, because I work on this pretty much all the time. I also worked a bit on the specialization transformation in the Scala compiler, on the Scala backend, and on Scaladoc, if you know the diagrams or the implicit-members features.

So I'm going to talk about miniboxing. We're going to first ask the question: what is miniboxing? And I'd like a show of hands: who knows what specialization is in the Scala compiler? Okay. And who knows what miniboxing is? Okay, good.

So what is miniboxing? Well, let's start from erased generics. If we take an identity method, which takes a type parameter T, we can say: for any type T, we promise to return the same type, and we do so using generics. But when we compile this with scalac or javac, what we get is somewhat different: we get a method that takes an Object and returns an Object. So why is this bad? Well, let's try to do identity of 5 in Scala. When we compile this, what actually gets generated is slightly different: it's identity of java.lang.Integer.valueOf(5). What's this? Well, it's the object representation of the integer. And why is this not good? First of all, because it allocates an object on the heap, so it inflates the heap requirements. It produces garbage: since it created an object on the heap, it needs to be cleaned up. It doesn't provide direct access to the value, to the integer; it needs to go through an accessor. And finally, if we have an array of ints, we know they sit one after the other in memory, whereas if we have an array of java.lang.Integers, we know nothing about them; they could be anywhere on the heap.

Okay, so there is a solution already in the Scala compiler, and some of you already know it: specialization. If we were to compile this method with specialization, we would again get the version that takes an Object and returns an Object.
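Before going further into specialization, the erasure problem above can be made concrete with a minimal sketch (plain Scala; the monomorphic helper name is mine):

```scala
// Generic identity: the source-level signature promises T => T.
def identity[T](t: T): T = t

// After erasure, the JVM only sees Object => Object, so calling it with an
// Int forces boxing: identity(5) compiles to identity(Integer.valueOf(5)),
// allocating a java.lang.Integer on the heap.
val boxed: Any = identity(5)

// A manually monomorphized version needs no boxing at all: the int stays
// an unboxed primitive the whole way through.
def identityInt(i: Int): Int = i
```

Both return the same value; the difference is only in the representation that crosses the method boundary.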
But we would get other versions as well, for other primitive types, such as Boolean or Char, and seven other variants. And when we do identity of 5, what we get is just a call to a specialized variant of the identity method, which doesn't require any boxing, so it does not allocate an object. This is the variant of identity specialized for Int.

Great. Let's take a different example: tupled. It takes two type parameters, T1 and T2, takes two values, and creates a tuple. With specialization, if we were to specialize both of the type parameters, we'd get a hundred methods. Why? Well, we have ten variants for the first type parameter, and, Cartesian product, ten variants for the second one, so a hundred. If we take Function2, for example, which is in the Scala library and which we use for folds, for instance, fully specializing that would take ten to the power of three: 1,000 classes. So it's not really great.

Okay, can we do something to make it better? Well, there's an insight. In 2012, Miguel Garcia, who developed the new backend for the Scala compiler, walked into my office and, thinking aloud, said: you know, from a low-level perspective, there are only values and pointers on the JVM; there's nothing in between. So I looked at these values, and I looked, and I looked, and I realized that they all fit into a long integer. So if we could encode them in a long integer, we would not need to create so many variants of the specialized code. And thus miniboxing was born. With miniboxing, if we compile this method, we still get the Object-to-Object variant, but we get just one other variant of this method, which takes a Long and returns a Long. So with this, we get only two to the power of n variants of the code, which makes the library suddenly specializable. So maybe we could specialize our collections.
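The variant counts above can be checked with a bit of arithmetic (a sketch; the 10 comes from the nine primitive types plus the reference type):

```scala
// Specialization creates one variant per combination of the nine primitive
// types plus the erased reference type: 10^n variants for n type parameters.
def specializedVariants(n: Int): Long = math.pow(10, n).toLong

// Miniboxing funnels every primitive through a single Long encoding, so only
// two representations remain per type parameter: 2^n variants.
def miniboxedVariants(n: Int): Long = math.pow(2, n).toLong
```

For tupled's two type parameters that is 100 versus 4; for Function2's three it is 1,000 versus 8.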
Now, if we do identity of 3 with miniboxing, we compile it, and we get something there: int2minibox. What's this? It's a conversion from a normal integer to a long integer. So, does it inflate heap requirements? No, not at all. Does it produce garbage? No, not really. It provides indirect access to the value, well, somewhat, but at the same time it's usually a single machine instruction to transform an integer into a long integer and back. And finally, it does not break locality guarantees. Okay, great.

So, now I showed... Yes, please, and if you have questions, always ask. On a Seq of T? Okay, on collections: yes, we'll see that in the benchmarks section, so just bear with me for a few more minutes. Okay, so now, hopefully, I've told you what it is. If you have questions at any point, please interrupt me.

Let's see why you should use it. Does anyone recognize this slide? Steven? He's in the room. Okay, he's there. Cool, cool work. So, this is a library for image transformation and processing. And one of the things written in that presentation, I don't know if you can see it, says: in production, no single optimization proved as fruitful as avoiding boxing. So I figured, oh, that's pretty interesting, but why do they box? Well, PureImage, the library in question, has very nice abstractions that generalize over the input and output format and over the pixel format, and it provides collection-like mapping over the images. So, all in all, I liked it a lot, and, well, I wanted of course to try miniboxing on it. So I took the usual path, which is: code a mock-up of the program, become familiar with the problems that occur there, try out miniboxing, and maybe extend to the whole program, which is not done yet. So I'm going to talk about a mock-up of the PureImage library, not the PureImage library itself, and I'm going to show you a few abstractions there. So, for example, we have Image: a width, a height, and an apply method.
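The conversions mentioned above have roughly this shape; these are illustrative stand-ins I wrote for the talk notes, since the real ones live in the miniboxing runtime (as Java static methods, as discussed later):

```scala
// Sketches of the coercions the plugin inserts around miniboxed code:
// every primitive is encoded in a Long and decoded back on the way out.
def int2minibox(i: Int): Long        = i.toLong
def minibox2int(l: Long): Int        = l.toInt
def boolean2minibox(b: Boolean): Long = if (b) 1L else 0L
def minibox2boolean(l: Long): Boolean = l != 0L
def char2minibox(c: Char): Long      = c.toLong
def minibox2char(l: Long): Char      = l.toChar
```

Each of these is a widening or narrowing the JIT compiles to a machine instruction or two, which is why the cost table above looks so different from boxing.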
You feed it an x and a y, and it will give you a pixel. And you can map over this image with a function, which gives you back a generator for a new image. So, what's a generator? Well, an even simpler class: it specifies a width, a height, and a method to generate each pixel.

So, speaking of pixels, why are they generic? You see here, I'm taking a type parameter, Wrapper, the representation, and I'm saying that this one needs a type class, Pixel. Well, let's see why we need this. This is taken, to some extent, from the PureImage library. So, suppose you have an image on disk, right? It's indexed colors, so it has a few colors, not too many; that's the natural thing to do to keep the size small. We load it; maybe we'll have red, green, blue, and alpha, eight-bit channels each. And then we start transforming it. We transform once, we transform several times, and finally we store it on disk, and again we transform it to indexed colors. So, what's bad about this? People doing image processing will probably know. Well, we discretize: we store the values of the colors in eight bits after each transformation, which, with repeated transformations, will produce artifacts, meaning sudden jumps in color, which are not natural. So, one way to fix this would be to say: during transformation, it's much better to have a double-precision value for each channel, red, green, blue, and alpha, and this way we only discretize when saving the image to disk, and thus we get better quality. And these things, indexed colors, eight-bit RGBA, double-precision RGBA, these are pixel representations. So, what we do in the library is generalize over this representation of the pixels. So, this is Pixel, the type class: it takes a representation, and it can give us red, green, blue, and alpha as doubles. So all the transformations we do work on doubles, while everything underneath could be encoded differently. So, how can it be encoded?
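As a sketch of the type class just described, with one possible packed encoding (the names here are mine, not necessarily the mock-up's):

```scala
// The type class exposes any pixel representation as four doubles in [0, 1].
trait Pixel[Repr] {
  def red(r: Repr): Double
  def green(r: Repr): Double
  def blue(r: Repr): Double
  def alpha(r: Repr): Double
}

// One representation: 8-bit RGBA channels packed into a single Int.
object PackedRgba extends Pixel[Int] {
  private def channel(r: Int, shift: Int): Double =
    ((r >>> shift) & 0xFF) / 255.0  // extract one byte, rescale to [0, 1]
  def red(r: Int): Double   = channel(r, 24)
  def green(r: Int): Double = channel(r, 16)
  def blue(r: Int): Double  = channel(r, 8)
  def alpha(r: Int): Double = channel(r, 0)
}
```

Transformations written against Pixel work on doubles regardless of whether the backing store is an Int, a Long, or separate channels.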
Well, we can use RGBA all encoded in an integer; RGBA extended, 16 bits per channel, as a long; or we can have discrete channels, each of them either a double or a float, and we get more precision.

Okay, so far so good. Now I'd just like to walk you through how we can use miniboxing on this library. So, here's Image. Can everybody see it? Is it okay? Okay, this is an image. I don't have load, save, etc.; I can only start from an empty image. Then we have an implementation. You can see the image is backed by an array of Wrapper. We can get a pixel, we can set a pixel, and we can map. Let's see the generator. Again, we have this interface that we promised, and we have several generators: convert from one representation to another, scale, invert colors, and blur the image. Okay, let's see what else we have. We have Pixel, as promised; the only thing added here is the manifest. And RGBA, the four-channel pixel, and so on. And we have a little test here that I wanted to use to see how it works: Image.empty, and then, five times, we time a set of operations. We just take the empty image, convert it to the full-precision pixel representation, invert it, scale it, blur it, and convert it back to RGBA. And time just tells us how much time it took.

Okay, so let's try to build this. Building... you can't see it. Okay, it's building, and built. Okay, let's run. So you can already see a set of warnings here; this is the miniboxing plugin at work. So the operation took about four seconds. Can everybody see it? If not, I'll just paste these into a file so we know. Okay, so let's look at the problems suggested by the miniboxing plugin. I already set it up: miniboxing is a compiler plugin, but it works with the IDE, so right now what you see are warnings generated not by the compiler but by the plugin. So what does it say here?
The class imageexample.ImageImpl would benefit from miniboxing type parameter Wrapper, since it's instantiated by the miniboxed type parameter Wrapper2 of method map. Okay, so, Image. Let's add @miniboxed to the type parameter. I'm going to copy this. The other warning: the class Pixel would benefit from miniboxing type parameter Wrapper2. Okay, let's go to Pixel: @miniboxed. Okay, this one is the same, the four-channel pixel. Bear with me for a second; it won't take too long. So this one needs to be miniboxed; let's minibox this one as well. This is the same, I think. Okay, convertTo would benefit from miniboxing; let's give it some miniboxing annotations. Empty, okay, let's do this. Map, finally.

Okay, so I added a couple of @miniboxed annotations, following just what the miniboxing plugin told me so far. Okay, it gave me more. Did it give me more? Oh, I didn't save, I'm sorry. Okay, so let's compile again. Okay, it gave me 45 warnings. Okay, maybe it's not a good idea to follow everything it says. Let's run now, just to see what happens. Okay, we can see a difference, right? Let's take these and compare them to the earlier numbers. We can see that just by adding a couple of @miniboxed annotations, we already got the time down by a quarter. So, should I follow the rest of the warnings? Maybe not; let's not waste people's time. Instead I'm going to do a trick which says: please mark all the type parameters as @miniboxed, so you don't pester me with so many warnings. And let's rebuild it, and let's run it now. So remember, the first result was four seconds for processing; now it's one second, one and a half seconds. And trust me, all I needed to do was listen to those warnings there, nothing more. This was just to give you a feeling for how it works; we'll go through all the options now. So, how to use it?
We'll talk about the SBT configuration, we'll talk about the guidance the plugin gives us, and finally we'll talk about the website.

So, SBT configuration. Well, you just need to add a couple of lines to your SBT file, and the miniboxing plugin will automatically get added to your build. I'm working on release 0.4 right now; it's still what you see here, a 0.4 snapshot. But with each bug I fix, three other bugs pop up, so I'm not sure; don't hold your breath for the release in the next few days. Okay, so once we've added this, we can use the options, the cool things that you've seen. For people transitioning from specialization to miniboxing, there's the option -P:minibox:hijack, which will transform all @specialized annotations into @miniboxed, automatically. mark-all is the second option, the one we've seen at the end, which basically says: take all the type parameters in this program and add @miniboxed to them automatically. Then, if you're interested in how it works and what it does to your classes, you can use the log option; it will show you exactly how each class was transformed. And then we have warn and warn-all, which are the cool things we've just seen, where it tells us: please mark this type parameter as @miniboxed in order to get maximum performance.

And there's a little difference between warn and warn-all. Let's see it. When you write 3 :: Nil under -P:minibox:warn, it will just say, okay, that's the result. Under -P:minibox:warn-all, it will say: hey, look, the method List.:: would benefit from miniboxing the type parameter B, since it's instantiated by a primitive type. So what's this warning about? Well, it's about the Scala library, and we'll see we have a prototype where we also fixed that. So basically, when you have multiple projects in your SBT build, you will want to use warn-all; when you have just one project, you can use just warn. It's meant to not make too much noise.
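For reference, the couple of lines look roughly like this; the coordinates are my best recollection of the 0.4-snapshot setup, so treat them as an assumption and check scala-miniboxing.org for the current ones:

```scala
// build.sbt (sketch): pull in the miniboxing runtime and compiler plugin.
resolvers += Resolver.sonatypeRepo("snapshots")

libraryDependencies +=
  "org.scala-miniboxing.plugins" %% "miniboxing-runtime" % "0.4-SNAPSHOT"

addCompilerPlugin(
  "org.scala-miniboxing.plugins" %% "miniboxing-plugin" % "0.4-SNAPSHOT")

// Optional: the guidance flags discussed below.
scalacOptions += "-P:minibox:warn"
```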
Okay, so let's go into the guidance, with warn. How many of you know this document, Quirks of Specialization? Eric must know it by heart; he probably knows them. This is a document written by Aleksandar Prokopec, the designer of the parallel collections, and it's basically a set of rules that tell you how to obtain maximum performance with specialization. So, performance on the JVM is magic, do you agree? Yes. Well, specialization in scalac is even worse. For example, there's one rule early on; then later on in the document, use traits; later on, don't use traits. It's like, yeah, sure. So unless you really know what the transformation is doing, which Eric knows, Thomas knows, it's very hard to make any useful prediction of how the code will run, whether it's running at full performance or maybe at half of it. So the question I wanted to answer is: can we do something about this? And the answer is, of course, -P:minibox:warn.

So let's take the identity method that we've seen before, now marked with @miniboxed. If we do identity of 3, it gives us 3 back; okay, it's probably using the miniboxed version. If we do identity of "3", a string, okay, we get the expected result. If we do identity[Any](3), well, we get a message which says this: using the type argument Any for the miniboxed type parameter T of method identity is not specific enough, as it could mean either a primitive or a reference type; although the method is miniboxed, it won't benefit from specialization. So it's basically hand-holding you into getting it right.

Okay, so what if identity wasn't miniboxed? Let's take this example: foo of T returns a T; it's just like identity, but not miniboxed. Bar calls foo of T. We're just defining these in the REPL, so we can actually do live coding with these. So if we do bar of 3, again we get a warning, which says the method bar would benefit from miniboxing. Okay, let's minibox bar.
So we start from foo again. Now, with bar miniboxed, what does it say? Well, it says the method foo would benefit from miniboxing. So you see what it's doing; can you spot it? It's propagating this information from the final use site all the way up to the definitions. So if we minibox foo, minibox bar, and do bar of 3, it won't complain, and it will run the specialized variant. What's this all about? Well, there is the notion of an optimized trace, which basically says: one miniboxed method, or one piece of miniboxed data, should call, and call, and call other miniboxed versions. And we have three patterns in the code. Initiators: when you do, for example, bar of 3, that's a point where the program can call a specialized, miniboxed variant of the method. Propagators, which just call another miniboxed version if it's available. And inhibitors, where it stops: the call goes generic, and you've lost all the optimizations. There's a full tutorial on the website where you can read all about this.

Okay, so this brings us to the next topic, which is the website. How many of you have seen this website? Okay, not too many. Maybe you can visit: it's scala-miniboxing.org. And there, if you look at the menu on the left, you have tutorials teaching you how to use it, how to try it out, how to set up your SBT config. You have a lot of examples and support, a Twitter account, a mailing list, so pretty much anything you could possibly want.

So let's look at some benchmarks. The plugin didn't just write itself, right? I wish I could automate that, but... Okay, let's go to the linked list. This is work... well, others have helped, so let's put it this way: this is joint work with Aymeric Genet, a student from EPFL, who developed a mock-up of the Scala linked list. As I told you before, the miniboxing plugin is still in a beta state, so before trying to minibox something big, we try not to bite off too much: just bite a little and try to optimize that.
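Recapping the foo/bar trace in code (a sketch; @miniboxed comes from the miniboxing runtime, so this only compiles with the plugin and its runtime on the classpath):

```scala
// Propagator: foo has a miniboxed variant and calls nothing generic.
def foo[@miniboxed T](t: T): T = t

// Propagator: bar forwards to foo's miniboxed variant, keeping the trace optimized.
// Dropping either annotation turns that method into an inhibitor: the call
// falls back to the generic (boxed) version and the optimization is lost.
def bar[@miniboxed T](t: T): T = foo(t)

// Initiator: the call site selects the Long-encoded variant for Int.
bar(3)
```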
So he implemented, well, not exactly a little: he took Function1, Function2, Tuple2, Traversable, TraversableLike, and all those things you probably hate about the library, and he implemented them using miniboxing. And he tried a benchmark called the least squares method: basically taking a set of data points and fitting a line through them in the best way possible. The results are here; you can see miniboxed and generic. With an effectively infinite heap, and we did give the JVM a lot of gigabytes of RAM, the miniboxed version was 1.7 times faster. But it's more interesting when we give it a limited heap, so garbage collection actually starts to weigh in, and there it's three times faster. And I'm not talking about an entire collection library, just lists, scala.List.

Let's talk about Spire. Here we have a benchmark that shows miniboxed, specialized, and generic side by side. If you look at miniboxed, in the best case it's pretty much the same, in the same ballpark as specialization. There are also benchmarks where it's not doing that great a job: miniboxing is twice as slow as specialization, but of course it's four times faster than generic, so it's still a good trade-off. Sorry? Yes, that's one reason, and the other reason is that arrays are not optimized that well when they are miniboxed. I'm working on that right now, but we can take it offline, because it's pretty deep.

Okay, so let's look at the bytecode. This is for the linked list, by the way, so we have Function2, that thing that would explode into a thousand classes: generic, miniboxed, and specialized. And if we talk about Spire, which is optimized for specialization, really carefully crafted to not include anything unnecessary, well, we still generate three times less extra bytecode than specialization does for Spire. So, let's conclude, and then I'll take questions.
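As an aside, the least-squares fit that the linked-list benchmark computes is just the closed-form line fit; a sketch, not the benchmark code itself:

```scala
// Fit y = slope * x + intercept through the points, minimizing the sum of
// squared residuals, using the standard closed-form solution.
def leastSquares(points: List[(Double, Double)]): (Double, Double) = {
  val n     = points.length.toDouble
  val sumX  = points.map(_._1).sum
  val sumY  = points.map(_._2).sum
  val sumXY = points.map { case (x, y) => x * y }.sum
  val sumX2 = points.map { case (x, _) => x * x }.sum
  val slope     = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX)
  val intercept = (sumY - slope * sumX) / n
  (slope, intercept)
}
```

The point of the benchmark is that every `(Double, Double)` pair and every closure argument here would be boxed under erased generics, which is exactly the traffic miniboxing eliminates.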
I have to thank a lot of people who helped out; unfortunately, you probably can't see them, because they're all on the website. And I'd like to give you a brief overview of things: if you're interested in deeper discussions and understanding of the miniboxing plugin, there are certain components you might like to read about. There's a Scala Days talk about the encoding itself and the class transformation. There's also a paper, if you feel like going through 20 pages of deep material. There are other resources on the code transformation: there's an OOPSLA talk and the paper, and this is pretty interesting because it's related to value classes and to multi-stage programming; pretty much a lot of crazy ideas going on. And there are many other considerations: the way we encode functions; the quirks of miniboxing, which I presented two weeks ago here at PDXScala; also collections, how to specialize collections, and so on. They're all on the website, so the website is the one place to search for all this stuff. Okay, so with this, I leave you with the miniboxing sign and the website. Thank you very much, and I'll gladly take any questions you might have.

Okay, good question: why wouldn't you put it all over your code? Well, you can do it. In some cases you can use mark-all, which will do that for you; you don't even need to do it manually. In some cases, some things might be slower if you push miniboxing into them: for example, something that is typically only used for objects or for strings. You might actually do a bit of damage by trying to run it miniboxed. And furthermore, it's also a matter of control: you tell the compiler which parts are critical for your code, so that it doesn't explode and create, for example, too many classes. Okay, we have one: any issues with compiling a bit of it with miniboxing, isolating that, and compiling the rest without miniboxing turned on?
So miniboxing is a little sneaky bastard on this one. Once you've made the step to compile with miniboxing, what happens is that the code is no longer compatible with being compiled without the plugin. You can have, for example, something that's been compiled with the standard compiler, which is the library, and then add something that's been miniboxed, but not the other way around. So once you've made the step to use miniboxing, all the code that uses the code you've already compiled will need to have miniboxing. And there's a very technical reason; I mean, it's not something I do to take over the world here. It's in the Scala... I explain exactly why. Basically, we transform a class into a trait, which gets compiled to an interface, which doesn't make any sense to instantiate later in the code, which is the code that scalac would generate. Okay, so have I answered your question?

I saw a question there. Well, that's a very difficult question. Merging it into Scala means having to abide by a set of rules: backwards-compatibility rules, versioning rules, release schedules. I think right now it's best if it stays a plugin, so we can release as often as necessary, and when somebody notices a problem, because there are still problems in there, we can quickly roll out a release to fix it. So I feel much more comfortable with it being a plugin for now. I mean, we also had these problems with specialization when it was merged: people saw it as a standard compiler feature and expected some stability, and it actually had a couple of bugs that made using it very hard. So it's best if it stays a plugin as long as possible. Yes? No, because the Scala library uses a lot of crazy features of the language, and I'm not sure all of it can be transformed. To be honest, the last time I had a chance to try it, it didn't work. I think now I'd actually give it another shot; it might work, might not.
But in principle, that's where I'm trying to get. I mean, it's hard work to debug all the problems in the compiler, but I'm progressing. Yes? For things like Function1, have you noticed a performance difference, or a difference in the ability to get inlined? I mean, I assume Long is your best case, because you don't necessarily need to do any transformation on the way in or the way out. So, first of all, can you notice a difference there between specializing on Long versus the other cases? And second of all, are you using static methods to do those transformations; have you thought about actually inlining the shifts and masks and so on?

Okay, so let me answer the first question, because it's easier, and then I'll take the second one. So, the first question: Function1, does it work faster when it's for Long and not Int, and so on? Well, unfortunately, Function1 is specialized in the library, so there is basically no way that miniboxing will cooperate well enough with specialization and use the features in specialization. So what we do is shown here, it's explained here: there's an encoding, a transition from Function1 to MiniboxedFunction1, which has the correct accessors and the correct apply method to use Long. And this is done on the code, but in a way that doesn't incur an overhead each time you call the function, only each time you define a function, pass one in, or pass one out. And this is done with the LDL transformation, the Late Data Layout mechanism. It's probably more involved, so you can just have a look at this document.

For the second question: the conversions from each primitive type to Long and back are defined in a Java file, because having a Scala object incurs an overhead and prevents the JVM's JIT compiler from inlining. So they're defined as static methods in a class written directly in Java. Okay, so any other questions? Good.
I have a little surprise towards the end: I'd like to fill you in on the projects in the Scala team at EPFL. That's the Scala staircase, in case you haven't seen it before. And just to give you an idea of the problems, the more research-y problems, that are right now under scrutiny at EPFL. I'm just going to go in office order along the hallway, so no specific order; don't ask me. I just picked one office to start from, and I go through all of them.

So, Yin-Yang: it's multi-stage execution. Maybe some of you know it. I won't go into too many details; it's just a way to partially execute your program and burn away some of the abstraction layers, such as collections and higher-order methods. And Yin-Yang is a front-end for this, based on macros. I tried to put the URLs here; I'll also give you the slides, so you can access them and contact the people in case you're interested. The Scala.js backend: it's a backend for Scala that generates JavaScript. This is Sébastien Doeraene and Tobias Schlatter; they have a website, actually. Then there's Lightweight Modular Staging. It's all about optimizing programs, burning through layers of abstraction. It's Tiark Rompf who's the main lead for this project; he's now a professor at Purdue, but this idea involves pretty much everyone in the lab. Everyone has worked a little bit with this, played around with this idea. The Dependent Object Types calculus: this is the core type system for a Scala-like compiler. The theoretical part is Nada Amin, Tiark Rompf, and Samuel Grütter, and these guys are trying to formalize the type system, with all the soundness proofs, basically. Then there's Pickling and Spores. This is support for distributed programming; Heather Miller, Philipp Haller, and others have worked on this. Very interesting work. Staged parser combinators: Manohar was here in Portland three weeks ago to present this at a conference. You probably know parser combinators.
They allow you to write a grammar and parse code, or different types of strings. But they're dead slow, unfortunately; this is why nobody uses them in Scala. With multi-stage programming, he was able to speed them up beyond handwritten parsers, with jumps and gotos and all the non-principled kinds of programs, and all of this generated from a very high-level specification. The Dotty compiler: this is a compiler for the Scala syntax using the DOT type system. This is Martin, who's mostly working on this, Dmitry, and other people in the lab. Maybe this will solve some of the problems Steven pointed out in the earlier presentation on types and the pattern matcher; hopefully at least some of them. scala.meta: Eugene should have been here talking about scala.meta; unfortunately, he didn't get a visa to come to the US and present, so I took his place. He's hard at work right now on this meta-programming support that involves radically changing everything regarding reflection, macros, and the way we store Scala code. It's a pretty big project. The Scala Dyno plugin: this is a plugin meant to give Scala a more dynamic-language look and feel. If you type something that's wrong, it's not going to prevent you from compiling the program; it will turn it into a runtime exception. So, pretty much like a Python program, some of the paths through the program will work correctly, others might throw an exception. This is Cédric, who's there in the audience. Okay, that's the guy, in case you're interested. Miniboxing, I just spoke about it. ScalaBlitz: this is Dmitry, speeding up collection operations, macro-based. There's LMS Scapa, a protein simulator by Sandro Stucki, and Odds, a framework for probabilistic programming, also by Sandro. And these are very interesting and very deep projects with very deep theoretical foundations.
Then there's the Type Debugger: when you get a type error from the Scala compiler, and you look at it and you're like, what? Say what? That's the kind of tool that helps you. It's Hubert who's developing it. Then you have the benchmarking framework; some of you might know it: ScalaMeter. It's kind of a Google Caliper for Scala. We are also working on a vector implementation using RRB trees; I almost forgot what RRB stands for: relaxed radix balanced trees. This is basically improved performance for vector implementations. Okay, so this is a roundup of what's in the works right now, in case you're wondering what we do at EPFL. Okay, I see a question: the Scala staircase, is the logo based on the staircase? Good question. There was a logo that was kind of a spiral something; they built the stairs, and then Gilles Dubochet designed the logo after the stairs. So it's a mixed thing: it's definitely not the staircase after the logo, but there was a logo before, somewhat similar. Okay, thank you very much.