So, our next speaker is Karl Millar from Google, and he's going to talk to us about Rho: high-performance R. And there he is. Thank you.

Okay, great. Okay, thanks. So, I'm talking about the Rho project. While I come from Google and Google pays me to work on this, it isn't particularly a Google project; it's very much intended to be a community project. We openly welcome pull requests on GitHub and contributors from anywhere.

So, what are we trying to do? We're trying to build a fully compatible, open-source implementation of the R language, and we want this to be a real project producing real software, in part because Google wants to run it on our own systems, for our own analyses, and all that sort of thing. The things we want to differentiate it from the R interpreter that R Core produces, the one everyone has been using, are primarily performance, improved memory use, and better scaling. By better scaling I don't mean data-center-scale datasets, but rather handling things that fit in memory reasonably well: sort of medium-sized projects.

Additionally, we want the code base to be something you can easily understand, maintain, and improve. Those of you who have read the R interpreter's code know there's a steep learning curve, and it's short on comments. That's starting to show up in concerns about what happens when some of the current maintainers retire and who's going to replace them, and that's a real, serious concern.

Finally, as part of this performance emphasis, we really don't want R users having to rewrite their code in C++ because R is slow. It's not a good use of an analyst's time; you should be analyzing data and writing new methods. Rewriting something you've already written doesn't make a lot of sense. It's also error-prone, and it makes it harder to keep improving that code, both for yourself and for whoever follows after you. So that's not a great solution. We love Rcpp, it's a beautiful thing, but if you're diving into it all the time to rewrite code you've already written, something's wrong. We'd like to take away the need to do that.

In terms of how we got here: back in 2007, Andrew Runnalls, a professor of computer science at the University of Kent, wanted to add some features to R. He jumped into the code and found himself, as he describes it, in a strange and unusual place, with code that didn't resemble anything he would teach his students. So he started a multi-year project to refactor the internals of R into the sort of C++ he would teach his students to write: clear, easy to understand, well commented, easy to maintain, easy to improve. At that time performance really wasn't on his radar; in fact, the code got substantially slower while he was working through all of this. In 2015 the emphasis switched, and performance is now the top priority, with everything else still there but not as central as performance.

So, to make R fast, the first thing we need to understand is why it is slow, and there are a lot of reasons. The first one, probably the most important, is that R Core, the people who have been maintaining R, have quite rightly focused on the things needed to make R useful, to make it a powerful tool for doing statistics.
That's absolutely the right thing for them to do; I don't think anyone would disagree with that. But as a result, performance has suffered.

There are more reasons. One of them is historical. Development of R began back in 1992, and in 1992, if you had a brand-new Mac, it looked like this, and if you had a PC, it looked something like that. The issue isn't that they were boxy or old; it's that the performance characteristics of computers then were very, very different from what you see now. In particular, the time it took to get a piece of data from memory was about the same as the time it took to do some basic calculations on that data. That, unfortunately, is no longer the case. In the intervening years, processors, the ability to do calculations, have sped up by a factor of hundreds, while our ability to pull data from memory, a random read, is maybe ten times faster than it was, at best. Not a lot faster. As a result, code that does a lot of these memory reads, which back in 1992 was a very sensible thing to do because reads were comparatively fast, now runs very slowly: not that much faster on a modern computer than if you ran the same code on one you bought in 1992.

So are we doomed? Not really, because Python was being developed in 1992 as well, and Python's performance is not bad. The Linux kernel dates from about the same time, and pretty much everyone agrees that's fast. FreeBSD, which your modern Macs are based on, comes from well before that time, as does the GCC compiler you compile your R with. These are all much older, and they've all made the transition to working well on the computers we have today. We need to do some of the same things.

In the internals of R there are lots of data structures with this issue. The most common is the cons cell, which implements a linked list. If you've looked at the R internals, you've seen CONS and LANG and LIST everywhere, and it's all slow. Similarly, the environments you store your variables in, that your package code goes in, the namespaces: it's all built on top of these things. Similarly, R has no scalar values. If you have a value of three, R allocates a vector of length one on the heap, and that's really inefficient; again, it's just lots more of these very slow memory accesses. So we need to improve these. The good news is that this is fairly straightforward software engineering. There's no rocket science here. There's a bunch of work, but it's work that anyone with an undergraduate degree in computer science can do without serious difficulty.

Of course, there are lots more issues as well. R has this beautiful idea that everything in R is a function. It's very elegant, and you can redefine all of those functions. You can redefine if so that a false condition still takes the branch, which would confuse everyone who read your code, but you can do it. You can redefine addition, multiplication, even assignment. If you're feeling really nasty, you can take parentheses and have them every now and then add a small random number to the value. Imagine debugging your algorithm when the numbers just randomly move around. You can do it. And because of this, every time the R interpreter sees any of these symbols, it has to look them up: has it been redefined? What is its current value? And then it has to call the function.
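To make that concrete, here's a minimal R sketch of the kind of redefinition being described; the noise scale and the clean-up step are my own choices for illustration, not anything the project recommends:

    # Shadow the parenthesis "function" so it occasionally perturbs its value.
    `(` <- function(x) x + rnorm(1, mean = 0, sd = 1e-8)

    (2 + 2)            # no longer exactly 4
    rm(list = "(")     # remove the shadowing definition to restore sanity

Because a definition like this is perfectly legal, the interpreter cannot treat the parenthesis as free syntax; every use of it involves a lookup.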
In languages that don't allow this, parentheses and braces don't generate any code at all; they just disappear at parse time. But in R we have to keep these things around, so we're always doing these lookups, always checking things, recalculating things, rechecking things. And it's not just here; it's all over the place. When you call a function, there's argument matching that goes on to match the arguments you passed to the ones the function expects. You can pass them by position, by name, or by part of a name, and there are defaults, and there's a big, complicated algorithm that sorts this all out. Every time you call that function, even the exact same call, it recalculates which arguments go to which positions. That's slow.

What else is going on? Some of the things people commonly point to are that it's an interpreted language and that there are no type declarations. Type declarations matter because if you have a simple statement like a <- b + c and you don't know the types of b and c, you have no idea what kind of addition to use. There are many, many different kinds of addition: integers to integers, integers to doubles, all the different types, and then scalars to scalars, scalars to vectors, vectors to vectors of the same length, vectors to vectors of a different length. All the permutations are out there, and we don't know which one to use; we have to work that out, and that takes time. In addition, as I said, there are no scalar types, and things are single-threaded.

So again, are we doomed? The good news is no. JavaScript in 2007 was considered a slow language, much as R is now, and for many of the same reasons: it's interpreted, it has no type declarations, it doesn't even have an integer type. And if you want an array, you have to create a dictionary where you say the value for one is going to be this, and you throw values in, and they don't have to be contiguous or anything. You can have keys one, seven and nine, and that's considered just as valid as one, two, three; there's no special support for a contiguous set. But what has happened in the intervening years is that JavaScript has become very fast. A lot of software engineering effort has been put in by the V8 team, and by the other JavaScript engines as well, that eliminates many of these disadvantages. You can take interpreted code and compile it into native, typed code at runtime. You don't need type declarations any more, because it turns out that every time you hit that same addition, you're passing in the same types 99% of the time; so you watch it and go, oh, we know what's going on. And while integers and proper arrays aren't in the language, you can add them to the implementation, and they've done that.

So there's a lot we can do to make R a lot faster: optimizing the internal data structures, optimizing the way function calls work, optimizing the way memory management works. We're currently working on adding scalar types to the internals of Rho, so that when you do one plus one it's a very quick operation that can be done in the CPU's registers, in just a couple of clock cycles, much faster than the current system. And one of the core ideas in this is that all these lookups and checks, most of them are either unnecessary, or the same as last time, or you can assume the answer is what you want it to be and just leave a cheap check in there. Then when the check fails, you go, oh, the thing I just did was wrong, okay, back it up and do something slow. But that doesn't need to slow down your main path. As long as things are sane, as long as parentheses actually mean parentheses, you can just go ahead, and that speeds things up a lot.
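To make the earlier argument-matching point concrete, here's a small R example; the function and its argument names are invented purely for illustration:

    # A toy function: nothing special about these arguments or defaults.
    describe <- function(value, label = "x", verbose = FALSE) {
      paste(label, value, if (verbose) "(verbose)" else "")
    }

    describe(3, "count", TRUE)       # everything matched by position
    describe(label = "count", 3)     # a named argument plus a positional one
    describe(3, verb = TRUE)         # partial matching: "verb" matches "verbose"

All three calls resolve to the same parameters, but the interpreter re-runs the full matching algorithm on every single call.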
One of the things we then want to do, once we've got all of this happening, is to take it all the way down to native code. This is something that the JavaScript, Lua and Java engines all do, and something we want to do for R as well. At this point we have a proof-of-concept just-in-time compiler, but being a proof of concept, it doesn't actually make anything faster yet.

So what about vectorized code? The optimizations I've been talking about work well when you're writing tight loops or recursive functions and things like that, which is what vectorized code is notionally already good at. But only up to a point. The problem, and it comes down to the memory bus again, is this: suppose we're doing a simple calculation, the log-likelihood of a simple binomial model. The way R does this is that the data has to come from memory, so we take all the data in p, bring it over to the CPU, calculate the logarithm, and then send it all the way back to store log(p). Then we bring log(p) back, one element at a time, along with y, so we can multiply y by log(p), and that result goes back across again. The data continually goes back and forth across this bus between memory and the CPU, and because the memory bus is slow, the CPU sits there idle. You've got this brilliantly fast CPU, 64 cores, but only one of them is doing anything, and only part of the time, because it just can't get enough data.

So we can improve on this. One of the things we're looking to do, hopefully over the next year, is that when we see code like this, we can say: hey, I can rewrite this. I can take a single element from y and a single element from p, do the entire computation, and then send the result back, or better still keep it around and just add it to the running sum. By eliminating all that memory traffic, we can get a lot, lot faster: twenty times, easily. And we can then think about using multiple threads. Right now there's no point in multiple threads, because you've only got one memory bus, so you'd just have more CPUs sitting around waiting for something to do, waiting for their data.
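As a rough sketch of the rewrite being described (the function names and the loop are mine, purely for illustration): the vectorized form below materializes several temporary vectors and shuttles them across the memory bus, while the fused, element-at-a-time form touches each element of y and p once and keeps a running sum. Written as an R loop it is actually slower in today's interpreter, which is exactly the point; the goal is for the implementation to generate the fused version automatically.

    # Binomial log-likelihood, vectorized: log(p), y * log(p), (1 - y), etc.
    # each allocate and fill a full temporary vector.
    loglik_vectorized <- function(y, p) {
      sum(y * log(p) + (1 - y) * log(1 - p))
    }

    # The fused form a compiler could emit: one pass over the data,
    # no intermediate vectors, the result accumulated as it goes.
    loglik_fused <- function(y, p) {
      total <- 0
      for (i in seq_along(y)) {
        total <- total + y[i] * log(p[i]) + (1 - y[i]) * log(1 - p[i])
      }
      total
    }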
Okay, so where are we right now? Performance-wise, we've caught back up, and overall we're doing a little bit better. On scalar code we're typically a bit faster than the interpreter, but not yet as good as the byte-code compiler. For recursive functions we're quite a bit slower; we need to work out what's going on there. For vectorized code we're currently twice as fast. Obviously, as I just said, we want to get to more like 10 or 20 times faster, but we're not there yet. Where we would like to get to is at least 10 times faster, and we believe that's completely attainable based on what we've seen to date; it's just a matter of doing the work and the engineering. I'm hoping to start using this internally at Google this year. There's a bunch of work that needs to be done on compatibility, making sure there are absolutely no bugs and all that sort of thing, but Google makes a great testing ground: being internal to a single company, there's a lot less variation in platform, so it's much easier to debug and work out issues. And then, if that goes well, hopefully we'll be looking at a general release sometime next year.

We have a substantial team. I'm running out of time, so briefly: our code, as I said, is descended from R Core's code, and it still maintains that link, so anything R Core does, we get automatically; it's literally as simple as a git pull. Andrew Runnalls, as I mentioned, did a lot of the early work and refactoring. There's a team led by Jan Vitek at Northeastern University working on a more efficient bytecode compiler and optimizer, which will let us generate a lot of the optimized code that does these things: looking at the code, making assumptions, working out how to transform it to be more efficient. And then there's a small team at Google working on the runtime, the memory management, the object representations, things like that, to make those efficient, and also to take the bytecode and convert it to native code. Our code is up on GitHub; you're welcome to check it out. As always, if you're interested in contributing, get in touch with me, send emails. Thank you.

So, would something like deprecating the ability to redefine parentheses help performance? And I guess the follow-up, maybe for the members of R Core here, is: would that be a reasonable thing to do for base R as well as for the new R?

The short answer is yes. The option, which we have not yet implemented, is to say that you're simply not allowed to do that, and to actually enforce it at runtime: we can detect if you try to redefine parentheses, and rather than give you bad answers, we can say, hey, we're not going to go there, sorry, too bad. The other option would be that if someone does try to do it, we turn all the optimizations off. Given how little this is used, I'm leaning towards making it an error, but I need to do a little more looking into that first. And likewise for a lot of the other very common keywords that get used a lot. Thank you. Thank you.