Hi, the title of my talk is How Avoiding Arguments Saved the Day and Also Increased Benchmark Performance by 10%. I work with Aerospike. You don't need to know much about Aerospike for this talk; briefly, it's a NoSQL, high-performance key-value store, but I'm not going to talk about the database itself much. If you want to find out more about Aerospike and how you might use it, come and talk to me after this talk.

So, about a couple of months back... I maintain the Node.js client for Aerospike, so I don't work on the actual database itself, I work on the client side. A while back I was working on a major new version of the client, version two, and we had two big goals for that version.

One was performance improvements. The Node.js client just wraps the C client, so it's basically a thin wrapper, and the C client had recently added an API that enables asynchronous I/O, whereas the previous version was using synchronous I/O. So one of the main goals was to switch to the asynchronous API to improve performance. Obviously asynchronous I/O is a much better model to use, especially in Node.js, where basically everything happens asynchronously.

The other big goal was to improve usability and maintainability. As I mentioned, the Node.js client is just a thin wrapper around the C client, and the underlying C API was leaking through; we didn't really have an API that felt native to JavaScript and Node.js. One of the main complaints we got from a lot of users was that we didn't follow the typical Node.js error-first callback style, where the first argument in a callback is an error argument that is null or undefined if the operation was successful, or an error object if there was an error during the operation. We also wanted to shift a bit more of the logic from the C layer into JavaScript, to make the client easier to work with and to encourage more contributions. A lot of people came to our Node.js client, saw that it's mostly C, and stepped away, saying, okay, I can't really do much here. So that was another goal: make it easier to work with the client.

A couple of weeks in, we had made good progress on the performance side. We had switched out the synchronous I/O for the asynchronous I/O, and on my dev box I saw a pretty significant increase in throughput: from around 55,000 transactions per second, about half read and half write operations, to about 80,000 transactions per second, just by switching from the synchronous to the asynchronous I/O model. So, yay, everything was good; that's an increase of about 40% in throughput, and goal number one was taken care of.

In the meantime, I was also working with an external contributor who had taken a first stab at the main complaint about our existing Node.js API and had begun reworking the callback mechanism. I then took over from there, so everything that follows is to blame on me and not on the contributor. In the V1 version of the client, the client requests were basically passed directly to the C/C++ layer, and callbacks were coming from the C code directly into the application code. And because of that, a lot of C-style conventions leaked through to the JavaScript side.
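To illustrate the convention we were being asked to follow, here is a minimal sketch of the error-first callback style that Node.js developers expect, using the file API that I just mentioned rather than the Aerospike client:

```js
// Typical Node.js error-first callback, illustrated with the built-in fs module:
// the first argument is an Error (or null on success), the results follow after it.
var fs = require('fs')

fs.readFile('/etc/hosts', 'utf8', function (error, contents) {
  if (error) {
    // the operation failed; error is a proper Error object
    return console.error('read failed:', error.message)
  }
  // error is null, so it is safe to use the result
  console.log('read', contents.length, 'characters')
})
```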
So the first argument that was passed back to the JavaScript application was always a status object, which had a status code in it. That code would be zero if the operation was successful, or some other status code if the operation was unsuccessful. As I mentioned earlier, that's not what a Node.js developer expects in a callback: the first argument is expected to be an error value that is usually null, not a status object. So the application developer was expected to take the status object, look at the status code inside, and check whether it was zero or not to tell whether the operation was successful. That was a big stumbling block for a lot of application developers who were not expecting this, and who were expecting our callbacks to work like any other Node.js callbacks from the file API or from other database drivers.

So we wanted to change this, but we didn't want to touch the C layer too much. What we decided to do was add a callback handler, which takes the callback that comes from the C layer with the status code, does the check whether the status code is zero or not, and then makes the actual callback to the JavaScript application: passing null if the operation was successful, or, if the operation had an error, converting the status code into a JavaScript error object and passing that back.

So it looked roughly like this. Here you can see one of the client methods, which gets a record from the database for a given database key. There's some parameter checking going on here, but then we basically call into the native C library, passing the parameters, and we have a separate JavaScript callback which gets called from the C client once the operation has completed. We then take the actual application callback, the one from the application that's using the Aerospike client library, and pass it into the callback handler together with all the other information we received from the C client, like the status object and the actual result of the database operation. And in the callback handler we do the check: we check whether the error code is zero, and if it is, we call the callback that was passed by the application. If it is not zero, we convert the status code into a JavaScript error and pass that back to the JavaScript application.
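Here is roughly what that looks like as code. This is a simplified sketch rather than the actual Aerospike client source; names like Client, asNative and errorFromStatus are stand-ins:

```js
// Simplified sketch: the native (C) layer still calls back with a C-style
// status object, and a JavaScript callback handler translates that into the
// error-first style before invoking the application's callback.

function Client (asNative) {
  this.asNative = asNative // hypothetical handle to the native C client binding
}

function errorFromStatus (status) {
  // turn the C-style status into a proper JavaScript Error object
  var error = new Error(status.message || 'Aerospike operation failed')
  error.code = status.code
  return error
}

Client.prototype.get = function (key, callback) {
  // ...parameter checking omitted...
  // call into the native C client; it invokes our internal callback when done
  this.asNative.get(key, function (status, record, metadata, key) {
    callbackHandler(callback, status, record, metadata, key)
  })
}

function callbackHandler (callback, status /* , ...results */) {
  if (status.code !== 0) {
    // operation failed: pass an Error as the first (and only) argument
    return callback(errorFromStatus(status))
  }
  // operation succeeded: pass null as the error, then the results;
  // how exactly the results get forwarded is the interesting part (see below)
  callback(null /* , results... */)
}
```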
Okay, so fine. From the application's perspective, our client was now behaving like a proper Node.js API, with proper error-first callback semantics. But when I went back and reran my benchmarks, I found that I had lost quite a bit of the progress I had made by switching to the asynchronous API. Instead of reaching nearly 80,000 transactions per second, I was back to only around 70,000 transactions per second. So I had lost about 10% of my performance, and that was clearly not acceptable for such a small change in how the callbacks were done. After a while we did trace it back, and it became clear that the callback handling really was the cause of this slowdown. The other changes we had made didn't really affect performance, but just introducing this extra hop through the callback handler caused the client to lose about 10% of its throughput. That was not acceptable. So, as it turns out... maybe I should go back a slide.

I didn't really go into much detail here about what happens when the operation is successful, when the error code is zero. As you can see, the actual callbacks from the C client contain one or more result objects, and we need to pass those objects back to the user's application. But we don't know exactly how many there are. Some database operations return just a single object; others, like this get call, return three different objects. So in this callback handler I had used the special arguments object, which contains all the arguments that were passed to the function. Even though in the function signature I only declare the callback and the error parameter, through the arguments object I can access all the other arguments that were passed to the function. I didn't care about the first two arguments, because I had already dealt with those explicitly; I just wanted all the results of the database operation. So I used Array.prototype.slice to convert the special arguments object into an array that I could then pass into the actual application callback using apply. Array.prototype.slice converts the arguments object into an array, I set the first element to null because the operation was successful, and then I use callback.apply to call the application with the results of the database operation.

So I was trying to figure out what was wrong: why is this simple code causing so much of a performance degradation? I pretty quickly came upon a wiki maintained by the Bluebird project called Optimization Killers. Specifically, it talks about how using the special arguments object can cause a function to not get optimized by V8, the JavaScript engine that Node.js uses internally. V8 has an optimizing compiler, so it will try to figure out whether it can optimize your JavaScript code to gain better performance. But there are certain code patterns which prevent the compiler from optimizing a function, and as it turns out, using the arguments object is one of the ways you can prevent a function from getting optimized. The wiki lists the specific things that you are still allowed to do with the arguments object, and there are only a few: you can access individual items with the square bracket operator, one at a time, or use the special length property to determine the number of arguments. Those are essentially the only safe operations.

So I had to figure out what the alternatives are. How else can I deal with an unknown number of arguments and still pass all of them on, but without using the special arguments object, and hopefully gain back the throughput that I had lost? There are a few different options, listed either on the wiki itself or on the Mozilla Developer Network, which has an entry specifically about the arguments object. Two of the alternatives the MDN page suggests are to either just iterate over the arguments, because that is considered safe (it's an array-like object, and iterating over it with a plain for loop does not prevent V8 from optimizing the function), or to use the "despised" Array constructor. Those are not my words; that's what the Mozilla Developer Network had to say about the Array constructor.
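Here's roughly what that original, slice-based handler looked like. Again a simplified sketch, with errorFromStatus being the hypothetical helper from the earlier sketch:

```js
// Original handler: convert the arguments object into a real array with
// Array.prototype.slice so it can be forwarded via Function.prototype.apply.
function callbackHandlerSlice (callback, err) {
  if (err.code !== 0) {
    return callback(errorFromStatus(err)) // hypothetical helper from above
  }
  // skip the first two arguments (the callback and the status object),
  // keeping only the result objects, however many there are
  var args = Array.prototype.slice.call(arguments, 2)
  args.unshift(null) // error-first convention: null means "no error"
  callback.apply(undefined, args)
}
```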
So I was using Array.prototype.slice; that's the original approach I had taken, which seemed like the easiest and cleanest way to me. And then, of course, with ES6 there's also the new rest and spread syntax, which means you don't have to use arguments anymore: you can have a parameter in the function declaration which takes a variable number of arguments and puts them into an array under the hood. And it also allows you to spread that array out again when calling another function. So I'll go through these in detail and look at what the code would look like if we were to use each of these alternatives. Afterwards we can look at whether V8 is able to optimize any of these different functions, and then I'm going to run a benchmark to see which one is the fastest.

So this is the original approach that I had, abbreviated slightly to make it fit better on the slides. Again, we get the callback and the error argument. If the error code inside the error object is not equal to zero, then we've got an error and we return; we're done. If not, we need to extract the results of the database operation and pass them to the JavaScript callback that we received from the application code. And the original approach was to use the Array.prototype.slice method. We know already that this does not perform very well.

The second approach is using the "despised" Array constructor. I'm not sure why it's despised, but that's what the MDN page calls it. Basically this means calling Array itself as a constructor, passing it the arguments object, and it returns a real, proper JavaScript array. Then we have to remove the initial callback argument, which we don't want to pass back to the application, we set the first element, the error argument, to null because the operation was successful, and then we can pass the array on to the application's callback.

The second alternative that was also suggested by MDN is to iterate over the arguments object, because it is, after all, an array-like object and can be iterated like an array. So we do that and construct a new, proper array using a plain old for loop. Again, we skip over the first two arguments because we've already dealt with those, we set the first element to null, which is the error value, and we call the callback function. And here's a slight variation: as I was playing with this, I was curious whether there are other patterns that perform better or worse. The only difference here is that instead of assigning to the new array by index, I'm using push to append the arguments as we get them. It does the same thing, it just uses a different method. It'll become clear in a short while why I was specifically trying out these two code patterns.

Then I mentioned the new ES6 spread syntax. Here we actually add a new parameter to the function signature using the rest parameter syntax, the same three dots as the spread operator. In case you're not familiar with the new ES6 syntax, what this does is take all the remaining arguments that are passed to the function and build them into an array object for you. So inside the function, args is a real, proper array, not a pseudo-array like arguments, and it can be used like an array.
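To make the comparison concrete, here are simplified sketches of those alternative handlers. They are not the exact code from the slides, and errorFromStatus is the same hypothetical helper as before; all four are meant as drop-in replacements for the slice version:

```js
// 1. The "despised" Array constructor: Array.apply spreads the arguments
//    object into a real array. (With a single numeric argument, Array() would
//    create an empty array of that length, but our first argument is always
//    the callback function, so that edge case cannot occur here.)
function handlerArrayConstructor (callback, err) {
  if (err.code !== 0) return callback(errorFromStatus(err))
  var args = Array.apply(null, arguments)
  args.shift()   // drop the callback itself
  args[0] = null // replace the status object with the "no error" value
  callback.apply(undefined, args)
}

// 2. Iterate over the arguments object with a plain for loop,
//    assigning into the new array by index.
function handlerIterate (callback, err) {
  if (err.code !== 0) return callback(errorFromStatus(err))
  var args = new Array(arguments.length - 1)
  args[0] = null
  for (var i = 2; i < arguments.length; i++) {
    args[i - 1] = arguments[i]
  }
  callback.apply(undefined, args)
}

// 3. Same iteration, but appending with push instead of index assignment.
function handlerIteratePush (callback, err) {
  if (err.code !== 0) return callback(errorFromStatus(err))
  var args = [null]
  for (var i = 2; i < arguments.length; i++) {
    args.push(arguments[i])
  }
  callback.apply(undefined, args)
}

// 4. ES6 rest parameter in the signature, spread when calling the callback.
function handlerSpread (callback, err, ...results) {
  if (err.code !== 0) return callback(errorFromStatus(err))
  callback(null, ...results)
}
```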
It contains all the arguments that are passed to the function beyond the initial two. And then, when we call the callback function, we can use the spread operator to unroll the array again and pass all the individual elements of the args array into the callback function. So this looks really nice; the code is clean. The only problem is that our Aerospike Node.js library is not an application of our own: we provide it for other people to use in their applications, and not all of our customers are on ES6 yet. So we're not really free to use this syntax, even though it would make the code much cleaner and nicer. I included it here because I was interested to see how it behaves performance-wise, but it was not really an option for us to use.

So after trying out all these different methods, I thought: is there any other method I could use, any other way to avoid using arguments? Are there any other constraints or specifics to my particular use case? And yes, it turns out there are, because I know that I'll get at most three parameters. Some of the database calls return one, two, or three results, but I don't have an unlimited number of result objects to deal with. So what if I take a very naive, simple approach, avoid variadic arguments altogether, and just say: okay, I'm going to take arguments one, two, and three and pass them to the callback function? Because after all, in JavaScript, if I call a function with fewer arguments than it declares, the remaining parameters are simply undefined. And if I call the application's callback function with an extra undefined argument that the application doesn't expect, nothing bad happens either; it just gets an extra argument which is undefined. So why not try this? It might not look very pretty, but sometimes the simplest approach is the best.
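A rough sketch of that naive, fixed-arity handler, again simplified and using the same hypothetical errorFromStatus helper:

```js
// The naive fixed-arity handler: no arguments object, no variadic handling at
// all. This only works because the C client passes at most three result values.
function callbackHandlerFixed (callback, err, arg1, arg2, arg3) {
  if (err.code !== 0) {
    return callback(errorFromStatus(err)) // hypothetical helper from above
  }
  // If the operation returned fewer than three results, the extra parameters
  // are simply undefined, and passing trailing undefined values on to the
  // application's callback is harmless.
  callback(null, arg1, arg2, arg3)
}
```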
So let's talk about the optimization. I already mentioned that some of these patterns prevent the V8 engine from optimizing the code. The code will still work, it just won't run as fast. Basically, V8 has two different compilers, and if the optimizing compiler is not able to handle the code, if there are code patterns that prevent it from being used, V8 falls back to the normal, non-optimizing compiler. The code still works; it's just not quite as fast. Within Node there are a few special functions we can use. One of them is %GetOptimizationStatus, which allows us to find out whether V8 is able to optimize the code within a given function. We pass it a function, and V8 tells us whether that function is being optimized, or whether there is something in the function that prevents it from being optimized. It returns a status code, which I've listed here: if it's 1, the function is optimized; if it's 2, it's not optimized. There are a few other status codes, like always optimized, never optimized, and maybe deoptimized. In practice I haven't encountered those, and I can't speak with much authority on what codes 3 to 7 mean, but the first two are definitely the interesting ones, and those are the two I'm going to be looking for to see whether my function is optimized or not.

There are a few more things we need to do to get to a state where V8 is able to optimize a function, because it doesn't optimize every function the first time it gets called. Oftentimes, especially in JavaScript, we create functions on the fly, dynamically, and V8 might never encounter the same function again. So we need to call a given function a couple of times, so it goes through a few different stages within V8, and we also need to call another special function, %OptimizeFunctionOnNextCall, which gives V8 a very strong hint: hey, I really want this function optimized, try to optimize it the next time you encounter it. Otherwise it might take quite a bit longer for V8 to decide whether it wants to optimize the function, depending on how often it's being called, et cetera.

Okay, so we're just going to run this code. I've put all the different callback handler functions in a separate file, and I'm going to iterate through them and see whether they get optimized or not. This is basically the same code again, just split in two so it fits on the slides: it determines the optimization status and iterates through the functions. And we need to pass a few extra parameters to the Node.js process in order to be able to determine whether a function is optimized: --trace-opt, --trace-deopt, and --allow-natives-syntax. That last one is the flag that enables these special functions with the percent prefix; if you don't pass it, these special functions won't be recognized, and we won't be able to determine whether a function is optimized. Node, or rather V8, will also spit out a bunch of other more or less interesting statements while it goes through the optimization process. I'm just going to grep all of that away, because I don't really care about it; I just want to know whether the function is optimized or not.

So here are all our different functions, and we can see that some of them are green, which means optimized, and some of them are not, which is not so good. This was the original one, which uses Array.prototype.slice, and as I had already determined, it's not optimized, or rather not optimizable. The Array constructor version can be optimized; that was one of the options the Mozilla page listed as an alternative. The next one iterates over the arguments, and remember, this was the first variation, where I use square-bracket indexing to assign the values to the new array. This function gets optimized. The other one, where I push the arguments into the array one by one, is not optimized. If anyone can tell me why, I'd be happy to find out, but somehow using the push method causes V8 to not optimize the function, while the other iteration function, which uses square brackets to assign the values, is optimized. The spread version is also not optimized. I was quite surprised to see that the new ES6 spread syntax would cause V8 to not optimize the function. I'm not sure whether that's because it's fairly new and maybe they haven't figured out yet how best to optimize it, or whether there are other reasons. But at least for now, in the version of Node I'm using, Node 6.4, the current stable version as of now, the spread version does not get optimized. And the last one, not surprisingly, the function which doesn't use arguments at all, which just takes fixed parameters and passes them on, is optimized. No surprise there. And it's the same picture here as well. Okay, so that covers the optimization status.
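For reference, the little harness I used for this check looks roughly like this. It's a simplified sketch; handlers stands in for a hypothetical module exporting the different callback handler variants:

```js
// Run with:
//   node --allow-natives-syntax --trace-opt --trace-deopt check-optimization.js
// The %-prefixed functions only exist when --allow-natives-syntax is passed.
var handlers = require('./handlers') // hypothetical module with the handler variants

var noop = function () {}
var okStatus = { code: 0, message: 'ok' }                    // fake C-style status object
var testArgs = [noop, okStatus, 'record', 'metadata', 'key'] // fake result values

function checkOptimization (name, fn) {
  fn.apply(undefined, testArgs)     // call it a couple of times first,
  fn.apply(undefined, testArgs)     // so V8 has seen the function run
  %OptimizeFunctionOnNextCall(fn)   // strong hint: optimize on the next call
  fn.apply(undefined, testArgs)
  switch (%GetOptimizationStatus(fn)) {
    case 1: console.log(name, 'is optimized'); break
    case 2: console.log(name, 'is NOT optimized'); break
    default: console.log(name, 'has some other optimization status')
  }
}

Object.keys(handlers).forEach(function (name) {
  checkOptimization(name, handlers[name])
})
```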
Whether a function is optimized or not is good to know, but in the end the only thing we really care about is: how fast is it? Is it going to give me back those 10% in throughput that I lost earlier? So here I'm using the benchmark module, creating a new benchmark suite using the same callback handler methods. For each of the methods I create a new entry in the test suite to run a benchmark on that function. Again, I pass the same parameters: a callback function which doesn't really do anything, my C-style status object with the status code in it, and a response object. Then I just run the suite, and the rest is basically output: every time one of the test functions completes, print the result, and at the end I look at the suite, filter for the fastest result, and print that out, and hopefully we have a winner. So again, this is the same benchmark code I was just showing, and this time I don't need to pass any special parameters; I just run node benchmark.js. So let's see how fast our callback functions are.

The benchmark module will run each of these functions many, many times, until it is reasonably sure that it has reached a stable value. For the first one, my original callback handler, you can see it managed to call it about one million times per second. The Array constructor is quite a bit worse, only about half as fast, even though we saw earlier that it is optimized, whereas the original slice function was not. Iterating over the arguments is clearly quite a bit faster, about four times faster than the original approach. Interestingly, the variant using push is slightly faster again, even though we saw that the push variant was not optimized, whereas the variant using index assignment was. So clearly the optimization of the code matters, but it's not the main criterion. The spread version is not too bad, not quite as fast as the iteration versions. But what's really striking is the dramatic difference between all of these approaches and the naive approach of just taking fixed arguments and passing them on, without dealing with the special arguments object at all. I was quite surprised to see this big a difference: it's about a factor of 10 faster than the next fastest approach, which is iteration using push.

So, as I already mentioned, what I also found very interesting is that the optimization status, whether a function is optimized or not, doesn't really make that much of a difference. If you have two functions and one of them is optimized and the other one isn't, it doesn't mean that the optimized function is going to be faster, because after all, each function takes a different approach to handling the arguments object, and those differences in approach far outweigh the differences that V8's optimization makes. Here it's the same thing again; the numbers are a bit different because this run was on a different machine, not my laptop, but relatively speaking the results are pretty much the same. So you see about a factor of 100 difference between the slowest and the fastest way of doing something that is functionally completely equivalent.
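The benchmark setup looks roughly like this. It's a simplified sketch using the benchmark module from npm, with handlers again standing in for a hypothetical module exporting the different variants:

```js
// Rough sketch of the benchmark: run each callback handler variant with the
// same fake arguments and compare how many times per second it can be called.
var Benchmark = require('benchmark')
var handlers = require('./handlers') // hypothetical module with the handler variants

var noop = function () {}         // application callback that does nothing
var okStatus = { code: 0 }        // fake C-style status object
var record = { example: 'value' } // fake result of the database operation

var suite = new Benchmark.Suite()

Object.keys(handlers).forEach(function (name) {
  suite.add(name, function () {
    handlers[name](noop, okStatus, record)
  })
})

suite
  .on('cycle', function (event) {
    // prints something like "slice x 1,234,567 ops/sec ±0.52% (90 runs sampled)"
    console.log(String(event.target))
  })
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'))
  })
  .run()
```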
So from the perspective of the user of this function, all of these variants behave exactly identically, at least in my use case, where I have at most three parameters. And that's pretty much the end of the story. I switched out the callback handler, reran the benchmarks, and I was back to where I was before I introduced the callback handler. So in the end I had my flexible callback handler and I didn't lose any performance. For me it was quite eye-opening that this one extra function call would cost me about 10% in throughput in a database benchmark, and that just making some slight changes to the code, using a different method of handling variable-length arguments, would fix that drastic performance effect. So I found that quite enlightening.

Yeah, that's pretty much it. Just briefly about me: you can find me on Twitter and GitHub. I have the benchmark and the optimization code up as a gist as well, if you're interested in trying it out yourself. The slide deck I'm going to put up on Speaker Deck. And as I said, I published more or less the same results previously in a blog post on our company blog. There's also the Bluebird Optimization Killers wiki, which has a lot more information, not just about arguments but about other code patterns as well that cause V8 to not optimize a function. So if you really are doing performance measurements on some performance-critical code, it's worth a look to see whether some of the code patterns you're using are preventing the V8 engine from optimizing your code. But, as always: benchmark first, do your profiling, and figure out whether the code you're looking at is actually performance-critical or not. Don't waste your time going through every function and trying to figure out whether it's optimized; first find the specific functions that are on the hot path, the ones that will really make a difference to your code. Yeah, that's it.

Regarding the question from the audience about why one variant is optimized and the other is not, iterating over arguments with index assignment versus iterating using push: the suggestion was that maybe, with the indexed loop, V8 recognizes exactly what the code is doing and knows a better way to execute it, so it optimizes it, whereas with push it's calling a method and can't really improve anything, so it leaves it alone. Yeah, it might be. My suspicion is that it's because I'm passing the values from the arguments object into another function at that point. V8 says: okay, you're passing this value around somewhere else, I don't know what this other function is going to do with it, it might modify it. My suspicion is that that's what causes V8 to not optimize the function. And to the other point from the audience, that optimization doesn't necessarily mean it's going to be faster: yes, correct. And I'm not sure that's really what V8 is thinking at that point with push; I think it could still optimize it, because it's not about optimizing the JavaScript code itself, it's about optimizing the way the engine executes the function, spending more time tuning V8 specifically to this function and making sure it runs in an optimized way. Maybe.
Maybe it's that JavaScript gives you a way to write the operation out in terms of the individual steps, and when you do that, V8 can see there's a better way to execute it and make it really fast, whereas with push it can't do anything else with the code. Yeah, maybe. I mean, ideally it would just tell me: hey, don't use arguments, there's a much better way to do this. But yeah, it's possible. I'm not entirely sure why this difference between index assignment and push causes V8 to not optimize the function. For the other methods, after reading through what the Bluebird wiki says, it became fairly clear to me why V8 is or isn't able to optimize them, at least based on its heuristics. To really understand why it can't optimize some of the things that happen with the arguments object, I think you would have to understand in more detail how V8 actually implements that arguments object. But based on the heuristics, I can understand why most of these functions are or aren't optimized; with the push variant, though, I'm not entirely sure. So, thank you very much for this talk.