Can everybody hear me? Yes, no? Thumbs up? All right. Hey, everybody. Cool, that was awesome. Good morning. I want to thank the GORUCO organizers. I'm a NYC.rb OG — my first meetup was probably in 2005, which is a long time ago now — and this is my first time doing a full-length presentation at GORUCO. Last year I did a short one. But last year I also dropped my iPhone off the boat, so I think Francis and Josh felt bad and that's why I'm here today. Or maybe it's because of something else. Anyway, that's not what I'm here to talk about. Today I'm here to talk about performance — Ruby performance — and we'll get into that in a second. The reason I'm talking about Ruby performance is that I spend a lot of time thinking about and doing this. I work at a company called Paperless Post. These are some awesome things that some of our illustrators made. Paperless Post is a growing platform for delivering meaningful and important messages, online and offline. Most importantly, we have a great team of really talented designers, developers, and — most importantly — eaters. Really good eaters at Paperless Post, so you should come join us. On to today's agenda. A lot of times I talk about very philosophical things — I've given a talk about making bacon and how that relates to food and programming — but today is gonna be really practical. I've been working really hard on some really interesting things, and I wanted to share that practical work with all of you. And having said that, let's start with some philosophy first. The reason I wanna talk today is because performance matters. You hear this over and over again, and it gets drilled into your head, but not everybody talks about why, or what that actually means.
When people say performance matters, a lot of the time they're talking about user-experience performance — time to first byte, things like that. How fast pages load and how fast they're experienced, especially on mobile and different platforms, really impacts how a user experiences your site. But from my perspective — running this team, working with our ops team, running a scrappy startup, whether you're funded or not — performance is also about dollars and cents. The faster your app is, the fewer servers you need to run it. That's a real, one-to-one thing. Now, these two items on the screen, Ruby and performance, don't often get talked about in a positive context. They don't often go together. But with this crowd, I don't think I have to defend myself — if I'm talking to a group of people at a Ruby conference, I shouldn't have to defend that Ruby is worth doing. When we talk about performance and applications in this talk, I'm mainly talking about web applications. Whether you run an iPhone app or something like that, today I'm really talking about web apps. And this is what a typical web app request looks like, right? You have Guy Fieri, who's your user, and he spends some time connecting to your server. Then there's some amount of time — 10 to 30,000 milliseconds — building the page in Ruby. Then that gets downloaded, and there's the time of downloading. And once that's downloaded, your CSS downloads, your JavaScript downloads, and then your images. That dotted line is when the user can see something. Within all of this, there are certain things that I, as a developer and an operator, can have a good impact on, and some things I have pretty much no impact on.
I can have some impact on connecting, and on how fast the HTML, CSS, and images download — I can make those things smaller, or more importantly, I can put a CDN in front of them and make them geographically closer to the user. But what we're really talking about here is how much time is spent generating the page — doing the work we need to do for the user — on our server. Even though we can improve the images and the JavaScript and things like that, one, that's not really the purpose of this talk, but also there's a lot more variance, and a lot more time to win back, in improving our Ruby application and how fast it can run. And that's really where the dollars and cents come in. I can pay a lot of money for a CDN, put servers nearer to you, and spend a lot of time on JavaScript compression, but at the end of the day, if your page takes 10,000 milliseconds — 10 seconds — to generate, that's gonna be the majority of the time the user spends waiting for your page. And if we dive deeper into this Ruby server land, what we're really talking about is not Ruby the language. Ruby the language can be slow at times, and there are things that can be done to improve it, and those things have been happening over time. But really, it's your fault. I'm sorry — it's your fault your app is slow. It's not Ruby's fault. Most of the time spent generating that page is in your application code. Maybe a bunch of it is in the database and your cache, and maybe a bunch is in Rails itself, but a lot of it is in your application. We have a lot of room to work with, is what I'm trying to say. So where do we start? In the past, I think Ruby has kind of suffered in this respect. Basically, a lot of Ruby core's time — and Rails's too, I would say — was spent building features for developers, not for operators.
What I mean by that — not in the DevOps sense — is that a lot of us write Ruby applications, but a lot of us also run large Ruby applications and have to worry about how fast those things run. A lot of the features that have come out over the years in Rails and Ruby are there to improve the speed at which we can develop new things, but not necessarily the speed of the final application. That's not the greatest thing if you're someone who spends his entire day thinking about how to run these applications. But there's light at the end of the tunnel. Last year a lot of this changed, and a number of people who are major operators running these big applications — specifically a bunch of people at GitHub and elsewhere — were given commit access to Ruby. The one I wanna give the most praise to today is my man Aman, @tmm1 — some of you know him. No, that's not Laurence Fishburne from The Matrix. That's Aman. They started building these tools, and we've started working as a group — a Ruby performance group — to figure out how to improve the introspectability of Ruby. So I wanna propose that this is, if not all, then some of the future of Ruby core: this focus on introspectability. Introspectability is not a word, but it's a word today. I'm very happy to say that there are finally APIs in the Ruby language to build the tools for the type of introspection that we really want as operators of large Ruby applications, which is really exciting. So with that: here's to the future, because we got through the past. I finally found a profiler that can make my Ruby fast. Sorry. Anyway. What I'm gonna talk about today — that was the intro — is the future of Ruby performance tooling, and it's multitudinous. Also probably not a word, but we'll go with it today.
And it doesn't matter what version of Ruby you're running, though I'm gonna talk a little more about Ruby 2.1. There are tools to track how fast your application is, and if you're not using these tools, you're basically blind — you can't know how fast your application is running. A reminder — this is my biggest caveat, and I've given an entire talk about this: there is no one tool. A lot of people are fixated on the idea that they can find a silver bullet that will find, fix, and debug their applications, and I am here to tell you that that does not exist and never will. There is no best tool, either. The point of being a good Ruby operator is that you use a lot of different tools in different situations to make your code fast. So, welcome to Bryce 3D Lava Landscape. I was trying to think of how to talk about and compare these different tools, and what I came up with is a kind of D&D scoring system for them. I'm calling these the Ruby performance character profiles. There are a bunch of categories. First of all, I love the Keynote flame effect, so it's gonna be heavily overused in this presentation — just get used to it. First I'm gonna talk about specificity: how detailed the output of the profiler is. Can you zero in on specific users, specific runs of a piece of code? Then I'm gonna talk about impact, and these two are pretty linearly tied most of the time. Impact is the level of probe effect — and if you don't know what the probe effect is, look up the Wikipedia article. I'm not gonna go into much detail about it, but what I will say is that the probe effect means the profiling you put in can itself have an effect — linear, logarithmic, some kind of impact — on how fast the code you're trying to profile actually runs. That's a bad thing, and it happens a lot in Ruby land.
Finally, there's difficulty of operator use, meaning how hard it is to set up and use; readability, how easy it is to read and understand the output; real-timeliness — another made-up word — which is how close to now the results are; and, last, special abilities. I'm gonna start with something a lot of you know: ActiveSupport::Notifications and friends. I'm calling these the elves of the Ruby performance landscape. ActiveSupport::Notifications is built into Rails 3 and can be used elsewhere, and basically it's not much more than Benchmark.realtime separated into two different places. In one place you publish an event, and in another place — don't bother reading this code, it's really just pseudocode — you collect that event and its payload, iterate over the payload, and send those values to StatsD, which is a collection server for these metrics. The cool thing is that when everything works and you put it together, you get really awesome things: graphs, and really nice logs with a lot of useful information in them. This is a real graph from our production web app using this kind of system. The idea is that you're really just doing one-to-one timing of something in one place, collecting the information somewhere else, and then pushing it to a further collection service. This is not something new, so you're gonna be like, yeah, duh. But I wanted to give the example because I think it's a good benchmark for how the other things compare. So let's give ActiveSupport::Notifications some rankings. For specificity and impact, I give it a five to ten, and this is important because, like I was saying before about impact and the probe effect, your impact is linearly tied to how specific you wanna get.
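In plain Ruby, the pattern ActiveSupport::Notifications implements — Benchmark.realtime in one place, a subscriber somewhere else — can be sketched like this. This is a toy illustration, not the real AS::Notifications implementation; `ToyNotifications` and its methods are made up for the example.

```ruby
require "benchmark"

# A toy notifier illustrating the ActiveSupport::Notifications pattern:
# timing happens at the call site, while the subscriber lives somewhere else.
class ToyNotifications
  def initialize
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  # Somewhere else in the app: register interest in an event name.
  def subscribe(name, &block)
    @subscribers[name] << block
  end

  # At the call site: time the block, then publish the event.
  def instrument(name, payload = {})
    result = nil
    duration = Benchmark.realtime { result = yield }
    @subscribers[name].each { |sub| sub.call(name, duration, payload) }
    result
  end
end

notifier = ToyNotifications.new
timings = []

# The subscriber: in a real app this is where you'd push to StatsD or a log.
notifier.subscribe("render") do |_name, duration, payload|
  timings << [payload[:template], duration]
end

# The publish side: wrap the work you want timed.
notifier.instrument("render", template: "home/index") { sleep 0.01 }
```

The important operational point from the talk applies to the subscriber block: whatever it does runs inside the request, so a slow subscriber slows the page.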
If you wanna know every single action, and record every action and all the params and all the different elements of that action, your impact is gonna go up. And the problem — what I wanna point out — is that people don't really talk about this that often. If you're doing a lot of work in that ActiveSupport::Notifications subscriber, that is gonna affect how fast the page is. You might not think it does, because you might see, oh, that response only took 300 milliseconds — but your Unicorn, or whatever Ruby server you're running, is paused while it's sending that stuff to StatsD. So if that's not fast, it's actually having a big impact on your production server. Ease of operator use: I'm giving it a seven. It's pretty easy to set up if you're running a Rails app, and even if you're not, you could probably use this or something like it. Readability really depends on what you do with the data. If you're putting it into graphs, that's great. If you're not, it's gonna be pretty useless. And for real-timeliness, it's very real-time. The special abilities: graphs. What it's really good for is your P90s, your 90th percentiles. It's really good to know, across your app or in different controllers or different actions, what the overall response time or your 90th-percentile response time is. It's also good for knowing which layers of your application are slow. Is most of my time in my database layer? Is most of my time in my view layer? What it's really not good for is giving you details about anything. You're not gonna get any useful details beyond, oh, I know this action is slow, or my database is slow. You're gonna need something else to dive deeper and find out why it's slow. It's also very, very, very easy to misinterpret this data, and I've seen it happen millions of times — literally. Actually, not literally, but a lot of times. Anyway, next — this is getting into some newer stuff.
So, rblineprof. A lot of you might not have seen rblineprof — a lot of these tools are still a little under the radar, but they're really interesting, and that's one of the reasons I wanted to talk about them today. I call rblineprof the warrior. This is a tool that Aman wrote, and it works really, really well. The way it works is basically: you have some block in your application, you wrap this lineprof block around whatever you're doing, it collects a dump of output, and then eventually you use some kind of output formatter to render that to your screen. What it is is a first-class line profiler. Basically every other mature programming language has had line profilers for a very long time, and Ruby has had some that worked to varying degrees, but this one works really, really well because of some new features in Ruby 2.0. What that means is — you might not be able to see the actual text here, but this is a staging server on our site. I click the little Large Professor at the bottom of the screen — you definitely can't make out that it's Large Professor, but it is — and that reloads the page, the same page I just loaded, except with lineprof running. Once that finishes — you definitely can't read the actual code, but that's not the point — the point is it recorded how long each line in my action took to run. Now, this is kind of a mind-blowing concept. We're used to the idea that, oh, my database or this query took a little time. This is showing individual lines of my application — that's the highlighted source code on the right, with timings on the left. And this is a really, really powerful tool, for obvious reasons.
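The idea behind a line profiler can be sketched in pure Ruby with TracePoint: attribute the time between consecutive line events to the previous line. This is a toy, not rblineprof — rblineprof does the same bookkeeping in C with much less overhead — and `toy_lineprof` is a made-up name for the sketch.

```ruby
# A toy line profiler built on TracePoint :line events. Time elapsed between
# consecutive events is charged to the line that was executing.
def toy_lineprof(file)
  timings   = Hash.new(0.0)
  prev_line = nil
  prev_time = nil
  tp = TracePoint.new(:line) do |ev|
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    timings[prev_line] += now - prev_time if prev_line
    prev_line = ev.path == file ? ev.lineno : nil
    prev_time = now
  end
  tp.enable
  yield
  tp.disable
  timings # { line_number => seconds }
end

SLOW_LINE = __LINE__ + 3 # line number of the sleep below
timings = toy_lineprof(__FILE__) do
  1_000.times { |i| i * 2 } # cheap line, run many times
  sleep 0.05                # expensive line
end
slowest_line, slowest_time = timings.max_by { |_, t| t }
```

Note how heavy the probe effect is here: every executed line pays for a Ruby block call, which is exactly why numbers from a line profiler are relative, not real.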
I don't have to do much thinking to discover that, oh hey, this block that I didn't even realize was running on that page is actually iterating over this block of SQL queries like 5,000 times — no wonder my page is slow. It's really easy to find those types of things, or to see that it's loading this one really slow template on every request. So, rblineprof: very cool. For specificity, I'd give it a nine. It's very specific — it's showing you the individual line of code that was slow. Impact, though, is also a nine, and this is really important: this is not something you can run in production. That was on our staging server, which is fine. The timings it gives you on the left side have to be considered relative timings, because the act of profiling your code makes those timings take longer. That's something to remember, because you're gonna see, oh, this took 150 milliseconds when I ran rblineprof against it in staging — so why is it taking some other arbitrary amount in production? It's because these aren't real numbers. It's about relativity: this line took longer than that line, and comparing them. Once it's set up, it's very easy to operate, and the output is very readable if you format it correctly. It's not that real-time, because you're really only running this in staging, and it also takes a pretty long time, as you saw, to get results. It's amazing for visibility into dev hotspots. If I have an action and I'm like, why is this action slow? — if I run rblineprof against it, I'll have a pretty good idea of which part of the action is slow. It's really, really good for diving deep into slow actions. So if I use ActiveSupport::Notifications to identify which actions are slow, I can use rblineprof to introspect what about each action is slow. But it's not good for discovering systematic issues — systemic issues.
I can't know whether the action I'm profiling has any effect on anything else on the site. It's really localized. And as I said, it's not good for real-world numbers. Oh — we've obtained a special weapon. Okay, awesome. So, something we worked on for a little while: we saw rblineprof was great, and we had these other tools for measuring and collecting data. What we didn't have was a way to repeat these tests. So we created PP Profiler. PP Profiler is a way to take a single piece of code and run it over and over again with profiling. When you run it, you get this output, and the output is in Markdown format, so we can just put it into a pull request. What it does is collect benchmarks for a hundred runs with the cache on and a hundred runs with the cache off — because a lot of the time we're optimizing cache-off performance. With the cache on, if nothing runs, obviously it's gonna be fast. We wanna optimize cache-off — how fast is this with the cache off? — because filling the cache might be slow. And then it gives you the lineprof output at the bottom. This is really, really useful, and we've used it to dive deep into our application code any time we find something slow. It's a plus two for operator use and a plus two for readability. It really is just a wrapper around these other tools, but that simple wrapper adds a bunch of special abilities. It's really good at tracking local improvements: I make a change, run the profiler, make another change, run the profiler, commit. I can post these results and play a kind of golf, golfing these things down one stroke at a time. It's also really good for sharing improvements to these pieces of code with the rest of your team. It's not good for production — you cannot run this in production. This is all about local development. So, finally, I'm getting to what I consider the most exciting part of the talk.
So, StackProf. I call StackProf the mage, because it is magical, honestly. StackProf is a sampling profiler, and sampling profiling is not a new concept at all — it's been around for a really long time, and Google has published many papers on its use — but it's new to the Ruby community. I think it's important to understand how it's different from other types of profiling, because it's very useful for certain things we wanna do, and we've never had tools like this before. The way sampling profiling works: you have your Ruby process — in this case maybe a Unicorn — with time going left to right, and here is your stack. At the top of the stack you might have ActionController's dispatch, which is running your action; that calls deeper down the stack into your controller action; that might call a template renderer — obviously these are made-up methods — and that calls your ActiveRecord find. If you think about how your Ruby code executes, it's a call stack: it goes one level, calls deeper into the stack, then deeper, and eventually those results bubble back up — that's the lifetime of your request cycle. The way StackProf works is it basically injects sampling into this stack. There's a new API in Ruby 2.1 called rb_profile_frames, and it is a zero-memory-overhead, zero-CPU-overhead way of collecting what is on your stack at a point in time. What Aman did was use that API: you start a timer whenever you want, every interval it collects these frames, and at the end it coalesces them and tells you which things it saw most often on the stack.
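The sampling mechanism can be sketched in pure Ruby: a background thread periodically grabs the target thread's backtrace and tallies which frame was on top. This is only a toy illustration of the concept — StackProf samples in C via rb_profile_frames with effectively no overhead, while this sketch pays full Ruby-thread costs — and `toy_sample` is a made-up name.

```ruby
# A toy sampling profiler: tally the innermost frame of the target thread
# at regular intervals, then report the most-seen frames first.
def toy_sample(interval: 0.001)
  target  = Thread.current
  counts  = Hash.new(0)
  running = true
  sampler = Thread.new do
    while running
      bt = target.backtrace
      counts[bt.first] += 1 if bt && bt.first # frame on top of the stack
      sleep interval
    end
  end
  yield
  running = false
  sampler.join
  counts.sort_by { |_, c| -c } # hot spots first
end

# Profile roughly half a second of busy work.
top_frames = toy_sample do
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.5
  Math.sqrt(rand) while Process.clock_gettime(Process::CLOCK_MONOTONIC) < finish
end
```

The output is exactly the shape of report described next: not "this call took N milliseconds," but "this frame was on the stack in X% of samples."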
Now, this is really different from what you're used to. I know this might not be readable, but the point is that it's not showing you "I executed this, it took 100 milliseconds; then I executed this, it took 50 milliseconds." What it's showing you is: I've taken samples from your process, and every time I took a sample, ActiveRecord's scoped methods were on the stack, or at the top of the stack. So this report is saying that 5.9% of the time, ActiveRecord's scoped methods were on our stack — and this is an actual dump from production. It might not show you the details of what's going on, but it shows you the hot spots: the things happening most often in your code. Brendan Gregg created this really cool thing called a flame graph, and a flame graph is basically the same data, just a different way of visualizing it. Brendan Gregg is a really smart person — these ideas have been in use at Google for a long time — and this is what a flame graph looks like. It's wild, right? But it's the exact same data I showed you in that dump, only visualized in a way we can maybe understand a little better. This is from one of our production servers — I just ran this and got this output. It's showing me that in this action, a lot of time is spent in ActionPack — surprise — and a lot of time in ActiveSupport, but there are other things too: we're actually spending a bunch of time in views and a bunch of time in helpers, and I can zero in on which methods were called. This is kind of the reverse of the other visualization: the top here is the bottom of the stack and the bottom is the top of the stack, so the top will be ActionPack and the bottom will be my finders and the Postgres methods. So, StackProf is really cool.
It's not very specific, but the most important thing is that it has very, very little impact. If you're running 2.1 in production, this is an amazing thing, because now we can actually collect and see data from real production servers without having to worry about whether it's hosing production performance. It's not very easy to operate, but it can be readable if you use flame graphs and other tools, and it's very, very real-time. Flame graphs are the special ability. You're not gonna optimize a single controller with this, but it's very good at finding systemic issues in production. One thing we used it for: we found out that StatsD actually had a lot of overhead in connecting to UDP servers — this is something that's being talked about a bunch now — but basically we saw StatsD at the top of our graphs, and all these StatsD calls were happening outside of our ActionController request handling, so we would never have seen them in our graphs, but they were happening and taking up a bunch of time on our servers. It's not good for detailed info about what's slow in your code, and that should be obvious: you're not gonna find actual details about what's slow, but you might find that different parts of your app are slower or faster. Oh, wait — another special weapon.
Recently I've been working a lot in Go, and Go is a really interesting language. Go pulls a lot of its tooling and ideas from Google, and Google has had this stack-profiling stuff for a while — Go has had it almost since day one. There's this really, really cool part of Go where you just include this middleware in your Go HTTP app, then from your command line you call `go tool pprof` and point it at the server, and ten seconds later you have an interactive command-line tool to navigate the information from your server. When I first saw that, my mind was blown, and when I connected the dots with StackProf I was like, wait, we can probably do something similar. So that's what stackprof-remote is. Here I'm just including this middleware in my app, and here's a run of the thing. All it does is: I run stackprof-remote — and again, this is against a real production server — I point it at one of our production web apps, and the output really isn't important. The important thing is that I'm running this locally, it's hitting a remote server and telling it to start the profiler, and because this is production, a bunch of real users are making requests. I don't have to fake requests, and unless I explicitly want to know, I don't actually know which requests are running — I just know this is what a typical server is doing over a 30-second period, right? So it starts the profile, waits 30 seconds, then stops the profile, downloads a dump from production, and puts me in an interactive session where I can navigate the dump. 30 seconds is a long time — this is a video, so I know this is gonna work. So yeah, it downloaded seven megabytes of data from production, and it shows me — now I'm in this interactive pry session — the top five things my server was doing.
Here are the top 10 things my server was doing — I'm doing my fake typing here. Here's total time, top five: total means how many times something appeared anywhere in a sample, not just at the top of the stack. And finally, I can introspect a specific method and see this method, ActiveSupport's convert_key, which is actually in HashWithIndifferentAccess. Guess what — that shows up a lot on our production server. So this is really, really interesting, and we just started working on it. I'm pretty excited about it. It adds ease of operator use, it adds readability, and it adds real-timeliness. Also — I didn't mention this — it collects from not just a single Unicorn but all the Unicorns running on that machine, and it uses rbtrace, which is a magical, magical tool that I could literally talk about for two hours. Just know that it's magic. It's good for easily inspecting a production server, and that's what we're using it for: introspecting production. It's not good for development, but that shouldn't matter — you have a lot of tools for development. This is your production tool. Finally, I promised I was gonna talk about generational GC and all that, but honestly, I couldn't even fit that into an hour-and-a-half-long talk. So all I wanna do is sprinkle a little bit on top and talk about ObjectSpace. ObjectSpace has been around for a long time, but new tools were added to it in Ruby 2.1, and one of them is this magical thing called dump_all. What a great name for a method. ObjectSpace.dump_all does exactly what it says.
In the spirit of the principle of least surprise, ObjectSpace.dump_all dumps your entire object space. When you run it, it pauses your application and writes every single object in your memory — on your Ruby heap — into JSON, and stores it in a file, and it looks like this. Mike Bernstein, who worked at Pivotal for a long time — he and I worked on trying to solve this problem for many years. It took him probably two weeks, and he just did it. What this means is you can actually see every single object in your heap. That is magic, and that is wonderful. But this is 87 megabytes of JSON. It's a lot of data, obviously, and this is just from one of our applications — if your application is bigger, the dump is gonna be even bigger. It's relative to the size of your heap. So what do we do with this data? That's the question. In terms of specificity and impact: it is very specific — it's every single object in your heap — but it also has a big impact. It pauses everything while it's running, which is kind of dangerous, and it can take a minute. It's not very easy to operate, it's not very readable, and because it takes so long, it's obviously not very real-time. But it is every object in your heap, and I don't think everyone understands: this is the thing we've always been waiting for. It's like — hey, our app has a memory leak; what is in Ruby's memory? And for as long as Ruby has existed, our answer has pretty much been a shrug. Oh, we can use ObjectSpace.count_objects — there are a lot of strings in our application. That was our best answer. Other languages, like Java, have had this forever, and it's pretty amazing that we finally have our hands on it. It could be good for diagnosing memory issues, and it's definitely not good for — well, it's hard to run, and it's hard to fetch the dumps from your systems right now.
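A minimal sketch of what this looks like: the real ObjectSpace.dump_all API (Ruby 2.1+) emits one JSON document per heap object. Here I use `output: :string` to keep the dump in memory for the example; against a real app you'd point it at a file, since real dumps easily run to tens of megabytes.

```ruby
require "objspace"
require "json"

# Dump every live object on the Ruby heap, one JSON document per line.
dump = ObjectSpace.dump_all(output: :string)

# A first, crude answer to "what is in memory?": tally heap slots by type.
counts = Hash.new(0)
dump.each_line do |line|
  begin
    counts[JSON.parse(line)["type"]] += 1
  rescue JSON::ParserError
    # skip the rare line that doesn't parse cleanly
  end
end
top_types = counts.sort_by { |_, c| -c }.first(5)
```

Each line carries the object's address, type, size, and references, which is exactly the raw material the analysis tooling discussed next is built on.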
Though that should hopefully improve. I wish I could tell you that I've used this tool and found the memory leak we've been hunting in one of our workers for like four years — that I did some magic, compared two dumps, and saw that, oh, this object appears 20 more times than that one, so it must be our leak. Huzzah! But I haven't done that yet. It's possible, though, and that's something. In order to think about how to do that, I created this tool. It's really simple, and it's definitely a work in progress — it visualizes what's in the object space. To do that, you take the JSON heap dump, and right now I'm importing it into Postgres, a relational database. I thought that was an interesting idea, because in that ObjectSpace dump you have not only the objects, but references to all the other objects by their addresses on the heap. Basically, you're talking about a graph database, or a relational one: this object has relations to these objects, and maybe I can see which objects have a lot of references, which objects have very few, or what's holding on to a lot of memory. Unfortunately, 87 megabytes of JSON takes a long time to import into a Postgres database. Obviously this is local, so with a faster server it might be faster, but with that 87-megabyte dump, it took about 13 minutes to import into my local database. Once I did, though, I have this little tool — it's like the simplest Sinatra app ever. All it's really doing is showing me all those objects in a visual way. But it's pretty cool. I can see that there are a lot of arrays — surprise — and those arrays have a lot of elements in them. And actually, in exploring this — oh, whoops, I didn't mean to cut that off.
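The relational idea — who points at whom — can also be sketched without a database, by inverting the `references` arrays in the dump and counting inbound edges per address. This is a stdlib-only sketch of the same question the Postgres import answers, not the actual tool from the talk.

```ruby
require "objspace"
require "json"

dump = ObjectSpace.dump_all(output: :string)

# Invert the reference graph: count how many objects point at each address.
inbound = Hash.new(0)
dump.each_line do |line|
  obj = begin
    JSON.parse(line)
  rescue JSON::ParserError
    next # skip the rare line that doesn't parse cleanly
  end
  Array(obj["references"]).each { |addr| inbound[addr] += 1 }
end

# The most-pointed-at addresses: candidates for "what is everything holding?"
most_referenced = inbound.sort_by { |_, c| -c }.first(10)
```

From there you can look each hot address back up in the dump to see its type and size, which is the same navigation the Sinatra tool does visually.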
But in exploring this, I found out that the routes in our app take up like three or four megabytes of data on the heap and are replicated in like 14 different places. That's definitely something I'm gonna explore. Just navigating this, I literally spent an hour and a half going, oh, what's that data object? What's that one? It's this really cool experience of just looking into your Ruby process and seeing what's in there. Like I said, this is a total, total work in progress. The data is there; we just need better tools for analysis. I'm hoping all of you will help me figure this out. We have this data now, and we should be able to visualize it, or compare dumps over time, in a way that lets us see, oh, these objects are sticking around between dumps — and also really think about which objects in Ruby take up more memory than others. There's some other really cool stuff in ObjectSpace that I don't have time to talk about, but you can also track allocations per file. That obviously also has a large impact on how fast things go: you can see that this file allocated thousands of objects, and this actual line in a file allocated thousands of objects while that line only allocated 10. It's a really interesting way of visualizing and thinking about Ruby memory, which is the other part of the picture from StackProf — that's all about CPU; this is about memory. Sam Saffron wrote this thing called memory_profiler, which does this allocation tracking. It's really interesting — also a work in progress — but you should definitely check it out if you're interested. So, besides this awesome Bryce 3D melting-mountain landscape, what do I want you to take away from this? Basically: there are a lot better tools than there used to be.
There are many tools, beyond even the ones I mentioned, for doing this — and not just tools layered on top of Ruby, but APIs and things in the Ruby language itself that allow us to build these tools. Additionally, the world outside of Ruby tooling is really vast. Many other programming languages have better tools than us, and I want you to go out there and steal them, please. Take those ideas and bring them back. That's what StackProf is; that's really what the ObjectSpace dump_all thing is. There's been a lot of research and thought about how to profile production applications, and all we need to do is figure out how to do that in Ruby. Actually, if there's one thing I want you to take away, it's this, and it's something people don't often think about. When we talk about the probe effect and impact — like I said, there's no best tool, but there are good tools for certain applications. The best way to divide them: use the low-impact, low-specificity tools in production to profile your code, and the high-impact, high-specificity tools to measure the relative improvements of iterative changes in development and staging. That's the most important thing. High impact in development, low impact in production — the combination of those two is magical. We can get better at tooling; you just have to help me, basically. That's the message. Thank you very much. I'm AQ — that's my GitHub. Here's a link to a bunch of the projects I mentioned, but definitely hit me up afterwards. Now that I'm done, I'm gonna get drunk. Woo! Please find me on the boat. We'll hang out. Thanks.