there's an outbound connection to an upstream HTTP server. So that's pretty much lsof. Any questions about that? Cool. So the next tool is strace. How many of you have used strace? Okay, not as many. strace, again, is Linux only, it doesn't work on OS X, but similar to lsof it can connect to a running process. And the cool thing about these tools is they're not Ruby-specific. You can run them on any process and get a bunch of information about that process.

So there are two ways to run strace. The first is summary mode, with the -c option. The way this works is you attach to your process, it runs for a while, you hit Ctrl-C, and you end up with a report that gives you a summary of what happened. So here you'll notice the top two system calls. strace traces system calls and signals. System calls are essentially functions that are defined in the kernel. You have a bunch of functions defined in your C code and your Ruby code, and any time one of those calls into the kernel, or jumps into kernel space, that's known as a syscall. The two syscalls that show up at the top here are read and write. This makes a lot of sense, especially in a Rails application: you're reading in HTTP requests, writing out SQL queries, reading the results back, doing a lot of reads and writes. So most of the kernel time here is spent doing that.

The next way to run it is in tracing mode. Tracing mode shows you system calls in real time, as they're happening inside the process. When you run this, it just starts spitting out stuff, and it goes by pretty fast, so usually I'll use the -o option to write it all to a file. If you look at this output in a bit more detail, in a Rails app, again, you're likely to notice a lot of reads and writes. Here's an actual read call in more detail. You can see the first argument to read is the file descriptor it's reading from. File descriptor 22 in this case is an inbound HTTP connection; again, if you looked at lsof, you would see the IP and port associated with that file descriptor. You can see the HTTP request that's getting read (here it's just a GET), and there's more information: the number of bytes that were read and how long the call took. Similarly, on a Rails app you'll see something like this: here we write the SQL query to a file descriptor, and right after that we read the results back. And here you'll notice the read call takes about 1.3 seconds. This is indicative of a slow query, and this is something I'll often do: collect a bunch of strace data, then run a Ruby script over it to look for slow system calls.

If you run strace on Ruby itself, you might notice a few things. One of the first things I noticed when I ran strace on a production Ruby instance was something like this: a lot of SIGVTALRM signals coming in every now and then. So I did a little digging into this, and what I learned was that Ruby 1.8 uses signals to schedule its green threads. All the threads you define in Ruby land are green threads, and the way the VM knows to switch between them is that it tells the kernel: hey, every 10 milliseconds, send me this signal. That way, every time it gets the signal, it knows it needs to switch threads. The other thing you might notice in summary mode is a lot of calls to sigprocmask. This is actually due to a Ruby bug that was recently fixed on the 1.8 branch.
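For reference, the two strace modes from these slides boil down to something like this (the pid is a placeholder):

```
# summary mode: attach, let it run, hit Ctrl-C to get the syscall report
strace -c -p 12345

# tracing mode: print each syscall live; -tt adds timestamps, -T shows
# how long each call took, -o writes the output to a file
strace -tt -T -o /tmp/strace.out -p 12345
```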
So here you'll notice three and a half million calls: 100% of the kernel time is being spent in this one syscall. What's going on here is that a lot of Linux distributions build Ruby with the --enable-pthread option, and that ends up enabling an obscure code path that calls sigprocmask every time Ruby switches threads, rescues an exception, a bunch of stuff like that. And this ends up adding about 30% overhead. It makes your Ruby binary 30% slower, which is a pretty big impact. So again, you can run strace on your production app right now, collect 10 or 20 seconds worth of data, hit Ctrl-C, and if you see it spending a lot of time in sigprocmask, that tells you right away that you should probably spend some time recompiling your Ruby. Use something like RVM to compile your own instead of the stock Debian, Red Hat, or Ubuntu Ruby. [Audience: So this has been fixed in the latest builds of Ruby?] That is correct, yeah. And it's also been fixed in the very latest Debian and Ubuntu packages.

So the next tool is tcpdump. We've talked about general tracing tools, but since we write a lot of network-aware applications, a lot of the time we run into weird network issues, and tcpdump is really useful for that. What tcpdump does is dump, in real time, the traffic that's going over the network. There are a bunch of options (the man page is really helpful), but what it boils down to is an expression that you pass in to tcpdump. Here's an example of a very simple expression: we're telling it to capture all TCP data going to destination port 80, so anything connecting to the web server, capture all that data. And again, if you ran this on your system, you would right away start seeing all the incoming and outgoing HTTP requests. Here we can see an incoming request for an image, and you can see all the headers and all the data associated with that HTTP request. Similarly, you can do this for MySQL; here are all the SQL queries going over the network, for instance.

The best way to use tcpdump, though, is with the -w option. Instead of printing the packets to stdout, this writes all the packets to a file. This is really useful because you can run it on your production server, collect a bunch of data over a couple of hours, then download that file locally and open it up in a tool called Wireshark, which is a really nice GUI for filtering, interpreting, and viewing this sort of data. So here again is an HTTP request, and you can see it's a much nicer interface that you can dig into to see all the details of that request. The other nice thing is that Wireshark understands a lot of common protocols, like HTTP and MySQL, so it'll parse out the data and show it to you in a way that makes a lot more sense. So that's tcpdump, any questions about that? All right, let's move on.

So the next tool is perftools, probably one of my favorite tools. It's actually a set of tools that was developed at Google, which they use internally for performance work. The one I'm going to talk about in particular is the CPU profiler. Again, here are some instructions for later; basically you download and compile.
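For reference, the tcpdump invocations behind those slides look roughly like this (the interface name and file paths are examples):

```
# watch traffic to the web server; -A prints payloads as ASCII,
# -s0 captures whole packets instead of truncating them
tcpdump -i eth0 -A -s0 'tcp dst port 80'

# capture MySQL traffic to a file, to open in Wireshark later
tcpdump -i eth0 -s0 -w /tmp/capture.pcap 'tcp port 3306'
```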
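And the download-and-compile step is the usual autotools dance. A sketch (the URL and version number are from memory of that era, so treat them as placeholders):

```
wget http://google-perftools.googlecode.com/files/google-perftools-1.6.tar.gz
tar zxvf google-perftools-1.6.tar.gz
cd google-perftools-1.6
./configure && make && sudo make install
```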
And the way this works is you LD_PRELOAD the library, and then all you have to do is set an environment variable before you start your process. That tells the process to write a profile out to whatever location you specify. Then there's a tool that comes with it called pprof that's able to analyze the profile and give you a bunch of data. Like any profiler out there, it's got the standard text-based output that shows you what's going on. But the really nice thing about perftools is that it has this built-in graphical output that makes it a lot easier to understand your code and where time is being spent. Just by looking at it, the bigger a function shows up, the more time it's taking, the more CPU cycles it's spending.

So, some examples of places I've used this. The cool thing about perftools is that it's a sampling profiler: every so often it takes a sample. What this means is it's fairly low overhead, and you can actually run it in production. I ran this in production on a pretty big Rails app for five or six hours during peak time, collected a bunch of data, and generated a graph. What you'll notice in this graph in particular is that about 12% of production time is being spent in the function re_match_exec. And the cool thing about this call graph layout is that you can follow a function up to the functions that called it. If you follow this one up a few levels, you'll notice that all of these calls are coming from one function, rb_str_sub_bang. As you'd imagine, rb_str_sub_bang is the C implementation of String#sub!. And if you trace where all those calls are coming from in Ruby land, you'll find that all of this is coming from Time.parse. Sure enough, if you open up Time.parse, you'll see code that looks something like this. Since Time.parse supports a variety of time formats, the way it's implemented is that it runs a bunch of regexes until one of them succeeds, and then it figures out what the time string translates to. So in this case the solution for us was to switch from Time.parse, that is, from a Ruby-based time parser to a C-based time parser. Jeremy Evans is actually doing a talk about this later: his gem called home_run moves the date/time libraries into C instead of Ruby.

Another example: I maintain the EventMachine gem, and for a long time there was a known issue where if you used Ruby threads and EventMachine together, especially with epoll, it was really slow. So again, I just ran perftools on this, and sure enough, I noticed a quarter of the time was being spent in memcpy. And again, you can trace the memcpy up and see where those calls are coming from. In this case, they're coming from two functions inside the VM: rb_thread_save_context and rb_thread_restore_context. Once I knew this was threading-related and coming from these two functions, I was able to look at the implementation of those functions, figure out what was going on, and fix the issue. In this case, EventMachine was allocating a bunch of really huge network buffers on the thread stacks, and since threads are copied back and forth every time they're swapped in and out, it was doing a lot of memcpy'ing, and that was causing a lot of overhead. We just moved those buffers onto the heap instead of the stack, and that fixed the issue. So any questions about perftools?
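The whole workflow is driven by environment variables. Roughly (paths are examples):

```
# preload the profiler and tell it where to write samples
LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=/tmp/app.profile ruby app.rb

# render the profile as text, or as a call graph image
pprof --text `which ruby` /tmp/app.profile
pprof --gif  `which ruby` /tmp/app.profile > /tmp/app.gif
```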
So I was using perftools for a bunch of stuff, and what I realized was that in a lot of these graphs, sure, it was cool to see that memcpy or rb_thread_save_context or especially rb_str_sub_bang was hot, but what I really wanted to see was the actual Ruby functions themselves. So I created this project called perftools.rb. It's on my GitHub, and it's essentially a fork of, or a wrapper around, perftools that makes it understand Ruby. Here's how you use it; the README is pretty good. I'm just gonna grab some water real quick.

So if we take this example: there's a really simple Sinatra app that has two actions. One is /sleep, which is time-intensive: it takes a long time, but the kernel is smart enough to say, this process wants to sleep, let me just pause it. The other is /compute, which is CPU-intensive: it also takes a while, but it's spinning the CPU the whole time. We run this application inside perftools and send both of these actions 50 requests each, and what you end up with is something like this. There are actually two modes you can run perftools.rb in. On the right is the regular CPU profile, and as you'd expect, the compute action, which is very CPU-intensive, shows up as the biggest thing there. But you can also run it in real-time mode, which measures wall-clock time. On that graph, the sleep action shows up at the top, because it actually takes longer than the compute action. The compute action is also on that graph, but it's not as expensive in real time as the sleep action.

So here's a bunch of examples of ways I've used this and found it helpful. Early versions of the redis gem used SystemTimer on every single read and write, and running perftools on that, we were able to see that there was a lot of overhead there. Once we knew that was the issue, we switched over to using native socket timeouts on the reads and writes. Another way I like using perftools is just to get a sense of a code base: since it shows you the entire call graph, you can see what functions are being called, how often they're called, and what other functions they call. Here, for instance, you can see RubyGems spends a lot of its time in filesystem access: it's traversing directories, checking whether files exist, opening up gemspecs. Recently I ran perftools on Bundler and noticed that about 25% of the time was being spent in the Gem::Version spaceship operator, just because Bundler does a lot of version comparisons to figure out which versions to install based on your Gemfile. So in this case, once I knew that was the issue, I was able to patch that function itself. I made a bunch of micro-optimizations to it, and even those were enough to give Bundler a 15% overall improvement. And, you know, if you look at this patch, these are the sorts of optimizations that are generally not recommended, because they are micro-optimizations. But the whole point here is that once you profile your application and know it's spending a lot of time in a certain function, it makes sense to spend time improving that function.

And then finally, most recently, I added an object allocation profiler to perftools.rb. Here, instead of showing you CPU overhead or real-time overhead, each sample represents one object that was created.
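Per the perftools.rb README, the modes are toggled with environment variables, something like this (paths are examples):

```
# load perftools.rb into the process and pick an output path
export RUBYOPT="-r`gem which perftools | tail -1`"
export CPUPROFILE=/tmp/profile

ruby app.rb                        # CPU mode (the default)
CPUPROFILE_REALTIME=1 ruby app.rb  # wall-clock mode
CPUPROFILE_OBJECTS=1  ruby app.rb  # object-allocation mode

# render the call graph with the bundled pprof.rb
pprof.rb /tmp/profile --gif > /tmp/profile.gif
```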
So here again, this is on a big Rails app. You'll notice that Time.parse and Date.parse are creating a lot of objects. We noticed earlier that these functions are very CPU-intensive, but it turns out they also create a lot of objects. And since Ruby has a simple, naive mark-and-sweep garbage collector, the more objects you create, the longer you end up spending in GC, and while GC is happening, nothing else can happen. So it makes your application slower. The solution for us in this particular application was to switch to the mysql2 driver. Most of these Time.parse calls were coming from rows we were reading out of MySQL and trying to parse the times out of, and the mysql2 gem knows the exact format MySQL is going to return. So it just uses a few lines of C code to parse the time out, instead of propagating the string up to Ruby land and having Ruby figure out what that time means. So that's perftools.rb, any questions about that?

All right, so perftools is a lot of fun to use; it shows you data in a really easy-to-understand way. The next tool is related: it's a Rack middleware, rack-perftools_profiler, that makes it even easier to use, especially in the context of a Rails or Rack application. The way it works is really simple, it's just a middleware. Here, for instance, we require the gem and insert the middleware at the beginning of our Rails middleware chain. What it does is add a bunch of URL hooks to the application that make it really easy to profile (there's a config sketch coming up in a second). The simplest way to use it is the last command on the screen: on any URL, you can just pass in profile=true and times= some number. Here, for instance, it's going to make 10 requests to the home page, and then instead of returning the contents of the home page to your browser, it returns the actual GIF of the profile. All the screenshots you've been seeing were taken from these GIF outputs, and they'll show up right in your browser. The other way to run it is more manual: you can start the profiler, hit a bunch of URLs, stop it, and then grab the data out. So that's pretty much it for rack-perftools_profiler. Any questions about that? Cool.

So the next tool is ltrace. We talked about strace earlier; ltrace is very similar, but instead of tracing system calls, calls inside the kernel, it traces calls in user land, to functions in shared libraries or inside your binary. Similar to how strace works, you can run it in two different modes. The first is summary mode. This is the same example from earlier, the threading issue in EventMachine, and sure enough, just like perftools, it shows you that a lot of the time is being spent in memcpy. The next way to run it is tracing mode, and the cool thing here, again, is that you can see stuff as it's happening in real time. We talked earlier about how Ruby uses SIGVTALRM to tell itself to switch threads. Here you can actually see the SIGVTALRM come in, and then, sure enough, two memcpys: one to copy the current thread's stack out, another to copy the new thread's stack in. And what you'll notice here is that the last argument to memcpy, which is the number of bytes you want memcpy to copy, was fairly large in this case.
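Before finishing the ltrace story, here's that promised rack-perftools_profiler sketch, based on its README (the app name is a placeholder):

```ruby
# config.ru for a plain Rack app; for Rails, insert the middleware
# into the stack in your application config instead
require 'rack/perftools_profiler'
use ::Rack::PerftoolsProfiler, :default_printer => 'gif'
run MyApp

# then request: http://localhost:9292/some/page?profile=true&times=10
```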
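And for reference, ltrace's two modes look almost identical to strace's (the pid is a placeholder):

```
# summary mode: attach, let it run, Ctrl-C for a table of library calls
ltrace -c -p 12345

# tracing mode: watch particular calls live; -T shows each call's duration
ltrace -T -e memcpy -p 12345
```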
So, back to EventMachine: every time it had to do a thread context switch, it would end up copying a megabyte in and a megabyte out, which was the reason it was so slow. So that's ltrace. One of the problems with stock ltrace is that it's unable to trace calls made from shared libraries that get loaded at runtime. And since all Ruby extensions are shared libraries loaded in at runtime, ltrace can't show you what's going on inside those extensions. So Joe Damato forked ltrace and added support for libdl; it's up on his GitHub. We actually just recently took over maintainership of ltrace, so the next version of ltrace will contain all these patches. These patches add a bunch of options, but basically, you can use the -x option to trace any C function. Here, for instance, we're tracing all calls to garbage_collect, the function defined inside the VM that does garbage collection. And sure enough, in tracing mode, in real time, you can see garbage collection as it's happening. Here you'll see that garbage collection is running every few seconds, and every time it runs, it takes about 200 milliseconds. That's fairly typical for any decently sized Rails application on MRI. And the thing to note here, again, is that while garbage collection is running, nothing else can happen. So if it takes 200 milliseconds and it gets triggered during a request, that's going to add 200 milliseconds to your response time right away.

Like I said, you can run this on any shared library or Ruby extension. For instance, the MySQL client library defines a function called mysql_real_query that's used to make all queries. So on any given process, you can attach using its pid and start tracing all the queries it's making. Here, again, you'll notice a bunch of queries going by. The first argument is the connection pointer, so you'll notice there are two different connections, and on one of those connections a really slow query goes out that ends up taking 1.2 seconds. And you can do this with any other function. For memcached, for instance, I used this to track down weird cache expiration issues, just by collecting a bunch of sets and gets over time and then analyzing them with a Ruby script afterwards. So that's ltrace, any questions about that? Cool.

So the next tool is GDB. GDB is the standard debugger that's part of the GNU suite. In Ruby land it's really useful for tracking down segfaults. I'm sure some of you have at least seen the sort of output at the top, where Ruby suddenly dies and all it tells you is that it segfaulted. To demonstrate this, I have a simple C extension at the bottom. All it does is define a global function called segv that does an invalid memory access and causes a segfault. Then we have a simple piece of Ruby code that requires this library and causes the segfault. Before doing that, it does a couple of things: it calls 4.times, then Dir.chdir, then Hash.new, and then it segfaults.

There are two ways of using GDB. One is to attach to a running process. Again, we find the process, take its pid, and pass that to GDB to connect to the process, and it lands us at a prompt. You type continue, which basically says, keep running this process until something happens. Sooner or later it's going to segfault, you're dropped back to the prompt, and you can start collecting more data. The other way to run it is using a core dump. A core dump is essentially a dump of the contents of memory.
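A quick back-reference for the patched ltrace: with the libdl support, -x traces arbitrary symbols, roughly like this (the pid is a placeholder):

```
# watch MRI's garbage collector fire in real time
ltrace -x garbage_collect -p 12345

# trace every query an app makes through libmysqlclient
ltrace -x mysql_real_query -p 12345
```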
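The attach workflow is just a couple of commands (the pid and ruby path are examples):

```
# attach to a running ruby by pid and let it run until it crashes
gdb /usr/bin/ruby 12345
(gdb) continue
```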
To go the core dump route, you have to do a couple of things beforehand. The first thing is to bump up the core size limit; here we're basically setting it to 300 megs. What this means is that as long as the process is under 300 megs, it'll get dumped out when it dies. The other thing we do is create a directory for the cores to live in, and then tell the kernel to put cores into that directory. Once you end up with a core dump, instead of invoking GDB with a process ID, you just pass in the core, and it's smart enough to figure out what to do.

Once you're at the GDB prompt, the simplest command to use is where, and what that does is give you a stack trace of all the C functions that were being called when the segfault happened. Once you have this, you can copy and paste it, open an issue, and let the maintainer know what was going on; at least they'll have a bunch of information to try to figure out what's causing the segfault and try to reproduce it. You could also just start looking at the code yourself. You have enough information here: you have the function names, the files, and the line numbers they're defined on. And if you look at this in more detail, you'll notice the very first frame is main. main is the entry point of any C program. That calls ruby_run, which gets Ruby up and running. You'll notice in numeric.c it's invoking the function int_dotimes; that's the 4.times we were calling, and it calls rb_yield to yield to the block. There's dir.c, there's hash.c. So you can see how the C code relates to the Ruby code that was causing the segfault.

Similar to the perftools story, I was using GDB for a while, and again, what I really wanted was Ruby-level information instead of C-level information. So I created this project called gdb.rb, which is GDB scripted with hooks for Ruby, such that it knows what's going on inside the Ruby process you're connecting to. Again, this is all on my GitHub. The way it works is it adds a bunch of ruby commands to GDB. For instance, there's an eval command, so you can attach to any running Ruby process and start evaluating code inside the context of that process. This is really useful for debugging, and it's useful for injecting code: for instance, if you wanted to start profiling an application, you could attach GDB and run ruby eval to require perftools and get it running inside that process. There are a bunch of other commands. There are commands to inspect all the threads. Here is a list of all the threads: you can see the first thread is the main thread, which is waiting on a join of the second thread; the second thread is in a sleep(60) with 57 more seconds left; the third thread is waiting on file descriptor 5, and again, you can look at lsof to see what that file descriptor 5 points to; and the last thread is the currently running thread. There are commands to inspect the heap, so you can see how many objects are alive and what types of objects they are, and you can inspect the contents of the objects themselves: here we're grouping all strings by content. So if we take that example from earlier where we cause the segfault, now, with gdb.rb, we can get Ruby-level information about what's going on.
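For reference, the core dump setup from a minute ago looks roughly like this (the directory is an example, and note that bash's ulimit -c counts 1KB blocks):

```
ulimit -c 307200                                           # cores up to ~300MB
mkdir /tmp/cores
echo "/tmp/cores/core.%p" > /proc/sys/kernel/core_pattern  # needs root
# later, after a crash:
gdb /usr/bin/ruby /tmp/cores/core.12345
```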
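And a rough sketch of a gdb.rb session; the command names follow my memory of the README, so double-check the exact spellings there:

```
$ gdb.rb 12345               # attach to a running ruby by pid
(gdb) ruby eval 1+2          # evaluate code inside the process
3
(gdb) ruby threads           # list every thread and what it's doing
(gdb) ruby objects           # summarize the heap by object type
(gdb) ruby objects strings   # group live strings by content
```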
So we type ruby threads, and instead of C functions and C file names, we get Ruby functions, Ruby file names, and Ruby line numbers. [Audience question about supported Ruby versions.] So the question was about what Ruby versions it supports. I just pushed a version last week that supports any 1.8, so it'll work on 1.8.6, 1.8.7, REE, or MRI. But not 1.9 yet. Another thing to note is that it only works on Linux, because GDB 7 currently does not support OS X.

So here's an example. There was a plugin called rails_warden, an authentication plugin for Rails, and it was causing a memory leak. Sure enough, you can attach; here's a list of all objects by class name, and we noticed that there were several thousand instances of each of the middlewares. What was going on was basically that Warden was leaking its middleware by adding itself to a global array, and since each middleware always points to the next middleware in the chain, it ended up leaking the entire chain of middlewares. You'll notice a lot of interesting things if you just poke around applications like this. One gem, for instance, has a line of code I discovered that just creates, and leaks, a new thread in a loop. God had a lot of memory leaks for a long time, and again, you can run some of these commands: we noticed that there were way more instances of God::Watch than we expected, and there were a bunch of arrays that had over 90,000 elements, arrays that were just growing infinitely without ever getting cleared. So that's gdb.rb.

The next tool is memprof. memprof is a memory profiler for Ruby. It's similar to bleak_house, if you guys have ever used that, but it doesn't require any patches to the VM; it's just a gem that you can require and start using. memprof has a bunch of features and a bunch of API; you can look at the README for more information. For instance, you can track objects created during a block. Here we're creating 100 strings, 100 floats, and 100 modules, and it prints out a simple report that shows you which objects were created on which lines. There's Memprof.dump, which shows you a bunch of detail about objects: instead of just giving you a summary, it shows you, in JSON, the actual structure of and details about each object. So here, for instance, is a string: you can see how long it is, what its data is, what class it is, et cetera. You can do this for any of the native types. Arrays, for instance, will show you the actual contents of the array. Hashes show you their key-value pairs and whether or not there's a default proc. And classes, like you'd expect, have a superclass, an instance variable table (which actually holds class variables, instance variables, and constants), and a method table that holds the methods defined on the class. Then Memprof.dump_all is basically dump called on every single object on the heap. And since this all gets dumped out as JSON, it's really easy to run analysis scripts over it after the fact and get more information out of the data. One of the ways to do that is memprof.com, which is a web-based UI that can take one of these JSON dumps and give you a clickable interface for understanding what's going on. So here's a real quick example; we're running out of time, so I'm gonna speed through this.
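The API from those slides looks roughly like this; it's a sketch from the memprof README, so treat the exact signatures as approximate:

```ruby
require 'memprof'

# print a report of which lines created how many objects
Memprof.track do
  100.times { "a string" }
  100.times { 1.23 + 1 }
  100.times { Module.new }
end

# dump a JSON description of every object created inside the block
Memprof.dump do
  "hello" + "world"
end

# dump the entire heap to a JSON file for offline analysis
Memprof.dump_all("/tmp/app_heap.json")
```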
This is using memprof.com to find a leak in Rails 3. Early versions of Rails 3 had a really massive memory leak in dev mode, so I'm going to walk through how I found and fixed that leak. You require memprof, send the application 30 requests, and run memprof on the pid to dump out the heap and upload it, and you get a URL that lands you on a page like this. You can start clicking around to get a sense of what's going on. Here I'm going to group the classes by name, and you see something like this: there are 2,500 classes, and there are 30 copies of TestController. This is all built on MongoDB, and you can see the MongoDB query that's being used: we say, find me all the classes, but limit it to the ones named TestController. So you get a list of all of those, and you click on one of them to get details about it.

And the cool thing about this is you can see the references. The whole point of GC is that as long as something is being referenced, the garbage collector thinks it's in use and won't get rid of it. So you can use memprof to find what's still referencing an object. Here we see that one of the references to TestController is a hash that has exactly 30 elements, which is interesting, because we sent exactly 30 requests to this application. So we click on that, and sure enough, we see that this hash is holding references to all of those TestControllers. And the cool thing is that it shows you files and line numbers, so you can jump to that code and see what's happening. In this case, it was basically some caching logic inside Rails that was conflicting with dev mode. The fix was really simple: instead of using the actual class object as the hash key, we just use the string "TestController" as the hash key. That way, every time a new version of TestController gets loaded in dev mode, it still maps to the same string key in that hash, so it just overwrites the old entry instead of creating a new one. (There's a miniature version of this fix below.)

There are a couple more memprof features we're working on: a middleware that's part of a tracing framework. For each request, it collects a bunch of data: how long the request took, a couple of details about the request itself, and a bunch of information that's useful for profiling in general. Here it's reporting how many MySQL queries happened and how long they took, and how many calls to GC there were. This was a really expensive action: there were eight calls to GC that took almost two seconds to execute. And then there's a bunch of information about the objects that were created. Here, almost four million objects, with a breakdown of all the objects by type. This is something I'm actively working on. The cool thing is that memprof is mostly written in C and assembly, so it has really low overhead. If you run this in production and collect data about a bunch of requests over time, you can then take all that JSON data and start analyzing it. So I'm currently working on a report that's generated from all this data. This is something I ran recently on a production application, and I noticed that a lot of time, almost 40% of the time this application spent, was in GC, and every time GC happened, it would take almost half a second.
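Here's that dev-mode fix in miniature; the names are illustrative stand-ins, not the actual Rails internals:

```ruby
# Before: the class object itself is the key. In dev mode every reload
# defines a brand new TestController, so each request adds another
# entry, and the hash retains every stale copy of the class.
cache[TestController] ||= build_metadata(TestController)

# After: the class *name* is the key. Reloaded copies map to the same
# string, so new entries overwrite old ones instead of piling up.
# (cache and build_metadata are hypothetical placeholders.)
cache[TestController.name] ||= build_metadata(TestController)
```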
So that's pretty much it. The point of all of this is that there are bugs and issues we all run into that make you go WTF, that just don't make any sense. But if you have the tools to start debugging and start gathering even the tiniest amount of data, you can start making progress. You'll get some data, start looking at it, start making correlations, and once you have that data, you can start investigating. Even if all you have is a line number, you can start looking at that line and make progress instead of being stuck. And so the whole point of this talk was to introduce a bunch of these tools, such that when you run into issues like this in the future, at least you'll have in the back of your mind: I sort of remember there's this one tool that might be able to help. Then you can look at these slides, copy some of these commands, and start using these things. There are a bunch of slides in here that I don't have time for; they're a case study of me running all these tools on an application, finding a bunch of problems, and fixing them. Like I said, all these slides are online, so make sure to check them out. And if you have any questions, you can ask them now, or you can find me later during the break. That's it.