Good afternoon. My name is Philippe Hanrigou. I want to tell you about my Mongrel clusters. Most of the time, I have a wonderful relationship with my Mongrel clusters. But sometimes they just stop responding to me. They don't take my requests anymore. In these cases I'm a little lost. I'm like, how should I feel about it? What can I do about it? And that's pretty much what I want to cover tonight: what you can do when your Mongrel cluster is misbehaving and you cannot figure out what's going on. In case you're still wondering about the accent: yes, I'm French. I'm working for a company called ThoughtWorks, which provides consulting services, specializing in agile and also in cool and exciting technologies like Ruby. I published with Addison-Wesley a short cut on the topic of Ruby troubleshooting, covering how you can leverage system tools like lsof, strace and GDB in the context of Ruby. And you might also know me for my involvement as the creator and main author of Selenium Grid. By the way, we need to talk about your Selenium tests, because there's a way not to do that. Selenium Grid is a tool that runs your Selenium tests in a distributed manner across multiple machines, so you can run your tests in parallel, and instead of waiting three hours for your tests to finish, you wait ten minutes. Anyhow, the important stuff is what's ahead tonight. The main reason for this talk dates back once upon a time, like a year and a half ago. Patrick Farley, who did the presentation on Ruby internals, myself and a couple of other ThoughtWorkers were working on a pretty ambitious Ruby on Rails project: talking to a lot of databases, multiple databases from the same Ruby on Rails application, multiple web services, a pretty aggressive user load. So it was kind of a big, enterprisey, ambitious Ruby on Rails project. And we were having fun on it; our velocity was pretty high.
And then we started to assess the stability of that application. And we realized that our Rails instances were getting stuck, completely frozen, pretty quickly, pretty consistently, but unfortunately at random times, on random requests. Just looking at our logs, just trying to instrument Ruby, we couldn't get anywhere; we were not even close to finding the root of the problem. So this is when Pat Farley, who was the tech lead at the time, came to me and a colleague of mine called Jessen Miller and said, hey, Philippe and Jessen, you know the stability problem we have? You need to solve it, quickly. So we responded, of course, but in our heads some of us were thinking: we don't even know where to start. So when you have a problem on your hands and you don't even know where to start, usually it's good to ask a little help from your friends. We're a big software community, right? There are a lot of people who could help us. So we went around, and we asked the PHP dude, we even asked the .NET troopers, you know? And we were so desperate, we even asked Java the Hutt. Super powerful, hardly lightweight. But it turns out none of these guys could really help us. You know, they said: you have to think about yourself. You have to think about who you are, as a Ruby community, as Ruby developers. So let's think about who we are, you know? We're nimble, for sure. We're moving fast. We have style. We have values and principles. And more importantly, we have the Force. So who can we be but Jedis? The cool thing about being a Jedi Ruby developer is that life is easy, right? We have our problems, we just use the Force, a little Ruby trick, you know? irb, script/console, looking at our logs, a puts or an inspect, and we're out of trouble in no time. And that works for us 99% of the time. But sometimes we're in real trouble. Usually in a production environment, you know?
And not only are we in trouble, but this is one of those cases where we have to solve the problem quickly and the entire team is counting on us. Does that sound familiar? It's a little like Luke on his mission to destroy the Death Star, you know? So when you're in this kind of situation, it's good to remember that even a Jedi is never alone. While you're concentrating on fighting and steering your ship, the whole time there was somebody in the back, you know, working hard, keeping things running, keeping the energy level high for you. You remember him? Well, we have this guy as Ruby developers. We have our own little R2-D2, just for us. We have somebody that keeps our Ruby applications running all the time. We don't even know about it; we forget about it. It's called the operating system: Unix, for most of us, the lucky ones, you know? In terms of system tools, there are a lot that, in my experience, Ruby developers for some reason might not know, and they're well documented out in the world. There are a lot of tools you can use to a lot of success: lsof, iostat, top, ps, whatever, you name it. What you might not know about are the tools that are not only system level: they also give you, at the same time, information about your Ruby, stuff happening in Ruby and stuff happening at the system level, which is pretty cool, because then you can correlate the information easily and see what's going on. So tonight I'm going to concentrate mostly on two of these tools where you can see what's happening in Ruby and what's happening in your system at the same time. The first one, unexpectedly, is GDB, the GNU debugger. Acting as a C debugger, GDB can tell you everything that's happening in C, so everything that's happening at the system level, like system calls and signals, and everything that's happening in your Ruby interpreter.
But what you might not know is that, with a couple of clever macros, you can also, from GDB, do stuff like getting a stack trace, a Ruby stack trace, raising a Ruby exception, or even evaluating arbitrary Ruby code, which is pretty freaking cool. So do you guys want to see that tonight? Let's get started. It's mostly going to happen in the terminal. I've got a little Rails application, which I'm going to launch. Let me find it. So, three actions in one controller. The first one, which I'm going to call right now, is basically mostly sleeping and returning. If I load the page here, as you see, my action is working. My Mongrel cluster is happy, up and running. Then something is going to happen. Actually, let me start over, because I forgot one critical step, I think. Yep. Let me start over. Here we go. So something strange is going to happen that you don't know anything about. All of a sudden, when you start to talk to your Mongrel cluster, it's not going to respond anymore. By the way, it's never Mongrel, the problem. Every time I investigate one of these troubleshooting problems in production, it's never Mongrel. It's always your code, Rails, your database, your system, you name it. Mongrel is pretty freaking solid on that. So when you're stuck like this, well, let's try to use GDB to attach to the process. First, I need to know the PID of the process. Now that I've got my PID, I'm going to attach to it. Here we go. So now it's a C debugger, right? So what I get is the C level. Let's try to get a backtrace, a C backtrace. I got my backtrace. By itself, it's not going to tell me much, right? Of course I'm evaluating Ruby code; I knew that. What's more interesting is the top of the stack: where am I stuck? If we go up... let me just try again. That's going to be easier.
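Condensed, the attach sequence I just walked through looks something like this (a sketch; the PID, and how far up the stack you go, are whatever your own process gives you):

```
$ ps aux | grep mongrel       # find the PID of the stuck Mongrel
$ sudo gdb -p <PID>           # attach GDB to the running process
(gdb) bt                      # C-level backtrace of the current thread
(gdb) up                      # walk up the stack toward the interesting frames
```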
If we go to the higher levels of the stack, I see: oh, that's what I'm blocked on. I'm trying to get a lock on a file. So why am I trying to get a lock on a file? Well, it seems to be this black_magic.c doing something strange. Oh, I kind of remember: I think I'm using a native gem called black_magic. But from where? Which controller is calling this? So what would be cool would be to get the Ruby stack trace, not just the C-level stack trace. So let me go back to my process. Here we go. I'm going to clear the screen so that you can see the output. And then there's a pretty freaking cool macro, done by Mauricio Fernandez of eigenclass fame, which allows you to eval whatever Ruby code you want. So let's try this. That works. Let's try something else: maybe that would be a good way to get a stack trace. eval "caller". Showing off. We got our stack trace. And if I go up... oops, let me clear this and do it again. If I go up in my buffer, I can tell: oh, I'm in my controller, showcase, line 6, in the action provide_business_value. And if I went to my code, I would figure out pretty quickly: oh, this is where I use this black_magic gem's strange call. Now, you could argue that what would be nice would be to know which file I'm trying to lock. That would be a great job for lsof, but I'm going to leave that for another talk; you're welcome to check the lsof documentation if you want. So if I can evaluate arbitrary Ruby code, I can evaluate caller. That's pretty cool. But I can even go crazy, right? Actually, let me clear my window first. I could do ObjectSpace, each String... or each_object, sorry. And I could say: for each of these strings, just print them. And I've got all the strings in my web instance.
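What I'm feeding to the eval macro is just plain Ruby. A sketch of the kind of snippets involved, assuming MRI with ObjectSpace available (the macro simply evaluates strings like these inside the stuck process):

```ruby
# Plain-Ruby equivalents of what the GDB eval macro executes in the process.

# What I just did in the demo: walk every live String.
# (Printing them all is noisy, so here we only show the first few.)
ObjectSpace.each_object(String).first(3).each { |s| puts s.inspect }

# Same principle, but handier: count live objects per class.
counts = Hash.new(0)
ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
counts.sort_by { |_, n| -n }.first(5).each do |klass, n|
  puts "#{klass}: #{n}"
end
```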
Not super useful, but on the same principle you could try to find all the classes in your system, walk the ObjectSpace, and see how many instances of each you have. It turns out there's actually a pretty nice macro to do that. Let me clear this up and do it again. So now we see that we have only one Mongrel, which is what we're running. We see how many Strings, how many Hashes, how many Modules we have; that could be handy. So that's going to be it for the demo. Something important when you attach to a process with GDB (actually, I can show that to you): if I try to Ctrl-C here, I'm not going to be able to exit. That's because GDB is holding a lock on your process, so you first need to exit from GDB. Cool. Something else I should tell you: I cheated a little bit. For GDB to do this magic, you need to be able to access the debugging information of your process. Most of the time that's a non-issue: if you're on Linux or Solaris or something like that, you're pretty much okay with the out-of-the-box Ruby interpreter. On Mac OS X Leopard, though, the interpreter is not compiled with debugging information. So if you want to do these kinds of tricks, you're better off compiling your own Ruby, which by default enables debugging information, and using that to troubleshoot your problem. So let's review a little bit. First, let's give credit where it's due. All these wonderful macros, I didn't make them; I wish I had. Two of them were done by Jamis Buck, and most of the others by Mauricio Fernandez. If you look at them (I'll tell you in a minute how to get them and download them), they're pretty freaking awesome: they're basically re-implementing eval as GDB macros, which is pretty cool. So let me review some of the macros with you. First, how can you get them? You can find them in different places on the internet. The easiest way is actually to go to my website, ph7spot.com. On this page you're going to find a download link to get all the macros compiled together, with additional documentation.
Save it as a .gdbinit in your home directory and you're golden. So what kind of macros do you have? Well, the first one is one of the most important. It's basically not doing anything except putting the process in a safe state, because you might attach to your process in a state where it's not safe to have the interpreter run arbitrary code. This macro takes care of that; once you're in a safe state, you can keep going. The next macro is the best one: eval, with which you can evaluate arbitrary Ruby code, especially caller when you're in trouble and want to get a backtrace. Some other ones: this one is by Jamis Buck. It's a way to get the Ruby backtrace (the Ruby backtrace, not the C one) from GDB, by interpreting the C-level backtrace. It's not working all the time, though; using eval "caller" is more reliable. ruby_raise is a nice little trick by Jamis Buck too: it's a way to raise, from C, a Ruby exception, so you can get your stack trace in your log, most of the time. All these macros send their output to the standard output of your process. Sometimes it's a little difficult to track down that output (most of the time it's in mongrel.log), but if you can't find it, you can temporarily redirect the output, and it ends up in a temporary file under /tmp, named after your Ruby process's PID. So GDB is a pretty nice tool. It's way better than nothing for figuring out what's going on at the C level and at the Ruby level, and it gives you pretty good visibility into what's going on in what I call the Jedi space: the Ruby interpreter space and the application space, which is more like Yoda. But you can't get too much into R2-D2's space, you can't get too much into the operating system, because there's a fundamental division in Unix between kernel space and user space. Another important thing to understand about GDB is that when you attach to your process with GDB, you basically capture a snapshot of your process, just like with any debugger, right?
So when you see your application with GDB, it's basically frozen, you know? So if your problem is dynamic, if you need to monitor the dynamic behavior of your application, GDB is not going to be able to help you. So GDB is a great tool, you know? It feels like a Jedi blaster. You're very glad you have it. But sometimes it gets jammed. I would be lying to you if I promised that for any stuck Ruby process, you attach GDB, all these macros work out all the bugs, and you get answers to all your questions. That's not true. It's true maybe 70 to 80% of the time. There's still this last mile where the macros are not working, and it's going to be very hard for you to figure things out. So we have a very nice tool, we have a blaster as a Jedi, but we wish we had a more powerful, more comprehensive tool, you know? Something more like a lightsaber. And actually, Ruby developers do have a lightsaber already. It's called DTrace. Sun made it just for us, you know? More exactly, just for those of us running on Leopard, BSD or Solaris. DTrace is a revolutionary, amazing dynamic tracing tool made by Sun, which you can now find on Leopard and on BSD. What's so amazing about it? Well, the best part is this: remember the fundamental divide between user space and kernel space? With DTrace you can see everything in your system. Everything, everything. That means the deepest kernel internals, if you can make sense out of them, and the highest-level constructs of your application. So why is it so good? Well, first, you only learn one tool. You know, one tool to rule them all.
But also, more importantly, compare using one tool to using logs in your Rails application, instrumenting the Ruby VM, and becoming a guru at tracing what's happening in your kernel. The problem then is that you have this information in different places, in different formats, and it's going to be very hard to correlate, you know, to put one in relation with the other. By using one tool, you can correlate all this information. So you can answer very easily questions that are otherwise very difficult, like: within this Rails request (so, high up at the application level), just for this particular Rails request, tell me all the system calls that are made, only during this method, because you have visibility across the whole stack. This is extremely powerful. Sometimes DTrace is seen as just a sysadmin tool. We need to take it over and make it a Ruby tool, you know, among other things. So remember, GDB was a lot about frozen processes, a static view of your process. DTrace is the total opposite. It's all about the dynamic aspects of your application. It's all about motion. And as a matter of fact, it's a lot like motion capture, in the sense that DTrace is going to monitor little probes that you place on your application, just like the little bulbs on a motion capture outfit, so you can monitor in real time all the actors working in your system. So it's a little like motion capture, but even better, even more complete, because not only do you monitor in real time, you monitor multiple levels in real time. You go from the bare-bones system view, to the mechanics of it (more like the Ruby interpreter), to the fully fleshed-out body of your Rails application. So you can see everything at the same time, in real time. Basically, there's no limit to your visibility; there's just a limit to your understanding of what you're capturing and how to make sense out of it. So first, a quick show of hands: how many people here have already used DTrace?
Okay, quite a few. And how many people know about the D scripting language? Okay, pretty much the same. So I'm going to cover a little introductory material on DTrace, to make sure you understand what comes next. First, a little vocabulary. DTrace traces what are called DTrace probes. A probe is pretty much like a sensor you put in your code at a location: it's a location, it's an event you care about, that makes sense to you, and that you want to be able to record and capture. So, put another way, you can see these DTrace probes as the little bulbs on the motion capture outfit, things you can capture and monitor. One of the amazing things about these DTrace probes is that you can enable or disable them on demand. To keep the analogy with the little bulbs: imagine you could turn some of the bulbs on and off on demand, to capture only what you want, maybe just the arm, maybe the full body. And to push the analogy even further: if you turned off a bulb on the motion capture outfit, it wouldn't use any electricity; you'd be saving on electricity. It's the same with DTrace: if you disable a probe, by design, the performance impact of DTrace on disabled probes is very close to zero. And the great thing about being close to zero is that you can use it in production. You can launch your application with the DTrace probes turned off; when you have a problem, you just turn on a couple of probes here and there to analyze what's going on, and progressively understand what's happening in your system. So you can analyze the problem you care about, in the environment you care about (your production environment), and at the layer you care about; you can be very selective. So, probes are little sensors, little bulbs on the system. When a probe is enabled, DTrace can listen to it, and when it is triggered, when it is encountered, it fires what we call an event. This event you can record, capture, and do all kinds of analysis on.
We're going to see that later with examples. So the question would be: we keep talking about these probes, but where do they come from? Well, somebody has to provide them to us, and those are called providers. Basically, they are user-space libraries or kernel modules that somebody writes for you, providing instrumentation for something you care about. It could be your TCP/IP stack, your system calls, your Ruby interpreter, Postgres, a JavaScript interpreter: whatever you want. So now that we know this, we can try to understand the D language. It's all about: when this probe is triggered, what should I do? It's very, very simple. D is pretty much influenced by C in its syntax, so you shouldn't be lost at all. The first line (this is pseudo-D) is always going to be a probe description: which probe, which little bulb, should I listen to? And when you listen to this probe, the action between curly braces is going to get executed. What do you do in the action? You print information, you collect information, you aggregate information, you do something. And then you have the second line, the slash-delimited predicate. That's an additional refinement you can put on your probes. You can say: my probe fires, but only under these conditions; only when all the conditions in my predicate are true is my action going to run. Makes sense?
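Sketched out, a D clause has this shape; the second clause below is a concrete instance in real D syntax, assuming the syscall provider is available:

```
probe-description        /* which probe (which bulb) to listen to       */
/ predicate /            /* optional: run the action only when true     */
{
    action;              /* e.g. printf(), aggregate, record a stack    */
}

/* Concrete instance: report every write system call completed by bash */
syscall::write:return
/ execname == "bash" /
{
    printf("bash process %d just completed a write\n", pid);
}
```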
Let's see what kind of predicates we could have. It's very, very simple; it's going to look familiar to you. You could say: I only care about one CPU, I don't care about stuff happening on the other CPUs. Or: I only care about one process. Or: I only care when the program being scheduled is a particular executable. And of course you can combine all these conditions together using the regular C operators. So now for more of a real-life sample; this is true D code. Remember, the first line is a probe description. Here, the probe description is: for every write system call, when it returns, okay? So at that moment, what am I going to do? The second line is the predicate: the action runs only if the condition is true, so only if the program is actually bash. And in this case I just print a little message telling you the PID of the bash process that is doing the write. Not rocket science, but actually very powerful; we're going to see that in a second. So what kind of actions can you do within your curly-braces block? You can do a lot of stuff, but most of the time it ends up being: printing information with printf, regular C syntax; printa, which is for printing aggregations, so you could print an average or a distribution of values; you can also capture the stack, the kernel stack or the user-level stack of your process, which is useful: you could, every 10 seconds, see where you are in your system and figure out where you're spending time. And you can also create variables: capture a timestamp and put it in a variable. Here, the last line is assigning a variable to thread-local storage, self->, so that you can use it later. Sometimes people think DTrace is only for listening to events. That's somewhat true, but it's very powerful: for example, you can use DTrace for performance analysis, to capture how long something takes. And the pattern is always the same; it looks like this. First, you're going to record a start: every time we're
starting a read system call for Apache, the daemon httpd, what are we going to do? We're going to capture, in a thread-local variable we call start, the timestamp of when this read started. So now we have the start time. Next, we define the rule that says: any time we return from the read system call, take the timestamp I stored in start, subtract it from the timestamp of now (which gives how long it took), print it, and aggregate it into a distribution too. There's a little trick in there: if you look at the second line, the predicate is there to make sure we actually recorded the entry before recording the return, because with DTrace you attach to a process whenever you want, so you might very well record a return before having recorded the corresponding entry. So what can you do with DTrace? What kind of providers do you have, what kind of stuff can you listen to? If you're running Solaris, just do sudo dtrace -l and you're going to see all the providers you have. It's pretty amazing, and very diverse. Just to give you a little taste, here are some providers that cover pretty much the whole stack: kernel-level providers (syscall, pid, vminfo), also network stuff (there are some pretty amazing providers for UDP traffic, TCP traffic and NFS traffic), and also high-level language interpreters. There are some pretty amazing providers for JavaScript; thanks to Joyent we have providers for Ruby; but we also have providers for Java and Erlang, and even for bigger pieces of software like the Postgres database, or even X11 and Adobe AIR (there was a pretty cool demo at a DTrace conference about Adobe AIR). So of course, we mostly care about some of them in particular, which are the Ruby probes. Now we can trace Ruby with DTrace, which is pretty awesome. So big kudos to Joyent, who wrote these probes in the first place, mostly for their Solaris platform; then Apple did an amazing job at incorporating
these probes in Leopard, and you have them for free in Leopard. So on Solaris and Leopard, pretty much out of the box, you get these Ruby probes, which is pretty nice. So what kind of stuff can you probe with these Ruby probes? It's pretty inclusive. You can trace every time you enter a function, every time you return from a function, every time you raise an exception, every time you rescue one. You can even trace every single line of code you execute, with file name and line number, which is pretty amazing. Every time garbage collection starts or stops. Every time you start creating an object and finish creating an object, which, if you have huge allocation times when creating Ruby objects, might be useful. Every time you free an object. And the last probe, ruby-probe, is actually one of the coolest, but I'm not going to tell you about it right now; I'll keep it for later. So now it's about time to see this in action. Let me go back to my little Rails application. I'm going to restart it and reduce the duration of the call, so it sleeps less. So basically I have a running Rails application; nothing very fancy here. What's more fancy is that now I can use DTrace to see what's going on at multiple levels. So let's first start at the system level. dtrace is a command-line tool; you need root privileges to use it, and you can inline some scripting, like this. So I'm going to record all the system calls happening, and now I can see everything that's happening at the system level on my machine, which is pretty cool. But of course, you want to see what's going on at the Ruby level, so let's try another probe, for Ruby. So first, I'm going to need to know which process I'm going to target. There's something a little strange on Leopard with the Ruby probes: they are not enabled until you launch a Ruby process. So if you do dtrace -l and don't see any Ruby probes and you're freaking out: just launch a Ruby process and they're going to appear. So now we could try to target every
function we execute. So: ruby (I can't see my terminal anymore), the function-entry probe, and I need to give it the PID of the process; and of course I need to run it as root. So now I see every time I execute a function. What's not very useful is: sure, I'm entering a function or method, but which one is it, you know? I want to know better. So every probe gives you some arguments, called arg0 and arg1, which you can print. So let's do it. That would be the intuitive way of doing it; unfortunately, that's not the way it works, because arg0 and arg1 are coming from user space, and DTrace is instrumenting at kernel space, so you need to copy these arguments from user space to kernel space so DTrace can make sense of them. There's a little utility to do this, and now (applying it twice, to arg0 and arg1) I see which object, but also which method, I'm entering, in real time, and I could correlate the system calls for this particular method, which is pretty freaking cool. So this was relying on system instrumentation, and this is relying on Ruby instrumentation. But wouldn't it be cool if you could go even higher? Remember, I promised you could go up to your application level. And it feels like: how am I going to do that? How am I going to go from something that's provided to me, as providers written in C code (which I probably don't want to write), to something I can trigger from my Rails application or Ruby application, something that's going to make sense for me? That would be nice, right? Every time we start a new request we could trigger a probe, and every time we finish the request we could trigger an event based on another probe. Well, it turns out the Joyent guys did that. If you look at the source code they're providing for the Ruby VM on Solaris, they have a tracer with a method called fire, and if you give two arguments to fire, basically two strings (you remember arg0 and arg1? that's what you're going to get), you can trigger DTrace probes, events you can trace, which is pretty freaking
cool. You can even give it a block, and it's going to trigger one event at the start and another at the end. The problem is, I don't run Solaris. I love Solaris, I love OpenSolaris, but as a server; not so much on my laptop. So when I was preparing the demo, I was like: hey, I want to do the demo on Leopard, I don't want to do it on Solaris. So I looked for Tracer.fire on Leopard and couldn't find it. I was going crazy. I was like: I can't believe Apple was smart enough to introduce all these probes and then leave out the tracer; this is so powerful. Well, actually, after a little bit of investigation, digging into the internals, I found out Apple did provide a tracer. It's just called DTracer, and unfortunately it's not exactly the same API: it doesn't take a block, and it provides a new method called is_enabled. So, okay, that's cool, but I want my applications to run the same code on any platform. I want to run them on the Joyent VM in production; I want to run them on the Leopard VM for the development environment; and for the QA and business-analyst guys running Linux or Windows, I want the same code to work for them too, even though there's no DTrace facility on their platform. So I wrote a little gem called xray, which basically abstracts you away from all these different implementations: it gives you one interface on all the systems, and a no-op implementation on platforms that have no DTrace at all. So let's see how this would look. Let's imagine we have a Ruby class, my application, and I want to trigger a DTrace probe every time I start my application, so every time I enter this start method. Well, with xray (or DTracer, or Tracer), that's very easy: you do something like fire(arg0, arg1), and whatever arg0 and arg1 are, you decide; it's whatever is meaningful to you. Of course, this fire method needs to come from somewhere, so you need to include a little module and a require, and you're all done. So it's extremely easy, any Ruby developer can do it, and it gives you tremendous instrumentation at all the
levels of the stack. Of course, most of the time you don't want to record only when something starts; you want to record when something starts and when something ends. So a convenient way to do that with xray is to use fire with a block: you give it arg0 and arg1 again, plus a block, and automatically (in this case) my-ws-start will be triggered at the beginning, before executing the block, and my-ws-end will be triggered after executing the block. That's pretty convenient. So let's see it in action. I'm going to show you this showcase controller. I'm going to do something useful with this fire stuff: here, I trigger a probe, custom-request, with do_something_useful, which is basically the name of my action. And then, let's imagine that somewhere here I'm doing some kind of database access: I'd like to fire a custom-db probe with the actual SQL query I'm executing, to monitor how long I'm spending in it. So let me run this action. Here we go. Now let's try to use dtrace to see if our custom tracing can be recorded. Remember this magic ruby-probe I didn't want to tell you about? That's exactly what it's for, so we're going to use it. I know: nothing happens, because my application is not serving requests; no requests are coming in. So let's send a couple of requests... and now my probes are firing. If you're paying close attention, you might notice that some probes seem to be out of order: you could have the rendering finish... I mean, you could have your request finish before your rendering, which doesn't make any sense. This is actually a feature of DTrace, in the sense that it happens when the scheduler moves you across multiple CPUs: events are recorded by DTrace per CPU, and because the DTrace team wants it to be very lightweight, with no impact on performance, it doesn't bother correlating the flows from the different CPUs into one ordered stream. So if that's a problem for you, just print a timestamp and you can sort by timestamp. And of course, if you want to
Of course, if you record times and timestamps like this, that's not impacted by the CPU-scheduling stuff. So I hope you're excited about it. But the thing is: do you really want to write this by hand for all your Rails requests? I don't think so. So part of x-ray, too, is easy, on-the-fly instrumentation of all your actions — to know how long you're spending in request time, in rendering time and in database time — by monkey-patching ActiveRecord and ActionController. Of course, I'm going to need to restart my Rails application; at this point we don't need gdb anymore, so let's restart it. Now I'm going to execute another action on my controller, a more classic one: there's nothing DTrace-specific here, it's all going to happen magically with the monkey-patching. So let's go back to the dtrace window and do the same thing, because it's pretty much the same thing — this ruby probe one-liner is always good for your custom application-level probes — and here arg0 and arg1 are going to tell me: is it a request, is it rendering, and which request it is, or which SQL I'm executing for the db query. So here we go again; nothing is happening, because I need to use my application to make something happen. Let's go to this controller, clear everything so we can see better, and do a request — and sure enough, we see we started a request, we're ending a request; the rendering started, the rendering ended; the db access started and the db access ended. Here we're not out of order, which is pretty lucky. So that's pretty cool, but what I really want to know is how long I'm spending in rendering and how long I'm spending in the database — that would be way more useful. Let me show that to you very quickly. Of course this is more involved, so I'm not going to do it on the command line as a one-liner: you can write DTrace scripts.
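The monkey-patching approach described above can be sketched roughly like this. The class and method names are stand-ins, not Rails' actual internals: in real x-ray the wrapped method would be ActionController's request dispatch, and the events would be DTrace probes rather than an array.

```ruby
# A stand-in controller; in the real thing the wrapped method would be
# ActionController's dispatch method, not this dummy.
class ShowcaseController
  def perform_action(action)
    "rendered #{action}"
  end
end

PROBES = []  # stand-in for firing real DTrace probes

# Re-open the class and wrap the original method, so every action gets
# instrumented on the fly with no change to application code.
class ShowcaseController
  alias_method :perform_action_without_probes, :perform_action

  def perform_action(action)
    PROBES << ["request-start", action]
    result = perform_action_without_probes(action)
    PROBES << ["request-end", action]
    result
  end
end

ShowcaseController.new.perform_action("classic")  # => "rendered classic"
PROBES  # => [["request-start", "classic"], ["request-end", "classic"]]
```

This is the same aliasing trick Rails itself used heavily at the time: keep the original under a new name, then redefine the method to fire probes around it.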
So a script is more something like this. You can have a BEGIN block, which gets executed when you start your script; no big deal, here I'm just printing a "Ctrl-C to end" message. Because basically, here I'm going to compute times: I'm going to aggregate information instead of just printing it on the fly — capture it, aggregate it, process it, normalize it, and then give it back to you. So you can have a BEGIN block, and you can have an END block too; they execute at the beginning and at the end. Let's focus just on the first probes. This one is saying: for each probe I trigger that's the start of a new request, just capture the time in a variable called request_start. And this is actually not a regular local variable — it's an aggregation-style variable; think of it as a hash map, keyed per action, on the name of the action. Then when my request is ending, I'm making sure this is really a request end, and that I previously did record a start for this guy; then I count how many times I invoked this request, compute how much time I spent in the request — the average, the sum — and even quantize the time spent in the request. And if you look down, you're going to see exactly the same pattern for the database access.

So let's run the script; it's going to be way more meaningful. Here we go — Ctrl-C to end. First I need to generate a couple of requests. I'm going to generate a couple of classic requests, which go pretty fast and are also pretty consistent. Then I'm going to try this particular action, where I can control how long it takes with a parameter: I'll do two that are pretty consistent in time, and one that's inconsistent, and hopefully you'll be able to see that. So not only are we going to see how much time we're spending on average in each request, we're also going to see: do I have a consistent time for this request, or is it all over the board? Let's get our results — let me clean my buffer, and let's see what we have.

So for the showcase controller's classic action, I see it's pretty consistent: the distribution is consistent, almost always the same duration, which seems to be a pretty small duration. As opposed to the next guy, the second action with the trigger parameter, which says: oh, I'm not that consistent — most of the time it's not that bad, but sometimes it's taking way, way too long; I need to do something about it. And this is where you can start drilling down and figuring out: for this guy taking long, what was happening in my system? What was happening at the Ruby interpreter level, or in my OS? DTrace will answer these kinds of questions. So this is instrumentation of the requests, but you have the same stuff for — oops, maybe I didn't enable it... here we go — for the database queries too: you can analyze how long you're spending in your database queries using DTrace as well.

So if we go back to the presentation: the great thing about DTrace is that you can drill down through your system. You can start at a very high level, saying oh, this request is slow, or this collection of requests is slow, and from that analysis try to figure out where you're slowing down, refining your understanding little by little. You can ask which Ruby method is taking a long time within this request — going from the application level to the interpreter level — and then, within this method, what's taking long at the system level, and you figure out: oh, it's just TCP access, only when I'm talking to this IP; oh, so this subnet is actually going slow right now. That's the kind of investigation you can do, from the very broad top level down to the very sharp low level. Or it can be the opposite: I have this system problem — like a lock we're holding on a file — why is this happening? What was going on higher up the stack? And you can record which request and which method was running when it happens, and go back up the stack again.
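The aggregation script walked through above might look something like the following sketch. The provider and probe names (rubyapp*:::request-start / request-end) and the argument layout are illustrative, not x-ray's actual ones; the aggregations are the count, avg, sum and quantize calls described in the talk.

```d
/*
 * Sketch of the per-action timing script described above.
 * Probe names and argument positions are assumptions.
 */
#pragma D option quiet

BEGIN
{
    printf("Tracing requests... hit Ctrl-C to end.\n");
}

rubyapp*:::request-start
{
    /* one start time per action name, hash-map style */
    request_start[copyinstr(arg0)] = timestamp;
}

rubyapp*:::request-end
/request_start[copyinstr(arg0)] != 0/
{
    this->elapsed = timestamp - request_start[copyinstr(arg0)];
    @requests[copyinstr(arg0)] = count();                 /* invocations  */
    @average[copyinstr(arg0)]  = avg(this->elapsed);      /* mean time    */
    @total[copyinstr(arg0)]    = sum(this->elapsed);      /* cumulative   */
    @spread[copyinstr(arg0)]   = quantize(this->elapsed); /* distribution */
    request_start[copyinstr(arg0)] = 0;
}

END
{
    printa(@requests);
    printa(@average);
    printa(@spread);
}
```

The quantize aggregation is what produces the duration histograms discussed next: a consistent action shows one tight bucket, an inconsistent one shows values spread across buckets.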
So this is where DTrace shines. I think, considering time, I'm probably going to skip this demo — all my demos are online. But the cool thing about Leopard is that it comes out of the box with something called Instruments, which is a kind of visualization tool built on DTrace — which is pretty awesome — and it comes with stuff that's useful out of the box. Something that's worth investigating is that you can build your own custom instruments. So for example, for your x-ray request probes, or your own custom probes, you can define new instruments in this application, recording only those, and it gives you a visualization of them, with auto-completion for this arg0 stuff. So I encourage you to build new instruments — you're going to see exactly the same concepts: probe description, predicate, action. The same thing, just nice and graphical, and you can auto-complete the probes you're interested in.

So what should you remember from this presentation? I hope you're going to remember at least one thing, and that one thing is: don't wait for your tough production problem to show up. Don't get caught unprepared the first time your deployment goes wrong. More importantly: do not wait — do not wait for the Emperor to take over your Mongrel cluster. Start training today. And how can you train today? Well, there are a lot of system tools you can get familiar with on your day job that will provide useful value — like lsof, like iostat, like DTrace; you can use them for everyday developer stuff. And if you get used to these tools, then you won't have to learn the tool and the problem at the same time, when time matters and you're in a tight spot. There's a lot of documentation on system tools on the internet, but what I would recommend, especially if you have a Leopard machine, is to install the x-ray gem. This way you get all this simple and useful Rails instrumentation, and you won't have to spend a couple of days figuring out why Tracer is not in the Leopard VM. I'm obviously not objective here, but I would still recommend my shortcut; for example, there's a great chapter on lsof, which is a sadly underused tool in our community, and which I didn't have time to cover tonight. You can also look around the internet if you don't know where to get started; on my website I posted a couple of pointers, especially for DTrace — you're welcome to check them out and have fun with it. So that's it for the presentation. Thank you very much, I hope you liked it, and if you have any questions, now is a good time to ask them — go for it.

[Audience question, partially inaudible, about targeting a specific process.] You would get the PID with ps — that would be for a specific PID... you can do this, but for some reason the exact incantation escapes me right now.

[Audience question about why Linux hasn't adopted DTrace.] It's a mixture of "we don't really agree with the design, we think we can do better than DTrace", and a mixture of other feelings too. So Linux doesn't want to port DTrace — which is strange, because even BSD, which is pretty picky about what it picks, did port DTrace to its platform. Instead there's a project from a couple of companies supported as an alternative, which is fine, but the thing is, SystemTap is still stuck at the kernel level — basically it doesn't give you anything at the user level — and it's moving very slowly. Unless somebody does a miracle quickly, it's not going to happen soon. So actually, if somebody here is feeling pretty hardcore about DTrace, kernels and SystemTap internals, that would be great, because a lot of projects are deployed on Linux, and on Linux we don't have a lightsaber — we're stuck with a pistol at this point. Does that answer your question?
On my website I included some links for DTrace, and that does include some SystemTap material — the official website, and also some opinions on SystemTap from one of the DTrace guys, with his view of the experience and the plans for DTrace.

[Audience question about the overhead of the probes.] It's the module that's calling the C extension, so you still have the cost of going from the module to the C extension, but realistically, for Ruby applications, we don't care about it. If you're writing a Ruby on Rails application, you're probably not concerned about the cost of one single method call within your system. So I would encourage you to try it.

[Audience question, partially inaudible, about what other opportunities these tools open up, especially the things shown with gdb.] What I wanted to get at is that Ruby was lagging a little bit in terms of tooling, especially for performance and for understanding what's going on in your system, and we were kind of lazily waiting for a good solution — saying we're stuck with multiple VMs, we don't have a common platform, we're not going to build some kind of instrumentation for Linux, for this and that. With DTrace we start to have a pretty compelling standard platform — standard APIs people can use to build whatever creative tools they can think of. So a DTrace-instrumented Ruby would be awesome. There have also been some demos — not for Ruby, more for things like storage arrays — of interfaces built around drilling down: you see at the top what's going on for the whole system, then you graphically refine what you want to see, line by line, and keep going this way. And you've seen what Apple did with Instruments, which is pretty awesome, even if you could argue that some aspects of DTrace could be leveraged even better than Instruments does. In any case, I think there's huge potential based on DTrace, because right now it's still: how many people are going to go to the command line and write D scripts? I think, especially for startups and people with creative minds, there are tremendous possibilities for providing the visualization tools the community needs, based on DTrace and its standard APIs.