Hello. It's Todd again. Today I'm going to talk about DTrace, which is a very interesting tool for tracing what is happening on your system. But before I talk about DTrace, I'd like to make a public service announcement. In addition to talking, I am also volunteering to help with the network that we have here for the conference. The network is managed. It's basically split over four different DSL lines, and it's very, very busy with all these people here. And because it's very, very busy, it occasionally gets slow. And when it occasionally gets slow, it might be very tempting for you to pull out your phone and make a temporary hotspot. And I'm here to ask you, please do not do that. Because while that might help you for a few minutes, the more people who do that, the worse it is for everyone else. And the more temporary hotspots there are at the conference venue, the more clogged up the airwaves get, and the harder and harder it is to recover from a small network outage. So while it may be very tempting, I'm here to beg you, please do not turn on your phone's hotspot. And if you have one on now, just turn it off and all will be forgiven. But if you leave it on, maybe not. OK, so DTrace. So first I'll talk a bit about me. I've been doing this for a long time. I'll just leave it at that. I come from Japan. Well, I don't really come from Japan, but I came here from Japan. I've been there about 20 years. And I worked in the industry for a very long time. And I quit my job last year to do interesting things. And here I am doing interesting things. So what about you guys? Has anyone here heard of DTrace before? It's a little bit more obscure than some tools. OK, maybe about 15%. Has anyone used it? Almost nobody, OK. So I'm here to maybe help increase those numbers a little bit. So I'm going to talk about the history of DTrace, how DTrace works, and how it's built, and a couple of use cases. And, if we're lucky, even a live demo. So what is DTrace?
So DTrace is a debugging and system analysis tool. It differs from a lot of the tools you might already have on your system. You've probably heard of strace or top. strace, for example, traces every system call that a program makes on a Unix system. It focuses on one process at a time. DTrace looks at the entire system. It's a little different from those other tools. OK, don't, oh, there we go. DTrace traces the software as it's running. So it's similar to strace, but it's basically looking at the entire system. And you can trace the kernel as well as userland. So you can actually say, OK, I want to look at every time this piece of the kernel is run, and DTrace will collect that data and tell you about it. It's open source code. It was released by Sun Microsystems, for Solaris first. And it's under the CDDL license, which by some definitions qualifies as open source. And it has now spread far beyond Solaris into several other operating systems, most notably, perhaps, Mac OS, where it is quite [unclear]. And I'm assuming most of you guys are Linux users. Sadly for DTrace, it's not quite as mature in the Linux world, but there are actually two separate Linux ports in progress right now. And I'll talk about those a bit more on the next few slides. So DTrace has been around for a while. It's already 11 years old. When it came out, it was one of the big killer features of Solaris 10, 10 years ago. It's really, really solid on Solaris. It's production-ready. In fact, Sun, at the time it was Sun Microsystems, really encouraged you to just use it on your production systems in a live scenario. Like, if your production system is having a problem, it's not working, it's not performing the way you want it to, just go in, use DTrace, and look at the running system and find out why. You can actually use it to debug a problem that's happening as it's happening.
And it's designed to be stable enough that it will not, well, in theory, it won't crash your system. So Apple saw this, and they said, yes, we want this. So they took it, and only two years later it was released in Mac OS X Leopard, 10.5. And since then, they've completely adopted it. They use it as the basis of a lot of instrumentation in the Xcode development platform. There's an app called Instruments that you might have seen if you're an Xcode user. It's all built on top of DTrace. And the DTrace implementation on Mac OS is actually quite mature these days. FreeBSD said, yes, we want this. And so very shortly after it was released for Solaris, it started to be added to FreeBSD. And by now, it's quite mature in FreeBSD as well. It's been the default since FreeBSD 9.2, which was what, three years ago, two and a half, three years ago. And FreeBSD is my preferred operating system, and that's where I use it. And it's also available for Linux, although it's not quite as stable as on the other platforms, and it's not quite as widespread. So there's a very serious implementation for Oracle Linux. So Oracle, when they bought Sun Microsystems, they got the intellectual property of DTrace, and they also have their own Linux distribution. So they said, OK, we'll mix these two things together and make something great. And they put a lot of effort into it. They've tested it, they have a test suite, and they're pretty serious about doing it well. And while the kernel bits of it are open source, the userland bits are not. So you can't just take this and run it on any Linux. You have to use Oracle Linux. So sorry about that. But if you want the most solid DTrace for Linux, there it is. There's also a completely free implementation that's available on GitHub. And I haven't used this DTrace for Linux myself, so I can't really speak to how mature it is, but I've heard that it's usable. But it's still young.
It's not quite as production-ready as the Solaris implementation, for example, but it's there. There are also alternatives for Linux that do similar things to what DTrace does. You might have heard of SystemTap, which I have looked at, and Ftrace, which I have not. They are not the same as DTrace, and they're not compatible. DTrace has, well, I'll get into it a bit, but DTrace has its own programming language, and SystemTap doesn't use that. But anyway. So what is this thing? DTrace takes your code, your running code, that you have on your system, which normally runs invisibly, and it makes it visible to you. Basically, it lets you have X-ray vision into what the computer is doing at any given time. You can ask DTrace to tell you whenever a certain piece of code is run, and you can ask questions about the circumstances that led to that happening. So it basically allows you to ask questions. When is this happening? How many times is it happening? When it happened, what else was happening? What caused it to happen? This kind of thing. So if you're seeing a problem and you want to know what's causing it, DTrace allows you to drill down and say, hey, what's going on here? And it allows you to do this while it's happening, which is what makes it really useful. So you have a system that's behaving strangely. You don't understand. Use DTrace to kind of find out. As I mentioned before, it's not just one process at a time, it looks at your entire system. So if you want to know every time anything, anywhere opens a network socket or opens a file or calls a particular system call, DTrace will tell you. So if your system has a lot of locks, the kernel is locking a certain file or a certain resource a lot, just ask DTrace, well, what process is running when that lock is acquired? What process is running when it's released? DTrace tells you. It's very efficient. So unlike some other tracing tools, I don't know if you use strace.
strace is great, but strace slows down the program that it's tracing. So if you have a program that's just sitting there spinning, doing something, but we don't know what, and you say, let's use strace, well, it tells you, but it also slows that program down, sometimes significantly, because strace works by basically stopping the program and starting it again, stopping it and starting it, and catching each time it makes a system call. DTrace doesn't do that, and in fact, when you're not using it, it consumes zero CPU time. When you do use it, it consumes very little CPU time, just enough to kind of grab some information. And crucially, and differently from other tracing tools, it traces in the kernel and in userland, so you can actually trace any part of the kernel and any part of userland code. So how does it do this magical thing? When you use DTrace, it takes the code that's actually running on your system, whether that's the kernel or your own program in userland, and it basically patches the executable image of that code in memory. It swaps around a few instructions and says, you know what? When you're running this piece of code, don't run what the compiler said to do here, run this bit of code instead, and then jump back. So it actually patches the code in memory while it's running and makes it do something else. And then when you're done using DTrace, it puts it back the way it was, which, if you're a systems programmer, might sound horrifying to you, but it actually works quite well. So what does it do?
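The patch-and-restore idea described above can be illustrated with a loose analogy in Python. This is only a sketch of the concept, not how DTrace is implemented: DTrace rewrites machine instructions in memory, whereas this toy swaps a function in a lookup table for a wrapper and later puts the original back. The names `ToyProbe`, `handlers`, and `open_file` are hypothetical and exist only for this example.

```python
import functools


def open_file(path):
    """Stand-in for some piece of running code we want to observe."""
    return f"opened {path}"


# The "running program" looks its functions up here, so swapping an entry
# plays the role of patching code in memory.
handlers = {"open_file": open_file}


class ToyProbe:
    """Toy probe: counts calls while enabled, costs nothing while disabled."""

    def __init__(self, table, func_name):
        self.table = table
        self.func_name = func_name
        self.original = None  # set only while the probe is enabled
        self.hits = 0

    def enable(self):
        if self.original is not None:
            return  # already enabled
        original = self.table[self.func_name]

        @functools.wraps(original)
        def instrumented(*args, **kwargs):
            self.hits += 1  # collect data, then run the original code
            return original(*args, **kwargs)

        self.original = original
        self.table[self.func_name] = instrumented

    def disable(self):
        if self.original is not None:
            # Put things back exactly the way they were.
            self.table[self.func_name] = self.original
            self.original = None


if __name__ == "__main__":
    probe = ToyProbe(handlers, "open_file")
    probe.enable()
    handlers["open_file"]("/tmp/example")
    probe.disable()
    print("calls seen while enabled:", probe.hits)
```

While the probe is disabled the table holds the untouched original function, which mirrors the point that an inactive probe has no effect on the running code.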
So the change that it actually makes, the way that it patches the code, is it adds some instrumentation, and by instrumentation, I basically mean it collects data about that execution. So it can collect how many times a given piece of code ran, it can collect the contents of variables that were in effect when the code ran, and, yeah, you can basically collect any piece of information about the state of the system every time a given probe is encountered. And because of the way that it works, because it actually leaves the code alone when it's not in use, it has zero effect when it's not enabled. And when it is enabled, it basically just pulls the code apart and sticks a little piece in, and then, yeah, grabs a piece of data, and that gets undone when it finishes. So once the data is generated by this patch, it gets filtered, summarized, and displayed. So basically you can have any number of DTrace probes active at any given time, and all of them are feeding the data that they collect in through a central filter and collector. And then that gets either filtered or summarized, or any kind of query can be done on the data that's provided by the DTrace probes. And the way that you do that is using the dtrace command and the D programming language, which are part of DTrace. And I'm going to give lots and lots of details about the dtrace command and the D programming language, but not today because I only have another five or 10 minutes. So I'm here to tell you about the DTrace workshop that I will be doing on Sunday, just up the road, for about four hours, and it will basically be the extension of this talk. We'll be going into lots of details about DTrace. It will cover the D language in excruciating detail, and there'll be lots of examples and hands-on exercises and all this great stuff.
And if that sounds exciting to you, talk to Hasgeek. You will need to have your own laptop with you that either has an operating system running that supports DTrace, or you can run an operating system like that in a VM and that works fine too. So if you're interested in that, come along on Sunday. But for now, I'm just gonna give you a little bit more information about DTrace and then I'll stop for questions. So there's basically three components. I mentioned probes. Probes are where the data is collected. Providers are groups of probes. And then consumers are the part of DTrace that's actually reading all that data and summarizing it. So probes are basically like, what does DTrace know how to instrument? Because it's not really safe to just randomly modify pieces of code anywhere in memory. It's actually a little bit risky. So the way that DTrace does this safely is that it has certain kinds of code that it knows how to modify safely. For example, the C compiler generally compiles the entry points to functions the same way every time. In a given CPU instruction set, when you enter a C function, the compiler outputs a certain preamble that looks the same for every function entry. And DTrace knows what that looks like on the architecture it's running on. So if you wanna put a probe on a C function, DTrace is like, yes, okay, I know how to patch this to do that, and so there is a provider that knows how to patch C functions. That's called function boundary tracing, and the provider that knows how to do it is called the FBT provider. And as a developer, as a system developer, you can add your own DTrace probes, you can make new probes for different kinds of things, and the provider allows that to happen.
Basically you make a provider that knows how to put a probe on a certain type of thing, and then you can have probes within that provider. And then consumers, basically that's an abstraction. All of those providers feed the data that they collect into the same funnel, and then the consumers read from that funnel, so that no matter what the provider was or what kind of probe it is, it's all summarized and filtered through the same abstraction. So basically every probe has a name. You name it by the provider, the module, the function and the name, and that's probably too much detail for now. Is that the five minute warning? Okay. So I'll just show you what that looks like. I don't know if you guys can see my screen, but if I run dtrace -l, this will just show me all the probes that are available on my system. So every probe has an ID number, it shows the provider, it shows the module. The module is basically like the name of a piece of code where the probe was installed. And then in this particular case, the FBT provider is that C function boundary tracing provider that I mentioned before. So these are all C functions, and then the function is the name of the C function, and you can add a probe at the entry point or the return of that C function. And so this goes on and on and on. If I go to the end of the list, it's about 49,000 different probes that are available on this particular system, which is a FreeBSD 10 machine. You might see even more if you have a Mac. Yeah, I can also basically use a command like this one here to show the number of probes per provider. I'm not gonna actually do that. And I'll just show you one example. So let's say we wanna know every time a particular system call is made. In this case, we'll use the open system call, and I'm just gonna run this command. So I'm saying, I wanna know every time the open system call is entered. DTrace tells me, okay, matched 2 probes. It's running right now, it's probing them. And you notice nothing's happening.
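The four-part probe names and the per-provider count mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of DTrace: the probe list is made up for the example (a real system lists tens of thousands of probes), the helper names `ProbeId`, `parse_description`, `select`, and `probes_per_provider` are hypothetical, and the matching is simplified to "an empty field matches anything". Real DTrace probe descriptions also accept glob patterns and shortened forms, which this sketch ignores.

```python
from collections import Counter
from typing import NamedTuple


class ProbeId(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_description(text):
    """Split 'provider:module:function:name'; an empty field is a wildcard."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError("this sketch expects exactly four ':'-separated fields")
    return ProbeId(*parts)


def select(description_text, probes):
    """Return the probes whose fields match every non-empty description field."""
    wanted = parse_description(description_text)
    return [
        probe
        for probe in probes
        if all(want == "" or want == have for want, have in zip(wanted, probe))
    ]


def probes_per_provider(probes):
    """Count how many probes each provider offers."""
    return Counter(probe.provider for probe in probes)


# Made-up sample; the entries are illustrative, not copied from a real listing.
SAMPLE_PROBES = [
    ProbeId("syscall", "freebsd", "open", "entry"),
    ProbeId("syscall", "freebsd", "open", "return"),
    ProbeId("syscall", "freebsd32", "open", "entry"),
    ProbeId("fbt", "kernel", "vn_open", "entry"),
    ProbeId("fbt", "kernel", "vn_open", "return"),
]

if __name__ == "__main__":
    for probe in select("syscall::open:entry", SAMPLE_PROBES):
        print(":".join(probe))
    print(dict(probes_per_provider(SAMPLE_PROBES)))
```

Leaving the module field empty is why one description can match more than one probe, as in the demo where a single request matched two.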
That's because my system's idle. It's not actually opening anything. But if I, for example, open a new terminal screen, bam, there's a million of them that just happened. Now, it's not giving me any information other than that it happened, but you can tell DTrace, oh, well, when this happens, tell me which process it was or which file was opened or what time it was or any kind of information. And that's what I'll be showing you how to do on Sunday. So this is just a toy example. And you can get a lot more complicated. So the D language allows you to do conditional probes, speculative tracing, and on and on, super useful. But yeah, that's basically DTrace, and I will wrap up there and pause for questions. Do you have any non-toy example, even if it's just a screenshot or something about aggregated, you know, you have a bunch of traces that you run and you've aggregated that. I do, but I don't have them anywhere that I can find and show you in the next 30 seconds. So I think I will repeat that it's a great opportunity to come on Sunday and see. Well played. I will say that if you Google it and look, you will have no trouble finding examples. You can do, for example, I know what you're talking about, you can do things like, say you notice that opening files is taking a long time on such and such file system, or connecting to a network socket is taking a long time. You can ask DTrace to calculate the amount of time that each one takes and basically summarize and give you an aggregate and say, well, 90% of the time it only took 10 milliseconds, but 10% of the time it was over 200 milliseconds, and it gives you these nice histograms and graphs and it's, yeah. And I could show you that but I haven't prepared those, I'm sorry. Yeah, thank you, Todd. Okay. Please take all your other questions to him offline. There's a car blocking the movement of other cars. The number of the car is AP 10 AP 6652. We request the owner to move the car. The number is AP 10 AP 6652.
And if you like, Todd's talk.
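The latency summary described in the Q&A (most operations near 10 milliseconds, a few over 200) can be sketched in Python. This is a rough illustration of a power-of-two histogram in the spirit of the D language's `quantize()` aggregating function, not DTrace's own code. The sample latencies are made up, and the names `bucket_floor`, `quantize`, and `render` are hypothetical helpers for this example only.

```python
from collections import Counter


def bucket_floor(value):
    """Largest power of two that is <= value (0 stays in its own bucket)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


def quantize(values):
    """Group values into power-of-two buckets, keyed by each bucket's lower bound."""
    counts = Counter(bucket_floor(v) for v in values)
    return dict(sorted(counts.items()))


def render(histogram, width=20):
    """Draw one text row per bucket, with bar length proportional to its share."""
    total = sum(histogram.values())
    rows = []
    for floor, count in histogram.items():
        bar = "@" * round(width * count / total)
        rows.append(f"{floor:>6} |{bar:<{width}} {count}")
    return "\n".join(rows)


# Made-up latencies in milliseconds: nine fast operations and one slow outlier.
SAMPLE_LATENCIES_MS = [9, 10, 10, 11, 10, 12, 9, 10, 11, 230]

if __name__ == "__main__":
    print(render(quantize(SAMPLE_LATENCIES_MS)))
```

Power-of-two buckets keep the summary small even when values span several orders of magnitude, which is what makes a slow tail stand out next to the common case.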