I'm the CTO at 8th Light, a software consulting and custom development shop. One of our core principles as a company is learning and improving at our craft, so it's great to be here at RubyConf among a bunch of people who, just by coming to a conference, are clearly motivated to get better at what they do.

We as a company, and I as an individual, have historically focused more on application development: web, mobile, desktop, back-end services. But over the past couple of years I've gotten more and more interested in the systems side of the stack, and in particular in performance. Things like: how do I make this system go faster? Why is this thing so horrifically slow? What can we improve? And I've noticed over that time that I've had enough wrong ideas and intuitions about performance that they've added up and made it very hard for me to trust my own intuition when I do run into a performance issue.

So I find it pretty unsatisfying when I, or somebody like me, sees a slow application and instantly jumps to solutions. Solutions like: rewrite it in your favorite language — switch from Ruby to Go. Use a different database — switch from Postgres to Mongo, whatever. Write a bunch of caching logic. The list goes on; I'm sure you can imagine your own. I've just seen too many times that it takes a while to build the new solution, and hopefully it does the job — hopefully it solves the problem. But whether it does or not, you've invested that time, and the rewrite, that big change, is definitely going to uncover new problems of its own once it's been implemented.

Based on observing these tendencies in myself over time and reflecting on them, I'm investing more time these days on the other side of that cycle: understanding problems more fully, really grasping all of the pieces that are causing an issue before I start to think about solving it. Or at least that's what I'm attempting to do. So in this talk I'm going to share a bit about why, and a lot about how, I've been trying to understand performance problems in particular better. It's probably no surprise, since it's part of the talk title, that we're going to explore how to get insight into problems using DTrace.

But first, let's talk about what DTrace actually is. DTrace is a dynamic tracing and observability system. That's a fancy way of saying it lets you see, in depth, what's going on in your computer at any given time. It comes from the Solaris/illumos world; it's also on FreeBSD and Oracle Linux, and it ships with and works on OS X — or macOS now, I guess. There are some caveats as of El Capitan: if you want to start using it there, there are some hoops you have to jump through, and I'll have a link at the end showing what those hoops are. But you can run it on Macs, and that's the angle I've taken with DTrace. Everything you're going to see is in the context of me running locally on my development machine. DTrace is built for production use, but local development is the context I'm operating in.

OK, so DTrace. If you're familiar with strace on Linux, it can be convenient to think of DTrace as basically a fancier version of strace.
If you're not familiar with strace, no big deal: strace basically lets you spy on the system calls being made by any given process. At a higher level, any time your application code asks the operating system to do something — open a file, read or write from the network — strace knows about it and can tell you. Julia Evans, incidentally, wrote a terrific zine on strace that I'd recommend everybody check out, whether you're familiar with strace or not; it's great stuff.

So DTrace lets you spy on system calls as well. In fact, that's a pretty common workflow, especially when you're first starting to investigate an issue. But I said fancier — so what's the plus-plus part? DTrace lets you write programs to trace events. In this initial case those events are all system calls, but your programs can filter, aggregate, and take other actions based on the events being traced.

The programming language you use for this filtering, aggregating, and other actions is pretty limited — it's a bit of a weird language. It's called D. It's not the same D that's the successor to C and C++; it's a whole different thing. It's a very limited language, and that's by design. The programs you write to trace these events run in the kernel — a very heavily protected area of the operating system with permission to do essentially anything — but DTrace is designed to be safe to run on production systems. Built for production, built to be safe for production. By limiting the language, the DTrace implementers get the opportunity to prevent you from doing things that might go badly. For instance, there are no loops in the language. When I discovered this I thought: well, this is crazy — how can I get anything done without loops? So there's some convenience you trade off, but this way DTrace has an easy time ensuring that you don't write an infinite loop, for example, and that your program will actually halt. The idea is that it's safe.

So what does running DTrace actually look like in practice? You can fire up a terminal, run dtrace, and give it your program as input — this is just me typing a program essentially on the command line. You can put the program in a file, of course, for more complicated programs; you might even do that for a program as short as this one. If you've written or seen any awk, you might notice some similarities here — D is heavily inspired by awk.

This program traces two different kinds of events: system call entries (syscall:::entry) and the DTrace session ending — so when you've traced for a while and hit Ctrl-C, boom, the dtrace:::END event fires. The colons indicate a few levels of namespacing, essentially, and inside the curly braces is the action that runs when the event fires. In this example we're aggregating all the system calls across the machine — system-wide, not just the system calls made by one specific process, but by any process, which is kind of cool. We aggregate them by two keys: probefunc, which is the name of the syscall, and execname, which is the name of the program that called it. And when the program ends, it truncates the aggregation to the top 10 and prints it out.
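The slide itself isn't reproduced here, but based on that description, the one-liner could look roughly like this — a sketch, not necessarily the exact program from the talk:

```d
# Count every syscall on the system, keyed by syscall name and calling program;
# on Ctrl-C, keep only the top 10 (the aggregation prints automatically on exit).
sudo dtrace -n '
syscall:::entry { @counts[probefunc, execname] = count(); }
dtrace:::END    { trunc(@counts, 10); }'
```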
Now, I don't expect you to come out of this talk knowing everything about DTrace and be able to just hammer out programs, but hopefully this gives you a good enough idea of what's going on that you know where to go next.

When we run it, and hit Ctrl-C to end the trace, it looks something like this. As you can see, Slack shows up a lot here — it's probably spending a lot of time rendering animated GIFs. Actually, I remember when I ran this, that's exactly what it was doing: a bunch of animated GIF rendering, which is great. And these syscall names over here on the left might be a little opaque to you; they certainly were for me. Even after digging into DTrace for a while, I didn't know what half of these things were, so I looked at the man pages because I was interested in what they did — you can Google them, et cetera. This is going to be a theme throughout the talk: you don't have to know everything about systems to get value out of tools like this. You can use them as learning tools.

OK, so stepping down a little further from the system-call boundary between applications and the operating system, you can also look at lower-level system resources — things further downstream, like the disk or the memory system — and trace events on those. As an example, we can see how many page faults requiring I/O are happening, which gives us some idea of what memory usage is like and which processes are actually causing those page faults. In this case, when we run the trace, Google Chrome and mds_stores are the two programs — the two execnames — triggering the most page faults. We could aggregate in different ways and drill in further to investigate more.

And then going back in the other direction, away from the system and back towards application land, where I am happier: you can add your own probes. Probes are essentially names for events that fire when they happen, and developers have already added them to some widely used applications and dynamic runtimes. Postgres has probes built in; MySQL does; Java, Erlang, and of course Ruby have them built in as well. There are also libraries out there for Ruby and other languages that let you write your own probes and trace even more domain-specific events, without having to drop down to the C level or anything like that. Think of this statically defined tracing — these user-defined probes — as almost like adding log statements, except you've got a log aggregation and search system, like the ELK stack, already set up and ready for every application on your system. It's pretty cool. I mean, there's no persistence, so it's not a perfect metaphor, but logging is a decent way to think about it.

OK, so an example: you can trace things that make sense, as I said, for a particular application or language runtime. Here I'm tracing all the events whose provider — that first piece of the probe name — starts with mysql, and we do similar truncation and printing as before.
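Going back to the page-fault example for a second, a minimal sketch of what that trace could look like, assuming the vminfo provider and its maj_fault probe (which fires on page faults that require I/O) are available on your system:

```d
# Count page faults requiring I/O, keyed by the program that caused them.
sudo dtrace -n 'vminfo:::maj_fault { @faults[execname] = count(); }'
```

And here's roughly the shape of the MySQL trace. This is a sketch assuming a MySQL server built with its DTrace probes enabled; the USDT provider name is mysql followed by the server's pid, so a wildcard matches it:

```d
# Count MySQL's internal USDT events by probe name; keep the top 10 on Ctrl-C.
sudo dtrace -n '
mysql*:::    { @events[probename] = count(); }
dtrace:::END { trunc(@events, 10); }'
```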
So the MySQL developers wrote some code in the MySQL server that lets DTrace see these internal events happening while I run a few simple queries — things like query execution, parsing, select statements starting and completing, and reads and writes from the network. These more domain-specific things can be interesting to trace.

OK, and here's where things get really cool and, for me anyway, almost mind-breaking. You can trace arbitrary functions — even in the kernel — without the original developers of an application writing any tracing code at all. DTrace can dynamically instrument running code. By that I mean, literally, you can be running your application, with zero performance cost from DTrace, then essentially flip a switch, spin DTrace up, and dynamically insert instrumentation into the running code on the fly, in the already-running process. This is possible because DTrace runs in the kernel and is able to manipulate things as it sees fit. It's really sort of magic, and it's a thing that sets DTrace apart from many other observability tools. The D in DTrace, incidentally, is for dynamic — it's for this magic dynamic-tracing bit.

OK, so here's an example of that. This code traces and times every kernel function called while the trace is running. There are two probes: any kernel function being entered, and any kernel function returning. The first one sets a thread-local variable (the self-> bit), and the second one checks that the variable is present with a predicate — the part that looks sort of like a regex between slashes; it's a guard statement that lets you filter events. The idea is that if the return event happens to fire first — if you happen to start your trace mid-call and a return fires before its entry — you don't end up with weird numbers. So we make sure the function has been entered, then when it returns we compute the duration and aggregate the durations up. We aggregate here with this quantize function, which is a cool thing that shows you a frequency distribution of how long each call took.

And this kind of blows my mind: we can actually look at how many nanoseconds different kernel functions take to run. Here it's just an aggregation, but we could drill in and see which kernel functions take the longest, and if we were interested we could even download the kernel code — the OS X kernel source is available online. The Mac ecosystem isn't completely open source, but the kernel is, which is pretty cool.

It gets even wackier: you can trace not only function entry and exit, but all the way down to the instruction level — two or three assembly instructions in, however many bytes offset from the start of the function, and you can trace that event happening. Personally, as an application developer, I don't have a lot of use for that, but I can imagine that if you're further down the stack, reading and writing a lot of assembly, it could be really cool to look at.

So DTrace gives you a lot of ways to get insight into what's going on under the hood. What does it actually look like in practice to investigate performance problems using it?
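A minimal sketch of that kernel-function timing script — the fbt provider does the dynamic instrumentation here, and I'm calling the thread-local variable self->ts. (Instrumenting every kernel function is safe, but not free; expect noticeable overhead while it runs.)

```d
/* Time every kernel function call and build a latency distribution per function. */
fbt:::entry
{
    self->ts = timestamp;   /* thread-local entry timestamp, in nanoseconds */
}

fbt:::return
/self->ts/                  /* guard: only fire if we saw the matching entry */
{
    @ns[probefunc] = quantize(timestamp - self->ts);
    self->ts = 0;
}
```

Saved to a file, you'd run it with something like sudo dtrace -s kernel-times.d, and the per-function distributions print when you hit Ctrl-C.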
I think all of us have had experiences where code ran way slower than we think it should, and we just didn't understand why. The thing's slow, and it's upsetting, and it's frustrating, honestly. Jeff Hodges has a great quote in an article where he says, "'It's slow' is the hardest problem you'll ever debug." And I think that's a really, really good point. His scope is broader — he's talking about distributed systems, where there's a lot more going on — but this stuff isn't easy even on one machine.

OK, so for the rest of the talk we're going to look at an issue that happened to me in real life. At the time, I didn't know how to use DTrace at all, so the real version would be terrible to watch — me just stumbling around in the dark. Instead, we're going to look at an idealized version of how you could investigate this performance problem with DTrace.

So here's the deal. My team has some tests that are intermittently really, really slow — like 30 seconds, when we expect milliseconds, maybe even less. Computers should be fast. Really, really fast — not 30 seconds to execute a single test. And we don't know what's going on; it's not clear what it's doing. We rely on fast feedback from our tests so that we can knock out bugs before they get shipped to production, and fix them quickly when they do, without waiting two hours for feedback. We know it's not just one test that's affected, and not just one person. Anecdotally there seem to be a few people who have it worse than others, but we're not really sure. It's not clear if it's the features they're working on, or if those people are just more vocal about the problem. We think it might be worse in one part of the team, but we're not really sure. Anyway, this test suite is taking way, way too long — it's demoralizing, many minutes of runtime for trivial changes — and we rely on fast feedback.

OK, so let's make a quick mental note, and feel free to yell out some ideas: what could be going on here? What are some hypotheses? What could cause really, really slow — 30-second-level — wait times? [Audience suggestions follow.] Database locks — cool. Running out of memory — great, yeah. Just yell them out, no need for me to acknowledge. N+1 queries — great. Network calls? Cosmic rays — excellent. Cache misses — right, yep. Slow disk I/O, connection pool timeouts — yeah. We can enumerate lots more: garbage collection, and, like the cosmic-ray thing, it could be that somebody wrote a sleep that happens to fire randomly every Tuesday and Thursday on the hour. Who knows, right?

There are a bunch of things that could be going wrong, that could cause really slow programs, but we don't have enough information yet to do much more than hypothesize. And here's where I'm trying to make that shift from immediately jumping to implementing solutions to using some other methodology to understand the problem better. You can probably guess, from the context clues in this talk, that the next step is going to be to use DTrace to gather more data and understand the problem more fully.
And that data will either support or undermine our hypotheses; either way, we can generate new ideas based on what we learn. Starting with a hypothesis and seeing where it leads isn't the worst idea in the world, because we now have the tools to look.

OK, so what are the usual suspects? For me — this is a Rails app — my go-to "I know what the problem is" instinct is that it's either going to be the database or slow garbage collection. Both in Ruby and on the JVM, I always think it's going to be one of those two things. That's my instinct, my gut reaction, and a lot of the time it would be right. So let's test that idea. Let's start with garbage collection.

We can calculate the duration between garbage-collection begin and end events. Here we're using Ruby's static tracing probes to see when the GC mark phase begins and ends, and then — not aggregating, actually, just printing — the duration every time it happens. We could do the same thing for the sweep phase; in fact, in the output you'll see exactly that for the sweep phase too. This isn't the only way to learn about garbage collection behavior — we could add Ruby code to do it, and there are a lot of other tools — but this works. There's not a ton in the code that's super interesting or new, except that since we're dealing with nanoseconds, we divide by a million to get nice, human-readable (or at least column-readable) milliseconds.

And here's the result. The problem definitely surfaced here — the 30-second pause definitely did happen. I'm not going to try to trick you with any of the traces we see from here on out; you can assume that every time I show you a trace, the 30-second pause, the problem, actually did occur. But the slowest GC pauses we saw — the slowest GC phase runs — were on the order of tens of milliseconds. 38 milliseconds, I think, was the biggest — no, there was a 71-millisecond one there. But still: tens of milliseconds, nowhere near the 30-second culprit we're looking for.

So this is sort of interesting. Let's back up for a minute and think about this problem at a higher level. When something is slow, it often means that some resource is limited. In fact, it may always mean that some resource is limited, whether hardware or software, but I'm not prepared to make that argument for you today. Resource limitation is an interesting way to think about performance issues. Brendan Gregg, a terrific performance analyst, has this thing called the USE method — utilization, saturation, and errors — that elaborates on what resource limitation means, and more, which I definitely recommend checking out. The limited resource could be a software resource like a lock or a connection pool, or a hardware resource like CPU or memory or something else. So let's back up and think about this problem in terms of resources — figure out which resources are limited — starting at the system level. What system resources might be having issues? Here are a few; this isn't all of them, but they're some we could take a look at, and we can do that with DTrace.
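Before we move on to those system resources — here's roughly what that GC mark-phase script could look like. A sketch, assuming a Ruby built with its DTrace probes enabled (the probe names come from Ruby's documented static probes):

```d
/* Print the duration of every GC mark phase in the target Ruby process. */
ruby$target:::gc-mark-begin
{
    self->ts = timestamp;
}

ruby$target:::gc-mark-end
/self->ts/
{
    /* timestamps are nanoseconds; divide by a million for milliseconds */
    printf("GC mark: %d ms\n", (timestamp - self->ts) / 1000000);
    self->ts = 0;
}
```

You'd attach it to a live process with something like sudo dtrace -s gc-mark.d -p <ruby-pid>, and swap in gc-sweep-begin/gc-sweep-end to watch the sweep phase the same way.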
So, CPU is a pretty important resource. We can sample the on-CPU processes 997 times a second and aggregate the counts of the executable names that were caught, in essence, on CPU — how often each one shows up. And it turns out that a program called kernel_task, way down here at the bottom, is by far the most often on CPU. We might not know what kernel_task is, but we can probably surmise it has something to do with the kernel, and we can drill in and figure out what that program is doing by grabbing kernel stack traces whenever kernel_task is the thing on CPU. You can also grab userland stack traces — there are some caveats around symbol resolution and compiler flags that we don't really have time to get into right now, but you can get user-space stacks too.

So we're looking at kernel stack traces here, and if we drill in and see what they are, we see it's pretty much all idle. At the very bottom of the stack is machine_idle — that's the name of the kernel function being called — and from the good naming here we can pretty much guess that nothing's going on. The CPU isn't burning; this is just the stack that's on CPU when nothing is happening. And if we bounce back up to the full list of programs that were on CPU, our Ruby process only shows up three times in all the time we were tracing. So there's really not much going on CPU-wise here.

To me, this sort of discovery might feel like a bit of a letdown — but it's not, and I want to shift everybody's head if you're thinking it's a bummer that we haven't found the issue. This is an awesome discovery, because now we can rule out any hypothesis that involves Ruby being busy on CPU. That's tight loops, infinite-loop sort of stuff; that's too many threads competing for CPU; that's garbage collection. (If it's not obvious to you that those things are CPU-intensive, you could write some test programs and see how they behave to convince yourself.) Those are all CPU-intensive, and we've just ruled them all out, and more besides. It's even better than ruling out Ruby being on CPU: this rules out anything CPU-intensive at all. That's database result-set sorting; that's anything out of process that would use CPU heavily.

I think the meta-lesson here is that starting early with these broad resource questions is a good strategy. It feels a bit like a binary search, right? You have a wide search space and you want to trim away as many options as possible at each step. If you can investigate closer to log N of the possible performance issues, you're going to be a lot happier than if you have to investigate all N of them.

OK, and we can rule out memory and disk slowness with other DTrace scripts — so those aren't the issue either. That rules out tons of possibilities.
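For reference, the on-CPU sampling at the start of this step could look something like this — a sketch using DTrace's profile provider:

```d
/* Sample what's on CPU 997 times per second, across the whole system. */
profile-997
{
    @oncpu[execname] = count();
}

/* Drill-down variant: kernel stacks, but only when kernel_task is on CPU.
profile-997 /execname == "kernel_task"/ { @stacks[stack()] = count(); }
*/
```

The odd 997 Hz rate, rather than a round 1000, is a common choice to avoid sampling in lockstep with timed activity on the system.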
So maybe we take a look at networking next. We know we definitely make a lot of calls to the database, like any good Rails app — networking is a thing that happens a lot in our app. And if none of this pans out, we can investigate more system resources, or go over to the software side. So let's start with networking: let's trace all the socket connects and see what's slow. There's no code on this slide — just a URL. And the reason is that it would take me a long time to write something like the script behind that URL; it would take me a while to get there. But that script works, it works well, and it uses syscall tracing to give you socket connection latency. I show you this because I want to emphasize that you don't have to write everything yourself. You don't have to be a systems wizard, or even invest a ton of time in DTrace, to get a lot of use out of the tools it provides. A lot of scripts come with your operating system, and others you can pull down from GitHub and elsewhere.

OK, so here's the result when we run it: our database connections to 127.0.0.1 are pretty fast. The longer latencies we do see are like 20 to 40 milliseconds, which is longer, but it's not like there are thousands of them. (And I see that I've got some typos here, with the wrong executable name for the process.) On the other hand, it is interesting that we've got these connections to an external host — that 72.52.4.119. I don't recognize that IP address; I expected only local connections to my local database. We can grep around our code and our configuration files for that IP address, and we find it's not anywhere — it doesn't appear anywhere in the code base. So it makes us wonder: what is this IP? What is this host?

In order to figure out what hostname an IP maps to, we can do something like a reverse DNS lookup — which, as we found out, isn't always reliable; this one didn't map back to anything in our code base either. But DNS lookups are typically over UDP, so we could watch for DNS queries that might map back to this IP — essentially running the process backwards, tracing back from where we ended up. After digging into UDP tracing for a while I got a little overwhelmed, because there's a lot of UDP traffic on my machine at any given time, plus some asynchronicity to deal with. But after a little more study and thought, I learned that most DNS lookups go through a C library function called getaddrinfo. So we can use that fact, plus dynamic tracing, to figure out which hosts are being looked up and how long the lookups take.

We dynamically trace getaddrinfo for the process ID we're interested in, grabbing the timestamp and the host. There's a bit of a dance here: copyinstr is about copying the string a pointer refers to from user space to kernel space, so that it persists across the calls — it's a whole thing. If you want to write a bunch of scripts you'll have to learn a bit about it, but don't worry about the details here. At the end, when the call returns, we print the duration and the hostname.

OK, and when we run it, sure enough, we find this someplace.com hostname getting looked up — and even better, we find that it takes around 30 seconds to resolve. And if we do the DNS lookup ourselves, we can see that it does in fact map back to the IP address that seemed to be the external one. So it takes 30 seconds to resolve. This isn't just an answer to "why are we making external requests?" — it's a solution to our entire problem. Incidentally, someplace.com isn't just me making up a fake URL for the talk; that's literally the domain that was in our code base. It was in the test code, not production code — don't worry, this was all completely isolated to tests.
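Here's roughly the shape of that getaddrinfo trace — a sketch using the pid provider, assuming getaddrinfo's first argument is the hostname being resolved:

```d
/* Time DNS lookups in the target process by tracing the C library's getaddrinfo. */
pid$target::getaddrinfo:entry
{
    /* copy the hostname string out of user space so it survives until :return */
    self->host = copyinstr(arg0);
    self->ts   = timestamp;
}

pid$target::getaddrinfo:return
/self->ts/
{
    printf("%d ms  %s\n", (timestamp - self->ts) / 1000000, self->host);
    self->ts   = 0;
    self->host = 0;
}
```

You'd attach it to the running process with something like sudo dtrace -s getaddrinfo.d -p <pid>.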
But this is the underlying issue. This is the whole thing — game over. The people getting the slow tests are getting slow DNS lookups for this host. We could drill in further and learn more about the networking, but we know what to fix now: the slow DNS lookup for this particular host. We can complain to our ISP, we can switch DNS providers, we can replace the someplace.com references with example.com (which is always fast), or we can do what any reasonable person would do and fake out the network calls in the tests, making sure we never connect to external URLs and guaranteeing that across the suite.

And it feels really good — now we fully understand the issue. It's super freeing to have the ability to go from something vague — "the tests are really slow sometimes" — to something very specific: DNS lookups for someplace.com are slow, and they're intermittent because people work from multiple locations, with flaky wifi and DNS caching in the mix. It feels really, really empowering to have made that jump down to the details.

OK, so let's take a step back here and review: what did we learn in this process? One answer is that we learned not to do two specific things, and that's absolutely true. One, don't connect to external services in the test suite; and two, don't have a bad DNS situation — make sure your DNS is fast. Those are absolutely correct, and they turned out to be the solution to our problem. But on the other hand, if we look for that specific problem, or problems in that space, every time something's slow, we're only going to be efficient at solving performance issues if those are the only problems we ever run into. And for better or worse, my intuition as a dev tells me to focus on whatever happened most recently, or whatever sticks out in my head from a blog post I read, or what have you — so those are the problems I can come up with solutions for very quickly. I think the bigger, better takeaway here is that by doing performance investigations in a more disciplined and principled way, we can make ourselves efficient at solving any problem, not just the ones that come to mind or happened recently.

OK, so DTrace is absolutely not the only tool we could have used to track this problem down, but it's versatile — there are many paths. DTrace gives us the ability to ask really broad questions, like "what's using CPU?", and really focused ones, like "how long does this specific low-level C function take to run?". And DTrace is programmable, which means we can use it to build our own custom systems tools. Most of these scripts I wrote for this investigation, but of course now I can just copy them into my bin directory and reuse them on future investigations, to answer the same question if it occurs again in this sort of — it's not really a binary search, but some sort of — search through the problem space.

And I want to encourage you that it doesn't take years of study to get great use out of DTrace or tools like it. You can get started with a few one-liners, like the ones I've shown here or the ones on the websites I'm going to link to later. You can get pretty far, and get a lot of insight, with just some thought about what questions you want to ask.
Learning about how systems work at a lower level than you need day to day can be really helpful for avoiding problems down the road — and of course when you're in crisis mode trying to solve issues.

OK, so where do you go, what do you do next, if you're interested in learning more? First, I would definitely recommend the free online resources. Brendan Gregg's blog in particular is a gold mine — everything he writes, not just the DTrace stuff — and the DTrace guide right below it is really thorough and, I think, pretty well maintained. It is focused on illumos, so if you're coming at it from a Mac perspective like me, there may be some things that don't work, or differences you have to infer. There's also a really good print book available that does have some OS X sections, and if you're like me and just really like having print books, or if you get through the examples above, I'd definitely recommend it.

And then of course, as a developer, I always learn a lot by reading other people's code, so it was really helpful for me to dig around in the scripts that other people have written — see how they work, tweak things, see how the output breaks or changes. A lot of these are totally approachable: they're literally files in your /usr/bin directory that are one line of code and a bunch of comments, so it's kind of cool to dig into them. There's a bit of a caveat in looking at DTrace examples people have put up on the internet, or that are on your system: depending on how a given script is written, it may be directly useful to you, or it may be out of date and just not work. And that's sort of a fundamental issue, right? DTrace gives you this enormous power to trace arbitrary functions, even down to the instruction level. So if a script does some of this dynamic tracing, and the function it dynamically traces is removed or renamed or changes semantics — changes the way it's called — in the next version, then that script is going to break, or not work as you expect. That's fundamental, but there are also more stable DTrace probes that you can count on working from one version to the next, and as you get into it, you'll learn which are which.

I think it's also worth extrapolating in two directions here: using DTrace for things besides performance investigations, and using things besides DTrace. For the first: personally, as I said, I'm using DTrace on my laptop, and I've used it way, way more for general learning about how my computer works than for performance investigations. But it's taught me a lot in that respect, and it's taught me things I can apply to future issues. And secondly, if you're on an OS that doesn't have official DTrace support, see what other tools are available to get your questions answered. Of course, many of us deploy to Linux, and there is a DTrace port for Linux — it's not an official thing unless you're on Oracle's distribution — and there's also SystemTap, which tries to solve similar problems. And there are some really exciting things in the latest Linux releases; if you read Brendan Gregg's blog, you'll know what I mean.
There are some really exciting things coming out around Berkeley Packet Filter in the 4.9-era kernels — really cool stuff as well. The IO Visor GitHub organization is where a lot of that lives.

So, DTrace has taught me a lot about performance, operating systems, and problem solving in general, and I hope it helps you too. Here are a few resources to check out. These slides will be on Speaker Deck, and I'll tweet when they're online, so if you follow me on Twitter you can get notified when they're up. I'd love to take any questions you have — looks like we've got some time left — and please grab me any time you want to talk throughout the conference. I always love meeting new people and talking about technology, and especially come say hi if you're interested in talking more about 8th Light and how we help companies build and maintain reliable and flexible software. We're in Chicago, London, LA, and New York, and we'd love to talk. So thank you very much, and I've got a little time for questions.

[Q: Could you integrate something like DTrace into a testing framework?] Yeah, I can imagine doing so, but there are a few issues that would probably make me shy away from it. One is that the DTrace program you're running is typically external, so you'd almost have to shell out and run it. It also needs root permission, so you'd have to shell out and sudo, which means your test suite would have to be running as root — which I probably wouldn't advise. But I can definitely imagine that sort of thing being useful. Maybe you want to say there should be no DNS calls ever made, period, and run that trace. The orchestration would be tricky, right? Because you'd have to spin up two processes now: your DTrace script and the test suite.

[Q: How do you learn which syscalls are relevant?] Yeah, so my approach so far has been to essentially spy on all of them. There's a program called dtruss, built on DTrace, that gives you a picture of all the system calls being made. What I've done so far is run a trace against a program whose behavior I'm interested in, see what syscalls are being made, and then look at the man pages — which are pretty extensive — for each syscall. And DTrace will also, for things like file opens, give you the name of the file that comes out (it might be another tool besides DTrace that does that). So a lot of it has just been observing what happens on workloads that I understand — I've made some small program to do a thing, and then I make sure I understand the stuff I want to understand when the full output comes out the other end.

[Q: How do you get from man to the name of the syscall?] The man command gives you the option of passing two arguments, where the first argument is the section of the man pages — and section 2 is the syscalls section, so something like man 2 open.

OK, I'd be happy to take more questions up here, or, as I said, to talk with any of you afterwards. So thanks a lot for coming — I appreciate it.