Hi everyone. John and I are going to be talking about diagnostics improvements in .NET Core 3.0. I'm John Salem, an engineer on the .NET runtime team, and I work on diagnostics. And I'm Sourabh Shirhatti, a program manager on the .NET team. So let's go ahead and get started.

Here's what we're going to cover today. We'll talk about the new features in the runtime: the new diagnostics server that lets us egress this rich diagnostic information, and the IPC protocol it speaks. We'll also cover the new diagnostic tools we've built that let you consume this information. We'll go over dotnet-counters for metrics and dotnet-trace for collecting traces; we'll also visualize sampled CPU stacks collected with dotnet-trace using Speedscope, and we'll look at dotnet-dump as a way of collecting and analyzing process dumps for .NET applications. And then finally we'll recap and tell you how to acquire these tools so you can do everything we show you today yourself.

So for the 3.0 release we've added a diagnostics server into the .NET runtime itself. It communicates over a platform-appropriate transport: a named pipe on Windows, or a Unix domain socket on non-Windows. On top of that transport it speaks a custom IPC protocol, and we've built tools on that protocol that are capable of, like Rob said, creating dumps on Linux, attaching to EventPipe, and attaching profilers.

So if I understand correctly, you've prepared a demo for us today: an image processing app that had a few performance issues you ran into as you were building it. We're going to run through the tools, solve those issues, and everyone can see how we use these tools to do it. So can we jump over to John's computer right now? Oh, perfect. Yeah.

You see here the beginning of a basic little console app. All it's doing is taking an input image, in this case the .NET Conf headshot, and outputting what we call a PPM file. This is a classic image format, and it's very simple: a couple of ASCII characters as a header indicating what it is, followed by every pixel written out as ASCII plain text, delimited by whitespace. It's the classic CS 101 type assignment to generate these files. So can we go ahead and have a look at your app run? Yeah, let's go ahead and run it.

So John's running his application, and he's going to start it up with the image he just showed you. As you can see, he has a couple of transforms he can run, and he's going to run the aptly named PPM slow transform because, well, it's a little slow. And as you can see, no matter how long I talk, this thing is still processing just one image. So John, how do I get more visibility into what's going on? This is my 10,000 foot view of the world; I first want to know that something is wrong. So how do I get these metrics while your application is running? Well, we already know it's slow, but the runtime produces a whole bunch of other interesting data that we can take a look at using one of our new tools, dotnet-counters.

So John's going over to another terminal window. He's running dotnet-counters and passing it the process ID of his already-running application. Once it loads up, you can see that he's connected, out of process, to his existing application and is getting live updates to its event counters.
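For anyone following along at home, the invocation John used looks roughly like this; the process ID is a placeholder, and System.Runtime is the default counter provider:

```
# Attach to a running .NET Core 3.0 process and monitor its runtime counters.
# 12345 stands in for the real process ID.
dotnet-counters monitor --process-id 12345 System.Runtime
```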
And when he goes ahead and runs his application, you can see these counters just light up. You can see the CPU usage climbing; it's at about 22%. He has a lot of GC collections happening, and the allocation rate is really high. If I were looking into this, the allocation rate alone would make me suspicious. So why are you allocating that much, John? I don't know. Can we look at your app? Let me see what's going on. Yeah, let's go ahead and take a look. So if we scroll on down, you'll see we're doing a whole bunch of stuff: we're loading the image, we're mutating things, we're appending to a string. It's really hard for me to tell from this point of view what exactly is going wrong. So how about we recap what we learned about metrics, and then we'll look at whether there are other tools that can give us more detailed information about what was happening while your application was running.

So, I'm hoping my screen's back up. Let's go ahead and talk about metrics. The .NET runtime and ASP.NET Core both emit event counters, and you can add your own custom counters using the EventCounter API. These event counters can be consumed both in process and out of process. In process, there's the EventListener API, and that's in fact what the Application Insights SDK uses to give you counters on .NET Core 3.0. Out of process, you can talk to the diagnostics server, which is exactly what you just did with the dotnet-counters tool. Our tool only showed you the live values for those metrics, but the runtime actually computes additional stats, so you can get count, min, and max for all the metrics that are emitted. You can think of event counters as the successor to perf counters, and they work everywhere .NET Core does: unlike perf counters, which are Windows-only, event counters work on Linux and macOS, and unlike perf counters they're usable in low-privilege environments. You can use them in an xcopy deployment, and you don't need administrative privileges to get access to those counters.

So that was good: we got a view into what's happening in the runtime, and we know we're allocating a lot. If I were investigating this, my next questions would be: can I see what was actually on the stack? Can we see what's taking up CPU time? Can we see what's actually causing those allocations? So if we switch back over to John's computer, John's got his application back up and running. And are you going to introduce us to the next tool?

Yeah, so I'm going to go ahead and run dotnet-trace and we're going to rerun that slow transform again. That's going to give us a view, just like you're asking for, of what was on the stack at a given point in time. So John's running dotnet-trace, pointing it at the process ID of his existing, already-running application, and this is going to do a couple of things. As he hits enter, you can see one of the providers enabled by default is the sample CPU stack profiler. What's happening right now is that the runtime suspends all running threads every millisecond, walks their stacks, and emits the sampled CPU stacks. In addition to the sample CPU profiler, you can enable additional providers. We ran it with the default arguments, so we didn't do any of that here, but if you're used to using ETW and collecting other events from the runtime, you can just pass them as parameters; it's the same keywords and the same filtering you'd use for ETW events. So John, I see the output was this .nettrace file.
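As a rough sketch of what that looks like on the command line (the process ID, keyword mask, and level here are illustrative, not the values from the demo):

```
# Collect a trace from a running process; the CPU sampling profiler is on by default.
dotnet-trace collect --process-id 12345

# Optionally enable extra runtime providers, ETW-style, as provider:keywords:level.
dotnet-trace collect --process-id 12345 --providers Microsoft-Windows-DotNETRuntime:0x1:5
```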
So can you tell me what this .nettrace file is and how we'll consume it? We send events across EventPipe using a format known as nettrace. Now, if you're on Windows and you're running Visual Studio or PerfView, you can just open that file up. It has a ton of information, like you said; most things you can send across ETW, you can send into these files. But because I'm not on Windows, I need another way of opening this file, and so for now we have a way of converting these files into other formats. Right now we support Speedscope, so we're converting to the Speedscope format for the purpose of visualizing CPU stacks in Speedscope. If you were in Visual Studio, you could open it there. If you were on Windows, you could also open it in PerfView and get the detailed activity views, event counts, and all the other things you're used to when you work with ETW on Windows.

So John's gone ahead and opened this collected CPU trace in the Speedscope application. John, tell us more about Speedscope: how do I get it? Speedscope is an open source application; you can find it on GitHub, and the author also has it hosted at www.speedscope.app. I actually have it installed locally here on my laptop using the new Chromium-based Edge, so it's running here on my box as a PWA.

So now we're at the part of the exercise where I get to look at a bunch of colored lines on your screen and figure out what's wrong with your app. I'm going to ask you questions, and hopefully you'll point me at what I need to get to the root cause of this problem. Just reading from left to right, I'm seeing a bunch of things. I see this "unmanaged code time". What is that? Because it's the runtime tracing itself, we don't have insight into native code, so for most things past a transition into native code, you'll see "unmanaged code time". In this case, that's the top of the stack for a Console.ReadLine. So it's waiting on some kind of IO, interop, or something like that, right? Okay, that makes sense. So we know your app is slow, and we know the part we don't care about is waiting for you to type, because that's just you being slow, right? So how do I filter down to only the parts I care about? Where do I get to the meat of what was happening in your application?

So this is what we refer to as an icicle graph. If you're familiar with a flame graph, it's the same thing, just inverted, so the bottom frame here is the top of the stack at a given point in time. This is the time order view, which means we read it from left to right. If something's on the stack for a really, really long time, chances are we'll see a very consistent bar across, and in fact we do. If you take a look here, you'll see we have a ConvertToPPMSlow frame that's been on the stack for about 78% of the trace. If I double-click on that, we can scope down to just what was happening during that call.

So that's interesting. Now we're in the region of interest, and we don't have to sift through the data for things we don't care about. And if I recall correctly, we said we're processing an image, so I assume there's a lot of repeating the same operation over and over again on every pixel, which probably also explains why we have so many of those narrow icicles. Is there a way for me to view the cumulative impact of these operations? Can I stitch these stacks together to see what's really eating up the most time? Yeah.
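As an aside, for anyone who wants to reproduce the conversion step John described above, current versions of the tool expose it roughly like this (the file name is illustrative):

```
# Convert the collected .nettrace file into Speedscope's JSON format,
# which can then be opened in the Speedscope app or at www.speedscope.app.
dotnet-trace convert trace.nettrace --format Speedscope
```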
So like you said, we're seeing a whole bunch of fragmentation here. This is a sampled set of stacks, which means we're going to have a little bit of noise in terms of timing, right? We get a sense of what sat on the stack for a really long time, but a lot of these really small chunks are probably repeated calls happening quickly in succession, so we can't really get a sense of what took the most time; we just know it was there a long time. If we take a look from left to right, we've got that image load, which didn't actually take much time. But if we go look at this left-heavy view, like you were asking for, it gives us a nice aggregate view: all of those discrete samples get aggregated together by call stack. In this case we can see, again, our nice long frame for ConvertToPPMSlow, and if we zoom in on that, we can look at exactly what was happening during the whole time it was on the stack.

So can we go back to your code? If I'm looking at what you're showing me, I see some string concatenation, some stuff that seems vaguely familiar from when we saw your code. Do you think this is enough for us to go fix the problem at hand? I think this is enough for me to identify what happened. If we go back to the code, you'll see here's that image load that didn't take too much time, here's some basic IO, and here's that mutate that was only on the stack for a fraction of a second. But like I pointed out before, we see a whole bunch of string concatenation, and here's a whole bunch of string concatenation happening inside a tight loop. I think that's going to be our culprit.

So just explain to me: why is that += a bad thing? In this case, I'm taking a single string and concatenating to it over and over and over again. Because strings are immutable, that causes a lot of churn: a whole new string gets created every time I append to it, which means there's a whole bunch of junk strings getting thrown away. That's why we saw my allocation rate spike and the GC rate spike. So I actually already have a solution written up. If we scroll back up here, it's quite a simple fix: you just use a StringBuilder, which is exactly what it was made for. If I replace all of those += and Concat calls with sb.Append, and we switch back to the app and run the regular PPM transform, you'll see that it takes less than 0.2 seconds to do the exact same computation. About 200 times faster. Yeah, just about.

So we were able to use metrics to figure out that something was wrong, and then we knew what to go look at by collecting a trace, right? Let's just recap what we've learned about tracing. We introduced you to the dotnet-trace tool, which lets you collect EventSource events, including sampled CPU stacks. You can of course collect other providers, both those built into the runtime and custom providers as well. We're working on adding support for GC heap dumps, so we can give you object counts of what's on the heap, collected out of process in the same manner. And we have tools for visualizing CPU stacks in Speedscope, Visual Studio, and PerfView, and we're looking at adding other visualizers; we have a prototype for Chrome DevTools, and we're looking at other options as well. So this was great, but I suspect it's not going to solve all our problems, right?
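Before moving on, here's a minimal sketch of the StringBuilder fix described above; the method and variable names are illustrative, not the actual demo code, and it assumes pixels arrive as RGB byte tuples:

```csharp
using System.IO;
using System.Text;

static class PpmWriter
{
    // Writes a plain-text (P3) PPM file. Appending to a StringBuilder reuses a
    // growable buffer, whereas `text += ...` inside the loop would copy the whole
    // string on every pixel and drive the allocation and GC counters up.
    public static void Write(string path, int width, int height, (byte R, byte G, byte B)[] pixels)
    {
        var sb = new StringBuilder();
        sb.Append("P3\n").Append(width).Append(' ').Append(height).Append("\n255\n");
        foreach (var (r, g, b) in pixels)
        {
            sb.Append(r).Append(' ').Append(g).Append(' ').Append(b).Append('\n');
        }
        File.WriteAllText(path, sb.ToString());
    }
}
```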
What if I want to look at all the objects on the heap? Looking at CPU stacks isn't going to give me that richer fidelity of information. So do we have something else we can use today to get more information about what's going on in your application? Yeah, if you're looking for something that describes a precise moment in time across the entire world of your application, you're going to want a crash dump. So if we switch back to my box, we can take a look at a bug that I've been hitting with one of the transforms. Whenever I run the grayscale transform, I hit an issue where I throw an unhandled exception and my whole process dies. I could go load up a few different dumping tools, including our new dotnet-dump tool, but the crash happens faster than I'm capable of hitting the keys. So let's go ahead and run my grayscale transform, and it dies pretty much instantly. Luckily, there's an environment variable you can set that tells the .NET runtime that, if it encounters an unhandled exception and is dying, it should take a crash dump of itself. So in this case, you can see that as it was dying, it went and wrote a core file for me.

So if I understood correctly, I can use the dotnet-dump tool to collect a process dump on demand, or I can have the runtime collect a dump of itself when it's about to crash due to an unhandled exception. Yes. That's great. So now I have a core dump; how do I analyze it? How do I actually introspect what was in memory at that time and find the likely cause of my error? So dotnet-dump has a sub-command called analyze. If I run that and point it at the dump we just collected, you'll see we get this nice little REPL environment. Let's take a look at what we can run here. If you're familiar with .NET diagnostics from the past, you might recognize pretty much all of these commands: they're the commands from SOS. The REPL I'm getting here in the analyze sub-command is actually a hosted version of SOS that's reading the dump I just took, which was placed in that directory earlier.

For folks who aren't familiar, SOS is a set of debugger extensions we ship that lets you introspect .NET structures in memory and reason about them. Historically it shipped with the .NET Framework as an extension for WinDbg, but now we've hosted it inside a console application. Because we have readers for both Windows and ELF dump formats, we can read these memory dump files ourselves, introspect the memory, and give you richer information about it.

So can you find the exception that occurred, John? Yeah, I can use the good old printexception command here. If I just type pe, it shows me that we hit a System.OverflowException: some arithmetic expression resulted in an overflow. It even tells me that it happened inside the ConvertToGrayscale function. So let's go take a look at what the offender might be. John just showed you pe, but you can also run dumpobj or dumpheap, or the newly added SOS dumpasync command, which can actually walk the heap and stitch async stacks back together. I definitely recommend you try that out; there's a lot to explore in that tool. So if we take a look here, part of converting an image to grayscale is taking a weighted average of the various color channels.
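For context, a common luma-style weighting is roughly 0.30 red, 0.59 green, and 0.11 blue, so the coefficients sum to 1.0. Here's a minimal, illustrative sketch of a per-pixel conversion (not the demo's actual code) showing how a mistyped coefficient can overflow:

```csharp
// Illustrative only: if the coefficients sum past 1.0 (say 1.11 on blue instead of 0.11),
// the weighted value can exceed 255 and the checked conversion throws System.OverflowException.
static byte ToGrayscale(byte r, byte g, byte b)
{
    double weighted = 0.30 * r + 0.59 * g + 0.11 * b;
    return checked((byte)weighted);
}
```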
So in this case, I was taking 30% of the red channel, 59% of the green channel, and accidentally 111% of the blue channel, which means I'm definitely going to overflow if there's any meaningful blue there. So it's quite an easy fix. If we go ahead and fix the coefficient, rerun my process, and rerun the transform, we get our nice little grayscale image.

So this is great, but what if I were running in a Linux container? Would that work? We've talked about debuggers, and I know there are difficulties even building LLDB if, say, I were running in an Alpine container. Will this work in an Alpine container? I actually ran that entire demo from inside a container, using the new dev container experience in VS Code. So do you need to do anything special in a container to get this to work? Yes: when you're running inside a container and you want to take crash dumps (and this is the same regardless of how you run your container), you need to enable a couple of things, namely the SYS_PTRACE capability and the security option seccomp=unconfined.

Okay, so let's recap what we just learned about capturing and analyzing memory dumps. Right now we don't have dump collection or analysis on macOS, but it's fully supported on Windows and Linux, including ARM64 and musl-based distributions like Alpine. It doesn't require sudo privileges on Linux to capture a dump, but you do need SYS_PTRACE, and you do need to relax seccomp so you can actually collect the process dump. The collected dumps aren't portable, so you need to analyze them on the same OS and architecture, or at least the same architecture, libc version, and set of native dependencies.

So that was the ability to collect dumps and analyze them without using a debugger, right? The obvious next question, after we've introduced people to all these tools, is how do we acquire them? These tools ship as .NET global tools. Global tools, if you haven't heard of them before, are just packages on NuGet that contain an executable; the .NET SDK knows how to extract them and put them on your PATH so you can run them. Once you've done a dotnet tool install, you should just be able to run, say, dotnet counters, just like John did on his machine.

In addition to the tools we mentioned, we have a couple of others. We have dotnet-symbol, a tool that downloads all the files necessary for debugging, including symbols, modules, SOS, and the DAC, and it works for ELF and Windows PE files. Like we said, macOS dumps aren't supported quite yet. And we have dotnet-sos. We spoke about how you don't need a debugger to analyze memory dumps, but there are cases where you might want a native debugger, say to look at native stacks, managed-to-native stack transitions, or disassembly, things that aren't possible with just our hosted version of the tool. For those scenarios, dotnet-sos is a global tool that configures LLDB to load SOS. On Windows it's even easier, because we ship SOS as part of the WinDbg extension gallery, so you can just open a process dump and be good to go; just make sure you have a new enough version of WinDbg. And with all those tools covered, I think we're ready to take questions.
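For reference, the container settings and acquisition steps mentioned above look roughly like this; the image name and the crash-dump environment variable are shown as a typical setup rather than the exact demo configuration:

```
# Run a container with the capability and seccomp settings needed for dump collection.
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined mcr.microsoft.com/dotnet/core/runtime:3.0

# Ask the runtime to write a dump when an unhandled exception takes the process down.
export COMPlus_DbgEnableMiniDump=1

# Install the diagnostics CLI tools as .NET global tools.
dotnet tool install --global dotnet-counters
dotnet tool install --global dotnet-trace
dotnet tool install --global dotnet-dump
dotnet tool install --global dotnet-symbol
dotnet tool install --global dotnet-sos
```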