Diagnosing performance issues in apps that are already in production may not be the greatest thing in the world, but have no fear, because Azure has you covered with the Azure Application Insights tool. So learn more on this episode of Visual Studio Toolbox.

Hey everybody, welcome to another episode of Visual Studio Toolbox. I'm your host, Leslie Richardson, and today I am joined by Principal Software Engineering Manager Chuck Weiner, who is a member of the DevDiv Azure Services team. Welcome, Chuck.

Hello, Leslie. Thank you.

Thanks for coming. So today we're going to be talking specifically about Azure Application Insights, right? Can you tell us more about what that is?

Yeah, Application Insights is an application monitoring feature within Azure that allows you to collect data about your application while it's running. And the profiler is an advanced feature of Application Insights that lets you collect performance data about your application while it's running in production. As you know, with apps running in the cloud, it can be really hard to get debugging data about those applications. You can't just put a debugger on them. You can't just run a profiler on them whenever you want, unless you're using our service, where we run the profiler for you, collect that data, upload it to our service, and make it available to you through Application Insights.

Cool. So that's really interesting, just the whole space of profiling. I feel like that's a subject a lot of people are aware of but don't really use or read up on until it's imperative, so something's going wrong while the app's already out in production, right?

Yeah, it's very true. A lot of times users are like, oh, I have a performance issue. What do I do? How do I look into this? It happened yesterday. How do I find out why, and what can I do to fix it? And that's the cool thing about our profiler: you can turn it on today and just let it run.
And it'll be collecting data in the background, and it'll have that data ready for you when you have a problem that you want to investigate. And it does so in a way that's not intrusive to your app. It doesn't change your app code at all; it runs out of proc. For Windows services, it collects ETW traces, which is the Windows eventing system, and gets call stacks from that data. And by default, it only runs a couple of minutes an hour, so it's not running all the time. It's a sampling profiler, so we're hoping that over a period of time, it'll collect the relevant data that you need to debug your application.

That's cool. And I like that you can kind of leave it on in the background and go about the rest of your dev work that you need to do.

Yeah. So why don't I show you how to enable it and start collecting this data?

Yeah, sounds good.

So this is an application that we have that demonstrates the profiler. It's an Azure App Service, which is the easiest way to use the profiler. We have a really simple switch to turn it on and off, which I'll show here in a minute. For this App Service, you can go to the Application Insights tab. If you have Application Insights enabled for your app, it'll look like this: it shows Application Insights is turned on. And if it's not enabled, you can enable it here. Then you'll see this profiler section, where you can turn the profiler on and off. As you can see for this app, we have the profiler on. When the profiler's on, like I said, it's collecting data a little bit every hour. The hope is that over a period of time, all kinds of requests will come into your application during those couple of minutes an hour, and you'll hopefully catch some long-running requests so you can see what's happening and why those are taking a long time. Once you've got it on, you can go to your Application Insights resource by clicking View Data at the top.
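The sampling approach Chuck describes, periodically capturing call stacks from a running process and tallying where the time goes, can be sketched in a few lines. This is an illustrative toy, not the profiler's actual implementation; the `busy_sort` workload, function names, and the 1 ms sampling interval are all made up for the example:

```python
import collections
import sys
import threading
import time

def sample_stacks(target_id, interval, stop, counts):
    # Periodically grab the target thread's current frame: the core idea
    # of a sampling profiler. Here we just tally the innermost function
    # name; a real profiler records the whole call stack.
    while not stop.is_set():
        frame = sys._current_frames().get(target_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

def busy_sort():
    # Deliberately heavy work, echoing the slow array sort in the demo app.
    data = list(range(10000, 0, -1))
    for _ in range(300):
        sorted(data)

counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks, args=(threading.get_ident(), 0.001, stop, counts)
)
sampler.start()
busy_sort()
stop.set()
sampler.join()

# The hot function should dominate the samples.
print(counts.most_common(3))
```

Because the sampler only wakes up now and then, its overhead stays low, which is exactly why the real service can afford to leave it running against production apps.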
And this takes you to the actual Application Insights resource. I'm in a different resource now than before; I've gone from my App Service to Application Insights. The profiler is part of the performance features of Application Insights, and this is the Performance blade. There's a lot of information on this blade that I'll get to in a little bit, but the first thing you can do is hit the Profiler button at the top, which takes you to our configuration page, or kind of the profiler home page. This allows you to set some different properties for the profiler. We have triggers, which I'll show you here, and this is a list of profiling sessions that have happened. This app's been running for a long time and has been collecting profiling traces every day for a long time, and it lists those here. At the top are some settings for the profiler that you can change. The first one is Profile Now. If I click that button, it will start profiling right now in my service. This is really cool if you're doing some testing: you might have a perf test or a stress test that you're running, and you want to get a profile from that test. You can click that button and it will start profiling now. Then there are triggers that you can set. If CPU goes above a certain threshold on the machines that are running your service, it will trigger the profiler to start. We have that set by default at 80%, and it will run for 120 seconds.

That's really cool. So you don't even have to play the trial-and-error game of trying to figure out where to start?

No. Yeah, it will start for you when CPU gets to a certain point. And we also have a similar thing with memory: if your memory usage gets above a certain threshold, we'll start the profiler. Again, that's set at 80% as the default. And then there's this setting here. This isn't released yet; I'm showing you kind of testing bits here.
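The CPU and memory triggers just described amount to a simple rule: when a metric crosses a threshold, start a fixed-length profiling session. A minimal sketch of that logic, using the 80% threshold and 120-second duration from the portal defaults (the class and function names here are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class ProfilerTrigger:
    # Defaults mirror the portal: fire at 80% and profile for 120 seconds.
    metric: str
    threshold_pct: float = 80.0
    duration_secs: int = 120

def sessions_to_start(triggers, readings_pct):
    """Return the (metric, duration) sessions to launch for current readings."""
    return [
        (t.metric, t.duration_secs)
        for t in triggers
        if readings_pct.get(t.metric, 0.0) >= t.threshold_pct
    ]

triggers = [ProfilerTrigger("cpu"), ProfilerTrigger("memory")]

# CPU has spiked past the threshold; memory is fine.
print(sessions_to_start(triggers, {"cpu": 93.0, "memory": 61.5}))
# → [('cpu', 120)]
```

The real service evaluates these thresholds on each machine running your app, so a spike on any instance can kick off a session there.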
Like I said, the profiler runs by default a couple of minutes an hour, and up until now, we haven't had a way to change that setting. That's just the way it is and that's what you get. We've had a lot of people ask us, can I turn that off? I only want to have the other triggers, or I only want to have Profile Now. So we've given you the ability to turn default profiling on and off. You can also set it to normal, high, or max. Instead of running two minutes an hour, you might want to run it more than that; high or max is like, run it as much as you can possibly run it. I wouldn't recommend doing that in a production service for very long, because you could start affecting the performance of your app. We're collecting some very low-level data, and collecting and then uploading that data could take up resources that your app needs to run. So I recommend keeping it on normal, but there are cases where you might want to increase it. And then this shows you recent profiling sessions that we've done, and on the left here you can see it tells you how those sessions were triggered. Most of these are default sampling, which is that random we're-just-going-to-profile-and-hopefully-catch-something-cool sampling. And then we have a few here that were triggered by the CPU trigger.

That's so helpful, just filtering out my problem.

Yep, yep. And it's nice too, because of the dates here: if you knew you had a problem at a certain point in time, you can come in here and look. Okay, on Tuesday at 9 o'clock in the morning, we had high CPU. Let me click on that and see. I don't want to click on this right now, because it takes a bit of time to load. But if you do click on it, it loads all the events that happened during that profiling session and lets you look at traces for that session, which I'll show you in a minute in a slightly different way.

Cool.

So that's kind of the profiler home page. Now let's go back to the Performance blade.
This is the page that people are going to use to investigate performance problems. By default, this page shows you the average length of time for a request, but if you're looking into a problem, I would recommend switching this to the 99th percentile. Now it's showing you the longest requests. This chart down here shows you all the requests that your application serves and how long those requests took at the 99th percentile. You can see we have some requests here. Now, this is a contrived example; we've done this on purpose to make it have long-running requests. But you can see this one here takes 23 seconds. That's a long time.

So that's quite a long time for a request.

Yeah, you want to look into that. Why is that taking so long? And this chart over here is not very interesting for this service, because it's an example and all of my requests take about the same amount of time. But this is a histogram that shows you, for each time slot, how many requests you had in that time slot. The little triangles at the top indicate that for that time slice, or that time slot, you have a profile. So I can drill into that if I want and see, okay, I have profiles for requests that take 20 seconds, for requests that take 22 seconds. On a production service, hopefully you have a broader range of times, which looks kind of cool, seeing all the different time slots that your requests are taking. Once you drill into this, you can click down here on this Profiler Traces button. You can see it says 41: it means that for the filters I have on this page, we have 41 examples of a request that took that amount of time that we can show you detailed data about. So I'll click that, and it will load the list of requests on the left. You could probably count them; these should be close to, if not exactly, 41. And then this is the detailed data that we have about that request.
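The reason Chuck suggests switching from the average to the 99th percentile is easy to see with a nearest-rank percentile over some made-up request durations (the numbers below are invented; only the 23-second outlier echoes the demo):

```python
import math

def percentile(durations_ms, pct):
    # Nearest-rank percentile: the smallest sample with at least pct%
    # of all samples at or below it.
    ordered = sorted(durations_ms)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# 200 requests: mostly fast, with a handful of 23-second outliers.
durations = [120] * 190 + [450] * 7 + [23000] * 3

print("average:", sum(durations) / len(durations))  # ~475 ms, hides the tail
print("p99:", percentile(durations, 99))            # 23000 ms, exposes it
```

The average stays under half a second while the p99 view surfaces the 23-second requests, which is exactly the tail you want a profile for.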
And you should be able to see your code in here. This is a call stack of what happened during that request and what took the most time. We try to highlight the hot path for you, which is the pieces of that request that took the most CPU time. For this one, you can see we're doing an array sort, and that sort is taking a long time. There's CPU time scattered throughout here, and there's waiting time scattered too. The waiting time is because the sort is taking so long that this thread is getting swapped off the CPU and back on. The CPU can't allow one thread to just take over, so it keeps sending it off the CPU: okay, I'll get back to you in a minute, come back on. That's what the waiting is. So you'd probably want to look into: why is my sort taking so long? Am I doing it some weird way? Am I sorting too much data? There might be something you could do to speed that up.

So I noticed, for instance, the Download Trace button. And in Visual Studio, there's also that built-in profiler. So could you theoretically download this trace, open it in VS, and then track down what line of code is causing that giant performance spike?

Yes, you can. Very good question. That's exactly what you can do. And the really cool thing about the Download Trace button is it gives you the whole two-minute trace. The view we have here is only what happened during the specific request that you're looking at, but this button downloads the whole two-minute ETW file, which you can open in many different tools, like PerfView, Windows Performance Analyzer, or Visual Studio. I can show what that looks like in Visual Studio. This is one I've opened in Visual Studio. One thing to note: when you download that file, it comes down as a zip file. Visual Studio doesn't recognize a zip file; you have to rename the file to end in .diagsession, D-I-A-G-S-E-S-S-I-O-N. If that's the file extension, Visual Studio can open it.
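The rename step for Visual Studio is a one-liner; the file name below is a stand-in for whatever name the portal download actually produces:

```shell
# Stand-in for the file the portal download produces:
touch profiler-trace.zip

# Visual Studio only recognizes the .diagsession extension, so rename
# the downloaded zip before opening it there.
mv profiler-trace.zip profiler-trace.diagsession
```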
PerfView and Windows Performance Analyzer can open the zip files.

Sweet.

Yeah. And then you can use the Visual Studio diagnostics tools to dig into this and see what's happening with your code at that point.

That's awesome, that they can work together harmoniously if you want to.

Yeah. And the really cool part about it is you're getting this from a production service. You didn't have to do anything to collect this data; we've collected it for you. Hopefully it has interesting things in it, and you can find something to fix in your app.

Yeah. That's awesome.

Yeah. And let me just show you a view of Windows Performance Analyzer, because it has amazing analysis that you can do. This is from a diagsession that I collected from one of our actual production services, and you can see the charts and graphs are just unbelievable data that you can get from this.

Yeah. It takes a little bit of time to figure out what all it's telling you, but...

I'm a little overwhelmed.

Yeah. But it has really nice documentation, and I know users who have gotten some really valuable information from this tool. So it works very well.

It's just nice that users have options either way.

Yeah. And this being a Visual Studio show, we hope you use Visual Studio, but there are other tools, too.

Sweet. So if I wanted to go learn more, since I'm very much new to the profiling space, especially with Application Insights, where can I go to learn more?

Yeah. You can go to our help page, which is here. We have pretty extensive documentation. We're looking to improve it, so leave us a comment if you have questions, and we can edit this documentation anytime. It tells you how to enable the profiler for different types of services, and then some other options that you have. And there's some troubleshooting at the bottom if you have problems.

Great. Yeah. Seems like that's a lot of good stuff to get started with.

Yeah. We hope so.

Sweet.
And I mean, what's next for the profiler?

Well, thanks for asking. We are looking at expanding what we show. In the browser, we have this view of just one request, and we've recently added this flame graph; I can't remember if I showed that earlier. But we want to be able to show you information from the whole trace file. This is a very request-oriented view, and we've gotten a lot of feedback from users asking: without downloading the trace, can I see information about the whole trace? So that's what we're looking at doing now, allowing you to see maybe a flame graph for the entire ETW session. We also want to give you some more insights as to where things might be slowing down and why. So we're working on detecting some patterns that we see in people's code, so we can give you hints about what might be slowing your code down.

Cool. Yeah. Well, I can't wait to see that stuff.

Yeah. We'll have to come back and do another show when we have some of that ready.

Yeah, definitely. Speaking of which, thanks for coming on. I think that's really cool info that I'm sure tons of people will find incredibly useful whenever they run into those performance issues.

Yeah. You're welcome.

Great. And then at the top here, there are the Help and Send Feedback links. If you use those, it'll contact our team also. So if you need help with things, we're there, ready to help. We actually do pay attention to the feedback that people send.

Awesome. Don't hesitate to reach out. And with that, once again, thank you so much, and I hope everyone goes and tries out the cool profiling tools that we just talked about. And with that, happy coding.