In C and C++, you've probably spent a lot of time thinking about memory allocation, but there are cases where you want to think about those same things in .NET. If you want to learn more, there is a really cool profiling tool called the .NET Object Allocation Tracking Tool, which we're going to learn more about on this profiling episode of Visual Studio Toolbox. Hey everyone, welcome to Visual Studio Toolbox. I'm your host, Leslie Richardson, and once again I am joined by Sagar Shetty, who's a PM on the Visual Studio Diagnostics team. Hey, Sagar. How's it going? Good, Leslie. Thanks for having me. Absolutely. Once again, we are back for another episode of our profiling series. What are we going to talk about today? Yeah, we're going to go through another deep dive for another tool within the Performance Profiler in Visual Studio. Today, we're going to talk about the .NET Object Allocation Tracking Tool. A bit of a mouthful, but it's one of two memory profilers in the Performance Profiler suite, and it's basically designed to show you the top functions, the code pathways, and the different types that are allocating the most memory in your code. Cool. Do you ever simplify that name down to something shorter? Yeah, normally we just call it the .NET Alloc Tool internally, and that's short enough. Cool. Usually, when I think about memory allocation, or just allocating anything, I typically think of C or C++, where you have to do all of that manually. But in .NET, usually there's the garbage collector, which takes care of all that stuff for you. So why have the .NET Object Allocation Tracking Tool? Yeah, garbage collection, that's definitely a great point, Leslie, and especially with .NET, it's gotten to a point where the garbage collector, like you said, does a lot of that memory management automatically. That being said, profiling at a high level is about getting the optimizations that you want to the highest degree.
So, even though garbage collection can help you with some of that memory management, there are still optimizations to be had based on the way that you write your code, and hopefully, using our tool, we can help surface some of those optimizations that you can do on top of garbage collection. Yeah. So, can you describe a little bit about what the specific use cases would be for a .NET user who might want to use this tool? Yeah, absolutely. I think the easiest way to explain is to jump right in. So let's go into VS and start looking at some of the views, and I think some of those scenarios will become more apparent. Awesome. So, going into VS: just like all the other tools, getting into the Performance Profiler is the same workflow. You can go to Debug and then Performance Profiler, or just use the keyboard shortcut Alt+F2. Then we get to this page, and to give a little more background on this particular tool: today we're talking about the .NET Alloc tool, so I have this box checked. This tool is going to be good for really any sort of managed scenario or managed application. It's good for all flavors of .NET: .NET Framework, .NET Core, ASP.NET, et cetera. Xamarin too? Yeah, so, all flavors of .NET. It really is a managed and pretty comprehensive tool in that regard. For native, you're not going to use this tool. Just based on the way it's architected, it uses the ICorProfiler interface, the profiling interface for the .NET runtime, so it really is more of a managed experience. Is there a similar tool for C++ users that they can use? Yeah. On the C++ side, for memory analysis on the native side, you're going to end up using the Memory Usage tool. Now, these two tools aren't exactly alike, but that will give you some insights in terms of where your code is spending a lot of time from a memory perspective.
And so, going back to the .NET Alloc tool, another thing I'd like to call out is this particular settings window. If I click on this gear icon, we come to this window. Now, in the past, we've talked about a few different data collection methods. When Esteban was talking about the CPU Usage tool, for example, something we talked about was sampling, which essentially was a data collection technique where we were taking snapshots of the performance data in our code and stitching that together. The .NET Alloc tool, on the other hand — you can use sampling if you want, you can switch it over to that — by default uses a slightly different data collection technique called instrumentation. If you think of sampling as taking pictures and stitching them together, instrumentation is more like a video. It really is much more detailed, and it's giving you exact call counts and very fine-tuned, precise, accurate data. So that's cool, because you can get exact call counts, and we'll see some of those values in the example today. I will say — and this is one thing I will caution users of this tool about — that as a result, this data collection technique has a pretty high overhead. To counteract that, one recommendation I'd have is to keep traces as short as possible. I will reiterate this at the end of the video, but performance is definitely something we're working on with this tool, in terms of improving it and speeding it up. I can assure you this is something our engineering team is very hard at work on, but in general, as far as best practices go, keeping traces short is what I'd recommend with this tool if you want to use instrumentation. So what is the default set to, frame-wise, or however you measure how much or how little gets tracked?
Yeah, so with this tool, it's going to literally track every single object allocated. As we'll see once we run the tool and go into the views, it will look at different object types, and for each object type in your code, it will show you how many objects were created and how much memory that total is taking up. So by default, the instrumentation is tracking all of them, grabbing all of those objects, and it's measured at an object level. Yeah. Sweet. Well, I'd like to see it in action. Yeah, yeah. So I'm going to close out that window. I have an app loaded up, and I'm going to go ahead and click Start and start the profiling session. The app we have today is actually a WinForms app, and what it's essentially doing is helping us visualize prime numbers. It's a pretty simple app: you have a minimum value and a maximum value, and for each of those values an ellipse is created, and each ellipse corresponds to a specific number. Numbers that are prime are indicated by a green icon, and numbers that aren't are yellow. So we'll exercise this a few times and click Stop. This app does quite a lot of memory allocation, so it's a good way to show off this particular tool. In this case, is an object being created for each number, or each dot that was on the screen there, or — Yeah, so we're going to dig into the source code a bit more, but for each ellipse that's being created — in this case we're using an Ellipse class — there will be an Ellipse object created. So yeah, a lot of memory is being allocated. I can imagine. Ramp that max up to 10 million. Exactly. Okay, awesome. So now we're looking at the report generated by the .NET Alloc tool, and there are a couple of views here. The first thing I want to point at are the graphs at the top of the screen.
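As described, the app's core per-number decision is a primality test. A minimal sketch of what such a check might look like — a hypothetical helper, since the demo's actual source isn't shown at this point:

```csharp
// Trial division primality test, sufficient for a min..max range
// like the demo visualizes. Hypothetical code, not the demo's exact
// implementation.
static bool IsPrime(long n)
{
    if (n < 2) return false;
    for (long d = 2; d * d <= n; d++)
    {
        if (n % d == 0) return false;
    }
    return true;  // no divisor found, so n is prime
}
```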
The highest graph is live objects. These are just a total count of the live objects that are allocating memory within your code — objects across all different types. In some of the views, when we drill down to the tables, we can look at and categorize objects by specific type, but this graph is just showing an all-up total count. So that's live objects. The second graph is the object delta — the change. This is showing any time you have large spikes in objects, and then, as we alluded to earlier, Leslie, with garbage collection cleaning up and reclaiming some memory, you have these red bars every once in a while. The red bars are actually indicating where garbage collection is occurring in your code. So in general, you can think of it as: the green bars are adding on more and more objects. At the beginning you'll see large deltas, just because of the way the math works out, right? When you don't have many — yeah, exactly — when you're initializing and you don't have many objects to begin with, even adding a few more objects percentage-wise (because this is on a percentage basis) is a lot. Over time it tends to dip a little bit. But the really important thing with these graphs, or swim lanes as we call them within the Performance Profiler, is that you can time-select and filter down by them. So for example, I can select a range on this graph, and what that will do is filter down the data in my tables by the time range I selected. So if I was really interested in a bit of garbage collection here, or some area where there was a lot of activity, I could filter down, look at that specific time range, and really dig deeper.
Another thing I'll point out — this isn't as applicable for this tool, but just to reiterate — when you're running multiple of our tools in conjunction with each other, we will stack swim lanes at the top, if you're using a tool such as the CPU Usage tool that has a swim lane. And if you're running multiple tools together and you want to filter by the same time range across all of them, if you do the filtering at the top, it will filter all the tools and all the reports by that time range too. With this tool you're generally running it by itself because of the high overhead, but I just wanted to call that out again. So those are the graphs, and I'm just going to clear the time selection for now so we look at everything all-up, and now we dig into the tables, where I think there are a lot of insights. There are a few different ways that you can start your investigation; today I'll start with the Allocations view. The Allocations view is essentially showing you a bunch of different object types — classes or structures — within your code. There's a very long list here, and Ellipse has bubbled up to the top. For each one of these types, we're showing you a number of allocations — that is, the number of objects of that type (in this case, ellipses) created within your code. Furthermore, in addition to the number of allocations of that type, we're showing you the actual amount of memory being taken up — that's what the Bytes column is showing you, across all of those allocations — and then also Average Size, which is just Bytes divided by Allocations. Wow. So it looks like a lot of allocations going on for pretty much all of these. Yeah.
We'll dig into Ellipse a little more in the source code in a second, but one more thing I want to talk about as far as the types go: generally they fall into two main categories, plus two subcategories. The two main categories are value types and reference types. And if you notice — this is something we've modified over the last year — we actually added icons into this particular view. This blue icon over here indicates a value type, and this yellow one indicates a reference type. So what are those? Value types are things like, as we see here, a double, or an integer, or even a Boolean. Whenever you create a variable of a value type, such as an integer, a specific memory address is picked out, and that's where the variable is initialized and stored. In the case of a value type, the variable and its value exist at the same memory location. With a reference type, things work a little differently. Reference types are things like strings; classes are reference types; arrays are reference types. In the case of a reference type, where the variable exists and where its value exists are actually two separate spots. So a string might be initialized at a specific memory address, and then its actual value, the contents of the string, are at a different memory address. The memory address where the variable is stored actually contains a pointer to the place where the value is stored. The reason this is important, Leslie, is because of how these two kinds of types are stored in memory within the .NET runtime. If we think about our memory, it's essentially like a physical block, right? It's a finite resource.
There's a limited amount of space we have to store data and allocate memory. Within that memory, the way it's managed in .NET is, at a high level, there are two partitions: the stack and the heap. The stack is generally a partition where more short-term things are stored — local variables, things of that nature. The heap is a partition where more long-term, longer-lived things are stored in general, so more objects and things of that nature. This is oversimplified and high-level, but it gives you a sense of where those two things are stored. The reason this is important is that even though value types are stored on the stack, sometimes they get cast — so if you have an integer and you cast it to an object, that actually becomes a reference type, and it ends up being stored on the heap as well. So now you have a value type that's taking up memory on the stack and the heap; in other words, it's taking up twice as much memory. So you have to be on the lookout for value types, and that's why we surface them with these icons here. So, we have value types and reference types. Also, if we go back and look at this table, you may notice — I mentioned that there are two subtypes — there are also these blue icons with buckets under them, and yellow icons with buckets under them as well. Those are value type collections and reference type collections. Again, the blue icon is the value type, so the blue icon with a bucket is showing a collection of a value type, and the yellow one with the bucket is showing a collection of a reference type. It's essentially taking a value type and showing a collection of it. In this case, the type is EffectiveValueEntry, and then it's a list of that type.
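The boxing behavior Sagar describes can be seen in a couple of lines of C# — a generic illustration, not code from the demo:

```csharp
int n = 42;              // value type: the value lives directly in the variable's storage
object boxed = n;        // boxing: a copy of n is allocated on the managed heap
int unboxed = (int)boxed;  // unboxing copies the value back out

long total = 0;
for (int i = 0; i < 1000; i++)
{
    object o = i;        // boxes again on every iteration: 1,000 separate heap allocations
    total += (int)o;
}
```

Each box is a distinct heap object the garbage collector eventually has to reclaim, which is why a value type showing up prominently in the allocations table is worth a second look.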
And in this case, here we have System.Object, and it's a list or an array of objects — a collection of objects. So that's what those icons are showing. Anyway, that's a bit of a tangent on the different types, but let's get back into the code a little bit. We noticed before that we have this sorted by bytes, so whatever has the highest number of bytes bubbles up to the top. This is taking up a lot of memory. What if we wanted to investigate this a little more and see what's happening in our code here? If you double-click on this line, what we show you in this right panel — let me modify this window a bit — is a backtrace. Okay, so after we click on this Ellipse type, we want to look through the backtrace and see where in the code it's being allocated a lot. We see this GeneratePrimes function allocating a lot of memory, a lot of bytes. So now what I want to do is ultimately go back to source code and see if there are any modifications that can be done to optimize this. I can right-click and hit Go To Source File, and in this case I have the code up, and we come to the GeneratePrimes function. Cool. I'm a little surprised, because some of the previous tools that we talked about in past episodes, like the CPU Usage tool and I think the Memory Usage tool and the Database tool, all had hot path function tables, with little fire icons that indicated: here are the functions you should consider honing in on, because they're hot spots for your CPU usage or memory issues, that sort of thing. Yeah, exactly. We'll actually touch on that again with this tool — in fact, just a sneak peek, we have that same functionality in the Call Tree window. We'll talk about that a little later, and here it is.
But yeah, we'll come back to that, and for now we'll keep going through the Allocations view. Definitely, Expand Hot Path is a useful feature, and we preserve it in this tool as well. Cool. So now, Leslie, we're looking back at some source code and trying to see where we can optimize, and why Ellipse is allocating so much memory. We have this GeneratePrimes function. We have these long values, the min and max, that we saw in the application before. If we scroll through, we see a for loop going from min to max, and within this for loop we have an Ellipse, which in this case is a class. We're creating a new Ellipse object for each iteration of this for loop, and that's why, when we ran that application before, that is a lot of ellipses, right? Yeah. One thing I will point out here is that, based on the nature of this particular application and this visualization, even though we are allocating a lot of memory for ellipses, we actually do want each of those ellipses, right? Because we wanted an ellipse shown for each number. So yes, it is taking up a lot of memory, but you have to think about: is this an actual bottleneck, or are you willing to live with it? Because if we wanted to improve this, there's only so much we can do to get around this particular issue based on the way we've currently implemented this code. Right now, even though we are creating a new Ellipse for each iteration of the for loop, we actually want to do that, and paint it a specific color based on whether it's a prime number or not. If we wanted to not have to use an Ellipse, we'd have to really think about how to restructure this code very differently, and it might not be easy to do that. We probably wouldn't use an Ellipse class, seeing as the height and the width are the same.
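Reconstructed from the discussion (the variable names and dimensions are assumptions, not the demo's exact source), the loop looks roughly like this:

```csharp
// One Ellipse object per number in [min, max] — by design, since each
// number needs its own dot, but still a large number of heap
// allocations when max is big.
for (long i = min; i <= max; i++)
{
    var ellipse = new Ellipse { Width = 10, Height = 10 };  // new object every iteration
    ellipse.Fill = IsPrime(i)
        ? new SolidColorBrush(Colors.Green)    // prime
        : new SolidColorBrush(Colors.Yellow);  // not prime
    canvas.Children.Add(ellipse);
}
```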
Maybe we'd use a circle or something different. So even though we could fix or optimize that, that's not what I want to focus on for this particular demo, because ultimately it might be a bit more involved. A question I might have instead, though, is: this function seems to be doing a lot of work, and yes, there's a lot of the Ellipse type being allocated and created — but are there other types within this exact same function that are also being allocated a lot, and is there another optimization to be had? To answer that question, I want to go over to another one of our views within this session. The first thing I want to do is copy this function name, because I want to investigate this function more. The question I want to ask myself now is not so much "what are the top types being allocated?" but, for that specific function I was just looking at, what are the top types that that function is allocating? We actually have a Functions view that can help you do just that. The Functions view shows you similar data to what we were looking at before, just grouped differently and visualized a bit differently. Something I want to emphasize with the Allocations view, the Call Tree view (which we'll look at shortly), and the Functions view is that you're looking at similar data; it's just grouped a little differently — it's like a slightly different pivot table, if you will. In the case of the Functions view, we have the process ID up top, within that we have different modules, and within modules we have specific functions. Now, I had a function in mind that I was interested in, and we have this search bar here, so I'm going to paste GeneratePrimes in here and hit Enter, and it's going to bring me straight to that function of interest.
When I come to this function of interest, if I expand this particular node, I see all of the top allocation types for this specific function. We have the total allocations for this function; we see the self allocations, which is the amount of allocations that just this one function is doing (total includes what this function generates plus all of its children); and then we also have the self size in bytes, the actual amount of memory. If we dig into the types, once again we see that Ellipse is the top — there are a lot of ellipses being generated within this function. But if we look a little more, Leslie, there are other types that are also being allocated in quite high quantities, even more so than the Ellipse, right? We have 30,000 Colors being allocated and 30,000 SolidColorBrushes being allocated. They don't take up quite as much memory as the Ellipse, but they still take up a sizable amount — over a million bytes for each of these. With this information in mind, I want to go back to the source code and say, hey, let's look at that function again, but not focus so much on the Ellipse; focus on these other types, because maybe there are other optimizations to be had, and if we optimize this, even though it wasn't the top type, it's still memory we're saving. So I want to go back to the source code, and I can do that by right-clicking and choosing Go To Source File — I have the code up, so I'll just go back here.
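The change Sagar walks through next can be sketched as a before/after (member names are assumptions based on the discussion, not the demo's exact source):

```csharp
// Before: a fresh SolidColorBrush per iteration — the ~30,000
// Color/SolidColorBrush allocations the Functions view surfaced.
//   ellipse.Fill = new SolidColorBrush(Colors.Yellow);

// After: allocate each brush once, as a static member, and reuse it.
static readonly SolidColorBrush FillColor = new SolidColorBrush(Colors.Yellow);
static readonly SolidColorBrush PrimeFillColor = new SolidColorBrush(Colors.Green);

// Inside the loop, no new allocation per iteration:
//   ellipse.Fill = isPrime ? PrimeFillColor : FillColor;
```

Because the brushes never change, sharing one instance per color is safe here; if the brush were mutated per ellipse, this hoisting wouldn't be valid. (In WPF, a shared brush like this is also often frozen with `Freeze()` for performance.)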
As we're looking at this code again, let's look at this for loop more closely. Yes, we're creating an Ellipse for each iteration, but we're also filling it with a specific color, and that color is actually not changing. Based on the way we have this implemented, we're creating another SolidColorBrush object each time — in this case the color is yellow, and for the prime fill color it's green — so we're creating a new SolidColorBrush object on each pass through this for loop too, right? Yeah, that's a lot. Right. So in the case of the Ellipse, like I was saying before, yes, there are ways we could optimize that if we wanted to, but it would be more involved; we probably wouldn't use the Ellipse class. In the case of the fill color, though, we can do something pretty quick. What we can do instead — I have the code commented out here as an example — is pull out this new SolidColorBrush object and bring it up to a static member right here. We can assign that to a variable like fillColor, and then primeFillColor for the yellow and green cases, and then, instead of setting Fill to a new instance of an object, we just assign it to that static member fillColor up here. That's what I have commented out here. I won't rerun the code, because it might take a little while, but basically, if you do that and rerun it, essentially all of those allocations will go away, because we're not creating a new SolidColorBrush object every single time, for every iteration of the for loop. We just have that static member declared at the top, which doesn't change, and we reference it each time within the for loop, and the ellipse gets painted that particular color. That will save us a lot of allocations, because if we come back to the Functions view, this
function alone had 30,000-plus allocations of both Color and SolidColorBrush, at over a million bytes each, so a lot of that will essentially go away. Something I want to point out here is the nature of the investigation: it's up to you to figure out what you want to optimize. Maybe you do want to go after the top frame, which was Ellipse in this case, but as you pointed out, that code change might be more expensive or more involved, so it's ultimately up to you to figure out what's worth your time and what the trade-off is. In the case of Color and SolidColorBrush, that's a quick win, and it's only going to help your application, so I just wanted to point that out. Yeah, and I think that's the theme for a lot of profiling investigations: ultimately it's, okay, how badly do you want to fix this perf issue if it means having to modify your code in a way that maybe doesn't make sense, depending on the context? Totally. In software development, you have to deal with these trade-offs all the time, and ultimately, on the profiling team, what we're trying to show you is just data and insights into how your application is actually performing. What you want to do with that data is up to you, but you'll have to decide where you want to spend the time and what's worth it. Not that it hasn't been quoted a bajillion times already, but with great power comes great responsibility. Great responsibility. Exactly. So, we talked a little before about how we're showing you different pivots on similar data between the Allocations, Functions, and Call Tree views, and now I want to dig into that third view, the Call Tree view, which we alluded to earlier, Leslie, with the Expand Hot Path feature. What the Call Tree view is showing you is: what are the code pathways that are allocating the most amount of
memory. With the Allocations view, you're filtering by a specific data type or object type. With the Functions view, maybe you have a specific function you want to drill down into and say, okay, I'm interested in this function across the entire time span (or the swim-lane-filtered time span) — what are all the allocations happening here? The Call Tree view is saying: okay, maybe I'm not focused on a specific object type yet, or on a specific function yet — just show me the code pathways where a lot of allocations are happening. One thing you can do with the Call Tree view is expand nodes individually, but as we alluded to earlier, what I would recommend is starting at a node of interest and using the Expand Hot Path feature. What this does is essentially show you where most of your allocations are happening for a given path. To walk through some of the metrics in this view: at any given node we have the total amount of allocations happening — that's all the allocations at this particular frame plus all of its children. We have self allocations, which is all the allocations at just this particular level — so, for example, if we wanted to look at main, main itself is allocating eight different things. Then we have the bytes, measured in terms of memory rather than the number of objects, and then we have the module name, which shows what module that function is associated with — and sometimes it'll be associated with multiple modules. What the Expand Hot Path algorithm is essentially doing is saying: hey, as we walk down this tree, if there are a lot of self allocations within the total allocations, you should go into the next function and dig into it a little more, because that function is contributing to a lot of allocations
and so I started up here and used Expand Hot Path, and what it brings us down to is essentially two things. One — let me expand this out a little more — is this generate button click method, which is certainly allocating a lot of memory, because that's the button that triggers the visualization; and then there's also this Allocations frame. Walking through each of these individually: the Allocations frame is saying, hey, at the particular node right above it — in this case Application.Run, under System.Windows — what are all the top allocations happening for this particular method? Similar to the Functions view, but specific to this particular call tree and call path. And that's something important to note, because there are functions in the Call Tree view as well as functions in the Functions view, so someone might ask: what's the difference? The difference is that the Functions view is looking at the data all-up — it's combining the function across all the times it's called, and adding up the allocations across all types within that. The Call Tree view is looking at a specific call stack. Any given function might be called many ways, and if we drill down into these different nodes, you'll see a lot of the same functions called multiple times, but it's showing you each different occurrence within a specific call stack. So something you can see in the hot path is the specific allocations for a node of interest, and sometimes it will end with another function to look at. I started the hot path from the highest node, but something to note is that you can start at any level you want — let's say I wanted to look at the GeneratePrimes function, I can start the hot path there too, and it'll show me the allocations, or other UI or external calls that are happening as well. So yeah, this is just another view, another pivot on that
data, and it's allowing you to go through the call trees and ultimately see which code pathways are allocating the most memory. That can be really useful, it seems, for a lot of .NET peeps out there, especially those dealing with graphics-intensive things like that prime numbers application. Yeah, and something I want to call out again: especially in this view, this is a time where you're really probably going to engage some of the time filtering. If you're really interested in garbage collection and you want to see, hey — I don't care about all the functions in the world, but what were the functions being called at this particular time? — that's when you combine the graphs with this Call Tree view. Again, we're improving perf; it's a bit slow, but it'll show up eventually. And yeah, you combine those two views together. Awesome. Yeah, there it is. So you mentioned that perf is still a work in progress for this tool — anything else on the roadmap for the .NET Alloc tool?
Yeah, this actually segues perfectly into our last view, which is the Collections view. Let me clear this selection and go to the Collections view. Admittedly, this view is pretty young right now, and we want to keep working on it. But essentially, as we alluded to earlier, there's a limited amount of memory you have to work with, and it's a question of how to allocate and manage it best. Luckily, as we mentioned previously, .NET does a good job of having the garbage collector come through, automatically scan the heap portion of memory in particular, look at the objects that are allocating memory but are not being used, and clean that up. What the Collections view shows you is, first, instances where garbage collection has occurred. If I click on a specific row within this table, I see the number of objects that were collected and how many survived, and then we also get our pie charts over here, which show you the top types within each garbage collection — what were the types that went away, and what survived. Like I said, this view is more in its infancy. Of course, you can still time-filter, and you can still see on the graph where the red bars are occurring, but what we want to do here — we're still working on designs and the best way to bring this out — is surface the actionable insights. Yes, this shows me a little about where garbage collection is taking place and what the top objects are that are surviving or being collected, but in the future we want to show you more insights around how to go back to source, and what optimizations you can make within your code so that maybe garbage collection doesn't happen as often, or happens more efficiently. So that's something we want to look at: improving this view, as well as perf for the tool. Yeah. Awesome. So many tables to
choose from — so many options. Great, a lot of different pivots on the same data. Yeah, I like options, personally. I think the more customization, the better. Absolutely. So thank you so much, Sagar, for sharing the .NET Object Allocation Tracking Tool. If people want to try this out or learn more about this particular tool, where can they go? Yeah, we've got docs, as always, with all of our tools, and the docs for the .NET Alloc tool are updated as well, so we'll point you to those, and you can get some more of the details there. And if you have any questions, of course, always reach out to us. Awesome. And this is not the end of our profiling series, so what are we going to talk about next time? Yeah, next time Esteban is going to cover the .NET Counters tool, so really excited for that one. That is actually our newest tool, if I'm remembering correctly, so that one should be really fun. Great. Well, thanks once again for coming, Sagar. Probably going to see you in the near future. Absolutely, pleasure as always, Leslie. Thanks for having me. Likewise, and until next time, happy coding!