This is the session on enabling application performance and monitoring within .NET applications. My name is Dave Tillman. I'm with Pivotal. I've been working on the Steeltoe project since, well, inception, from day one. In fact, my favorite colleague, Tim here, who also works on the Steeltoe project, decided last night to give me a nickname, Papa Steeltoe. I don't know why, but anyway, I prefer PapaToe, so we'll take the short version from now on. Anyway, I've been working on Steeltoe for quite a while, so I've got some good things to talk about today. Before I get there, let's do this slide. You've all seen this at least a dozen times today, so I'm not going to read it to you. I think you already know what it says, so I'll move right on to the session itself.
So here's the outline for today. We're going to start out by talking about some cloud-native frameworks that you're probably already aware of, at least in the Java and .NET space. We're going to talk about Spring Cloud and Steeltoe, two frameworks that we'll go into in a little bit of depth. But what we're really going to focus on in this session is the monitoring and management side of those frameworks: the tools and technology that you can include in your Java apps or in your .NET apps (of course, I'm going to focus primarily on .NET) to enable monitoring and management of your app in a production, runtime-type environment. Throughout the session, I'll be using a demo app to illustrate the features you can make use of. I'll also bring up some code and show you how to use the Steeltoe framework to enable this within your .NET app.
Okay, so first of all, in the Java world, we've all probably heard of Spring. We've all heard of Spring Boot these days, or at least many of you probably have. And then, of course, Spring Cloud, which is built on top of those things.
So you've got Spring Cloud, which is a Java framework that you can use to cloud-enable your applications, to make them cloud native in your environment. Spring Cloud is built on Spring Boot, which has become very, very popular, and which in turn is built on the Spring Framework itself, which has been around for many, many years for building web apps and other apps within the Java ecosystem. Layered on top of that, Pivotal has taken Spring Cloud, packaged up components of it, and offered something called Spring Cloud Services, an offering that runs on Pivotal Cloud Foundry. It's packaged up as a tile, as a BOSH-deployable unit. As part of that, you get several pieces of technology: things like a service registry and centralized configuration. You get Hystrix, which is something created by Netflix. You get a bunch of components that are quite useful in building microservices in Java. And this has been around for quite some time, actually several years.
Then on the .NET side, what we did is we started a project, a Pivotal-sponsored project. I think it's been about two years now. That's probably right, Jason, when you say we've been doing this for about two years. So the Steeltoe project started roughly two years ago. The idea was to enable .NET applications to use those services that I just described running on Cloud Foundry, basically making .NET a first-class citizen for app and microservices development on Cloud Foundry. If you've not heard of it, it's an open source project. Like I said, it's about two years old. There's the link to the code up on GitHub. There's some documentation that goes along with it, if you're interested in digging into it in more depth. And there's a Slack channel where you can find myself, Tim, and others who work on the project. So if you need help, you just come to that Slack channel.
We're always there, ready to help.
The Steeltoe framework works on .NET Core, in other words the CoreCLR, the cross-platform .NET Core framework that runs across operating systems. It also supports the full framework, the .NET Framework you've probably been using for many, many years. It also works with ASP.NET Core and ASP.NET 4.x. So if you're moving on to ASP.NET Core, Steeltoe will work seamlessly there. And we've also done some work in the 2.0 release to better support ASP.NET 4.x, which I like to refer to as the legacy web app technology.
On the right side of the screen, you'll see the areas that Steeltoe covers. There are really two areas that it focuses on. One is simplifying building, running, deploying, and managing apps on Cloud Foundry. The other is tying into the Spring Cloud services that operate on Cloud Foundry. Those are Java-based, built with Java technology, but they all offer REST services. So Netflix service discovery, Hystrix, and all of that are easily accessible from a .NET application using Steeltoe.
We literally could spend hours and hours on all of the components that make up Steeltoe. So what I want to do in this session is focus on management and monitoring, what I refer to as the M&M's of Steeltoe. If you leave today with nothing else, remember this as the M&M presentation given by the Steeltoe team. You're going to learn about monitoring and management as it relates to .NET applications. I'm actually going to cover both areas, Java and .NET, so that you can relate if you're a mixed shop and you've got both. Here are the Java components that make up the equivalent functionality in Java, and I've also listed the Steeltoe components. You can break monitoring and management up into four major areas.
At the very top, we've got what we refer to as the management endpoints. These are endpoints, which I'll go into in a minute, that allow you to gather information out of your app at runtime. They're basically REST endpoints that you can hit, and I'll describe them in a second.
In addition, if you're interested in a fault-tolerance and latency framework for managing your dependencies on remote instances, you can use Hystrix. Within the Hystrix framework, there's a lot of status, monitoring, and statistics data that flows out, which you can make use of. We've got a .NET implementation now, done as part of Steeltoe, that's completely compatible with the Java version. So you can gather metrics and performance data out of both your Java apps and your .NET apps, have them flow into one common dashboard, and see the status of the circuits that you've got going.
And then there are two new things that we're kind of announcing at this show, I guess, which we've just recently added: distributed tracing and metrics for the .NET world. Those already existed in the Java world, where you had Spring Cloud Sleuth for distributed tracing. What we're announcing here is a Steeltoe OpenCensus implementation. OpenCensus is a project that has been championed by Google, and it centers on implementing distributed tracing. We've done a .NET implementation of that, basically a port of the Java OpenCensus code to .NET, obviously with some changes that made more sense for .NET. On top of that, we ported all of the unit tests, of course. So there were 450 unit tests that we had to port over as well to make all that work. In addition, we've also added support in OpenCensus and Steeltoe for metrics gathering.
That's one of the really beautiful things about OpenCensus: it's not only about distributed tracing, it also has stats gathering, or metrics gathering. So we've gone ahead and ported that code over as well, and we're building on top of that, as I'll show you in a little bit.
All right, so let's focus first on the management endpoints. These are endpoints that give you access to information about your application, and you get them just by incorporating the Steeltoe framework into your application. The endpoints themselves are not really tied to HTTP REST calls. But out of the box, if you just drop it in the way you typically would, it'll expose this functionality as REST endpoints that can be called by any application. Today, I'm going to show you the Pivotal Apps Manager, which calls those REST endpoints and surfaces the information in a nice UI. And there's the list of the endpoints that you can make use of. We just recently added dump and heap dump to the list for .NET applications. Dump allows you to capture a thread dump of the application at runtime, and heap dump allows you to capture a mini dump. So if you're diagnosing a problem, say a memory leak in one of the instances, and you'd like to grab a memory dump or heap dump so that you can do some offline analysis of it, you're able to do that right within the tool. All of these endpoints, like I said, are offered by both Java and .NET. So anything that is able to call those REST endpoints and display the information coming out of them can be used to access it. I'm going to show how you can do this with the Pivotal Apps Manager in a minute, but like I said, they're just REST endpoints.
So here's the demo app that I'm going to use.
It's a real simple app. We've got a shopping cart microservice on the front end. You hit it at the checkout endpoint. It calls the order processor microservice, which in turn calls the payment processor, and then the results all flow back through. Very, very simple. I've enabled it with both the Java and the Steeltoe components to provide these REST endpoints within the application, and I'm going to use the Apps Manager, down there at the bottom, to surface the information that comes out of those apps. So let's see if the demo gods are good to us today.
So here's the sample. Very simple. I'm just going to hit the shopping cart's checkout endpoint a few times, and it just returns some JSON that says the order's been processed and the card's successfully been charged. So we'll get a couple of things going there. Now if we go back to the Apps Manager, we see the three microservices that I talked about. The payments and shopping cart services are Java applications. The order processor is the .NET application. Ignore the Zipkin server; I'll get to that in a bit when I start talking about distributed tracing.
So if we bring up, for example, the shopping cart service, the first thing you see within the Pivotal Apps Manager is a logo up here in the top left corner, which is the Spring Boot logo, for those of you who know a little bit about Java. That signifies that this app has been augmented with the actuator management endpoints, right? The REST endpoints have been enabled. And what happens is that some additional menu items are added for the application. So for example, we see this trace. We see threads, which we can use to do thread dumps. If we go down here and look at one of the instances that's running, we'll see that we can take a heap dump of this particular Java app, et cetera, et cetera.
The Apps Manager has queried the application, hit the endpoints, determined what's actually implemented and available, and now surfaces that here. So for example, we could take a look at threads in this particular Java application. Here's a list of all the threads, and you get a stack trace for each one of them. So if you're trying to diagnose a problem where an instance is slowing down and you're trying to figure out what all the threads are doing, why it isn't responding to requests or why they're taking so long, you could do a thread dump, look at the threads, see what they're all busy doing, and maybe begin to get an idea of what's going on inside that app.
If we switch over to the .NET application, a similar thing happens, because we've enabled this application with the actuator endpoints. In this case, we get the Steeltoe symbol up at the top. That signifies that the actuator endpoints are now in the application itself. We get a trace; we're able to view traces. This particular app is an ASP.NET Core app that's running on Linux. Since it's not running on Windows, I don't have the thread dump and heap dump available to me right now. That's something we're going to add in the future, but it's not there yet. One of the things that you do get, notice the Steeltoe symbol down here, is some Git information. One of the endpoints provided is the info endpoint, which allows you to capture some configuration information about the application. In this case, it's Git information. It tells you what commit this app was built from and when it was built. Typically, when you're starting to diagnose a problem, you want to know what instance and what bit of code you're actually looking at.
So let me switch over to looking at the app itself, the order .NET app. Getting those REST endpoints to become available is pretty straightforward to add to your application. So if I go to Program.Main and take a look at the application... can everybody see that? A little bit bigger? Let's see, Control. How's that? Good enough.
So there are a couple of things that were added here to enable this application with Steeltoe and those endpoints. The first thing we do is add this Cloud Foundry configuration provider. This parses VCAP_SERVICES and VCAP_APPLICATION and pulls that information into your application, making it part of your application's configuration data. It also makes it available to all the other Steeltoe components. Then, as you're going to see in a minute, we're going to talk about logging. One of the features in the actuator endpoints is the ability to monitor and change the logging of your application on the fly. To do that, you have to use something called the dynamic console logger. It's basically a wrapper around the standard console logging provider from Microsoft, but it also allows us to do this querying and changing of the log levels. So those two things are needed in Program.Main.
Then over in your Startup class, you need to add the Cloud Foundry actuators. This will add all of those endpoints: info, logging, heap dump if it's relevant, all those things that I mentioned. Just this one single call adds those endpoints as services within the container. Then, if you go down here, you have to add those REST endpoints into the pipeline, as part of the middleware. So then you just do this. And now all of those REST endpoints have been enabled and made available within your application. That's all you need to do. There is one small thing I forgot.
I just thought of it: you do have to add a little bit of configuration. In appsettings.json, you have to enable this path for the management endpoints. This is the endpoint that the Apps Manager calls in order to access those REST endpoints. This is the context. That's pretty much it.
So, just quickly, trace. When you have the trace actuator endpoint enabled within your app, it keeps a circular buffer of the last 100 requests that came into that application. It captures request headers and the response data. It also captures response time; this particular request took 37 milliseconds. And it's configurable as to what information you collect as part of that trace. As I said, it's a circular buffer that by default holds 100 requests.
Then, when we look at logging, one of the things I mentioned that's enabled is that you're able to configure logging levels. So if I click on this logging level thing, I'll notice first of all that I've got 109 loggers that have been created within this application since it's been running. I can go in here and say web application two, and I see all of the categories underneath that, and I can just go over here and switch this to debug. Now all of those loggers have been changed from whatever they were, info or nothing, to debug-level logging. Now, if you're trying to diagnose a problem, you can capture some logs. This has been changed on all instances of this particular application. So you can capture the logs and do some offline analysis. Then when you're all done, you can go back, set it to info, and you're ready to go. So this is a nice little feature that's enabled by the actuator endpoints.
Also, if I go in here, there's what's called a health endpoint.
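The circular buffer behind the trace endpoint can be illustrated with a short sketch. This is not Steeltoe code: it is a minimal Python illustration, and the names `TraceBuffer` and `TraceEntry` and the recorded fields are hypothetical. Only the default capacity of 100 requests comes from the talk.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TraceEntry:
    """One recorded request (hypothetical fields, for illustration only)."""
    path: str
    status: int
    elapsed_ms: float


class TraceBuffer:
    """Keeps only the most recent `capacity` requests, dropping the oldest."""

    def __init__(self, capacity: int = 100):
        # A deque with maxlen silently discards the oldest item when full,
        # which is exactly the circular-buffer behaviour described above.
        self._entries = deque(maxlen=capacity)

    def record(self, path: str, status: int, elapsed_ms: float) -> None:
        self._entries.append(TraceEntry(path, status, elapsed_ms))

    def recent(self) -> list:
        """Return the buffered requests, newest first."""
        return list(reversed(self._entries))


buffer = TraceBuffer(capacity=100)
for i in range(250):
    buffer.record(f"/checkout?n={i}", 200, 37.0)
```

After 250 requests the buffer still holds only 100 entries, so memory use stays bounded no matter how long the app has been running.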
In this particular case, what the health endpoint is capturing is the disk space available for this particular instance of the app, making sure that you're not eating up all the disk space. You can set thresholds such that if the app begins to write too much data to its local disk and begins to overflow, it'll flag the application as being in bad health. You can also add other health indicators. For example, we have a MySQL health indicator, which will actually test the MySQL connection if you have a MySQL database tied to your application on the back end. There could be one for RabbitMQ, and in fact, you can write your own health indicators and plug them into this infrastructure without too much trouble. You'll then see this information pop up, at least as JSON data that comes back in the REST calls.
So those are the management endpoints. I pretty much focused on the .NET ones, but all that same stuff is there for Java as well. So it's nice: you have one common set of endpoints that you can use to peer into both running Java applications and .NET apps. What I just described has been part of Steeltoe up through 2.0. We're currently working on Steeltoe 2.1.
Okay, so let me move on to another area: distributed tracing. This is a new area in 2.1 for .NET, for Steeltoe. It's been around in the Java world for quite some time. There it's called Spring Cloud Sleuth, and it's pretty much tied to Zipkin, although I think in the 2.0 release they're doing some things to add OpenTracing support as well. If you want to use distributed tracing within your app, in the Java world you add a dependency on Sleuth, and in the Steeltoe world you'll be adding a dependency too. This is not out there today. It's still in progress.
You'll add something called tracing core as a NuGet dependency, okay? When you do that, one of the things that you get right away, out of the box, is trace IDs and what are called span IDs. Think of a trace as a distributed request that's going through many microservices. Within each microservice, one or more spans are typically created, which capture trace information about that particular request, and you can add or annotate whatever you want to that span. That's all collected together as part of a trace. Then optionally, as you'll see, each one of those microservices can send its spans up to a central server for analysis and logging.
Out of the box, what you get with Steeltoe distributed tracing is that we add span IDs and trace IDs to all log output that comes out of the application, and that allows you to do something called log correlation. This is also done in Java applications. What's added is essentially this right here, or that, I'm going too fast: that little bit of service name, trace ID, span ID, and exported is added to every one of the log entries.
What we've also done as part of Steeltoe is instrument all the ingress and egress points within your application. So when a request comes into, say, an ASP.NET Core application, we'll automatically start a span, a piece of the trace. Then, if you initiate an HTTP request to another, outside process, we'll automatically forward the trace context for you out that egress point, okay? So that trace will continue to flow throughout the microservices application. And like I said, this works for both Java and .NET. It's all interoperable, and I'll show you an example here. That's what you get automatically out of the box.
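The ingress and egress behavior just described can be sketched in a few lines. This is a minimal Python illustration, not the Steeltoe or Sleuth implementation. It assumes the Zipkin B3 header names (`X-B3-TraceId`, `X-B3-SpanId`) as the wire format, which the talk does not name, and the bracketed log prefix layout is likewise an assumption modeled on the fields mentioned (service name, trace ID, span ID, exported). All function names are hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                        # shared by every span in one request
    span_id: str                         # unique to this piece of work
    parent_span_id: Optional[str] = None


def new_id() -> str:
    """Random 64-bit identifier rendered as 16 hex characters."""
    return secrets.token_hex(8)


def start_span(incoming_headers: dict) -> SpanContext:
    """Ingress: join the caller's trace if present, otherwise start a new one."""
    trace_id = incoming_headers.get("X-B3-TraceId")
    if trace_id is None:
        return SpanContext(trace_id=new_id(), span_id=new_id())
    parent = incoming_headers.get("X-B3-SpanId")
    return SpanContext(trace_id=trace_id, span_id=new_id(), parent_span_id=parent)


def outgoing_headers(ctx: SpanContext) -> dict:
    """Egress: forward the trace context on calls to other services."""
    return {"X-B3-TraceId": ctx.trace_id, "X-B3-SpanId": ctx.span_id}


def log_line(service: str, ctx: SpanContext, message: str, exported: bool = True) -> str:
    """Prefix a log message with the correlation fields (layout is an assumption)."""
    return f"[{service},{ctx.trace_id},{ctx.span_id},{str(exported).lower()}] {message}"


# Shopping cart -> order processor -> payment processor, as in the demo app.
cart = start_span({})
order = start_span(outgoing_headers(cart))
payment = start_span(outgoing_headers(order))
```

Because all three services log the same trace ID, a log tool can pull every line for one checkout request by filtering on that single value, which is what log correlation amounts to.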
Then you can optionally instrument your app yourself, if you'd like to. You can start your own spans, or you can add context or more information to the spans, for example custom IDs or order IDs. Any information you'd like to add to your trace can be added. In the Java world, you do that using the Zipkin Brave APIs, which are part of Spring Cloud Sleuth. In the Steeltoe world, you'll be using the OpenCensus APIs, the ones I talked about that we've implemented as part of the Google activity. And then optionally, you can enable your application to send those spans and traces to a back-end server, and you can export them to different back ends. Zipkin is one of the more popular ones and one we've used within Pivotal quite a bit, but there are others: Jaeger, Stackdriver from Google, and quite a few more. We've already done one exporter, for Zipkin, as part of the Steeltoe project.
Okay, so let me go to the demo again to drive this home. I'm gonna use something from Pivotal called PCF Metrics to show log correlation. That's down there in the bottom left. It's important to realize that the log messages are just getting annotated with that information. If you're using some other log analysis tool instead of PCF Metrics, the same demo that I'm about to show you would work as well, because it's all just flowing in through Loggregator and out the Firehose into whatever wants to collect it. I'll also show you the Zipkin server: the traces will get sent to a Zipkin server, and we'll take a look at that as well.
Okay, so I'm gonna start a console app that will, hopefully, start hitting the endpoint and generating some traces, some data on the app. And then I'm gonna go to PCF Metrics here. So this is PCF Metrics, if you've never seen it.
There are actually two parts to PCF Metrics. There's a log analysis part, and there's a metrics analysis component. We're gonna focus first on the log analysis part, right here. This is essentially a raw log dump of this set of applications, and I'm gonna pick this log message from checkout. First of all, notice this here; this is what I was talking about. Every one of the log messages coming out of these applications now contains the name of the component, the trace ID, and the span ID. That's what's used to do log correlation, and that's what PCF Metrics, in this case, or any other application could use. The way you make use of that within PCF Metrics is you find, say, the GET checkout request, and you click on this button to view the trace in the trace explorer. So I'm gonna go ahead and click that. What I get now is all of the log messages that are specific to this particular trace, this particular trace ID, okay? Every one of the log messages from the payments processor, the orders processor, and the shopping cart is shown here, and I think they're sorted newest to oldest right now. Then up at the very top, it shows you the timings for each one of those requests. So you see that the time it took for the checkout endpoint to finish was 58 milliseconds. Of that entire 58 milliseconds, 35.5 was used by process order and 11.8 was used by charge card. So that's log correlation, all done, and we can do this across both Java and .NET applications because we're being consistent.
Now the next thing that you can do is take this trace ID and go over to the Zipkin application, which has been collecting the traces, right? Each one of the Java and .NET components is sending its traces.
You can just enter this and it'll do a query, and you get similar information, but with another level of detail and another level of accuracy. Here on the left we've got the shopping cart service, and within the shopping cart service there are actually two spans that were created. It calls the order processor, which in turn calls the payment processor. I can look at each individual span just by clicking on it, and now I get more details about that particular request and how long it took. Notice that I also get this set of key-value pairs. These are attributes that can be added to the span as it's going through the application. They allow you to collect data. By default, Steeltoe and Spring Cloud Sleuth will add their default set of attributes, but you as a developer can add whatever information you'd like. So you can capture application-specific information as part of that trace. And we've got a just-in-time debugger that just popped up.
Okay, so that is distributed tracing. That's something new in 2.1 of Steeltoe. It already exists in the Java world as part of Spring Cloud Sleuth, and we'll be releasing it as part of 2.1. Did we pick a date yet, Jason, for 2.1? Steeltoe? Huh? August.
Okay, let's go into metrics. That's another new area as far as Steeltoe is concerned, something new that we're adding. If you go back to the actuator endpoints, you will have noticed that one of the endpoints is something called metrics. What this does is allow you to query the application and pull application metrics out of it: response time by endpoint, heap usage, garbage collection times, and all that kind of information. I'll show you that in a second. This is available both in Java applications, as part of the Spring actuator endpoints, and now in the Steeltoe implementation in 2.1. The metrics are exposed via HTTP endpoints.
System metrics are automatically collected for you: things like heap information, as I said. App metrics are also automatically collected for you at the ingress and egress points within your application. So as requests come into your ASP.NET Core application, for example, we'll increment counters, we'll capture timestamps, all of that sort of stuff. We aggregate that information within the Steeltoe metrics components. Then optionally, you can add your own metrics as well. So if there's some application-specific data that you wanna capture over a period of time as your application is running, you can add that too. And there's a nice tagging system in OpenCensus that you can use to tag your metrics. Then, if you'd rather not have to hit the metrics HTTP endpoint to get the data, you can optionally have the components export the metrics to a back-end system. We have already implemented one exporter, which I'll be demoing in a second, that exports the application metrics to the Metrics Forwarder within PCF, Pivotal Cloud Foundry. We're gonna do other exporters as well. By the way, all the metrics exporter does is capture the metrics information and put it onto the Loggregator Firehose. So any metrics tool that you wanna use can make use of that information if it's able to read from the Firehose.
So let me quickly show you that. I've been running that console app that's been generating requests, hopefully. And now I'm gonna use the PCF Metrics tool to surface some of the metrics. Right now I'm looking at the shopping cart, which is the Java application. Up here at the top are essentially all of the various metrics that are being captured by PCF Metrics.
I can add additional metrics by clicking up here and going to add chart. For example, I can say, let's see, heap used. Now, this is a Java application. So I can add the heap usage, and now I get a graph of the heap usage for whatever period of time I selected, and I can take a look at that. I can go on and add garbage collection and all kinds of additional metrics.
What's more interesting is the .NET application, so let's switch over to the orders processor. As it's refreshing, you can see I've already been in here. One of the things we see is heap usage coming out of the application. We also see garbage collections by generation, GC gen one and GC gen zero. And we see the response time for the process order endpoint plotted over time. These are all application metrics that are being captured by Steeltoe and sent onto the Firehose for collection. In this case, PCF Metrics is what's being used to demonstrate that. So basically, you begin to get the same kind of metrics out of a .NET app that you get out of a Java app. They can be combined in one tool for analysis and exploration. This is something that'll be in 2.1, like I said, and we hope to have that soon.
And real quickly, because I'm running out of time, I'm gonna show Hystrix as well. In the case of Hystrix, if you're using it to make requests to back-end services, you have to build a command pattern around the remote endpoint, and you make use of the HystrixCommand class to implement that command. As part of that, you get metrics and monitoring that come out of that particular command. What I've done in the order processor is use a Hystrix command to make calls to the payment processor, okay? So we're gonna take a look at the Hystrix dashboard that's part of PCF. And I thought I had it up.
So the way you get to that is you go over to services, circuit breaker dashboard, click manage, whoops. And here is the circuit breaker dashboard. It looks like my console app has crashed, so it's not generating any requests. So now we're starting to see real-time information coming in. The payment service command is the Hystrix command that's making requests to the payment service. Everything's doing well. It's done 14, whatever, 13 requests. I think this window is 20 seconds. So that command is processing about 13 requests a second. If I go back to the dashboard and do something crazy like shut down the payments processor, and then go back to the Hystrix monitor, all of a sudden we start to see failures, right? Because that Hystrix request to the payment service is failing. We start to see that the circuit itself has opened; we're using the circuit breaker pattern as part of this. We're getting about 12 short-circuited requests now. So over time, the circuit opens up, and the requests no longer go to the back-end service. If I were to start up the payment service, we'd see this automatically turn back to all green, everything looking good. What users are seeing on their side, of course, is just 'order processed successfully, but payment processing is pending.' So you're able to fall back, log the order, and move on.
So, okay, I think I'm probably pretty much out of time. There's so much to cover. Feel free to come up and ask questions afterwards if you'd like, and I'll be around, of course, the rest of the day and tomorrow, okay? Thank you.
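The circuit breaker behavior shown in the Hystrix demo (failures, then short-circuited requests with a fallback, then recovery) can be sketched as follows. This is a simplified Python illustration, not the Hystrix or Steeltoe implementation: it opens the circuit after a fixed number of consecutive failures, whereas Hystrix decides based on an error percentage over a rolling window. The class, parameter names, and threshold values are hypothetical.

```python
import time


class CircuitBreaker:
    """Simplified breaker: opens after N consecutive failures (an assumption;
    Hystrix itself uses error percentages over a rolling statistics window)."""

    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.short_circuited = 0

    @property
    def is_open(self):
        return self._opened_at is not None

    def call(self, command, fallback):
        if self.is_open and self._clock() - self._opened_at < self.reset_after_s:
            # Circuit is open: skip the remote call entirely and fall back.
            self.short_circuited += 1
            return fallback()
        # Circuit is closed, or the sleep window has elapsed and one trial
        # request is allowed through.
        try:
            result = command()
        except Exception:
            self._failures += 1
            if self.is_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return fallback()
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result


now = [0.0]            # fake clock so the example does not need to sleep
payments_up = [False]  # simulates the payment service being shut down


def charge_card():
    if not payments_up[0]:
        raise ConnectionError("payment service unavailable")
    return "order processed, card charged"


def payment_pending():
    return "order processed, payment pending"


breaker = CircuitBreaker(failure_threshold=3, reset_after_s=5.0, clock=lambda: now[0])
results = [breaker.call(charge_card, payment_pending) for _ in range(5)]

payments_up[0] = True  # bring the payment service back
now[0] = 6.0           # move past the sleep window
recovered = breaker.call(charge_card, payment_pending)
```

The first three calls fail and open the circuit, the next two are short-circuited without touching the payment service, and once the service is back the trial request succeeds and closes the circuit again. Callers see the fallback message throughout the outage rather than an error.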