Hello, everyone. Thank you so much for coming. My name's Jenny Morris, and this is Stephen Brown. We are Solutions Architects at Elastic, and today we will be talking about gaining insights into your microservices with Elastic APM. At the end of the day, it all comes down to business outcomes. As developers, we want to be able to deploy our applications into a cloud environment where scaling, auto-recovery, logging, monitoring, APM service binding, and zero-downtime deployment have all been taken care of for us. And we know that cloud native is the way to go. So we adopted DevOps, which brings Dev and Ops together. We leveraged the Cloud Foundry platform and CI/CD tools to continuously deliver our applications across multiple environments, where every environment is as close to identical as possible. We use microservices because they can be separately deployed, managed, evolved, and scaled. So we decided to decompose our monolithic e-commerce application into multiple microservices and deploy them into Cloud Foundry. We used some services from the marketplace. We adopted some frameworks. We let Cloud Foundry manage all the infrastructure: scaling, upgrading, patching. Things were awesome. It was several weeks of effort across multiple functional areas. All the unit tests and integration tests were green. We were able to do a blue-green deployment of our new e-commerce application to Cloud Foundry. Zero downtime, right? And we were just super excited. We felt like we were in the happiest place on Earth, better than Disneyland. All was good for a few hours. And then we got some calls from our help desk saying that some customers were experiencing slow loading on some pages. In the old days, we all know we'd open 20 windows, everybody would be in the war room trying to figure out what's wrong. We've all been there, done that. Of course, with Cloud Foundry, things have been streamlined.
Now we just have our logging tool, metrics tool, and APM tool, so we deal with three or four different places instead of 20. Well, some time later, some pizza has been consumed and the issue has been resolved. But the problem is that metrics tools, logging tools, and APM serve different purposes, and there's some overlap. Without a consolidated view, you're wasting your time switching tools, or you're trying to stitch together and align the visuals to investigate the issue. So observability is really important, which Stephen is going to talk about in a lot more detail later. For now, I want to talk about this really great news: Elastic APM is the first major open source APM solution. We rolled it out about 14 months ago, and now we are bringing it into the Cloud Foundry community. So it is free. You can go back to your desk, download it, and start instrumenting your applications that are running on Cloud Foundry. All right, let's do a quick demo. We are going to use this car web application to demo APM. We have three components: a React front end and two Spring Boot microservices in the back end. The APM agents collect all the data and store it in Elasticsearch, and we can use the APM dashboard in Kibana to track and trace each request, each page, and the entire session. For my back-end Spring Boot application, this is the manifest file I used: the standard Java buildpack, which has the Elastic APM agent in it, and the service here points to my APM server. Same thing with the other microservice in the back end. For the front end, we're still working on the Node buildpack, so for now I'm just going to use the Elastic APM JavaScript agent and initialize it. The server URL points to my APM server. And here are the distributed tracing origins. This basically tells the agent to add a distributed tracing HTTP header to requests made to my back-end services.
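To make the distributed tracing origins setting concrete, here is a minimal Python sketch of the decision an agent makes for each outgoing request: if the target URL's origin is on the configured list, attach trace context in the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`). This is an illustration, not the RUM agent's code; the function names and the example origins are hypothetical, and the exact header name an Elastic agent sends can vary by agent version.

```python
import secrets
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port], the unit an origin allow-list matches on."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def should_propagate(url: str, tracing_origins: List[str]) -> bool:
    """Only attach trace context to explicitly listed (back-end) origins."""
    return origin_of(url) in tracing_origins


def make_traceparent(trace_id: Optional[str] = None, sampled: bool = True) -> str:
    """Build a W3C traceparent value: version-traceid(32 hex)-parentid(16 hex)-flags."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)              # id of the calling span: 16 hex chars
    flags = "01" if sampled else "00"             # 01 = sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


def outgoing_headers(
    url: str, tracing_origins: List[str], trace_id: Optional[str] = None
) -> Dict[str, str]:
    """Headers for a request from the browser; trace context only for listed origins."""
    headers = {"Accept": "application/json"}
    if should_propagate(url, tracing_origins):
        headers["traceparent"] = make_traceparent(trace_id)
    return headers


# Hypothetical back-end origins for the car demo.
ORIGINS = ["https://car-api.example.com", "https://car-estimator.example.com"]
```

Reusing the same trace id for every call made during one page load is what lets the back-end agents attach their spans to the browser's transaction, which produces the single waterfall described next. Requests to origins that are not listed (a CDN, a third-party API) get no extra header.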
This allows Elastic APM to tie the front-end performance information to the back end, so we get a distributed tracing view. Those were my three microservices, and I pushed them into Cloud Foundry. In this case, we're using PCF. This is Apps Manager. I go to my front end here. I see a car list. I'm going to add a new car. Now I'm coming back to my APM dashboard in Kibana. You see a new transaction just came in; this refreshes every five seconds. This one took 500 milliseconds, and that one was from about five minutes ago. If I have a lot of transactions, I'm going to see a distribution here. If I scroll down, I can see the distributed trace: where is the time spent in each microservice? You get a clear waterfall view from the browser all the way to the back end. By default, the agent records all the API calls as well as the database calls, because those are the significant events. But you can easily configure the instrumentation options, for example by setting trace methods at the package level, class level, or method level. You can set those in a properties file, as Java system properties, or just as environment variables with cf set-env. If you instrument every method in your application, you get a lot more detail, which is OK in a dev or test environment where you just want to see what's going on end to end. But in production, it may add some overhead. The exception is when you've just shipped new functionality and want to see how it behaves in production: for a while, you can turn on tracing for just the methods involved in that new function. Or, in our case, we got complaints from customers about our car web application. They're saying that when they add a luxury car, it takes a long time. So now we've got to troubleshoot.
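The three configuration sources mentioned above follow a naming pattern: an option such as `trace_methods` is `elastic.apm.trace_methods` as a Java system property, `ELASTIC_APM_TRACE_METHODS` as an environment variable, and `trace_methods` in an `elasticapm.properties` file. The Python sketch below resolves an option across those sources. The precedence order (system property, then environment variable, then properties file) is an assumption based on the Java agent documentation; check it for your agent version. On Cloud Foundry the environment-variable route would look something like `cf set-env car-estimator ELASTIC_APM_TRACE_METHODS 'com.example.cars.EstimateService#calculateEstimate'` followed by `cf restage car-estimator`, where the app, package, and class names are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def env_var_name(option: str) -> str:
    """Environment-variable form of an agent option: ELASTIC_APM_<OPTION>."""
    return f"ELASTIC_APM_{option.upper()}"


def resolve_option(
    option: str,
    system_props: Mapping[str, str],
    env: Mapping[str, str],
    properties_file: Mapping[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (source, value) from the highest-precedence source that sets the option.

    Assumed order: Java system property > environment variable > elasticapm.properties.
    """
    candidates = [
        ("system property", system_props, f"elastic.apm.{option}"),
        ("environment variable", env, env_var_name(option)),
        ("elasticapm.properties", properties_file, option),
    ]
    for source, values, key in candidates:
        if key in values:
            return source, values[key]
    return None


def granularity(pattern: str) -> str:
    """Classify a trace_methods entry as package-, class-, or method-level."""
    if "#" in pattern:               # e.g. com.example.cars.EstimateService#calculateEstimate
        return "method"
    if pattern.endswith(".*"):       # e.g. com.example.cars.*
        return "package"
    return "class"                   # e.g. com.example.cars.EstimateService
```

Narrow, method-level patterns are the production-friendly choice the talk describes; a package-level wildcard instruments everything under it and is better kept for dev or test.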
Of course, I can turn on trace methods to trace a specific method, which is no-code-change instrumentation; all I need is the configuration. But it might be better to create custom instrumentation by defining a custom span here. So I'm basically saying, OK, I'm going to create a custom span called "calculate estimate." I'm also adding a bunch of contextual information here, like the brand and make. That's really going to help me debug, on top of the performance information. So I save, compile, and push. Notice that in my code I only wrapped one line of code in my span. You could wrap three lines or 10 lines, and you can have multiple spans throughout the transaction if you want. We also talked about trace methods, which is a configuration option; there are a dozen other options. You can define sampling. You can add annotations here. So you really have full control over how you instrument your application. And as we saw earlier, that's the standard Java buildpack, and the Elastic APM agent is loaded here. I was going to show you something while we wait for the application to start, but it's already started. OK, so we are going to add an expensive car. Luxury. It does take a little bit longer, huh? I go to my APM dashboard. It's coming in. It took over 2.5 seconds here. Click on that. Scroll down. I see the custom span I just put in there; it's called "calculate estimate." And I have my stack trace. If I go to my tags, I have all the contextual information I put in there. I can also search for this tag in the search bar, brand, and I find this transaction. As a matter of fact, since this is Elasticsearch, I can do full-text searches, say "Ferrari," and I find this transaction as well. So that was all good.
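In the demo the custom span is written in Java against the agent's public API (something along the lines of `ElasticApm.currentTransaction().startSpan()`, then naming, tagging, and ending the span). The Python sketch below is a toy tracer, not that API, and only illustrates the idea: wrap the slow call in a named span, attach contextual tags such as brand and make, and later find transactions by tag. `Transaction`, `Span`, `find_by_tag`, and `calculate_estimate` are all hypothetical names.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Span:
    name: str
    tags: Dict[str, str] = field(default_factory=dict)
    duration_ms: float = 0.0


@dataclass
class Transaction:
    name: str
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str, **tags: object) -> Iterator[Span]:
        """Time the wrapped block as a named span carrying contextual tags."""
        current = Span(name, {key: str(value) for key, value in tags.items()})
        start = time.perf_counter()
        try:
            yield current
        finally:
            # Record the span even if the wrapped code raises.
            current.duration_ms = (time.perf_counter() - start) * 1000
            self.spans.append(current)


def find_by_tag(transactions: List[Transaction], key: str, value: str) -> List[Transaction]:
    """Transactions containing a span tagged key=value (like a tag search in Kibana)."""
    return [t for t in transactions if any(s.tags.get(key) == value for s in t.spans)]


def calculate_estimate(brand: str) -> float:
    """Hypothetical stand-in for the slow estimate logic; luxury brands take longer."""
    time.sleep(0.05 if brand in {"Ferrari", "Lamborghini"} else 0.005)
    return 1000.0
```

Usage mirrors the demo: `with tx.span("calculate estimate", brand="Ferrari", make="F8"): calculate_estimate("Ferrari")`. Only the wrapped line is timed, and the tags travel with the span so the slow request can be found by searching for the brand.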
But in a lot of mission-critical production use cases, you really don't have time to drill down into your APM tool and mess with all this and that trying to figure out the problem. Wouldn't it be nice if we could proactively watch the response time and, if there's some abnormal behavior, just flag it? We can easily do that here by integrating with machine learning anomaly detection. I can create a new job, then go to the Machine Learning tab, which is another application in Kibana. This is my newly created machine learning job. It detects a high mean response time for my car front-end application. If it detects anything abnormal, it flags it, and it can integrate with your alerting system, such as Slack, PagerDuty, or ServiceNow, to generate a ticket. Since we just created this, we don't have any data here, but I want to show you what it looks like once it has run for a while. You see some anomalies here, right? And you see the severity: this red dot is the most severe, and the color tells you how severe each one is. Scroll down and look at the severity score. It ranges from 0 to 100. This one is 90, so it's pretty severe. Typically it takes two milliseconds, and this one took about 4.8 milliseconds. One more thing: say machine learning flags this one and I get paged. I can easily tie it back to the APM dashboard. I see all my transactions here, and now I see my distributed trace. So that was a quick APM demo. Now Stephen is going to talk about how to correlate it with Cloud Foundry logs and metrics, so you can have a consolidated view of all your operations. She got to talk about all the cool stuff; now I get to talk a little bit about some other topics. So, APM: again, the first major open source APM. I'll come back to that. Observability: how many here have heard of it?
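The severity colors map onto score bands; Kibana's anomaly views use bands along the lines of warning (below 25), minor (25 and up), major (50 and up), and critical (75 and up), so the score of 90 above is critical. Elastic's ML jobs model the data with their own statistical methods. The Python sketch below is a deliberately crude stand-in, a one-sided z-score squashed into 0 to 100, meant only to show the shape of a "high mean response time" detector. The band thresholds are assumptions to verify against your Kibana version, and the squashing function is arbitrary.

```python
import math
import statistics
from typing import Sequence


def severity_label(score: float) -> str:
    """Map a 0-100 anomaly score onto Kibana-style severity bands (assumed thresholds)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return "critical"
    if score >= 50:
        return "major"
    if score >= 25:
        return "minor"
    return "warning"


def toy_anomaly_score(history_ms: Sequence[float], observed_ms: float) -> float:
    """Crude one-sided score for 'high mean' response time (NOT Elastic's ML model).

    Measures how many standard deviations the observed bucket mean sits above
    the historical mean, then squashes that distance into the 0-100 range.
    """
    mean = statistics.fmean(history_ms)
    stdev = statistics.pstdev(history_ms) or 1e-9  # avoid dividing by zero on flat history
    z = max(0.0, (observed_ms - mean) / stdev)      # only slower-than-usual counts
    return 100.0 * (1.0 - math.exp(-z / 5.0))
```

With a history hovering around 2 ms, an observed 4.8 ms bucket sits many standard deviations out and lands in the critical band, while a normal 2 ms bucket scores near zero.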
It's kind of the new buzzword. Yep: devs, ops, both? I'm both. Okay, awesome. So this is really important to you. We've been doing this for a long time, one way or another, right? It's that aggregation of metrics, tracing, and APM, but it's really about the ability to search very easily when something's happening, correlate data, and build dashboards and reports, and we're all doing it today. And I would tell you, we've come a long way. Quite literally, Saul and I and others I've worked with ran tail -f on 20 windows, right? tail -f, pipe to grep, right? We lived in that world, and you know what, that was better than what we had before, when we didn't even have good logs. So what is effective observability about? It's really about the two things my CIO and my customers always wanted to know: am I up, and am I responding? That's the point of the slide about fewer, higher-level metrics. We have this idea of measuring everything, and we're collecting so much, but in the end a lot of us are suffering from alert fatigue, window fatigue, tail fatigue, ticket fatigue, all of that. You really want to be able to boil it all up into those two things. But underneath, we all know there are maybe hundreds of services, plus infrastructure, network, everything. How do we get to those two simple answers? Or, from the top down, how do we get to what happened, why it happened, and how long it's going to take to fix? The first key is correlated data. The second key is automation. I love the idea of measuring everything, but only surface what's important. Right now we're in that phase where we're measuring everything and so much is coming up. You want to surface only what's important, ideally automatically, and Jenny just showed a little bit of how you can do that in APM.
You can obviously do that with metrics, et cetera. And the key is really the ability to seamlessly traverse down from the front end, from uptime. People have uptime tools, like Uptrends or BSN or similar. Most of them don't integrate with anything except alerting; they don't drill right down into the logs, the metrics, or the APM tool. Imagine if they did. Most people look at the uptime tool, and then the first thing they do is go to another tool. So that ability to traverse up and down and left and right with highly correlated data is how we're going to make all of our jobs easier. And we've come a long way, right? When I joined my last company... oh, by the way, I ran a team of Cloud Foundry developers for about four years, so this was real life for me. I would get a call or a text or an alert, and we'd all jump on, including some folks here that I used to work with. Even with all the TDD, CI/CD, and the beautiful things we were doing that took us from releasing once or twice a year to releasing several times a week or even several times a day, a fantastic transformation, I still got that call at 2 a.m. We'd all get on the call, the devs too, not just the ops people. And by then we only had about four tools to work in. Contrary to popular belief, Slack is not a data integration tool, right? Can you Slack me the app name? Can you Slack me that IP address? Oh, I can't find it. Oh, shoot, I gave you the one in staging, not production. That's what we do a lot today. That's okay; it's way better than it used to be. But imagine if we could put it all in a single tool. Today, what I see, and what we ran pretty well, is something that looks like this. You have an uptime tool, and a lot of folks don't use real user monitoring.
They think Google Trends is real user monitoring. It's not, and it sometimes gets confused with synthetic monitoring. Or they just don't use it because it may be cost-prohibitive, since those tools often use agent-based licensing models. Then there are lots of good APM tools out there, AppDynamics, New Relic, you name them. They're good, solid tools. There are good logging tools. And then there's often a different set of tools under the covers for metrics. We lived pretty well with that, but it was still difficult. So where the industry is heading, not just Elastic, is bringing all of this together. Imagine your tool suite looked like this: all your data in a single tool, with the whole continuum, from uptime all the way down to the lowest-level infrastructure metric, in one data store with highly correlated data. And imagine a slider or something that lets you move up and down through that data, or left and right across those services. That's what's going to make your job easy. In the end, when something goes down, the first two questions are: how long, and what happened? And what comes after those two, assuming you get it back up? Impact, and why. And then the real why, not the superficial why, where you have to drill down and get real: how do we make sure that never happens again? Our job is to get through those five whys pretty quickly. And if everything is in the same tool, you're going to be able to do that a lot faster. Now, it's kind of interesting. Sometimes people say, whoa, all the data's in one place and highly correlated, but my devs still want to see their world and my ops still want to see theirs. Can I segment it out? Yes, we can, and I'll talk about that a little bit too. You can still have role-based access. You can have entirely different spaces.
In fact, I would take this further: you can even put business data in here. For e-commerce, people want to know the cart abandonment rate, et cetera. That can go in here, and I can have a business view that's still highly correlated. Imagine I have cart abandonment because I had an uptime problem, because one of my services was down, because the database was struggling, because its CPU was overloaded, because somebody added a query without an index. Not that that was ever a real problem or anything, right? And I'd probably have to be in four different tools to find that. So for PCF, this is just an example of what it could look like. We're working forward, and I'll talk a little about what's coming in the future, but this is what we've built today, and I'm going to show you a little example of it using some of the different parts, like the space drain. Anybody familiar with space drains? It's one of the newer ways to get logs out of Cloud Foundry at the space level, so there's a little more parallelism than pulling everything from the Firehose. Okay, I'm not going to spend a lot of time on this, though since we're devs, I do show one piece of code. The Elastic Common Schema (ECS) is what lets you correlate the data. You can look at these slides later, but once you apply a common schema to your data, if you build a visualization for some metrics that follow the schema, and then you bring in something else, like a Docker container, a Cloud Foundry container, or one of the arenas, you can reuse the visualizations, the ML jobs, et cetera. And going forward, everything Elastic rolls out from here on applies this schema, so anything new will plug in very easily. Now, it's a convention, not a requirement, but the benefits are pretty striking. As a developer, you'll never see this.
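The reuse argument for ECS becomes clearer with an example. The Python sketch below maps a hypothetical, already-parsed Cloud Foundry access-log record onto ECS-style field names (`@timestamp`, `service.name`, `url.path`, `http.request.method`, `http.response.status_code`, `http.response.body.bytes`, `event.duration` in nanoseconds), and then shows a "visualization" function that works on any ECS document regardless of which system produced it. The input keys are assumptions, and putting the CF org and space under `labels` is an illustrative choice rather than an official mapping.

```python
from typing import Any, Dict, List


def cf_access_to_ecs(raw: Dict[str, str]) -> Dict[str, Any]:
    """Map a hypothetical, already-parsed Cloud Foundry access-log record onto ECS fields."""
    return {
        "@timestamp": raw["timestamp"],
        "service": {"name": raw["app_name"]},
        "url": {"path": raw["path"]},
        "http": {
            "request": {"method": raw["method"]},
            "response": {
                "status_code": int(raw["status"]),
                "body": {"bytes": int(raw["bytes_sent"])},
            },
        },
        # ECS expresses event.duration in nanoseconds.
        "event": {"duration": round(float(raw["response_time_s"]) * 1_000_000_000)},
        # Illustrative choice: CF org/space go under free-form labels.
        "labels": {"cf_org": raw["org"], "cf_space": raw["space"]},
    }


def error_rate(docs: List[Dict[str, Any]]) -> float:
    """A reusable 'visualization': works on any ECS doc, whatever system produced it."""
    if not docs:
        return 0.0
    errors = sum(1 for d in docs if d["http"]["response"]["status_code"] >= 500)
    return errors / len(docs)
```

Because `error_rate` only reads ECS field names, the same function (or the equivalent Kibana visualization) applies unchanged to documents from a Docker or Kubernetes source, as long as those sources are mapped to the same schema.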
It's there for you if you get really good and start to have a common application log format, which has been my dream as an engineering manager for 20 years, and I still haven't quite gotten there. That's what the Elastic Common Schema is about. Then come Kibana Spaces, which let us organize all this. You can still have an uber-admin who can see everything, or a dev space with the visualizations and queries the devs are interested in, while the ops folks have their own space. So this is what I have set up today. I want to get this code out so people can play with it and give us some feedback. Basically, I am both the operator of the PCF instance you see here and the developer. We have a full reference architecture set up: a number of Spring apps, some Angular front ends, some React front ends. All those little icons that don't make sense are mostly Elastic icons. We're doing uptime and metrics, even monitoring the MySQL database from both an error-log perspective and a command-rate perspective, all that kind of stuff. It's all pumping in through Logstash into Elasticsearch and into a couple of Kibana dashboards. Oh, there's your code: that's how you install a space drain. It's pretty simple, and it just pumps logs out of the space. I would have done that live, but we didn't really have time today. So I'm actually going to go straight to the ops folks first. This is a proof of concept of what it could look like; I haven't really seen this before. The whole industry is trying: APM vendors are trying to become logging and metrics vendors, metrics vendors are doing the same, and we're all converging on this idea, not just Elastic.
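Spaces can also be created programmatically. Kibana exposes a Spaces API (`POST /api/spaces/space` with a JSON body containing at least `id` and `name`, optionally `disabledFeatures`), and Kibana's HTTP API expects a `kbn-xsrf` header on write requests. The Python sketch below only builds the request; sending it with `urllib.request.urlopen` against a real Kibana, plus whatever authentication your cluster uses, is left out. The URL, space id, and feature id are hypothetical, available feature ids vary by Kibana version, and restricting which roles can see a space is configured separately.

```python
import json
import urllib.request
from typing import Iterable, Optional


def create_space_request(
    kibana_url: str,
    space_id: str,
    name: str,
    disabled_features: Iterable[str] = (),
    auth_header: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Kibana Spaces API request that creates a space."""
    body = {"id": space_id, "name": name, "disabledFeatures": list(disabled_features)}
    headers = {"Content-Type": "application/json", "kbn-xsrf": "true"}
    if auth_header:
        headers["Authorization"] = auth_header  # e.g. a Basic or ApiKey header (assumed)
    return urllib.request.Request(
        f"{kibana_url.rstrip('/')}/api/spaces/space",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

A dev space and an ops space would simply be two such requests with different ids and feature lists; dashboards and saved searches are then created inside each space.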
But I have yet to really see a dashboard like this. It's a little hard to see, I apologize. At the top is uptime and response time, and below that, across all the apps I have running, are the Gorouter response codes, so basically your web logs, right? Below that is the stream of bytes, because that's what ops folks look for; they want to see if there was a spike or a dip. We can see where one of the apps had trouble, some extra reds, and then one of the apps completely dropped off. This is response time per app, and then down into APM. So think about that stack; I always talk about tracing it: uptime, web logs, application performance. Here's your CPU by container, memory by container, and then I'm moving down into the MySQL layer and the MySQL commands, and look at that, MySQL had an issue there, right? And then down into the host metrics, and if I went to another screen, we could get down to the nitty-gritty for the host. So that's an operator view. But a lot of people here are devs. So through spaces, I created a separate dev space with a whole different dashboard. This might be for a dev who works in a particular area on his own app. Right at the top, he's really worried about the response time for each of his microservices. And now I've broken out the response codes by service, so you can look at a little more detail, and then I could go into APM, which Jenny already showed you. But then there are the classic log things too, since we're always interested in our REST calls, right? So here's a simple view: space, app, verb, response, bytes, and I can filter on those. I can either do it manually or turn on a filter I already had that looks at the car value estimator, and now it's filtered. So it's a really easy way to look at logs.
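The web-log panels above are built from Gorouter access-log lines. As a rough illustration, the Python sketch below parses the leading fields (host, timestamp, method, path, protocol, status, bytes received and sent) and the trailing `key:value` pairs such as `response_time` and `app_id`. The sample line is hypothetical and follows only approximately the documented Gorouter layout, which differs somewhat across Cloud Foundry versions; in a real pipeline this parsing would more likely live in Logstash or an ingest pipeline.

```python
import re
from typing import Dict, Optional

# Leading fields: host - [start] "METHOD path protocol" status bytes_received bytes_sent
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) - \[(?P<start>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes_received>\d+) (?P<bytes_sent>\d+)'
)
# Trailing whitespace-separated pairs: key:"quoted value" or key:bare_value
KV_RE = re.compile(r'(?:^|\s)([a-z_]+):(?:"([^"]*)"|(\S+))')

# Hypothetical sample line for the car demo.
SAMPLE = (
    'car-api.apps.example.com - [2019-04-02T10:00:00.000+0000] "POST /cars HTTP/1.1" '
    '201 87 512 "-" "Mozilla/5.0" "10.0.0.5:51234" "10.0.16.4:61012" '
    'x_forwarded_for:"10.0.0.5" x_forwarded_proto:"https" vcap_request_id:"1b2c3d" '
    'response_time:2.513 app_id:"a1b2c3"'
)


def parse_access_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one access-log line into a flat dict, or return None if it does not match."""
    match = ACCESS_RE.match(line)
    if not match:
        return None
    record: Dict[str, object] = dict(match.groupdict())
    for key in ("status", "bytes_received", "bytes_sent"):
        record[key] = int(match.group(key))
    # Only scan the remainder, so colons inside the timestamp are not mistaken for pairs.
    for key, quoted, bare in KV_RE.findall(line[match.end():]):
        record[key] = quoted if quoted else bare
    if "response_time" in record:
        record["response_time"] = float(str(record["response_time"]))  # seconds
    return record
```

Once lines are parsed into fields like these, filtering on verb, response code, or bytes is an ordinary field query rather than a grep.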
But of course, what we really want is live streaming of logs, with the ability to search on them. This was in beta and just went GA: it's live streaming of logs. I can set up different indices for the spaces, so maybe I only want logs streaming from my org and my spaces, and Saul wants them from his org and his spaces. This is live streaming, you can search on it in real time, and we'll have highlighting and things like that coming soon. Let's see, there was one other thing I wanted to show. Oh, yeah: we're stepping into service-oriented uptime. This is really nice today for monitoring the uptime of your actual endpoints. It's not synthetic monitoring like you'd get from a fully scripted tool such as Uptrends, but up/down checks and SLA calculations on just your endpoints make it a good tool. So that was a whole lot, excuse me. Let's get back to closing this out. How are we doing on time? Good. All right, so Jenny got to deliver the most important news: the first major open source APM, with no agent-based licensing, right? Whether you have one agent or a thousand agents doesn't matter to us. What that also does is let you move APM up into your CI/CD pipeline. Imagine running little APM jobs while you're delivering your services and keeping track of whether you're slowly creeping up the response time of each of those services, right? You can run tests, but it'd be a lot easier to just do it with an agent. And the second big takeaway is this idea of correlating logs, metrics, and APM. Together these things are going to help you deliver your code and improve both your developer experience and your customer experience. If we're doing our jobs, our customers would never know that we're doing any of this, right?
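The CI/CD idea can be reduced to a simple gate: compare a latency percentile for the candidate build against a baseline and fail the pipeline if it drifts up by more than some tolerance. The Python sketch below uses a nearest-rank percentile and plain lists of response times in milliseconds; in practice the samples would come from the APM data recorded during each build's test run. The 95th percentile and 10% tolerance are arbitrary, hypothetical defaults.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def latency_regressed(
    baseline_ms: Sequence[float],
    candidate_ms: Sequence[float],
    pct: float = 95,
    tolerance: float = 0.10,
) -> bool:
    """True if the candidate's pN latency exceeds the baseline's pN by more than `tolerance`."""
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    return cand > base * (1 + tolerance)
```

Run per service on every build, a check like this catches the slow creep the talk describes, where each release adds a few milliseconds and no single change looks alarming on its own.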
Looking forward, from a commitment perspective, Elastic is committing to partner with the community, and by the way, if anybody is interested, we're hiring a full-time engineer to move these projects forward with Cloud Foundry. So we're looking for a Cloud Foundry engineer; I'll post the link at the end, and it'll be in the slides. The company is committed, and I'm committed, and the reason I'm committed is that I've done a lot of this work and I'm literally one of the committers too. We delivered the first Java buildpack with APM, right? As for what comes next: you heard Ben talk about Cloud Native Buildpacks, which are coming soon, so we're not going to do a whole lot in the older style. We're going to get Java done and then work through Go, Node, .NET, et cetera. What I'd like to do next, and I shouldn't promise a date, is take all the work I've done, map it to ECS, and put it out as a community repo, so you'll be able to stand up and build everything I just showed you, open source. I'll come back to open source in a minute. Anyway, I'm going to put all of it in a community repo so anybody who wants to can download it and even make it better. Actually, that was one thing I wanted to mention about APM. Everything we showed you today is open source or under what's known as the Elastic Basic license, which is free, with one exception: the machine learning anomaly detection, which is commercial Elastic IP. Everything else is open source or Basic. Take it, download it, get it running.
The final thing I want to point out is the resources. Search the Elastic site for the Cloud Foundry engineer role if you're interested. I put in a couple of links to APM, observability, and the Common Schema, and we have the code Jenny demoed in the repos, so it's really easy to pull down and replicate what Jenny did. So that was a whole lot in 35 minutes. Questions? Sure, great question: what's the relationship between Kibana Spaces and X-Pack? Rolling back a bit, it's a great question, because we get a lot of questions about that. X-Pack was Elastic's name for commercial features up until about a year ago. You'll still see it in the documentation; it's how we denote code that's under the Elastic License versus code that's under Apache 2.0. We just call them commercial features now, so X-Pack and commercial features are one and the same. Spaces in Kibana is a Basic license feature, so you can start creating spaces for free. If you want fine-grained role-based access control, that falls into the commercial features. So you can use spaces today to get organized, but if you want to use a space and apply field-level security to a search, that's going to be a commercial feature. The spaces themselves are Basic. Hopefully that answered your question. Okay, yes. So Elasticsearch has a REST endpoint. Logstash has a long history; it's a lightweight streaming ETL tool. However you'd like to get data into Elasticsearch, as long as it speaks REST, you can use it. So in different environments, we would never say you have to pump data through Logstash just to get Logstash in there. There's also the concept of Beats, which are edge shippers. They can ship directly to Elasticsearch, sometimes they ship through Logstash, and some folks like Kafka, right?
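Shipping "directly over REST" usually means the Elasticsearch `_bulk` endpoint: a newline-delimited JSON (NDJSON) body in which each document is preceded by an action line, the whole body ends with a newline, and the request is sent with `Content-Type: application/x-ndjson`. The Python sketch below builds such a request without sending it. It assumes Elasticsearch 7.x, where bulk actions do not need a `_type`; the index name and URL are hypothetical.

```python
import json
import urllib.request
from typing import Any, Iterable, Mapping


def bulk_body(index: str, docs: Iterable[Mapping[str, Any]]) -> bytes:
    """NDJSON for _bulk: an action line plus a source line per doc, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")  # _bulk requires the final newline


def bulk_request(es_url: str, index: str, docs: Iterable[Mapping[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST to <es_url>/_bulk."""
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/_bulk",
        data=bulk_body(index, docs),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

Logstash, Beats, or a Kafka-based pipeline ultimately deliver data the same way; putting one of them in the middle is about buffering, parsing, and mapping (for example to ECS), not a requirement of Elasticsearch itself.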
Kafka in the middle, maybe because they have tens of thousands of endpoints and want a buffer in there. So yes, you can use whatever you'd like. If you want to use the Elastic Common Schema, you'd have to do a little bit of mapping there. Does that help? Okay. Not yet. We're going to look at what level; integration will probably start at the service broker level and work our way down. That would be my guess. How big? For now, you can create a user-provided service and point it to your APM server; that's how we built the demo. But we're working on the service broker and the BOSH-managed tiles. Were you asking how big the services are, or how big the Elasticsearch instances were? Yeah, how big? There's a whole sizing exercise you can go through with me. My favorite setup: if you're a one-node person and you don't mind it crashing, say somebody steps on it or stops your VM while you're not paying attention, and you're okay with that, run one node; otherwise run three nodes. Three small nodes, say three 8 GB nodes or three 16 GB nodes, and you'll have plenty to get started. It doesn't have to be massive. I ran this entire demo on one 16 GB Elasticsearch node, and it's hurting a little because I'm crushing it with a lot of data, with everything running locally. So it was interesting: what you saw, without noticing, is that Jenny used the cloud and I used a local Elasticsearch. Micrometer support? I'd have to take that offline; I'm not even familiar with exactly what that is, I'm sorry. Is that a different APM tool? Let's take that offline and look at it. Yes, in fact, let me show you. Oh, it's the mouse. Can you see this? This is what a real ML job looks like. It's going to learn your seasonality and periodicity. It typically takes about three periods.
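With a user-provided service, Cloud Foundry hands the bound credentials to the app through the `VCAP_SERVICES` environment variable, a JSON document in which user-provided services are listed under the `"user-provided"` key with their `name` and `credentials`. The service might be created with something like `cf create-user-provided-service elastic-apm -p '{"server_urls":"https://apm.example.com:8200","secret_token":"..."}'` and `cf bind-service car-api elastic-apm`. The Python sketch below reads such a binding. The service name `elastic-apm` and the credential keys `server_urls` and `secret_token` are assumptions for illustration; check the buildpack's Elastic APM documentation for the names it actually looks for.

```python
import json
import os
from typing import Any, Dict, Optional


def find_user_provided(vcap_json: str, name_fragment: str) -> Optional[Dict[str, Any]]:
    """Return the credentials of the first user-provided service whose name contains name_fragment."""
    services = json.loads(vcap_json or "{}")
    for service in services.get("user-provided", []):
        if name_fragment in service.get("name", ""):
            return service.get("credentials", {})
    return None


def apm_server_url(vcap_json: Optional[str] = None) -> Optional[str]:
    """APM server URL from a (hypothetically named) 'elastic-apm' user-provided service."""
    if vcap_json is None:
        vcap_json = os.environ.get("VCAP_SERVICES", "{}")
    credentials = find_user_provided(vcap_json, "elastic-apm")
    return credentials.get("server_urls") if credentials else None
```

The same credentials could carry other agent settings, which keeps the APM server address out of the application code and out of the manifest.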
So it'll learn a day in three days, a week in three weeks, and a month in three months. We showed you a fairly simple one, but absolutely. Is that what you're talking about, the aggregates? Oh yeah: the automated anomaly detection came from Prelert, a company that joined forces with us and had been doing this for a decade. So the ML jobs are very mature for this. Very mature. Yep. And I think that means we need to leave. Thank you for spending your time with us. Appreciate it.