All right, good afternoon, everybody. My name is Jeff Johnson. I'm a software engineer at Google. I work on the Cloud Foundry team with a lot of the folks who've been speaking in this room today. I'm going to talk to you today about Stackdriver and what it means for Cloud Foundry. We're going to show you how you can use Stackdriver to monitor and alert on Cloud Foundry and use BigQuery to do some analytics with that data. So Stackdriver is the suite of monitoring, logging, and diagnostics tools for GCP and AWS. With Stackdriver, you get a lot of the things you expect. You get logs. You get the tools you expect, the dashboards. We support your favorite languages. But there's some awesome in there. The awesome are some things I'm going to show you, like streaming data live into BigQuery, and debugging your production app, which has been shown today, so I won't go into that. And the fact that it's a Google-scale product. It's an API you write to in the cloud. You don't need to worry about scaling up that pipeline. And you don't need to worry about doing summations of data and all that sort of stuff. It was released into GA at GCP Next 2016. So it's been out there for a bit. But how do we integrate it with Cloud Foundry? I think when we say we have a cool piece and we want to stick it together with this other cool thing we like, that's not really the right approach to take. Let's think about the motivation behind this. I think there are two personas to think about. You've got a developer. The developer has applications that they need to monitor. They want to be alerted and to dive into issues quickly. You've got the operator, who's got a much broader set of issues, who isn't necessarily concerned with specific applications, but is really concerned about the health of the system. So that operator needs a single pane of glass. And they need alerting as well. But they need to be able to, just at a glance, make sure things are OK.
Stackdriver Tools is the BOSH release that we developed in conjunction with Pivotal. Stackdriver Tools is in the Cloud Foundry community and is available today. What we provide in Stackdriver Tools is, first, application monitoring. This is taking data, logs and metrics, from Cloud Foundry's Loggregator and ingesting them into Stackdriver. The other piece it gives you is the infrastructure and virtual machine level monitoring. So there are two separate agents that Stackdriver provides. One is to collect logs, which is through Fluentd. And the other is to collect metrics, things that the hypervisor can't really find on its own, that you need active monitoring for. It was released into general availability in the spring of 2017. And the nozzle portion of it is available today on PivNet as a tile. So let's look at what all these three pieces sort of look like in practice. So this box here, of course, is a virtual machine. And that machine happens to be a Diego cell, runs some applications, that kind of thing. On that machine, we're going to co-locate two jobs. That's the Stackdriver agent, that's metrics, and Google Fluentd, which is logging. Having those agents on the machine and giving them some credentials lets them write directly to Stackdriver. So each one streams directly to that endpoint. Now, since we have a Diego cell, we have application logs. Those application logs go from the application, are collected by the Metron agent running on that machine, and go through the Loggregator. Now, there are people potentially in this room who have written the Loggregator, so I'm not going to go into depth about it. But I have represented it as this alligator who will go ahead and spit the logs out to our Stackdriver nozzle. So we have two instances in this case, but you shard all your logs across a pool of these nozzles. So now will be the tricky part. You'll have to excuse me. I was going to try to do this on my laptop, my normal laptop, but it is a Linux machine.
And as soon as I plugged it in, it crashed. So we've made a quick change, and I'm going to figure out how to use this Mac. So I have this Cloud Foundry application on the right, the left-hand side. It's not very interesting. It logs out to standard error, standard out. I'm going to pull up Stackdriver logging, and I'm going to start streaming logs from Stackdriver. So this Cloud Foundry deployment I have right now is three separate Cloud Foundries in a single GCP project that's sort of spread around the world. And I'll talk a little bit more about that in my next talk, on Spanner. But what we have here is a lot of data coming in, and these are logs right now just from the Loggregator. So I'm going to go ahead and generate just a random error, which you can see some other application is doing right now. Okay, there's our error. Stackdriver does structured logging. So what structured logging means is we're not just talking about lines in a log file. We have additional context about the log. So if we expand these labels, we can see all the context about the application that this log came from. So this came from my logger app. We've got application ID, all the stuff you'd expect. And it also has a JSON payload. Now, the log message doesn't have much more in the payload, but for every Cloud Foundry metric, everything that comes through the Loggregator, we retain all the fields so you can do structured queries. So I'm going to do a really simple structured query and just look at the events for my application. So now I have a stream of just the things that have happened for my application. We can see here it was deployed earlier, and we can stream just the same way we did with the data before. So this is awesome for sort of at-a-glance looking, digging into issues, and looking at things that are happening now. Stackdriver does a 30-day retention. So if you're writing to Stackdriver, after 30 days your data starts to fall off. You can choose to export that data.
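A minimal sketch of the structured query idea described above: each entry carries labels and a JSON payload, so selecting one application's events is a match on fields, not a text search over lines. The field names (`applicationName`, `eventType`, `jsonPayload`), the sample entries, and the helper functions are assumptions made for illustration; they are not the nozzle's actual schema or the Stackdriver API. The 30-day window is the retention figure mentioned in the talk.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical structured entries; field names are illustrative only.
ENTRIES = [
    {
        "timestamp": datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc),
        "labels": {"applicationName": "logger-app", "eventType": "LogMessage"},
        "jsonPayload": {"message": "random error", "message_type": "ERR"},
    },
    {
        "timestamp": datetime(2017, 6, 1, 12, 1, tzinfo=timezone.utc),
        "labels": {"applicationName": "other-app", "eventType": "HttpStartStop"},
        "jsonPayload": {"status_code": 200},
    },
    {
        "timestamp": datetime(2017, 4, 1, 9, 0, tzinfo=timezone.utc),
        "labels": {"applicationName": "logger-app", "eventType": "LogMessage"},
        "jsonPayload": {"message": "deployed", "message_type": "OUT"},
    },
]


def select(entries, **wanted_labels):
    """Keep entries whose labels match every requested key/value pair."""
    return [
        e for e in entries
        if all(e["labels"].get(k) == v for k, v in wanted_labels.items())
    ]


def within_retention(entries, now, days=30):
    """Keep entries that are no older than the retention window."""
    cutoff = now - timedelta(days=days)
    return [e for e in entries if e["timestamp"] >= cutoff]


now = datetime(2017, 6, 2, tzinfo=timezone.utc)
app_entries = select(ENTRIES, applicationName="logger-app")
recent = within_retention(app_entries, now)
print(len(app_entries), "entries for the app,", len(recent), "inside retention")
```

The entry from April is dropped by the retention step, which is the situation where exporting the data becomes useful.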
And you export that data by creating a query, just like I did here. That query could just be everything. You create that query and export it to a source. Sink, sink. I've chosen BigQuery. So I'm going to look at BigQuery in my GCP project. So here's the table that represents all the data coming from Stackdriver. My query, like I said, is just give me all the Loggregator events. Just to show how quickly this streams, let's go ahead and create a query to do a new search for a log message and see, hopefully, that it appears right as we create it. So let me just set this mic down for a sec. Can't do SQL and hold it in my hand. So that is there immediately. That goes from your application into Metron, into the Loggregator, and into Stackdriver, and is available immediately in BigQuery. What's awesome about this is you can start to build up an analytics pipeline without necessarily having to worry, at each stage, about how to break this data down more and more and start doing those averages, and worry a little bit more about just what you want to find out. So a little more complex query, that I will definitely have to put the mic down for again, is, okay, we've got all this information from the Loggregator. One of those bits of information is from the Gorouter. So the Gorouter is at the front end of our Cloud Foundry, and it's gonna receive every single event and log accordingly. So what we can do is look at the payload that it gives us and try to figure out, okay, who's hitting our Cloud Foundry? Let's get an assessment of the user agents that are coming at us. All right, now this is not a Cloud Foundry that is very public, but you can see we've been getting scanned by all sorts of bots. Here's our Stackdriver monitoring thing. Here's an RCE. We can see how frequent they are, and it was pretty straightforward to build that query. It's just simple SQL. But what's incredible about it is that it is real time.
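The shape of that user-agent query can be sketched as a plain `GROUP BY`. The example below is an illustration only: it uses an in-memory SQLite database as a stand-in for BigQuery so that it runs anywhere, and the table name `gorouter_events`, its columns, and the sample rows are all hypothetical, not the exported schema shown in the demo.

```python
import sqlite3

# SQLite stands in for BigQuery here; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gorouter_events (user_agent TEXT, status_code INTEGER)"
)
conn.executemany(
    "INSERT INTO gorouter_events VALUES (?, ?)",
    [
        ("curl/7.50", 200),
        ("scanner-bot/1.0", 404),
        ("scanner-bot/1.0", 404),
        ("Mozilla/5.0", 200),
        ("scanner-bot/1.0", 500),
    ],
)

# Count requests per user agent, most frequent first.
query = """
SELECT user_agent, COUNT(*) AS hits
FROM gorouter_events
GROUP BY user_agent
ORDER BY hits DESC, user_agent ASC
"""
results = conn.execute(query).fetchall()

for user_agent, hits in results:
    print(f"{hits:>4}  {user_agent}")
```

Because the sink keeps streaming new rows into the table, rerunning the same statement is all it takes to get an updated count.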
So we did that query, it took 1.7 seconds, not too long, and we can refresh that and just get constantly up-to-date data. Logs are still going. So, another thing Stackdriver logging allows us to do. We already have this query concept. We can write any query based on arbitrary metadata, whatever we wanna see. We can take that query and start to create metrics with that information. So logs and metrics are in a way a bit different. Metrics are about counts, they're about values, where logs are more often just an event, something has occurred, but we can use this query to create metrics from logs. So let's change this query. Instead of all the logs from my app, let's look at all the errors from my app. So now that I've built up that metric, I'm gonna go ahead and just generate some more errors so something happens. Very interesting application, as you can see. I'm gonna go into Stackdriver monitoring. So Stackdriver monitoring is the home of metrics. So it gives you access to the metrics that you've explicitly created. Those are custom metrics from Cloud Foundry. Those are log metrics, like we just created and we'll try to check out in a minute here. And those are also GCP services. So you can see it's gone ahead and created a chart for me on my Pub/Sub responded-to messages. That's because I created a Pub/Sub topic for Stackdriver for the nozzle and I have nothing subscribed to it. So I should probably delete that after this talk. But let's go ahead and create a metric for what we just saw. Oh, let's try to do it through this Explorer, this will be interesting, maybe too interesting. All right, so that metric is not ready, but here's one I created earlier. So that's the demo. Now, Stackdriver metrics: we've got all of our Loggregator information coming in. So we are taking those container metrics, we're taking counter metrics, all these things coming out of various pieces of Cloud Foundry, and we're ingesting that. That means we can start to build out dashboards.
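As a rough sketch of what turning a log query into a metric means: a log-based counter is, conceptually, the number of entries matching a filter in each time bucket. The tuple layout, the sample data, and the `log_metric` helper below are hypothetical and only illustrate the counting; they say nothing about how Stackdriver implements it.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log events: (timestamp, application, severity).
LOGS = [
    (datetime(2017, 6, 1, 12, 0, 5), "logger-app", "ERROR"),
    (datetime(2017, 6, 1, 12, 0, 40), "logger-app", "INFO"),
    (datetime(2017, 6, 1, 12, 0, 59), "logger-app", "ERROR"),
    (datetime(2017, 6, 1, 12, 1, 10), "other-app", "ERROR"),
    (datetime(2017, 6, 1, 12, 2, 30), "logger-app", "ERROR"),
]


def log_metric(logs, predicate):
    """Count entries matching the predicate, bucketed per minute."""
    counts = Counter()
    for timestamp, app, severity in logs:
        if predicate(app, severity):
            bucket = timestamp.replace(second=0, microsecond=0)
            counts[bucket] += 1
    return dict(sorted(counts.items()))


# The "query": all errors from one application.
errors_per_minute = log_metric(
    LOGS, lambda app, severity: app == "logger-app" and severity == "ERROR"
)

for minute, count in errors_per_minute.items():
    print(minute.isoformat(), count)
```

Once events are reduced to counts per interval like this, they can be charted and alerted on the same way as any other metric.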
So here's a dashboard I built for my Cloud Foundry. We've got the CPU percentage per Diego cell. We've got a lot of Diego cells, but what Stackdriver allows you to do in this view is start doing averages, 99th percentiles, means, things like that. So I've gone ahead and done that with my Gorouter latency. I've got three Gorouters in this deployment, just one in each one. So I really am just worried about, well, what's the 95th percentile on that? And there's all sorts of ways to look at this, but it's pretty easy to slice and dice. These metrics here are generated by the Stackdriver agent. So I just picked two and tossed them in here, but there are certain things that a hypervisor can't really know about the running operating system. It needs to have something there sitting alongside, doing deeper asks to the kernel about what's going on, checking the state of things. So what you can do with this information is start to drill down and see which VM is starting to run out of memory, those sorts of views. That's great. That's a view, but with this information, you often really want to be told when to look. So it's awesome if you have a huge screen on the wall that has this very attractive dashboard, but probably I need to be paged. Probably I need an email or something when things go wrong. And you're able to do that with alerting. So here's a policy I created, and this is just based on that Diego metric that I had in the last pane. And it says, if my disk space goes under 10 gigs, go ahead and send me an email. Nothing crazy. But one thing Stackdriver does that I really like is it has both. It can create an incident that needs to be resolved or responded to. And it can also send a little help message alongside. So if you have sort of a culture of, we're not just gonna create alerts, those alerts are actionable alerts that say, this is a problem, here's some ways to get started.
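The policy described here, a threshold on free disk space plus a help message attached to the incident, might be modeled like the sketch below. Everything in it is an assumption for illustration: the `Policy` and `Incident` types, the `evaluate` function, the cell names, and the reading of "10 gigs" as 10 GiB. It is not the Stackdriver alerting API.

```python
from dataclasses import dataclass
from typing import Dict, List

GIB = 1024 ** 3  # assumption: "10 gigs" is treated as 10 GiB


@dataclass
class Policy:
    name: str
    threshold_bytes: int
    documentation: str  # help text sent along with the alert


@dataclass
class Incident:
    policy: str
    resource: str
    free_bytes: int
    documentation: str


def evaluate(policy: Policy, free_by_resource: Dict[str, int]) -> List[Incident]:
    """Open one incident per resource whose free space is under the threshold."""
    return [
        Incident(policy.name, resource, free, policy.documentation)
        for resource, free in sorted(free_by_resource.items())
        if free < policy.threshold_bytes
    ]


policy = Policy(
    name="diego-cell-disk-low",
    threshold_bytes=10 * GIB,
    documentation="Free disk is low. Start by checking large app droplets and logs.",
)

# Hypothetical free-disk samples per Diego cell.
samples = {"diego-cell-0": 42 * GIB, "diego-cell-1": 7 * GIB, "diego-cell-2": 10 * GIB}
incidents = evaluate(policy, samples)

for incident in incidents:
    print(f"[{incident.policy}] {incident.resource}: {incident.documentation}")
```

Carrying the documentation on the incident itself is what makes the alert actionable: whoever gets paged sees a first step along with the problem.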
Maybe not, here's a solution. This one has provided a great solution, you can see, which is to download a few more hard drives. That is possible in a lot of ways in the cloud, but maybe not the most useful advice. The last thing I wanna show you here is the uptime checks feature. So I mentioned I have a multi-region Cloud Foundry, a global Cloud Foundry. What I'm able to do is define a simple URL endpoint, like you'd expect from uptime monitoring. And Stackdriver will check it out and periodically poll it at the frequency I want, and give me metrics based on whether it's healthy or not and what sort of latency it has. You can, just like with anything, create alerts, be notified, hook that up with what you need to hook it up to. So with that, I'd like to go to any questions anyone has, but mostly thank you all for checking this out. So feel free to stand up if you have any questions, or just hang around. I'm gonna be in here for the next talk. Yeah, so for the VM itself, even without the agents, we do provide a set of metrics for each VM about various utilization. So I believe we can do CPU usage and a few others. But we also provide metrics for a lot of our services. So our load balancing services have some metrics, and for things like Spanner, you could build a dashboard for those services. So this is definitely the place where, if you have a GCP project, we'd expect to do our IaaS reporting. You can build reports. Stackdriver has a pretty robust API. I know, because I wrote the nozzle, that you can pull stuff out of it. So if you wanna pull that into another system, it is available and able to sort of integrate with that stuff. So that is an excellent question. So the question is essentially, is there a way to stamp out dashboards? Let's say we have a set application dashboard and we wanna launch that with every PCF app that we push. But the overall health is just the same sort of need. I tell you, I hear it.
I hear that ask. That's all I can say about it right now. But today that is not possible. That is an excellent question. The easiest way to do that today would be to have a centralized account and then do reporting out from that. But that doesn't get you Stackdriver monitoring. That gives you Stackdriver logging. It's easy to create those queries and say, okay, for this logging application, I'm gonna dump this into a specific bucket or a specific BigQuery dataset that I give these folks permissions to, having that centralized spot. I don't know how to do that with monitoring, but I'd be happy to follow up about that. As opposed to having the monitoring console. Are you doing that for logs? Okay, cool. Yeah, what's up? So under the hood for the host monitor agent, we're using collectd. So collectd does allow you to include some plugins for a lot of popular open source projects, things like Nginx. If you are hosting something like that, you can expand that configuration. What we're doing with this is the base set, and we are also capturing what Cloud Foundry is logging. So we aren't really doing instrumentation beyond that. That's right. I don't know enough about Stackdriver Trace to tell you if that's what that is. But I don't believe we have a full APM product. Yeah, so I think the way to handle that today, since they're all gonna come to a central account or whatever account you specify, would be to create those exports and set permissions based on the sink that you're sending that data to. So you could flow into BigQuery or Pub/Sub, and with your data going to those specific sinks, you can do your permissioning at that level. But there isn't really a way to say, okay, give this IAM user this sort of slice of log view. That does not exist today. Well, if anybody has any other questions, I'm gonna be hanging around. I'll be here all week. So thank you all for joining me. Question? Thank you. That's awesome.