Hello everyone. Thank you very much to everyone who has remained from the last talk. I apologise: I'm here again. You'll be really glad to know that I'm not on the next time slot, so you get a break from me, but then I am on the last time slot, so it's kind of like alternating me. Thank you to the people who have remained; I appreciate your patience. It's also going to be quite amusing to see if I can remain awake, because I'm quite badly jet lagged, so the last talk might be a little bit dodgy. We will give it a go.

So welcome to our talk on logging and metrics. My name, as you know from the last talk, is Colin Humphreys. I'm the CEO of CloudCredo. Presenting to my left is Ed King, who also works for CloudCredo. We are a Cloud Foundry and BOSH consultancy based in London. So, very quick sales pitch: if you want help with Cloud Foundry, please do get in contact. Come and talk to us. We love working with organisations that are trying to do interesting things with Cloud Foundry.

I'm going to talk very briefly about why logging and monitoring are so important. The first time I used Cloud Foundry, I thought it was fantastic. It was absolutely amazing. I had an application, I ran cf push, staging happened, and it ran. And I thought: this is going to change my career, this is going to change IT, this is going to change how we deliver value. The world is a better place now that I can cf push applications. And it was, until my application broke. And then I had absolutely no idea what was going on, because the black box of the PaaS made it all so opaque. I trusted the PaaS to run my app, and if it broke, what was going on? Particularly if staging failed. The user journey used to be: I have absolutely no idea why staging has gone wrong. Quite often you had no logs at all; you just got "staging has failed". Things have improved in Cloud Foundry, but I want to know: how do I get access to logs, to metrics and to monitoring? Ed is going to take us through the components in Cloud Foundry and how they allow us to do that.

OK. Thank you, Colin. So yeah, I'd just like to start by covering some of the main components involved in the Cloud Foundry logging and metrics system. As with most things in Cloud Foundry, the system is constantly being updated and improved, and especially recently there have been some fairly big changes to the way the logging and metrics system works. So I think it'd be a good idea to start by taking a look at the current state of the system, what components are there, and how they work.

The first component I'd like to talk about is Loggregator. Loggregator is really at the core of the Cloud Foundry logging and metrics system. It's currently comprised of a few smaller components, namely the sources, Metron, Doppler and the traffic controller, and I'll talk a little bit more about those in just a second. But just to give you a general overview of the Loggregator system as a whole: this is the component that allows developers to stream their application logs in real time down to the CLI, which can be achieved by running the cf logs my-app command. It also allows developers to dump a recent subset of their logs, and it can provide functionality for draining those logs off to third-party syslog drains. As for how this all works, the Doppler component basically sits there and gathers all of the logging and metrics data from the platform.
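To make the developer-facing side of that concrete: assuming an application has already been pushed under the name my-app (a made-up name here), streaming and dumping its logs is just a couple of CLI commands.

    # Stream the application's logs in real time (Ctrl-C to stop)
    cf logs my-app

    # Dump whatever recent logs Doppler still has buffered, then exit
    cf logs my-app --recent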
Doppler stores this logging and metrics data in temporary buffers on the Doppler servers, but it's really important to realise that the Doppler servers don't retain that data on a long-term basis. In fact, Cloud Foundry doesn't really ship with a component to provide long-term storage, indexing and parsing of log messages, so we'll be taking a look at a project that can provide that functionality in just a few minutes' time. Just be aware that once those buffers fill up, your logs drop off the end and are lost. The other component I briefly mentioned was the traffic controller. This is the component that accepts the incoming requests for our logging and metrics data and forwards those requests on to the Doppler servers at the back.

This diagram here I've shamelessly ripped straight from the GitHub page for Loggregator, but it gives a pretty good overview of the Loggregator system and some of the components involved. Just to briefly run through these: on the left-hand side we have the sources, and the sources are the components that are actually generating the logging and metrics data; an example of a source would be the DEA logging agent. The sources generate all of this logging and metrics data and forward it on to a component known as Metron. Metron is a small Go library that sits there gathering all of that incoming data and is responsible for forwarding it on to the relevant Doppler servers. As such, a Metron agent gets co-located on every VM in your Cloud Foundry deployment, and typically the sources log to the local Metron agent and Metron forwards that off to the Doppler servers. Then, as I said, once the data arrives at the Doppler servers it is stored in those temporary buffers, and we have functionality for shipping those logs off to syslog drains, or the Loggregator traffic controller can accept incoming requests and forward them on to the Doppler servers.

So that's all great, but really the most awesome feature of Loggregator, or in my opinion the most awesome feature, is a relatively new one called the Firehose. The definition of the Firehose is a stream of every application's logs plus metrics data from Cloud Foundry's components. Application logs should be fairly self-explanatory, but the metrics data is slightly more interesting. To give you an example, the Cloud Foundry router is constantly emitting metrics events for every single HTTP request coming in and out of the platform; for instance, it will emit events detailing response times or status codes for each of those requests. All of these events get gathered up and forwarded down the Firehose. Obviously, every single application log plus every piece of metrics data is, by default, quite a lot of data, and so the concept of nozzles has been introduced. A nozzle is basically a pluggable component that can attach to the Firehose, pull down just the subset of the data that you might be interested in, do some processing on that data, and forward it off to some third party, for example syslog or graphite. I believe there's a firehose-to-syslog nozzle, which I think is available now on the Cloud Foundry community GitHub page, and which, as the name suggests, connects to the Firehose, pulls down logs, and forwards them off into syslog.
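To give a feel for what a nozzle looks like in code, here is a rough sketch in Go. It assumes the cloudfoundry/noaa consumer library and the sonde-go events package, and it uses made-up values for the Doppler endpoint and OAuth token; exact function signatures vary between noaa versions, and a real nozzle would also need proper authentication, reconnection and error handling.

    package main

    import (
        "crypto/tls"
        "fmt"

        "github.com/cloudfoundry/noaa/consumer"
        "github.com/cloudfoundry/sonde-go/events"
    )

    func main() {
        // Made-up endpoint and token; in reality these come from the
        // Cloud Foundry API and from UAA.
        dopplerAddress := "wss://doppler.example.com:443"
        authToken := "bearer example-oauth-token"

        // Connect to the traffic controller. TLS verification is skipped
        // here only to keep the sketch short.
        c := consumer.New(dopplerAddress, &tls.Config{InsecureSkipVerify: true}, nil)

        // Subscribe to the Firehose. The subscription ID lets Loggregator
        // balance events across multiple instances of the same nozzle.
        msgs, errs := c.Firehose("example-nozzle", authToken)

        for {
            select {
            case env := <-msgs:
                // Keep only the subset of events we care about, for example
                // value metrics emitted by the router and other components.
                if env.GetEventType() == events.Envelope_ValueMetric {
                    vm := env.GetValueMetric()
                    fmt.Printf("%s.%s = %v %s\n",
                        env.GetOrigin(), vm.GetName(), vm.GetValue(), vm.GetUnit())
                }
            case err := <-errs:
                fmt.Println("firehose error:", err)
            }
        }
    }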
And that pretty much concludes the overview of the main components involved in the Cloud Foundry logging system. I'll hand back over to Colin to introduce logs.

So, logs. The reason I wanted to interject into Ed's talk a little is because of the first time I really needed to get logs out of Cloud Foundry. It was a Cloud Foundry version 1 installation we were working on with a client. We'd given them this setup and it was working pretty well, and then something went wrong with their application. And they said: how do we see the logs? It's all gone wrong, I need to see some logs, I've got a big problem here, how do we get to the logs? And I said: okay, what we're going to do is, you're going to subscribe to the NATS message bus, you're going to filter the messages for something that looks kind of log-shaped when you think things are going on, and then we're going to try and work out what's happening from there. And it was an atrocious journey. It really was. This, for me, has been one of the biggest pain points in Cloud Foundry, and now it's got a lot better. So over to you, Ed: how do we now get to these logs? How do we get some insight into what's happening when our application isn't working right?

Sure. So I'm going to talk about a project called LogSearch, and the LogSearch project allows us to essentially integrate the ELK stack with our Cloud Foundry deployment. One of the things that LogSearch does is package up the components of the ELK stack as a BOSH release. For those of you not familiar with the ELK stack, it stands for Elasticsearch, Logstash and Kibana, where Elasticsearch provides the back-end storage and indexing of your logs, Logstash provides the filtering and parsing of your log messages, and Kibana provides the front-end web interface. LogSearch takes all of those components, neatly packages them up, and makes them available to us via BOSH. And the good news about LogSearch is that it's completely open source and free; you can go and download it right now, it's available on GitHub. I'd like to say a quick thanks to David Laing and to all of the contributors and maintainers of LogSearch, because it's really awesome. I'm going to spend the next few minutes talking about what a LogSearch deployment looks like, what components are involved, and also how we can integrate it with our Cloud Foundry deployment to provide that long-term storage of our log messages.

On that note, I'm very excited to announce a new project, LogSearch for Cloud Foundry. This project is really focused on customising LogSearch to work with Cloud Foundry data, and it does this by allowing LogSearch to accept logs from our Cloud Foundry deployments via two main input streams. The first stream is the Cloud Foundry component syslog messages: every component in Cloud Foundry emits syslog messages, and we can tell Cloud Foundry to forward those messages off into our LogSearch deployment. That's the first way. The second way is that we can also tell LogSearch to talk to the Firehose and pull down log messages from there as well. The project is really aimed at two main user groups: the Cloud Foundry developer, who's mostly going to be interested in getting at their own application logs, and the Cloud Foundry operator, who's going to want to see all of the logs for the system as a whole.
LogSearch for Cloud Foundry provides some really nice multi-tenancy options to ensure that users can only access the logs that they are actually responsible for, and it's pretty cool how it does that. There's essentially a proxy that sits in front of the system, goes off and talks to the UAA server, determines which spaces the user has access to, and then uses that along with Elasticsearch aliases to filter down the logs that the user can see. So we end up in a place where users can only see the logs that they actually have access to. If you want a little bit more detail about how that's all set up, there are a couple of links there to some YouTube videos where David goes into more detail.

This image here is the Kibana web interface. As I'm sure you'll agree, it's very pretty; it looks great. What we're actually seeing is a dashboard showing three demo applications logging through the LogSearch system. One of the nice things to point out is that we've got the actual application names coming up in the dashboard, as opposed to just the UUIDs. So it's a very user-friendly interface, great to use, and it generally looks pretty good. This slightly less impressive-looking diagram is one I drew myself, but it gives an overview of all of the components involved in a typical LogSearch deployment, and it shows the journey that a log message takes, starting with Cloud Foundry on the left and going through the LogSearch system to end up in Elasticsearch at the back. I'm just going to run through each of these components to give you a quick overview of what they do.

The first component is the ingester, and the ingesters are responsible for accepting incoming logs into the LogSearch system. LogSearch ships with a couple of default ingesters, namely syslog with TLS and a RELP ingester. And if you also happen to be deploying the LogSearch for Cloud Foundry BOSH release, you get an additional ingester, which is the component that can go and pull down logs from the Firehose. So the ingesters are basically the entry point into LogSearch for your log messages.

Once the logs have been ingested, they are then forwarded on to the queue, and the queue component is currently provided by Redis. This is actually a really nice addition to the standard ELK stack, and it gives us a couple of benefits. The first is that it helps to keep the system stable if you experience a sudden increase in the volume of logs coming through. Redis provides a nice temporary buffer, which gives you a little bit of time to go and scale the relevant components so that you're able to keep up with the increase in logging traffic. The other thing it does is help to prevent message loss in certain scenarios. For example, if you were to lose your Elasticsearch backend for whatever reason, Redis again gives you that temporary buffer and a little bit of time to figure out what's wrong before you start losing log messages. So it generally helps to keep the whole thing a lot more stable, and it ships with the standard LogSearch BOSH release, which is great.

Once the messages have made it through the queue, they are forwarded on to the parsers, and this is where the actual filtering and parsing of the log messages occurs.
As I'm sure you're aware, every single log message under the sun is going to be in a slightly different format, and the parsers attempt to take that mishmash of logging data and turn it into something that we actually want to use and store. The parsers run Logstash in order to do this. LogSearch ships with a few default filters for standard filtering and parsing of log messages, such as cleaning up whitespace, and it also enables you to write your own filters; LogSearch provides some nice tooling to help you write those filters and get them included in your LogSearch deployment. Once the logs have been parsed, they are finally forwarded on to Elasticsearch, where they are stored and indexed and can remain for as long as you need them to. The final component is the Kibana web interface, which we saw earlier; as I said, this provides the front-end web interface and the nice dashboards. It's probably worth mentioning that alongside Kibana, LogSearch also exposes a read-only Elasticsearch API, which is great for integrating with third parties. For example, we could write a cf CLI plug-in to go and grab logs out of the system using that endpoint. And just to show this again, that's the system as a whole: we export our logs from Cloud Foundry, they get ingested, pass through the queue, get parsed and formatted, and then end up in Elasticsearch.

So that's all great, but by now you must surely be wondering: how do I actually do this? How do I get this all set up? The good news is that it's actually not too difficult, assuming you've got a little bit of BOSH knowledge. I know that might be quite a lot to ask, but it's really worth investing some time in becoming more familiar with BOSH, because it helps with deploying a complicated stack such as the ELK stack, and it makes the deployment and management of it much, much easier.

At a very high level, then, the first step is to upload the BOSH releases. A BOSH release contains the actual source and packages that are going to be running as part of the deployment, and there are currently two of them: the standard LogSearch BOSH release, which contains the ELK stack, and the new, additional LogSearch for Cloud Foundry BOSH release, which contains that extra ingester that can talk to the Firehose. This is simply a case of running bosh upload release from the command line, so fairly simple so far. The next step is to configure a few properties within the deployment manifests. Because we are running a Cloud Foundry deployment and a LogSearch deployment, we end up with two separate deployment manifests. If you're not familiar with deployment manifests, this is the file that details what your deployment actually looks like; for example, in the LogSearch manifest you could say: I want ten Elasticsearch nodes, I want them to run on these IP addresses, et cetera, et cetera. And we can use the properties to customise those installations. The first properties we need to set are in the Cloud Foundry deployment: the syslog daemon config (syslog_daemon_config) properties, which tell Cloud Foundry where to forward all of its syslog messages.
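As a rough sketch (the property names follow cf-release of this era, and the address and port are made-up example values for wherever your LogSearch syslog ingester happens to be listening), the relevant fragment of the Cloud Foundry manifest looks something like this:

    properties:
      syslog_daemon_config:
        address: 10.0.0.10   # example IP of the LogSearch syslog ingester
        port: 5514           # example port the ingester listens on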
We point those at the syslog ingester of our LogSearch deployment. The second set of properties we need to set are the ingester Cloud Foundry Firehose properties, and these live in the LogSearch deployment manifest. These properties just detail a user that can access the Firehose, the actual Firehose endpoint, et cetera, et cetera. So we set those properties, save the file, and then hit bosh deploy, at which point BOSH will go and do its thing and set everything up for you. And that's actually really awesome, because we've gone from having nothing to a fully scalable ELK stack, with Cloud Foundry forwarding all of its logs through the system, in essentially just a few commands, which I think is really great. But I am aware that not everyone loves BOSH, and it can be difficult to get started with, so I'd like to point you towards a separate project, the LogSearch BOSH workspace, which aims to help you get up and running with this as quickly as possible. That's available in the Cloud Foundry community organisation.

So why should you choose LogSearch? Obviously it's open source, which is awesome, but really my favourite thing about LogSearch is that, as I say, it abstracts away most of the complexity of managing the ELK stack, and being able to define everything in a single deployment manifest is really great. It also makes it very easy to scale the system. For example, if we want to scale our Elasticsearch cluster, all we need to do is edit one value in that file and hit bosh deploy, and BOSH and LogSearch will handle the rest for you. And while we're talking about scaling, the graph on the right gives an overview of the number of VMs you need to run for LogSearch in order to ingest a given number of logs per minute. I think that's 10 VMs for 300,000 logs a minute, which is fairly reasonable, but the thing to note is that it's pretty linear, so it seems to scale well. And that's the end of the logging section. Thank you very much.

Thanks, Ed. I just want to say at this point that we've got a good solution for logs, but there is more to life than logs. There's more to life than just looking at what happened when things went wrong. We want to spot patterns, we want to spot trends, we want to address issues before they become the kind of things that crop up in logs as explosions and stack traces. So how do we look at the metrics in the system? The PaaS is like a black box, so how do we get the metrics out, get them viewed, and how do we graph all the things? Ed?

Yes. So let's talk about graphite and how we can integrate graphite with Cloud Foundry. Graphite is a graphing and metrics tool that has gained quite a lot of popularity over the past few years, and I think this has been partly due to a famed blog post written by the team at Etsy. One of the things they mention in that article is how at Etsy they worship at the Church of Graphing, and how they use graphite to achieve this. So it would be great if we could have our own Church of Graphing for Cloud Foundry, and fortunately we can. There are actually quite a few ways to do this, and I'm going to talk about two of the ways that we can integrate graphite with Cloud Foundry to provide a really nice metrics solution.
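Before looking at those two approaches, it's worth seeing what the final hop, actually getting a data point into graphite, involves. Graphite's listener accepts a very simple plaintext protocol: one "metric.path value unix-timestamp" line per data point over TCP, conventionally on port 2003. A minimal Go sketch of that hop, with a made-up server address and metric name, might look like this; both of the approaches below are, at heart, different ways of producing data points like these.

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // sendToGraphite writes a single data point using graphite's plaintext
    // protocol: "<metric path> <value> <unix timestamp>\n".
    func sendToGraphite(addr, path string, value float64) error {
        conn, err := net.Dial("tcp", addr)
        if err != nil {
            return err
        }
        defer conn.Close()

        _, err = fmt.Fprintf(conn, "%s %f %d\n", path, value, time.Now().Unix())
        return err
    }

    func main() {
        // Made-up server address and metric name, purely for illustration.
        err := sendToGraphite("graphite.example.com:2003", "cf.router.response_time_ms", 42.0)
        if err != nil {
            fmt.Println("failed to send metric:", err)
        }
    }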
So the first approach is that we could use the Cloud Foundry collector. The collector is an optional component that ships with a standard Cloud Foundry deployment. I've kind of refrained from talking about the collector too much in this talk, the reason being that it's being deprecated in favour of the Firehose, but it is pretty quick and easy to set up, and if you're stuck running an older version of Cloud Foundry for whatever reason, the collector is a quick solution you can use to get some metrics out of the system. The collector works by querying the /healthz and /varz HTTP endpoints of all of the Cloud Foundry components. Every component in Cloud Foundry exposes these endpoints: /healthz returns either a 1 or a 0 depending on whether or not the process is healthy, and /varz returns more detailed information about the process. For example, if you query /varz on a UAA server, it will return some information about the underlying Java process. So the collector sits there querying all of these endpoints for all of the components and gathering all the data, and it then uses what is called a historian to forward that data on to some third party. Fortunately, there is a graphite historian to do exactly this. I should mention, though, that the collector is currently considered a community-maintained feature, so it's not being actively developed any more, but it does work, and as I said, it's quite quick and easy to get set up. To actually do this, all we need to do is ensure that the collector is included as part of our Cloud Foundry deployment, and then set a couple of properties: we say that we do want to use graphite, and we provide the IP address and port of our graphite server. Fairly quick and easy. Then you run bosh deploy, it goes and does its thing, and we end up with our metrics in graphite.

So that's kind of okay, but there are a few problems with that approach, namely that it's being deprecated. An alternative approach is to grab our metrics directly from the Firehose using a nozzle. I was kind of interested to see what a nozzle might look like for graphite, and so I wrote the creatively named graphite-nozzle, which is available on the CloudCredo GitHub page. This is a small Go program that, as I said, connects to the Firehose, listens for all of the metrics events, pulls them down, does some parsing and processing on them, and then forwards them off to a graphite server. And that's generally working quite well, so please feel free to check it out and let me know what you think. This picture is an example of what a graphite dashboard might look like. Probably not all that impressive at the moment, but it's got some nice pretty colours, and graphite always looks good on monitors throughout your office, which is great. And I think that that is everything. So thank you very much for listening. I hope that you've found it useful.