Okay. Hello, everybody. Let me introduce myself. My name is Anton, and I'm a Cloud Foundry engineer at Altoros. I have years of experience monitoring distributed systems, and currently my team and I are developing a full-stack monitoring solution for Cloud Foundry. Today we are going to discuss how to never leave your Cloud Foundry deployment unattended. We don't have much time, so I will briefly highlight some general ideas about monitoring your Cloud Foundry deployment.

Let's start with our agenda. As you can see, we will cover the Cloud Foundry deployment layer by layer. Then there will be a few thoughts about updates, security, and drills, and then I will share some of my tips. Finally, there will be a question and answer section if we have time for it.

The first layer we're going to discuss is infrastructure as a service. What can you do to monitor it? You can monitor the availability of the infrastructure itself, for example the availability of data centers and availability zones. You can also collect metrics and create alerts based on these internal metrics. You can do this if your infrastructure provides some kind of API. If not, you can still use vendor-specific monitoring solutions. It's also reasonable to monitor the health of your virtual machines. Good candidates to start with are CPU, memory, input/output, network, and so on. And it's reasonable to monitor the availability of your virtual machines, for example that the monitoring agent is up and running and the virtual machine is reachable.

Our next layer is BOSH. As you know, BOSH is a crucial layer of a Cloud Foundry deployment, so you have to make sure that you properly set up monitoring of your BOSH Director and BOSH-deployed virtual machines.
In the case of BOSH, you can, and you probably should, set up email notifications, because with them you receive a lot of interesting event notifications: about processes on your virtual machines, about SSH events, and about your deployment state. For example, a deployment started, or someone tried to connect to a virtual machine and failed, and so on. It's also reasonable to set up log forwarding in the BOSH Director, so you will receive logs from BOSH. And it's reasonable to collect metrics. In this case, the BOSH Health Monitor will provide you with basic metrics for your virtual machines, like CPU, memory, and load average, and it will also provide the health status of each virtual machine. In BOSH terms, that means the virtual machine is running, its processes or jobs are up and running, and the virtual machine is reachable. But if you want to gather more advanced metrics, you may consider using a separate monitoring agent like collectd, Telegraf, and so on.

Okay. The next layer is Cloud Foundry itself. What do you need to do here? Once again, you need to collect logs. By logs, I mean both logs from the platform itself and logs from applications. You can use the Firehose to collect logs from applications and from some internal components, and you can use syslog forwarding to collect logs from Cloud Foundry components and maybe from third-party services like Redis, MySQL, and so on. The next step is to collect metrics. Once again, you can use the Firehose to collect metrics from internal components of Cloud Foundry, like the Cloud Foundry API, Diego, and UAA, and you can use metrics collectors to collect metrics from external components like nginx, MySQL, Postgres, and so on. It's also reasonable to set up alerts based on logs and metrics. And you don't need to decide from the start which metrics are relevant and which are not.
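As a rough illustration of an alert based on a metric, the following Python sketch fires only when a value stays above a threshold for several consecutive samples, which is one simple way to keep alerts from becoming noisy. The `AlertRule` class, the `evaluate` function, the metric name `cpu_percent`, and the sample values are all hypothetical and are not taken from BOSH, Cloud Foundry, or any monitoring product.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlertRule:
    """Hypothetical rule: fire when a metric stays above a threshold."""
    metric: str           # illustrative name only, not a real Cloud Foundry metric
    threshold: float      # values strictly above this count as a breach
    min_consecutive: int  # number of breaches in a row required before firing


def evaluate(rule: AlertRule, samples: Iterable[float]) -> List[int]:
    """Return the sample indexes at which the alert fires."""
    fired = []
    streak = 0
    for index, value in enumerate(samples):
        if value > rule.threshold:
            streak += 1
            # Fire once, at the moment the streak reaches the required length.
            if streak == rule.min_consecutive:
                fired.append(index)
        else:
            # Any healthy sample resets the streak.
            streak = 0
    return fired


# Made-up CPU samples: a single spike (index 1) does not fire,
# a sustained breach (indexes 3 to 6) fires once.
rule = AlertRule(metric="cpu_percent", threshold=90.0, min_consecutive=3)
cpu = [50, 95, 60, 91, 92, 97, 99, 40]
print(evaluate(rule, cpu))  # prints [5]
```

Requiring a short streak before firing is a design choice, not a recommendation from the platform documentation; the right threshold and streak length depend on the metric.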
You can find a very comprehensive list of valuable metrics under key performance indicators in the Cloud Foundry documentation. I also suggest that you set up URL checks, for example URL checks of the Cloud Foundry API and of UAA. I know it's really simple advice, but it gives you an opportunity to look at your Cloud Foundry deployment from the outside, and that is really good stuff.

So, services. For services, you should also collect metrics. You can use metrics collector add-ons for this. They usually have a lot of plugins for that. For some services, you can also use the Firehose to collect metrics. Nowadays a lot of BOSH releases include integration with the Firehose, so you can have metrics out of the box. And of course, you will need to set up alerts. For these alerts, you can use the vendor's recommendations.

Okay. And just a few notes about monitoring your applications. A really simple but really useful thing is to set up URL checks for your applications. You also need to collect metrics. You can use application performance monitoring (APM) to do that, so you will have some automatically generated metrics out of the box. But personally, I don't really expect these metrics to cover everything you need. So I prefer the second approach, a more old-fashioned way, I guess: you can just instrument your code and send metrics to some kind of time series database. The benefit of this approach is that it's a cheap and fully controlled way, so you can define metrics with real value, in combination with the automatically generated metrics from APMs. You also need to collect logs. There are at least two ways to do this. You can connect to the Firehose and receive all logs from the whole Cloud Foundry deployment for all applications, or you can stream logs from particular applications by using service instances of log aggregators.

Okay. Let's talk about updates.
It's really great to keep track of and install the latest versions of services, because they have security fixes, bug fixes, and new features. You also need new versions of stemcells because of security fixes. But if you have done this at least once, you know that it is a very boring task. So I suggest that you use some kind of continuous integration for updates. It can be Concourse, because it has great integration with Cloud Foundry, or you can implement continuous integration with the CI tool you like. But seriously, installing all the updates of Cloud Foundry or Pivotal Cloud Foundry is really boring stuff, so it's really reasonable to use continuous integration for this.

Security. Please keep track of Common Vulnerabilities and Exposures (CVEs). You can find them on the Cloud Foundry website. If you have CI, you can install all these new stemcells and services automatically. But it's still reasonable to sign up for the CVE email newsletter.

Just a few thoughts about drills. If you have the possibility to run drills, they can be a really good tool in your pocket, because you will be able to understand what can break in your Cloud Foundry deployment. It's reasonable to simulate crashes of specific virtual machines, data center outages, and network issues. If you do this, you can be much more confident that your deployment won't let you down at the time of a real-life failure. So if you can do drills, please do.

Okay. And just a few tips from me. Please ensure sufficient coverage for your monitoring, but refrain from overdoing it, because too many alerts and metrics will create information noise. And trust me, it will kill your monitoring, because you will get used to the alerts and eventually you will not notice them. So start from something small and only then grow your monitoring, your alerts, and so on. It's also really reasonable to create some kind of knowledge base.
Every time you face a problem and fix it, don't forget to add the solution to your knowledge base. It's also really reasonable to write postmortems, so future generations will understand what you did. I can also suggest that you create simple but very useful alerts based on basic use cases. For example, you can use the basic user workflow: what do users do every time? They log in to UAA, they list applications, they open URLs. So you can create alerts for all these steps. You can also create alerts based on basic metrics like error rate, availability, and so on.

Okay. And just a few words about our monitoring solution. It's called Heartbeat. We have system monitoring and basic application monitoring, high availability, integration with logs, a lot of predefined dashboards and alerts, and a lot of integrations with third-party services. And just a few words about our logging solution. It's called Logsearch for Pivotal Cloud Foundry. It's based on the open source Logsearch for Cloud Foundry. So you can use the open source version for your open source Cloud Foundry and our version for your Pivotal Cloud Foundry. It's a really mature product, and it's really easy to use.

And that's it. If you have questions, please go ahead.

We don't have a lot of time, but maybe we can take one question. The speakers for the next talk are in the back, but maybe we can take time for one question for Anton. Okay. Thank you very much, guys.
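As an illustration of the URL checks and workflow-based alerts suggested in the talk (log in, list applications, open URLs), here is a minimal Python sketch that probes a list of URLs and reports which steps failed. It is only a sketch under stated assumptions: the step names, the `example.com` URLs, and the helpers `http_status`, `run_checks`, and `failed_steps` are hypothetical placeholders, and a real check would also need authentication against your own endpoints. The demo at the bottom uses a stub instead of real network calls.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 503.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return 0


def run_checks(
    steps: List[Tuple[str, str]],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each step name to True when its URL answers with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in steps}


def failed_steps(results: Dict[str, bool]) -> List[str]:
    """Return the names of the steps that did not pass, in check order."""
    return [name for name, ok in results.items() if not ok]


# Placeholder URLs; replace them with endpoints of your own deployment.
STEPS = [
    ("log in", "https://login.example.com/"),
    ("list applications", "https://api.example.com/"),
    ("open application URL", "https://my-app.example.com/"),
]

# Demo with a stub fetcher so that nothing is sent over the network.
stub_statuses = {
    "https://login.example.com/": 200,
    "https://api.example.com/": 200,
    "https://my-app.example.com/": 503,
}
results = run_checks(STEPS, fetch=lambda url: stub_statuses.get(url, 0))
print(failed_steps(results))  # prints ['open application URL']
```

Calling `run_checks(STEPS)` without the `fetch` argument performs real HTTP requests. Running it on a schedule from outside the platform and alerting on a non-empty `failed_steps` result gives the outside view of the deployment that the talk recommends.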