Hey, everybody. Sorry we're getting started late here. I appreciate you coming out to look at open source monitoring for OpenStack. This is Saved by the Bell: Monitoring OpenStack with Open Source Tools. My name is Jason Grimm. I'm an open cloud architect at Rackspace. I'm joined here by James Thorn and Vincent Rogers, two of my peers. We're going to skip introductions and get moving quickly so we can get to the content we want to look at. We're going to go through a systems management background, look at OpenStack monitoring from an overview perspective, and then dig in with some walkthroughs on how to actually do this from a technical perspective. About the presentation: we work for Rackspace, but this is not a sales pitch. If you've been to any of the marketing talks, this is not that. The "send questions in real time" part we're going to skip. My contact info is there; if you have any questions, just send them to me and I'll answer them after the summit. And everything in here is 100% open source — there's nothing proprietary. I'll skip this, too; it's some icebreaker trivia around Saved by the Bell. It's not boxing, it's not a classroom, it's not a bad TV show. Historically, people would be buried with a bell in the coffin in case they woke up and had been buried alive. The analogy is that if your OpenStack environment isn't monitored — if you don't have the bells — you may not be dead, but you may be on computerjobs.com. Monitoring in general is one component of a larger operational management model. I think all of us here in infrastructure management know that monitoring comes along with a suite of other activities, like logging, alerting, performance management, provisioning, and triggering. You've got your data center infrastructure management at the bottom, all the way up through your web, presentation, and access tiers.
I wanted to start with this because it shows how some of this changes from an OpenStack monitoring perspective. On the far left, you have the traditional physical model: the physical operating system, your infrastructure support tier, your database or application tier, web tier, and access tier. With virtualization adoption, what got slid into that stack was a hypervisor and a guest operating system. So now there are two more components, or areas, that you have to monitor, manage, and take care of. With cloud — I don't know what the vernacular is, but what I'm calling Cloud V1 here is kind of where we're at today — we've now added a cloud operating system, and that cloud operating system has a closer tie to the physical operating system and the physical devices. That area in red increasingly becomes what we have to be responsible for, because the line is blurred between what our cloud operating system is and what it's actually doing to the hardware, and some of those roles and responsibilities are still emerging. With Cloud V2, I don't know that the whole stack is going to be red, but we see some very interesting things where functionality is moving down the stack, closer to the cloud operating system tier, and a lot of interesting discussions on what platform as a service is, who's responsible for which layer, and what you should or shouldn't be doing where. One call-out here I think is interesting is the guest operating system. Today, not everyone has gotten to the point of a completely agnostic, throwaway, designed-to-fail guest operating system image. If you go down this track and you're still trying to monitor the guest operating system, care and feed for it, and know where it is and all that, then you're probably doing something wrong.
So that's going to drop — of all these things that come in, the guest OS is one of the things that's going out. This is another view of where OpenStack fits in the stack, alongside PaaS and SaaS. It's strong in orchestration, abstraction, and API control; it's scalable, and it's open. What's lacking — or I'd say de-emphasized — is the operational side: the logging, metering, management, and monitoring that it pushes outside, and we'll take a look at why it does that. One of my first questions coming into OpenStack two years ago was this very question: where are my monitoring functions? Where can I see my alerts, my graphing, my performance metrics? It took me a while to get it. I kept assuming it could do all this stuff, people kept telling me it's not designed to do that, and I'd say, well, why not? It should. Now I understand that it has an intentionally narrow focus on infrastructure as a service. What that means is that there's an assumption that if you're deploying OpenStack, your environment already has tools in place to monitor your other Linux servers. Those tools by and large work pretty well for monitoring OpenStack as well, with a few tweaks. So existing Linux tool sets work, most shops already have tools, and there's no reason to re-bundle or recreate a tool set; at the same time, there are no limitations locking you in on how you want to monitor your environment. It's kind of a bring-your-own mentality. As other benefits, there's less bloat in the software and fewer points of failure, the open API is there for non-core functions like monitoring, and you get reduced complexity and increased efficiency. Again, though, monitoring — OpenStack or otherwise — is one part of a suite of things that you need to do.
Whatever you're doing for your Linux servers today — logging, reporting, patching and updating, your config management — is going to have to be done in OpenStack at a bare minimum, and maybe a little bit more. Metering becomes more and more important. If you're an internal enterprise deploying OpenStack, you may not care immediately about chargeback and showback, but for service providers, even back in traditional hypervisor days, metering, chargeback, and showback were kind of a holy grail people were chasing. That's still there in OpenStack, so you're going to have to get better at that part of it. Here's an example of some default settings if you deploy OpenStack and don't know where to start. This is actually taken — well, it's not a sales pitch, but this is what we use internally; I got it out of our own documentation for monitoring OpenStack, at least the thresholds. You've got your hardware: typical stuff like CPU idle and disk space utilization. Like I said, this is going to look pretty familiar next to anything you do on your other Linux servers. This next slide is an eyesore, and I'll get a better one — it's just in there for reference — but it's all the services that you want to monitor. And here's a view of the services broken up functionally by what they do. If you're new to OpenStack: different services run on different nodes. The controller nodes are going to be running your API endpoints, your SQL instances, your MQ, and things like that, so you're going to need a policy and an approach to monitoring those devices, and there's a different set of services on the compute nodes. Sorry, I was trying to see how much time we had left. For hardware monitoring, you can roll your own, but you probably already have a tool in place.
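A threshold-style check like the hardware defaults just mentioned can be sketched in a few lines of shell. This is a hypothetical illustration — the 90% cutoff and the root filesystem are my assumptions, not values from the slide:

```shell
#!/bin/sh
# Hypothetical disk-space threshold check in the spirit of the slide's defaults.
# The 90% cutoff and the "/" filesystem are assumptions for illustration.
THRESHOLD=90
# df -P gives POSIX single-line output; column 5 is "Capacity" like "42%"
usage=$(df -P / | awk 'NR==2 {gsub("%", "", $5); print $5}')
if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "CRITICAL: / is at ${usage}% (threshold ${THRESHOLD}%)"
else
    echo "OK: / is at ${usage}% (threshold ${THRESHOLD}%)"
fi
```

The same shape — measure, compare to a threshold, emit OK/CRITICAL — applies to CPU idle and the other defaults.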
If you don't, here are some free options. You've got free as in beer — just free, but not open — and you've got free as in speech, which is free and open source. Nagios, Graphite, and some of the other tools that we're going to talk about will do hardware monitoring as well. I didn't put that on here, but from a hardware monitoring perspective, it's almost easier to get a kit than to do it yourself. From a software monitoring standpoint, these are popular suites. Nagios is probably the most prevalent. That's not a statement of quality — although in many people's opinion it is a statement of quality — it's a statement of consistency. It's been around since 1999, and tons of people are already using it for Linux monitoring and therefore are also using it for OpenStack monitoring. Along with Nagios, there's a commonly accompanying tool set for log management, alerting, metering, graphing, and reporting; these are some of those tools. So the first view that I wanted to look at is a do-it-yourself view. It's not a suggestion that you should do everything by hand, but it takes us through how it can be done very easily in a very short period of time: what are we looking to monitor, and how do we do it? I did this on CentOS — I've got it written right there — and it's exactly the same for Ubuntu with the package names changed. We install a couple of tools and a couple of OpenStack-like services: RabbitMQ, MySQL, and libvirt. Then we literally cat maybe 15 lines into a shell script — a shebang, some variables, an array of services, a log, and an email alert to send out to us in the event of an issue. We add that to cron, and you're done; that's pretty much it. You're now monitoring OpenStack. Let me show you the results here.
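The ~15-line script described above might look roughly like this. The talk's actual script isn't shown, so the service names, log path, and alert address here are assumptions:

```shell
#!/bin/sh
# Minimal service monitor in the spirit of the ~15-line script described.
# Service names, log path, and alert address are illustrative assumptions.
ALERT="ops@example.com"
LOG="/tmp/openstack-monitor.csv"
SERVICES="libvirtd httpd rabbitmq-server mysqld"

for svc in $SERVICES; do
    # Simple process check, like the ps-and-grep approach in the talk
    if pgrep -x "$svc" >/dev/null 2>&1; then
        status="OK"
    else
        status="DOWN"
        # Alerting is commented out so the sketch runs anywhere;
        # the real script would mail on failure:
        # echo "$svc is down on $(hostname)" | mail -s "ALERT: $svc down" "$ALERT"
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$(hostname),$svc,$status" >> "$LOG"
done
```

Drop it into cron (e.g. `*/5 * * * * /usr/local/bin/openstack-monitor.sh`) and you have scheduled checks, CSV logging, and email alerting.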
So that script is monitoring libvirt and HTTP and RabbitMQ, and MySQL is on there too. With that few lines of code, we're getting an email alert, we're logging to CSV, and we've got command-line logging as well. It really doesn't take much to get your environment monitored — and not just monitored, but with logging, scheduling, reporting, and email alerting as well. If you like that approach, or you're already kind of a do-it-yourself shop anyway, you can enhance it a little: when you write to the log, do it with HTML tags. And instead of just checking whether the process is running — that last screen just did a ps and a grep on the process — ask for the actual health of the system. The problem is that a process can hang and still show up in ps, but be dead. A better approach is to actually ask MySQL to show databases, or something like that; the same goes for Apache or an OpenStack service. You could have a trap, for lack of a better term, that goes and does a nova list, and if nova list dies, consider that down. It gets a little muddier because nova list is hitting a whole bunch of different things — SQL, Nova, multiple components — so you'd have to dig in and tune that down to just what's needed. So, we're going to look at Nagios. Like I said, it's been around since 1999. The OpenStack Operations Guide itself says Nagios is a tried and true systems administration staple. I would agree with that statement, and I like Nagios; most of the integrators and re-packagers are using Nagios. Again, it's not a statement of quality — it's just that it works, right? This installation is CentOS 6.5, Nagios Core 4.0.6, and Nagios Plugins 2.0.1.
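The deep-check idea from a moment ago — asking the service a real question instead of grepping ps output — can be sketched like this. The specific commands and credentials in the comments are illustrative assumptions:

```shell
#!/bin/sh
# A "deep" check runs a real command against the service instead of grepping
# ps output, so a hung-but-still-listed process correctly reads as DOWN.
deep_check() {
    name=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "$name OK"
    else
        echo "$name DOWN"
    fi
}

# Illustrative checks (commands and credentials are assumptions):
# deep_check mysql  mysql -e 'SHOW DATABASES;'
# deep_check apache curl -sf http://localhost/
# deep_check nova   nova list   # note: also exercises Keystone, MySQL, and MQ

deep_check shell true    # trivially healthy command, just to show the output
```

The nova list caveat from the talk applies: because it touches multiple components, a failure there tells you something is down, not which thing.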
So the first thing I do on any new box is update, upgrade, and reboot — in case there are any kernel updates — and make sure everything comes up clean. I probably should have put a snapshot step in here too, if it's a VM. Then we install just a few dependency packages: Git, OpenSSL, make, and some of the common things. One other thing to mention: it wouldn't be very interesting to show me just downloading a Nagios VM appliance or a kit to deploy Nagios, so everything we're going to do is from source. You create your users and groups, passwords and memberships. Download and extract the tarballs for Nagios, the plugins, and NRPE — the Nagios Remote Plugin Executor. NRPE is a client-server agent framework, for lack of a better term, that sits on the remote machine ready to execute things like checks against it. Then, configuring Nagios: when you've downloaded from source, this isn't configuring as in adding things to text files; it's running configure against the tarball contents to get ready to make the executables. Then make and make install. That's interesting — sorry, I can deploy OpenStack, but PowerPoint on two screens escapes me. So, add a contact. Now we're into the Nagios configuration portion. The Nagios bits have been installed, but they're sitting there unconfigured and idle. We run make install-webconf, which sets up the web front end, with a couple of switches for what we called the user and the group, and do some ownership changes on the directories. NRPE is actually the more difficult piece to configure. I don't know the history of it; I just know it's a pain every time I go to install it, because it's a xinetd service and you've got to manually add it to /etc/services.
If you're doing it yourself on CentOS, RHEL, or Fedora, you may want to use the packages from your operating system distribution for this. But if you have to do it by hand, you create a xinetd service and put in your allowed hosts, your user and group, ports, and all of that. We add it to /etc/services, again by hand. We go into the NRPE config, and the only thing to change in there is which hosts can access it. And then netstat — I didn't show the outputs, but there are just a couple of commands to make sure it's running. Even though I've done this several times in the past, I put this portion and the netstat commands in there because it took several tries: iptables had to be configured, or xinetd had to be restarted, or I had to install xinetd, things like that. So, going back to using the packages from your OS: if you went and installed nagios-plugins-nrpe, it would install xinetd for you and get it started. Doing it by hand can be kind of slow. Then we verify Nagios, set HTTP and Nagios on the right run levels, and start everything up. Assuming everything worked well, this is the screen you should see. Again, it's not sexy, but it is consistent, reliable, and stable; it's open and pluggable, and it works every time — unless you're installing NRPE, in which case it takes a few tries. On the node to be monitored — going through here real quick — the easiest way to get one up is a DevStack build. For the do-it-yourself stuff, I just installed OpenStack-like services such as HTTP, MySQL, and Rabbit. Now with Nagios we actually want to monitor more services. So if you don't have an environment — even if you do — throw a DevStack box up, which is remarkably easy to do: you install Git, you sync the repo, and you run stack.sh.
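The hand-built xinetd service mentioned above usually looks something like the following sketch, based on the stock NRPE sample config. Port 5666 and the /usr/local/nagios paths are the conventional defaults; verify them against your own install:

```
# /etc/xinetd.d/nrpe -- sketch; paths and port are the stock NRPE defaults
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = 127.0.0.1    # add your Nagios server's IP here
}
```

And the line added to /etc/services by hand is `nrpe 5666/tcp`. After both are in place, restart xinetd and confirm with netstat that something is listening on 5666.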
On the Nagios side, you define the remote host that you want to monitor and add it to the objects file. After you add the host, you have to tell Nagios to include it — it's include-style behavior, just like Apache and some other services: when loading, go ahead and pull in these config files as well. You can put everything in the same config file, but when you get north of 10 hosts, a single file gets messy; troubleshooting it gets harder and the chance for error grows. So break your hosts out into separate files. In each file, you define the host, and then you define every service underneath it. If I want to monitor Nova or SQL or MQ or Cinder, it's a stanza-like configuration for each of those. So if you do go the single-file route, 10 hosts with 10 services each becomes 100 stanzas — several pages of config file that you don't want to deal with. So break them out. After that's in, restart Nagios. And voilà: here's my host in Nagios, with my checks for my services. That's pretty much as simple as it gets. That's it. So, questions? You can email me at jason.grimm at rackspace.com; I'm happy to answer questions after the summit. Oh, go ahead. [Audience] Have you done any work integrating anything from Ceilometer into Nagios, in terms of monitoring your guest VMs and alerting when thresholds are met — versus the standard approach of just using NRPE — using Ceilometer or any of the other built-in telemetry data? That's a great question, and I'm going to defer to these guys, because I actually don't know. I wish I'd thought of that when I signed up for this, because that's an excellent question. Do you guys know? I haven't personally done anything with it.
For example, within Rackspace Private Cloud, Ceilometer — we're not really ready for it yet, and we wouldn't want to rely on its data for any sort of alerting yet. That could obviously change. But yeah, Nagios is extremely extensible and customizable, so you could write whatever you want to do that. Unfortunately, I haven't done it, so sorry. Well, if there are no more questions, I guess we'll call it. Thanks, everybody.