This talk is "RIP Nagios, Hello Docker Shinken" by Rohit Gupta. Rohit is a developer, FOSS enthusiast, and an Indian nationalist. He's passionate about technology and has worked in the area of convergence of telephony over the web.

Hello, everyone. Can you hear me? Yeah. First of all, thank you for having me here. It's a real honor to be presenting in front of such an awesome audience. The topic I'm going to talk about is "Rest in Peace Nagios, Hello Docker Shinken".

First of all, who am I? I'm a developer working in the area of DevOps. I've been involved in a lot of automation over AWS, and I've worked a bit on Docker as well. I've been working in the area of monitoring for more than three years now. I'll be sharing some of my experiences with monitoring systems, why Shinken makes more sense than Nagios, and what Docker Shinken has to offer.

But before I begin, let me ask you a quick question. How many of you here have never used any monitoring system? Raise your hands? All right, great. Awesome. A lot of you. How many of you have used Nagios? Hey, great. So I have some news for both groups. Let's see.

For the ones who have never tried any monitoring system, it may be terrible news, but it's news: monitoring systems are hard. This is from the Shinken manual, basically the documentation. The documentation itself says, "Relax, it takes time." Why? Because monitoring is a domain of its own. You have a lot of new concepts. And if you want to have a proper monitoring setup ready, you have to take care of a lot of things. You may get started with a monitoring system, but it may not be useful: you may get either too many alerts or too few alerts. It may not catch the exceptions when required, or it may catch too many. So it's a completely new domain. And that's why I say monitoring systems are hard.

Let's talk a bit about Nagios. For Nagios developers, you already know.
But Nagios is a monitoring system, an IT infrastructure monitoring system, and kind of a standard tool for monitoring servers, routers, and devices in many, many companies. The problem with Nagios is that, from the development perspective, it is dead. People have forked Nagios and created multiple projects out of it. There's a kind of love-hate relationship here. People love to hate Nagios, but they still use Nagios because it is everywhere.

Let's talk a bit about Shinken. What is Shinken? Shinken was originally proposed as Nagios 4 by Jean Gabès, its author. But it was rejected by the Nagios developers, and so it became an independent project. So what is the good news for Nagios developers? Shinken supports everything Nagios supports by default. You can just take your Nagios configuration and use it in Shinken, done. It supports 99% of the configuration as it is. The 1% it doesn't support is listed in the documentation. Why doesn't it support that 1%? Basically, those are very rarely used modules or features. But again, if you want those, you can have them with the help of modules.

I've used Shinken 1.0, and with Shinken 2.2 I saw dramatic changes in the features it provides. It has a new installer. It has a lot of new modules. Shinken is written completely in Python, so installation is really simple: you just do pip install shinken and it's done. But by default, it doesn't even provide you a UI, which is a bit surprising. You have to install each and every module yourself by using the shinken install command. shinken install webui: you have a web UI installed. shinken install something else: you have something else installed. So you don't pay the overhead of features you don't want to use.

Let's talk a bit about architecture. As I was saying, it is very modular in design. It is also built for cloud. So Shinken is essentially a bunch of components, or you can say different processes.
There's the Shinken arbiter, the Shinken reactionner, the scheduler, the poller, and so on.

By the way, first let me talk a bit about how monitoring works. Say you want to know the load of your system. How will you do that? Anyone? Basically, say you want to check the load of a remote server. You can SSH into the system, run the uptime command, and you have the load. Shinken, Nagios, and many other monitoring systems use this polling approach. They poll the remote servers for their health status, display it in the UI, have an alerting mechanism, and everything. So Shinken also does that.

The component you can see in green at the bottom, the Shinken poller, actually polls the systems at regular intervals. The scheduler is the one that instructs the poller to perform a check whenever required; it schedules every check. The reactionner is the component that sends you notifications and alerts, and handles events if you want something handled automatically. The arbiter is a centralized place where you keep all your configuration, and it configures all the other components for you. You don't have to configure each and every component. The broker is aware of what is happening across the whole system, and tools can be built on top of it. The receiver is an optional component for passive checks, more of a push-based approach.

Again, when I say it is built for cloud, each of these components is independently scalable. You may have five arbiters, 10 pollers, 11 schedulers, all as per your design and your requirements.

Let's talk a bit about Docker Shinken. Docker Shinken is a project where, basically, what we did was Dockerize Shinken and pre-install some of the must-have plugins and modules. We have three varieties of Docker Shinken: Shinken Basic, Shinken plus Thruk, and Shinken with Thruk and Graphite.
Shinken Basic just provides you the basic UI, the Shinken WebUI, and a few must-have plugins. Thruk provides you another UI. Again, good news for Nagios developers: the Thruk UI is an independently developed project, and its UI is very similar to Nagios. So you will still feel at home. And Graphite, again, provides one more tool, for graph support.

Let's do a quick check. How does it work? As I said, monitoring systems are hard, so this project aims at simplifying things and flattening the learning curve. You just have to run these three commands, and you are up and running with a monitoring system. Let's do it right now. The first command just clones the Docker Shinken repository, and I've already done that. The second is to cd to a particular directory. Let's first check what directories are there: Shinken basic, Thruk, and Graphite. Depending on the image you choose, you cd to the respective directory. So let me just do that. I'm going to choose the third one. And the third step: just run the docker command. Bingo. My Shinken should be up and running. Let's check. Yeah.

The resolution is a bit small, so you cannot see everything. I can minimize here. By default, the username and password are admin and admin, which is configurable. You can log in here and see the different monitoring checks in place. I've pre-configured it to monitor Docker Shinken, the Docker container itself. So you can see a lot of different checks. I can click on any one of them, and it gives me a status message. It also has graph support. I've just launched the Docker container, so it doesn't have any graphs yet.

There's also another UI. I can just type thruk, and here I get a login. That's it. This is very similar to what the default Nagios UI provides. I can click on all the services. I can see the different services, what their status is, and everything.
For you to experiment, I have already set up a demo. You can just log in and play with it, if you like.

There's one thing I have not told you about yet: custom configs. The custom configs directory is mounted on the Docker container using Docker's volume mounting feature. I've created different folders, a hierarchy, to make it more understandable. All you have to do is place your configuration file here whenever you want to make a change. If you want to monitor your host, you can just add your Nagios configuration or Shinken configuration here, and you are good to monitor another system.

Now let's check what is required to monitor another system. Till now, I was talking about Shinken, which is a Nagios core replacement. Now let's talk a bit about the client side. Say you want to monitor 20 different hosts, or 100 different hosts. You will be running an agent on those hosts. This is nothing new, and nothing specific to Shinken; it was there from before. NRPE stands for Nagios Remote Plugin Executor. You install this agent on your system, and Shinken can poll this agent for health information. NRPE is not the only way to perform health checks. You can do health checks via SSH, SNMP, and others. But NRPE is the most commonly used for monitoring, and I'll be talking a bit about it.

So what do you want to monitor? There are basically three broad categories into which you can classify monitoring of servers or clients. One is system metrics. Second is processes and applications. And third is application metrics. For example, you have system load, memory utilization, disk space. You want to keep track of all of these, and all of these come under system metrics. Second is the processes you have installed.
For example, you may have installed Elasticsearch, you may have installed Kibana, you may have installed X, Y, Z. You want to monitor those processes: whether they are up and running, and what their memory and CPU utilization is. That general monitoring of your application is what I would call process monitoring. Third is application metrics. Say you have a master-slave setup; you may want to monitor the replication lag. You may have an Elasticsearch setup; you may want to monitor the different shards, and so on. So basically, you can classify monitoring into these three categories.

Let's look at NRPE and see how we can configure simple monitoring. I can install NRPE using the apt or yum package manager: apt-get install nagios-nrpe-server, and I am done. This is the default configuration file, nrpe.cfg. If you notice, there is a line towards the end that includes nrpe_local.cfg. I would recommend you modify that file instead of modifying the original one.

So let's check. I have already installed nagios-nrpe-server. This is the default NRPE configuration, which is very descriptive. You have different options and everything. Most of them, I think, you will not have to touch, unless you want to change, say, the server port. By default, it is 5666. From the Shinken server or Nagios, you connect to the remote server using IP-based authentication. So there is something called allowed_hosts. You can see it here. You can add different IP addresses, comma-separated, or CIDR ranges, and so on. At the end, you can see the include of nrpe_local.cfg. So nrpe_local.cfg is included by default, and whatever you define there overrides the defaults.

So what do you want to override? The most common thing is that you have to allow your Shinken server. So in allowed_hosts you write: I want to allow my Shinken host to connect to NRPE. Second is dont_blame_nrpe. dont_blame_nrpe is the option for enabling command arguments.
By default, it is disabled, because it is considered a security risk. If you want to enable it, you set the value to 1. And the third thing is the most important: the NRPE commands. Think of each one as a key-value pair, the key being the command name and the value being the actual shell command you want to execute. Those commands are called NRPE plugins, and they can be written in any language, including Python, Ruby, Go, whatever. A lot of plugins are already available. You can install the Nagios plugins package to have a bunch of plugins available. So here are two examples.

Now let's look at the server side, Shinken. There are three things. Think of a command you want to execute; the command is executed by services, and services are defined for a particular host. So think of this hierarchy: you have commands, which can be used by multiple services, and a service, which can be used by multiple hosts.

The command definition comes first. I have written two command definitions here, which are present by default; I have written them out just for the demo: check_nrpe, and check_nrpe with optional arguments. Then you define a host: just the host name, then use generic-host, which is a Shinken template. You can define all your common configuration there; by default, it has a bunch of common configuration. And finally, the address.

Then the service definition. Service definitions are the actual checks you want to perform. Check DNS and load per CPU are the two service definitions I've defined here. Again, they use generic-service as a template. There you have a bunch of common attributes defined, like the check interval, the people who should be notified if there's an alert, and so on. Host name is shinken; this was defined in my previous definition, the host definition. And finally the check command: check_nrpe, the command name, and then the actual command I want to execute. And the second one is with arguments.
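The slides with the actual definitions are not part of this transcript, so the following is only a rough sketch of the command, host, and service hierarchy described above. It uses Python purely to render Nagios-style object definitions, which Shinken also reads. The address 192.0.2.10 is a placeholder, and the command line, service description, and the check_load argument are assumptions based on common Nagios conventions, not the speaker's real configuration.

```python
from dataclasses import dataclass, field


@dataclass
class NagiosObject:
    """One `define <kind> { ... }` block in Nagios/Shinken object syntax."""
    kind: str
    directives: dict = field(default_factory=dict)

    def render(self) -> str:
        # Align values in a column so the output reads like a hand-written file.
        width = max(len(key) for key in self.directives)
        body = "\n".join(
            f"    {key.ljust(width)}  {value}"
            for key, value in self.directives.items()
        )
        return f"define {self.kind} {{\n{body}\n}}\n"


# A command: reusable by many services. $USER1$, $HOSTADDRESS$ and $ARG1$
# are standard Nagios macros; the exact command line is an assumption.
check_nrpe = NagiosObject("command", {
    "command_name": "check_nrpe",
    "command_line": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
})

# A host: inherits common settings from the generic-host template.
host = NagiosObject("host", {
    "use": "generic-host",
    "host_name": "shinken",
    "address": "192.0.2.10",  # hypothetical placeholder address
})

# A service: the actual check, tied to a host and to a command.
# The part after "!" is passed to the command as $ARG1$.
load_service = NagiosObject("service", {
    "use": "generic-service",
    "host_name": "shinken",
    "service_description": "Load per CPU",
    "check_command": "check_nrpe!check_load",
})


def render_config(objects) -> str:
    """Join several object definitions into one .cfg file body."""
    return "\n".join(obj.render() for obj in objects)


if __name__ == "__main__":
    print(render_config([check_nrpe, host, load_service]))
```

Writing the printed text to a file ending in .cfg gives something in the shape the talk describes: one command shared by services, and services attached to a host by host_name.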
Let's do a quick demo of this. As I said, I've added authentication for different hosts, including my Docker Shinken. This is the IP of my Docker container. I am running my Docker container to monitor my localhost itself, the host that is running Docker. So I'm going to use this local IP. If you are monitoring a remote host over a public IP, you may have to write the public IP here; you will have to whitelist the public IP. This is something I have already whitelisted. You can see it here.

And there's one more thing. In the custom configs directory I talked about, you can place your configurations. You can modify the configuration, and you can add and remove configuration files, and Docker Shinken will automatically detect that. So I'm going to do just that. I have a setup ready: dev.local. This is how I'm defining my host and some services. These commands, check_users and check_load, are predefined in the default NRPE configuration, so I'm just using those instead of defining my own. And what I'm going to do is copy dev.local to the custom configs directory. That's it.

Let's check what's... oh, I forgot one thing. All the configuration files for Shinken should have the .cfg extension. So, in custom configs, dev.local becomes dev.local.cfg. And I can refresh here. I have dev.local. That's it. It is done. It will take a moment, so I can just force a few checks: select all of these and recheck them. So ping is OK. check_hda1 is critical because the device does not exist. This particular device does not exist, so it is showing critical. There are a few other things, like check_load. We have them here.

Again, for the UI and everything, you can look at the demo yourself at shinkank.wit.io. And the project is available as open source on GitHub.

Let's talk a bit about how this project can help. As I said, monitoring systems are hard.
Now you have a basic setup ready, with graphs and everything. A lot of plugins are already available to you. You can slowly start understanding monitoring systems. The learning curve is no longer steep like this; now it is kind of flat. And you can gradually build up your understanding of the monitoring domain.

One experience I would like to share is the transition from Nagios to Shinken. I work at Knowlarity. At Knowlarity, we were using Nagios, and we had a lot of problems with the way it was deployed and the features it provided. We have a deployment of cloud plus data centers. We have a centralized cloud platform where we have our applications, and we have various data centers, which are required by our business: data centers in Delhi, Bangalore, Dubai, and so on. As I said, Shinken is built for cloud, and it supports a distributed architecture. I just did one thing: the Shinken pollers, which actually poll the remote systems for health checks, I added one in each of the data centers. And I'm done. I have a truly distributed monitoring system. I can monitor each of the hosts within a data center using local IPs. With Nagios this is still possible, but it is not that intuitive. So that was a use case I wanted to share.

And that's it. I think I'm done quite early. I have a lot of time left. I would like to hear your questions.

It should trigger some emails, or it should trigger some SMS to the person we have configured, I mean the mobile number or the email we have configured. Do we have any such system?

Shinken is more of an execution engine. You can think of it like that. It schedules and executes whatever you write. You can write whatever you want. So, the check which I showed you.
So here's a Shinken command. You can see there's check_nrpe. This is not something Shinken-specific. It is an independent program, and Shinken makes a system call, calling it to perform a remote check on the system. Once it receives the value, it evaluates it and shows it in the UI, sends notifications, or whatever. Like this, there are a lot of community plugins available. For emails, I've also open sourced an AWS SES-based notification plugin. For HipChat, there's a plugin, hipsaint I guess is the name, which you can use to send HipChat notifications. For SMS, if you have used Twilio or Plivo, you can use those. Or you can build your own plugin for Way2SMS in India, and so on. That's it.

As I said, check_nrpe makes a system call, and this is a very basic command. There's also a module that avoids the system call and makes the remote network call directly from the Shinken poller: the NRPE booster module. So there may be Shinken modules that can help you do the same without making a system call.

Nice introduction to Shinken. I've been using Nagios for more than three or four years. One thing that annoys me about Nagios is the way it presents graphical information about historical data. Does Shinken do anything better in that area?

Yeah, you're talking about historical data.

Anything comparable to New Relic?

I talked a bit about the Thruk UI, right? The Thruk UI is an independent project, which can be integrated with Nagios, Shinken, or a few other monitoring systems as well. Thruk can be integrated with Shinken using the Livestatus API. The Livestatus API is faster than status.dat parsing. In Nagios, there's a status.dat file, which is parsed for all the status information. So in that sense, it is faster. Second, you can enable logging in the Livestatus API, and you can have all your historical information. You can check the availability status.
You can generate all sorts of reports and everything.

All right, I'll give it a try. Thank you.

Thank you, Rohit, for the session. This is more of a request for advice from you. What would be the basic monitoring requirements for a startup with just a couple of application servers and one database server, which has only a handful of developers and no DevOps person? What kind of basic monitoring services would you suggest we should be using?

Right, that's a beautiful question. I've also interacted with a few people from really small startups. And when I talk about monitoring, they say, "We are not doing it. What is it?" So yeah, from the developer point of view, monitoring is something that is often ignored, and really small startups don't do it. When customers start asking about availability, they think, yeah, we have to do it. So this is one reason Docker Shinken can help: it reduces the learning curve, and you can have your own in-house monitoring system. A lot of people in startups use a hosted service as well; Datadog is very popular nowadays. For applications, New Relic is also used a lot, but it is more from the application point of view, though some people use it for systems as well, I think. So I would suggest that, if you want an in-house monitoring system to have better control of what's happening in your infrastructure, you can start with Docker Shinken. And as your requirements grow and your understanding becomes better, you can start building on top of it, or maybe move to a dedicated setup.

All right, I think we are done. Thank you for being an awesome audience.

Thank you, Rohit, for such an amazing talk. PyCon India would like to present you with a token of thanks.