Cynthia? Hi. So I just wanted to make sure — this is the last session, so if you are not here for this talk, there are other talks too. I will try to keep it as compact as possible. My goal for this talk is that you don't get stuck on one tool or methodology. You should ask why and what you are monitoring — that matters more to me than whichever tool you are using.

Let me tell you something about this guy. I joined the University of Maryland Medical System on Monday. Before that I worked at Medallia, and before that at OmniTI. I have worked with probably more than a dozen databases, SQL and NoSQL, and with more than a dozen monitoring systems too. Oh, that's still the Medallia email, sorry — you can reach me on Twitter or at my Gmail account; the other email will bounce. Right now I'm hiring senior database engineers and system administrators in Baltimore. If you are looking for a job, let me know — I'm building a new team to migrate a very large medical system from Oracle to Postgres and Hadoop. I blog sometimes at pateldanis.com, and you can reach me through Twitter as well. I also run a Slack channel. If you are not on the Postgres Slack, I think this is the right time to join. There are probably 1,000 people on it communicating about different topics: problems you're having, conferences, experience with different databases, FDWs. There are a lot of channels in there, and you can start your own.

OK, today we are going to talk about what? So, what you should be looking for: as you must be aware, there are a lot of solutions available, commercial and open source. What should you look for when you are selecting a monitoring solution? Sometimes you have the luxury of selecting one; sometimes you have to use the one that's already there. Those are the things I'm going to talk about: what to look for and which metrics to collect. Monitoring has three parts: collection, trending, and alerting. First decide what to collect. Most of what you collect will be trended, and then you have to choose wisely what you want to alert on — otherwise you will get all sorts of alerts that you don't care about. That's what we're going to talk about: what to alert on.

Monitoring is a growing topic that doesn't stop, because Postgres keeps adding new features. Things you couldn't monitor in Postgres 8 you can monitor in 9.1, 9.2, 9.4, and 9.5, and 9.6 comes with a lot more monitoring features. If you are not keeping track of the monitoring features they are adding, you are not actually using them, and the features that are coming are very important. It's easy to keep track: whenever a major or minor release comes out, the monitoring features are normally listed separately in the release notes, so they are easy to find. I'll also talk about how to react to alerts at 3 AM — I will give some advice, though most of it you might already know — and then open discussion. As I said, there are probably 50 monitoring tools that I know of, and I might talk about maybe five of them. I would like to know what other monitoring tools you are using and what factors you took into consideration when selecting them.
So from my perspective, what do I normally look at when I select a monitoring solution? I think it should be a blend of everything. I don't want a tool where only the Postgres data is monitored and only the DBAs care about it, while the system people have their own tool, operations has their own, and finance has their own. The goal of IT is to help the business, so an integrated tool is really important. Centralized monitoring — that's the key word. I have worked with companies that run Datadog, Wavefront, and other monitoring all at once, and then you have to figure out where to look for everything, which is very difficult. It's always a good idea to have centralized monitoring.

The other question you should ask is hosted versus on-premise. Some companies have security concerns — mine does — and will use an on-premise solution rather than a hosted one, because I can't afford to have PII data leave the building. Postgres, if you try to insert a row and hit a primary key violation, will log the name and ID, everything, in the log. If that log entry is going somewhere, it could end up in the monitoring solution as well. So people in regulated industries normally don't like to send data to a hosted service, but there are a lot of companies that don't care and manage those monitoring systems differently. Depending on the industry you work in, decide how you want to manage that.

Alerting: make sure that whatever solution you use provides alerting capability, dashboards, and graphs. Easy installation and configuration, because if you are running hundreds of systems, manual installation is not the right way to do it — look for automated tools for installation and configuration. Do they support Postgres, or whatever other databases you are using? pg_stat_statements is a great tool for monitoring — it's an extension in Postgres that lets you track all the queries run on the database — so do they provide integration for it? If not, there is a way to do it yourself. Those are the kinds of things to look into. Resource monitoring: there are four resources in the IT world, right? CPU, RAM, disk I/O, and network on the system side, so look at what they give you as plugins or easy installs. PgBouncer support — PgBouncer is the connection pooler, that's the word, connection pooling — because if you have PgBouncer and you are not monitoring it, you are doing something wrong. You should be looking at which clients are waiting and which are not; those kinds of things are very important. So that's a lot to look at.

I am going to talk about some of the solutions I have worked with in the past and am working with now. These are the open source solutions I picked out of many: Sensu, which is a pretty nice open source monitoring suite; Zabbix; Zenoss; Nagios. I think Nagios has done a great job in our industry of educating us how to monitor, but now it should die — you should search for that blog post. It did a great job, and now I think the next generation of monitoring solutions should take over. Nagios is pretty old now.
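If pg_stat_statements is new to you, enabling it looks roughly like this — just a sketch, and the exact settings depend on your version and your config management:

    -- in postgresql.conf, then restart:
    --   shared_preload_libraries = 'pg_stat_statements'
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- top queries by cumulative time (column names as of the 9.x series)
    SELECT query, calls, total_time
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 10;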
It's very difficult to manage compared to the newer tools, and it's not keeping up with new things. That's why — otherwise, a lot of companies are still using Nagios, so it's up to you. But read that blog post; it will give you more insight into why.

SaaS offerings I have worked with: Wavefront, Circonus, VividCortex, Okmeter, New Relic. There are many others I haven't had a chance to work with. Here is a quick comparison between the solutions. All of them provide Postgres support — if you are using Nagios, it's the check_postgres plugin, and most of them use check_postgres underneath. Sensu is easy to configure, though you still need Graphite for graphing. My recommendation confidence for Sensu is pretty high — I like Sensu for sure. I think Zabbix and Zenoss are good, but that's a medium level of recommendation; Zenoss is more closed source. Nagios is pretty low now. Sensu was built to address Nagios — Nagios had some problems, and Sensu tried to solve them. That's my recommendation confidence, and it could differ day by day.

SaaS offerings: Wavefront is the one I'm going to use in this talk, but it doesn't matter, as long as you know what to collect. Then you can send that data to whatever monitoring tool, and it should be able to graph it, trend it, and alert on it. For configuration, I'm using Wavefront with collectd plugins. Circonus has some default checks — once you point it at a Postgres, it will create maybe ten different default checks for you. VividCortex has default checks too; it's a pretty interesting monitoring tool. Okmeter is the one I recently played around with; it's a one-click installation, like VividCortex, but built specifically for Postgres. You install it and it just works. But it's a hosted solution only — there is no on-premise option. New Relic is pretty interesting too, and I have worked with it. There are plugins available, but a lot of things are missing from those plugins. As long as you know what to monitor, and you can change those plugins to collect whatever you are trying to collect, New Relic might be a good option — if everything else is already on New Relic, there is no reason to bring in another tool; just modify the plugins and use it.

Beyond the basics, there are more advanced things you can compare SaaS offerings on. Does it provide capacity planning? Circonus, for example, provides capacity planning, so you can project disk growth for the next three or five years right in the dashboard — that's very important. Real-time analytics is another: can you write queries directly? In Wavefront I can write ts() queries directly and do some real-time analytics. Anomaly detection: VividCortex has been working on that for a long time. Sometimes high CPU usage doesn't mean everything is broken; as long as you understand the anomaly and react to it, that's what matters — and it's still an active research area. Hopefully that will help whoever is on call not wake up in the night so often.
Data retention: if you're using a SaaS offering, be aware that some companies only let you look at the data for a certain period of time — one year, five years, sometimes one month. It's not very useful if you can only see one month of data, because you want to compare against last year. Ask that question in advance: how far back can I go? Some companies, like Circonus, never delete your data — they keep it forever — but there is a cost penalty to that, and you need to understand it. Support and reviews: as I said, the market is very competitive now, so find out who is using which tools and get reviews. And pricing is pretty important. If you are monitoring 100 systems versus 100,000 systems and the pricing can't scale, that's not the tool you should be looking at — you might want to start with Sensu instead.

These are tools I've worked with recently, so I put together some notes on Wavefront: nice dashboards, alerting functionality, a scalable solution, real-time analytics, complete monitoring. And I compared it with Okmeter, because that's something new I recently played with — there is no particular reason I did not put VividCortex here, no correlation. I just installed the Okmeter agent and it gave me all the graphs I was looking for, so it was pretty cool: very easy to install and configure. I was using PgBouncer, and the PgBouncer monitoring is built in, so you don't have to do anything. If you install the agent on a Postgres server, all the graphs are already there. Yes — both of them, Wavefront and Okmeter, are SaaS offerings. With the PgBouncer stats you can actually dig down into client and server connections per IP address, average query time as seen from PgBouncer, those kinds of things.

Even if you are not going to use these tools, here's what you should be doing. Say I am using New Relic. I should still download the Okmeter agent onto a test instance and see what they are monitoring. Can I do that in New Relic or not? If not, what are the challenges? That will help you understand which things are important, what other people are doing, and why you are not doing it. You just need to install it, and they will give you maybe one month free, and by the end you will know which things are important to you — some people don't know what to collect for PgBouncer stats, for example. Just install it, see what they are collecting, and ask: can we do this? It's very simple.

Monitoring solutions. I selected Wavefront because I had a challenge: monitor 150 database clusters across the globe. I wanted easy installation, a standard that everybody else was already using, centralized, and real-time analytics was very important for me. The new infrastructure we were building was Docker-based, so it had to scale — we would be running probably 100,000 Docker instances, and if the monitoring solution can't scale with that, it's not the right tool for you. So we chose Wavefront, because we were already using it.
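If you have never looked at PgBouncer's own stats, this is roughly where that data comes from — the PgBouncer admin console, which you reach with an ordinary psql connection to the special pgbouncer database (port and user are whatever your install uses):

    -- e.g.: psql -p 6432 -U pgbouncer pgbouncer
    SHOW STATS;    -- per-database totals, including average query time
    SHOW POOLS;    -- active and waiting clients, server connections per pool
    SHOW CLIENTS;  -- individual client connections and their addresses
    SHOW SERVERS;  -- server-side connections PgBouncer is holding open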
And it met these criteria: we could test it on Docker, and it works with collectd. Once you install the collectd agent in all your Docker instances, it just sends data, so it was very easy to roll out. That's what we'll be talking about for the rest of the talk: what to collect. This is what I am collecting, and I might be missing something — that's why I'm here. If you are collecting a lot more than me, then I should learn from you.

First of all, I set up a role, because I normally don't give superuser to the monitoring role. I created a collectd role, installed a collectd schema, set the search_path, granted usage to collectd, and made sure the collectd user is not a superuser. Limited permissions. Why a separate role? Sometimes you want to exclude it in the logs — OK, I don't want to see anything related to collectd, just give me information about my application, because I don't care about monitoring right now. Having a separate role also helps you track things down if something goes bad. And no superuser is very important — you don't want to grant that permission, and avoiding it takes some extra steps, but it's worth it.

The collectd agent is pretty simple. You can install it on any Linux or Solaris system very easily, and it comes with a PostgreSQL plugin. By default it gives you the checks in the top part of this list, up to query plans — that's the basic monitoring it ships with — and I added the other ones listed as custom. You can add your own metrics or queries that you want to monitor, even things not listed here. Say I want to monitor how many tables my database has: I can just add another query, "number of tables", with a single SELECT count(*) FROM pg_tables WHERE schemaname = '...', and it will start monitoring it. It's very easy to configure. Once you know the flow, you can add a lot more monitoring queries, and collectd will send the data; then it's up to the monitoring tool to consume it. It could be Sensu, it could be anything — I think most of the SaaS tools also support collectd, so it's pretty much an industry standard.

The first check it gives you is backends. It doesn't give a lot of information; it's just count(*) from pg_stat_activity for that database name — the number of connections, basically. But this query will not work if you use the collectd role as-is, because I did not give it superuser permission. If the role is not a superuser, it can only see its own sessions; for everyone else's it shows insufficient privilege instead of the query. So I created a function, a SECURITY DEFINER function for pg_stat_activity. It just does SELECT ... FROM pg_stat_activity; then I revoke all on the function from public and grant execute only to the collectd role. Now when the collectd role needs that data, it gets it through the pg_stat_activity function in the collectd schema itself. That way you don't have to give superuser to the collectd role, and it can still see all the data it needs. This will change in Postgres 10, but until 9.6 you have to do it this way.
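A minimal sketch of that setup — the role, the schema, and the SECURITY DEFINER wrapper. The password is a placeholder, and the function needs to be created by a superuser so the definer has the privilege to see every session:

    CREATE ROLE collectd LOGIN PASSWORD 'changeme';        -- placeholder password
    CREATE SCHEMA collectd;
    GRANT USAGE ON SCHEMA collectd TO collectd;
    ALTER ROLE collectd SET search_path = collectd, pg_catalog;

    -- wrapper so the non-superuser role can see every session, not just its own
    CREATE OR REPLACE FUNCTION collectd.pg_stat_activity()
    RETURNS SETOF pg_catalog.pg_stat_activity
    LANGUAGE sql SECURITY DEFINER
    AS $$ SELECT * FROM pg_catalog.pg_stat_activity $$;

    REVOKE ALL ON FUNCTION collectd.pg_stat_activity() FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION collectd.pg_stat_activity() TO collectd;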
Transactions. That's another one — normally whenever two DBAs meet, they ask two questions: how big is your database, and what's its transaction rate? If you don't know, it's a bit embarrassing. So I just put a query in there that selects commits and rollbacks — how many commits and how many rollbacks are happening on my database — and that gives you the transaction rate. DML is the next one: how many updates, inserts, and deletes are happening. That's another question people ask — is your workload mostly read or write, 80% read and 20% write, or 80% write and 20% read? You can get that detail from here, how many writes are happening per second, just by tracking updates, inserts, and deletes. Yes — in the monitoring tool you see the rate: it takes the previous value and the new value, and since we are looking at the transaction rate, not the absolute value, it doesn't matter when the counter started. The monitoring tool just does the subtraction between the two samples and gives you the rate.

Table stats. You want to see live tuples and dead tuples — I'm not going into detail about what live and dead tuples are. Sometimes the database is slow and there are a lot of dead tuples out there, or your database is not vacuuming properly and there are a lot of dead tuples lying around. If you see that number always growing, you can ask why and do further investigation. You don't have to alert on it, but at least you know something is going wrong. I think there was a bug in 9.4 or 9.2 where autovacuum was not processing things properly; those are the kinds of things you can figure out by looking at your own system, because each workload is different.

Query plans. This gives you sequential scans and sequential tuples read. Sequential scans is the number of times a sequential scan happened, summed up, and sequential tuples read is how many tuples were actually fetched by those scans. Index scans are the same, and these numbers cover all the tables in the database. If you suddenly see a spike in the number of sequential scans, you want to look at what changed — it might be that a query plan has changed and one of your largest tables is now getting sequential scans. Then you check sequential tuples read, which is very important because it tells you how many rows are actually being read, so you can understand: OK, it's doing a sequential scan, going to disk and fetching all the data. Same thing with index scans — do you have the proper indexes or not? It depends on the use case, but as long as you have the data, you have something to look at.
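Roughly the kind of queries behind those checks — the statistics views already hold cumulative counters, and the monitoring tool turns them into rates; this is a sketch, not the exact collectd configuration:

    -- transaction rate: cumulative commits and rollbacks for this database
    SELECT xact_commit, xact_rollback
      FROM pg_stat_database
     WHERE datname = current_database();

    -- DML activity and live/dead tuples across user tables
    SELECT sum(n_tup_ins) AS inserts,
           sum(n_tup_upd) AS updates,
           sum(n_tup_del) AS deletes,
           sum(n_live_tup) AS live_tuples,
           sum(n_dead_tup) AS dead_tuples
      FROM pg_stat_user_tables;

    -- sequential vs index scan counters, summed over the database
    SELECT sum(seq_scan)      AS seq_scans,
           sum(seq_tup_read)  AS seq_tuples_read,
           sum(idx_scan)      AS idx_scans,
           sum(idx_tup_fetch) AS idx_tuples_fetched
      FROM pg_stat_user_tables;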
The next one is block I/O — and all of these checks I've talked about so far are already in collectd, so once you start collectd it will grab all of this without you doing anything. It gives you block reads versus block hits, and you can make sense out of it. Each metric has a page or more of documentation on postgresql.org — just copy the name from here, paste it, and you will learn what it means. What is TOAST, and why is it important? Why do we suddenly not have a good TOAST block hit ratio? You can understand those things, and if something changed and you don't know what, you can correlate them and investigate performance issues.

DB size is very important. You need to know how much it's growing. Sometimes it grows without the business growing, so you want to understand why: is it growing only in some tables, are we storing sessions that aren't useful anymore, or is one table 50% bloated — the table is two terabytes but the actual data is only one terabyte? The next thing — it's not on this slide — is that you should run a bloat report. If you search for "Postgres bloat report" you will find a lot of tools available online. Look at what is bloating: it could be a table, it could be an index. Indexes can be rebuilt online, and tables can be rebuilt online using pg_repack, if you are not aware of that tool — a very useful tool that I have used for five or six years now on production systems. Three months ago I ran pg_repack on a five terabyte database, and afterwards it was two terabytes. Three terabytes of space recovered just by running pg_repack. So that's very, very useful.

All the other queries I'm talking about from here on are the custom ones you add yourself. As I said earlier, there was a backends check; this is similar, but you don't really care how many connections there are — what you really care about is what those connections are doing. Are they just sitting idle, are they idle in transaction, are they active? The state is very important. If somebody tells me they have 200 connections, that doesn't mean anything to me; if somebody tells me they have 200 connections and five are active, then yes — those five active connections are what matters. You want to know if some connections have been sitting idle in transaction for days and you only discovered it because vacuum wasn't effective. I actually alert on idle-in-transaction: on an OLTP system, if it's more than a couple of hours, I get alerted, because you don't want long-running transactions on your OLTP system — otherwise the system slows down, because vacuum can't be as effective as it needs to be. It uses collectd.pg_stat_activity(), so it doesn't have any permission issue. The other one is waiting: even if there are connections, how many are actually waiting? Even one waiting connection — I would alert on it, though I would wait maybe another five minutes before alerting. Waiting is OK, but waiting for a long period of time is not good. So those things are important: the number of connections is a fine thing to collect, but the state of those connections is what really matters.
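The custom checks just described boil down to small queries along these lines, using the collectd wrapper from earlier — the two-hour threshold is just the example from this talk, not a fixed rule:

    -- database size in bytes; trend it to watch growth
    SELECT pg_database_size(current_database());

    -- connections broken down by state
    SELECT state, count(*)
      FROM collectd.pg_stat_activity()
     GROUP BY state;

    -- idle-in-transaction sessions older than a couple of hours (alert-worthy on OLTP)
    SELECT count(*)
      FROM collectd.pg_stat_activity()
     WHERE state = 'idle in transaction'
       AND now() - xact_start > interval '2 hours';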
This is a simple graph I threw in from both — the upper one is Wavefront, the bottom one is Okmeter. There are some differences in the GUI and the graph design, but the data is almost the same.

Slow queries — that's something I learned from Zenoss. Zenoss has a built-in plugin for slow queries, and I wasn't doing it. What they do is count the connections that are active and have been running for more than 300 seconds, where the query is an insert, update, delete, or select — we ignore vacuum or CREATE INDEX statements — because we do care why an insert has been running for more than 300 seconds. If that number is more than, I think, five, we get paged, because it means the system is slow and can't keep up with the transactions it is getting. This is something I added recently, learning from other systems — which is why it's important to keep learning, the same way you can learn PgBouncer stats from Okmeter.

Transaction wraparound — who doesn't know about the transaction wraparound problem in Postgres? Search for it; I think two companies got hit last year by transaction wraparound — GitHub or some other company, I think. Postgres sets a limit of about two billion transactions. Two billion looks like a pretty big number, but you would be surprised how fast you can crunch through it. If you don't monitor that number, once you hit two billion the system will go unresponsive and start doing all the vacuuming it has to do, and that takes higher priority than your business — and you want your business to be higher priority than vacuum. Then people complain that Postgres is slow because it's just doing vacuum. So you want to watch this age, and as it approaches two billion — say at 1.5 billion — you really want to get alerted, or even paged, and take care of it: figure out which table is not being vacuumed properly, and maybe vacuum it manually.

Yes, it's per database — and yes, that's right, internally it's per table, but it takes the largest value. It lists the number of tables that need vacuum; you decide the threshold, say 1.5 billion, vacuum everything with an age greater than 1.5 billion transactions, and the age will go down after that. Yes, there are a number of causes. One is a long-running transaction: if a long-running transaction is always there, then whenever autovacuum touches that table it cannot actually clean it up, because some transaction is still holding things back. The other thing is that on a highly transactional system you might want to tweak the autovacuum settings as well, to make it run faster. If not, the workaround is to watch it, get alerted when it goes past 1.5 billion, and then see why those tables are not getting vacuumed properly. First you need to know about it, and then you learn which tables are affected. Good.
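The two checks above reduce to queries roughly like these — the 300-second and 1.5-billion thresholds are the ones mentioned in the talk, and the wrapper function is assumed from earlier:

    -- slow queries: active statements running longer than 300 seconds,
    -- ignoring vacuum and CREATE INDEX
    SELECT count(*)
      FROM collectd.pg_stat_activity()
     WHERE state = 'active'
       AND now() - query_start > interval '300 seconds'
       AND query !~* '^\s*(vacuum|create index)';

    -- transaction wraparound: oldest transaction age per database
    -- (alert somewhere around 1.5 billion)
    SELECT datname, age(datfrozenxid) AS xid_age
      FROM pg_database
     ORDER BY xid_age DESC;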
Locks. The other thing you want to check is pg_locks. Normally you will see a lot of locks, and that's fine, but if locks are waiting, that's a problem — why are they waiting? If granted is false and the lock is waiting, you want to check which resources the locks are waiting on and which table is being locked. It's very important to know, and it's a simple check.

WAL files. That's another one: you want to monitor how many segments were archived and how many failed, because sometimes you are using NAS or shared storage, and if it's not available, archiving will fail. I had that issue once — that's why I added this check — so you learn that it's failing because the mount point is not available. You want to find that out quickly, rather than suddenly learning it from replication that can't catch up.

Scans. This is similar to the earlier one, but here is what I do differently — I think I only have 15 minutes, so I'll move along. Sequential scans on large tables are very important. I wrote a separate function to track sequential scans on large tables, because I don't care about sequential scans on small tables, but if it's a large table, more than 100 MB or so, I do care. So I created a materialized view that captures the table stats for every table larger than about 100 MB, and then I wrote a function, called from the monitoring system, that refreshes that materialized view every four hours. If something spikes within those four hours, I know something bad happened — suddenly a large table is getting sequential scans — and I get alerted and paged on it. These slides should be available so you can look at the details; I don't have a lot of time, but that's the basic theory: if a large table starts getting sequential scans, you want to get paged about it, because otherwise it might slow down or even bring down your entire website.

Average query time is another one. It depends on pg_stat_statements, which keeps timing for all the queries that ran — I think by default it keeps about a thousand queries in the view — so you want to collect frequently and track the total time, or do it in your monitoring system: just pull the data in and do a rolling average or whatever you want with it. For pg_stat_statements I again created a view and a function, just to get rid of that superuser permission issue. This is average query time as a simple graph. Okmeter has other graphs that I would like to implement further — you can actually dig into which query has been running how many times. All of that detail is in pg_stat_statements; it's up to the monitoring tool how you want to interact with it. You have all the data in pg_stat_statements; the next step is to figure out how to grab it, collect it, and send it to the monitoring solution. That's what Okmeter has done — you can see, for example, that this query has been running 15% of your time. You can do the same thing with your own tool: which query is using more disk I/O, which query writes how many rows — everything is in the pg_stat_statements view, including how many rows were written by each query — so you just need to bring those details into your graphs and your trending system.

Checkpoints: you want to see whether your checkpoints are happening on the timeout or being forced.
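Sketches of those checks, under the same assumptions as before (the collectd schema, a 100 MB cutoff); the materialized view is one way to keep a comparison point that the monitoring system refreshes every few hours:

    -- lock waits: anything not granted is waiting on somebody
    SELECT count(*) FROM pg_locks WHERE NOT granted;

    -- WAL archiving health (9.4+): failures usually mean the archive target is gone
    SELECT archived_count, failed_count FROM pg_stat_archiver;

    -- snapshot of seq-scan counters for large (>100 MB) tables
    CREATE MATERIALIZED VIEW collectd.large_table_seq_scans AS
      SELECT relid, schemaname, relname, seq_scan
        FROM pg_stat_user_tables
       WHERE pg_total_relation_size(relid) > 100 * 1024 * 1024;

    -- delta since the snapshot: a jump means a big table started getting seq scanned
    SELECT t.schemaname, t.relname, t.seq_scan - s.seq_scan AS new_seq_scans
      FROM pg_stat_user_tables t
      JOIN collectd.large_table_seq_scans s USING (relid)
     WHERE t.seq_scan > s.seq_scan;

    -- average query time from pg_stat_statements (total_time is in milliseconds here)
    SELECT sum(total_time) / NULLIF(sum(calls), 0) AS avg_query_ms
      FROM pg_stat_statements;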
Slave lag. If you are running a slave, is it catching up with the master or not? It's a simple query — just pg_last_xact_replay_timestamp(), which you can query directly (sketched below). There's a catch: if your system is not busy enough, this will give you false positives. There are other checks out there, but I use this one because my system is pretty busy and it gives me what I need for now.

Alerting on the DB. You want to alert on uptime; on waiting connections, say more than five connections waiting; on slow queries, the number of slow queries; on sequential scans on large tables — that is very, very important; on transaction wraparound, which is even more important, because you want to get alerted at 1.5 billion transactions; on disk space usage — the threshold is up to you, but it will bring down the database if it's 100% full; and on slave lag, say more than five minutes. That's fairly basic, but it covers the three criteria — uptime, performance, and resources. Those are the three buckets you have for alerting.

How do you keep up? Design with failover in mind: whenever you design a system, it is going to fail over at some point, so make sure the monitoring doesn't freak out when the host name changes — the master name will be different. Keep an eye on new features. Commit timestamp tracking is a 9.5 enhancement, so you may want to start collecting that detail now, like the last committed transaction. cluster_name is also new in 9.5: if you are running multiple clusters on a single server, you can give them different names, like reports DB versus personal DB, so if you are monitoring the processes you can now pick up the cluster name. In 9.6, until now pg_stat_activity just told you a session was waiting, but not why — now you can get that detail, so you should tweak your monitoring to pull more information from it. And use a config management tool to deploy monitoring; you don't want to do it by hand. Postgres 10 enhancement: everything I did to get rid of the superuser role will go away, because we are introducing the pg_monitor role. It's a special role you can use for monitoring — you cannot log in as pg_monitor itself, but you can create another role, say a monitoring role, grant pg_monitor to it, and use that (also sketched below), so you don't have to worry about the SECURITY DEFINER functions and all the things I talked about. But that's coming next year; for now you have to do it this way.

Incident management — how to be ready for that 3 AM call. (Five minutes? OK, five minutes.) Keep your PagerDuty calendar in order; if you are not familiar with PagerDuty, there are other companies out there too. Document your metrics: document why you are alerting on something and why you are collecting it, because you are not the only person who will be relying on this. If you don't document it — take transaction wraparound: if you just say "wraparound", the system person will do nothing about it and just ignore it, like "who cares about wraparound anymore". Have alert resolution procedures, and make sure you have clear SLAs and an escalation policy. Document scenarios too: when do you wait for the server to come back up, and when do you fail over?
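The slave-lag check and the Postgres 10 role mentioned above, sketched — the role name "monitoring" and the password are placeholders:

    -- run on the standby; can read as stale lag on a quiet primary, as noted above
    SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;

    -- Postgres 10: the built-in pg_monitor role replaces the SECURITY DEFINER workaround
    CREATE ROLE monitoring LOGIN PASSWORD 'changeme';   -- placeholder
    GRANT pg_monitor TO monitoring;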
Other things I would like to include in monitoring: monitor your backups — are they actually being taken properly? And I would highly encourage you to actually restore them, because the backup you never restore is normally the one that doesn't restore when you need it. If you have never restored your backup, I'm pretty sure it will never work — I have been in that situation before. Review your failovers and your alerts — I think GitLab recently had a similar issue with their Postgres backups. Review alerts before going on call: when you are going on call, check what alerted last week and be prepared for it. Is your on-call notification working? Sometimes it's not working at all, so tweak it — lower a threshold and trigger an alert once in a while while you are in the office. That has happened to me as well. Think of the worst case and document it: what if you are in a movie theater? Normally I go to a movie whenever I am on call, because that's the best time — you don't do anything else — and what if you can't jump on the server? Do you have documentation that another person on the phone can follow? What else? Anything else? [applause] I'm around, so you can ask me any questions. I'm on Twitter and email, and if you are not on Slack, you should be on Slack — if you don't get anything else out of this talk, just get on Slack. Thank you.