Welcome to Monitoring Strategy Considerations. I'm joined by David Orlandy, Senior Systems Consultant at Quest Software, who will discuss considerations as you develop a monitoring and diagnostics strategy. My name is Lindsay Hooper, and I'm one of the Postgres Conference organizers. I'll be your moderator for this webinar today. With that, I'm going to hand it off to David. Take it away. Thank you, Lindsay. Hello, everybody. As Lindsay mentioned, my name is David Orlandy. I'm with Quest, and I'm one of the database systems engineers here. More specifically, I'm part of the business unit with a particular focus on database monitoring and diagnostics solutions. I've been with Quest for just over eight years now, and before Quest, I held a similar sales engineering role at Embarcadero Technologies. Some of you may be familiar with them; it's a database tools company based in the San Francisco Bay Area. All told, I've been a part of the database software industry for about 20 years now. So thank you for joining the session today. I hope everyone is staying healthy and safe out there, and I hope you find the information I share useful to your day-to-day tasks and responsibilities. So let's jump in and get started.

The topic I'd like to share with you today is the importance of monitoring and the many considerations as you develop a monitoring and diagnostics strategy. IT administrators need to keep an eye on the availability, performance, and storage of mission-critical environments. With this in mind, there are several things to consider and prioritize. The areas I'll cover in today's session include: All-in-one versus specialized. In other words, do you want a monitoring solution that monitors lots of items, or one that is laser-focused on one or two items? Build versus buy. Do you want to create and manage your own scripts or your own homegrown solution, or perhaps invest in something off the shelf? Real-time diagnostics versus 24/7 monitoring; we'll discuss the advantages and disadvantages of both. Architecture, something to really consider and very important: what are the moving parts under the covers of the monitoring solution? Are there traditional agents? Is there overhead associated with the monitoring and diagnostics solution? Does the architecture provide flexibility in deployment? Configuration: are your databases on-prem, in the cloud, or do you have a hybrid deployment? And the same questions apply to your monitoring solution, whether you want it on-prem, in the cloud, or hybrid. We'll discuss alerting, notifications, and actions, all of which are important to stay ahead of potential issues and be notified of potential problems. Understanding what is considered normal activity: in addition to seeing straight-out metrics for things like memory pressure, latency, or query performance, it's important to know how your environment normally runs and behaves, and potentially be alerted if it's not acting normally. Advisories and troubleshooting suggestions: this is something most everyone likes to have, to get advice or some guidance on what to do next if you do indeed see performance issues and alerts firing. Documenting the details of your environment, in other words your monitoring metrics, and being able to communicate that out to the stakeholders in your environment; very important. And we'll round out the session today by reviewing some specific Postgres metrics and discussing particular use cases.
Now, as Lindsay mentioned, I will have a few questions for you, the attendees. The first one: I really just want to get a better understanding of the Postgres monitoring landscape by polling you for a sample of data, so if you care to participate and share, it would be appreciated. Just use the chat feature in Zoom to submit your answers. The first question, with an answer of one, two, three, or four: what are you mostly using in your Postgres environment? Is it version 9, version 10, version 11, or version 12? Okay, all over the map, but I'm actually seeing what I expected so far, which is a lot of 9 and a lot of 12; I'm personally seeing less of those in between. It looks like we've got mostly ones and fours with a few threes scattered in there, so 9, 11, and 12. Okay, and somebody squeezed in a version 10. Thank you for that. One more I'll cover here; answer with one, two, or three. One, right now you're not really monitoring your Postgres environment. Or two, you're monitoring, but with your own queries, scripts, or homegrown application. Or finally, three, you have indeed invested in and are using a third-party solution. Okay, twos, twos, and threes. Great, I really appreciate the responses and the feedback; that's good information for me.

So let's dive into the content of today's topic and start out with all-in-one versus specialized. Perhaps we have some musicians in the audience, or I'm guessing at least most of you like to listen to some kind of music. As a musician myself, I thought this analogy kind of hits the mark. When it comes to selecting a software solution, there's a tendency to lean toward a kind of one-man-band solution that empowers users to essentially do it all. Though there are some efficiencies to gain this way, more often than not, each item addressed may suffer a bit. So this guitarist on the right is likely going to perform with more precision and accuracy than the one on the left, right? And so it often is with monitoring solutions. It's certainly possible for a monitoring solution to properly monitor several aspects of an environment. But there is a lot to consider: your applications, the databases, the underlying hosts, virtual environments, middleware, storage arrays, and so on. The more you include as a requirement for that one solution, the more friction it can create, and it can often dilute areas of concentration. In other words, it's important to identify the correct monitoring scope. Now, as you consider and add more requirements, you'll need to get buy-in from other stakeholders in your enterprise, right? And again, as you need a solution to do more, details for particular areas of focus may indeed suffer. To take this analogy a bit further, the more musicians and instruments you have, the more challenging it can be to stay in concert, so to speak. So it helps to identify not only what needs to be monitored, but also which items to be monitored are most closely related, and with whom or with what other groups you will need to stay in concert. A DBA likely needs detailed information about the databases, but also likely the underlying Windows and/or Linux host information. Virtualization metrics may be important too, but for a DBA, how much VM detail do you really need versus, say, a VMware administrator?
Now, in my experience, application monitoring solutions often try to monitor everything but fall short in particular areas. So again, find the correct scope for your monitoring solution's capabilities.

Build versus buy. I'll use another analogy, and this time a house. There are lots of pluses and minuses to both building and buying, and the things to consider are similar for a monitoring solution. With build, the big advantage is that you can customize on the front end to your heart's content: I want a roof with four gables, I want 18 windows, I want a basement, I want hardwood floors, and so on. But in doing so, what do you also have to weigh into that equation? The potential downsides of building include time and cost. With a monitoring solution, build equals either homegrown scripts and/or internally developed custom applications. Moving away from the house analogy for a moment, there are added considerations, like mindshare: what if the person or people who wrote and maintained those scripts or custom applications leave? And what about all of the updates you may need to do if and as new versions of what you're monitoring are introduced? Those new versions may require rewrites to your scripts or internally developed applications. In the Postgres world, building may mean manually querying for statistics by leveraging the pg_stat tables and views, and here are some examples of that. Some of you shared that you're doing exactly that today.
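For illustration, a minimal homegrown check along these lines might query the statistics views directly. This is just a sketch of the build approach, using column names as they exist in Postgres 9.2 and later; the specific queries are example assumptions, not a recommended monitoring script:

```sql
-- Sketch of a homegrown check against the cumulative statistics views.
-- Who is connected right now, and what are they running?
SELECT pid, usename, datname, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle';

-- Per-database roll-up: commits, rollbacks, deadlocks, temp file usage.
SELECT datname, xact_commit, xact_rollback, deadlocks, temp_bytes
FROM pg_stat_database
ORDER BY datname;
```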
Real-time diagnostics versus 24/7 monitoring. Depending on your requirements for level of support, real-time diagnostic tools can be all that certain environments need. You can run scripts or open a client-side utility to investigate what is going on right now, and that can be effective and sometimes enough. That said, unless you've just run those diagnostics, knowing that there is an issue going on often relies on feedback from end users, internal or external, sharing things like "I can't connect" or "things are slow." With 24/7 monitoring solutions, though there will be a bit more to the architecture and likely a higher cost, you won't miss anything, because those 24/7 solutions will typically offer things like alerts but also notifications (you can get an email if a problem arises, things like that), as well as a repository to hold historical data, often with the ability to use the user interface to pull data from a historical window of time to understand what happened in the past. And sometimes those solutions offer a means to query that repository directly.

Now, here's an example of an architecture slide; I removed the product names to protect the innocent. There are a number of items to consider specific to a software solution's architecture. Does it cost money per component, and based on your needs or requirements, might you need to add components? Is the architecture flexible? For example, how does it work with a DMZ, or what if there are multiple domains? Another thing to consider: are there upper limits to what can be monitored? What is its resource consumption or overhead? You often hear that expressed as a percentage: 2%, 3%, or more. How about traditional agents? What I mean by that is: does the solution require that you install software, services, or binaries on what you want to monitor, on the monitored host itself? If so, that invasiveness can have implications. There might be additional overhead. You might need to engage additional business stakeholders for things like getting approval and access to install those agents. And those agents may need updating over time. So in general, I've found that though there are some advantages for detailed metrics, most folks prefer to avoid solutions or architectures that leverage traditional agents. What about the monitoring solution's graphical user interface? Is it a thick client? If that's the case, it means you'll need to install client-side software wherever you want to view your monitoring metrics, as opposed to, say, a web-based client. The look and feel of a thick client versus a web-based client is obviously personal preference, but at least with a web-based GUI, all you need is access to a web browser. So those are more things to consider.

Your configuration, or your deployment. The questions I'm about to cover pertain both to the deployment of the monitoring software itself and to what you need or want to monitor. Is what you want monitored on-prem, and can the monitoring solution be installed on-prem? Is what you want monitored in the cloud? Is it infrastructure as a service? Might it be a platform or database as a service? Is the monitoring solution able to be deployed to the cloud in that case? Some folks need or want a SaaS model for the monitoring solution, software as a service; is that deployment option available for the monitoring solutions you're considering? And then, of course, the same questions apply for a hybrid approach.

So this is the last of the few polling questions I have. Again, please use the chat feature in Zoom and answer with one, two, or three to best describe your Postgres deployment. One, everything is on-prem. Two, everything is in the cloud, and that could take the form of maybe Azure, maybe AWS RDS, maybe Aurora. Or three, you have a hybrid deployment, a little bit of both. And that's a really good mix; I'm seeing almost an even split between one, two, and three. Great, okay. In my personal experience, I'm seeing more and more folks moving to the cloud, and that transition has really happened probably only in the past 18 months or so; before that, I was seeing by and large mostly on-prem deployments. When it started going to the cloud, I saw quite a bit of RDS, quite a bit of Aurora, and very recently I'm seeing more and more Azure deployments.

Now let's move on to discussing alerts as they relate to a monitoring and diagnostics solution. Most monitoring solutions offer alerting capabilities: thresholds can be defined, and if a threshold is crossed, an alarm fires. What's important to realize, though, is that some solutions don't offer this capability out of the box. In other words, they require a lot of front-end setup work to get a usable monitoring solution, and I would think that in the time it takes to set that up, you might be missing some important activity, activity that's causing issues in your environment. These alerts will typically be visual alerts in various colors, yellow, orange, or red, signaling the severity of that threshold crossing. And that's obviously very important to stay abreast of potential availability and performance problems.
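As a rough sketch of the kind of threshold a homegrown alerting script might evaluate on a schedule, here's a check of connection usage against max_connections. The query is an illustrative assumption, and the 80% threshold mentioned in the comment is an arbitrary example, not a recommendation:

```sql
-- Percentage of max_connections currently in use; a scheduled job could
-- send a notification when pct_used crosses a chosen threshold (e.g. 80%).
SELECT count(*) AS connections,
       current_setting('max_connections')::int AS max_connections,
       round(100.0 * count(*) / current_setting('max_connections')::int, 1)
         AS pct_used
FROM pg_stat_activity;
```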
But what if you don't have the solution's user interface open and aren't actively looking at it? What will you do about these issues then? Quite honestly, that's often the case, right? It's not going to be all the time that you're able to have a monitoring solution's interface open and be engaging with it. Most everyone on the line, I have no doubt, is wearing many hats and juggling lots of various projects and tasks. That's where notifications and actions are critically important to a monitoring and diagnostics solution. For example, a threshold is crossed, an alarm fires, and you can get, say, an email or a text message to be made aware of the issue, again for when you're not actively looking at that graphical user interface, which is probably more likely than not. Actions can be equally useful. Some monitoring solutions allow for an action to be initiated based on that threshold crossing. So for example, a threshold is crossed, an alarm fires, and you can have it configured to run a corrective script, or maybe kick off an SNMP trap to work with your ticketing system, like Remedy or ServiceNow, and have that alarm initiate a ticket. These are great ways to be more proactive, stay ahead of issues, and be more productive in the process, in a more automated fashion.

Baselines. Conceptually, we're all probably pretty familiar with the idea of a baseline: it's a minimum or starting point used for comparisons, a fixed reference point. And I would say you want to consider a monitoring solution that generates and maintains baselines. Why are baselines important? Well, a baseline establishes what is considered normal activity for a given time period. That way, it's easier to identify abnormal activity, and therefore potential issues can be identified more efficiently. You may find that the term baseline is sometimes used loosely when what's really meant is a comparison with an earlier window of time; but that's really trending versus baselining. Baselines can be generated or learned a number of different ways: there can be algorithms run under the covers periodically, it may be machine learning, or I've even seen AI being utilized. An added beneficial capability is for the monitoring solution to be able to alert you if activity starts to behave abnormally. For example, it's useful to be alerted that, say, CPU has spiked to 90%. However, it has more context and is more actionable if you also know that for that particular time period CPU is normally 50%, or conversely, that CPU is normally 90% for that time period. That latter example is probably less pressing, right? It may not even be a problem; rather, it's an expected workload for that time. So again, a baseline can provide more context for the metrics and values you're seeing.
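To make the idea concrete, here's a sketch of a baseline comparison in SQL. It assumes a hypothetical host_cpu_samples table (metric_ts, cpu_pct) populated every few minutes by a collection job; both the table and the four-week learning window are illustrative assumptions, not part of any particular product:

```sql
-- Compare the latest CPU reading to the average for the same hour of day
-- over the previous four weeks (a crude learned "normal" for this hour).
-- host_cpu_samples(metric_ts, cpu_pct) is a hypothetical metrics table.
SELECT latest.cpu_pct                     AS current_value,
       baseline.avg_pct                   AS baseline_value,
       latest.cpu_pct - baseline.avg_pct  AS deviation
FROM  (SELECT cpu_pct
       FROM host_cpu_samples
       ORDER BY metric_ts DESC
       LIMIT 1) AS latest,
      (SELECT avg(cpu_pct) AS avg_pct
       FROM host_cpu_samples
       WHERE extract(hour FROM metric_ts) = extract(hour FROM now())
         AND metric_ts >= now() - interval '28 days') AS baseline;
```

An alerting job could fire only when the deviation exceeds some tolerance, which is exactly the added context discussed above: 90% CPU matters far more when the baseline for that hour is 50% than when it is 90%.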
Advisories: suggestions to fix, guidance, all those kinds of things. Whether you're new to the field or a seasoned IT professional, most everyone welcomes troubleshooting suggestions. Look for a monitoring solution that offers this, as not all of them do. It's best if the suggestions are specific to flagged activity. They're sometimes referred to as advisories or a similar name, and what they'll do is describe the flagged activity, provide some context around the potential issue, and offer suggestions on how to administer and fix the problem. Oftentimes, as you probably already know, there's more than one potential cause, and therefore more than one suggested remedy. This is, again, another example of how your monitoring solution can provide efficiency gains, and this particular screenshot shows some examples of advisories.

Let's talk about reports. It's important to consider whether you need the ability to document performance metrics and/or share that data with other stakeholders at your company. If so, you'll definitely want to consider a solution that offers these capabilities. But there's more to this concept; you want to dig in further and find out, first of all, whether the solution offers out-of-the-box reports. These are handy to save time, as creating your own reports can be time-consuming. Ideally, you'd want both, right? Out-of-the-box, well-formatted reports that generate quickly, but also the flexibility to create custom reports to address documentation or communication requirements with specific content. You also want to find out whether these reports can be scheduled and/or emailed; that makes things easier and more automated as well.

Dashboards. What I mean by dashboards is essentially categorized pages of metrics, so for example, a memory dashboard with memory-specific metrics. Now, some solutions claim to support and monitor virtually any source, but here's the catch: many of these solutions require that you build out the dashboards yourself, with no out-of-the-box dashboards available whatsoever. That actually caught me by surprise, because the next question that comes to mind for me is: what are you paying for? From both a value and an efficiency standpoint, out-of-the-box dashboards are something you should demand; otherwise, you're not all that far from the build option we discussed earlier, with scripts and so on. So out-of-the-box dashboards are something you should definitely require when looking for a monitoring solution. Having said that, it's even better if the monitoring solution provides a means to create your own custom dashboards. Not unlike custom reports, custom dashboards provide the power and flexibility to build them out with your own customized content, but also a customized display of those metrics: having the ability, for example, to choose a table versus a graph, a chart, a real-time spinner, a volume cylinder, and so on. Either way, out-of-the-box or custom, these dashboards should be considered a step better than reports when it comes to monitoring data. Why is that? Most of us are kind of programmed to want and need to share reports, but in the case of monitoring, the monitored data is always changing, right? Data may be hours or minutes or even seconds from having new values. Reports are static, which means the report content is going to become stale, often fairly quickly. With dashboards, the data is refreshed in that repetitive fashion, so the data is more meaningful and up to date; it's as real-time as the last time the data was retrieved. We all know that important decisions are best made when the data is current and therefore relevant. And finally, the ease with which these custom dashboards can be shared is pretty critical. The creation of custom dashboards is ideal for providing subsets of data to different groups in the enterprise: maybe the storage admins simply need those six critical storage metrics, maybe management only wants summarized, trended data, and so on. That can create communication synergy between those groups, so it must be easy to share those dashboards. How about simply sharing a link to that web-based dashboard? It really can't get easier than that, and that's possible and available with the best monitoring and diagnostics solutions.

To round out the session, I thought it would be useful to discuss some common performance problems with Postgres, so that you can decide with what and how you'll monitor your Postgres instances and be sure the solution of choice includes the means to address them. Obviously this is a subset of common performance issues, but I've included locks, query performance, disk space usage, the concept of bloat, and finally, administration capabilities.

First, we'll take a look at locks. As many of you are probably already aware, one of the most valuable capabilities of Postgres is its support for concurrent ACID transactions, ACID being the acronym for atomicity, consistency, isolation, and durability. These database transaction properties help verify data validity. It's important to understand that locking can be a common occurrence in active Postgres databases, because Postgres uses levels of locking as one of the ways to safely implement these concurrent ACID transactions. That said, managing peak performance includes being aware of queries that might be taking too long to run; in some cases, those commands might be waiting for a lock. Helpful information includes things like the process ID, the user, the query text, the query start time, and, in the case of Postgres, whether the lock has been granted or not, true or false. If there's a deadlock, one or more locks may not have been granted. So you'll want to consider a monitoring solution that can both identify potential deadlocked transactions and alert you about them.
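As a sketch of how you might surface that information yourself, here's one common shape for such a query, joining pg_stat_activity to pg_locks. The pg_blocking_pids function exists from Postgres 9.6 onward; the query is an illustrative example, not the exact query any particular product runs:

```sql
-- Sessions waiting on locks, with the PIDs of the sessions blocking them.
SELECT a.pid,
       a.usename,
       a.query_start,
       a.query,
       l.granted,
       pg_blocking_pids(a.pid) AS blocked_by   -- available in 9.6+
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE NOT l.granted;
```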
Query performance. I think it's safe to say that we probably spend an inordinate amount of time in this area, because sometimes it just comes down to the queries, who's running them, and where the bottlenecks are. One of the primary tasks of any DBA is to find and optimize poorly performing queries. It's not a trivial job, as oftentimes there's too much data to sort through to pinpoint problems. One angle to take is to find queries taking a long time to run and/or statements with low cache hit percentages, which can actually be a result of long-running statements. Postgres tracks patterns of data access and keeps frequently accessed data in cache. It's best to keep a cache hit rate of about 99%, and you might want to consider increasing the available cache if the ratio is significantly lower. So find a monitoring solution that can both identify and alert you about these types of queries.

Disk space usage. Many things can happen if the database runs out of disk space, and really, none of them are good. DBAs understand that it's essential to monitor database disk space so that critical business processes are uninterrupted. An appropriate means of monitoring this may be to alert if, say, the calculated average tablespace growth rate is going to use up the existing space on the disk within X number of days. I've included an example here, and it's really an example of why it's important to monitor not only the database but the underlying host as well; this is an example of correlating that information. As a side note, this generally can't be accomplished when monitoring a database-as-a-service environment, but then the disk space would be managed by the service anyway. To further diagnose space issues, first determine the databases and tables using those tablespaces, and make note of their growth rates over time to help narrow down which may be responsible for the increased tablespace growth, keeping in mind, of course, that tablespaces can often host multiple databases and tables.
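Here are two quick sketches of those checks against the built-in statistics views; the 99% figure comes from the guidance above, and the LIMIT is an arbitrary example:

```sql
-- Rough cache hit ratio per database; a value well below ~99% on a steady
-- workload may suggest increasing the cache available to Postgres.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
         AS cache_hit_pct
FROM pg_stat_database;

-- Largest tables (including indexes and TOAST), a starting point for
-- tracking which relations are driving space growth over time.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```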
Now, another possibility for unexpected growth is that the system has not been vacuumed in some time, and this concept leads me to my final two areas to factor in. Bloat. You'll want to look for large tables and then see if there's a high number of what are called dead tuples. As many of you are probably already familiar, in Postgres, tuples are essentially logical representations of rows. Tuples are not actually removed when they're deleted; rather, they're marked for deletion, and those are considered dead tuples. A vacuum is the administrative process by which these dead tuples are cleared out to remove that dead space from the files. Table and index bloat are common issues in Postgres, so having the ability to get alerts on, say, your number of dead tuples versus live tuples can be very, very useful.

And lastly, administration. To allow for a more seamless workflow, it's ideal to be able to take action on, or administer, the items you've identified with your monitoring and diagnostics solution. The examples I've included here are some of the next steps involved with the use cases we just reviewed. For example, if you have excessive locks waiting, it would be nice to have a solution that allows you to cancel the query or terminate the connection. If you're experiencing database, table, or index bloat, you'd want the ability to vacuum those dead tuples. Other administration examples include needing to better understand the query steps and how long each step is taking for tuning purposes; if you have that need, generate an explain plan. And if you've made changes to a table structure or query and you're ready to test performance, another administrative capability that's nice to have is resetting the tracking statistics to display that new performance data.
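To make those last two use cases concrete, here's a sketch of a dead-tuple check followed by the kinds of follow-up commands just described. The table name and PID are placeholders; VACUUM, EXPLAIN, and the backend functions shown are standard Postgres, though when and whether to run them is a judgment call for your environment:

```sql
-- Vacuum candidates: tables with the most dead tuples, plus vacuum history.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Example follow-up actions (my_table and 12345 are placeholders):
VACUUM (VERBOSE, ANALYZE) my_table;        -- clear out dead tuples
SELECT pg_cancel_backend(12345);           -- cancel a long-running query
SELECT pg_terminate_backend(12345);        -- or terminate the connection
EXPLAIN ANALYZE SELECT * FROM my_table;    -- inspect query steps and timing
SELECT pg_stat_reset();                    -- reset tracking statistics
```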
Now, we covered quite a bit in this session. I hope it proved helpful and perhaps provides you a checklist of sorts of the important items to consider as you plan your environment's Postgres monitoring strategy. So I appreciate everybody's time and attention today, and we certainly have time for questions. One of the questions asked was: is there a product available from Quest? There is, and that's something I can speak offline with you about. I know that the Postgres Conference would like to keep these sessions outside the scope of particular vendors' products, but yes, as I mentioned at the beginning, my primary job is to support Quest's monitoring and diagnostics solutions, a couple of solutions called Foglight and Spotlight. Along the same lines: is there an all-in-one tool for this? I guess I wasn't pulling the wool over anyone's eyes with my screenshots; they are indeed from Quest's Foglight solution, which has all of these capabilities, yes. Just one added note there: the Quest Foglight solution is a cross-DBMS platform solution, so not only will it allow you to monitor Postgres, but other DBMSs as well, as we've found most environments now have multiple platforms: maybe Postgres, but also maybe Oracle, or DB2, or Microsoft SQL Server, or MySQL, or MongoDB, Cassandra, and so on. Okay, well, before you jump off, I wanted to thank you again very much for your time and attention today, and again, I hope the content proved helpful.

And thank you, David. This was fantastic, a really wonderful presentation. I want to thank all of our attendees as well. So regardless of where you are, I hope you have a good morning, a good afternoon, or a great evening, and I hope to see you on the next one. Cheers, thank you. Thanks.