Today I'm talking about CloudWatch dashboards and custom Drupal metrics. I'm a sysops engineer from PreviousNext. I've been working in the Drupal community for roughly 10 years, and at PreviousNext I manage and develop our Skpr platform. Today I want to discuss what metrics are, how they're useful, and what we do with them in the context of Drupal.

So, part one. A metric is essentially a single piece of information in a numerical format, which we can collect and aggregate over time to present graphs and extract information from those graphs. Typically you'd understand them to be something like Google Analytics: they have a piece of metadata behind them known as dimensions, and they describe a fact at a point in time. The important thing is that at this point in time, this value was observed.

Why do we collect them? The picture they describe when we collect them together can tell a story, and we can use that information to drive stakeholder investment in a decision or a plan we're proposing. The information only holds its value when it's true, so we want to bake integrity into our pipeline: keep it as automated as possible, handled by as few external processes as possible, and keep it on your platform so the process is automated.

An AWS metric is a little bit similar to what we'll deal with shortly, but this is what an AWS metric looks like: a JSON object, very flat except for our metadata. Here we have a name, a type, and labels with a value. The most important things here are the name and the value: this is our metric name, and this is what the value was at the time. We use our dimensions later — in this case it's a webform bundle — and that is how we filter the information down the track.

The benefit of using metrics is being able to tell a picture. If you had your core and module versions recorded at a point in time, then down the line, if you needed to identify whether you were exploited for some reason, or if key stakeholders needed the information, you could access it and identify exactly when those values changed: exactly when your core was updated, when your modules were updated. You would have that information on hand, and you wouldn't have to go digging through logs or git history, that sort of thing. We can also build other metrics: how many nodes of a certain type exist, how many webforms were submitted. We're going to pull some metrics out of logs later, so if Drupal's watchdog is recording a certain piece of information, you can query against it, find out how many of those responses there were at the time, and make a graph around that so you can tell a picture.

The benefit for understanding your application is knowing how many comments are posted and where, which webforms are being submitted to — ideally just trends and useful information around numerical data. There are also health and security benefits: being able to identify when users were unblocked or blocked, and being able to understand flood table sizes and webform table sizes (you might have a privacy reason for wanting to get rid of information past a given date), or even cache table sizes, which do have some security impacts. It's a good chance to plug here: my colleague will be speaking about Drupal security tomorrow.
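To make that shape concrete, here is a minimal sketch of what a single metric datum could look like when pushed to CloudWatch with boto3. The namespace, metric name and the webform bundle dimension are hypothetical placeholders, not values taken from the talk.

```python
import boto3

# A single metric datum: a name, a value observed at a point in time,
# and dimensions (metadata) used to filter the metric later.
datum = {
    "MetricName": "webform_submissions",                 # hypothetical metric name
    "Dimensions": [
        {"Name": "webform_bundle", "Value": "contact"},  # hypothetical dimension
    ],
    "Value": 42,                                         # the value at collection time
    "Unit": "Count",
}

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(Namespace="MyApp/Drupal", MetricData=[datum])
```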
Getting started with metrics means understanding what picture you want to describe, what information is useful to you, and working from that. Once you know the information you want to find, work out how to extract it and expose it in some way. If you have a lot of metrics, you've obviously got to manage them in a simple, standardised way with low overhead. That's where we come to OpenMetrics. OpenMetrics is an open source standard for managing metrics at scale; it's designed for low latency and it can be implemented with text or with protocol buffers for deeper application integration. Typically what you would look at for observability with OpenMetrics is a Prometheus instance, which is essentially a storage area, and your Drupal endpoints basically get scraped by that Prometheus. Users typically use a UI to query against that store to find the information or to build graphs around it, and ideally you get that through managed services such as Grafana. OpenMetrics is a standard supported by the CNCF, the Cloud Native Computing Foundation.

And that brings us to Drupal. What we want to do here is enable metrics collection in Drupal: what options we have, what metrics are available with what I'm presenting here, and what compatibility looks like from a technical standpoint. We have a Drupal module that PreviousNext developed to implement the OpenMetrics standard, and it provides revision, content, user and node metrics as well as module versions, core versions, PHP versions and queue sizes. It takes your website and allows you to add these optionally, so you don't have to add them all at once, and it allows you to put a token on your endpoint so that you're not having to expose this information publicly. We also have a community module, a bonus pack, which implements support for Advanced Queue and table sizes. That's very useful for cache table sizes, webform table sizes, or any application-specific table which you might need to cut down or keep as minimal as possible for whatever reason — performance, privacy — it's an option to go down.

We also wanted to look at the implementation. Ideally, we have an endpoint at /metrics which provides a text version of an OpenMetrics document. This is served by your web server rather than a separate application, so it shouldn't have much impact on your application's load at the time, and it essentially serves a compatible document. You can either do this in a templated text format or through protocol buffers, which a framework basically renders for you.

Now we come to AWS, where we take the information from Drupal and add it to the platform itself. We want to go through where the information can be found and what you can do with it; I want to take you through building a metric based on log filter queries, which are AWS constructs, and what we do with that information. Here's the typical setup: you would have a SaaS offering or an OpenMetrics implementation which takes information from Drupal and puts it in the store we explored earlier. What we wanted to do with AWS was to have a scraper in the middle, so we could send the information to a destination of our choosing, using an implementation of our choosing.
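To show the format rather than the Drupal module itself (which is PHP), here is a minimal sketch using the Python prometheus_client library to produce the kind of OpenMetrics text exposition a /metrics endpoint would serve. The metric names, labels and values are hypothetical examples.

```python
from prometheus_client import CollectorRegistry, Gauge
from prometheus_client.openmetrics.exposition import generate_latest

registry = CollectorRegistry()

# Hypothetical Drupal-style metrics: a name, a value, and labels (dimensions).
webform_rows = Gauge(
    "drupal_webform_submissions",
    "Number of webform submission rows",
    ["webform_bundle"],
    registry=registry,
)
webform_rows.labels(webform_bundle="contact").set(42)

node_count = Gauge(
    "drupal_node_count",
    "Number of nodes of a given content type",
    ["node_type"],
    registry=registry,
)
node_count.labels(node_type="article").set(1500)

# Print the OpenMetrics text document a /metrics endpoint would return.
print(generate_latest(registry).decode())
```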
We can do this using, say, a compiled binary in Kubernetes or Docker, or even a Lambda, which is very simple to develop. The information in the OpenMetrics standard basically gets transformed by the Lambda, converted into the AWS format and sent to the account. Our dashboards' information comes from three sources: our logs, our metrics, and the development cycle — everything else.

Talking about logs, there's a lot of useful information in logs. They provide the ability to analyse information historically, to investigate, and they're very good for security compliance. However, we can also build metrics around them: by extracting information from Drush, as an example, we can filter down to individual pieces of information we're interested in. We could take this information to our stakeholders, to our managers, and say: here's a bunch of information, we can't lie about this, what do we do about it? Or you can produce a plan and say: here's what I want to do, here's the information to back up my assessment — do you agree, what do you want to change?

Log filter metrics are basically metrics created from log groups or log queries. You would go into your log group, choose a stream that has reproducible information you want to find, and create a metric from that information. An example of what a custom metric graph would look like is something like this: a webform submission table row count. We go in and count how many webform submissions have been made over a period. From this graph we can observe that the information has been purged regularly, but not frequently enough, so it's probably going to become a maintenance task down the track. This has obvious impacts on things such as privacy and CPU utilisation when dealing with your database, so it's definitely useful information to have on hand. We can also represent this information in other formats such as paragraphs and tables. That provides generic information in a simplified format, and anyone of any technical capacity can understand it without having to dive into something outside of their comfort zone.

Automated metrics. This is where your Lambda comes in, or some sort of automated pipeline where you can guarantee the information is coming from a reliable source and coming in regularly, without having to get an individual developer involved. For this, your metrics are already on the platform, so you basically just have to run a query against them — we'll see that in our example. Once we've created metrics we can also attach alerts and alarms to them, so if you want to be alerted to a situation, or notified when a certain amount of something has come through, you have the option of doing that. An example of useful information would be purge requests to your CDN. It's a very common circumstance where a content editor has pushed or deleted content and for some reason the content hasn't been updated. Here we can graph the purge requests made to the CDN, and we can identify whether the information is missing, accurate or excessive — because typically these have a cost factor to them. You can identify whether you're making too many purge requests, or whether purges are going missing because of service request limitations, which might surface some information for you. It's quite useful.
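As a rough sketch of what that "scraper in the middle" Lambda could look like: it fetches the Drupal /metrics endpoint, does a very naive parse of the OpenMetrics text lines, and forwards each sample to CloudWatch with put_metric_data. The endpoint URL, token header, environment variables and namespace are all hypothetical assumptions; a real implementation would use a proper OpenMetrics parser and handle labels as dimensions.

```python
import os
import urllib.request

import boto3

cloudwatch = boto3.client("cloudwatch")


def handler(event, context):
    # Hypothetical token-protected Drupal metrics endpoint.
    req = urllib.request.Request(
        os.environ["METRICS_URL"],  # e.g. https://example.com/metrics
        headers={"Authorization": "Bearer " + os.environ["METRICS_TOKEN"]},
    )
    body = urllib.request.urlopen(req).read().decode()

    data = []
    for line in body.splitlines():
        # Skip comments (# HELP / # TYPE) and blank lines; this naive parse
        # only handles simple "name value" samples without labels.
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]
        data.append({"MetricName": name, "Value": float(value), "Unit": "None"})

    # Ship the samples to CloudWatch under our application namespace,
    # keeping the batch well under the PutMetricData size limit.
    if data:
        cloudwatch.put_metric_data(Namespace="MyApp/Drupal", MetricData=data[:20])
    return {"published": len(data)}
```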
If you're interested in this, Kim, my colleague, is doing a talk on how we scaled our websites through the COVID pandemic, and that's happening after lunch tomorrow at 1.20pm.

Now for the not-so-automated ways. This is more catered towards a development cycle, so if you're developing a Lambda or developing a process to get your metrics from Drupal into your platform, ideally you could be running this on CI at a less frequent cadence. What we want to do is get away from hand-run pipelines. We want this to be a regular thing, because we know it's coming from a reliable source, we know there's no middle person, no people involved in the process — get rid of that operational cost.

For this example I'll be showing you some information based on real data, in fact, and it's coming from a log filter metric. To start our dashboard, we want to give it a name, and we want to use a pattern in that naming convention. We don't want this to be a one-off: if you can, make it a reusable pattern, like a token, where you have the application name and its purpose. In our example this will be MyApp/Drupal, because you may have other metrics of interest for your application and you may want to scale this out to other applications that you manage.

Here we can see the log filter metric creation form. We're running a query against a log group, and we're giving it a designation of where it's going to live in the platform, which is our namespace. It will exist under our namespace with our metric name, and underneath that you'll find its dimensions, which are the things we use to filter it down. When a log metric filter matches, it will return a value, which is our metric value, and this represents one piece of information returned from that query. The default value is optional: you can either have missing data points or give them a value. In this case we're going to set it to 0, which seems sensible. Once we've created it there's nothing there, and that's because the information hasn't yet been collected. I forgot to mention: our query is searching for CAPTCHA responses with a low score. This could indicate bots, or it could indicate people who haven't completed the CAPTCHA successfully, and this information is found in our watchdog.

If we proceed, we have our query builder, which shows how to find this information now that it has been made into a metric. We have our namespace, which is a directory-like structure showing where it's found. We have our metric name, which is aggregated by a statistic — here a sum, so we're going to add them all up within a time frame we specify; we could use averages or something similar as an alternative. We also have our filters, so if we had dimensions to work with we could say: for this application, for this piece of metadata, find this information. And here, roughly 24 hours after this was created, we have some submissions — or some lack of submissions — and we can see some high points. What we can do with this later is create an alarm around it, so if we want to be notified when a metric exceeds or drops below a certain value, we can be aware of the situation almost immediately. Adding it to your dashboard is as simple as clicking on Actions and then Add to dashboard; it's a very straightforward process once the information is available.
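The same metric filter and alarm can also be created programmatically. Below is a minimal boto3 sketch under assumed names: the log group, filter pattern, thresholds and the MyApp/Drupal namespace are hypothetical placeholders, not the exact values from the demo.

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Create a log metric filter: every log event matching the pattern emits
# a value of 1 into the MyApp/Drupal namespace; empty periods default to 0.
logs.put_metric_filter(
    logGroupName="/myapp/drupal/watchdog",     # hypothetical log group
    filterName="captcha-low-score",
    filterPattern='"captcha" "low score"',     # hypothetical filter pattern
    metricTransformations=[{
        "metricName": "captcha_low_score_responses",
        "metricNamespace": "MyApp/Drupal",
        "metricValue": "1",
        "defaultValue": 0.0,
    }],
)

# Attach an alarm so we're notified when the sum over 5 minutes gets too high.
cloudwatch.put_metric_alarm(
    AlarmName="captcha-low-score-spike",
    Namespace="MyApp/Drupal",
    MetricName="captcha_low_score_responses",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100,                             # hypothetical threshold
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```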
An example of what your dashboard could look like is on the screen. Dashboards can be exported as JSON, so they can be very reusable — we'll get to how in just a sec — and you can use a lot of patterns to correlate and codify them.

So, to bring it home: we have all of this information in Drupal, we have it being moved onto the AWS platform, and we have a graphical representation of it. We need to identify what happens next, and that is analysing and understanding it: are there spikes, are they malicious, do you know where they're coming from? What follows from that is: when there's a situation I don't want, what happens next? Knowing who to contact is a very key question you need to ask, and obviously that person needs to know what happens next, so they would have a run sheet associated with the alert. Next, you would need to codify your dashboards if you plan to scale out. You can use technologies such as Terraform, CDK or CloudFormation; the information is stored as JSON, so it's very scalable in that way.

Takeaways. If nothing else, I would like you to walk away from this talk understanding what a metric is, and how it can give you the power to go to your stakeholders and say: here is a situation — do we do something about it, and what comes next? Drupal is a fantastic opportunity to provide extra information the platform wouldn't otherwise know about. From the outside you have a set amount of information — system resources, logs and so on — but how do you turn that into something more tangible for your work? That's really what I want you to take away. Now, over to questions. We are hiring, so feel free. Thank you.

Yes, the information comes from there, and it's collected at a point in time. The graph comes from presenting the information as an aggregate over time, so you choose "I want this week's information, Monday to Friday" and you would see a graph representing that period as part of the AWS platform. Once the information comes from Drupal onto the platform, you create your dashboards from the information on the platform, which graphs it over the time parameters, but each of those data points represents one point in time, one piece of information.

The metrics themselves apply to both Prometheus and AWS. Your service provider or hosting provider would have access to metrics — whether or not they let you see that information is another thing — but this is in the context of Drupal and AWS.

One was the Prometheus module... what was the second question? So the Drupal module exposes a series of Drupal-specific metrics, things you would extract from Drupal that the platform wouldn't know about. It does come from the implementation we've got, and it provides a couple of choices: you go into the module's configuration and choose which metrics you want to expose. If you want to extend on that, you would write your own for your application — an example of that is the bonus pack. You don't have to get metrics out of Drupal that way, but if you want a deeper integration with information only Drupal knows about, that's where it would come into play.

How does that relate to a log stream? A log stream is a subset of your logs, so within a log group, a log stream will only get so big before it moves on to the next stream, but it's all contained in the log group where you would run your query. Against your log stream — basically, I think it's 50 log entries — you basically want to
determine what you want to find, and that gives you an opportunity to evaluate your queries.

For me, this is completely separate to AWS. It's the same standard, and we're using that standard to move metrics from Drupal into the AWS platform. This wasn't particularly a numbers presentation, but I would strongly recommend Kim's presentation tomorrow, where he talks about how we got through the pandemic; there's a lot of useful metrics that come out of that.

Should you use this for your site? It depends what information you want to extract, but if you want to have this information on hand, if you need to present it at some point in time, it's a good companion. It becomes more valuable with more information, so for bigger sites the value increases — the bigger the site, the more interesting and valuable the information. Does that help you?

We are essentially using AWS exclusively, and this was the option provided to us. We had the option of choosing an open source standard to implement the Drupal side of it, but AWS was strictly ours, because that's how we operate. Nothing that I can think of is better. In my experience with data, I would say the open source standard is, you know, the standard — it's flat and reliable. The open source standard is the best because you know what you're getting. Everything I've done in the past has always been complicated for one reason or another, but here you know your data structure, you know it inside out, and clients with dedicated clusters have access to the information. I haven't yet found anything better.

Well, thank you all for your time.