OK, I think we can start. Good morning to all, and welcome to this talk about observability. In this presentation I'll show you what observability is, how we can implement it, and how we can leverage all that information to monitor and analyze our Drupal website. We will see how to implement observability using only open source tools, nothing commercial, and only tools that you can run on your local machine, so you can test everything locally before going to production.

My name is Luca. I'm mainly a developer: I started with Java, then PHP, Go, and so on. I'm the co-maintainer of the Devel module and the maintainer of the Monolog module. Here are my social links if you want to reach me or write to me on Drupal.org. I work for Wellnet, an Italian Drupal shop and digital agency. We do software development, design, UI/UX, web marketing, testing, machine learning, SEO, et cetera.

OK, let's start. Today almost everyone is working on some kind of distributed system. We have microservices, containers, cloud, serverless, and a lot of combinations that complicate the way we deploy, distribute, and manage our websites and systems. And because of the diversity of these distributed systems, it's very complex to understand problems and to understand how our website performs. So we need a way to observe and monitor the behavior of our systems. Observability is a measure of how well the internal state of a system can be inferred just by looking at its external outputs. We monitor our system only from the outside; this way we can apply observability to production systems too, because we don't need to attach a debugger or do anything that would slow down the website. In fact we want to observe the production environment, and metrics like CPU, memory, and I/O are not sufficient anymore, because we want to observe our specific custom application, our code.
So we want tools to do that. There are three pillars of observability: structured logs, metrics, and traces (or distributed traces, if you have a distributed system). In this presentation we'll look at all of these pillars in detail and see how we can implement them in Drupal. We need all three to understand how our system is doing: metrics to aggregate data, logs to store events, and traces to analyze requests.

OK, let's start with structured logs. Logs are about storing specific events. We want to record the events that occur in our system, because a failure in a complex system is usually caused not by one specific component failing, but by several different components failing together, so we have to reconstruct the information from all of these different systems. In Drupal it's quite easy to do structured logging because we have a very battle-tested, generic PHP library that we can use: Monolog. Monolog is a PSR-3-compliant library by Jordi Boggiano, the developer of Composer, just to give you an idea. And there is a module, for both Drupal 7 and Drupal 8, to integrate Monolog into our website: the Monolog module, which depends on the Monolog library. We have to download the Monolog module with Composer; the current version is 1.3, so you can add these lines to your composer.json and download it locally.

The Monolog module doesn't have any UI for configuration, so you cannot configure it through the admin interface: you configure it directly by binding services and parameters to the service container. The first thing to do is create a YAML file, for example monolog.services.yml, usually in the sites/default folder. This file contains a few things. First, the parameters: we'll see the details later, but Monolog has channels, handlers, formatters, and processors. The handlers define how and where Monolog writes log entries.
In this example we want to write our logs to a file that rotates every day. The formatter defines the format Monolog writes in; in this case we want structured logs, so we choose JSON as the formatter. Then we add a list of processors, which typically add information to the log message: for example, record the current user, or, with the introspection processor, add to the log message the line and the file where the log call occurred.

Then, for every handler we define, we add a service in the same file, under the services key, that needs to be called monolog.handler.<name>, with the same name (rotating_file here) as in the parameters. There we define the actual implementation of the handler, the class that does the work. The rotating file handler, for example, takes three arguments. The first is the path on the file system where to write the log. Since the file rotates, we also have to specify how many files we want to keep before it starts deleting the older ones: in this case it creates one file today, one file tomorrow, and after 10 days it starts deleting the oldest one, and so on, so it doesn't occupy more and more storage. Then we have to define the minimum level to log from: every log message from info up to critical will be written to this log.

The last piece: we have to add the monolog.services.yml file to the $settings['container_yamls'] array in settings.php, so that Drupal also loads these services and parameters into the service container and we can access them from our code. OK.
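Putting these pieces together, a minimal monolog.services.yml might look like this. This is a sketch: the parameter names follow the Monolog module's conventions, but the handler name, the log path, and the formatter wiring are illustrative and may differ between versions of the module.

```yaml
parameters:
  monolog.channel_handlers:
    # Route all channels to the rotating file handler defined below.
    default: ['rotating_file']
  # Processors add extra fields to every record (user, request URI,
  # client IP, and file/line via introspection).
  monolog.processors: ['current_user', 'request_uri', 'ip', 'introspection']

services:
  monolog.handler.rotating_file:
    class: Monolog\Handler\RotatingFileHandler
    # Arguments: path, number of daily files to keep, minimum level.
    arguments: ['public://logs/drupal.log', 10, 'info']
    calls:
      # Write structured JSON instead of plain text lines.
      - ['setFormatter', ['@monolog.formatter.json']]
```

Then the file is registered in settings.php with `$settings['container_yamls'][] = 'sites/default/monolog.services.yml';`.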
Here is an example of a log message, in this case from a core module: when a user logs in, Drupal logs this "session opened" message. But besides the message itself we have the level and the datetime, and then all the extra information provided by the processors: the actual file where the log call is, the line, the function (it's not in a class here, just a function), the referrer, the IP, and so on, which we can use to query and extract data from the logs. We need structured logs exactly for this kind of thing: for example, extracting all the logs from a given user, or all the logs from one IP. With just a stream of strings, like the messages the watchdog module writes to the database, it would be difficult to extract this information; with structured logs it's simple to query them. And for any sort of application-specific information, you can write a custom Monolog processor: maybe you have two load-balanced web servers, and you can add to the log which server replied to a given request, something like that. We'll see later how to put all the logs into an external application to analyze and query them.

OK, metrics. Logs are about storing the specific events that occur during the life cycle of our application; metrics are measurements of the system at a specific point in time. Logs are useful because they carry a context: a message, a file. Metrics do not provide any context, because they are used mainly for aggregation. We aggregate metrics to answer questions like: how many HTTP requests have we received, how much time do we spend on a request, how many requests are currently in progress, how many errors, and so on. You can think of a metric as a measurement of something that happens during the life cycle of your application that you can sample, aggregate, summarize, and correlate. Metrics are useful to report the overall health of the system and to predict how it will perform in the future.

OK, to instrument our application we use a piece of software called Prometheus, which you can download and use in a moment. Prometheus was the second project to join the Cloud Native Computing Foundation, after Kubernetes, so it's very stable and you can use it in production, no problem. How does Prometheus work? Prometheus uses a pull architecture: instead of you sending metrics to a Prometheus server, it's Prometheus that comes to your website to scrape metrics. So we have our application, instrumented with a client library to expose metrics, and there is also the concept of an exporter, used to expose to Prometheus data coming from third-party applications, your operating system, and so on. Then you configure how often (say, a number of times per minute) Prometheus scrapes your application or the exporters. It stores everything in a time series database, because we want to record the exact time each metric was collected. And then we can build dashboards on top of this storage, and define alerting rules over this data: for example if the CPU is too high, or, specific to a Drupal application, if the number of nodes created per second is too high, something like that.

OK, so to gather information from our production environment we need to do two things: instrument our application, and extract data from the system. Let's start with instrumenting our application. We started writing a simple module to implement these observability concepts in Drupal 8. It's in a sandbox as of today, but we plan to release an initial version during this DrupalCon, and then everyone can contribute to its development. The module is called Observability; the short name is o11y. It uses a client library for Prometheus: there is no official Prometheus client for PHP (they maintain official versions for Go, for Node, but not one for PHP), but there is a community project that implements one, so
we can use this library. Of course we need to install the Observability module with Composer, because it depends on the PHP client library. Then we can create a collector registry, which is the component that collects all the metrics inside our system and exposes them. You can use different kinds of storage: PHP is a stateless environment, so we need a way to store these metrics over time. Prometheus might be configured to come to our website every 5 or 10 seconds, and in the meantime we need to keep all the metrics somewhere. There are different kinds of storage. One is APCu, but it has the problem that when you restart the server everything is destroyed, because APCu is in memory. To overcome this we can use the Redis storage, which is persistent.

Prometheus has three types of metrics. A counter is used to represent a single monotonically increasing value: for example the number of nodes created, or the number of requests served. Every time one of those events occurs, we add one to the metric. (In the dashboard we'll see later, the line actually goes up and down: we don't want to show a graph with a line that increases forever, we want the rate of change of the value over time. Usually we don't want the total number of nodes created, we want the number of nodes created per minute, or per time frame, so the graph goes up and down as this value changes.) Another kind of metric is the gauge, which represents a single numerical value that can go up and down: think for example of CPU usage, which can go up and down, or memory load. The last one is the histogram, which samples observations into buckets: it divides the values into specific buckets, and then you can visualize the distribution that way.

So, for example, we can implement a standard Drupal hook, hook_entity_insert(); this is implemented in the Observability module. And then we define a service to manage all this information.
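A sketch of the hook just described, as it might appear in a module. The observability.metrics service name and the method signatures follow what the talk describes, but treat this as illustrative code for the sandbox module, not its exact API:

```php
<?php

use Drupal\Core\Entity\EntityInterface;

/**
 * Implements hook_entity_insert().
 */
function observability_entity_insert(EntityInterface $entity) {
  $metrics = \Drupal::service('observability.metrics');

  // Namespace, metric name, help text, and label names. The library
  // returns the existing counter or registers a new one.
  $counter = $metrics->getOrRegisterCounter(
    'drupal',
    'entity_insert',
    'Number of entities created',
    ['type', 'bundle']
  );

  // Increment by one, with the label values for this entity.
  $counter->incBy(1, [$entity->getEntityTypeId(), $entity->bundle()]);
}
```

With one article and two comments created, the /metrics endpoint would then expose something along these lines:

```
# HELP drupal_entity_insert Number of entities created
# TYPE drupal_entity_insert counter
drupal_entity_insert{type="node",bundle="article"} 1
drupal_entity_insert{type="comment",bundle="comment"} 2
```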
The service is called observability.metrics. Then we can get or register a new counter: the library either provides an existing counter or creates a new one if it doesn't exist. In this method we define a namespace (for example "drupal"), the name of the metric, a help message that explains what the metric is, and then a series of labels that attach information to the metric. So in the Prometheus data store we have a metric with labels attached to it, which we can use to query, to aggregate, and so on. Then we call incBy: we increment the counter by one and set the values for the labels, in this case the entity type ID for "type" and the bundle for "bundle". In this way we add one to this entity_insert metric, which represents the fact that a user created an entity, for example a node.

The Observability module exposes a URL; the standard path in Prometheus is /metrics, so if you go to yoursite/metrics you see this textual representation, in a format specific to Prometheus. You see the help text you set in the code, the name of the counter, and a list of values: in this case, one article and two comments were created on this website. So we can add metrics like this all over our code. The Observability module will implement some generic metrics (the number of nodes, the number of entities, things like that), but you can use these services and methods to add metrics to your custom code: if you have an e-commerce site and want metrics about how many carts are created, or anything like that, you can add them to your custom code.

Then we need to extract data from the system. There is a component called node_exporter that exposes a lot of metrics from Unix systems. It is written in Go, so it is easily deployable everywhere, and we can use it to expose data about CPU, memory, file system, network, and a lot of other things.
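The scrape configuration described in the talk might look like the following prometheus.yml. This is a sketch: the target hostnames are the Docker-internal names of the Apache and node_exporter containers and are illustrative.

```yaml
global:
  # Scrape both targets every 5 seconds, as in the demo.
  scrape_interval: 5s

scrape_configs:
  - job_name: drupal
    # The Observability module's endpoint.
    metrics_path: /metrics
    static_configs:
      - targets: ['apache:80']
  - job_name: node
    static_configs:
      - targets: ['node-exporter:9100']
```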
Now we need to configure and run a Prometheus server somewhere to aggregate all of this. I said at the beginning that we want to try all these things locally, on our own PC, so for example we can use the official Prometheus image to run it locally. We start by creating a file that configures how we want to scrape data. In this case we have two scrape jobs, one for Drupal and one for the node_exporter: we say that we want to scrape the Drupal endpoint every 5 seconds and the node_exporter every 5 seconds, and where they are; these are the internal Docker names for the Apache container and the node_exporter container. Then we can start Prometheus using the official Docker image, mounting our configuration; these are the commands to configure the server. If everything works correctly, all the data is collected by Prometheus in its data storage.

Prometheus has a very basic dashboard, not very useful, so we need a better solution to query and visualize the data. For that we use another open source tool called Grafana. Grafana allows you to query, visualize, and alert on your metrics and your logs, to understand them. Grafana also has a Docker image, to run it locally or anywhere, and it supports PromQL, the query language of Prometheus, to query the data and build dashboards. So for example we can extract data with queries like: PHP requests per second, entities created per second, PHP memory peak, CPU usage; and we can put them on the same dashboard to see the correlation between them, whether or not when the requests go up the CPU goes up or down, the memory, et cetera. This is a screenshot from Grafana: in this example we start generating a lot of requests, and some of these requests also create entities. So the number of requests goes up, the memory occupied by PHP goes up, the CPU goes up, the free memory goes down. So we can correlate this information coming from the system and from our application.
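The dashboard queries just mentioned might look like this in PromQL. The Drupal-side metric names are illustrative inventions; the node_exporter metric names are the real ones:

```
# PHP requests per second, as a rate over the last minute
rate(drupal_http_requests_total[1m])

# Entities created per second
rate(drupal_entity_insert[1m])

# CPU usage (percent busy) from node_exporter
100 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100

# Free memory from node_exporter
node_memory_MemFree_bytes
```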
Grafana is also useful for analyzing logs. There is a project called Loki, a new project from Grafana Labs, that is used to scrape and aggregate logs in a way similar to what Prometheus does for metrics. So on the same dashboard where we have CPU and so on, we can also add the logs from our application, to better correlate a log message with the other metrics recorded by the system.

OK, distributed traces. Logs are about storing specific events; metrics are measurements at a point in time. What we're missing is a way to trace a request that arrives at our system, goes through all the layers, maybe of our Drupal, and on to some external microservices: we want to trace this whole request. For this we use another piece of software, OpenTracing. OpenTracing is a vendor-neutral API to instrument applications to expose traces. We need tracing because we cannot use only logs and metrics to reconstruct the journey of a request through a distributed system; we use traces for that. Take for example a Drupal Commerce website where the prices of the products come from an external microservice: they are not inside Drupal, and when the user opens a product page, the price comes from somewhere else. In Drupal Commerce we have this concept of resolvers; here a price resolver defines the price of a product: given the SKU, we make a call to the external system. We want to trace the fact that the request for a product page goes to Drupal, then to the microservice, and so on. OpenTracing is the protocol that defines how traces are done; to visualize them we use another piece of software called Jaeger, which is to tracing what Grafana is to metrics: Jaeger is the dashboard we use to visualize all of this.

For example, this is the load of a product page in Drupal Commerce. In the current version of the Observability module we trace the event calls and the rendering; in a later version maybe we can also trace service invocations, database queries, and so on. You can see that the request passes through all the layers of Drupal, all the events one after the other (Drupal is not multi-threaded, so the calls are sequential), and then here Drupal performs an external call that reaches another system, the price resolver, which resolves the price and returns the information to Drupal, which continues to build the page.

OK, we can use different clients for Jaeger in PHP; the most complete is jaeger-client-php. The code is not that good, but it's the only one that works. We use it to generate a trace ID when the request arrives at our system, to trace events (Symfony events, Twig generation), and to propagate the trace ID to every call, so we can use this trace ID in every system involved. For example, there is a middleware in the Observability module that starts the first span; a span is one of the lines we saw in the Jaeger dashboard. Before handling the request it starts the span, after handling the request it marks the span as finished, and then it flushes all the collected spans to Jaeger. With this code we will see in the Jaeger dashboard a span for the service "drupal" at the URL /product/1, with its duration, one second. Then, and this is also in the Observability module, we can override the HTTP client factory from core with a custom one that does the same with spans but, using the OpenTracing library, injects the trace ID into the HTTP headers, so we can retrieve it from the external service. In this case I developed a quick Go microservice that takes the SKU of a product and returns a price: we have the handler that responds to the REST query and a function called computePrice that does all the work. And we can see that Jaeger is able to reconstruct the whole journey and the duration of every component over time, so we can explore the entire trip a request makes through these layers.
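The Go microservice mentioned above might be sketched like this. The pricing table, the route, and the fallback are invented for illustration; a real version would also read the propagated trace headers and open a child span so that Jaeger can stitch the journey together:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// computePrice does all the work of resolving a price for a SKU.
// The table and fallback values here are invented for illustration.
func computePrice(sku string) float64 {
	prices := map[string]float64{
		"SKU-1": 9.99,
		"SKU-2": 19.99,
	}
	if price, ok := prices[sku]; ok {
		return price
	}
	return 4.99 // fallback price for unknown SKUs
}

// priceHandler answers the REST query, e.g. GET /price?sku=SKU-1.
// A real implementation would read the trace context injected by
// Drupal's HTTP client here and start a child span before the work.
func priceHandler(w http.ResponseWriter, r *http.Request) {
	sku := r.URL.Query().Get("sku")
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]interface{}{
		"sku":   sku,
		"price": computePrice(sku),
	})
}

func main() {
	// Self-contained demo using a test server; a deployed service
	// would instead call http.ListenAndServe(":8080", ...).
	srv := httptest.NewServer(http.HandlerFunc(priceHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/price?sku=SKU-1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Print(string(body)) // {"price":9.99,"sku":"SKU-1"}
}
```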
OK, one last thing we have to do is to correlate traces with logs, because usually I go to the Jaeger dashboard, maybe I discover some problem, and then I need access to the logs of that specific request. So we need to correlate traces with logs. This is simple, because we can add processors to the Monolog module: the Observability module provides a new processor for Monolog that adds the trace ID to every log entry. So we have an "opentracing" entry in the processors list, and then this is the same log message as before but with the trace ID added, and we can use it to filter and query log messages by trace ID.

OK, next steps: finish the Observability module. One thing we could do: Web Profiler does something similar, but only within the Drupal world, not outside, so maybe we can integrate the Observability module with Web Profiler, since they do the same kind of thing. And write instructions to set up the whole stack, documentation, and so on. Here are some blog posts that explain a lot of interesting things about metrics, OpenTracing, and so on. And a quick slide about who we are, since we're almost out of time: if you want to join our team, drop me a line, and join us for the contribution sprint tomorrow; I will be there in the morning. If you want to look at the code of the Observability module, or to help develop it, you are welcome. Thank you. I don't know if we have time for questions; one question now, and if there are more questions, I'm here.

Q: I actually have two questions, but I have to choose, I guess. Those metrics: how do you handle them in a more distributed system? Because if you run your Drupal multiple times, then you can either retrieve the metrics from one instance, or retrieve metrics accumulated over all instances, and depending on the type of metric you want one or the other. How do you make that work?

A: So you have a single Drupal, but it runs multiple times, like redundancy or scaling?

Q: Yes. Sometimes you want information for just one instance, for example the number of requests, because it tells you a lot about how your load balancing is functioning; but for example entity_insert is something that you probably want aggregated over all the instances. So how do you handle that?

A: You can add labels to metrics. So for instance you can also add the Drupal instance, one from the pool of your Drupal instances, and then in Grafana you can write a query and extract data only for one instance, or for all instances, and aggregate them differently.

Q: OK, cool.

OK, thank you.
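To sketch the answer above: if each Drupal in the pool attaches, say, an instance_id label (an illustrative name) to its metrics, PromQL queries in Grafana can either select one instance or aggregate across all of them:

```
# Requests per second for a single instance
rate(drupal_http_requests_total{instance_id="web-1"}[1m])

# Entities created per second, aggregated over all instances
sum(rate(drupal_entity_insert[1m]))
```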