Hello everyone and welcome to today's talk, titled Crash Course in Observability. My name is Matej Gera. I work as a software engineer at Red Hat, in particular with the monitoring team on the OpenShift platform. Let's get into today's presentation. We will briefly look at observability as a popular buzzword and try to go beyond it to uncover its meaning and obtain a working definition. We'll also look at which signals are currently employed within the observability landscape, and lastly we will look at selected topics in observability practice.

The Internet is full of observability talk, from Twitter feeds to blogs and articles to vendors trying to sell their observability solutions, but it is not often well understood what we are actually talking about. We will try to find a common understanding of the word, one we can work with to achieve our goals as developers and ops engineers. You might also be wondering: am I doing observability? How can I start, or how can I improve? When thinking about these questions, we should be mindful of the fact that observability is not an on/off switch but rather a spectrum. You are probably already collecting logs from your applications, possibly metrics too, but you might for example be wondering about diving into tracing as well. The good news is that you are already, quote unquote, doing observability and are on the right track to further improvements.

We all probably have an inkling of why we want to observe our systems. With the move to microservices architecture, our systems have become astonishingly complex and entangled, comprising hundreds or even thousands of components and moving parts. It is a daunting task to make sure these systems do not fall apart and function properly. This is where the promise of observability comes into play: it should enable us to see inside those systems. The Wikipedia definition tells us that observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. We can already extract interesting points from this. We can apply it to a complex system such as the aforementioned microservices. The system has to be able to provide us with outputs, meaning we need to adjust our system to externalize data about what is going on inside; we call this process instrumentation. In the end, this should give us the ability to hypothesize about the state of our system and subsequently confirm or refute suspicions based on these outputs.

Which signals can we rely on to gain insight into our system? Probably the most widely used ones are logs. They inform us of events in our system and can answer the "what" questions, down to the level of a concrete occurrence of an event, such as a failed email send. With logs, the potential downside lies in the large number of events, and therefore logs, generated by our systems. Next come metrics. These represent quantitative data. They can answer "how much" or "how many" questions for us, such as how many HTTP 500 status codes the server responded with in the last 30 seconds. Traces tell us about the journey of a particular request or transaction through our system. They are useful in answering questions about how information travels through our system and how it is handed over between various components.
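To make instrumentation a bit more concrete, here is a minimal sketch of what emitting all three of these signals from a single code path can look like, assuming Python with the prometheus_client and opentelemetry-api packages available; the send_email and deliver functions and the emails_failed_total metric are hypothetical names chosen for illustration.

```python
# Minimal sketch: one code path emitting a log, a metric, and a trace span.
# Assumes the prometheus_client and opentelemetry-api packages are installed;
# the names send_email, deliver, and emails_failed_total are hypothetical.
import logging

from opentelemetry import trace
from prometheus_client import Counter, start_http_server

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
tracer = trace.get_tracer(__name__)

# Metric: quantitative data, answers "how many email sends failed?"
EMAILS_FAILED = Counter("emails_failed_total", "Total failed email sends")


def deliver(recipient: str) -> None:
    # Stand-in for a real SMTP call; always fails so the signals fire.
    raise RuntimeError("SMTP connection refused")


def send_email(recipient: str) -> None:
    # Trace: a span records this operation as one hop of a request's journey.
    with tracer.start_as_current_span("send_email") as span:
        span.set_attribute("email.recipient", recipient)
        try:
            deliver(recipient)
        except RuntimeError:
            # Log: a discrete event answering the "what happened?" question.
            logger.exception("failed to send email to %s", recipient)
            EMAILS_FAILED.inc()


if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at http://localhost:8000/metrics
    send_email("user@example.com")
```

Running this and fetching the /metrics endpoint would show emails_failed_total increasing alongside the log line; with a real OpenTelemetry SDK configured, the span would be exported to a tracing backend as well.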
While logs, metrics, and traces are often thought of as the primary signals, well established within the observability landscape, there is a new, emerging type of signal: continuous profiling. Continuous profiling helps us understand how resources such as CPU or memory are utilized, down to the level of individual sections of code, thus giving us detailed answers about what is going on in the system with regard to performance.

To further improve our ability to answer more complex questions about our systems, we need to correlate the various signals. We can correlate signals by the time of occurrence of a particular event, or by using signal metadata, such as a request ID, which can help us pair, for example, a particular log line with a given trace. An interesting emerging tool in this space is exemplars, which tightly couple a metric or log with a trace and make it easy to jump between the two signals, in a UI for example.

You can also use signals, in particular metrics, to measure the quality of your service. Set your goals as SLOs, for example, and track them with the help of metrics. This allows you to objectively measure how well your system is doing and how satisfied your users are. And because it is not humanly possible to follow every output, automated alerting can help notify us about the truly important events in our system. We can be alerted upon reaching certain error rates in the system, or even when we are depleting our error budget too fast (the arithmetic behind this is sketched after the wrap-up below). But be wary of overdoing it: too many alerts mean more noise. Focus only on what is important.

Hopefully, we now have a basic understanding of what observability is, how we can use it to make hypotheses about the state of our systems, and how signals, as outputs, play the crucial role in making this possible. Thank you for your attention, and be sure to check out the links to more resources on this slide.
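As a small addendum, here is a minimal sketch of the SLO error-budget arithmetic mentioned above, in plain Python; the 99.9% target and the request counts are hypothetical numbers chosen for illustration.

```python
# Minimal sketch of SLO error-budget arithmetic; the 99.9% target and the
# request counts below are hypothetical illustration values.

def error_budget_remaining(total_requests: int, failed_requests: int,
                           slo_target: float = 0.999) -> float:
    """Fraction of the error budget still unspent over a given window."""
    # The error budget is the number of failures the SLO still permits.
    allowed_failures = total_requests * (1 - slo_target)
    if allowed_failures <= 0:
        return 1.0 if failed_requests == 0 else 0.0
    return max(0.0, 1 - failed_requests / allowed_failures)


# A 99.9% availability SLO over 1,000,000 requests permits 1,000 failures;
# 400 failures so far leaves roughly 60% of the budget. A burn-rate alert
# would fire only if this budget were being consumed unusually fast.
print(round(error_budget_remaining(1_000_000, 400), 3))  # 0.6
```

In practice, a monitoring system such as Prometheus would compute these ratios from counters over sliding windows rather than in application code; the sketch only illustrates the underlying arithmetic.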