Hi and welcome to the CNCF SIG Observability updates. We have three people today. I'm Simone Ferlin, an observability software architect at Ericsson in Sweden, a senior lecturer at Karlstad University in Sweden, and a SIG Observability member. I'm Richard "RichiH" Hartmann, Prometheus team member, OpenMetrics founder, SIG Observability chair, and I work at Grafana Labs. Hello, my name is Bartek Plotka and I'm a Principal Software Engineer at Red Hat. I am a Prometheus maintainer and co-author of the Thanos project. I'm a SIG Observability tech lead and I'm also writing a book about programming in Go with O'Reilly. So let's get started. What is all of this about? Well, a SIG is a special interest group, as most of you probably know, and observability is fundamentally about making a complex system understandable for humans. There are other definitions, but the distilled version is usually this: as systems become ever more complex, you need to actually be able to understand them as a human. And that's what we're working on. Because we are a SIG, a group that wants to talk with you, and this is also true for this talk, we'll try to keep it relatively short so we have as much Q&A and open discussion as possible at the end of this slot. Look at our charter: we are basically there to shepherd the observability space within the CNCF and to help different projects, different end users, and different vendors come into the space and interoperate nicely. Okay, so let's quickly recap what we did from the very beginning of the CNCF SIG Observability, which was created last summer. The most important duty we were given is reviewing projects that propose to change their stage in the CNCF landscape, that is, the sandbox, incubated, or graduated stages. We submitted four reviews for four projects.
Two of them, Cortex and Thanos, ended up being incubated, and we spent a lot of time reviewing OpenTelemetry; that review has been passed to the TOC and is awaiting its decision. So that was a large chunk of our work. But let's now look at the overall high-level landscape of observability in the CNCF. For better or worse, we are still within the old-school model of the three pillars of observability that we used to know: metrics, tracing, and logging. And the reason is that everything has a cost. For example, metrics, while carrying less context than you might want for the unknown things you should observe, are still worth having simply because they are the cheapest signal to collect and visualize. Usually that's the start of your journey in incident recovery: finding out what the cause of the incident was and what the overall health of your system is. This is what you use for alerting, for immediate triggers, or for calculating SLOs reliably over time, right? Usually this is the starting point of incident duties, and, with the help of logging, this is how you start your debugging investigations. Typically you would only go into tracing if you need to gather more detailed information about the life cycle of a request, which sometimes crosses multiple microservices in your cloud. That used to be our landscape, and it is still the case right now; this is what people use in production. But that's not the end, right? There are things beyond that. It was predicted already two years ago that there would be more signals eventually. Something that has clearly appeared is continuous profiling, where you want to know your performance over time, including historical performance, so you can compare things: the CPU time spent in every function, or memory allocations.
And this has proven to be very, very valuable, and there are projects in the open-source community that allow you to do so, for example Conprof. And in the end there might be more signals, like crash dumps. This is something that I think RichiH finds very valuable, and I think it would be valuable for everyone, but this is a million-dollar idea that we are still waiting for someone to implement. Within CNCF observability, we also have methods to extract more information from your system. You could say those are testing practices that are maybe a bit novel in the cloud native space. This is essentially advanced testing, like fuzzing, or maybe what I would call fuzzing in production, which is essentially the chaos monkey technique, chaos testing, where you randomly trigger failure scenarios in random parts of your system to ensure stability and good incident reactions. Now, when we look at the project landscape, we see some movement since last year, or even since the last KubeCon North America. First of all, we noticed there are new incubated projects, Cortex and Thanos, joining OpenTracing; nothing changed on the graduation side, where Fluentd, Jaeger, and Prometheus are still very prominent and still gaining adoption. On the sandbox side we see the new project Trickster, known for its amazing caching capabilities on top of Prometheus; we have OpenMetrics and OpenTelemetry, which have been proposed for incubation; and we still have Chaos Mesh and Litmus, which are the chaos testing projects we have here. So what's on the to-do list of SIG Observability? We have some activities in progress, mainly a white paper about observability, telling all these stories and talking about the foundations of observability, like Bartek shared with you previously, including other topics.
And most importantly, we're including everybody in the SIG and inviting everybody to contribute to this document, because people come from different areas and different industries, and they may have different references and different input to contribute. Our goal here is basically to have a CNCF stamp, or let's say a base document, so that when somebody comes to the CNCF and wants to learn more about observability, what it is, how to get from zero to hero, and what exists in the landscape, they have somewhere to start. This is at least one outcome we want from the SIG. The second document is about best current practices for tracing from the end-user perspective. If you remember the three pillars that Bartek talked about, we have logs, metrics, and distributed tracing. Tracing usually requires a bit of education on the application development side: you have to learn how to instrument your code, which information from your code is relevant, and what is relevant to be propagated in a distributed context when you have several microservices talking to each other. This is a little bit of what we want to share in this document. We have a smaller set of people from the SIG contributing here, usually people who already went down this path, learned from mistakes, learned best practices, and want to document and share that. As backlog, we have some activities, mainly webinars on observability and a white paper on the overlap of observability with AI and ML. There is an L missing on our slide, but we do want the machine learning aspect of that. And, a little bit as input, on learning ways of working: we are trying to adopt ways of working that come from, let's say, well-understood or well-operating communities such as the IETF, if somebody is familiar with the Internet Engineering Task Force.
So, how does the process go from an idea, from a project, from code, to having a standard that you can share with other vendors, other companies, and other developers, who all need to interoperate? In the CNCF, or at least in the SIG, we are thinking about having a clear distinction, like the one between the IRTF and the IETF, between what is experimental and more research-oriented versus what is driven by engineering problems and implementations. You can have implementations on the experimental, research-oriented side as well, but we try to distinguish where the impulse is coming from in the community: is it an engineering-driven problem happening in a certain industry, in a certain area, maybe one that is not the community I'm coming from or the industry I'm working in? We would also like a bit more of a process, such as the Request for Comments process in the IETF: if you want to implement a protocol, there is a document, or rather a set of documents, that describes it. It's not that simple, though; there is a process, and sometimes a very long one, to go from a proposal to a standard at the end that somebody can follow and rely on to be interoperable with another system. So how should we best implement that? How should we best work? Should we copy this way of working, or implement something different? Is it interesting to have proposals first in a document?
Should people have different implementations and try to make their code interoperate, and then, based on that interoperation, refine the documents up to the stage where you have a more solid document, and then you could have an RFC? These are some of the things we're thinking about. And of course it's not only about copying what works very well in other communities. One thing that could work very well, in the same way it works in the IETF, is mediation between projects and vendors: the goal is to have a standard, the goal is to have interoperable standards, and regardless of which project or which vendor you are coming from, the goal here is to have code that runs, code that interoperates and is, let's say, vendor neutral. So we create a community that is based on a technical problem that needs to be solved and implemented, and that is less dependent on interests coming from a particular community, industry, or vendor. Well, there are just three of us here, but behind all the work being carried out in SIG Observability there are lots of helping hands. The list of people is here, and apologies if we forgot someone; we really tried to include everyone who participates in the calls and is active in the SIG. So thank you very much. If you got interested in what we shared here with you and want to participate: we have our fortnightly calls, and the link is here on the slide. We have our repository; everything we talk about and share in the SIG is public. We have a Slack channel as well, where people come and ask things, and a mailing list. If you have free time to spend and would like to share it with us, come to SIG Observability. And yeah, that's all from us for now. Thank you!