Hello, everyone, and welcome to Open Observability Day, a co-located event during KubeCon. My name is Bartek, and together with Eduardo we are co-chairs of this event. In the next 25 minutes I would like to share with you some important information about logistics, some acknowledgments, and some initial context for this event. So yeah, let's start.

In terms of logistics, two simple rules. The Code of Conduct applies: be nice, friendly, and respectful to each other, because we want to make sure this space is safe and harassment-free. Masks are still required; these are COVID times, so please keep them on unless you are actively eating or speaking. For social media and other information, please use the hashtag #OpenObservabilityDay. Also keep in mind that the talks are live, virtual attendees are watching them, and the recordings will be uploaded within a couple of days after the event. There is Wi-Fi as well, so feel free to use it.

For virtual discussions, questions, or anything you would like to share with others outside the stage or the networking sessions, we have a Slack channel. Actually, it's not dedicated to today only; it's the general Slack channel of TAG Observability. I will explain later what TAG Observability is, but feel free to really use it. Even if you are here in person, please help the virtual attendees understand what's happening, answer their questions, or ask questions yourself. We will prioritize questions from the in-person audience, but we would love to take a couple of questions from virtual attendees as well, so we will be monitoring that channel.

First, some acknowledgments; let's thank everyone who made this event happen. First of all, the program committee: a bunch of people worked very hard on reviewing the submitted talks, and there were lots of them, so thank you to them. Second, thanks to the CNCF staff and the venue staff, who do very, very hard work to make this happen. And of course the sponsors; without the sponsors we wouldn't be able to meet here. Thank you to our Diamond sponsors, Calyptia and Chronosphere, to our Platinum sponsor, and to our two Gold sponsors.

Okay, let's spend a couple of minutes to really understand where we are and why we are even here. By the way, this is my own cat; I really wanted to have my cat as a meme, so yes, I finally got the opportunity to do so. So what is observability? We are all here for observability, but what does it really mean, and why is it so popular nowadays? My definition is very simple: it is the ability to understand the current or past states of your application or process, running on one machine or many, as one process or maybe thousands, and to understand what it is actually doing. We may have programmed it a certain way, but we have to make sure it is actually running as we expected.

The second definition is that observability is about answering unknown unknowns. What that means is that we ship an application into some complex environment, and we already know certain known unknowns: the service can be down,
there might be HTTP errors, and we roughly know what the memory usage should be, or at least we know to watch it. Those are known unknowns, and we can cover and unveil them using standard, traditional monitoring. But observability goes even further: we really want to know the unknown unknowns, the questions we don't yet know we will be asking in the future. For this we need really good observability, so we can navigate through the states and the events our application is going through.

So, as mentioned, observability allows us to infer the state of an application using external signals, and this is where the three observability pillars come in. Traditionally we talk about metrics, numbers that aggregate certain events; tracing, which is request- or transaction-scoped information about events that happened across different services; and logging, a simpler form of observability that records information about the events happening in our system. All of this is great, although it's already expensive and sometimes difficult to set all of it up.

But we are not stopping here; recently we have been adding more signals, and I'm a super fan of experimenting in this space. Recently we have seen growth in profiling, and particularly in continuous profiling. As our application sits there performing some work, we want to be able to go back and see what its profiles looked like, say, one hour ago, because that allows us to find performance issues that happened in the past. For example, you have a memory leak and your application crashed: it's too late to take a profile now; you want the profile from just before the crash. This is where continuous profiling shines, and in some way we can treat it as another observability signal.
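To make the continuous-profiling idea concrete, here is a minimal Go sketch (my own illustration, not from the talk): the standard library's net/http/pprof package exposes the runtime's profiles over HTTP, and a continuous-profiling agent (for example Parca or Pyroscope) can scrape and store them periodically.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose Go runtime profiles (CPU, heap, goroutines, ...) on :6060.
	// A continuous-profiling agent can scrape these endpoints every few
	// seconds and store the results, so when the process crashes from a
	// memory leak you can still inspect the heap profile from just
	// before the crash.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // stand-in for the application's real work
}
```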
And we are not stopping even there. Recently we had an amazing blog post from Yuri suggesting there are actually six signals. Six, yes, I'm counting right: on top of the three basic signals he mentions profiling, but also exceptions and events. It's amazing to be open-minded here and learn about the benefits of categorizing data differently so that observability becomes easier. I'm not sure yet whether this is the correct categorization; there has to be a balance between a clear categorization of the signals we observe and not having too many of them, because that gets too complex. So I would argue with some of it, but it's amazing to see people experimenting and innovating in this space to make things easier in the long run. So do check that blog post; it's super, super useful.

Of course, in the CNCF space we have projects that help in this observability mission, so let's go roughly through them. As you may remember, CNCF projects are categorized into different maturity levels: sandbox, incubating, and graduated.

Graduated projects are the most mature ones. On the left we have Fluentd as the first one; we have its maintainers here in this room, and they will be speaking about it for sure, but generally it's a collector and pipeline for logs and for analyzing them. We have Jaeger, which gives you an API, a bit of storage, and really the patterns for distributed tracing. We have Prometheus, which is a metrics system but actually much more than that: it gives you monitoring patterns that let you alert and monitor your infrastructure in a very reliable way, with PromQL and so on.

Then we have the incubating projects. We have Cortex, and I will take Thanos together with it, because they solve similar problems: they are essentially databases for distributed metrics ingestion and storage. You can think of them as Prometheus put into scalable environments, handling billions of metrics rather than the ten million or so a single Prometheus can handle. They form the same family, the Prometheus ecosystem, which I'm personally part of as a maintainer of Prometheus and Thanos. We also have OpenMetrics and OpenTelemetry, which sound similar, so let's start with OpenMetrics. OpenMetrics is a standard, or a protocol you could say, for exposing and scraping metrics from your application; it was born from the Prometheus exposition format. OpenTelemetry is much, much more: it has standards for traces, metrics, logs, and recently profiles. Some of them are maybe still in progress, but that is generally the mission. And that's not all: they also have SDKs, and a collector application that you can just deploy, which understands those protocols and can pipeline the data to the storage or backend of your choice.

And we have a couple of newer projects down there at the sandbox level, and they're pretty amazing; it's good to see where things are evolving. Fonio is a security monitoring agent based on eBPF. Kuberhealthy is an operator that lets you check your Kubernetes system very easily and in a standard way: whether your nodes are healthy, whether your APIs are healthy. You just run it, and it periodically performs checks and produces metrics. OpenCost is an amazing framework and UI that monitors the resources and features you have enabled on your cluster and shows their actual cost based on the provider you use, so it lets you make cost-optimization decisions. We have Pixie, which is another eBPF agent, but also a bigger framework for taking observability and security information from eBPF and putting it into a database. We have Skooner, an open-source UI for Kubernetes; if you are bored of kubectl, you can try it, and it actually has monitoring built in, with some dashboards, so it's pretty sweet. And Trickster, which is a cache for web services, but primarily a cache for time-series applications and databases; you could use it with Grafana to make Grafana much faster, because it caches some queries.

Generally, all of this is open source, and all of it is Apache 2 licensed. This is what we are doing in the CNCF: trying to maintain those communities and keep the projects, the software, healthy.
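As a small illustration of the Prometheus and OpenMetrics model mentioned above, here is a sketch (my own example, with a made-up metric name) of instrumenting a Go service with the official client_golang library; Prometheus scrapes the /metrics endpoint, and the exposition format served there is what grew into OpenMetrics.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter that aggregates request events into a number: the "metrics" pillar.
var httpRequests = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "myapp_http_requests_total", // hypothetical metric name
	Help: "Total HTTP requests served, by path.",
}, []string{"path"})

func main() {
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		httpRequests.WithLabelValues("/hello").Inc()
		w.Write([]byte("hello\n"))
	})

	// Prometheus scrapes this endpoint on its own schedule.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```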
However, software is not everything. As you probably know, even as an engineer, coding is only part of your work; you really have to care about many, many different things to have a successful product. You have to find someone who will operate your software, do community building and education, and agree on APIs and standards. Those are critical parts, and code is just one of them.

For this reason we have various structures in the CNCF ecosystem. One of them is the special interest groups (SIGs), which are tied to Kubernetes, and then there are the Technical Advisory Groups (TAGs), which generally cover all the projects, not only Kubernetes. We have two groups for the monitoring and observability space. First, tied to Kubernetes, there is SIG Instrumentation; it's actually much more than how to instrument Kubernetes, it's really about how to monitor and observe Kubernetes in an efficient way. So if you're interested in contributing in this space, or you have questions such as "how do I monitor my Kubernetes cluster?", that's the place to go. There are periodic meetings and Slack channels; it's a community around exactly that. The second group is TAG Observability, where I'm a tech lead. You are welcome there as well; we meet bi-weekly. It reaches beyond Kubernetes, because we really try to unify the observability space and educate others at a higher level. So you are also welcome to join us and talk about your user perspective: what you are missing, what you don't know, or what you would like to know. I recommend visiting both spaces. If you want to learn more at KubeCon, there are two talks about them: on Wednesday we have a TAG Observability update, and on Friday SIG Instrumentation. If you want to meet those people, feel free; I highly recommend it.

Another educational piece is, of course, this event, so let's quickly go through what we have ahead of us. We'll start with two keynotes. Then there will be a talk on building observability pipelines, so something about Fluent Bit; then a break; then two main talks and two lightning talks; then lunch; then three lightning talks and two talks around OpenTelemetry; then the coffee break; and finally the three last talks, around distributed tracing in WebAssembly, an observability introduction again, and more observability in Kubernetes networking. It looks pretty sweet. Thank you to those who proposed talks. This is, I think, the first Open Observability Day ever, so if you like it and would like us to do more of these, please give us that feedback and we'll continue.

Okay, as the last piece of this talk, we had this idea to share awareness of what these projects are doing, the projects you saw on the landscape. We will talk essentially about the graduated and incubating ones, those first two rows, and I have some updates from each of those projects, just to give you awareness of what they do and where they are, in case you're already using them. So let's go. Let's start with Fluentd and Fluent Bit, and I will ask Eduardo to come and help me with that. Thank you.

Well, first of all, thanks for coming to the first Open Observability Day; here are some quick updates on the project.
Fluentd was born many years ago, about ten years, and it's really stable; you'll find it working in the major cloud providers and in hundreds and hundreds of companies. After we created Fluentd, we started iterating on the new generation, called Fluent Bit, which is a sub-project of the Fluentd ecosystem. And this week we are announcing the launch of Fluent Bit 2.0.

It has been a journey. Fluent Bit started about six years ago; for those who don't know, it originally started as a solution for embedded Linux and then quickly evolved into a solution for cloud-native and distributed systems. Everything you see today in Fluent Bit is thanks to feedback from the community and from users, who kept asking for Kubernetes filters, connectors for different cloud providers, or ways to consume data from systemd and other types of sources.

With Fluent Bit 2.0, the first major change in the vision of the project is that it used to be only for log data, and now we have switched the whole project to support multiple types of signals. We now fully support logs, metrics, and traces, which means full compatibility with Prometheus, OpenMetrics, and OpenTelemetry.

Fluent Bit 2.0 also comes with higher performance. Initially the project didn't support any threading; it was just a single process handling asynchronous events and asynchronous connections in one thread. We evolved toward a more performant solution with threads on the output side, to send data to destinations, and with Fluent Bit 2.0 we are extending that to the input side, so now you can scale up even more on your systems.

Another community ask was: OK, my agent is working, but how can I see the data flowing through the agent before it hits the destination? And why? Sometimes things go wrong in the network, or the data that arrives at my backend, say a database, Elasticsearch, or Amazon S3, doesn't have the format I was expecting. So how do I troubleshoot and debug that? Tap is a new feature that lets you talk to Fluent Bit over an HTTP REST endpoint and say: send me a snapshot of the data flowing through this input plugin right now. It sends you a sample for a couple of seconds, so you can inspect live what's going on with the agent.

On the other side, we didn't initially support TLS for the input plugins; most of the security work was on the output side of the agent. Now the input side has these capabilities too, so if you want to use Fluent Bit as a collector or aggregator, you will have all of that. And if you are a developer looking to put your business logic into the agent, we already supported Lua scripting and a couple of filters for enrichment, but now we have extended this to Go, so you can write your own input plugins in Go, input plugins in WebAssembly, or filters in WebAssembly.
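To give a feel for what a Fluent Bit plugin in Go looks like, here is a minimal sketch using the long-standing fluent-bit-go output-plugin interface (the plugin name here is made up, and the new 2.0 input plugins follow a similar shared-library model; check the Fluent Bit docs for the exact input API).

```go
package main

import "C"
import (
	"fmt"
	"unsafe"

	"github.com/fluent/fluent-bit-go/output"
)

//export FLBPluginRegister
func FLBPluginRegister(def unsafe.Pointer) int {
	// Register the plugin under the (hypothetical) name "go_stdout".
	return output.FLBPluginRegister(def, "go_stdout", "Print records to stdout")
}

//export FLBPluginInit
func FLBPluginInit(plugin unsafe.Pointer) int {
	return output.FLB_OK
}

//export FLBPluginFlush
func FLBPluginFlush(data unsafe.Pointer, length C.int, tag *C.char) int {
	// Decode the msgpack-encoded records Fluent Bit hands over.
	dec := output.NewDecoder(data, int(length))
	for {
		ret, ts, record := output.GetRecord(dec)
		if ret != 0 {
			break
		}
		fmt.Printf("[%v] %s: %v\n", ts, C.GoString(tag), record)
	}
	return output.FLB_OK
}

//export FLBPluginExit
func FLBPluginExit() int {
	return output.FLB_OK
}

func main() {}
```

Such a plugin is built as a shared library with `go build -buildmode=c-shared -o out_go_stdout.so .` and loaded with `fluent-bit -e ./out_go_stdout.so`.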
So that is Fluent Bit 2.0. Please feel free to grab your t-shirts for the project outside, and well, thank you for coming. Thank you very much.

All right, let's quickly move on to other projects. I asked community members of the other projects to give me some slides with updates, so it's fresh for you. Some of them are at this conference but were still traveling, so we couldn't have them here today. But let me go roughly through Jaeger. Jaeger, as a reminder, is a distributed tracing system, and it seems there have been lots of updates since the last KubeCon. They have a new monitoring tab; I use Jaeger, so I'm a user as well, and it's pretty sweet that they finally have some connection to monitoring information, not only tracing. They finally have OTLP support, meaning the native OpenTelemetry protocol, and they are actually dropping their own Jaeger protocols, so if you still use those, make sure you switch to the new OTLP with its shiny new features. They added adaptive sampling, so it seems you can control your sampling much more granularly and dynamically, which is super sweet because, as you know, tracing can be expensive if you sample every event in your system, so making the right decision there is really important. And there are flame graphs for traces, which is amazing again; it's a specific kind of view that you could actually use for profiling as well.

Prometheus and OpenMetrics. As a reminder, Prometheus is a metrics monitoring system, and I'm maintaining it. We are moving a lot as well. We are adding new service discoveries, which are essentially how you integrate with the system you are using to populate your scrape jobs and really control your monitoring pipeline; we added new cloud providers, and also Nomad, which is a super popular orchestration alternative to Kubernetes. Additionally, we now have a special long-term support (LTS) version, which is quite unusual; I was even surprised such a thing could exist here. We release a version every six weeks, and we try to keep them compatible, but we decided to mark one version as special and support it for longer, so people can use it in systems that are not as dynamic as cloud-native environments like Kubernetes. For this LTS version we make sure it stays stable and that security fixes and patches land there. There are lots of optimizations, and formatting and pretty-printing for PromQL, which is super nice; you don't have to format queries manually in the UI anymore. And finally, out-of-order ingestion. What that means is that Prometheus was never capable of appending old samples. For example, you have a metric that you scrape at the current time, but you actually forgot to scrape, let's say, the two hours before that; if you wanted to backfill that and append it via a programmatic API or via remote write, it would just fail with an out-of-order error. Now Prometheus can append it. It has some cost, of course, so be careful, but it will accept such samples if you configure Prometheus properly.

OpenTelemetry has lots of changes too; it's a big project. There is logging work, OpenMetrics and Prometheus compatibility work, tracing general availability, improvements to logs support, client instrumentation, and more working groups. Metrics reached GA, which is sweet: the metrics protocol and SDKs are available. And finally, they have started working on profiling, which is pretty sweet, so we can have those four signals supported by OpenTelemetry.
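Since Jaeger now ingests OTLP natively and the OpenTelemetry tracing SDKs are generally available, wiring the two together is straightforward. Here is a rough Go sketch (my own illustration; it assumes a local Jaeger collector with OTLP enabled on the default gRPC port 4317):

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Export spans over OTLP/gRPC straight to Jaeger; no Jaeger-specific
	// exporter or protocol is needed anymore.
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"), // assumed local collector
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer func() { _ = tp.Shutdown(ctx) }() // flush remaining spans on exit
	otel.SetTracerProvider(tp)

	_, span := otel.Tracer("demo").Start(ctx, "do-work")
	// ... the instrumented work would happen here ...
	span.End()
}
```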
And two more projects. Cortex keeps getting better scalability with new versions and is adding more maintainers; the newest release has vertical query sharding, so improvements at the query layer, and some OpenTelemetry support, although, sorry, that's just for tracing of Cortex itself, which now supports OpenTelemetry. Compaction is more scalable as well. And finally, the last project, which I maintain and also co-founded: Thanos is moving on scalability as well. We have the Ketama hashring and ingestion rate limiting, so lots of performance fixes, and we have recently been rewriting the PromQL engine. If you want to learn more, visit us at PromCon, or catch the talks at this event. So yeah, that's all from my side. Thank you very much, and enjoy today's event.