Welcome to our talk on TAG Observability today. Actually, there we go. So today, we're going to be talking about some updates from TAG Observability. My name is Ken Finnigan. I'm an OpenTelemetry architect at Lumigo, and I participate in the TAG Observability group.

All right. Hi, everyone. Great to see you here. Thank you for joining. I'm Alolita Sharma, and I'm a co-chair for TAG Observability. Really very happy to be here today, and looking forward to having a great discussion on some of the areas we'll be covering as part of this talk. I think people are still getting settled. So please join in. Don't go. You're in the right talk. They're playing musical chairs. No worries, no worries. All right. It takes quite a while to walk.

So I think we'll start very quickly with TAG Observability and what its charter is, very, very briefly, and really encourage you to join in. It's an open community discussion. Our mission as the technical advisory group is to work closely with the CNCF Technical Oversight Committee, the TOC, which looks at project health as well as project features that are ongoing. What we do is foster, review, and grow the ecosystem of open source observability across the Cloud Native Computing Foundation, and identify and report gaps in the CNCF's observability project portfolio, which is a very interesting area. We'll talk a bit more about it as we move on. We share patterns and good practices with users across the spectrum, not only enterprises but medium businesses and individual developers, as well as vendors who are coming in and building and contributing to the projects that the CNCF has. We educate and inform users without bias, that is, being vendor-neutral, which is very important for the growth of open source and innovation in the open source ecosystem, especially in observability, which is really a very key part of infrastructure development.
We provide a vendor-neutral venue for relevant thought validation, discussion, and project feedback. So if you're looking at areas where you are building out new technology, or want to see a project, or are really bouncing off ideas across distributed systems and engineering towards observability in related areas, this is a great place to join in and discuss. And then, once discussions are past the initial phases, implementation is usually done in the projects. And last but not least, we support the TOC on process improvements and project due diligence. So if you have a new project that is coming into the CNCF space, typically in the observability domain, we have technical experts from across the industry who participate together to support the sandbox, incubation, and graduation steps and help the projects move to the next stage.

All right. So with that said, for those of you who don't pay close attention to that huge landscape graph that we have in the CNCF, which you can find at landscape.cncf.io, this is the domain of observability projects in the CNCF space today. I just call these out because they are different projects supporting not only metrics, but traces and logs, as well as Kubernetes instrumentation, collection, and visualization. So there is a whole array of projects in this space, including cost optimization. Three of these have graduated already: Prometheus, Fluentd, and Jaeger. You also have six incubating projects, with OpenTelemetry included. OpenTelemetry is the second largest project in terms of contributions in the CNCF space, right after Kubernetes, so it's a big project. There are 10 sandbox projects and one archived, which is OpenTracing. And OpenTracing, as many of you may know, merged a while ago, along with OpenCensus, into OpenTelemetry. So the modern project today is OpenTelemetry for that.

So why does that matter? And why does this landscape matter? It's because, again, here we are.
We are interested in seeing a very healthy end-to-end pipeline for observability, which includes instrumentation, collection, analysis, visualization, as well as storage. And project health is a very important part of understanding and measuring that. So health metrics, which are something that we look at for each project and work with the TOC to synchronize on, include contributor growth and maintainer diversity. So are there multiple contributors involved from different companies, especially, or different organizations? Then project velocity: what is the velocity at which a project is actually getting contributions, as well as being able to review contributions, PRs, issues, and discussions? And last but not least, a very important metric, which is user adoption. That is, how viable is this project from a user point of view? And is it sustainable?

So this goes back to the core question which many users ask: are these core technologies that are being developed in the open source space healthy, or are they at risk? A couple of projects that come to mind from the last few years, where the dynamics of contribution have changed significantly, are Jaeger and Cortex. Jaeger, as many of you know, is a core, core project in the tracing space. It has supported visualization for traces. It has supported storage, as well as analysis, to a large degree. And tracing has inherently been a very key part of observability. So on Jaeger's project health, if you go and look at the contributions that the project has had over time, many of the maintainers have actually gone and started contributing to OpenTelemetry, for example. So they don't have enough time to also contribute to Jaeger. And what that does, fundamentally, is raise a question: the large user community has adopted Jaeger deeply and widely for tracing, but what's the future?
And that's an area that TAG Observability actually works on very closely with its community, as well as with users and contributors in the projects. Cortex is another example, which is an implementation of Prometheus for multi-tenancy and scalability. Cortex also has had a similar situation where original contributors on the project moved on. And again, at a point in time a couple of years ago, there was a gap in the maintainership, if you will, for such a core project. And industry uses these components deeply in the observability space, especially for cloud-native infrastructure.

So where to next, right? The other areas that I'd like to call out in this space, and please feel free to call out any other areas or gaps that you see in the current CNCF project landscape for observability, are missing core observability functionality such as a tracing store, right? I think that we'd love to have some candidate projects that are open source and sit in the industry, such as OpenSearch, for example, that could be leveraged as a logging store as well as a tracing store, and would be great to have in the CNCF space. Similarly, UI and visualization is another area that's kind of underrepresented. You have a tracing UI that comes out of the box with Jaeger, and there is some UI feature set with Prometheus, but you'd love to see more sophisticated UIs and a whole visualization engine, including leveraging analysis and intelligent analysis, especially for observability.

A couple of other areas which have become more and more prominent in this space include cost tracking. There is one project called OpenCost that is in the CNCF space, as you saw in the previous slide. But we need more. There is room for a lot more innovation as well as improvements in the cost tracking feature set, as well as standard querying.
So on standardized querying, again, some work is happening in the spec space in the TAG, in one of our work groups, which we'll be covering later. But there's room for other areas which can be added to the observability project space in the CNCF. So with that said, are there other suggestions that you have for gaps? Think about it. Please call them out if you see other gaps in feature set which you'd like to see in the CNCF. Moving on. So I think at this point I'll hand over to Ken to dive into the TAG activities, because he's been deeply involved in the work groups, driving some of the work groups, as well as others. So over to you.

Thanks a lot, Lita. Just before I get going on this, how many of you have actually attended a TAG Observability session? Cool, some of you have. It would be fantastic over the next few months to see all of you attend at some point, just to get a sense of what we're talking about every time we meet and to hopefully begin to contribute. And this is an example of some of the things you'll experience attending those talks. So we've had a lot of speaker series over the last year. We had a few, maybe half a dozen, in the previous year as well. We're definitely trying to get more speakers. So if you have a project that's of interest or something interesting you're doing with observability, please feel free to reach out to us on Slack and organize to come present. We'd love to have some user stories in TAG Observability as well.

So we had some sessions from Jonah Kowall on observability. Ryan Perry talking about profiling, which is now part of OpenTelemetry. We had Vijay talking about eBay pivoting to OTel. Ryan White talking about Quine and graphs for observability. Philip talking about Thanos and its performance improvements, and then Matt coming and talking to us about OpenCost not that long ago. And continuing on with the success, we had a few projects added to the sandbox in the last few months.
There were kube-burner and Logging operator. They also presented to TAG Observability before they submitted to the sandbox, just to give us a sense of whether it made sense for them to fit in the TAG Observability space. And we had a few projects promoted from sandbox to incubating as well, so OpenCost and Pixie. Pixie presented to us back in February. For anyone not familiar, that's eBPF observability.

And so, some of the things we have in progress right now. We have the query language standardization group, which Chris runs, and we'll have him talk about more details shortly. There's the Observe K8s work group, which I'm a part of, and we'll talk about that some more in a second. We also have collaborations with other TAGs at the CNCF. So we've been working with TAG Runtime on founding the AI work group, and if you follow the Slack, you'll have seen there's a new channel for that working group discussing what's going on there. And then there's the landscape graph with TAG Security. So as projects evolve, things evolve, and the TAG is always looking to collaborate with other TAGs at the CNCF to ensure that when there's a cross-cutting concern between different groups, we're all on the same page about what the progress is, what the right thing to do is, and anything like that.

Also, we spent some time working on an observability white paper for the last probably 18 months or so, I would say. It took a little while to get it finished, but it's in a reasonable state now. Please go read that draft and give us your thoughts on it. We want to make sure it's something that the end user community actually wants, and not something we've just thrown together in an ivory tower that everyone looks at and goes, that doesn't make any sense to me, I don't want to read that. So please take a look at it and let us know your feedback. Much appreciated.
So the Observe K8s work group. The intent behind that is to develop best practice guidelines for how to set up and run observability for different use cases, as well as providing a means for anyone not familiar with the observability realm to get a better understanding of observability by utilizing an application and seeing what it does in the various observability tools of the CNCF landscape. So right now, we have an observability demo which builds on the OTel demo and basically provides all the mechanisms you need to set up the CNCF projects for observing that application. We have instructions for GCP and AWS already there. We'd love to get contributions for Azure. Please come and contribute. We're in the process now of wanting to put together a website where we can host this information and start developing the content, getting feedback on it, and getting people looking at it. And with that, Chris, if you want to come up and tell us about the query standardization.

Hi, my name is Chris Larsen. I work at Netflix right now. I'm one of the co-chairs of the query language standardization working group, along with Vijay Samuel. And I just want to remind everybody that friends don't let friends create domain-specific languages. There are so many of them out there in the observability space that it's a really difficult challenge to work with different vendors and correlate your data. So we've started the working group, taking our impetus from OpenTelemetry, which standardized ingestion. Now we want to see if we can standardize egress of observability data. We've been working on the working group since KubeCon Europe, and we've had a lot of great presentations where we're gathering input from the designers of some of the big DSLs out there for observability. We've had presentations from the Google Monarch team, the Microsoft KQL team, and OpenSearch. We have Prometheus coming up next week, actually. And we have more from KX Systems as well.
We're also working with the SQL Standards Group. And we have a lot more work to go. We need your help to collect use case stories around observability and more documentation about these languages that exist. Then hopefully at the end of this next quarter, Q1, we'll start actually discussing what we'd recommend as a standard that people could use for querying and interacting with observability data. So please join us. Our meetings are on the opposite Tuesdays of the TAG meeting. Thanks.

Thanks, Chris. And now I'll hand it back to Alolita to talk about some of the future for observability.

All right, so as Chris introduced gently, query language specification is super important from a user standpoint, simply because, as he said, every vendor has their own definition of a query language today. And there is a need from a user perspective to have a standardized query language. So with that said, I wanted to go into what the future of observability looks like in terms of the short-term future, right? I mean, we cannot typically predict beyond the immediate to maybe three years from now. One of the areas I'd like to call out that is really, really picking up steam in the observability space, from a vendor as well as an innovation perspective, and in contributions back to the projects themselves, is profiling, right? So many of you have been using metrics and traces as well as logs; logs are as old as programming languages themselves. And here you have deeper data, as well as wider and different types of data, intersecting with what we use for observability to get more granularity and understanding of the systems that we run, right? And the systems that we operate, infrastructure included. So profiling is actually a very, very hot space right now, where there are multiple projects that have been working on eBPF implementations.
Cilium is one of the projects that just recently graduated and is supporting profiling out of the box. Similarly, Pixie is another project that we called out earlier, and there is also native profiling support that is coming into OpenTelemetry from a collection standpoint, right? So that's exciting, because profiling instrumentation as well as collection is super important even to be able to analyze this at scale.

The second area is data collection, which continues to have a lot of velocity in the innovation that's happening in that space. OpenTelemetry, as a shout out, just added profiling as the fourth signal, if you will, for telemetry, as well as semantic conventions, which are super important for guaranteeing interoperability across different versions of any software component. The Elastic Common Schema, which has been used forever in the logging space, has also converged and has been donated by Elastic into the OpenTelemetry project to actually land in OTLP, right? That is very significant for logging, because it really converges an ad hoc industry standard into a new standard that is coming into place for a new generation of applications and infrastructure, which is OTLP, in long form the OpenTelemetry Protocol.

Similarly, cost optimization is a very, very fast-moving area, at least from a user perspective. As you are building out and scaling out large-scale infrastructure, as well as running global applications at worldwide scale, it is super important to be very conscious of cost, right? And cost instrumentation, definition of metrics, as well as different types of applications and middleware also being instrumented in a standardized way, is something which is super important for real-time analysis and being able to plan and forecast. I did do a talk on cost management yesterday, so if the recordings are available, please do catch up on it.
And last but not least in this space of cost optimization is data management, because here we are looking at another area of metrics or other data that will be instrumented from the infrastructure to be collected, analyzed, and then visualized, right? And so there is a new stream of data that is evolving here.

Another very cool area that many of us are interested in, or hearing about, or working on is AI observability. Now, this has been a term that's been in the infrastructure and platform engineering space for a long time. Similarly, you've heard of MLOps, and this is really the new incarnation of that, which is AIOps. As we have more models being applied, and especially large language models, it really gives you the ability to combine standard observability analysis practices with the scale of computation and analysis on data that can be accelerated by leveraging AIOps and AI models. So this space, as it evolves, will continue to transform the space of observability itself, because you are also introducing a new generation of assets that are being used by applications, which are the models themselves. And as you would observe code, as you would observe infrastructure, as you would observe application performance, here you also have the question of how you observe the models, right? This is another space which is super interesting. Watch out for a lot more work that we, and all the projects, may be sharing in this coming year.

And last but not least, observability UIs, right? Again, this has been a huge area of innovation. There are many great projects in this space. Grafana is a well-known project that's used, and there's Kibana, OpenSearch Dashboards, Jaeger. There are many, many different implementations of UIs even in the open source space today.
But standardization is important. Just as Chris highlighted, query language standardization makes it easier for us to correlate data at the scale that we look at data at. And UIs are the same way, right? A standardized UI really helps you look at all the different types of data that you're collecting from the infrastructure and applications and visualize them together. Another related area is being able to not only build dashboards dynamically as code, where some of that functionality exists today but will continue to get more sophisticated because you have more data and more types of data coming into the space, but also to define alerts based on that, alerts as code, right? Because at the end of the day, if you're running a model and you cannot predict behavior and you see new transformations in different states, you will have dynamic alerts also. And at the scale that data is being collected, you also have that scale to handle for analysis as well as visualization.

So these are some of the areas that are discussed in the TAG. A lot of the new innovators, both in the universities and in industry, and the open source developers and engineers who are working in this space all get together in the TAG, and it's a great way to think through some of these areas and apply them in what we are building. So with that said, I think we'll take this as an example. This is one of the projects we've been working on in the CNCF TAG. And we have some contributors to the project. Thank you, Matt, for some of the work that you've done in this space. There are other contributors who have worked on this project, but it's very interesting because it's really taking the release process and release velocity and understanding that across projects, right?
So I think that these are hard to read, but we'll make a link available on Slack. If you're on the CNCF TAG Observability channel, feel free to look out for this link. All right, so with that said, again, did you want to?

No, so basically when I was saying, please come help, we'd love to get everyone more involved, because at the end of the day, the TAG can only do so much without volunteers contributing their time and effort to help make these things happen. If we have three people doing things, then not a lot's gonna happen. If we have 20 or 30 people helping out a little bit, then a lot more can happen. So definitely come along, participate in the discussions we have at the TAG, share your interests and any topics that you want to see the TAG discussing, and comment on the issues we have in the GitHub repo for the TAG. If there's anything of interest there that you want to help out with, please shout and do so either through the issue or in Slack itself. Our sessions are twice a month, first and third Tuesdays, 9 a.m. PST, noon Eastern. There are links to the TAG repository, the Slack, and the mailing list there. But please come say hi and contribute. Even if you just want to listen to start with, that's perfectly fine. I was a lurker for a while before I started doing things. I think I've been attending for probably close to three years now, but probably that first year was me just listening in and getting to know people. That's totally fine.

Yep, totally, totally. And everyone's welcome. Even if you're just interested in a topic, please join in and say so. We'll go and find the folks who are actually working on this in the larger ecosystem and invite them to come and speak. And with that said, did you want to dive in, Matt?

Yeah, let's have Matt come up and talk to us about one of these, the Kubernetes resources ontology.

As a contributor, again, did you want to do a shout out for some of the projects?

Hello, everybody.
So we have a bit of a tradition at KubeCon to talk about how, in concrete terms, people can engage. This is not an exhaustive list. This is actually just a top five or six. So we do two things. We do that, and at the end, we also say, hey, we think people should think about these things. Last year it was what's the shape of data in observability, and in general in cloud native, and what does that look like? This year I'll have another one that builds on that. But I wanted to walk through a couple of the proposals and in-flight work that people can engage with presently. As was covered, if you identify as an end user or a project maintainer or a contributor, or some combination of those, and you want to collaborate with some of the other folks that wear those hats, the TAG is a place to do that.

Top of my personal list is an ontology for Kubernetes resources generally. This gives us an organizing principle to structure data around. We want to have a local in-person meetup program. There are so many people that are entering this domain because it's so relevant and important, and they are not all geographically in one place. So if you like organizing parties and/or meetups or nerdy technical things, there's a program there for that, and it needs warm bodies and people who are passionate about it. We have a proposed working group that the TOC will be expected to vote on sometime after KubeCon. There are details there. It's issue 1200, which is the baud rate of my first modem. The landscape was mentioned, and I'll get to that at the end. It's been paused for about 13 months for a very good reason, and I'm very excited to tell you what's changed. We mentioned collaboration. We have an ongoing work stream defined to work with TAG Security, out of which a project called GUAC happened, which is now not going to be a CNCF project, but we will be collaborating with TAG Security to help with the data model for packages, CVEs, and things like that. And lastly, there are labels.
This is a small section of those. As to what to look for in the coming year: September 20th, a handful of weeks ago, is a special date for me. It's the date that I completed another lap around the sun. It's also when the GraphQL Foundation had their yearly summit, which they just finished. One of the things that the landscape graph defines (go check it out if you want to see it) is a federated GraphQL supergraph and subgraph architecture, and there wasn't a clean, fully open way to do that this time last year. There were various offerings from all the major folks, but they all kind of had a fly in the ointment: if you wanted to scale things up, there wasn't a fully open spec for how we federate. You could do stitching. You could do actual federation. You could do all manner of other things. But they've been working on it, and they just announced an actual spec. And secondly, part of the direction they announced is for dynamic GraphQL schemas. So as we are trying to observe the world and describe it in ways that have good UX and are self-describing, GraphQL provides us a really nice way to do that. And to do that, we need to federate it. So check it out. All the talks are online. It's not a CNCF thing, but everything in the CNCF needs to address and understand the shape of data, and to query it and do stuff with it. So have a look.

The last bit is around dynamic schema. We think of GraphQL schemas as static, but as we build out platforms and APIs, not all data is in GraphQL. So as a thought experiment: right now, today, a GraphQL Foundation project lets you federate OpenAPI to GraphQL. That means anything OpenAPI can be described pretty easily by GraphQL with good tooling. So that's stuff to think about in the coming year. And again, I would encourage you to reach out to your friends, even if you don't come to a TAG meeting, and talk about this stuff. If you have ideas that you think should be done, that's how most of those things started. So you are more than welcome.
First and third Tuesday.

Thanks, Matt. Thanks. All right. So with that said, I think we are under a minute, and I'd like to at least have some questions if possible. A great shout out to all the folks who have been participating in the TAG meetings this year, as well as the work groups and discussions that have been ongoing. So a huge thank you to all of the folks who are listed here, but also many more who have reviewed papers, reviewed some of the PRs, and continue to work on many of the projects and work groups, above and beyond the folks who are called out here. So with that said, happy to take a couple of questions. Do you have any questions? Anything you'd like to see the TAG do more of?

And while we're waiting for that, I'll just add that you can work towards having your name on that list for next KubeCon by attending TAG meetings, contributing to issues, et cetera. And we will be there in Paris, too, in March. Thank you, everyone.

All right. Can I have a question?

Yes. Any questions? Please go ahead.

I wanted to ask, specifically, Alolita called out OpenSearch when talking about the gaps in the CNCF landscape.

Yes.

Can you share anything? Do you think OpenSearch should be part of CNCF? Is it going to be?

We've actually been reaching out to the OpenSearch project and trying to figure out if they can be within the CNCF. So again, ping me anytime. I'll keep you posted on updates, but nothing concrete yet. We are still trying to figure out the charter, the mandate, and how they would actually be able to get contributions and more contributors involved. Because it is such a large space, right? I mean, not only are you looking at a lot of dependency on Apache Lucene under the hood, but also very strong cross-collaboration and improvements on the Elasticsearch layer, right? But thank you for the question. I'll definitely keep you guys posted.

Hi, I'm Matt Ray. I'm the community manager for OpenCost.
And it was nice to see that we graduated, apparently, to incubation. That's new. I don't think that's happened yet.

No, I don't think so.

But it was on the slide, so it counts, right?

Oh. Maybe it was...

But that's not my point.

Okay. Excellent. Yes, yes.

One of the things you said you were looking for was more general cloud costs. I just wanted to point out that two weeks ago, we added that feature.

Wonderful.

We now have access to all your bills. And so if anybody wants billing data, I'd be happy to get it to something better than what we have.

Awesome.

Yeah, and we're gonna be adding carbon footprint soon.

Fantastic. Thanks, Matt. Thank you, Matt. Hi, Jacob.

Hi. Good to see you.

Good to see you.

Thanks for all this. I'm very excited about a standardized query language. I have been thinking about standardized dashboards and alerting for a while. One of the problems with it comes with semantic conventions: where the actual metrics and traces and logs are, and what format they're in. Has there been any thought or decisions around basing it off of the OTel semantic conventions, given the merging of ECS into that as well?

Yes, absolutely. In fact, the objective is to leverage existing work that's already ongoing in the semantic conventions space and extend that further. So, totally. I mean, our purpose is not to reinvent. It really is to converge and reuse.

Yeah, I really like the Grafana shareable dashboard JSONs, and also the Prometheus rules, the alerting JSONs. But there's sort of this incompatibility between those and the OTel semantic conventions. And I'm looking forward to those two things being able to converge.

Yes, I totally agree.
And also, there are other interesting projects such as Perses, which is an evolving project, not in the CNCF space yet, but it would be interesting to actually get them more deeply involved, and that might help some of the convergence also.

What's the name of that, sir?

P-E-R-S-E-S. Perses.

Cool, thank you.

Sure, thanks Jacob. All right, I think we are at time and beyond. So, thank you everyone. Thanks very much everyone for attending. Really appreciate it. Thank you.