So welcome, everybody, to the maintainers track session for OpenTelemetry. We'll introduce ourselves and get started with a very quick background on the project, because I'm assuming most people here are pretty familiar with OpenTelemetry at this point. That'll be followed by our current status, then we'll talk about the road ahead, and we'll wrap up with a Q&A amongst maintainers. Once we finish the road ahead section we'll invite the maintainers to come on stage. We've got a hand mic here, and then we'll take questions from the audience until they give us the boot.

To start with introductions: my name is Morgan McLean. I'm a director of product management at Splunk. I've been with OpenTelemetry since the beginning, and OpenCensus before that, as have, I guess, both my co-stars up here. I don't do a lot of code contributions, but I run the weekly maintainers call and I'm involved in a lot of the spec work and other work that's going on.

Hi, yeah, my name's Ted Young. I work at LightStep, also one of the co-founders of OpenTelemetry, and I spend most of my time working on that project.

I'm Daniel Dyla. I work for Dynatrace. I've been a maintainer of OpenTelemetry.js for about two years, and on the governance committee as well for about the same amount of time.

All right, so, just the briefest of overviews of the project and what it consists of. I assume most people know, but just as a reminder: this green blob is your app, and if you would like to observe your app, you install the OpenTelemetry SDK. The OpenTelemetry SDK observes your various libraries, either by having those libraries natively instrument themselves or by installing an instrumentation plugin. You then export that data from the SDK to a collector, either running as a sidecar or as a pool behind a load balancer, and you send that data through OTLP, which is OpenTelemetry's native protocol. The collector itself is a really awesome Swiss Army knife: it takes input from a huge variety of sources, not just OTLP, feeds all of that data into complex processing pipelines that you can write in order to do things like data scrubbing and enrichment and all that fun stuff, and then outputs it again in OTLP or a variety of other formats.

And what makes OpenTelemetry special, beyond just being a very friendly standard that's adopted by many different organizations, is that it takes all these different signals (tracing, metrics, logs, plus more) and integrates them all into a single braid of data that creates a graph a computer analysis can walk, rather than having separate, siloed tracing, metrics, and log stores. And that is the OpenTelemetry project.
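For reference, a minimal sketch of what that pipeline looks like from the application side, using the Go SDK with the OTLP/gRPC trace exporter; the service name and collector endpoint here are illustrative:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func main() {
	ctx := context.Background()

	// Export spans over OTLP/gRPC to a collector, e.g. a sidecar on localhost:4317.
	exp, err := otlptracegrpc.New(ctx, otlptracegrpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}

	// The SDK batches spans and attaches resource attributes describing this service.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("checkout"), // illustrative service name
		)),
	)
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// Instrumentation (native or plugin) then creates spans through the global provider.
	_, span := otel.Tracer("example").Start(ctx, "do-work")
	span.End()
}
```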
Thanks, Ted. All right, so I'm gonna talk about what we've accomplished over the past 12 months. Before I start: how many people in here have contributed to OpenTelemetry in the past 12 months? Can I get a show of hands? All right, pretty good. Thank you to all of you, because none of this would have been possible without everybody contributing.

So, for those that don't know, over the last 12 months we've done a lot of work on logging, which is going to be completed fairly soon here, I think, so we're excited about that. We announced the general availability of tracing in OpenTelemetry roughly a year ago. We implemented something called telemetry schemas, which helps out with instrumentation stability and upgrade paths and things like that. We declared stability on the log data model, which is important for logging SDK stability. We started a new client instrumentation SIG, which is working on building a lot of front-end instrumentation and things like that, and an end-user working group to do research on end users: what are their problems, and how can we solve them?

And as of this morning, we announced metrics general availability, 1.0. It depends language by language; I think we had a handful of languages included in the 1.0, and more will be rolling out in the coming weeks. Yep, and to be clear, it's the release candidates for the 1.0. Yeah, I'm sorry, release candidates. So they're considered effectively production ready. There will not be any changes to them going forward, modulo some horrific bug being discovered in one or two, obviously. Which will not happen. Hopefully. But this is effectively the general availability announcement for OpenTelemetry metrics. This is big, because tracing has been really, really well adopted; I mean, there are enough people in this room, I think it shows how popular this project is. But metrics is part of the original promise of OpenTelemetry. So it's fantastic that now, a few years in, we've actually gone and achieved that goal. It's something we're really proud of as a community, something we wanna celebrate together.
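To make that a bit more concrete, here's a minimal sketch of what the stable metrics API shape looks like in Go. The module layout shifted a bit between the release candidates and the final release, so treat this as illustrative, as are the instrument and attribute names:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()

	// Meters come from the globally registered MeterProvider,
	// which is configured with an exporter much like the tracer setup above.
	meter := otel.Meter("checkout")

	// Instruments are created once and reused on the hot path.
	requests, err := meter.Int64Counter(
		"http.server.request_count",
		metric.WithDescription("Count of handled HTTP requests"),
	)
	if err != nil {
		panic(err)
	}

	// Record a measurement with attributes.
	requests.Add(ctx, 1, metric.WithAttributes(
		attribute.String("http.method", "GET"),
	))
}
```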
So, we promised the road ahead. I see a number of familiar faces here who came to our three-or-four-hour-long OpenTelemetry community meeting yesterday. A big chunk of that community meeting was us drafting, or at least aligning on, the set of things that the community wants to work on going forward. Traces and metrics were the original promise of OpenTelemetry; about a year into the project we started work on logs, which wasn't part of the original promise. But with the metrics work wrapping up, it's an opportune time for us to focus on new things and see where we want the project to expand to. We had people put X's into a Google Doc, and we were in a rush making these slides, so we just pasted the whole table in here.

The most popular item amongst attendees yesterday was logging. Not a surprise: logging's probably about halfway done at this point, and it's already fairly robust for a small set of use cases. That's the one that's garnering the most community attention, and it's the thing we wanna drive to completion next.

Interestingly, the next most popular was profiling: adding some kind of statistical profiling to OpenTelemetry's data collection. There are examples of statistical continuous profiling tools that have been out there in the industry with some amount of success. But I think it would be incredibly powerful for us to be able to capture this data, these effectively stack traces and other low-level performance information, from applications, and then correlate that with distributed traces, with metrics, with logs, and finally, and probably most prominently, with the other resource metadata that we have. Today, within most observability solutions that consume OpenTelemetry data or their own data, there is a bit of a gap or a disconnect between code-level performance information and service- and infrastructure-level performance information. OpenTelemetry has historically focused on that service and infrastructure view. I think there's a huge amount of power to be unlocked by being able to tell people: your service is slow, and we've already traced it down to this one function screwing up. So this is really exciting; there's just so much enthusiasm about it. No work has started here yet, so this will need to get kicked off shortly, but expect this to be a big area of investment going forward for the project.

The third big item is improving our operationalization, documentation, and project confidence: basically improving the onboarding experience for contributors and for users of OpenTelemetry, and giving them more collateral and things they can use to guide them on that journey. There were comments made in our project meeting that in many ways OpenTelemetry is ahead of some other open source projects on this. It tends to be a sore point for most open source projects; people like to contribute code but not so much focus on documentation and other things. But we can still do a lot better here, so we're gonna see some major investments here throughout the next year. There are some other items here that, if you're in the community, or from Daniel's summary of what we've worked on in the past, are gonna look familiar. We'll go into a few of them in depth and then we'll open this up for the Q&A.

So the first is a big update on logging. To be clear, for everyone here who's not already invested in or knowledgeable about OpenTelemetry logging, there are really two paths of work being done in parallel by the same group of people, two paths for data that we expect to see in the future. The first is OpenTelemetry being able to capture data from existing logging sources. There was a donation made to the OpenTelemetry collector last year called Stanza. It's a very high performance logging agent, effectively, that can read and tail log files on disk, parse and process them, and then submit them to an endpoint. And so we see this project continuing to drive that functionality of picking up logs that people are already authoring from existing sources, where traditionally they're just dumped as text files on disk and read from there.
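As a concrete sketch of that first path: this is roughly what tailing and parsing an existing log file looks like with the collector's filelog receiver, which grew out of the Stanza donation. The file path, regex, and endpoint are illustrative, and the exact field names have varied a bit across collector versions:

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    operators:
      # Parse lines like: 2022-05-18T10:04:05Z ERROR something broke
      - type: regex_parser
        regex: '^(?P<time>\S+) (?P<sev>[A-Z]+) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%SZ'
        severity:
          parse_from: attributes.sev

exporters:
  otlp:
    endpoint: otel-gateway:4317

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp]
```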
In parallel, we also want to have a more modern path for logs. There are a lot of challenges with logging today, and a lot of these stem from the fact that logs are written with a variety of libraries and SDKs and end up in human-readable text files, often very inconsistently. A few years ago, when we started surveying this space, we talked to logging vendors and large companies and the cloud vendors about their challenges with logging internally, and a few things kept popping up. Poor performance being one, right? It's very expensive to constantly process these human-readable text files. But also inconsistency: because you're using different logging libraries and things to generate these logs, timestamps might be marked differently, and you might have different annotations, different pieces of data, on those logs. And so we'd like to have a more strongly typed system where we can actually bring in the logs in, perhaps, a binary format or something, and process them very quickly, where they're all guaranteed to have the same metadata.

There's actually been a lot of work done on this effort. There was a conversation, I wasn't actually party to it, that Alolita and some others in the OpenTelemetry logging group had with Elastic about merging Elastic Common Schema into OpenTelemetry, and there's agreement on that now. So the logging annotation format, and just the general annotation format that we have in OpenTelemetry, will be merged with Elastic Common Schema, which is really, really exciting because there's a lot of great collateral there. And this will help us guarantee that all logs, no matter which path they're processed through, will have the same set of annotations, semantic conventions, everything else on them, so they can be parsed consistently, whether it's through that text-parsing path or through that machine-readable, strongly typed path. There are also things we need to do in logging, like extending our collector processing, the events API, framework integrations, everything else. Obviously those will be worked on, but that's slightly more mechanical than the work I already described.

Okay, I'm gonna speed up a little through the others because we wanna save a lot of time for the Q&A. There are gonna be investments over the next year, and basically in perpetuity, to improve the ease of use of OpenTelemetry. This includes building more integrations that work out of the box. This includes automatically turning on instrumentation in things like the collector: making it so that in most scenarios people can download one of our artifacts from the OSS repos, get started with it, and it goes and instruments their app and their infrastructure without any additional configuration. So we wanna make some big investments there, and some are already happening.
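Parts of that zero-configuration story exist today; for instance, attaching the Java auto-instrumentation agent to an unmodified app looks roughly like this (the jar path and endpoint are illustrative):

```sh
# Standard OTel environment variables point the agent at a collector.
export OTEL_SERVICE_NAME=my-app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# The published javaagent instruments supported libraries automatically.
java -javaagent:./opentelemetry-javaagent.jar -jar my-app.jar
```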
We're also gonna invest more in the experience of documentation, improving usability, and simplifying configuration. Finally, there's work that's been kicked off on building a control plane for OpenTelemetry. This is basically a messaging and transmission system that will allow the OpenTelemetry collector, and probably the SDKs and language agents, to have their config changed remotely, live. Obviously, for some companies or people using this, if you're already up to speed with Ansible and everything else, you're probably saying, hey, I don't need this, right? This works fine today. But there are a lot of organizations out there who rely on OpenTelemetry who have challenges rapidly deploying things, and it'd be nice for them to be able to update their configurations, and update how they use OpenTelemetry, very quickly and manage that centrally. So we're gonna be adding that.

Another big area of investment: client instrumentation. OpenTelemetry has historically focused very much on capturing data from back-end services and back-end infrastructure. There's an obvious opportunity here to also capture information from things like front-end web pages, Android applications, iOS applications, and others, to extend observability to the full stack. To be clear, the OpenTelemetry JavaScript SDKs that Dan works on have technically supported that since day one, but we haven't had a strong specification or standards for this. And so we want to bring the same level of commitment and standardization that we have in all of our back-end solutions, where the same metrics are captured across different languages and the same traces are captured in the same way. We wanna bring that to front-end applications as well. So there are already investments here; expect to see that continuing.

I already talked about profiling. We expect to see that as a new signal type; it would be the fourth sort of core signal type inside of OpenTelemetry. In terms of new signal sources, there are also some investments that have been kicked off in the eBPF space. This would capture, to be clear, traces, metrics, logs, and profiles. It's still the same signals, but with eBPF you can pull out signals in ways, or from sources, that you haven't been able to before. One of the first examples this group has been investigating is pulling in a lot of network telemetry. So today, if you use OTel or other tracing solutions and you have two services, you capture a trace between them and you get a span that captures the RPCs that go between them. That span today gives you no way to dig into how long was spent on application processing, how long was spent on the network, or what part of that span represents something like a DNS lookup and waiting for that. With eBPF, we can add that really low-level system information into that application telemetry and correlate it. And you could see, for example, in every span, whether it was network latency that made it long, or actual application slowness. There are plenty of other interesting uses of eBPF; that's probably just the first one that's been proposed and sort of what you'll see initially.
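To give a feel for that span breakdown: here's a rough manual approximation you can do in Go today with net/http/httptrace, recording DNS and connect timing as span events. This sketch is purely illustrative; the point of the eBPF work is to surface this kind of detail without writing any such code:

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptrace"

	"go.opentelemetry.io/otel"
)

func tracedGet(ctx context.Context, url string) error {
	// Start an ordinary application span.
	ctx, span := otel.Tracer("example").Start(ctx, "HTTP GET")
	defer span.End()

	// Hook low-level client events and record them on the span, so you can
	// see how much of the span was DNS or connection setup versus app time.
	ct := &httptrace.ClientTrace{
		DNSStart:    func(httptrace.DNSStartInfo) { span.AddEvent("dns.start") },
		DNSDone:     func(httptrace.DNSDoneInfo) { span.AddEvent("dns.done") },
		ConnectDone: func(network, addr string, err error) { span.AddEvent("connect.done") },
	}
	req, err := http.NewRequestWithContext(httptrace.WithClientTrace(ctx, ct), http.MethodGet, url, nil)
	if err != nil {
		return err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}
```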
Next: there will be investments in a demo application for OpenTelemetry. This will be useful for showing off OpenTelemetry, but also very useful for automated testing, so we can do full end-to-end tests and performance testing of the OpenTelemetry components as a community. There's work that got kicked off a few weeks ago. They're using the microservices demo, also called Hipster Shop, that originally came out of Google, and they'll be working with that going forward. I think they're gonna add some more things to it.

So if you're not involved in the OTel community, if you use OTel, you're interested in OTel, and you're here thinking, hey, how do I get more engaged with this? There are a number of different ways. If you have a specific interest, say you're a Java developer and you're really interested in observability for Java, we have language-specific special interest groups and signal-specific special interest groups that you can join. All of our SIGs meet weekly on Zoom, and we're on the CNCF Slack instance, where we do our asynchronous communication. So if you have a clear interest in a specific topic, that is the best way to get involved. And more generally, if you just want to contribute and you have some spare cycles, another great way to get started would be helping with the documentation on our website; there's a QR code here that you can scan, and that's an area we want to invest in a lot next year, as I mentioned.

Finally, we are interested in getting a lot more end-user feedback. OpenTelemetry is used by a large consortium of companies: Splunk and Dynatrace and LightStep and New Relic and Microsoft and Google and various others all contribute, and their customers use it. And so we want to hear more from end users as an open-source community, because typically the feedback gets funneled to one of those firms, and what we hear there can't always be shared as broadly. So if you're an end user, one way you can really help the community is by providing us feedback directly. We've got an end-user working group that meets, I think, every week or every two weeks; it's on the community calendar. That is a great place to join and just express your feedback. We also have some folks here, like Shar and some others, who are gonna be collecting feedback from end users. So if you want to give feedback, please stay at the end, and we can do various types of feedback sessions, including video interviews, that'll be used amongst all the members of the community to improve OpenTelemetry.

So with that, we've talked about the past of OpenTelemetry and the future. We have some time now for the maintainer Q&A, which was promised. So if you are a maintainer of OpenTelemetry, if you're a maintainer of one of the SIGs, please come forward. We have a hand microphone we can share around. We'll just take a minute or two for the maintainers to come up, and then we will start taking questions from the audience. So I see Anthony, and I think Shar, do you wanna come up? And I know we have a few others here. At least I thought we did. We definitely do. Come on up. Maybe it'll just be the five of us.

All right, so what we can do is, we have the one microphone. Should we let them quickly introduce themselves too? Yes, that's a great idea. Turn this bad boy on. This is a different microphone; it's not the one marked "on." It's not a Shure, yeah. Oh, I think we're gonna turn it on back there. There you go, excellent. Anthony, there you go.

Hello? Hello. Hi, Anthony Mirabella. I'm a maintainer of the Go SDK, I also work with the collector, and I work at AWS on our distribution of OpenTelemetry. Hello, I'm Shar Crudin. I'm a director of engineering at New Relic, and in the OpenTelemetry community I organize the end-user working group; Morgan's been talking about that, so hello. My name is Amir. I'm a maintainer on the JavaScript SDK.

So we've only got one hand mic. What we'll do is, if you wanna ask a question, please raise your hand. We'll point at you, then just say it pretty loud, and we'll repeat it into the microphones for the live stream; we'll use the hand mic amongst these three, because we already have our headsets, and we'll answer it. So, just starting right up front.

Yeah, probably; we haven't defined it yet. Let me repeat the question, thank you. So the question was: for the profiles, are they gonna be represented effectively as stack traces that are regularly captured? And the second part was: for this profiling work, with an error log like we get today, can we correlate those, or feed those back in? So for the first half: we haven't actually started work on profiling. Yeah, it was literally yesterday that the community sort of decided to make a commitment to it, so I don't know. At the same time, I used to work on distributed profiling at my last job, so yes, I fully expect it'll basically be regularly sampled stack traces that are coming out across the set of services. That correlation you mentioned was not one I had worked on previously, but it seems like an obvious next step that we should include, where, yeah, if you have a profile that happens to be captured when there was some catastrophic error, you would perform those correlations. In fact, I suppose we would always correlate profiles and other data.

Okay, can we get hands? I see one there, yeah. So the question was: eBPF is polarizing, which I didn't know, but when would we use it, when would we not use it, and what would we use it for?
Yeah, I mean, I can speak to that somewhat; other people can fill in. So the advantage of eBPF is that you don't have to write code or instrumentation, right? It can be automatically injected. So that's a great way to just start capturing a lot of data with very little work. The problem eBPF faces is that it is often injected at such a low level in the stack that it can be hard to access some of the other information you would want in order to do these correlations. For example, with network traffic, you are getting in at layer four, low-level network packet stuff. If you want to attach that information to a trace, you would want to gain access to the trace ID. But the trace ID is held in a header that's potentially encrypted. Almost certainly encrypted. Yeah, hopefully it's encrypted. Right, so that is just a tricky problem that there is not an off-the-shelf solution for. So that's a difficulty.

Likewise, there's some really interesting work around using eBPF for automatic instrumentation of language runtimes. There's one that some people wrote for Go recently that looks really cool. But you have this problem where, if you're also doing manual instrumentation or context propagation, all the normal existing stuff you would be doing, whether or not that eBPF instrumentation can interact easily and effectively with other forms of instrumentation is kind of an open problem. I don't know if you have any thoughts on that. Yeah, I think one of the challenges there is that eBPF, from the application's perspective, is read-only. It can get information about what's going on in the application, but it can't change the context the application is using, so context can't get propagated downstream tied to a span that was created by eBPF. So it's a great way to get started with some visibility into an application that's a black box to you; if you've got a Go binary that you're given and you can't change the source, that may be an option. But it's not going to be a panacea, I don't think. Yeah, so in short: eBPF is really interesting, but not necessarily a magic bullet that's going to totally replace all the rest of it. It's another data source. Yeah, a tool in the box.

All right, can we get hands? More questions? Let's see, hand back there, go ahead. So, it depends. The question was: they recently switched from OpenCensus metrics to OpenTelemetry metrics, it's been an interesting experience, and is there going to be a v2 of OpenTelemetry metrics? May I ask, when did you perform the integration, and on which version of Java? 0.12 of Java; I don't know when that was. Okay, so the v2 you're looking for might effectively be what was published today. I don't know the timeline of Java 1.12, but I think 1.14 was recently released, and that has the final metrics API and SDK implementation in it. In terms of an actual version two after what we announced today, the answer is: not for a long time. Yeah, ideally. Yeah, ideally, at least for the API, hopefully not ever. Maybe the SDK, but...

Can we get hands for more questions? I see one there, yeah. Yep. So I'll repeat the question, and then, Anthony, you're probably best suited to answer it. The question was about the OpenTelemetry collector: some parts are considered GA, generally available, and there are other parts that are not. One of the parts that isn't is persistent storage, so actually buffering things on disk, effectively, before they're sent out. As a result, there are sometimes some network challenges; I'm guessing some things get buffered, but some of it gets dropped when too much data tries to go out at the same time. What is the roadmap for improving that? So, Anthony, you work on the collector, do you want to answer that? Yeah, so there is a persistent storage component that is intended to store the internal representation that the collector uses to an object store; I think we have disk-backed implementations currently. As you mentioned, that is in kind of an experimental state. One of the things that would be very helpful for us as collector maintainers would be to get usage reports from that. So if people will go out and deploy that in environments where perhaps some data loss might be tolerable, but you can still get feedback, I think that would be very helpful for us. Because as maintainers we can build it, we can run it in some environments, but we don't have the opportunity to get the breadth of experience that comes from the community going out there, using it, and telling us what works and what doesn't.
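For anyone who wants to try it and report back, the experimental setup looks roughly like this: a file_storage extension backing the exporter's sending queue. The directory and endpoint are illustrative, and the details may change while this is experimental:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: backend:4317
    sending_queue:
      enabled: true
      # Back the queue with the storage extension so buffered data
      # survives collector restarts and network outages.
      storage: file_storage

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```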
All right, can we get hands for more questions? I see one in the back corner there. So the question was: you're from a security background, and is there already a format or process for defining security alerts that can be sent from OpenTelemetry? And just a quick question back: when you say security alerts, do you mean specifically a new type of signal, or logs that have a security tag, or do you have anything in mind? Okay, so it's metadata that could be added to a trace, for example, that would define a security exception. I don't know if we have anyone who... Yeah, so the answer is not yet, but that does sound like a great working group to form, to define some semantic conventions there. I would wanna double-check whether Elastic Common Schema already has anything in it related to security; if so, we would wanna use that, because that'll just get merged in as part of ECS. Yeah, that was the one thing I was going to mention: Elastic Common Schema, I know, does have a lot of SIEM elements in its schema. So as we integrate that with our semantic conventions, that will be the tool for tagging the data.

All right, any other questions that we can answer? There? Yeah. So the question was that they're getting questions from their customers about security certifications, or just privacy and security questions about using OpenTelemetry. Probably the same questions customers would ask about a third-party vendor agent or something, I'm guessing; I mean, I've gone through these at Splunk as well. And is there a plan to address this as a community, with maybe a document or something that has answers to the most common questions? You know, interestingly, no one has actually brought this up before, as far as I know. It's been brought to me by my own customers, but I don't know if anyone in the community has raised their hand and said we should put together something comprehensive so people can point at it. That's a really good idea. We should do that.

Yes, okay, got it. So one area where we do address this is that we try to ensure that any instrumentation we are providing to people as a project does not include any PII by default. So that's the thing that we try to take care of. It's come up in the collector before. Anthony, you might be more familiar with it. Yeah, stripping it. There are processors in the collector that are designed for redaction of attributes, removal of attributes, processing of attributes as they flow along, generally. I think that same principle can, and perhaps should, be applied at the SDK level as well, in terms of span processors. I think we have some limitations in terms of span end, but finding ways to ensure that we have a way of saying "I don't ever want this attribute to leave the process" could be valuable. Yeah, quick question, Anthony, because I know this has come up before: people have proposed having, say, a regex that looks for something that sort of looks like a credit card number and just blanks it out in a processor. Yeah, the redaction processor in the collector has that capability. Oh cool, awesome. So there you go.
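For reference, a sketch of that: the collector's redaction processor with an allow list of attribute keys and a blocked-values regex. The keys and the card-number regex here are illustrative, and the processor gets wired into a pipeline's processors list like any other processor:

```yaml
processors:
  redaction:
    # Drop any span attribute whose key is not explicitly allowed.
    allow_all_keys: false
    allowed_keys:
      - http.method
      - http.status_code
    # Mask attribute values matching these regexes, e.g. card-like numbers.
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?"
```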
Another area that I don't think we have put work into, but that would be valuable, is baggage. You do have this potential leak of your baggage maybe having something sensitive in it that gets transmitted to some downstream service you don't control. Unfortunately, there are not strong concepts of trust boundaries in the networks that we use right now. So that's something you may want to think about, in terms of putting a proxy in or something like that to scrub all of that information.

All right, any more questions that we can answer, going once, going twice? All right, well, thank you very much for attending, everybody. We'll be here for a few minutes to answer questions right down in front.