It's beginning time. Oh boy, okay, I think the doors are closing here. I will go ahead and start once a few more people get a chance to sit down. Hello everyone. Hello to my coworkers sitting here in the front row to make sure that I'm extremely embarrassed if I crash and burn in this talk. And hello to everyone else. This mic keeps on slipping down but I think it's doing okay. Hi, my name's Phillip. I am an OpenTelemetry maintainer and I also work for a company called Honeycomb. We do observability and tracing. So to the surprise of nobody, I'm giving a talk about how you can trace with OpenTelemetry. Specifically for monoliths, though, which I think is something that doesn't come up very often in the context of distributed tracing. Usually people talk about it when they're referring to microservices and big old distributed systems and stuff like that. But I don't think that's really the reality that a lot of people live in. And in particular, I wanna begin by saying that I think monoliths versus microservices is a pretty meaningless distinction. I think the messy reality that we all live in is that we're kinda living in some sort of in-between world between microservices and monoliths and distributed monoliths and all that. And in particular, over the past couple of years that I've been working for Honeycomb and helping people adopt tracing and observability for their systems, I've certainly noticed a few patterns. A common monolith application, yes, of course it's running in a single process, but it's typically running behind a load balancer. Maybe there's a database server too that it's talking to. Maybe there's a message queue involved. Maybe it's calling an authentication server somewhere. There's some third-party vendors. Maybe you've actually split out some of that monolith into separate services and created them as microservices. At that point, you have a distributed system already. In fact, I would wager that a true monolith that just talks to a database, when that database is on a separate system, is already a distributed system. It may not be the most distributed of systems, but it is still one. And by definition, it can benefit from distributed tracing. Now, it may not necessarily benefit as much as the largest spread of thousands of services or something like that, but there's still quite a bit of benefit to extract from it. Separately, and this is something I've observed plenty of times in my own career, microservices grow into monoliths themselves. It's pretty easy to just keep on adding code to stuff. And eventually you go, oh geez, this code's kind of a mess. We should modularize this thing. And before you know it, you end up with a bunch of distributed monoliths, because it's a lot easier to just modularize code that's inside of an existing process than it is to factor it out into its own separate system and so on. You may actually go and do that, but you may not necessarily do that in some other cases. And that's perfectly okay. This is just the natural evolution of a lot of systems. But I would say that regardless of where you are, whether you started in a pure microservices environment and kind of ended up with some form of distributed monolith, or you had a monolith that you partially broke out into services but never really got around to fully microservicing, you already need distributed tracing, even if you're not adding it yet.
And that's to say that there's a lot of problems that occur in the context of systems that are outside the actual app code. Somebody may have had the bright idea to add gRPC to your system and now your load balancing is totally screwed and nobody knows what's going on. Or maybe somebody got a little too fancy with a SQL query and now it's taking a really long time and you're not really sure if it's your ORM or if it's the database itself. Sometimes there's bugs in databases, maybe it's something to do with your network transport layer. These things are also really, really hard to diagnose. And very often you don't even really know where to look. Sometimes, if somebody's really experienced, they kind of have a clue as to where something could be going wrong, but your average developer is gonna be like, okay, this is a unique issue, I've not seen this thing before, what do I even do? And so even with the most basic of distributed tracing, in just a monolith talking to a database, you can ask all kinds of interesting questions. Like how many database calls did this request take? Which database calls are the slowest? Which SQL statements are the ones that are not doing too hot right now? What's the average latency of my requests? Is this external API that I'm calling the thing that's actually causing a slowdown or not? And you can start slicing and dicing that by a lot of very interesting characteristics. And this is all just at the edges of your system; I've not even really gotten into adding tracing to the monolith itself. But when you can start to answer these questions, you can unlock a lot of interesting behavior and realize that your application is doing a lot of things that it shouldn't be doing, and diagnose a lot of problems that your users are probably already experiencing today. So you can rein in the edges of your systems with this, and in particular, this is automatic instrumentation. You can also do a whole lot more. And so one example I wanna bring in at the start of this journey is a company called Intercom, who's one of our customers. There's probably some of you who are using Intercom, and there may be somebody in the audience who works for Intercom. Intercom is sort of a support chat system, a thing that you can embed as a part of your product to have a good support system and build product tours and that kind of stuff. Their app was a big old monolith and a handful of smaller services that they kinda broke out here and there, but they hadn't really fully gone the whole microservices route and they didn't really have a desire to do that either. They had a lot of structured logs, but they didn't have good tracing, and so they decided to adopt it. And so they started just by instrumenting API calls and database calls, and eventually got into instrumenting their services themselves, and they wrote a wonderful blog post, which you can Google, about how it really transformed their internal engineering. In fact, they didn't even necessarily use tracing just to diagnose the problems that their users were reporting, but also to onboard new engineers and understand just how this thing was working in the first place. When a user comes in, what's the journey that this request actually takes? Which other services does it hit? Which database does it hit? What are the paths within the monolith itself that it actually goes through?
And distributed tracing actually ended up becoming a way to just better understand their systems in the first place so they can improve them more effectively, rather than just sort of going, well, I know what this function does and I kinda know what that function does, but I don't really know if one calls the other in this big old soup of code that we've got going on here. And in fact, their journey here took about a year. They iterated a whole lot, so it wasn't like one day they suddenly had a whole bunch of amazing instrumentation. It was a little bit more and more every single week, and they eventually got to the point where now they have really, really rich tracing instrumentation. And so I wanna talk about that a little bit. You can call it tracing for a monolith, but it's really just tracing for a single service. In fact, the principles that apply to tracing a distributed service are the same ones you would apply to pretty much any other service, whether it's a small service you're instrumenting because you wanna onboard tracing for the first time, or you're fully bought into the idea of tracing and you wanna trace that very large monolith you have going on there. I'll call out a few examples along the way. So if there's one thing that I want you to take away from this talk, if you've not done tracing before, it's that traces are just better structured logs. In fact, a trace is actually a collection of these things that are called spans. And a span is just a structured log that has some IDs on it. And the collection of those IDs allows you to define an order of operations and a parent-child hierarchy in your system, so you can understand that, okay, these are the operations that occur within a given request and this is the order in which they occur. And so there's a name, because everything needs to have a name. There's a span ID, which represents the operation that is actually being performed. That span may represent the entirety of a request, or it may just represent a single function call. There's a parent span ID, so it knows what came before it and you can define that order of operations. Key-value pairs of data, this is the log part, right? Basically there's a name and then a value, a name and then a value, and you can add as many of those as you want. And then there's a list of what are called span events and span links. These are not strictly necessary per se, but they can often be really helpful in modeling certain parts of your system. Span events in particular represent a moment in time when something occurred within the context of an operation, where you want that timestamp and maybe some other metadata associated with the thing that happened. This is a really easy thing that you can just attach to a span. So it's like structured logs, but it's better. It's more powerful. It's got a lot of goodies baked in, and OpenTelemetry defines all of these things across 11 different languages that are fully supported. So it's something that you can frankly get started with today. I like to show people this visual representation, which again actually came from Intercom Engineering. This is from the blog post that I mentioned earlier. On the left-hand side you can see a big incomprehensible soup of text. This is actually their structured logs before they added traces. And so there's a lot of actually really useful information in there.
I'm not gonna try to zoom in so you can actually see what's in there. It's pretty incomprehensible. In fact, you might notice there's some colors in there. They tried to color code one thing and another thing so they could kind of visually distinguish, hey, there was this operation and this other operation and stuff like that. That's a losing game. That's really, really hard for you to parse. Imagine a user is reporting something wrong and you have to just stare into that thing for minutes, hours maybe, and try to understand what the heck is happening. Now, when they transitioned over to tracing, on the right-hand side, this is a trace within Honeycomb, but you can get the same visualization in pretty much any tool that supports tracing. You can see there's a hierarchy there and operations that are actually nested. And it's this order of operations and hierarchy that tracing builds in for you that really helps with understandability of the system. So it's not just that function A calls function B; rather, function A and function B are operating as part of a hierarchy that's actually contained within this other thing that's happening. And so there's a whole bunch of nice little visual tricks that you can do with a tracing tool, like understanding, okay, this sub-hierarchy is uninteresting to me, so I can just collapse it and move on to the next thing. And you can't do that when you have this flat list of structured logs that you're looking at. So that's kind of the conceptual part. How to trace a single service, as I would like to call it, is simple but not necessarily easy. It's simple in that there's very few steps that you actually need to take, but a lot of those steps need to be repeated in the context of your application, and the information that you end up capturing in your traces is gonna vary from one part of your application to another, from system to system, and from team to team. This is probably one of the easiest things to actually Google: let's say you have a Java service, right? Get started, OpenTelemetry, Java. We have a really thorough doc site that tells you how to add automatic instrumentation. And this is kind of the first step that you wanna take whenever you wanna start instrumenting a single service: a couple of different languages support adding that instrumentation without modifying the service itself. So in the case of Java, there's a Java agent that you can load alongside your application that will instrument all incoming requests and outgoing calls, things like calling a Kafka cluster, pretty much anything where there's a library that you use to interact with another part of a system. It will instrument that and create spans for you automatically. This is true for Java, this is true for Python, this is true for .NET, this is true for Node, and this is true now also for Go services. And if you don't want to use an agent, or if you can't use an agent for whatever reason, maybe the language doesn't support it right now, there are instrumentation libraries that accomplish the same thing. It's a little bit more work, but basically you just initialize OpenTelemetry with these libraries and you sort of register, like, hey, we're doing a Spring Boot application, so we're just gonna instrument Spring Boot requests, and that's it. And you don't have to go and figure out how you're gonna parse this header data and that kind of stuff. It just sort of takes care of that for you.
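To make that concrete, here is roughly what attaching the Java agent looks like. Treat this as a sketch rather than a copy-paste recipe: the jar path, service name, and collector endpoint below are placeholders you'd swap for your own.

```
# Illustrative launch command; paths, service name, and endpoint are placeholders.
java -javaagent:./opentelemetry-javaagent.jar \
     -Dotel.service.name=my-monolith \
     -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
     -jar my-monolith.jar
```

No application code changes are involved; the agent hooks the libraries it recognizes, HTTP servers, database drivers, Kafka clients, and so on, and emits spans for them.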
And so critically, what this auto-instrumentation and instrumentation library stuff does is it creates the structure of your entire application, or the skeleton, if you will. So you can now start answering those questions from earlier, like which requests have the slow database calls, or if I'm calling this external service, is that external service the thing that's slowing down? It'll tell you on a service-by-service basis which things are slow. And it'll allow you, with an observability tool, to actually calculate a lot of useful information, like average request latency and average error rates and that kind of stuff, things that you would normally try to get from a metrics-based solution you can actually derive from these traces directly, which I think is really powerful because that allows you to stick with tracing as your primary way to do instrumentation. But then once you do that, you're gonna wanna get a lot more useful information out. So it's not just slow requests that you care about; there are particular users that are involved with those requests, or, let's say you have an IoT-based system, there are particular device IDs that might be experiencing, if you can imagine an IoT device experiencing something, slow database requests or bad behavior or something like that. And you wanna be able to start slicing your data based off of that application context rather than just what's going on at the edges of your system. So I like to think of it as layer one and layer two problems. Layer one problems are, at the edges of my system, is everything doing okay; layer two is, now I wanna dig into my actual application. And the way you do that is by using the OpenTelemetry API, which again is very simple to add, or at least start using, right? Once you add automatic instrumentation, what that will do is it will initialize OpenTelemetry in that process. And so then you can use the OpenTelemetry SDK to access those APIs and you can start creating spans. And the mental model that you wanna follow is: whenever you think, oh yeah, we should log this thing, that's sort of a sign that you should either create a span or add data to an existing span. It's kind of up to you if you want to represent an operation as a span itself or just capture key-value pairs on an existing span. This is where it's simple but not necessarily easy, because this is gonna be getting into the domain of your application, right? You wanna create a span at the start of a function or a method; you can think of it as tracking the operation itself. And within that operation, you can attach as much arbitrary metadata as you like. In fact, you can use span events to say, oh crap, this thing happened right now, and we're just gonna add that data to the span and then end it, and then ta-da, we're good to go. That's gonna be more challenging for a large service because you may not necessarily own the part of the code that you wanna instrument. And so that's why I'm continuing to say the words simple but not necessarily easy, especially when you want to instrument deeply such that you can use that instrumentation to understand your whole system. That's when you're gonna need to get a lot more team members involved instead of just willy-nilly throwing spans around and hoping things work out.
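Here is a minimal sketch of what that looks like with the OpenTelemetry Java API. The tracer name, span name, and attribute keys are invented for illustration; the calls themselves (spanBuilder, setAttribute, addEvent, Span.current) are the standard API surface.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class OrderProcessor {
    private static final Tracer tracer =
        GlobalOpenTelemetry.getTracer("com.example.orders");

    void processOrder(String orderId, String userId) {
        // Option 1: add data to the span that auto-instrumentation already
        // started for this request (e.g. the incoming HTTP request span).
        Span.current().setAttribute("app.user_id", userId);

        // Option 2: wrap this operation in its own span. It automatically
        // becomes a child of whatever span is current on this thread.
        Span span = tracer.spanBuilder("process-order").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // Key-value pairs: the "structured log" part of the span.
            span.setAttribute("app.order_id", orderId);

            // A span event: a timestamped "this just happened" inside the operation.
            span.addEvent("inventory.reserved",
                Attributes.of(AttributeKey.longKey("app.items_reserved"), 3L));
        } finally {
            span.end(); // ending the span records its duration
        }
    }
}
```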
And so there's sort of a secondary, or rather additional, way to do this, and it kind of depends on the size of the service. So hopefully, if you have a big monolith, it is at least organized into different modules, or, if you're following the Martin Fowler kind of spiel about bounded contexts, you've defined organizational boundaries within your application. Those boundaries may map to specific teams, or they just might be domain organization. But regardless, there's a way that you can also represent that in your telemetry when you're creating spans within that single service. The easiest way to do that is to add an attribute to a span that you create. This is metadata that basically defines that name, and that name can come from configuration. It can be defined programmatically; for example, if you're using .NET, you can just use the current name of the assembly, and if you organize your modules into assemblies, which a lot of .NET applications do, this is a great way to do it. Or you can even just define this as a required parameter whenever somebody wants to create a span in a trace: you can just say, okay, what is the bounded context that we're in, or what's the name of the module that we're in? And the reason why you wanna do this is because it allows you to slice this data based off of the application context that you're in. This is very analogous to services in distributed tracing. With full distributed tracing, assuming you're in a microservices world, every span has what's called a service name on it. It's actually defined as what's called a resource, but I won't get into that. Basically, whenever you create spans inside of a process, they have a thing called service.name. And so that allows you to say, okay, which of my services are the slow ones, and you can stack rank services by latency and errors and that kind of stuff. That's great if you have lots of different services, but when you're inside of a single service, that big chunky boy that you got there has a lot of different modules inside of it. Well, organizing by service isn't necessarily the most meaningful thing in the world because, well, it's all in the same service. So how do you do that? Well, this is sort of recreating that thing. And what's great is the OpenTelemetry API makes this extremely easy to do, and you can then use certain language constructs such as the assembly name for .NET, or the package name in Java, or even just a file that gets loaded at that point in time and gets configured externally. Regardless, you then have an organizational attribute on every single span that you can start to slice and dice your data by. And so this is really helpful when your monolith has its boundaries defined by different teams, and you can actually even have that organizational context be a team name or a team ID. And so then you can start to use the organization of your own company to your advantage when you start looking at things. So you can say, oh, well, is my team the one that owns this or is it another team? We see that quite a bit actually with some of our customers, where they just attach literally the name of the team to the instrumentation that they're capturing, and then they're able to very quickly hunt down, okay, this thing that's slow right now that I'm observing belongs to this team, I'm gonna go and talk to them. If you don't have that ability, it can be really hard to hunt down who the right person is for something.
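One way to encode that, sketched in Java under made-up names: a small helper that requires every caller to say which module and team it belongs to, and stamps those as attributes on each span. The "app.module" and "app.team" attribute keys are a convention invented here, not an OpenTelemetry standard.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

public final class ModuleTracing {
    private ModuleTracing() {}

    // Require callers to declare which module (bounded context) and team owns
    // the span, so every span in the monolith can be sliced by ownership.
    public static Span startSpan(String module, String team, String operation) {
        Tracer tracer = GlobalOpenTelemetry.getTracer(module);
        Span span = tracer.spanBuilder(operation).startSpan();
        span.setAttribute("app.module", module);
        span.setAttribute("app.team", team);
        return span;
    }
}

// Usage: every span from the billing module is now attributable to its owners.
// Span span = ModuleTracing.startSpan("billing", "payments-team", "charge-card");
```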
And so encoding that into your instrumentation is actually fairly easy to accomplish, but it really, really pays off in the long run. And so I wanna show you a visualization of an in-process span. This is actually from Honeycomb itself. This is a feature that we have called Query Assistant. So we have this big service, it's called Poodle. It's named after a dog, I don't know why, but it's named after a dog. This Poodle service does a bunch of stuff. It basically is the thing that powers our entire UI. There's all kinds of different traces that are shot off from all kinds of different user interactions. And so it basically is a monolith. So one specific piece of that is this feature that I worked on where you can type a little thing into a box, hit a button, and then it goes and executes a query for someone. So for that one feature, maybe somebody new is coming in and they're like, oh, there was a bug in this feature. How do I know how this thing works? How many operations does it actually do? Well, the answer is it does 48. And any of those 48 steps could potentially go wrong or something could be slow. In fact, within those 48 steps, we make another network call. We call into another subsystem. We get a bunch of schema information from our database and stuff like that. This is actually kind of a complex feature, and there's a lot of ways that it could be subtly wrong, or we could introduce a new bug or something like that. And God help somebody who's new on the team if they did not have this instrumentation in the first place to understand where something is slow or where something might be erroring out. Or even if the wrong decision is being made somewhere within that soup of function calls that we're doing. And so this is extremely helpful for onboarding, but then also for really understanding where your latency and your errors lie. And so in this case, we had an operation that actually took, what, eight seconds. That's actually quite a bit, and about four of those seconds come from the middle of this thing. And so that already gives me an indicator of, okay, half the latency is coming from this one spot. Maybe we're not making the right API call here, or maybe we could actually improve the way that we're making this API call. And this is immediately visually evident from this sort of thing. And all of this is from, frankly, not a whole lot of effort to instrument manually with OpenTelemetry APIs. So I've talked a lot about instrumenting with spans and traces, but chances are you probably have a bunch of logs in your monolith, and you shouldn't have to throw them away. We worked with one customer in particular who really wanted to stress this. They had a big monolith. The company's name is Loan Market; they're based in Australia. They were completely sold on the idea of distributed tracing, adding traces. They're like, yeah, we get it. We get it. This is what we're gonna do. But we have a big legacy of logs that have been useful for us in the past, and if you tell us that we can't use that, then you're not gonna get our business. I mean, that's fair, and OpenTelemetry has recognized this fact. This has been the case since the inception of OpenTelemetry, but now I think it's a really, really, really good time to start adopting because of a concept called the OpenTelemetry logs bridge. It used to be called the logs API.
It got renamed to the logs bridge to better represent what this thing does. So the idea is that for a given logging framework, let's say, again, you've got a big old .NET application, it's using the Microsoft .NET logging framework, which is kind of the standard one that most people use to construct structured logs. The way it works is you initialize tracing within that same process. That can come from automatic instrumentation or you can just do that manually on your own; it's kind of up to you. But then in memory, there's this object called trace context. And trace context is the thing that contains that trace ID and that span ID, and it sort of tracks what the operation is at this moment in time. That same object is then serialized and deserialized at process boundaries, and that's what creates the trace for you as you create these structured logs that we call spans within a process. And so what the logs bridge does is it actually intercepts that log as you call it, and it wraps the body of that log with the span ID and the trace ID. So it's kind of creating a new span, almost. It's not quite a span, because the duration part is missing, and there's a couple of other things that are missing too. But critically, the body of that structured log that you may have created is now directly correlated with the trace that you have. And so that means you may not have the ideal world of really rich tracing instrumentation going through your application, but you're kind of halfway there. And that automatic correlation is what's really, really important, because then you can start saying, okay, I know I have this useful application context in my logs, I just need to know what was the request that came in, or what was this other information that exists inside of my spans. And maybe I can even correlate that with things that are happening downstream, like database calls that are going on. And so this logs bridge, I think, is gonna be an extremely important concept for the majority of legacy applications to start onboarding onto OpenTelemetry. I'm here to say that now is the time to start looking to see if your logging framework is supported, because this is stable. This is stable in the spec, this is stable in the transport layer. And that means that framework support is starting to onboard across all the different languages right now. And so what you do with that, and this is what our customer Loan Market did, is use this as a basis for slowly and gradually migrating stuff over to traces, or sorry, spans I should say, but on their own time, when they felt like it. The value proposition of observability was not to throw everything away and start all over again, but this time with distributed tracing because everybody says it's better. No, it was to keep sending your logs where you're sending them. In fact, in this case, they would eventually send them over to Honeycomb or another tool alongside their traces, but they didn't have to interrupt that workflow. They didn't have to stop what their devs were already doing to diagnose problems. And they could figure out their own migration story.
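As a rough sketch of what that correlation looks like in practice, here is a Java version (the names are invented): an ordinary SLF4J log call made while a span is active. Assuming a logs bridge is wired up for your logging framework, for instance the Java agent's Logback or Log4j instrumentation, that log record is exported carrying the active trace ID and span ID, so it shows up correlated with the trace rather than floating on its own.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class InvoiceJob {
    private static final Logger log = LoggerFactory.getLogger(InvoiceJob.class);
    private static final Tracer tracer =
        GlobalOpenTelemetry.getTracer("com.example.billing");

    void run(String invoiceId) {
        Span span = tracer.spanBuilder("generate-invoice").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // An ordinary structured log call. With a logs bridge in place,
            // this record goes out wrapped with the current trace context.
            log.info("generated invoice {}", invoiceId);
        } finally {
            span.end();
        }
    }
}
```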
And so that thing that I mentioned before, where anytime you say, oh, I want to log this thing, you should go and create a span: what they would do is one of their leads started to look at all the different places where they were logging things and said, okay, this one's actually really important, these other things maybe not so important. So I'm going to focus on the important area and I'm going to turn that structured log into a span. And then I'm just going to repeat that process again and again. And so where they are right now is that they have migrated a decent number of their structured logs over to spans. And so their traces are getting richer and richer, but critically, there's really no loss of information throughout that entire process. Now, what that has actually meant for them, and what it would actually mean for you if you were to do this too, is that your workflow for looking at a trace and then looking at a correlated log might be a little bit disjoint sometimes. This kind of depends on the observability tool that you're using. Honeycomb supports the ability to look at this kind of stuff, but when you're looking at a log versus part of a trace, you have to use our UI in a particular way that's sometimes not the most optimal. And frankly, I think most observability tools are kind of like this too. Some will support a way to go to the logs database, but because it's correlated, it'll actually take you to the right thing so you don't have to go and search for it. So it's not all unicorns and rainbows, right? You're gonna be migrating, and migrations are kind of messy, but you can actually do it, and you don't need to have this question of, oh, well, if I add tracing, is that gonna be compatible with everything else that we're doing? That's not a question that you have to ask. And to be honest, in the OTel community, I don't think we've really figured out what a good migration story should look like for organizations. And this is where I think a lot of us would like some of your help. If you're in the process of migrating a monolithic service over to tracing with OpenTelemetry, we would absolutely love to hear from you and try to understand all the problems that you're having. If you're motivated by this talk to go and do that, yes, absolutely, please. I would personally love to hear what your issues are. And in particular, I wanna try to get a lot of these common things documented on the OpenTelemetry website so that the next people who go and start to add better instrumentation to their monoliths don't have to run into a variety of stumbling blocks that might be common across lots of different orgs. So I wanna end with just a few takeaways that you can take back to your team. Traces are for monoliths, right? If you see the phrase distributed traces and somebody says, yeah, that's what you do for microservices, that's not necessarily wrong, but it's not really inclusive of the fact that there's really nothing about tracing that is inherently for microservices or against monoliths. They're sort of one and the same. And in fact, the practices and the idioms around tracing apply just as much to a monolithic service as they do to a widely distributed set of microservices. You don't need a ton of services to get value. You can ask even really basic questions with tracing that you couldn't ask before, at least not without a lot of effort with your logs. You can start with one service, get value out of that, and incrementally add more. In fact, there are technologies now where, if you're running everything in Kubernetes, you can use the Kubernetes operator that will auto-instrument every single service inside of a cluster.
So that's really kind of a great option, but that's sometimes not the reality that a lot of people live in, where part of your stuff is in Kubernetes and some stuff is not necessarily in Kubernetes. But regardless, you have a lot of options available to you. And I would say that if you can adopt the mindset that traces are a superior, better form of structured logs, because they have an order of operations and a hierarchy associated with them, then it can actually be fairly easy to get the team to onboard with this kind of stuff, especially when they can see that trace waterfall for the first time and really see, okay, this is what my service is really doing. And lastly, you can bring your logs with you with the logs bridge API. This was something that was not really true about two years ago, maybe even a year ago, arguably. There was a little bit of support a year ago, but it was kind of janky and not that widespread. Now is the time to start looking into whether it actually supports your logging frameworks or not. Now, it may be the case that your particular logging framework is not supported by the logs bridge standard in OTel, and that's fine. Check back in a couple of months, or just go ahead and proceed and add tracing anyways, and look at your logs in one place and your traces in another place. It's kind of a messy world, but it might be better than not adding that tracing in the first place. So that's what I got. I have a little bit of extra time for questions, so please feel free to ask them.

Thanks for the overview, it was really helpful. One question I have is around more legacy monoliths. My experience, and maybe others', is that older monoliths tend to adopt a polling architecture, where they're polling the DB or polling some API or something and creating a lot of noise. And so it creates a barrier to entry for auto-instrumentation, because then you just end up with a mess of useless traces that you have to sample out. Do you have any suggestions on how you can onboard to tracing, still in kind of an auto-instrumentation way, without necessarily having to tackle architectural constraints and all of that up front?

That's a good question. So for anyone who didn't necessarily hear, the question was that sometimes a lot of monoliths have architectural patterns like polling a database all the time instead of just calling it when it needs to call it, and that can potentially create issues when you add tracing. So yeah, I think there's a couple of ways to think about that. First is that most of the auto-instrumentation components in OTel allow you to filter things out without needing to sample entire traces. So if something is proving really problematic, basically imagine you add the Java auto-instrumentation agent and some of the instrumentation is super noisy because of this architectural pattern that you just can't unwind right now, but some of it is actually really useful. Well, you do have the ability to turn certain things on or off through configuration. And so that can be a really good stopgap in the interim. Now, it's not a complete solution, but you can get something instead of nothing.
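For the configuration route, here is roughly what that looks like with the Java agent. The instrumentation name below is just an example; the agent documents a per-instrumentation enable/disable property, so check the docs for the exact name of whichever instrumentation is noisy for you.

```
# Illustrative: keep the agent, but suppress one noisy instrumentation.
java -javaagent:./opentelemetry-javaagent.jar \
     -Dotel.instrumentation.jdbc.enabled=false \
     -jar my-monolith.jar
```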
The other thing is the sampling of that data. Sampling gets into a bit more of a complicated topic, but there is a concept in sampling known as dynamic sampling. And so the idea is that it's able to understand, statistically, based off of the frequency of certain kinds of traces and the data that they contain, and sort of up-sample or down-sample based off of the uniqueness of keys that you've established. So if you have a particular key representing some operation that's going on and there's a lot of noise, it can actually very aggressively sample the traces that are experiencing a lot of the same values in that key. That concept is kind of a whole other talk. In fact, Kent Quirk, who is here, gave a talk on this topic as well. I would certainly advise chatting with someone like Kent about this concept. This is something that does exist today. It's not in OTel, but it's likely going to be added to OpenTelemetry pretty soon. So that's kind of what I would say for now.

Hi. My question is, will there be any performance overhead if I start tracing all the requests that come to my web app?

Yes, there will be performance overhead. The answer to that, though, is there's a really, really big old "it depends" attached. It kind of depends on the shape of those requests, how much data is involved in each one, and the provisioning of your infrastructure. I've seen somebody come in and say, this added 20% CPU and memory overhead to my nodes and that was unacceptable. Usually the case was that they had one really, really noisy piece of instrumentation that was kind of low value for them, and then they turned that off and it's sitting at maybe 1% to 2% overhead. There was another one with Python, where we had a customer who, for a specific thing they were doing, needed a custom what's called a span processor. It's a way that you can customize the way that OpenTelemetry processes data and exports it while it's actually generating that data, and they wrote something custom because they needed to add some specific contextual information and they felt that that was the best way to do it. But that added overhead of about 30% or so to the nodes, and so we worked with them to figure out, okay, there's actually a better way to add this data, and then again it dropped back down to about 2%. So you may very well experience some significant overhead, but I think there's a couple of knobs available that you can dial things back on and get to an acceptable level. That being said, because instrumentation creates overhead, in the OpenTelemetry space we try to do as much as we can, especially by batching data up and flushing it asynchronously. But you're not going to be able to avoid some percentage of overhead.
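To picture what a custom span processor is, here is a minimal sketch in Java; the attribute name and region value are invented for illustration. You'd register it on the SDK's tracer provider alongside the usual batch span processor, which is the piece that batches and exports data asynchronously.

```java
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.ReadWriteSpan;
import io.opentelemetry.sdk.trace.ReadableSpan;
import io.opentelemetry.sdk.trace.SpanProcessor;

// Stamps deployment context onto every span as it starts. Doing only cheap
// work here (a setAttribute call) keeps the overhead low; heavy work in a
// processor is the kind of thing that can blow up CPU usage.
public class DeployContextProcessor implements SpanProcessor {
    private final String region;

    public DeployContextProcessor(String region) {
        this.region = region;
    }

    @Override
    public void onStart(Context parentContext, ReadWriteSpan span) {
        span.setAttribute("deploy.region", region);
    }

    @Override
    public boolean isStartRequired() { return true; }

    @Override
    public void onEnd(ReadableSpan span) { /* nothing to do at end */ }

    @Override
    public boolean isEndRequired() { return false; }
}

// Registration sketch:
// SdkTracerProvider.builder()
//     .addSpanProcessor(new DeployContextProcessor("us-east-1"))
//     .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
//     .build();
```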
Thank you. Great talk. Just wanted to ask you a quick question: if all my services are running on Kubernetes, is there any reason I would want to instrument my app for OTel if I could just drop in an eBPF exporter or something like that?

I would say yes. So I think eBPF is actually pretty radical. At some point eBPF might be the solution that we all get, but right now it's unable to extract any context from the actual application itself, and it's unable to do context propagation throughout a given process. What these eBPF agents will typically do is capture networking data, or whether a database call was made, and so some of the value of OTel auto-instrumentation, you could argue, goes away if you can drop in a network agent. But if you then want to start saying, okay, well, what if I want to attach a user ID to this stuff, or if I want to attach a device ID or something like that, you kind of don't really have that many options with these agents. The OTel operator accomplishes a lot of what that does: what it does is it adds an OpenTelemetry collector into your cluster, and then it configures all of your pods to send OTel data to that collector that's dropped into your cluster, and it auto-instruments each of them with whatever language you have. And so you'll get not only roughly the same shape of data that you would get from an eBPF agent, but often much richer data by default, especially if you're using Java. The Java auto-instrumentation is frankly an incredible piece of technology, and you can even get some method-level instrumentation automatically, which is a very tall order for an eBPF agent to get in any kind of coherent manner. But that being said, I think this is a space to watch; things are gonna be evolving quite a bit. I would also say that eBPF and OTel are not necessarily at odds, or necessarily different technologies. Both communities are certainly aware that we're trying to solve a lot of the same problems, and we wanna try to make it so that maybe someday the thing that you add to your cluster, whether it uses eBPF or not, is just kind of an implementation detail.

So you mentioned that span events were a way to tag that a certain thing happened at a certain time. Given that logs typically act the same way, a given event happened at a given time, is there a way to turn existing logging into span events? And would that be something that should at least be evaluated?

It depends. So yeah, in particular, one of the customers that I mentioned, Loan Market, this is actually what they did: they basically wrote a wrapper so that every time they're logging something with their logging interface, under the covers it just takes that same log and turns it into key-value pairs on a span event. And that same timestamp is just the timestamp on the span event. That was something they had to configure in their code, but it was a relatively light lift. It may vary from language to language whether you have the ability to intercept method calls like that pretty easily. So I think the answer is "kind of," but there's not really, like, a configuration file that says, hey, turn the logs into span events. And I don't know if that would necessarily be possible. But I think a thing that's worth looking into is that this answer can also depend on the backend that you're dealing with. This is kind of Honeycomb specific, but what Honeycomb does is, when you have what's considered an OpenTelemetry log, so it's the same log but it's been wrapped with the span ID and the trace ID, we actually annotate it internally the same way that we do a span event, so that it gets loaded up as a span event in our UI. So if you've been creating logs as span events and you have these existing logs, it's kind of one and the same when you're actually investigating things. Now, not every tracing tool is going to do that, so your experience might vary from tool to tool. But, like I said, it depends, and it kind of depends on two different ends of the spectrum there. So yeah. Thank you.
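For that last question, here is a hypothetical sketch of the kind of wrapper described. The class and attribute names are made up, and a real version would also forward the message to your existing logging framework as before.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;

// A hypothetical logging shim: every call is also recorded as a span event
// on whatever span is currently active, so the log's timestamp and data land
// inside the trace.
public final class EventLog {
    private EventLog() {}

    public static void info(String message, String key, String value) {
        Span.current().addEvent(message,
            Attributes.of(AttributeKey.stringKey(key), value));
        // ...and forward to the existing logging framework here.
    }
}

// Usage: EventLog.info("payment retried", "app.payment_id", paymentId);
```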