Hey everyone, welcome to the Jaeger maintainer talk. We're going to be talking about a lot of cool stuff. I'm Jonah Kowall, and I'm joined on stage by Joe Elliott. We're going to be covering a lot about Jaeger and OpenTelemetry — where the project is going and where it is today. As you know, Jaeger is a graduated project, one of a handful, very mature and well adopted in the ecosystem. Joe, tell us a little bit about yourself.

Sure. I'm Joe, and as you can tell by the images up here, I work on Jaeger — I'm a Jaeger maintainer — and I work at Grafana. That's a very terrible picture of my bike; I like to bike around my city quite a bit. I don't know if you found your bike at the fountain, or — that's right, I just walked out and picked it up, and it's been my bike ever since.

And I'm Jonah Kowall. I'm the CTO at Logz.io and also work on Jaeger, among other projects. I do a lot of diving, because I live in a nice place for that, so that's what I tend to do when I'm not working — those are some photos I've shot recently underwater.

So, the agenda today: we're going to talk about distributed tracing and Jaeger, go a little deeper into OpenTelemetry, and as we go through we'll dig into certain features that we've built out. We're going to do a really quick live demo of some of the stuff we're doing — hopefully over conference Wi-Fi, which will be really fun. I doubt we're going to have time for questions, but come up after; we also have a Slack channel, which we'll hit on right at the end.

With that, I'm going to dive into a really quick intro for those of you who are just getting started or maybe don't know a lot about distributed tracing and Jaeger, before Joe gets deeper into what you can do with OpenTelemetry. We wanted to start with a few of the semantics. A trace is an end-to-end transaction going through lots of different hops and different microservices. Each of those hops or steps it goes through
is a span — a little piece of the end-to-end transaction. Inside the span you can have all kinds of metadata: you can have tags, you can attach logs, you can attach all kinds of interesting data that's going to help you do analytics or troubleshooting. There are a lot of different use cases and interesting things you can do with tracing, which is the most advanced signal in telemetry and observability today.

With that, let's talk a little bit about the relationships. In a trace, it's really important to understand how those spans all come together and roll up into what's actually happening with the end user, or with the API that's being exercised. Here you can see some visualizations — and you'll see more of them inside the Jaeger UI — that explain how the different components of a trace come together. Inside the data itself there are pointers that say: this is the root span, this is where I came from, and this is the next hop I'm going to — showing the different pieces and their relationships. From that, you can also calculate data to understand errors and latency, and do all kinds of cool stuff that we're going to talk about a little later.

And here's the visualization you get when you use Jaeger. If you do install Jaeger — we make it really easy to get started, and we'll talk about that — you get these cool visualizations of the tracing data you're looking at. With that, Joe is going to talk a little bit about auto-instrumentation.
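To make those parent/child relationships concrete, here's a small self-contained sketch — plain Python, no Jaeger or OpenTelemetry dependency, and all names are illustrative — of how spans point at their parents and how latency and error counts can be derived from the span data itself:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    name: str
    start_us: int                      # start time, microseconds
    end_us: int                        # end time, microseconds
    parent: Optional["Span"] = None    # None means this is the root span
    tags: dict = field(default_factory=dict)

    @property
    def duration_us(self) -> int:
        return self.end_us - self.start_us

# A tiny two-hop trace: a frontend request that makes one database call.
root = Span("GET /checkout", 0, 1500, tags={"http.status_code": 200})
child = Span("db.query", 200, 900, parent=root, tags={"error": True})
trace = [root, child]

# End-to-end latency is the root span's duration; error counts
# roll up from the tags attached to each span in the trace.
total_latency_us = root.duration_us
error_count = sum(1 for s in trace if s.tags.get("error"))
print(total_latency_us, error_count)  # 1500 1
```

A real tracing system records trace IDs and span IDs rather than in-memory object references, but the roll-up logic is the same idea.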
Cool. So we're going to talk about the OpenTelemetry clients. Those of you who are already using Jaeger may have seen, in the past year or so, a message or warning like this in the docs or from your Jaeger client. About a year ago Jaeger deprecated all of its clients, for all of the major languages, in favor of the OpenTelemetry clients. The community decided to converge on the OpenTelemetry clients as they implemented the final features Jaeger already supported — the last one being remote sampling, if you're familiar with that. So the Jaeger clients have been deprecated, and the community is encouraged to move to the OpenTelemetry clients, which are now feature-complete with what you used to be able to do with your Jaeger clients.

The good news is that there is very wide language support for the OpenTelemetry clients — I'm pretty sure it's a superset of the languages originally supported in Jaeger. I was surprised to see, for instance, that Swift is supported; so for all of you writing your backend applications in Swift, however many of you that is, you can instrument those too. You'll see the major players up there — .NET, Java, Go, Node — all of these major backend languages and platforms already supported by OpenTelemetry.

I'll also point out that all of these languages have a shim between OpenTracing and OpenTelemetry. If you're using Jaeger, Jaeger implements the OpenTracing API as well, and it's quite possible you can very easily swap out your Jaeger client for the OpenTelemetry implementation of OpenTracing with little or no code changes. OpenTracing is an older standard.
It was an API standard that Jaeger implemented, and OpenTelemetry supports that standard too — there's a good link at the bottom of this slide. I'll also say, over my part of this presentation: there are a lot of languages and a lot of details, and it's impossible to get into all of it, so I've tried to include links on all the slides. The PDF attached to the schedule should have all of these links; I encourage you to grab it and use this presentation as a jumping-off point into the details of your own systems.

Another piece of good news is that the configuration of the OpenTelemetry clients is very similar to the configuration of your Jaeger clients — all through environment variables, and in a lot of cases it's even close to a one-to-one mapping. Service name, for example: there are service names in OpenTelemetry too, and the data model is very similar, so a lot of this configuration is just going to translate right over. This is of course not exhaustive — I just wanted to show a few examples of environment variables you were likely using to configure your Jaeger clients, and show you that the same options exist for your OpenTelemetry clients. There are sampling options, which of course are very important in tracing, and OpenTelemetry supports those as well; your Jaeger agent host, where you're pointing your agent; and propagation, which is very important.
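To give a feel for that near one-to-one mapping, here's an illustrative pair of configurations — the values are made up, and the variable names follow the Jaeger client and OpenTelemetry SDK environment-variable conventions as documented at the time; check your SDK version:

```shell
# Old Jaeger client configuration...
export JAEGER_SERVICE_NAME=checkout
export JAEGER_AGENT_HOST=jaeger-agent.example.internal
export JAEGER_SAMPLER_TYPE=probabilistic
export JAEGER_SAMPLER_PARAM=0.1

# ...and a roughly equivalent OpenTelemetry client configuration.
export OTEL_SERVICE_NAME=checkout
export OTEL_EXPORTER_JAEGER_AGENT_HOST=jaeger-agent.example.internal
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
export OTEL_PROPAGATORS=jaeger   # keep Jaeger headers during a migration
```

The propagator line is the one worth noticing for migrations — more on that in a moment.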
We'll come back to propagation in a second — all of these configuration points should be available in OpenTelemetry, or something similar. Again, that link at the bottom has every environment variable and every little combination of parameters you can pass.

On the "auto-ness" of this instrumentation: everybody gets excited about auto-instrumentation, because tracing can sometimes feel a little manual, so we wanted to talk about what OpenTelemetry is capable of here. I've used maybe three or four of the different OpenTelemetry clients at different times, and I think the good news is that OpenTelemetry generally does a good job in its clients of meeting the expectations of the communities and frameworks they serve. In Java there's often an expectation of heavy auto-instrumentation — Spring in particular has a lot of hooks where you can do what you want after events like HTTP calls and other events in your framework — and .NET has a history of a lot of these powers too. Whereas Go is a language where people expect a bit more manual work, a bit more manual coding. So in my experience diving into these languages, your expectations as a developer are hopefully met by the community — and I'd certainly communicate back if you're struggling.

Another really good point: for all these languages, even ones I don't develop in day to day, there's really good getting-started documentation. For every language, if you jump into the link at the bottom, you'll hopefully find your own language and a hello-world-style getting-started guide, and in an hour or two you can have very basic instrumentation set up, writing to Jaeger. You can start seeing that OpenTelemetry plus Jaeger is going to work just fine, and you can find a migration path for your existing applications. There's also —
just to jump in and add: there's actually a new Go auto-instrumentation based on eBPF that's pretty interesting. It's super new, but I think we're going to see more and more innovation using things like eBPF to do more automated data collection — so definitely keep watching profiling and eBPF in OpenTelemetry, because it's really interesting stuff, and it'll make data collection even easier and deeper. Sorry to chime in. — No, absolutely, go ahead.

Trace propagation is also going to be very important. Say you have a fleet of applications, all using the Jaeger clients, and they're probably using what are called the Jaeger headers — Jaeger propagates trace information with those headers. If you don't deep-dive into tracing you might not know this, but you're going to need to know it real soon if you start rolling out OpenTelemetry clients, because OpenTelemetry by default communicates with W3C headers, while Jaeger by default communicates with Jaeger headers. There is some overlap: most OpenTelemetry clients also support Jaeger headers, and most Jaeger clients can talk W3C headers. But as you start rolling out your first apps with OpenTelemetry, you might find some traces are broken, and you might need to look at how you're propagating header values — double-check and standardize on one of the two, whichever is easier for your particular environment. Then you can get your full traces as they pass through both your Jaeger-instrumented applications and your OpenTelemetry-instrumented applications.

Finally, I wanted to show an example of auto-instrumentation. These are some screenshots from a Java auto-instrumented app I threw together — I'm not a Java developer. All I did was take a RESTful application with GET and POST endpoints that hits a relational database — we've all seen this kind of application a million times — and then I attached the Java agent jar, just linking it using the
instructions on the getting-started page in OpenTelemetry. I was immediately able to get some basic tracing going, with very nice attributes from both my HTTP side and my database side. I'm not sure how visible that is — hopefully it's okay — but you can see the HTTP method, the HTTP path, the URL, and the HTTP version in use all automatically added as attributes to my spans and my trace. On the database side, it was kind of a surprise to also see a lot of deep information about the database call added automatically, with no code changes: the database query string, the name of the database, the name of the table that was accessed, the kind of operation that was performed. It was a really rich set of information for almost no effort — I made no code changes, just linked that jar file, and it was ready to go. So if you're in the Java and Spring world, you might have some nice surprises on the ease of moving to OpenTelemetry — and maybe in some of the other more auto-instrumented languages too.

Next, let's say I'm moving to OpenTelemetry: I have these clients going, maybe I've found my migration path, and I'm happy with my setup. What kinds of pipelines are available now that I have OpenTelemetry and I'm more comfortable in that ecosystem, alongside my existing Jaeger backend? The good news here is that the OpenTelemetry collector and the clients speak every Jaeger protocol. The Jaeger agent expects one of two protocols, which you might be aware of — Thrift compact or Thrift binary; the port numbers are up there — and the Jaeger collector supports a couple of protocols.
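As a rough sketch of what that looks like in practice — receiver and exporter names as in the contrib collector at the time of writing, and the downstream endpoint is a placeholder — a collector config that listens on the standard Jaeger ports and forwards to an existing Jaeger collector might look like this:

```yaml
receivers:
  jaeger:
    protocols:
      thrift_compact:            # Jaeger agent port, UDP
        endpoint: 0.0.0.0:6831
      thrift_binary:             # Jaeger agent port, UDP
        endpoint: 0.0.0.0:6832
      grpc:                      # Jaeger collector port
        endpoint: 0.0.0.0:14250

exporters:
  jaeger:
    endpoint: jaeger-collector.example.internal:14250

service:
  pipelines:
    traces:
      receivers: [jaeger]
      exporters: [jaeger]
```

With this shape, existing Jaeger clients can keep pointing at the same ports while the OpenTelemetry collector quietly takes the agent's place.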
Normally that's gRPC from the agents. A lot of us might also have a Kafka queue in our Jaeger trace pipeline. The OpenTelemetry collector can read and write all of these protocols, and it can write to the Kafka queue in the formats Jaeger expects — which means you can really mix and match OpenTelemetry components with your Jaeger components in almost any combination you want.

I also put a little link up there, because there's something important about the collector itself: there are two distributions. The core distribution does not have Jaeger support; the one called contrib — I've linked the Docker image there — does. So contrib is the one you're probably going to be using along with your Jaeger ecosystem.

The reason we might consider using the OpenTelemetry collector along with our Jaeger pipeline — this slide is also probably impossible to read — is that the collector has a lot of really nice processors and the ability to mutate our spans as they move through it. For instance, I've highlighted a processor called the redaction processor, which you can use to look for PII or other patterns — other kinds of information.
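As a hedged sketch of that redaction processor — field names as in the contrib processor at the time of writing, and the blocked-value pattern is purely illustrative — the configuration looks roughly like this:

```yaml
processors:
  redaction:
    allow_all_keys: false
    allowed_keys:            # only these span attribute keys survive
      - http.method
      - http.url
    blocked_values:          # matching values are masked even on allowed keys
      - "4[0-9]{12}(?:[0-9]{3})?"   # Visa-style card numbers
```

Wired into a traces pipeline, this drops every span attribute not on the allow list and masks values that match the blocked patterns.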
You don't want that data reaching your backend, and the redaction processor will strip it out. There are other processors that will add or delete attributes, and processors that will decorate spans with additional metadata. The OpenTelemetry collector is very rich in its ability to modify and work with trace data, so I'd encourage you to explore it as you get into the OpenTelemetry ecosystem.

Because the collector supports, like we said, all of the same protocols the Jaeger pipeline supports, we can really mix and match however we want. Over here in the green box I have the Jaeger backend — the collector writing to a database, and the querier reading from that database. On the left side we have our applications, now instrumented with the OpenTelemetry client. I can use the Jaeger agent like I always have, because the OpenTelemetry client can write directly to it, or I can swap that out for the OpenTelemetry collector. These applications can do both — they can read the same protocols,
and they can both write to the collector successfully. So you have a lot of flexibility in your pipeline. A lot of people use queuing in Jaeger, and like we said before, because the OpenTelemetry collector can put trace data in the Kafka queue in the format Jaeger expects, we can use OpenTelemetry collectors in this situation as well. My clients might be writing to a mix of the Jaeger pipeline and the OpenTelemetry collector pipeline; I can still use my Kafka queue, pull from it with my Jaeger ingester, and write into the database like I always have. So there's a lot of flexibility here, a lot of power, a lot of configuration, and a lot of documentation to read — I'd recommend digging in and seeing what you can start doing with your trace pipeline.

The reason you might want to move from a simple to an advanced pipeline is either scale, resiliency, or other business requirements you may have — that's typically why people add Kafka. One of the common questions we get all the time is "my collector is overloaded" or "my backend is overloaded — how do I fix it?" And in the Jaeger channel,
we're going to tell you: you have to look at where the pressure is, and probably add some kind of queue, some kind of back-pressure mechanism. You really have to scale this stuff as you see fit.

Final slide for me — just trying to be silly here. The point is that because these two processes are so easily interchangeable, you can do almost anything you want. This particular pipeline is highly unrecommended, but if you wanted to repeatedly write from collectors to agents to Kafka and back, you can do it — and maybe you have some bizarre business reason to build a pipeline as complicated as this. With that, I think we're heading back to Jonah.

Sure, cool. So, Joe mentioned the processors — there are so many powerful ones, and new ones being built constantly, like the ones that show architectures and service maps. Exactly — these are all processors. One of the cool things we built over the last year was actually a new processor, and I'm going to talk about that and how it surfaces in Jaeger. Jaeger is a distributed tracing system, as you saw in the screenshots; it can help you debug and understand where your transactions are going. But a lot of people today need APM — they want performance monitoring, performance data. So how do we move from distributed tracing to understanding performance? It's about introducing new signals besides traces: we also need metrics, so we can see things like my service slowing down, more errors, trending data. So we decided to build some additional components.
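Before getting to the specific components: the underlying idea — deriving rate, error, and duration (RED) metrics from span data — can be sketched in a few lines of plain Python. This is purely illustrative, not the actual processor code:

```python
from collections import defaultdict

# Each finished span, reduced to: (service, operation, duration_ms, is_error).
spans = [
    ("frontend", "GET /cart", 12.0, False),
    ("frontend", "GET /cart", 480.0, True),
    ("catalog",  "GET /item", 8.0,  False),
]

calls = defaultdict(int)      # request count per (service, operation)
errors = defaultdict(int)     # error count per (service, operation)
latency = defaultdict(list)   # raw durations; a real system uses histograms

for service, op, dur, err in spans:
    key = (service, op)
    calls[key] += 1
    errors[key] += int(err)
    latency[key].append(dur)

# The classic RED signals fall out directly per service/operation:
key = ("frontend", "GET /cart")
print(calls[key], errors[key], max(latency[key]))  # 2 1 480.0
```

Swap the lists for histogram buckets and export the counters to Prometheus, and you have the shape of what the span metrics processor does as spans flow through the collector.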
The first piece is the span metrics processor. You can see here — I'm going to use my cool little pointer — that as the traces come into the OpenTelemetry collector, we send them through this processor, which not only sends the data on to Jaeger but also creates metrics in the process. We're able to derive metrics from the traces, and you'll see why that's important in a second, because we can then start surfacing those metrics and give you information about what's happening in the application.

Inside your OpenTelemetry collector, this is a typical configuration you'd see for the span metrics processor and how it fits into the pipeline. At the top is the definition of the processor — you can see exactly how it calculates the different buckets it collects and how it breaks the data up into metrics — and at the bottom you can see how it fits into a pipeline, where you're calling the span metrics processor (sorry it's so small) right here.

The result is that we generate metrics from these traces and can do cool things like show you the metric data from your applications without any additional instrumentation — and I'm going to show you this in a live demo. We can expose these as Prometheus metrics, you can visualize them in Grafana, you can alert on them in Alertmanager — you can use any Prometheus-compatible backend with this feature. That obviously includes Prometheus, Cortex, Thanos, Mimir, M3DB — you name it; anything that does PromQL and has the right exporter in OpenTelemetry will work with this.

I'm going to show you a really quick demo of how this looks in Grafana — hopefully I'm not logged out. We've built a couple of dashboards here. One is a service performance monitoring dashboard, and really quickly here
we can look at the total number of spans — we're looking at six hours — latency by service, and the breakdown of all the metrics. For each particular service — here's my catalog service — you can see the latency, the spans, the error rates. The second really cool dashboard is the service-level view: if your team owns a particular service, say I own the front end of my application, you can look at all of the metrics for your individual service, coming right off Prometheus, broken down by operation. It's pretty powerful to have this data available to the team, because these metrics are very tied to your business in particular.

The second thing we did, aside from creating the metrics, was make them usable inside the Jaeger UI. When you run Jaeger, you can see how they're visualized in this new monitoring tab — and I'm going to show you a live demo of this as well. In Jaeger, if you haven't updated recently, you'll notice there's a new tab right here called Monitor. When you click on Monitor, you pick the service you want, and it shows you the latency — you see here we've got percentile buckets; hopefully it's not too small — the error rate, and the request rate. Then I can go down and see the different operations. Say I start to see — oh, look here, this is showing 11 percent errors on my order service — I can drill in directly by clicking on that link, and it pulls up the specific traces related to that service. Just give it a second... so these are the traces specifically related to the service. The idea is that it gives you a bit of a monitoring workflow. Obviously, we don't do alerting in Jaeger yet.
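For reference, the span metrics configuration shown on the slide earlier follows roughly this shape — processor and field names as in the contrib collector around the time of the talk, with example buckets and dimensions; check the README for your version:

```yaml
processors:
  spanmetrics:
    metrics_exporter: prometheus
    latency_histogram_buckets: [2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:                 # extra span attributes to use as metric labels
      - name: http.method

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889     # scraped by your Prometheus-compatible backend

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [spanmetrics, batch]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]        # placeholder receiver; spanmetrics feeds this pipeline
      exporters: [prometheus]
```

The traces keep flowing to Jaeger unchanged; the derived request, error, and latency metrics come out the Prometheus side.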
That's something we'd definitely love a contribution on — Alertmanager integration, and all kinds of other cool things we could do with this data. So that's some of the new stuff we've done.

I also wanted to hit on a couple of new features — for those of you who were at the keynote this morning, I summarized these in a 45-second video, so this might be a little redundant. One of the things that enabled some of what Joe talked about was adding OTLP support. I think you did the work for that? No, Yuri did — I said I'd do it for like a year and never got to it, and he finally picked it up and did it. So right now Jaeger accepts OTLP, the OpenTelemetry protocol — the native wire protocol for OpenTelemetry — so you can send trace data directly from anything speaking OpenTelemetry into the Jaeger collector, which then stores it.

The other piece, which Joe mentioned, is adaptive sampling. Do you want to talk about that one? You did more of the work on it. That's the one I actually did, yeah. Adaptive sampling is a new sampling feature in Jaeger that lets you adjust your sample rates dynamically in response to your trace volume. For a long time Jaeger has supported remote sampling, which means pulling a sampling document from a remote source and using it to adjust sampling parameters — but it was always a static document. What adaptive sampling does is watch your trace data,
and if one particular service or endpoint starts overwhelming your backend, the system will dynamically reduce the sample rate for that particular endpoint or service, so your backend doesn't fall over. Conversely, if the throughput of a particular endpoint goes down, it can increase your sample rate so you get a larger percentage of traces from that service. It's the ability to dynamically adjust your sample rates based on your current traffic. Yeah — and getting back to those processors in OpenTelemetry, there are a lot of really interesting sampling strategies you can implement; sampling in tracing is a really important topic and a big talk unto itself.

The next feature: you saw the tabular views in Jaeger — we actually had a community contribution for flame graphs, which is another interesting way to look at trace data. Joe mentioned earlier that the Jaeger SDKs are deprecated — I guess that's not a feature, but — yeah, exactly, we like cleaning things up, so that's important too.

Now, some of the things we want to do. For those of you who are users: there are a few different dependency graphs today, and it would be nice to have just one. One thing we've discussed is using those new service views Joe has been working on to generate it, because today you have to run Spark to generate the dependency views in Jaeger. Or you can do what we do at my company.
We run Kafka Streams to generate those. Either way, it's more stuff for the team to manage and run, so simplifying that is definitely a good idea.

Something else out there — we talked about those processors in the collector — is a new processor, the service graph processor I think, which generates service-graph metrics. Grafana can currently visualize those; I don't know if there's anywhere else you can visualize them, but they're OpenTelemetry metrics and an open standard at this point, so you can use those metrics yourself to generate visualizations — and Grafana can also do it. Yeah — the more context we can add onto those graphs and visual views, the more useful the tool becomes, and obviously that's super important when you're troubleshooting or monitoring.

The other piece — we've had stops and starts on the second major bullet here — is: how do we move away from the Jaeger collector, which today writes the data to the backend, and implement something like a pared-down OpenTelemetry collector instead? It's just less code for us to manage in the project, and it's somewhat redundant. One of the other maintainers, Pavol, was working on this, but he took a break from it and I don't know where it stands. It's something we want to do as the project matures and continues — more convergence with OpenTelemetry is definitely the trend for Jaeger.

We've got probably five minutes for questions, so feel free to raise your hand and we'll take them. Go ahead — Joe, you pick. I think we did — one sec.

Hi, thanks for the talk. Does the Go eBPF collector that you mentioned do automatic context propagation?

Yeah, they just added that into the project about two weeks ago. You'll want to search for Keyval — K-E-Y-V-A-L, the little startup that built it — just search "keyval go" and you'll see it. We've tested it out,
We've tested it out And it definitely works well and does context propagation with ebpf, which is really cool that they're able to do that That is cool. Yeah. Yeah concur. That is very cool. Thank you. Oh, oh It I guess it's in the main hotel repository now, so I stand corrected. Thank you. Thanks Other questions Sure Guests in the green over there or sorry no over here you're next Definitely good a great talk Joe and Jonas are definitely helped to understand Different things an open telemetry and a good, but what's the future of eager? So today we definitely love the eager UI and the storage integrations are really good for Cassandra elastic search And recently the GRBC integration has been added But what's the future for eager and how do we separate out the boundaries between open telemetry and eager? This one's always a debate I mean we would love to have one database right ideally to do this But everyone kind of wants something different So we have people that love Cassandra and use that and people that want to use elastic or open search and then obviously there's other data stores like Like what is it prom scale or no time time scale DB? I believe that guy right there works on it. Yeah, yeah, so there's there's other options So they're compatible There's a bunch of pluggable things and I know you're working on some stuff on Tempo right to do I don't know I've considered it, but I've not done any work on it But yeah, you mentioned the GRPC plug-in, which is a great thing to mention So there is a GRPC back-end so you could write a plug-in for any back-end you wanted at this point And I think there might be a proto. Well, whatever. I'm not sure definitely a GRPC back-end which allows for extension Yeah, so I there's a lot of back-end options and I don't know It's the test suites get crazy right now for us to test every one of the back-end types And I'm not sure where that's gonna go, but yep. Yeah Sorry, I think we're back here. Yeah Great talk. 
I have two questions for you. First: do you have plans to introduce a streaming solution other than Kafka — for example NATS, or NATS JetStream? And second: do your metrics also provide exemplars — for example the trace ID — so that if I see a spike on a dashboard graph, I can click on that spike and go to the particular trace? Does the OpenTelemetry collector support exemplars? Does it remote-write exemplars too?

I believe the collector does. But in this case the individual traces don't really matter on the metric itself, because you already know the context: the dimensions you define in the processor — for example the path, or whatever tag you decide to use — already show you the examples for the metric. So it's not necessarily relevant — but I guess we could, and if you want to contribute something that puts in a trace ID as an exemplar, that's totally possible; I don't see why not. Yeah — I don't know off the top of my head what the OpenTelemetry support for exemplars is right now. I'm getting a thumbs-up from Yuri, so yes, OpenTelemetry supports exemplars. So I would presume we could, if it doesn't already, generate exemplars out of the span metrics processor, which would make for a compelling use case — it's really nice to see the metrics generated from your traces and be able to jump over very quickly. I'd review the docs; if it's not there, it's probably an easy add — frankly, it might be a good first issue for that repo. And the "view traces" button basically brings you to the traces related to the metric anyway, so it's already kind of there. Um — was there a second part of the question?
I kind of missed the beginning. Did you hear the — I don't know — "I had a second question: are you planning to support a different streaming solution, like NATS or NATS JetStream, instead of Kafka?" — Ah: there are no plans at this moment to support other ones. The current implementation is a very bespoke Kafka implementation; I don't think there's a general shim there, so a new implementation would require some effort. If you're interested in contributing something like that, I'd start an issue first and discuss with the maintainers how you want to implement it, and we can talk about it. I think the OpenTelemetry collector also only supports Kafka — I'm not sure on that. Yeah, I'm going to look — yeah, it does for sure, it definitely does. Okay.

And I think we might have time for one last one — yeah, we'll take one more.

Thank you guys for the presentation, great job. At my company we use .NET — we create .NET APIs. A couple of years ago we were trying to get Jaeger set up, but we didn't have the auto-instrumentation yet. I'm curious: does the auto-instrumentation mean there are no application code changes needed, and we can just set the environment variables via ConfigMaps or whatever to instrument the app, without the developers themselves having to
Yeah, that's that's correct So there's a few ways to to instrument in dot net and you can either install the instrumentation Or it can be included as a package inside Your build if you're going somewhere that's serverless in dot net basically So it depends on how you want to deploy, but the auto instrumentation is there There's no code changes needed and there's a great team In that community that are that's working on it And uh, we're happy to take questions after Also, uh happy for you to jump on our slack, which is up on the screen We're we're definitely always there and collaborating and we're always looking for contributors And folks to participate and users to come and talk. So Thank you very much for coming and great to see everyone at cube con Thank you