So thanks, everyone, for joining us. We're going to dig into what's going on with Jaeger and what's new, and we're also going to talk a little bit about OpenTelemetry. It's awesome to see the turnout at KubeCon this year, it's an awesome city, and I hope you all enjoy your time here. Pavel, tell us a little bit about yourself.

Sure, I'm Pavel, a software engineer at Red Hat. I'm a Jaeger maintainer and contributor, an OpenTelemetry contributor, and a maintainer of the Jaeger and OpenTelemetry Operators. I'm based in Switzerland, and when I'm not working I spend time in the mountains, as you can see, biking and freeride skiing.

Nice. So we're kind of opposites. I'm Jonah Kowall, VP of Product Management at Aiven, and I work on various open source projects, including Jaeger, OpenTelemetry, OpenSearch, and several others. When I'm not working, which is rare, I spend a lot of time underwater. I live in South Florida, where we've got great diving; these are some photos I shot over the last few years. So if you want to talk diving or distributed tracing, I'm happy to do either.

We're going to start with some basics, which may not be that interesting to those of you who have already implemented these technologies: why you should look at distributed tracing, and a short intro to tracing. Then Pavel will really dig into OpenTelemetry auto-instrumentation, pipelines, and the various ways you can set up OTel and Jaeger together. Pavel also built the Jaeger Operator, and he'll talk about some of the patterns in it; it's really useful for deploying and managing Jaeger. I'll jump in and talk about some of the new capabilities in Jaeger, namely the Prometheus integration, and where we're going with new features on the roadmap. Hopefully we'll have time for Q&A; if not, feel free to tweet at us.

Just to give you a refresher on distributed tracing: we sometimes want to talk about the technology without asking why we're doing these things or who we're doing them for. The problem is that most of us are implementing microservices architectures, and we've decided to decompose our teams too. The challenge is that when something breaks, where is it broken and who can fix it? That's really the main reason to look at distributed tracing: it makes it easier to pinpoint the problem and find who can actually fix the issue. It's important to think about that, and a lot of different teams in the organization can really take advantage of what tracing can do.

There are additional use cases for tracing beyond problem detection and resolution. If you want to understand dependencies, say you're making a change: what is it going to affect? Say you're moving part of your infrastructure to a different network: how is that going to affect application performance? Tracing also lets you drill into specific user actions at a technical level, so you get usage information alongside performance and how users are actually using things. That can really help bridge the gap between you and a line of business or a marketing team; it gives you a lot of common language to talk with.
And then finally, when you use tracing to derive metrics, which we're going to talk about, you can monitor and maintain SLAs with the same data and the same instrumentation. So these are some of the reasons why tracing is so important and continues to be important.

If you're relatively new to tracing, there are a few keywords you're going to hear us use. Instrumentation is how we collect signals and data from applications and infrastructure. (I thought the graphic was funny, because apparently there's a company called Jaeger that makes instrumentation gauges; that's not intentional, I didn't make the graphics.) We then have to store the data somewhere; that's data collection and storage, and we're going to talk about that quite a bit because Jaeger has a lot of options around data collection and storage. Finally, the Jaeger UI handles the analysis and visualization, but you can also use a lot of other tools; we're going to talk a little about Grafana and how you can analyze this data with it.

Now, digging into the semantics of OpenTelemetry, a few ground rules before we get into the more sophisticated concepts like auto-instrumentation. When you hear about a trace, that's the end-to-end request. Think about going to a login page and clicking login: there's a transaction running end to end, and that's a trace that might be called "login". That trace might hit several services: the user service, maybe a third-party OAuth provider. Those are all spans that are part of that single trace. A span represents a single component, and inside the span itself we can have tags and other metadata associated with it: a log message, geographic information about a user, the data center that a particular service is running in, for example. So you can do all kinds of things to provide context around the trace itself.

And here are some visualizations. As a transaction fans out to multiple services, you essentially have a hierarchy across the different components as they come together to deliver the request the user made. There are a few ways to visualize this; the timeline view over here is much closer to what you'd see in Jaeger, showing how much time is being spent as time progresses and the path the transaction takes through the different components it interacts with. We're going to show you that when we get into the Jaeger UI a little later.

Pavel, tell us a bit about auto-instrumentation.

All right, I'll cover three topics. I'll start by talking about OpenTelemetry and OpenTelemetry auto-instrumentation, and how you can migrate to it from the Jaeger SDKs. Then I'll talk about the Jaeger architectures and how we can fit the OpenTelemetry Collector into those architectures, or deployment models. And then I'll talk about the Jaeger Operator.
So for the instrumentation: the Jaeger project used to be an end-to-end tracing platform that handled data collection, meaning how we actually extract data from the applications, as well as data transport, storage, and visualization. In 2021 we deprecated the Jaeger clients, the SDKs, and we recommend users migrate to OpenTelemetry. The Jaeger SDKs implemented OpenTracing, another CNCF project focused on providing a vendor-neutral API that you could use to instrument your applications and report trace data. That project was deprecated and merged with OpenCensus into OpenTelemetry, so OpenTelemetry is the right migration path for Jaeger users.

The good news is that the OpenTelemetry project gives you an OpenTracing shim, a library you can use: you remove your Jaeger SDK and use the OpenTelemetry OpenTracing shim instead. As you can see, the language support in OpenTelemetry is quite good; it covers most of the mainstream languages. So when you migrate, the first step is to use the shim, which allows you to keep your instrumentation code in place. If you use the OpenTracing API in your business logic, or you instrument some HTTP or RPC frameworks, you can keep using those and just change the initialization code.

With the initialization code, you have to change how the SDK is configured, and the good news is that the Jaeger and OpenTelemetry SDKs are very similar; they use the same concepts. On the left side we see the Jaeger environment variables and on the right side the OpenTelemetry ones, and there is always a good equivalent in OpenTelemetry. For instance, in Jaeger we set the service name with JAEGER_SERVICE_NAME; in OpenTelemetry there is a very similar environment variable, OTEL_SERVICE_NAME. The same goes for the other configuration options. This is not the entire list; if you would like the full mapping, there is a URL on the slide.

The next thing you have to solve is context propagation. As Jonah mentioned, tracing works across services: it captures the entire transaction and correlates the data into a single unit called a trace. To identify that trace, the services have to propagate a unique ID. Jaeger uses the so-called Jaeger header to encode these IDs, while OpenTelemetry uses W3C Trace Context, which is a different header with a different format. So if you want to start adopting OpenTelemetry gradually, you should probably configure it to use both Trace Context and the Jaeger propagator at the same time. The Jaeger libraries also support Trace Context, so depending on your needs you can configure it how you like and progressively move to OpenTelemetry and Trace Context.
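To make that mapping concrete, here is a minimal sketch of how those settings might look on Kubernetes; the service name, image, and collector endpoint are placeholders rather than values from the talk, and the jaeger propagator value assumes the corresponding propagator package is available in your SDK or agent. The same environment variables are read by both the OpenTelemetry SDKs and the auto-instrumentation agents discussed next.

```yaml
# Hypothetical Deployment fragment: configuring the OpenTelemetry SDK (or the
# Java agent) through environment variables instead of the old Jaeger ones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service                      # placeholder service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: example.org/user-service:latest    # placeholder image
          env:
            # JAEGER_SERVICE_NAME  ->  OTEL_SERVICE_NAME
            - name: OTEL_SERVICE_NAME
              value: user-service
            # Send data over OTLP to a local collector instead of a Jaeger agent.
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector:4317
            # Emit both W3C Trace Context and the Jaeger header during migration,
            # so migrated and not-yet-migrated services still correlate.
            - name: OTEL_PROPAGATORS
              value: tracecontext,baggage,jaeger
```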
The Jaeger project hosted only the SDKs, the implementation of the OpenTracing API that you could use to build your instrumentation libraries or instrument your applications directly. OpenTelemetry is a much wider project, and it gives you end-user packages that you can drop into your hosts or Docker images, and those packages will automatically instrument your applications. They are usually called auto-instrumentations or agents.

These agents are right now available for Java, Python, .NET, and Node.js. It's super easy to get started with them, and I would highly recommend taking a look at the auto-instrumentation libraries if you are starting with observability. They also handle context propagation, so when you use auto-instrumentation you don't have to write any code for that part. The other piece is the Go auto-instrumentation, which uses eBPF because Go is natively compiled; I thought it was still outside the main OTel GitHub organization, but it has already been moved there. It's something new in the project, maybe not production ready yet, but it is happening, and over time we may see more eBPF-based agents for other natively compiled languages in OTel.

What you can see on this slide are essentially three screenshots. On the left is a whole trace in the timeline view; in the middle are the attributes from a span for an HTTP request; and on the right are the attributes for a database call. This was taken from a Spring Boot application that Joe, another Jaeger maintainer, built, and the app was instrumented with the OpenTelemetry auto-instrumentation for Java. As you can see, the Java agent captures a lot of rich information; for the database call we even get the database statement, the SQL table, the operation, and so on.

Now I'd like to talk about pipelines and how we can fit the OpenTelemetry Collector into existing Jaeger deployments. You may first ask why we should do that. The OpenTelemetry Collector integrates with Jaeger: there is a Jaeger receiver and exporter, and it supports the agent and collector APIs. The OTel Collector also gives you the possibility to use the Jaeger remote sampler, and it integrates with Kafka; there is a Kafka receiver and exporter that can be configured to use the Jaeger format as the payload.

You may also want to use the OpenTelemetry Collector because there is a lot of additional functionality that is not available in Jaeger. This functionality is usually built as a processor, and processors let you mutate the data, which can be super useful if you need to extract new attributes that you can use later for querying, or if you need to redact PII. Or, as Jonah will talk about, you can use processors to extract metrics from traces, which is super interesting. There is also a processor that recognizes where data is coming from in your Kubernetes cluster and automatically attaches the Kubernetes resource attributes, like the pod name, the deployment name, and so on. And then of course there is tail-based sampling, which makes sampling decisions after a transaction is completed. That opens up a lot of use cases, like capturing all of my error traces without capturing all of the transactions that are okay or performing well, which is super useful. But it does use a lot of memory, so be careful with it. Yeah, exactly. And the OTel Collector is also very pluggable; if there is missing functionality, it's easy to build your own collector distribution to solve your specific use cases.
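As a rough sketch of what such a pipeline can look like, here is an illustrative configuration assuming the OpenTelemetry Collector contrib distribution, which is where the Jaeger receiver and exporter, the k8sattributes processor, and the tail_sampling processor have lived; the hostnames and the errors-only sampling policy are made up for the example, not taken from the talk.

```yaml
# Illustrative collector config: accept Jaeger and OTLP traffic, enrich and
# tail-sample it, and forward it to an existing Jaeger backend.
receivers:
  jaeger:
    protocols:
      grpc:                       # jaeger-collector style gRPC (port 14250)
      thrift_compact:             # jaeger-agent style UDP (port 6831)
  otlp:
    protocols:
      grpc:
      http:

processors:
  k8sattributes: {}               # attach pod, deployment, namespace, ...
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors         # keep every trace that contains an error span
        type: status_code
        status_code:
          status_codes: [ERROR]
  batch: {}

exporters:
  jaeger:
    endpoint: jaeger-collector.observability:14250   # placeholder address
    tls:
      insecure: true
  # Alternatively, write Jaeger-formatted spans to Kafka for the Jaeger ingester:
  kafka:
    brokers: [kafka.observability:9092]              # placeholder broker
    topic: jaeger-spans
    encoding: jaeger_proto

service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [k8sattributes, tail_sampling, batch]
      exporters: [jaeger]         # or [kafka] in the streaming setup
```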
So the first architecture is the simplest one meant for production: there is a database, the Jaeger collector and query talking to the database, and the Jaeger agent receiving data from the applications, in this case instrumented with the OTel client. You can use the OpenTelemetry Collector to substitute the Jaeger agent because it supports the same protocols. This deployment is scalable: you can scale up the Jaeger collector, but at some point you will probably hit scalability issues, and to resolve the scaling problems you can put Kafka in front of your backend. In that case you use the Jaeger ingester to read from Kafka and store the data into your backend. Here the OTel Collector can again substitute the Jaeger agent, but it can also substitute the Jaeger collector, because there is a Kafka receiver and exporter.

So let's talk about the operator. The operator is a Kubernetes concept: a component that you deploy to your Kubernetes cluster that will provision a Jaeger deployment and operate it over time. You may ask why you should use the operator and not, for instance, the Helm chart or vanilla Kubernetes manifest files. The reason is that an operator should be the most sophisticated way to provision and operate your application over time. The operator can provision a database, it can do schema migrations, and it can fix any breaking changes in Jaeger during an upgrade, if there are any. It also integrates with other Kubernetes APIs such as CRDs: if it recognizes those CRDs in the cluster, it can, for instance, provision the storage backend or a Kafka cluster for you. The Jaeger Operator can also take a Jaeger CR and, based on that CR, generate the vanilla Kubernetes manifest files for you, if for some reason you cannot run the operator itself.

This is the Jaeger CRD. A lot of the configuration is missing from the slide, but the two most important fields are the strategy and the storage type. The all-in-one strategy deploys the Jaeger all-in-one with in-memory storage in a single pod, with the query and collector running as a single process. Then there is the production strategy, which is the architecture I showed earlier, with query and collector as separate pods. And the streaming strategy additionally deploys the Jaeger ingester. The storage type lets you choose from in-memory, Elasticsearch, Cassandra, and the other supported Jaeger storages.

The Jaeger Operator also allows you to inject the Jaeger agent into your pods as a sidecar. And I would highly recommend you start looking into the OpenTelemetry Operator, which has the same capability but injects the OpenTelemetry Collector. The reason is, as I mentioned before, the OTel Collector gives you more functionality than the Jaeger agent. I would also like to invite you to the OpenTelemetry Kubernetes tutorial on Friday, where we will talk about the OpenTelemetry Operator, how to use it, how to provision the collector, and also how to use it to instrument your applications on Kubernetes using the OpenTelemetry auto-instrumentation libraries.
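Coming back to the Jaeger CRD fields described above, a minimal sketch of such a Jaeger CR might look like the following; the name, Elasticsearch URL, and replica count are placeholders.

```yaml
# Hypothetical Jaeger CR for the Jaeger Operator: the "production" strategy
# with Elasticsearch storage and an independently scaled collector.
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production            # allInOne (default) | production | streaming
  storage:
    type: elasticsearch           # or: memory, cassandra, ...
    options:
      es:
        server-urls: https://elasticsearch.observability:9200   # placeholder
  collector:
    replicas: 2                   # collector and query run as separate pods
```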
Awesome, thanks Pavel. So I wanted to talk about a cool feature that was built not too long ago as part of Jaeger, which moves Jaeger from being just a diagnostic and debugging tool to more of a proactive monitoring tool. The idea here is: how do we start to collect information from the traces that can be used for operational purposes, namely metrics, and be able to understand when things are starting to go the wrong direction in your application? Operational monitoring, alerting, and being able to plan for changes are some of the use cases around that.

To give you an idea: when Pavel was showing you the repository of different processors inside OpenTelemetry, there was a span metrics processor. The way this works is that when traces come into your collector, which is the little picture on the right, we derive metrics from the spans and send them to any Prometheus backend, while the traces continue to flow to your Jaeger backend. This happens inside the OpenTelemetry pipeline, and it allows you to create metrics off of the spans, hence the name.

The reason this is really interesting is that you can then start to build histograms. The configuration on the left-hand side is basically looking at how many transactions fall into each bucket, and also collecting data about certain methods in the application, so you can look at the RED metrics: essentially the rate (usage), errors, and duration (latency) of the application itself. This can generate a huge volume of metrics, so you're going to have to tune down what you want to show up in your Prometheus backend, otherwise you can run into cardinality problems. I've seen this generate huge volumes of unique metrics, so you do have to tune it, and we're happy to help on the channel; we get a lot of people coming and asking about tuning this. One of the other maintainers, Albert, originally wrote this, but it's since been taken up by the OpenTelemetry community and a lot of new things have been added to it.

I just wanted to mention the reason why you would use this: you only need a single instrumentation, the one for tracing, and you don't have to care about separate metrics instrumentation in your applications. As well, the span metrics processor gives you dashboards that look the same for all of your instrumented services; we'll probably see that later in a screenshot. Right, and it avoids the situation where a lot of people try to instrument their load balancers or API gateways or various other things to get the same data; it's easier to get it directly at the application level, because you have fewer blind spots and the pipeline is much cleaner than trying to collect from so many different places. And sorry, it's not deprecated, for sure; it's in heavy use all over. Oh, there's a new version? A new component, a processor that's replaced this one, so we'll provide an update on it; I wasn't aware of the change.

The idea is that this data then flows into Prometheus and you can visualize it in Grafana. This is an example of what you can get in a simple dashboard from this data directly, really giving you a good idea of what's happening in the application itself. This will work with any Prometheus-compatible backend, so it could be a commercial service or an open source backend; I listed a few popular ones here besides Prometheus that you can use for this type of thing, and anyone in the community can add additional support.
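To make this concrete, here is a rough sketch using the newer spanmetrics connector that superseded the original processor; the bucket boundaries, dimensions, and endpoints are illustrative choices, not the configuration shown on the slide, and the Jaeger exporter address is a placeholder.

```yaml
# Illustrative collector config: derive RED metrics from incoming spans with
# the spanmetrics connector, expose them to Prometheus, and still forward the
# traces to a Jaeger backend.
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 25ms, 100ms, 250ms, 1s]   # tune to your latency profile
    dimensions:
      - name: http.method                        # extra labels; keep this list
      - name: http.status_code                   # short to limit cardinality

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889                       # scraped by Prometheus
  jaeger:
    endpoint: jaeger-collector:14250             # placeholder address
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics, jaeger]           # connector consumes the spans
    metrics:
      receivers: [spanmetrics]                   # ...and emits metrics here
      exporters: [prometheus]
```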
You can technically send these metrics to any metrics backend that OpenTelemetry supports. The difference is that inside the Jaeger UI there is now a new monitoring tab, which only works with Prometheus because it issues PromQL queries. But it gives you a nice view within Jaeger: a new little tab that shows up at the top and queries the Prometheus backend to pull this data in. If you want to do alerting, you obviously have to use Alertmanager and the standard Prometheus alerting capabilities, but this at least lets you visualize performance data in Jaeger.

Now a little update on the last year: the new key features that have come out, and a roadmap of the things the community is working on today. We now support OTLP, the native OpenTelemetry protocol, inside the Jaeger collector, which means you can send data directly without using the Jaeger protocol; that's a new capability. Pavel mentioned adaptive sampling; we've now made that fully supported with OpenTelemetry as well. There have also been some contributions on visualizations: we have flame graph views in the UI now, which is a new addition. And as Pavel mentioned, the native SDKs are not supported anymore and you should use OpenTelemetry; we've also updated HotROD and tracegen to use OTel instead of the Jaeger SDKs. So just a few updates that have happened.

One of the things we're working on right now is supporting ClickHouse as a native data store for Jaeger. This is being contributed, and there's a lot of interest in ClickHouse for logging, for metrics, and for tracing, so it's going to become an officially supported backend for Jaeger. There is also some other work on the UI around dependency graphs and trying to normalize those views. And over time we would like to replace the Jaeger collector with OpenTelemetry. It's not something that's actively being worked on; we did a few early-stage POCs, but the idea is really how to make OpenTelemetry more native to Jaeger. So those are some of the things going on in the community today.

With that, we've got about five minutes left for Q&A, so if you want to raise your hand, one of us will repeat the question and answer it. Feel free to raise your hand; we're happy to take questions.

First of all, thank you, very interesting presentation. I have a bit of a controversial question, maybe, about competitors. I noticed that Grafana released their own tracing solution, Tempo, and after collecting some feedback from different people, what they don't like in open source and free tools is that there are a lot of signals, logs, metrics, traces, and you have a nice tool for each, but they're all separate, and when you have an incident you need to open 15 tabs. What Grafana is now trying to do is make it one product, which is what the paid tools have. How do you see this competition between Jaeger and Tempo? What's the plan?

Well, the main author of Tempo is a maintainer of Jaeger, so we're pretty connected, and Pavel does work on Tempo as well. They're different products for different use cases; my opinion is that they do different things. Searching free-form in Tempo is not typically a good idea if you have a lot of trace data, but Jaeger supports that, and so do the backends for Jaeger.
So it's more of a full-text-search type of approach. But as you said, Grafana does bring the signals together, and in Jaeger we don't quite do that. There are other open source tools doing similar things, like OpenSearch, and projects like SigNoz that are also doing interesting work. I think there's room for lots of options, but as a CNCF project, Jaeger will always be a community project. We hope the other tools stay that way, but they're not part of the foundation, so you have to think about that. Yeah, and there is a risk that Grafana's tools could become more closed source. What's your take on it? I think you summarized it very well. All right. Okay, thank you.

Next question, anyone? Yeah, sure, you can just say it and I'll repeat it.

Hello. First of all, again, thanks for the talk. Just maybe a little request: in OpenTelemetry we also have the OpenTelemetry demo repository, and it's using both Jaeger and Prometheus, but the monitoring is not integrated, so I think it would be good for demonstration purposes and for visibility if this worked there as well. Thanks. Cool, good suggestion, thank you.

I think we have time for maybe one more. Oh, one more, last one.

Yeah, so we're using Jaeger, but we've run into the, well, maybe age-old problem: how do you convince everyone to import a library to instrument their service or application? eBPF helps a lot, but we've been trying to figure out how to get as much tracing information out of our architecture as possible without hoping that everyone will expose the traces themselves. We are using Istio, so we're wondering: how does outside-in tracing fit into all of this?

Do you want to take this, or should I quote Yuri from his blog? Up to you. So the question is whether just instrumenting Istio is an easier way to collect trace data. The service mesh doesn't provide you with full tracing instrumentation, right? You still have to propagate the context inside your application from the upstream to the downstream service. There's a great blog post that Yuri, the creator of Jaeger, wrote on his Medium that talks about this myth, because we get it on the channel probably every week: someone comes and says they're instrumenting a service mesh proxy, and we explain that you're not going to get the value of tracing, because you only ever get the view of the proxy and not the view of the application context. Yeah, that's what we figured out as well. And the problem is that your request ends at the service and the trace only resumes after the service, so you're missing something in between. But yeah, there's no easy way to get around that; even eBPF only gets you halfway there. Yeah, you have to instrument the applications, and then it depends on your requirements how you want to do it. But on Kubernetes there is the OpenTelemetry Operator, which has a pod mutation webhook that will inject the auto-instrumentation libraries into your workloads at startup. Oh, I didn't know that. Neat. So you don't have to touch your applications; you don't even have to touch your deployment files. So go to Pavel's session on Friday and he'll cover it. Yeah, I'll talk about it. Amazing, thank you.

All right, I think that's it. Thank you, everyone. We appreciate it. Thank you.