So hi, I'd like to thank everyone who is joining us today. Welcome to today's CNCF webinar, How OpenTelemetry is Eating the World. I'm Kristy Tan, Marketing Communications Manager at CNCF, and I'll be moderating today's webinar. We would like to welcome our presenter today, Steve Flanders, Director of Engineering at Splunk. A few housekeeping items before we get started. During the webinar, you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen; please feel free to drop your questions in there and we'll get to as many as we can throughout the presentation and at the end. This is an official webinar of the CNCF and as such is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of that Code of Conduct. Basically, please be respectful of all of your fellow participants and presenters. I'd also like to remind folks that the webinar slides and recording will be available later today on the CNCF website at cncf.io/webinars. With that, I'll hand it over to Steve to kick off today's presentation.

Thanks so much, and thanks for having me. So today we're going to talk about OpenTelemetry and how it is, like many open source projects, really driving broad adoption and changing the landscape of how you can solve observability problems. I actually have a lot of material to go through today. If you're not familiar with the project, hopefully you'll learn all about it today, but even if you are familiar with it, there may be some aspects you haven't seen; it's actually quite broad in scope, so hopefully you will learn something new either way. I'm also hoping to have time for a demo, but that will depend on the number of questions. I definitely want to make sure people are getting what they expect from this presentation, and we definitely encourage questions throughout. We'll try to get as many of them answered as we can, and we can also touch base offline afterwards as well.

So with that, just a quick introduction. My name is Steve Flanders. I am a Director of Engineering at Splunk. I'm also an OpenTelemetry Collector approver and a chair nominee for the CNCF SIG Observability project. Previously, I was at a company called Omnition that was acquired by Splunk, and at Omnition we worked on the OpenCensus project, which is now part of OpenTelemetry. Prior to that I was at VMware leading engineering efforts for their log initiatives. So I've been in the observability and monitoring space for almost a decade. I've provided some links to other material if you're interested in learning more about some of the things I talk about.

So what is OpenTelemetry? Hopefully this one's not too much of a surprise for folks, but it's actually the joining of two other projects. There's OpenTracing, which is in the CNCF today as an incubating project, and there is OpenCensus. These two projects had a lot in common, but they also had some significant differences. So the goal was to bring both projects together and provide a single solution that people can rely on. The idea here is to standardize and to make it easy, so you don't have to choose between multiple projects that are doing similar things. And so all investment going forward by the contributors of both OpenTracing and OpenCensus is now going into OpenTelemetry.
So you can think of it as the next major version of both of these projects, and the goal is to sunset both OpenTracing and OpenCensus. Going forward, OpenTelemetry is really the future. With that said, people are already using OpenCensus and OpenTracing, and the goal is not to leave people stranded. So OpenTelemetry actually has shims that are completely backwards compatible with both of these projects, which really provides a transition plan for you as well. It's not like you're going to have to change everything now in order to take advantage of some of the things in OpenTelemetry; there's actually a path forward for you as well.
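To give a rough sense of what those compatibility shims look like in practice, here is a hedged sketch for OpenTracing users in Java. It is an illustration only: it assumes the opentelemetry-opentracing-shim artifact, and the exact factory method name and signature have shifted between releases, so treat the details as approximate.

```java
// A minimal sketch, assuming the opentelemetry-opentracing-shim artifact;
// the factory method name/signature varies by opentelemetry-java release.
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.opentracingshim.OpenTracingShim;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public final class ShimBootstrap {
  public static void install(OpenTelemetry openTelemetry) {
    // Wrap the OpenTelemetry SDK in an io.opentracing.Tracer so existing
    // OpenTracing instrumentation keeps working unchanged.
    Tracer shim = OpenTracingShim.createTracerShim(openTelemetry);
    GlobalTracer.registerIfAbsent(shim);
  }
}
```

The point is simply that existing OpenTracing (or OpenCensus) instrumentation keeps emitting data while the underlying SDK becomes OpenTelemetry, so you can migrate incrementally rather than all at once.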
So let's talk about cloud native telemetry. If you're familiar with cloud native terminology, you've heard of the three pillars of observability. Here I'm going to refer to them as telemetry verticals: things like traces, metrics, and logs. Sometimes these are called signals or data sources. It's basically information that you can collect from your application or from your infrastructure in order to figure out what's going on. And the end goal is really to collect this data so you can answer any questions that you have and be able to solve both availability and performance problems. Now, while you may have heard of these three pillars of observability, there are actually multiple layers for each of these verticals that you need to consider from an implementation perspective. You have the APIs themselves and the canonical implementations and SDKs. You have the data infrastructure, which includes things like data collection agents, collectors, and services. And then you have interop formats like data wire protocols, plus other standards such as W3C. All of this is very relevant, and usually these are specific to the vertical you're referring to. In addition, they're often language specific, and as you look at more cloud native workloads moving to microservices-based deployments, you see a lot of polyglot architectures where you're leveraging multiple languages. So the question becomes: how can you have consistency across all of this as you're collecting it?

Now, where does the OpenTelemetry project fall with regard to these different verticals and layers? Well, the goal is actually to cover all of it. And the way you should think about this is twofold. One is that when you look at these verticals and layers, it's really about the instrumentation and data collection aspects; you can think of it as the "what you're deploying in your environment" in order to get the telemetry data out. What OpenTelemetry is not looking to solve is the backend. It supports sending to a variety of different backends, and we'll talk more about that later in this webinar, but it doesn't provide a backend, so you're responsible for plugging that in. You can of course use open source backends, Jaeger, Prometheus, what have you. You can also use commercial third-party vendors as well. Another thing worth noting is the priorities from an OpenTelemetry perspective. There's already broad support for tracing; we'll spend a fair amount of time talking about that today. Metric support is coming in right now. In fact, the OpenTelemetry project just announced its beta a week or two ago, and that includes both traces and metrics. And logs are starting to be discussed currently. There's actually a log SIG that is forming, but there's no native support for logging yet other than, say, adding trace context (trace ID, span ID) into some of the logs, and again we'll talk more about that in a little bit. Long term, the vision is really to handle all of these verticals and provide an open source, vendor-agnostic, open-standards-based approach to how you would solve this, with the end goal being that this is provided out of the box. There isn't much as a developer that you need to do to take advantage of this; just by building your app, hopefully you'll have everything you need to emit the telemetry data, and then you can collect it in the platform or platforms that you care about in order to analyze that data and get it back out.

I wanted to provide a little bit of stats. I'm a very data-driven person, and there are actually some pretty cool numbers here. CNCF has something called DevStats; you may have heard of it. It's online, and it's backed by Grafana. As of right now there are 104 members that are part of OpenTelemetry, but there's also a broad range of active contributors. Many companies are involved here from across the globe, with lots of contributions. So this project is very much active, very much growing, and maturing extremely rapidly. One of the really cool things is that there is support end to end for this project. What I mean by that is you're seeing cloud providers like Azure and GCP, big-name vendors in the monitoring and observability space, as well as end users that are both consuming this and contributing back. This is quite unique. You're basically seeing the community as a whole come together to support and embrace this, which means that there actually is a problem here that needs to be solved; there's a lot of value that can come out of making this project successful. So that's really exciting in my mind, right? It's nice to be involved in a project where a lot of people feel this pain, a lot of people want to help make this better, and everyone's coming together to do the right thing. But it's not just about the OpenTelemetry project either. There's actually pretty broad communication and collaboration happening with other CNCF projects. For example, we work very closely with the Jaeger team and, more recently, with Fluent Bit, given some of the log conversations that are happening in the log SIG. So we're trying to make sure that we're engaging others and getting their insights, so that any solution being provided is really applicable to the broader use cases out there. And one really cool stat that I wanted to share: according to CNCF DevStats, as of right now OpenTelemetry is actually the second most active project in the CNCF. That's a pretty astounding number given that number one is Kubernetes and OpenTelemetry is currently a sandbox project, though it's building on the OpenTracing and OpenCensus work as well. That's a pretty cool milestone, and again I think it shows the importance of this project. People are really interested in this problem space and really want it to be better.

Okay, so with that introduction, what I'd like to do next is jump quickly into the architecture, just to provide at a high level what is provided by OpenTelemetry, and then drill down into more of the specifics so you can understand how you can consume it.
So there are basically three primary components that you can think of here. There's the specification. This is actually super important; it's the foundation on which everything else is built, and it's broken up into three different areas. You have the API, the SDK, and then some data aspects, including semantic conventions, which we'll talk more about later. The goal here is to make sure that whatever is being developed from a specification perspective is broadly applicable, because the idea is that the specification shouldn't change very much without significant consideration, especially given that all the other components rely heavily on it and making changes requires changing everything end to end. The next component is really around data collection. The OpenTelemetry project provides a collector that is capable of being deployed as an agent or as a standalone service, that is completely vendor agnostic, and that handles some pretty cool things like translating into and out of different formats. So the idea here is that you have a single solution you can potentially leverage and not be locked into one particular vendor's choice, which is pretty cool. And then there is the instrumentation aspect. This is for traces and metrics and eventually logs. Again, the goal here is to provide a vendor-agnostic way to do this and broadly support all the different languages, libraries, and versions that you care about, and ideally do this in a friction-free way, or at least provide enough flexibility that you can control what is being instrumented, what is being sent, how you're enhancing that data, and so on. As I mentioned, logging is incubating right now; it is a goal of OpenTelemetry, but it's still early days for that. A lot of the focus has been around tracing and metrics. As I mentioned, the project as a whole is in beta status right now; we cut a beta release a couple weeks back. That didn't include all of the languages that are part of OpenTelemetry today; it was a subset, as listed here: Erlang, Go, Java, JavaScript, and Python. But there are some pretty cool aspects. For example, Java includes both manual and auto-instrumentation; we'll talk about that more in just a little bit. That's for the tracing side. JavaScript includes the web component as well, which can help with real user monitoring, another very common use case. And there are plans to add broader auto-instrumentation support as well as additional languages; .NET and Ruby are actually pretty close to making it to beta status. Going forward, the goal is to get the entire project and all of its subcomponents into beta status and then get to GA. I'll talk more about the roadmap towards the end of the session.

So I wanted to walk you through a high-level architecture of OpenTelemetry: where the components fit and how they would be stitched together. Again, this is meant to be super high level. Basically everything is plug and play; there's a lot of flexibility here. This isn't meant to dictate how you must use the project; it's just one of the ways we would typically recommend stitching things together. As I mentioned, given that traces and metrics are the primary focus right now, I'll talk about them specifically. So let's assume that you have your application. You probably have multiple of them, probably microservices running throughout your environment.
They run on one or more different hosts, and the net result is you want to collect the telemetry data that you care about and then send it to one or potentially multiple different backends. To do that using the components available in OpenTelemetry today, the first step would typically be deploying the OpenTelemetry Collector as an agent. The collector name is a little bit of a misnomer here; it can actually be deployed a variety of different ways: as a standalone binary, a sidecar, a DaemonSet acting as an agent, or as a standalone service. But you basically want something as close to the application as possible, and you want it to be able to handle a few things. One is, if it's running on, say, a host, you want it to be able to collect some of the host metric information so you can do more infrastructure correlation. And the second aspect is you want it to be close to the application to get the application metrics and trace information as well. And you might have scenarios where you want a subset of data to go to one backend and maybe all the data to go to another, so there's flexibility in how you configure this collector to send that data. The other aspect would be adding instrumentation to your application itself. Again, OpenTelemetry provides a variety of different client libraries, and those can typically be configured out of the box; they support sending locally to the OpenTelemetry Collector running as an agent. So from the application you can get both metric and trace information, whereas from the host you would typically just be collecting metric information. So this is, at a high level, what it would look like. Maybe you only have one backend. Maybe you only want to collect traces and not metrics, or metrics and not traces. Again, you can configure this in different ways. And then for more enterprise, production-grade type deployments, we also support a model where, similarly, you still have an agent running locally, but you might have use cases where you want a standalone service that aggregates this data. Use cases for that would include limiting the number of egress points you have going out, controlling, say, API tokens if you're sending this to a third-party SaaS vendor, and more advanced use cases like tail-based sampling, where you need all spans for a given trace to go to the same collector instance. So again, there's flexibility in how you could deploy this, but at a high level, this is what it would look like. And you don't necessarily have to take OpenTelemetry's word for this; other projects are already adopting it as well. Jaeger just recently announced, and there's a blog post on this, that they are in the process of supporting the OpenTelemetry Collector as a replacement for the Jaeger collector. Again, a very similar notion; the primary difference is that today they're still using the Jaeger client libraries. They are talking about switching over to the OpenTelemetry ones in the future. The other aspect is that they're supporting Jaeger natively. So the default config for the OpenTelemetry Collector would be to support the OpenTelemetry protocol, whereas Jaeger wants to support the Jaeger protocol. That's basically just a configuration change, but they offer a distribution that is pre-configured to natively support Jaeger end to end.

Hey, Steve, we have a question in the Q&A that I think makes sense to answer here.
So Andre is asking: does OpenTelemetry include adoption of W3C Trace Context as the standard propagation header format?

Yes, so that's a very good question. Let me get to that in the client libraries; I actually cover that explicitly. The short answer is yes, but there's more I want to talk about with context propagation, so give me a little bit and we'll cover it in more depth.

Cool. So now you understand a little bit about the project and the high-level architecture. I want to jump into the specification, because that lays the groundwork for the data collection and client library aspects. We actually talk a little bit about context propagation here, so I guess this is a good segue for that question. From a tracing perspective, there are many different concepts, and I can't get into the whole specification given the amount of time that we have, but one of the big things to be aware of is this notion of context. In the distributed tracing world, this is actually super critical; it's what allows you to get context and correlation throughout your infrastructure. So, to directly answer the question (I didn't realize it was the next slide): W3C Trace Context is natively supported by all of the OpenTelemetry client libraries. So the answer is yes, absolutely. Many of these client libraries also support other formats, because not everyone has moved to W3C yet. It's a very new standard that's coming out, but it really is the future of context propagation, so it's good that you're looking at it. B3 is commonly used, there are other context propagation formats that are commonly used as well, and you're going to see support for those in the client libraries too. What's also kind of cool, and I'll talk a little bit more about it later, is that you can actually have multiple context propagation formats running in parallel, so you can receive multiple different ones. This is going to be important because many people that already have tracing in their environment are probably using something like B3, and they're going to need to transition to W3C. OpenTelemetry is actually going to make that pretty easy: you just enable multiple context propagators and then turn one off when you're ready.
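To make that "multiple propagators in parallel" point concrete, here is a small hedged sketch in Java. The class locations are taken from recent opentelemetry-java releases (they have moved around between versions), so treat the exact names as assumptions rather than a fixed recipe.

```java
// Minimal sketch: accept and emit both W3C Trace Context and B3 headers
// side by side. Class names/locations vary by opentelemetry-java release.
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.context.propagation.TextMapPropagator;
import io.opentelemetry.extension.trace.propagation.B3Propagator;
import io.opentelemetry.sdk.OpenTelemetrySdk;

public final class PropagatorSetup {
  public static OpenTelemetrySdk build() {
    return OpenTelemetrySdk.builder()
        .setPropagators(ContextPropagators.create(
            TextMapPropagator.composite(
                W3CTraceContextPropagator.getInstance(),
                B3Propagator.injectingSingleHeader())))
        .build();
  }
}
```

Once everything downstream understands W3C Trace Context, you would simply drop the B3 propagator from the composite and redeploy; nothing else in the instrumentation has to change.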
Some of the other aspects to be aware of: you have this notion of a tracer; it's basically how you pass context around and generate your spans. A span is a typical distributed tracing concept; it's basically a call in your request path, and it's made up of multiple different components. And then you have more advanced things like sampling and how you handle exporting. But there's a lot of flexibility here; there isn't a prescribed "you can only do it one way." As you can see, for almost all of these there are multiple options: multiple context propagators, multiple samplers, processors, exporters. So a lot of flexibility as well. One thing I will call out: in the tracing world, what we might call key-value pairs, or tags, or metadata, OpenTelemetry calls attributes. So if you hear that term, it's the same idea; the naming for it is "attributes" today.

One other thing I want to call out about tracing is semantic conventions. As it turns out, OpenTelemetry is not very prescriptive about how you denote your spans; it's kind of free form, and it's up to you to add the data that is important to you. With that said, it does make recommendations on how to name things that are well known. HTTP calls are a good example of this. Databases are a good example of this. And this is actually super important, because these semantic conventions are what make a vendor-agnostic solution possible: you know what a database looks like no matter which vendor or open source project you're leveraging. So this is very cool. It doesn't force you to do it, but you are encouraged to take advantage of it, and some of these are actually built into the client libraries today; I'll show some of that with the auto-instrumentation work that exists. So let me give you an example of the power of semantic conventions. Let's say that I have an application; it doesn't matter what language it is. Let's say it's leveraging the OpenTelemetry library, or really any tracing or metrics library, it doesn't really matter. And let's say that your application calls out to a database. Now, there are a few different ways you could be calling a database. Maybe it's something you control, which means you can add instrumentation to the database as well. But more often than not, you're probably leveraging some other kind of database, maybe a third-party one, maybe a cloud provider one, so you don't have direct access to add instrumentation; you're just consuming the database. This is where semantic conventions can be extremely powerful. For example, let's say your application calls out to the database. If I'm leveraging semantic conventions for that call, for, let's say, traces, then I can denote in my span: hey, this is a client span, which means I'm calling out to some other service. And I can denote that the db.type is whatever this thing is, maybe, I don't know, MongoDB. And I can say the db.instance is mongodb-01, the first instance I happen to be calling. Maybe it has the db.statement information, so I know that I'm running a select query on some table. All of that can be tagged as metadata onto the spans that are being generated from the application itself. And because of that, even though the database is not instrumented, I can now infer that a database exists, and I can calculate information like the number of calls that I'm making, so I can actually compute my RED metrics (requests, errors, duration) from an application-level perspective. And I can also infer what that relationship is, say the latency between my application and my database, because again I have instrumentation on this side and I know what I'm calling. So this is extremely powerful, and as you leverage more cloud providers or third-party services, taking advantage of these semantic conventions can really give you more insight into how your environment is behaving.
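As an illustration of that database example, here is roughly what such a span could look like in Java. This is a hedged sketch, not code from the talk: the attribute keys follow the database semantic conventions as they were commonly named around this time (db.type, db.instance, db.statement), and the query, instance name, and class are invented for illustration.

```java
// Sketch of a client span describing an uninstrumented database dependency.
// SpanKind and attribute key spellings vary slightly across SDK versions
// and semantic-convention revisions.
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;

public final class OwnersRepository {
  private final Tracer tracer;

  public OwnersRepository(Tracer tracer) {
    this.tracer = tracer;
  }

  public void listOwners() {
    Span span = tracer.spanBuilder("SELECT owners")
        .setSpanKind(SpanKind.CLIENT)            // we are calling out to another service
        .startSpan();
    span.setAttribute("db.type", "mongodb");      // what kind of database it is
    span.setAttribute("db.instance", "mongodb-01");
    span.setAttribute("db.statement", "SELECT * FROM owners");
    try {
      // ... execute the query against the database here ...
    } finally {
      span.end();                                 // spans must be ended explicitly
    }
  }
}
```

Everything the backend learns about the database comes from these attributes on the caller's side, which is exactly why an uninstrumented dependency can still show up in your service map and RED metrics.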
On the other side of the coin, we have metric basics. Same idea; as it turns out, context is important here as well. Typically there is no notion of context with metrics, but in the OpenTelemetry world context is actually added: span and correlation information is added in as metadata. So now I can actually enrich these metrics and understand how behaviors are happening. The terminology here is slightly different: instead of traces and spans, we have things like meters, metrics, and measurements, and you do some things slightly differently, like aggregations over time versus, say, sampling. But at a high level the concepts are very similar. Metrics are typically a little bit easier to understand because most people use them today, whereas tracing hasn't received as broad adoption yet. But hopefully that will change, because especially as you start looking at these microservices architectures, without the right context and correlation, things like metrics and logs are really just symptoms, and it's very hard to get to root cause without having some sort of trace information, or at least the context provided by trace information, throughout your environment. One other really cool thing OpenTelemetry provides is this notion of a resource SDK, and this resource SDK also has its own semantic conventions. The idea here is: how can I identify the source of the object that is generating this telemetry data? This is super important if I want to do, let's say, infrastructure correlation, or if I want to do problem isolation and identify where in my environment a problem is occurring. The example provided here makes a lot of sense, especially given that it's cloud native focused. Think about it: I have some sort of process running some microservice, let's say, that's producing telemetry. It happens to be in a container that runs in Kubernetes, which means it has a pod name, because that's how Kubernetes works. It will be running in a namespace, again, that's how Kubernetes works, and it might be part of a deployment; it could be another kind of object, but let's say it's a deployment. All three of these things, the pod name, the namespace name, and the deployment name, can be added as attributes onto this resource. So now I can identify where this is happening in my environment, and it's immutable, so I know that information won't change for the lifetime of the process. Some semantic conventions have been defined here, and there is the ability to tag resource information onto both traces and metrics. So now I have even more visibility: not only can I answer application-level questions, I can also answer some of the infrastructure ones that come along with it.
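To make the resource idea concrete, here is a small hedged sketch in Java of attaching Kubernetes-style resource attributes to a tracer provider. The attribute keys follow the Kubernetes resource semantic conventions; the pod, namespace, and deployment values are made up for the example, and the builder names may differ slightly by SDK version.

```java
// Sketch: describe *where* the telemetry comes from via resource attributes.
// The pod/namespace/deployment values below are placeholders.
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;

public final class ResourceSetup {
  public static SdkTracerProvider tracerProvider() {
    Resource resource = Resource.getDefault().merge(Resource.create(Attributes.builder()
        .put(AttributeKey.stringKey("service.name"), "customers-service")
        .put(AttributeKey.stringKey("k8s.pod.name"), "customers-service-5d4f7c9b8-x2x7p")
        .put(AttributeKey.stringKey("k8s.namespace.name"), "petclinic")
        .put(AttributeKey.stringKey("k8s.deployment.name"), "customers-service")
        .build()));

    // Every span produced by this provider carries the resource attributes,
    // so a backend can correlate application and infrastructure views.
    return SdkTracerProvider.builder().setResource(resource).build();
  }
}
```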
Okay, so that's the specification, at least at a high level. If you're really interested, go take a look at the specification itself; it's pretty in-depth and there's a lot of conversation going on there. I can't cover all the aspects, but I think that gives you enough of a foundation to understand what's available. I want to jump into the collector real quick, and then we'll talk about the client libraries. So what is the objective of this collector? The idea here is really to provide an implementation for you, so that not only do you have your telemetry data, you also have a way of collecting it and sending it in a completely vendor-agnostic way. This vendor-agnostic aspect is actually super critical. One of the things you commonly see is that most vendors provide their own agent, or collector, or both, but it's proprietary to them; even if it's open source, it only works with their backends. It's very hard to extend and very hard to make open-standards based. So the collector is looking to solve that problem by offering a way to receive telemetry data, process that telemetry data in case you want to make changes to it, and then export it to one or more different backends. This also includes transformations or translations of that data. So, for example, I can receive in Jaeger but export in Zipkin; that's totally possible in the collector today. And there are some very high-level objectives here, end goals in terms of usability and performance and providing a single solution end to end, but it might be more helpful to understand this by drilling into it a little bit more. What I do want to cover is the "but why," because it does come up from time to time: look, there are agents and collectors out there already, or why can't I just have my client library instrumentation send directly to the backend that I care about? At a high level, there are two bullet points that I think answer this question. One is that the goal of generating this telemetry data should be to ensure that you are not adding significant overhead to your application; you can't impact application performance, so it needs to be as lightweight as possible. And especially as we look at more microservices and polyglot architectures, you're going to be solving this for every single language. Any feature that you add to a client library needs to be added to every single language; any bug that you find is probably applicable to most of the languages and needs to be fixed end to end. So one of the things you should be looking to do is offload as much of that responsibility from the client library as possible, so that you're not impacting the application. This can include things like compression, encryption, and retry logic. It can include things like adding additional metadata or handling redaction of PII. It can also include supporting multiple exporters: maybe I'm using vendor A today but I want to use vendor B tomorrow. That would mean having multiple exporters configured in my client library, or even rebuilding my app to add that support. The other side of the coin in regards to why the collector exists is time to value. If I offload these responsibilities into a collector, I can solve them in one language instead of many, which makes life a lot easier, and I can move to more config-based updates. It's usually pretty trivial to update a configuration file or even update an agent; it is not as trivial to go update your application code, push that through, and get it rolled out throughout your entire environment. In addition, the goal should be to set it and forget it: the idea is that your instrumentation is configured once and then you don't have to touch it. And out of the box, it basically supports sending locally to the collector running as an agent, which means if you deploy the agent, no configuration change is necessary in your instrumentation; it will automatically get picked up. And then finally, of course, as I mentioned: vendor agnostic and easily extensible. Out of the box it provides support for a variety of popular open source solutions like Jaeger and Prometheus, but it also offers a very flexible, pluggable architecture so that vendors, or really anyone, can add additional support and capabilities as well. So let's look at the architecture of the collector. This is going to be a bit of an eye chart, so I apologize, but I was trying to think of a way to make this understandable. Basically there's a notion of receivers. This is how you get data into the collector. It could be push or pull based, it doesn't matter, but basically it's telemetry data that's going to enter the collector. This works for both traces and metrics. So there's a Jaeger receiver today, there's a Prometheus one, and OpenTelemetry actually has its own protocol as well.
On the other side of the collector are exporters, which are how you send data out of the collector. And again, you're going to have support for the same things on both ends, at least for some of these open source solutions, so I can export in Jaeger, or Prometheus, or the OpenTelemetry protocol. And then in the middle you have this notion of processors, which are ways you can massage, manipulate, or change the data as it's flowing through the collector. This could include things like: I want to batch the data before I send it out, or I want to retry in case the export fails for whatever reason. This could also include things like adding metadata, redacting tags that may contain PII, or doing things like tail-based sampling. All of that would be processors, things that happen in the middle. And you can actually define multiple instances of the same processor type as well, so I can have two different batch processors. What you do with this architecture is build what we call pipelines. A pipeline says: I want this set of receivers to talk to this set of processors and then to this set of exporters, in this order. So, for example, maybe I have a pipeline where the OpenTelemetry receiver is configured to go to the batch processor, then the queued retry processor, and then I want it to export in Jaeger. That would be a pipeline. As I mentioned, the collector is capable of doing transformations, so receiving in OpenTelemetry and exporting in Jaeger is fully supported. And maybe I want a second pipeline defined where, again, I'm receiving in OpenTelemetry, but it goes through a different set of processors, a separate batch processor so the batching between pipelines is different, a separate queued retry, and maybe in this case I want it to export in both Prometheus and the OpenTelemetry protocol. That's totally possible. So there's a lot of flexibility in configuration here depending on your use cases, and out of the box it provides a pretty consistent experience to get you started. And then finally, on top of all of this, we have the notion of extensions, which are things like health check information, pprof, and zPages; zPages give you a way to sample the data going through the collector and look for potential problems. So again, while out of the box there is a fixed set of receivers, processors, exporters, and extensions, anyone can also write their own, because this architecture is very pluggable. There is a notion of core components: these are things that the maintainers of this project actually maintain and that are built in out of the box. These are all open source based; there's no vendor-proprietary code in core, and the goal is to keep core as minimal as possible. I won't cover all of these, but they're all covered in the documentation. A cool thing we also have is the notion of contrib components. We have a separate repository where more community-based receivers, processors, and exporters are being written, and this is where, say, vendor-specific things can go. So if you have a vendor-specific exporter, it would live in this contrib repository. We have a way of building core and contrib, and you can combine things as you see fit so you get the components that you really care about. Again, I won't drill into specifics; go check the documentation on that.
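To tie the pipeline discussion together, here is a rough sketch of what such a collector configuration could look like in YAML. It is illustrative only: component names like otlp, batch, queued_retry, jaeger, and prometheus match collector components of this era, but the exact fields, defaults, and endpoints depend on the collector version you run.

```yaml
# Hedged sketch of two pipelines sharing one receiver; field names and
# defaults depend on the collector version (queued_retry, for example,
# existed in early collector releases).
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
  batch/2:                 # a second batch processor, configured separately
    timeout: 10s
  queued_retry:

exporters:
  jaeger:
    endpoint: "jaeger-collector:14250"    # illustrative endpoint
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp:
    endpoint: "otel-gateway:55680"        # illustrative endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, queued_retry]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch/2]
      exporters: [prometheus, otlp]
```

Each pipeline is just an ordered wiring of receivers, processors, and exporters, which is why adding a new destination is usually a configuration change in the collector rather than a change to the instrumented applications.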
Next up, let's talk about client libraries. I'm going to focus on Java, but this is applicable to all the client libraries. Basically, what you need to do, whether you're doing traces or metrics, is instantiate a tracer or a meter, which is basically your way of collecting context and submitting your telemetry data; generate the data that you care about, so spans in the case of traces or metrics in the case of metrics; enhance them; and then send them back out. I'll walk you through a quick-start example, and I'll highlight the key parts so you don't have to read all the code. Basically, you instantiate a tracer, so you can say: hey, this is my service, in this case it's the instrumentation library name, and it's running a certain version. Then I can go ahead and generate my span. One thing worth noting: make sure you close your spans, because at least in the case of Java they do not close automatically, so you need to tell it when you're done. And then perhaps add additional metadata that you care about. I added another version here, even though there's a version in the tracer itself, but this could be data center information, it could be garbage collection time, it could be whatever you want, something you use to enhance the data and provide additional information. So this will basically generate a span, and you can do this in each of your functions, or you can do it just for service-to-service calls, say the different RPCs between your microservices; this generates the span information that you care about. And then, on the flip side, you need to configure the SDK so you can actually export this data out. So again, you get your tracer, you tell it how you want to sample (for example, I want to sample all of the spans, or I only want maybe 50% of them, in which case I'd use a probabilistic sampler; again, flexibility depending on your use cases), and then how you want to export it. In this case I'm leveraging the Jaeger exporter, but it could be the OpenTelemetry one, it could be Zipkin, whichever one you care about, and you basically build a span processor to export that data out.
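The slides themselves aren't reproduced in this transcript, so here is a hedged reconstruction of roughly what that quick-start looks like in Java. API names are taken from opentelemetry-java and have shifted across releases; the instrumentation name, attribute, and endpoint are placeholders.

```java
// Rough sketch of manual instrumentation: configure the SDK with a sampler,
// an exporter, and a span processor; then get a tracer, create a span, add
// an attribute, and end the span. Names/signatures vary by SDK release.
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.jaeger.JaegerGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.samplers.Sampler;

public final class QuickStart {
  public static void main(String[] args) {
    // SDK configuration: sampler, exporter, and a processor that ships spans out.
    OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
        .setTracerProvider(SdkTracerProvider.builder()
            .setSampler(Sampler.traceIdRatioBased(0.5))      // e.g. keep ~50% of traces
            .addSpanProcessor(BatchSpanProcessor.builder(
                JaegerGrpcSpanExporter.builder()
                    .setEndpoint("http://localhost:14250")    // illustrative endpoint
                    .build())
                .build())
            .build())
        .build();

    // Instrumentation library name and version identify who produced the span.
    Tracer tracer = openTelemetry.getTracer("io.example.quickstart", "1.0.0");

    Span span = tracer.spanBuilder("do-work").startSpan();
    try {
      span.setAttribute("app.version", "1.0.0");  // any extra metadata you care about
      // ... the work you want to trace ...
    } finally {
      span.end();  // spans are not closed automatically in Java
    }
  }
}
```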
I understand that there's a lot of information on these slides, and given our time I don't have a lot of room to go through it, so you may not still be with me, which is okay, because manual instrumentation can be complex if you're not familiar with it. This is really for people who have familiarity with instrumenting their app manually and are comfortable doing so. OpenTelemetry is also looking to ease this and provide quicker time to value, and that's done through more of an easy-button type of approach. What if I told you that instead of all that code I just showed you to manually instrument a single span, you could do it with zero code changes whatsoever? I'm sure a lot of people would be interested in that, especially if they don't have traces instrumented in their environment today. So in addition to manual instrumentation for traces, there's also auto, or automatic, instrumentation. This is done with no code changes, only runtime changes. Basically, in the case of Java, I specify a couple of extra parameters at runtime. Java has this notion of a Java agent, and we provide a jar for you that does bytecode manipulation, and then you can configure your exporter or whatever other configuration parameters you care about. What's cool about this is that it will instrument all libraries that it's aware of, it ensures that it adheres to semantic conventions, it doesn't require you to modify your code or to do this work in multiple languages, and it's pretty flexible: not only can I pass in parameters directly, and I'm not showing an example, but you can do this through environment variables as well instead of passing parameters on the command that you're running. One caveat I will call out: many vendors offer auto-instrumentation, and you should not run two of them on the same service; you will most likely have a bad time. So if you're using auto-instrumentation, only use one agent at a time per service. As it turns out, auto-instrumentation is library specific, so not only do you have to have auto-instrumentation, you have to make sure that it supports the libraries and versions that you care about. In the case of Java for OpenTelemetry, there is broad support today already; here's a list of many of them, and again it's an extensible system, so it's pretty easy to add additional integrations or library support if need be. One other thing worth noting is that Java is the only language that has auto-instrumentation today in OpenTelemetry. Python and Ruby are just getting started, and I think .NET is also in the process of getting started, so those will be coming, and other client libraries will need similar work as well. But the goal is to offer both manual and automatic instrumentation from a tracing perspective. On the metrics side, I'm going to keep this pretty high level. It's similar to tracing, just with metrics: you have a meter, you give it a name, you can give it version information, you create your metrics, you observe those metrics, maybe on some sort of cadence, and then you emit that data out. So the syntax is different, but the concepts are generally the same, and the specification ensures that everything is pretty consistent between the two.
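For a feel of the metrics side, here is a small hedged Java sketch of creating a meter and a counter. It uses the shape of the current opentelemetry-java metrics API, which differs from the beta-era API discussed in the talk, and the meter name and attribute are placeholders.

```java
// Hedged sketch of the metrics side: acquire a meter, build an instrument,
// record measurements. API shape is from the current opentelemetry-java
// metrics API, not the beta-era one discussed in the webinar.
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public final class MetricsQuickStart {
  private final LongCounter requests;

  public MetricsQuickStart(OpenTelemetry openTelemetry) {
    // Like a tracer, a meter is identified by a name (and optionally a version).
    Meter meter = openTelemetry.getMeter("io.example.quickstart");
    requests = meter.counterBuilder("http.requests")
        .setDescription("Number of HTTP requests handled")
        .build();
  }

  public void onRequest(String route) {
    // Record a measurement, tagging it with metadata (attributes).
    requests.add(1, Attributes.builder().put("http.route", route).build());
  }
}
```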
Hey Steve? Yes? Oh hey, sorry, we have a quick question. An attendee is asking: will there be any exporters provided for kamon.io (k-a-m-o-n.io) to OpenTelemetry, or do we have to completely migrate away from Kamon to OpenTelemetry?

Regarding exporting in a given format: the architecture is extremely flexible, so there is not a Kamon exporter today, but there's no reason one could not be written. That could be as simple as submitting an issue and someone from the community picking it up; we of course accept pull requests, so if you're interested in doing that work we would love that as well. But there's no reason you'd have to migrate off. Whether it's an open source destination or a proprietary one doesn't really matter; the collector and the client libraries today give you the flexibility to write any exporter you want. So on the surface, without actually drilling into the specifics of Kamon, it should work, but there's nothing there natively; an exporter would have to get written.

Okay, I want to jump into a demo real quick, so after showing off the different components you can see this working end to end. What I have here is PetClinic, a Spring application that's microservices based; it's open source on GitHub, and they provide some docker-compose setup. I basically picked it up after a quick Google search and thought, hey, I'd be interested in getting OpenTelemetry to work with this project. This project today does not have OpenTelemetry; from what I can tell it doesn't have any distributed tracing instrumentation. It does have a Zipkin server for collecting that kind of information, but the services aren't actually instrumented, at least as far as I can tell. So I took this application, and I actually have it running here on my system, and I modified it by adding OpenTelemetry to it. Given that it's Spring, it's Java based, so I can leverage the Java auto-instrumentation. I also threw in the OpenTelemetry Collector, and if people are curious, I actually put up a pull request into the repo saying, hey, let me know what you folks think of doing this. Because it's a docker-compose setup, a number of files had to be touched to update the docker-compose, but on the surface the change was actually pretty minimal. What I needed to do was pull in the jars that are applicable to the OpenTelemetry Java auto-instrumentation, so that would be the auto-instrumentation jar as well as the exporter; I chose to use the Jaeger exporter for this example. And then I had to explicitly tell it to run the Java agent and which additional parameters I wanted set. Those are really the only changes I made to the app. As you can see, I didn't actually change its code; these are runtime-type things, or in the case of Docker, pulling down the Docker dependencies, but I didn't modify the Spring application itself at all, and I have that running here. The other thing that I did was add the OpenTelemetry Collector. Just to show that off, I built a quick collector.yaml file; the configuration is YAML based. I basically said, hey, I want a Jaeger receiver, so I want to receive Jaeger, because that's what I told this PetClinic service to export in, and I want to export to Zipkin, because the PetClinic example actually has a Zipkin server running. And I basically built a pipeline around that. So I have this traces pipeline that says: take in Jaeger; I do have some processors, so I have batch, attributes, and queued retry, which aren't required but are kind of a best practice; and then go ahead and send that data to Zipkin. So I'm actually going to translate from the Jaeger format into the Zipkin format and have Zipkin accept that. And that collector is actually running in this docker-compose. So there's a docker-compose file; I went ahead and modified it to include an OpenTelemetry Collector. Basically I pulled in the collector, it's using the collector config file, and then I exposed the Jaeger gRPC port so that I could send data to it. Those are really the only changes that I made to this thing.
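The actual collector.yaml from the demo isn't shown in the transcript, but based on the description it would look roughly like this. Component names (jaeger, batch, attributes, queued_retry, zipkin) match what was described; the endpoints and the attribute value are placeholders, and exact fields depend on the collector version.

```yaml
# Hedged reconstruction of the demo collector.yaml described above;
# endpoints and attribute values are illustrative, fields vary by version.
receivers:
  jaeger:
    protocols:
      grpc:                     # the Jaeger gRPC port exposed in docker-compose

processors:
  batch:
  attributes:
    actions:
      - key: environment        # the "environment: test" tag mentioned in a moment
        value: test
        action: insert
  queued_retry:

exporters:
  zipkin:
    endpoint: "http://tracing-server:9411/api/v2/spans"   # illustrative endpoint

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [batch, attributes, queued_retry]
      exporters: [zipkin]
```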
And then I started running it, and this is what the PetClinic app looks like when you fire it up. You can go ahead and see, for example, the different owners; these are built into the system. There's a list of veterinarians; this should populate as well. I can actually register another owner, so I can register myself, and I don't think it actually does validation... there we go, it adds me in. You can add pet information, which is pretty cool. And as that's happening, you'll actually see that there are spans being generated. So this app is auto-instrumented and now generating span data that wouldn't be there otherwise, because the OTel collector and the auto-instrumentation weren't there before. And if I fire up the Zipkin server that's built into this docker-compose, you'll actually see all the different microservices, and you'll see spans being generated from those calls that I just made. And I can pull up any one of those traces and actually see the information about the calls being made, the operations, the duration, and any associated metadata. I actually added a tag for environment, which I called test. So it's pretty easy to get started, especially with the auto-instrumentation aspects, and it doesn't really matter if you're using Zipkin or Jaeger or a commercial vendor; you can modify the collector config pretty easily to get an end-to-end flow going where you see data flowing throughout your system. So definitely take a look at this. We'll be providing more examples and documentation, such as quick-start guides; this PR is up, and I'll share the links in the slides. But this shows that you can take an existing app, add the necessary hooks to get instrumentation out, deploy the collector very easily, even have it receive in one format and send out in another, and tag in additional information. And this was set up in, I don't know, five or ten minutes. It probably takes longer to start the Spring application and have it be fully running (I think that takes about four or five minutes right now) than it does to actually instrument it in a way that gets OpenTelemetry data exported, which is pretty cool. So I definitely encourage folks to take a look at that. What you will notice is that they provide an architecture diagram, and those services will be represented inside of the Zipkin server, so you'll see all the different calls, the different microservices that exist, and any errors and things that are generated. Pretty cool overall, I think.

All right, so with the remaining time I did want to cover a few other aspects, and then I'll definitely open it up broadly to questions as well. A few other things the project already has in place today, even though we're only in sandbox: there is a governance board, there is a code of conduct, there is a technical steering committee. So there's a lot of oversight here, and there's actually pretty good representation from many companies to ensure that we're not building something in a specific direction and we're actually taking broad community input. One cool thing that we have, which I think is more unique (I haven't really seen it in other CNCF projects, though maybe it exists), is what we call OpenTelemetry Enhancement Proposals, or OTEPs. You can think of them sort of like design docs; they're a way of vetting an idea and ensuring that there's alignment, and generally even doing proofs of concept, before actually submitting the PRs and getting things built in. They can be valuable for an individual component, so you'll see some OTEPs that are specific to, say, the collector and wouldn't be applicable to the client libraries, or they can be more generic, like a proposal that would impact every single client library, and there we want an OTEP to ensure there's agreement before we ask the maintainers to take on the work of making those changes. The log SIG is also following this OTEP process since it's just getting started and it's unclear exactly what the work streams or decisions are going to be, so we leverage OTEPs for that. I did highlight this earlier, but not just the collector, the client libraries themselves also have this notion of core versus contrib. Core is what the maintainers are responsible for; the idea is to be as lightweight and efficient as possible and to keep it as minimal as possible. There's also contrib, which is more community based, and community based doesn't necessarily mean third-party vendor or closed source.
There can be open source aspects there too, where it doesn't make sense for something to exist in core because maybe a lot of people don't use it, or it's more legacy, or the overhead of maintaining it just isn't possible today. This is really cool because it allows us to move quickly. If everything were in core, it would really slow down our progress: core would become very large, build times would go up, and there could be a lot of problems. And since we want to provide vendor agnosticism, having third-party companies' code in core isn't the best outcome either, so having that distinction gives us a lot of flexibility. And then finally, there is a website that provides more information and links to the READMEs. Much of the documentation is either in the GitHub READMEs today or on other sites; Javadoc and GoDoc, for example, have their own destinations. But the OpenTelemetry site is actively maintained; it lists all the different components that are possible and supported, and it links out to blogs, video recordings, and other media. So definitely take a look at that. From a roadmap perspective, we're looking to get all the client libraries to beta as soon as possible. I mentioned .NET and Ruby are getting pretty close; there are several other client libraries that aren't there yet and need to get there. Our intention is to get this project to GA later this year, specifically for traces and metrics; that wouldn't include the logging aspect, given that logging has just kicked off. There is also an intention to get auto-instrumentation for all languages, but that will take some amount of time; as you can see, Java is the only one in that state now, and it's pretty far along, it actually made the beta. And then we want to get the initial log support, or at least the decisions for log support, later this year as well, probably in a beta state, with the idea being that it would GA probably early next year. And then, as always, like most projects: improve the documentation. PRs are definitely welcome; if you're confused, if you're having a hard time getting started, that is a bug, so let's go fix it. We also want to increase adoption of the project overall, including getting case studies. I mentioned some big companies are using this today: Postmates, Shopify, Mailchimp. Understanding their use cases, how they see value, why they're contributing, things like that will be very important. And then of course making getting started as easy as possible. As you saw, manual versus automatic is kind of night and day in terms of the amount of effort necessary to get started, so we really want to provide a friction-free way to get up and going, with the flexibility of enhancing that with additional information should you need it. So, next steps: please join the conversation. We have a Gitter; there are multiple rooms, but you should probably start in the community one, it's a great place to start. We have many special interest groups, so again, come join us; the meeting schedule and calendar invites are all up on the community page. And then definitely please submit PRs. We use the labels "good first issue" and "help wanted," so if you see those, they're good places to start; if you have your own issues or your own things you want to work on, that's always welcome as well. I did want to note that I put together this Google Slides template; I'll submit a PR for it so that people can leverage it if they're interested, or enhance it and make it better.
I thought it was kind of cool to show off a similar color palette and look for OpenTelemetry as well. There are tons of links here; I won't cover them all, and these slide decks will be shared out with folks, but much of what I covered has a link, so definitely check them out. If you think other resources would be useful, please ping me, message me, find me; I'm on Gitter, I'm on Twitter, I'm around. I would love to get feedback on what you liked and, going forward, what you think would be good to drill into further about the project. And with that, I would just like to say thank you for having me, and I would love to open it up for questions in the last five minutes or so it looks like we have.

Awesome, thanks for the presentation, Steve, it was super informative. So yeah, we're going to move into the question and answer piece. We only have a few minutes, so if you have a question that you'd like to ask Steve, please do submit it in the Q&A box at the bottom of your screen and we'll get to as many as we can. We do have one in here right now. Pardon me. Mike is asking: how much additional overhead do traces add to cluster resources, for example OTel API calls, network usage, storing traces, etc.?

Yeah, so this is going to be client library specific, or language specific. Each of the languages should be doing performance testing for that particular language to ensure the overhead is minimal. There is going to be overhead; it's not free, but the goal is to make it as lightweight as possible, because the goal is not to impact the application. As I mentioned earlier, that's why it's pretty important that you deploy the OpenTelemetry Collector as an agent, or another agent if you have one you're leveraging today, because then you can offload more responsibilities, which means less resource consumption in the client library, which means less resource consumption in the application. So in general they're actually extremely lightweight and efficient; they're built that way by design, so we're not consuming a lot of memory or processing things multiple times; we use a lot of stream processing of this data through your application, so you shouldn't run into performance problems. I don't have performance numbers readily available; I'm assuming the maintainers will post those on the GitHub repos themselves. The collector, for example, has a performance section, that I know for sure, and the build process for the collector actually tests performance and will fail builds if performance has deteriorated. So you should be seeing something similar elsewhere, and if you don't, I would definitely encourage you to open a GitHub issue, because we should be tracking that. If there are going to be performance problems, or if there are known performance problems, those definitely need to be fixed, because the goal is not to impact the app.

Great. Okay, so we have another question, from a fellow Steve. He's asking: is offloading work from application instrumentation to the collector automatic, or is it configurable?

Yes, so with offloading you basically have flexibility here; it's your choice. From an instrumentation perspective, probably the only work that you need the client library to do is batching, and so you'll actually see that there's a span processor: by default there's "simple," which sends everything as it comes, versus "batch." That's probably the only thing you want your client library to do; everything else should happen within the agent itself.
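As a hedged illustration of that simple-versus-batch choice in Java (processor class names are from opentelemetry-java; the exporter here is just a stand-in parameter):

```java
// Sketch: the only real work left in the client library is choosing how
// spans get handed to the exporter. SimpleSpanProcessor exports each span
// as it ends; BatchSpanProcessor buffers spans and sends them in batches.
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;

public final class ProcessorChoice {
  public static SdkTracerProvider withBatching(SpanExporter exporter) {
    // Typically preferred: batch in the app, let the collector/agent handle
    // retries, compression, additional metadata, and so on.
    return SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
        .build();
  }

  public static SdkTracerProvider withoutBatching(SpanExporter exporter) {
    // Sends every span immediately; simplest, but more overhead per span.
    return SdkTracerProvider.builder()
        .addSpanProcessor(SimpleSpanProcessor.create(exporter))
        .build();
  }
}
```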
The agent can do its own batching across multiple applications, which is powerful, plus queued retry, compression, encryption. The configuration aspect is that if you haven't configured something in your client library, you would then configure it in the collector itself. So in that collector.yaml file that I showed, you can add processing information there; maybe you want a larger queue, or maybe you want separate batches for separate pipelines. There's a lot of flexibility, and you can double up: I can enable batching in the client library and batching in the collector, nothing prevents you from doing that, and there actually can be good reasons to do it as well. So everything is configurable, but out of the box, if you take the default getting-started for, let's say, Java, you will get the batch processor enabled and nothing else, which is great, because if you take the default configuration of the collector, you'll get the queued retry for free as well. Some things cannot be automatically configured, so I don't want to give the impression that if you just deploy this you'll get everything you need automatically; that's not necessarily the case. Some things are environment specific and you will need to make modifications, but the default behavior should be sane and should do the right thing for you.

All right, it looks like that's all the questions, so that's all the time that we have today. Thanks again, Steve, for a great presentation, and thank you to all of our attendees for joining us today. A reminder that the webinar recording and slides will be online later today. We look forward to seeing you at a future CNCF webinar, and have a great day. Thanks all. Thanks.