Hello, and welcome to this session on log support in OpenTelemetry. What is OpenTelemetry? OpenTelemetry is all about making robust, portable telemetry data a feature of cloud-native software. That means the goal is to provide a set of APIs, libraries, agents, and collector technology that's bundled automatically and available so that you can easily generate, emit, and collect the telemetry data you need in order to observe your systems.

OpenTelemetry is a very active project. In fact, it's the second most active project in the CNCF behind only Kubernetes, according to CNCF DevStats. The project is seeing a lot of momentum, with many people contributing and adopting the technology, from cloud providers and end users to vendors and other open source projects. You can learn all about it in the links provided below.

OpenTelemetry is really focused today around three primary signals: traces, metrics, and logs, and each is in a different phase of maturity. Traces recently reached the stable milestone, with three major languages providing support, two more right on the way, and the rest planned for later this year. Metrics is currently in beta, but the goal is to offer a stable data model in the next couple of months, at which point instrumentation libraries will start adopting that data model. Logs are currently in alpha. The goal is that traces and metrics will be stable across all major languages later this year, and logs will at least reach the beta milestone.

Hi, my name is Steve Flanders. I'm a director of engineering at Splunk and actively involved in the OpenTelemetry project, including the Collector and the website. I'm also a member of CNCF SIG Observability. I've been working in the OpenTelemetry space since its inception, and previously on the OpenCensus project, and I've been in the observability space now for over a decade. Prior to Splunk, I was at Omnition, which was acquired by Splunk about a year and a half ago, working in the distributed tracing space, and prior to that, I was working in the logging space at VMware. If you're interested in learning more about me, take a look at the social media links.

I want to start by walking through the major components of OpenTelemetry, and then overlay how logs play into it. OpenTelemetry is about cloud-native telemetry. If you've heard of observability, you've probably heard of the three pillars of observability, which are represented as verticals here: traces, metrics, and logs. These signals, or data sources, are full of rich information, but the way in which they're implemented is actually many layers deep. For each of these, you need some sort of instrumentation API that is used to actually generate and emit the telemetry data you care about. You need an implementation of this API, typically known as instrumentation libraries or client libraries. Then you need the data infrastructure, the agents, so you can collect this data, process it, and then export it to whatever backend you want. On top of all of this, you have a variety of different interop formats. In the case of spans, we're talking about context propagation, but if you're just talking about transferring data over the network, then there are different wire formats that are used as well. All of this is what the OpenTelemetry project is hoping to standardize and see broadly adopted in popular open source libraries and frameworks.
The focus has been primarily on traces, with metrics very close behind, and logs being planned at this time. Where OpenTelemetry does draw the line is that it doesn't actually provide a backend. Instead, it is an open source and vendor-agnostic technology, so it plugs into a variety of different backends. Thus, the end user can decide where they want to send their data, giving them full control of it.

OpenTelemetry consists of three major components. There's the specification, which is foundational to everything that OpenTelemetry does. This is where the API, SDK, and data pieces are fully defined. Then the instrumentation libraries and the Collector build on top of the specification. Instrumentation libraries are a language-specific way to generate and emit the telemetry signals you care about; the goal is to provide a single client library per language. The Collector is a single binary that can be deployed in a variety of different form factors, including as an agent or as a gateway. It serves to receive, process, and export data in a vendor-agnostic way.

Let's drill into each of these components. The specification is organized into what OpenTelemetry calls signals. These are things like traces, metrics, and logs. There are other signals too, but for the purposes of this talk, we'll just focus on the data sources themselves. For each of these signals, you have a data model; an instrumentation API, so you can actually generate this instrumentation in code; an SDK, so you can process and export that data; and collector support, so you have an agent or standalone gateway that you can use. There are core packages, which are part of the open source community, with popular things provided out of the box, like Jaeger, Zipkin, and Prometheus. And then there are contrib packages, which may not be applicable to all end users, but do also provide support. This is where things like vendor exporters can also exist.

Context is a key concept of OpenTelemetry, with the idea being that regardless of signal, you have context and you can actually stitch all the telemetry data together. Resources are another foundational piece of OpenTelemetry, with the idea being that these different signals come from unique objects in your environment. Being able to actually understand what that is, like what pod, maybe what container, what cloud provider these things run in, that's where resources come in. The idea is that, regardless of the signal, you have a normalized way of understanding what that object is. This also allows some efficiencies from a wire-transfer perspective, because you can batch based on resources. And then finally, semantic conventions, which you can think of as an open, standard way of consistently defining metadata. This is where I can define how a database is represented, or an outgoing HTTP request, for example.

Now, there is today a log data model within OpenTelemetry that is in the experimental stage. That log data model defines a record with several fields, each of which is optional. As you look at these fields, some of them may be pretty familiar to you. If you know logging or syslog, for example, things like a timestamp are pretty common. Severity is pretty well known, whether it's an info, warning, or error message. And then body, sometimes referred to as message in the case of syslog. These are pretty common fields that you might have seen, or at least understand the concepts behind.
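For reference, here is a sketch of the record's fields as the experimental data model stood around the time of this talk; since the model is still experimental, treat the exact names as approximate:

```yaml
# Fields of the experimental OpenTelemetry log data model.
# Every field is optional.
Timestamp:       # when the event occurred, nanoseconds since the epoch
TraceId:         # request trace id, for trace/log correlation
SpanId:          # request span id
TraceFlags:      # W3C trace flags
SeverityText:    # severity as originally emitted, e.g. "INFO"
SeverityNumber:  # normalized numerical severity
Name:            # short event name or type
Body:            # the message itself; a string or a structured value
Resource:        # describes the source of the log (service, pod, ...)
Attributes:      # key-value pairs specific to this event
```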
In OpenTelemetry, we can represent this in a structured JSON payload. In a minimal example, we have a timestamp and then a body with a message containing nested key-value pairs. We can extend this slightly by adding severity information. Again, each of these fields is optional, so what you specify is up to the implementation generating that particular message.

Beyond the standard fields that you may recognize, there are also some additional fields, many of which are related to OpenTelemetry in some way. For example, you have resources and attributes. These are core concepts in OpenTelemetry. I already talked a little bit about resources. Attributes are very similar: you can think of them also as key-value pairs, but instead of being applied generically to the object generating the signal, you add attributes to specific signal events. So each individual log message, for example, could have different attributes attached to it.

Beyond this, there are trace context fields that can also be specified, things like the trace ID and the span ID. Now, you may be wondering why span or trace information would be added to logs, and this goes back to context. The idea here is that if I have distributed tracing in my environment and I have logging, I could technically go from a trace to its associated logs, or from a log to its associated trace, all via this context. When the trace ID and span ID are injected into the structured JSON payload, that is what allows me to do the correlation I need in order to go between signals.

A resource might carry the service name and version of the object emitting the log, but also, say, the Kubernetes pod UID. It could have additional metadata as well; for example, maybe the cloud provider, if this was deployed into a public cloud. And then we have attributes, the key-value pairs specific to this particular signal message being sent. Here's where you can apply things like semantic conventions; for example, an HTTP status code key with a value of 500.

One final thing to note about such a payload: it may have a timestamp field and a severity field while the body also contains the timestamp and the severity. That data could be parsed out of the body and added as metadata to the structured JSON payload, or it could be added supplementally, being passed from one collector to another. So there's no requirement that this metadata actually match what's in the body. You don't even have to parse that data if you don't want to.

Now, this log data model is very flexible. It supports many fields that can be defined, but none of them are required. Why? The primary reason is that OpenTelemetry is an open source and vendor-agnostic solution. As a result, it needs the ability to convert from and to different formats. The log data model, in its current incarnation, supports a variety of different open standards as well as vendor formats. For example, you may be familiar with the Apache HTTP Server log format or the Elastic Common Schema. On the vendor side, you may be familiar with Amazon CloudTrail or the Windows event log. All of these different formats can be mapped onto the OpenTelemetry log data model.

So in summary, an OpenTelemetry log record is what you might refer to as a log or an event. Putting all of the pieces described above together, a complete record might look like the sketch below.
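The exact payloads from the slides aren't reproduced in this transcript, so here is a minimal sketch of a record combining the fields discussed above. All of the values, including the trace and span IDs, pod UID, and service name, are made up for illustration:

```json
{
  "Timestamp": "2021-05-12T20:29:45.715Z",
  "SeverityText": "ERROR",
  "SeverityNumber": 17,
  "TraceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "SpanId": "00f067aa0ba902b7",
  "Resource": {
    "service.name": "checkout",
    "service.version": "1.2.0",
    "k8s.pod.uid": "0c4ee289-9db4-4f14-9c09-4dcee6ee8d7e"
  },
  "Attributes": {
    "http.method": "GET",
    "http.status_code": 500
  },
  "Body": "2021-05-12 20:29:45,715 ERROR GET /checkout failed with status 500"
}
```

Notice that the timestamp and severity appear both as top-level fields and inside the body, exactly the duplication described above; nothing requires them to be parsed out, or even to match.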
An important thing to note here is that OpenTelemetry does not actually distinguish between logs and events today, though it could in the future. Now, this log record is really just a structured JSON payload. Its definition is made up of one or more fields, all of which are optional. Context is supported not just for OpenTelemetry, but for other popular open standards, including W3C Trace Context. There is a notion of error semantics provided, so you can denote errors within your structured log messages. And you can convert from and to a variety of popular formats. This log record can actually be embedded in spans; for example, there's a notion of span events to which a log record could be attached. Or it could be standalone, which is a little more traditional, especially in non-tracing environments. The key thing to note is that logging is still experimental in OpenTelemetry, so changes to the data model could happen in the future.

Next, let's talk a little bit about instrumentation libraries. As I mentioned, these are the language-specific way to instrument your application, so there's an instrumentation library per language. Within that instrumentation library, there is an API that is used to actually generate the telemetry data you care about, and an SDK that is used to process and export that data. All the other core specification concepts I discussed earlier, including resources, semantic conventions, and the like, are also supported in the instrumentation libraries.

So how does logging play in? Logging is still quite early when it comes to instrumentation libraries in OpenTelemetry, but there is some amount of reference architecture, or reference implementation, provided with Java today. First, you can take a look at how you could potentially manually instrument your app for logging using the SDK extension provided in the OpenTelemetry Java project. If you're looking for automatic instrumentation, then the Java instrumentation repository offers logger MDC instrumentation. That logger MDC instrumentation offers the ability to inject things like the trace ID, span ID, and trace flags; you may remember these were fields that were part of the trace context defined in the log data model. A sketch of what this can look like appears at the end of this passage. So from an automatic perspective, you can get some pretty good visibility and context to go between traces and logs today, but it's still early when it comes to manual instrumentation.

Java is one example of where automatic trace injection into logs happens. Python also has an implementation of this, and you're going to see other languages adopting it as well. It's still being considered whether manual log instrumentation and instrumentation libraries for logs will be provided in OpenTelemetry. But first, a stable data model needs to be agreed upon, because the instrumentation libraries need to depend on that data model. Until the data model is actually stable, taking a dependency on the manual log instrumentation, at least, is not advised at this time. Using automatic trace injection is perfectly fine, though, because changing those conventions is pretty easy going forward.

Now, you might be wondering why logs are not ready in OpenTelemetry yet. You might recall from earlier that there are different signals: traces, metrics, and logs. To date, traces and metrics have been the primary focus, but logs are quickly starting to pick up momentum in the OpenTelemetry project, especially with the announcement of the stable release for traces.
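To make the automatic injection a bit more concrete, here is a hypothetical Log4j 2 configuration in its YAML form. It assumes the trace_id, span_id, and trace_flags MDC keys that the logger MDC instrumentation injects; the appender name and pattern are my own, and Log4j 2's YAML support requires the Jackson YAML data format on the classpath:

```yaml
# Hypothetical log4j2.yaml pulling injected trace context
# out of the MDC and into every formatted log line.
Configuration:
  status: warn
  Appenders:
    Console:
      name: STDOUT
      PatternLayout:
        Pattern: "%d %5p [%t] trace_id=%X{trace_id} span_id=%X{span_id} trace_flags=%X{trace_flags} - %m%n"
  Loggers:
    Root:
      level: info
      AppenderRef:
        ref: STDOUT
```

With something like this in place, every log line emitted inside an active span carries the IDs needed to jump between that log and its trace.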
Of course, PRs are welcome, so if you're interested in getting involved, please do.

Now let's talk about the Collector. The Collector is a component that's configured via pipelines. Each of these pipelines is made up of one or more receivers, a way to get data into the Collector; these can be push- or pull-based. Then processors: what you want to do with the data as it passes through the Collector. And then exporters, which again can be push- or pull-based, to send the data to the destination or destinations of your choice.

The Collector offers a few key features. One of the biggest is that it can actually translate from one format, the format it receives, into a different format, the format it exports. This is powerful because it provides a vendor-agnostic solution. The Collector can also do things like CRUD operations on metadata. This is powerful for things like PII redaction, or even enhancing the telemetry data passing through the Collector. And again, the core concepts you saw in the specification, like resources, are natively supported in the Collector.

At a very high level, here's a reference architecture of how OpenTelemetry could be deployed. It's pretty common to deploy at least the OpenTelemetry Collector running as an agent on each of the hosts within an environment, and then the OpenTelemetry instrumentation libraries, which are language-specific, on the associated applications. In this configuration, the instrumentation library is configured out of the box to send data to the OpenTelemetry Collector running as an agent, and that agent can be configured via YAML to send data to the backend or backends of your choice. You could also deploy the OpenTelemetry Collector as a standalone service or gateway, depending on the use cases of your environment.

One of the things OpenTelemetry tries to do is provide an end-to-end reference implementation, with flexibility and choice at every stage. If you want to have your instrumentation send directly to the backend and bypass the Collector, you can. If you want to deploy the Collector but use a different instrumentation library, like Jaeger or Zipkin, you can. Again, these choices make it very easy to plug and play, regardless of the environment OpenTelemetry is deployed in.

If we drill a little into the architecture of the OpenTelemetry Collector itself, you might remember the concepts of receivers, processors, and exporters. Those are pretty foundational to the Collector. You actually stitch these different components together through what are called pipelines. On this diagram, they're represented in different colors. For example, you can see an orange pipeline defined where OTLP, the protocol OpenTelemetry supports out of the box, is being received in, so maybe this is coming from an instrumentation library. It's going through a batch processor and an attributes processor, and it's being exported to Jaeger. As you may remember, the Collector can receive in one format but export in another, making it a vendor-agnostic solution. The green line represents a second pipeline. Here again, OTLP data is being received; this time it's going through a batch and a filter processor, and it's exporting to two different destinations, OTLP and Prometheus, in parallel. This also provides flexibility and choice depending on your requirements. A YAML sketch of what these two pipelines could look like follows.
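Here is a rough sketch of a configuration along those lines. I'm assuming the orange pipeline carries traces and the green one metrics (Prometheus being a metrics destination), and all of the endpoints and processor settings are hypothetical:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
  attributes:
    actions:
      - key: environment     # hypothetical enrichment
        value: production
        action: insert
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - http\..*         # hypothetical filter rule

exporters:
  jaeger:
    endpoint: jaeger-collector:14250   # hypothetical endpoint
  otlp:
    endpoint: otel-gateway:4317        # hypothetical endpoint
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:                  # the "orange" pipeline
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [jaeger]
    metrics:                 # the "green" pipeline
      receivers: [otlp]
      processors: [batch, filter]
      exporters: [otlp, prometheus]
```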
As we look at log collection in OpenTelemetry, this is probably the most mature aspect today. The OpenTelemetry Collector supports a variety of different components that are logging-specific. For example, from a receiving perspective, there is a filelog receiver, which supports log tailing: you can point it at a log file and it's able to pick up changes written to that file. In addition, the Collector offers native support for Fluent Forward, so Fluentd and Fluent Bit agents can send data to the Collector, and the Collector can process and export that data. From a processing perspective, you have the ability to do CRUD operations on attributes, batch the data, and work with resources, all key concepts of the specification. And a variety of different exporters, some open source and some vendor-specific, are available today, with more to be added in the future.

I would be remiss if I didn't point out that recently there was a donation by observIQ of the Stanza logging agent to OpenTelemetry. You can read more about it in the GitHub issue linked above. Stanza is now being fully integrated into the OpenTelemetry Collector; the tailing capabilities in the filelog receiver I mentioned came from Stanza, but that's just one of many parts of Stanza. For example, there's journald, syslog, TCP, UDP, and Windows event log support, plus filtering, parsing, and more. All of these capabilities are natively being added to the OpenTelemetry Collector, so it will have even more logging support in the future. In addition, there are a variety of additional destinations which will support logs in the future.

With that, I'd like to turn to a quick demo to show you how you can get started, at least with the collection aspect, in OpenTelemetry. At a high level, we'll have an OpenTelemetry Collector that is configured to receive Fluent Forward data. I will deploy Fluent Bit, which will be configured to collect Docker events and send them to the OpenTelemetry Collector. And I'll have the Collector send that data to two different destinations, one being a logging backend and the other being standard out. The Collector will also be configured to add attribute and resource information, basically enriching the Fluent Bit Docker events that pass through it.

Let's go ahead and try this out. First I will start a Docker container running Fluent Bit. This is just stock Fluent Bit, one of the latest versions. It's configured with the Docker events input plugin, and from an output perspective, it will send data to port 8006 on a different Docker container. So we'll go ahead and start that container. Now, I haven't actually started the OpenTelemetry Collector yet, so as Fluent Bit starts collecting data, it will not be able to send it and will start throwing errors.

While that gets fired up, on the other end here, I have an OpenTelemetry Collector YAML file. In this YAML file, we enable the fluentforward receiver on that port 8006 you saw with Fluent Bit, so we'll be receiving that data in natively. We configure an attributes processor to insert a single key-value pair; here the key is called foo, with a value of bar. We also configure resource detection, so we can add system information to the data passing through. And we're going to send that data out to the logging exporter locally, as well as a Splunk HEC exporter. Assembled, the YAML file looks roughly like the sketch below.
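For reference, here's a reconstruction of what that collector configuration could look like; the HEC token and endpoint are placeholders, and the pipeline section matches the wiring I describe next:

```yaml
receivers:
  fluentforward:
    endpoint: 0.0.0.0:8006   # where Fluent Bit's forward output sends

processors:
  attributes:
    actions:
      - key: foo             # the single inserted key-value pair
        value: bar
        action: insert
  resourcedetection:
    detectors: [system]      # adds host.name, os.type, etc.

exporters:
  logging:
    loglevel: debug          # prints records to standard out
  splunk_hec:
    token: "00000000-0000-0000-0000-000000000000"     # placeholder
    endpoint: https://splunk:8088/services/collector  # placeholder

service:
  pipelines:
    logs:
      receivers: [fluentforward]
      processors: [attributes, resourcedetection]
      exporters: [logging, splunk_hec]
```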
I was hoping to show off Elasticsearch, but unfortunately that exporter does not fully support logs yet. It'll be added soon, and then it'll be easy enough to demo. Then we just define a pipeline that receives Fluent Forward data in, runs the attributes and resource detection processors, and exports that data to the two destinations in parallel.

With that, I will go ahead and start the container. This is the container image from the OpenTelemetry Collector Contrib repository, using the YAML file I just built and exposing port 8006. This Collector will get started, the error message will go away, and data is already coming into the OpenTelemetry Collector.

So we can take a look at one of these log events. For example, we can see a log event coming in where resource information is being added: the host name, here's my host name, and then the OS type, which is Linux. Then we get a timestamp for the event. The severity is not set, nor is the short name. The body is coming from Fluent Bit itself. And beyond that, there are attributes; for example, the Fluent tag is being set, so you can see it's collecting Docker events, and you may recall the foo: bar attribute added in the OpenTelemetry Collector. So what we see here is a mix of Fluent Bit data coming through the Collector, plus resource labels and attributes added by the Collector itself.

Now, that data is being passed to a different destination as well, the Splunk HEC destination. If we look at the data coming in there, we can see, for example, a payload whose metadata includes the foo: bar attribute we saw earlier. You can see the Fluent tag, the Docker events, and a bunch of other rich metadata extracted from that payload. This same behavior would be applicable no matter which destination you send the data to, and it's very easy to change the configuration of the Collector just by editing the YAML file. So it's very quick and easy to get started. This supports popular open source tooling including Fluent Bit and Fluentd, native log support is being added to the Collector, and from an instrumentation perspective, there are options available today.

With that, I'd say please do check out our special interest groups. There are a variety of these defined: some are language-specific for instrumentation, some are specification-specific, and the Collector has a special interest group as well. It's a great way to get involved in the OpenTelemetry community. Definitely join the conversation. Each of these special interest groups has its own GitHub project, and GitHub Discussions is leveraged. In addition, the OpenTelemetry group is on the CNCF Slack, which is cloud-native.slack.com. And of course, PRs are welcome and everyone is welcome to join the community. So take a look at the GitHub issues; many are labeled with either good first issue or help wanted labels, which makes it pretty easy to get started. Or you can join a special interest group and ask how you can get involved.

And with that, thank you so much. As you can see, OpenTelemetry is moving quickly to provide cloud-native observability. The goal is really to help standardize and make it easy to get instrumentation, regardless of the different signals, and to collect and process that data in an efficient, vendor-agnostic way.
Today there is rich support for traces, already stable in languages including Java, Python, and .NET, with other languages, including JavaScript and Go, planning to be stable very soon. The metrics data model is in the final phases of reaching its stable milestone, at which point instrumentation libraries will start picking up and implementing that data model. From a logging perspective, the log data model is still experimental, but it's actively picking up traction, and the goal is to get it into a more mature state later this year, along with a decision around what will happen from an instrumentation library perspective; at a minimum, you can expect to see more trace injection into logs across languages. And the Collector will offer native support for logging. In fact, it already offers some pretty broad support for it today, and it's already been tested at scale for both tracing and metrics. The initial performance numbers for logging look very promising and will be published soon.

With that, thank you so much. I hope you take a look at the OpenTelemetry project, and I hope to see you involved in the project as well. Thanks so much.