I'm Dan Jaglowski. I'm a maintainer of the OpenTelemetry Collector and a principal software developer at observIQ. I've been developing observability software for over 10 years and contributing to the OpenTelemetry Collector for the last three. Most recently, I've been working to enhance the collector's ability to manage and process telemetry, and that's what I'll talk about today.

First, I'll talk about the context of these enhancements, specifically why the OpenTelemetry Collector is an ideal place to add more sophisticated telemetry processing capabilities. I'll describe the collector's pipeline architecture and make a distinction between managing telemetry and processing it. I'll highlight some limitations of the architecture, and finally, I'll introduce you to a new feature set called connectors, which resolves these limitations.

Traditionally, there are three types of telemetry data: metrics, traces, and logs. Until recently, observability tools were designed as end-to-end solutions with one data type in mind. So to achieve a high degree of observability, you would have to deploy several independent toolchains. It also meant that if you wanted to change any one component of a toolchain, you might have to replace the entire toolchain. OpenTelemetry improves on this by supporting multiple data types from the start, but it also limits the scope of the problems it tries to solve. We're focused on processing and generating telemetry; we're leaving the other problems to other tools.

A typical OpenTelemetry solution looks something like this: with your application on the left, you bring OpenTelemetry's instrumentation libraries into your code base, where they can capture and emit telemetry. Often that data will flow through the OpenTelemetry Collector, which can process it and forward it to whichever storage and analysis backends you like. The collector can also generate its own telemetry: it can read logs from files and scrape metrics from APIs. It's also highly interoperable, so if you've already established data streams using other tools, you can usually redirect those through the collector.

So between the instrumentation libraries, the collector itself, and compatibility with other ecosystems, we have many ways to generate useful telemetry. But it's also very important, of course, that we have robust processing capabilities. A current trend in the observability industry is to "shift left," the idea being that we should move processing capabilities from backends into the collector, or from the collector into instrumentation libraries. And while it's generally true that you can reduce costs this way, the collector has some unique advantages because of its role in the ecosystem, so in some cases we can only shift left so far. First of all, the collector is one of the only components that can process telemetry regardless of the source. Whether the data comes from instrumentation libraries, the collector itself, or other tools, it can process all of it, so any capabilities we add to the collector are more broadly useful than if we were to add them elsewhere. Additionally, the collector has the opportunity to correlate telemetry that comes from multiple sources. The collector was designed with processing in mind, and it has had many useful capabilities for quite some time: we can filter, annotate, aggregate, redact, transform, and batch.
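To make those capabilities concrete, here's a minimal sketch of a configuration that both generates telemetry (reading logs from files) and processes it (filtering and batching). The component names come from the core and contrib collector distributions, but the endpoint and the filter criteria are placeholders, and exact config keys can vary by version.

```yaml
receivers:
  filelog:                          # generate telemetry: tail log files
    include: [/var/log/app/*.log]

processors:
  filter:                           # drop records below INFO (illustrative criteria)
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'
  batch:                            # batch data before exporting

exporters:
  otlphttp:                         # placeholder backend endpoint
    endpoint: https://backend.example.com:4318

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [filter, batch]
      exporters: [otlphttp]
```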
But I think what the collector has lacked is a flexible system for managing telemetry. And when I say managing telemetry, I mean something different than processing: I mean getting the right data to the right place, both internally in the collector and externally. I'm talking about things like merging data streams, replicating data streams, and routing data streams. That's what the connectors feature set is all about. It provides new ways to manage telemetry within the collector, and it does so in a way that's backwards compatible with the existing architecture. As a bonus, connectors also give us the ability to work with multiple data types in the same place. The end result, I think, is that we now have a generalized framework for processing telemetry within the collector.

Okay, let's get into the details. The collector is built around a notion of data pipelines, where each pipeline is made up of individual components, and each component essentially does one thing and then emits data. We have three classes of components: receivers are the inputs into the pipeline, processors modify the data in some way, and exporters are the outputs from the pipeline. The basic requirements for a pipeline are pretty simple: we need at least one receiver, at least one exporter, and exactly one data type. The receiver and exporter requirements are probably intuitive, since we need an input and an output, but the data type requires a little more explanation. The collector, as I said, supports multiple data types, but a given component may not; it may only support one or two. For example, we have a syslog receiver that only supports logs, and a Jaeger exporter that only supports traces. So by framing a pipeline with a data type, we're adding a lot of clarity about what is expected of the components in the pipeline. You can look at this in two ways: first, the components in the pipeline must be able to handle that data type, but second, we're also giving the user the ability to ask more precisely for the capabilities they want. If I put an OTLP receiver in a traces pipeline, I'm only asking it to receive traces. It won't receive metrics or logs unless I ask it to by also placing it in a metrics or logs pipeline.

I'm going to be showing you a lot of pipeline diagrams, like the one on the right, so I think it's important to understand the configuration that goes along with these diagrams. I'll quickly run through the configuration of the collector and highlight the relevant parts. First, we have a section for configuring each component class: receivers, processors, and exporters. Within each class, you can configure multiple components. Each component, of course, has its own parameters, whatever is appropriate for that component. But importantly for this talk, each has a component ID, a unique way to refer to its configuration. The format is very simple: the type of the component and then, optionally, a name. Further down, we have the pipeline configuration. Pipelines also have unique IDs: the data type and then, optionally, a name. Then we specify the receivers, processors, and exporters that we'd like to use in each pipeline, referring to the component IDs we specified above, and this implies the structure of the pipeline. This is enough for us to understand the data flow in most cases.
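As a concrete sketch, here's a small configuration showing component IDs and pipeline IDs, both with and without the optional names. The components are real core-distribution components, but the endpoints are placeholders.

```yaml
receivers:
  otlp:                     # component ID: type "otlp", no name
    protocols:
      grpc:
  otlp/internal:            # component ID: type "otlp", name "internal"
    protocols:
      grpc:
        endpoint: localhost:14317

processors:
  batch:

exporters:
  otlphttp:
    endpoint: https://backend.example.com:4318

service:
  pipelines:
    logs:                   # pipeline ID: data type "logs", no name
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs/internal:          # pipeline ID: data type "logs", name "internal"
      receivers: [otlp/internal]
      processors: [batch]
      exporters: [otlphttp]
```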
That configuration is enough to convey the structure, so that's mostly what I'll show you from here on. All right, let's talk more precisely about the meaning of that structure. In particular, I want to highlight the difference between pipelines and data streams. If you're familiar with the collector, you may not be making this distinction. When we have a pipeline with multiple receivers, we should think of it as receiving multiple independent data streams. Those data streams are merged together into one stream, and that stream runs through the processors. If we have multiple exporters, the stream is then replicated, once per exporter. So I'm talking about merging and replicating data streams here, and as I mentioned earlier, these are the kinds of things we need to be able to do in order to manage telemetry. The collector can do some of this already, but it's somewhat constrained by the pipeline's assumptions.

There are a couple of other mechanisms, though: we can share components between pipelines. We can share a receiver, and when we do, the data it emits is replicated so that each pipeline gets its own data stream. We can also share an exporter, which merges the data streams before they reach the exporter. It's even possible to share components between pipelines of different data types, assuming the components involved support those data types. For example, take the OTLP receiver (OTLP is short for OpenTelemetry Protocol): if you send metrics to this receiver, it will route them onto the metrics pipeline, and if you send logs, it will route them onto the logs pipeline. So we even have some rudimentary routing capabilities.

Okay, at this point I've explained all the rules that govern how pipelines work. These are the expectations people have built up over years of using the OpenTelemetry Collector, so we need to respect them as we look to resolve any limitations of the system. I'm going to show you some scenarios that highlight the limitations and talk about how we can address them with the connectors framework, but keep in mind that all of these rules will still apply.

Let's look at a scenario. We have an expensive analytics tool, and we have a lot of log data. We can't afford to send all of the log data to the tool, so we're doing the obvious thing and filtering a lot of it out. But let's say we have another requirement: we must preserve all of the data, to satisfy some regulatory requirement, for example. So we can't afford to send it all, but we also can't just filter it. We can use the mechanics we've already discussed and share a receiver. This replicates the data stream. Now we can filter one of the streams and send it to the expensive tool, and the other stream, after whatever processing we need, ultimately goes to some kind of cheaper cold storage. So far, so good.
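Here's a hedged sketch of that arrangement, using the shared-receiver mechanics described above. The endpoints and filter criteria are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  filter:                   # drop what we consider low-value (criteria illustrative)
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

exporters:
  otlphttp/expensive:       # placeholder endpoints for the two destinations
    endpoint: https://expensive-tool.example.com:4318
  otlphttp/cold:
    endpoint: https://cold-storage.example.com:4318

service:
  pipelines:
    logs/expensive:         # shared receiver: this pipeline gets one replica
      receivers: [otlp]
      processors: [filter]
      exporters: [otlphttp/expensive]
    logs/cold:              # ...and this pipeline gets the other, unfiltered
      receivers: [otlp]
      exporters: [otlphttp/cold]
```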
But let's add another requirement. Say we have a policy that we must redact PII immediately, as soon as we touch the data. So we're ingesting one data stream, we want to redact the PII, and then we want to replicate and process some more. The problem is we can't do this with the existing pipeline structure, or at least we can't do it within a single pipeline. The reason, again, is that we can only replicate immediately after receivers or immediately before exporters.

We can look at this as partial pipelines, but none of them are valid: the first one doesn't have an exporter, and the other two don't have a receiver. As a workaround, we can use the OTLP exporter and receiver in combination to forward the data. The first pipeline has an OTLP exporter, and the second two have a shared OTLP receiver, so the data is replicated like we wanted. But this is a networking protocol, which doesn't seem ideal. There's a lot of configuration involved just to get the two sides to communicate correctly, and intuitively, this can't be the most efficient way to do it. What we want is something that can fulfill the requirements of being an exporter in the first pipeline and simultaneously fulfill the requirements of being a receiver in the other pipelines, and all we really want it to do is pass the data along.

In the simplest case, this is a connector: a component that is an exporter and a receiver at the same time, in different pipelines. This specific one is the forward connector, so called because it just passes data along. To configure it, we have a new section in the configuration for connectors, just like the other component classes, and when we use a connector in the pipelines, we use it in place of an exporter or a receiver.

Here's another example of using the forward connector. I just showed how we could use it to replicate data streams; this time, let's merge data streams. We're receiving two different data streams, and we label them so we can tell them apart from then on. Then we run them through a forward connector, do some more processing, and export. The idea here is that the connector is a shared exporter for the first two pipelines, and when we have a shared exporter, the data streams are merged. Then it acts as a receiver in the final pipeline. So this is the first capability that connectors give us: we can sequence pipelines. Previously, pretty much all pipelines ran in parallel: you did all your receiving, with perhaps some merging and replicating, then all your processing, maybe some more merging and replicating, then all your exporting. Now we can interleave merging and replicating operations with processing operations, arbitrarily.

Okay, back to the first scenario; let's keep adding requirements. Recall we were ingesting data that we can think of as both high and low value. The high-value data passes through the filter in the upper pipeline, but the lower pipeline is sending all data, both high and low value, to cold storage. Let's say we want to optimize this, and we recognize that to meet our regulatory requirement we can actually filter out the high-value data and send only the low-value data to cold storage. One thing we could do is add a filter that's basically the opposite of the filter in the other pipeline. But we can do better than that: we can route the data with a connector. This works just as you'd expect. The connector is an exporter in the first pipeline, and it plays the role of a receiver in the other two, but it doesn't always emit data to both pipelines; it does so conditionally. It has a fairly standard routing table: we have some criteria to evaluate, and then we list the pipeline or pipelines to which we'd like to emit the data when those criteria are met.
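Here's a hedged sketch of both ideas applied to the scenario. First, the forward-connector version: redact once, then replicate. The redaction step is shown without its settings (the contrib redaction processor is one option), and the endpoints are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  redaction:        # redact PII first; the contrib redaction processor is one
                    # option, with its required settings omitted in this sketch
  filter:
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

connectors:
  forward:          # takes no configuration; it just passes data along

exporters:
  otlphttp/expensive:
    endpoint: https://expensive-tool.example.com:4318
  otlphttp/cold:
    endpoint: https://cold-storage.example.com:4318

service:
  pipelines:
    logs/redact:              # receive once, redact immediately
      receivers: [otlp]
      processors: [redaction]
      exporters: [forward]
    logs/expensive:           # forward acts like a shared receiver here, so
      receivers: [forward]    # the redacted stream is replicated into both
      processors: [filter]    # of these pipelines
      exporters: [otlphttp/expensive]
    logs/cold:
      receivers: [forward]
      exporters: [otlphttp/cold]
```

And here is the routing variant, which replaces the forward connector and any second filter with a routing table. The table syntax shown (an OTTL statement with a route() condition) matches the contrib routing connector, though its exact keys have evolved across versions, and the attribute used as routing criteria is hypothetical.

```yaml
# receivers, processors, and exporters as in the previous sketch
connectors:
  routing:
    default_pipelines: [logs/cold]    # where unmatched (low-value) data goes
    table:
      - statement: route() where attributes["value"] == "high"
        pipelines: [logs/expensive]   # matched (high-value) data goes here

service:
  pipelines:
    logs/redact:
      receivers: [otlp]
      processors: [redaction]
      exporters: [routing]
    logs/expensive:           # no opposite filter needed; the routing
      receivers: [routing]    # table already separated high from low value
      exporters: [otlphttp/expensive]
    logs/cold:
      receivers: [routing]
      exporters: [otlphttp/cold]
```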
Another way to think about the routing table is that we're telling the connector in which pipelines it should act as a receiver when the criteria are met. This mechanism, referring to pipelines and then conditionally emitting data to them, is built into the connectors framework. That means we can have multiple types of connectors that behave somewhat differently but use the same mechanism. Here's another example: the failover connector. By default, it routes data to whichever pipeline has the highest priority, but if there's an error downstream, say in an exporter, that error propagates back up the pipeline to the connector, and the connector can react by rerouting the data to another pipeline. So conditional data flow is another capability of connectors. And between these first two capabilities, I think we've basically solved the problem of needing to manage data streams.

But we can keep going; connectors can actually do more than this. Think about all this data we're sending to cold storage. We're not really looking at it, we don't know anything about it; we can pull it up if we need to, but on a day-to-day basis we're not doing that. What could we do to get more value out of that data? We can summarize it in some way, by generating telemetry that describes it. The telemetry we generate could be of any data type, but let's say we're going to count the logs, and counts are naturally metrics. So we're going to generate a metric data stream that describes the log data stream. For this, we have a count connector, and what I've done here is add it to the lower pipeline as a second exporter. So the data is replicated: one replica goes to cold storage, and the other goes to the count connector. The connector counts the data and generates metrics that describe it. It's also acting as a receiver in a new metrics pipeline, and the exporter there is the same exporter as before; I'm just assuming it supports both metrics and logs.

Why does this work? We have the count connector acting as an exporter in a logs pipeline and a receiver in a metrics pipeline. The way to think about this is to ask what it really means to be a logs exporter. It means that you can consume logs and, basically, that you're the last component in the pipeline. The count connector certainly does that. And as a metrics receiver, it just needs to be the first component in the pipeline and emit metrics, which it does as well.

Let's talk about how data type support is defined for connectors, because for other components it's very simple, but with connectors we have to consider the dual role they play. We have pairs of data types, and more specifically, we have ordered pairs. The forward connector can forward logs to a logs pipeline, metrics to a metrics pipeline, and so on, but the count connector can take any data type and generate metrics. You wouldn't expect that counting metrics would generate logs, right? This is not a reversible operation. So each connector picks and chooses the ordered pairs of data types it supports and implements the appropriate functionality. Then, when you use it in a configuration, it's implicit what functionality you're asking for, just like putting an OTLP receiver in a traces pipeline asks only for the traces functionality. If you route logs to the count connector and put it in a metrics pipeline, you're clearly just asking it to count logs, as in this sketch.
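A hedged sketch of the counting arrangement. With an empty configuration, the contrib count connector counts the records it consumes, though its exact default metric names and settings may vary by version; endpoints are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  count:                    # counts consumed records and emits metrics

exporters:
  otlphttp/cold:
    endpoint: https://cold-storage.example.com:4318
  otlphttp/expensive:       # assumed to accept both logs and metrics
    endpoint: https://expensive-tool.example.com:4318

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp/cold, count]   # replicated: one copy to cold
                                          # storage, one copy to be counted
    metrics:
      receivers: [count]                  # count acts as a metrics receiver
      exporters: [otlphttp/expensive]
```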
So we can generate data streams. This, to some extent, blurs the line between processing and managing data streams, but there is an important management aspect here, which is that we're generating a new data stream. If you consider what it looks like to count metrics, ingesting metrics and emitting metrics, you could imagine just mixing the new data into the old data stream. But this is a new data stream, and some users may not want to do that. The nice thing is that with the connectors framework, they have full control: they can put the connector in whichever pipeline they want to receive the data. If they want to preserve the original data stream, they can replicate it, and if they want to mix the streams back together, they can merge them. You have whatever options you need. For example, merging the generated metrics into an existing metrics stream looks like this.
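A minimal sketch of that merge-back option, with components as in the previous examples; only the pipelines section is shown.

```yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp, count]   # replicate: export the logs and count them
    metrics:
      receivers: [otlp, count]       # multiple receivers merge their streams,
                                     # so the generated metrics mix into the
                                     # existing metrics stream here
      processors: [batch]
      exporters: [otlphttp]
```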
Now I want to go through the first three scenarios quickly and visualize the data we've sent to both the expensive tool and cold storage. Let's say these squares are resources, and by resources I just mean things that are emitting telemetry, or things the telemetry describes. I'll use the space above to represent the data we sent to the expensive tool, and the space below to represent the data that went to cold storage. First, we sent all of our data to the expensive tool, but we couldn't afford that, so we filtered much of it out. Then we replicated the data and preserved it in cold storage. Then we optimized a little, and finally we characterized the cold data by generating some metrics and sending those to the expensive tool. So that's where we're at. Let's assume we've done similar things for metrics and traces; maybe the workflow is a little different, but basically we're trying to separate high-value data from low-value data. So we have something like this, and this is a pretty good place to be. We have some of the telemetry from every resource, and this is a pretty standard approach to reducing data: we apply rules across the board that remove what we perceive to be low-value data from the stream.

Now, the problem is that we're only really looking at the low-value data when we're developing the configuration and deciding what those rules are. If we make a mistake, it's very easy for some kinds of problems, even minor ones, to be indicated only in the low-value data we've sent to cold storage, and we're just not going to see them, even if we're characterizing that data in various ways. So one thing we can do is sample data from the low-value streams and promote it to high value: find a little bit of the data that can give us more insight into what's going on. If we think about this from the logs perspective, what we could do is, before the data goes to cold storage, replicate the data stream and send it into a sampling processor. This could apply whatever criteria we want. Maybe it's random, though that would probably not be very valuable. So let's say we want to sample resources. The logic would be, roughly: when data comes in, we look at the metadata describing the resource the telemetry is associated with. If we haven't seen that resource before, we make a decision about whether or not to include it in our sample set. If we do include it, then all logs associated with that resource pass through the processor; otherwise, we drop them. That gives us something like this, where we have all of the logs for some of the resources.

Now, if we do the same thing for metrics and traces, we might end up with something like this, where we have all of the traces for one resource, all of the logs for another, and, if we're lucky, some overlap. It's possible we could use deterministic criteria to decide which resources to sample, but that limits the types of criteria we can apply. What we really want is a framework that allows us to apply any criteria, so that we can develop connectors that behave in the ways we need. So let's zoom in on the tail end of the workflow, where we're routing the low-value data to cold storage, and add in the metrics and traces. We've got these three low-value data streams going to cold storage. Now let's replicate all three of those streams and also send them into a sampling connector. This can apply the exact same logic I just described for the sampling processor, but the difference is that when it makes a decision about a resource, it can let all of the telemetry from that resource flow through. And it doesn't matter whether the first thing we see from a resource is a log, a metric, or a trace; we can make the decision at the first possible opportunity. So we end up with a much cleaner sample set, where we have all of the data from some of the resources, in addition to having some of the data from all of the resources. We have a nice cross-section now, and we've done it without adding a huge amount of data to the expensive tool. If these resources were Kubernetes pods, for example, we might only need to sample 1% or less and still get a much better idea of what's going on.

So correlated data processing is another capability of connectors. If it's not clear what's going on here: the connector is acting as a logs exporter and logs receiver together, a metrics exporter and metrics receiver together, and so on. It's kind of like the forward connector, in that data is just flowing through, but it's all flowing through one single component. We can't do this with processors; we can only do it with connectors. This gives us the opportunity to reason about multiple data types in one place and make decisions that impact all of the data together.
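To make the shape of this concrete, here's a purely illustrative sketch. No stock connector does exactly this today, so the `sample` connector and its `percent` setting are hypothetical; only the pipeline wiring reflects real connector mechanics.

```yaml
connectors:
  sample:                   # hypothetical resource-sampling connector
    percent: 1              # hypothetical setting: admit ~1% of resources

service:
  pipelines:
    logs/low:               # each low-value stream is replicated: one copy
      receivers: [otlp]     # to cold storage, one copy into the sampler
      exporters: [otlphttp/cold, sample]
    metrics/low:
      receivers: [otlp]
      exporters: [otlphttp/cold, sample]
    traces/low:
      receivers: [otlp]
      exporters: [otlphttp/cold, sample]
    logs/sampled:           # one connector instance sees all three data
      receivers: [sample]   # types, so its per-resource sampling decisions
      exporters: [otlphttp/expensive]   # are correlated across them
    metrics/sampled:
      receivers: [sample]
      exporters: [otlphttp/expensive]
    traces/sampled:
      receivers: [sample]
      exporters: [otlphttp/expensive]
```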
Just to review the mechanics I showed you earlier: all of our pipelines still require a receiver, an exporter, and a data type, but where we said receiver, we can now think receiver or connector, and where we said exporter, exporter or connector. One rule has been added: connectors must be shared. This only applies if you're using connectors, and it's just in the sense that when you use a connector, it must act as both an exporter and a receiver. The merging and replication points are the same; we've simply taken advantage of where they already exist in pipelines, because connectors, being receivers and exporters, sit adjacent to those points. So data going into or out of connectors can take advantage of them. And those are the new capabilities that connectors offer.

We still have work to do. Some of the specific types of connectors I showed you are available today, but others are still in development. However, the connectors framework is in place, and I'm happy to announce that, as of this week, it is considered stable. At a high level, I think the important result is that the collector is now a much more capable tool for processing telemetry, because it now has a generalized system for managing telemetry.

Okay, that's it for me. I hope this has been helpful. Thank you for coming. If you have any questions, please reach out or come find me afterwards. And if you'd like to review the session, please scan the QR code.