Hello, welcome to our really, really specific OpenTelemetry Collector talk. My name is Tyler Helmuth. I am a maintainer of the OpenTelemetry Collector contrib repository and the OpenTelemetry Helm charts repository. I started off as a user of OpenTelemetry about two years ago, and then I started contributing. My name is Evan Bradley. I also help maintain the Collector, and along with Tyler, I'm one of the primary contributors to OTTL. I didn't have much experience with OpenTelemetry prior to joining the project about a year ago, but it's been really great to be a part of the community.

So first I'd like to quickly cover what we're going to talk about today. First we're going to talk about why and how you could use the OTel Collector to process data in your telemetry pipelines and how the OpenTelemetry Transformation Language can help you with that. Then we'll take a look at OTTL in action using a real-world use case, and finally we'll tell you about the current state of OTTL and where it's headed next.

So before I start, for anyone who isn't familiar with it, the OpenTelemetry Collector is an observability pipeline middleware that can receive, process, and export data at scale. The Collector comes with a lot of functionality, but we're really just going to be focusing on its processing capabilities.

So why might you want to use the Collector? Well, there are a lot of reasons on the slide, but the first big one is that you can process at the edge. Processing at the edge allows you to split this work across multiple machines, which can help increase the data throughput of your pipeline. It also gives you a chance to control what data enters your pipeline to begin with and how it looks coming in. For example, you may want to remove unnecessary data early on in your pipeline to avoid racking up bandwidth costs. Your data may also contain PII that isn't ever allowed to leave its origin, so it needs to be redacted or removed as soon as possible.

You can run the Collector at the edge or anywhere else in your pipeline because it can be deployed anywhere. It can be deployed in containerized, virtualized, or even function-as-a-service environments. And you can process data close to its origin or further away, at critical points of your pipeline, such as the point of ingress at the boundary of a secure network. You can trust the Collector to adapt well to these situations because it's fast and versatile. It has been written with high throughput and low latency in mind, so it won't slow down your pipeline, and it has low CPU, memory, and disk space requirements. Custom Collector builds can also be made to fit the situation at hand by selecting only the components you need. And for the cases where you can't find anything that's already out there, all Collector components are written using the same core APIs, which means you can leverage these to add your own code to accomplish a task.

The flow of data through the Collector is organized into pipelines, which are composed of individual components that each handle a particular task. The Collector has five classes of components, but the three that we're going to focus on today are receivers, processors, and exporters. This diagram shows an example pipeline, where data comes into the Collector at one of the points on the left and proceeds through the pipeline until it's emitted on the right.
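To make that pipeline idea concrete, here is a minimal sketch of a Collector config that wires a receiver, a processor, and an exporter into a logs pipeline. The component choices and the backend endpoint are illustrative, not taken from the talk:

```yaml
receivers:
  otlp:                # receive OTLP data over gRPC
    protocols:
      grpc:
processors:
  batch:               # an example processor; batches data before export
exporters:
  otlp:
    endpoint: backend.example.com:4317   # hypothetical backend address
service:
  pipelines:
    logs:              # data flows receiver -> processors -> exporter
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```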
Starting on the left with receivers: receivers take data from an external source and translate it into pdata, the Collector's internal format that's based on the OpenTelemetry protocol, or OTLP. Processors then take this data and perform some kind of filtering, editing, or additions on it before forwarding it on. They both consume and emit pdata, so they can be chained together within a pipeline. Once processing is complete, an exporter will emit the data to its destination in the desired format. Most transformations of data happen inside processors, so we're going to be focusing on those today.

So let's take a look at a slightly modified version of a real situation seen by Collector users. Suppose that we want to monitor our Kubernetes cluster to quickly detect failed deployments that happen because our pods aren't starting. We can use the k8sobjects receiver to monitor the cluster and collect the events, then forward these to our backend that takes logs over OTLP. The logs have all the data we want, but we're seeing a few problems. First, our Kubernetes cluster is very active, and it turns out we're collecting way too many events, and it's costing us way too much. Additionally, the log payloads are missing metadata, like the associated pod, and this is making it difficult to efficiently query for them in our backend. We know the Collector should have some processors that could help us out, so let's figure out which ones we should use.

So first, let's take a look at how this processing might look prior to OTTL. Well, we know that the Collector has processors that support each of the OTel signals. Processors are designed to work with at least one signal among traces, metrics, and logs. Our events are coming in as logs, so we're just looking for processors that can work with logs. Using the data that we have in our logs, we're looking for one or more processors that can fulfill the following requirements. We want to filter out all of the excess logs using data from the body first, and then, with the logs that we keep, we want to be able to use their bodies to extract their log levels and associated pods onto the OTLP payload as attributes.

So when we look closer, while there's a lot the Collector can do, the situation isn't so simple. We've used the Collector to process our metrics and traces before, but we're finding that a lot of the functionality offered for logs is different. The configs for the processors that we're looking at here are different from the ones we've used before, despite doing pretty similar things, and so we have to learn a whole new config structure. And once we learn that config structure, the real kicker is that there's no way for us to even access the log body in the way that we want. First, filtering requires running a regular expression on a stringified version of the entire log body, and this is going to be really slow. There's also no way to lift data from the body onto the attributes.

Taking a step back for a minute, what features would we like to see in a more generalized solution to our problems? Well, first, we'd like access to the full OTLP payload in one place, which is going to be necessary for us to access the specific parts of the body for filtering and metadata extraction. We'd also like to be able to perform operations on any part of the payload so that we can get things looking exactly how they need to. And finally, we'd like it if our prior experience with metrics and traces would carry over when we're working with logs.
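For reference, the collection side of that use case might look roughly like the following. This is a hedged sketch: the k8sobjects receiver does take an `objects` list with a watch mode, but the exact fields can vary by Collector version:

```yaml
receivers:
  k8sobjects:
    objects:
      - name: events   # watch Kubernetes events and emit them as logs
        mode: watch
```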
So this is where OTTL comes into play. OTTL is a domain-specific language that specifically targets processing data in the Collector. It runs inside the Collector's process and is designed to work directly with pdata, which cuts out the overhead of specialized execution environments like language VMs. A key benefit of OTTL is that it works very well with pdata's hierarchical structure. You can see an example of this data hierarchy in the log on the right. Just like OTLP, pdata provides structure for common information found in telemetry, such as the resource or application that produced the telemetry or the instrumentation that collected it. Both of these are included in addition to the data itself.

An obvious question to ask is: if the Collector is configured with YAML, why use a DSL in this case? Well, using a DSL for processing allows users to easily express their desired transformations. Compared to a YAML configuration, it's substantially easier to write, edit, and read. The example transformation on this slide showcases the fact that the OTTL statement is way less verbose than the equivalent YAML config. And we also hope that OTTL's syntax is going to feel familiar to many Collector users. It's pretty simple, and it's very similar to popular programming languages like Python, JavaScript, or Go. So as a language, OTTL has been carefully designed to be just sophisticated enough to be sufficiently expressive without having a high learning curve or being difficult to read.

To enable the user to do any sort of transformation they would like, OTTL comes with lots of built-in functions that work on a variety of data types that might be seen in any signal. Using functions to do these operations also allows them to be composed: the output of one function can be used as input to another. OTTL also includes enumerated constants that have been taken directly from OTLP, which allow the user to name a value instead of having to remember its integer representation. This is really helpful for things like metric type or log severity. OTTL also has where clauses that allow you to define under which conditions a transformation should occur, and you can combine conditions using its Boolean operators. For cases where you need to work with numeric values, there's a standard set of math operators.

So when we take a look at the prior landscape of overlapping or missing functionality and mismatched configs and processors, this becomes cleaned up. All signals are now processed the same way, config formats match, and the user can easily express their desired transformations.

Coming back to our use case, let's take a look at how the data that we do have can solve our issues. Well, first, we see a reason in the log body, so if we can determine what sorts of events we don't want, we can filter them out using those fields. We also see information regarding the pod, so we can look at extracting those details into attributes on the OTLP payload. Our backend is gonna later use these attributes for querying when we go to find the logs. When we use OTTL to solve this problem, we find that we can fulfill our requirements with only two components. Using the filter processor, we can drop the data we don't want, and then, with the data that we keep, we can use the transform processor to transform it. The log on this slide demonstrates the final result that we're hoping for. It has metadata that shows the pod name, event type, and severity for the log, all while keeping the body of the log intact.
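To give a flavor of the language features mentioned above (built-in functions, enums, where clauses, and math operators), here's an illustrative span-context statement of our own; it isn't taken from the slides:

```
set(attributes["duration_ms"], (end_time_unix_nano - start_time_unix_nano) / 1000000) where status.code == STATUS_CODE_ERROR
```

This computes a span's duration in milliseconds and records it as an attribute, but only for spans whose status matches the STATUS_CODE_ERROR enum value.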
I'm now gonna hand it over to Tyler, who's gonna show how OTTL can do this. All right. Let's take a look at how OTTL enables our goals. We've decided that we can reduce our ingest volume if we drop events that have a reason of Completed, because we've decided we don't want them, and we intend to use the filter processor to make this decision.

This is an OTTL statement. Sorry, this is an OTTL condition. OTTL conditions work with the underlying telemetry, but they never change it. The filter processor can use OTTL conditions to choose which data is dropped. When the condition is met, the processor drops the data. In this scenario, we expect the k8sobjects receiver that we mentioned earlier to be emitting the Kubernetes events as logs, and those events are nested maps inside the body of the log. If we see a body that is not structured like we expect, we should drop it because it's not a Kubernetes event. You can see in the first example that the body is a map, and it contains a nested map in the object key. Since the conditions are not met, the data is kept. You can see in the second example that the body is a string. A string is not a map, so the condition is met and the data is dropped.

Now that we've ensured the body looks how we expect it to, we can start checking the values of the event itself. Remember, we want to drop events that have a reason of Completed. OTTL allows nested indexing, so it's really easy to reach into the event, grab a value, and compare it to the static string "Completed". You can see in the example that we drop the Completed event, but we keep the top event.

So now that we've filtered out the data that we don't want, we can start safely transforming the telemetry we want to keep. This is an OTTL statement. Unlike conditions, OTTL statements transform the underlying telemetry by executing a single function, in this case a function called merge_maps. The target for transformation in this function is the first parameter, cache. Cache is a special OTTL map that starts out empty and that we can use as a place to store information in between statements. The second parameter for merge_maps is another map; in this case, it's that nested map inside the body that we checked on earlier. The third parameter is a static string, in this example the string "upsert", and it tells the function how to perform the merge. So in this statement, we merge the object map into the cache map for later use. And while this statement isn't strictly necessary to achieve the end result that we want, it is gonna make the future statements we look at simpler.

When you're using OTTL, this will be one of the most common operations you perform: setting. The set function allows you to set a telemetry field using another value. The target of the set function is the first parameter. In fact, you'll see that pattern often: we have a standard that the first parameter of a lowercase function is the thing being transformed. So our target is an attribute on the log called reason. If the attribute doesn't exist, the function will create it. The second parameter is the new value. Our team has decided that we need this attribute to be lowercase, so we're first going to invoke the ConvertCase function to change that text to lowercase without modifying the value in the cache itself. The output of ConvertCase is then passed on to set to use as the new value. So this statement sets the reason attribute on the log with the new value of the event's reason.
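A rough sketch of what's been described so far: the first three lines are filter conditions (if any one matches, the log is dropped), and the last two are transform statements run on the logs we keep. The `object` key reflects how the k8sobjects receiver nests the event in the log body; the exact shape can vary:

```
not IsMap(body)
not IsMap(body["object"])
body["object"]["reason"] == "Completed"

merge_maps(cache, body["object"], "upsert")
set(attributes["reason"], ConvertCase(cache["reason"], "lower"))
```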
And it does that by retrieving the event's reason from the cache and passing it to the ConvertCase converter to get the lowercase version. You can see here in the example that we've added a new attribute on the log called reason with the value of backoff.

One of the great things about OpenTelemetry is its semantic conventions. Our backends know how to take advantage of these standards, so we wanna set them whenever we can. In this instance, we set the k8s.pod.name semantic convention for any event that's associated with a pod. The first parameter of set this time is a resource attribute called k8s.pod.name, and the value is the name of the pod, which we extract from the cache. This is a really, really powerful concept in OTTL: even though we're working with the log directly, and we're grabbing values off of the log, we're able to reach up into the resource the log is associated with and change its values too.

Now, what if this event isn't actually for a pod? If that were the case, we wouldn't want to execute this statement. So what we can do is add a where clause to the statement to make sure it's only executed if the event really is for a pod. The where clause acts as a decision maker: if the condition is met, the statement is executed; otherwise, the statement is skipped.

Finally, we wanna take advantage of the native OTLP log field severity_number. The field represents the severity of a log. The Kubernetes events we're collecting can have two types, Normal or Warning. If the event is a Normal event, we wanna set the severity to info, and if the event is a Warning event, we wanna set the severity to warn. We can take advantage of where clauses again to ensure that for any log, only one of these two statements is run, and therefore the log severity is only set one time. Severity number is actually an integer, but hardcoding a 9 or a 13 into the function wouldn't be very descriptive. Instead, we can take advantage of OTTL's built-in enums SEVERITY_NUMBER_INFO and SEVERITY_NUMBER_WARN to represent these numbers. These enums reflect the enums available in OTLP and keep the statements easy to read and easy to write.

With the transform processor and the filter processor in place using OTTL, we can achieve our goal. The top event is a BackOff event, so we keep it and transform it. What started off as a basic log, with the event in the body of the log, becomes a log rich with data: we've filled in all of its OTLP fields. It has the k8s.pod.name semantic convention on its resource attributes. It has the useful reason attribute that could maybe help us visualize the data later, and we've set its severity number, which we could maybe use for querying or something. The second log is the Completed event, and we've determined we don't want those, so it's dropped during filtering. We do the filtering first so that we don't waste any time transforming data that we intend to drop.

So what does it actually take to configure all of these statements? Turns out, not much. The configuration is pretty minimal. This config shows a snippet of a working Collector config, and this is just the specific filter and transform processor sections. For the filter processor, each condition gets its own line, and then for each log record, the conditions are run in order. Any condition that matches the log's data means that the log is dropped, and any remaining conditions are skipped.
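Putting it all together, the processor sections might look roughly like this. This is a sketch rather than the slide's exact config: the `regarding` field assumes the events.k8s.io/v1 event shape (older core/v1 events use `involvedObject` instead), and filtering runs before transforming because of the processor order in the service pipeline:

```yaml
processors:
  filter:
    logs:
      log_record:
        # Any matching condition drops the log record.
        - 'not IsMap(body)'
        - 'not IsMap(body["object"])'
        - 'body["object"]["reason"] == "Completed"'
  transform:
    log_statements:
      - context: log
        statements:
          - 'merge_maps(cache, body["object"], "upsert")'
          - 'set(attributes["reason"], ConvertCase(cache["reason"], "lower"))'
          - 'set(resource.attributes["k8s.pod.name"], cache["regarding"]["name"]) where cache["regarding"]["kind"] == "Pod"'
          - 'set(severity_number, SEVERITY_NUMBER_INFO) where cache["type"] == "Normal"'
          - 'set(severity_number, SEVERITY_NUMBER_WARN) where cache["type"] == "Warning"'
```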
Similarly, for the transform processor, each statement is run in order, updating the log if the statement is executed. Again, we use a DSL to make sure that this can stay small and compact, and that our config stays readable.

So we've been talking about one really specific example through the whole presentation so far, how to use OTTL to mess with your logs, but it can do way, way more than that. OTTL enables data operations like interacting with lists. You could transform your instrumentation scope, changing its attributes or its name or whatever. You could extract new attributes from a string. You could do intricate time manipulations on your metrics. You could generate new metrics. You could interact with your span events, change their attributes, change their names, and so on and so forth.

All of these data operations are possible because of our large standard function library. Listed here are just some of the functions available to you. The functions with uppercase first characters are pure functions; they don't change the underlying telemetry, like the ConvertCase function we used earlier. With functions like these, you can do things like combining two attributes using the Concat function. You could use ParseJSON to turn a string from the body of a log or from an attribute into a map for future use. You could truncate the length of strings. You could limit the number of attributes on your payload. You could calculate span duration using OTTL's built-in arithmetic capabilities, hash attributes, and so on and so forth. These aren't even all the functions available for you to use; it's only what we could fit on this page. And if you start using OTTL and you feel like there's a function missing that you need, just let us know, and we can work on integrating it into the library.

So where can you use OTTL? It's available in several processors already, with the emphasis on the transform processor being the place that actually does the transformation of telemetry. The filter processor and the tail sampling processor use OTTL conditions to make decisions. The filter processor we talked about earlier uses conditions to determine when to drop data, and in the tail sampling processor, you can use OTTL conditions to determine when to sample a trace. The routing and count connectors can also use OTTL conditions to make decisions. And in fact, be on the lookout for more components to start using OTTL conditions in this way, as it's one of the ways we hope to start standardizing configuration between processors. If you've got a custom component and you think that you could take advantage of OTTL to do transformation of data or to make a decision, great: OTTL is a package, available and ready for use, and it even supports custom functions should you have a really specific need.

So what's next for OTTL? It's been undergoing a lot of change in the past couple of months, and it's currently considered alpha, but it is ready for general use. At this point, we believe the language has all the features it needs for the future and is in a good state, and we believe most of the breaking changes to the API have been completed. Once we're confident that the API is in a good place, we'll promote it to beta. The transform processor's development closely follows OTTL's: we'll promote it to beta once we promote OTTL to beta, but we do feel the transform processor is ready to handle most common transformations. If you have any feedback or questions about anything related to OTTL, please open a GitHub issue.
It's really, really helpful for us. It helps us determine the priorities of our users and keep our quality high. Once we've promoted the transform processor to beta, we'd like to start deprecating redundant processors. We have an issue open already to determine exactly how we're going to approach this, but the processors we're targeting right now are the attributes, logs transform, metrics transform, resource, and span processors. Our goal is to make this deprecation as painless as possible.

If you've made it this far and you want to know more about the things that we talked about today, here are some links to the different components that we discussed. Our goal is to always make our documentation for OTTL better. We want it to be easy for people to learn how to use OTTL, how it enables their use cases, and just what it's all about. If you look at any of these docs and you want to provide some feedback, please let us know in CNCF Slack or in GitHub issues or wherever. Like I said earlier, it's really, really helpful when people engage with the community. Thank you for listening to us talk about the Collector, and we're opening it up for questions now.

Yes, that's a great idea. And I'm part of that, and I should have known to promote it. So if you like OpenTelemetry, or if this is interesting to you, if you like OTTL or whatever: on Wednesday at 2:30 is ContribFest. If you are a regular contributor, or if you are a new contributor and you would like to continue to do more, or if you've never contributed and you would like to do your first contribution, you can come to ContribFest. We'll have a bunch of curated issues in the Collector; a lot of them are OTTL, new functions that you could add. Come join us and make your first contribution to OpenTelemetry, or make your second or your third or whatever, and just come help out the project. I don't know how questions are working, so. Is it working?

Yeah, so it looks great. I have one question: is it possible in OTTL to create, like, a user-defined function? Like in your example, the parsing, you know, the warning messages can be warning, warning for the backoff, and I don't want to, you know, define all these lines in every configuration. So OTTL allows custom functions. Like, there's the set function that we looked at: if you wanted set to do something really specific, instead of using our set, you could define your own set, and then your component could give that to the package and say this is a function available for the DSL to use. Does that answer the question? Yeah, just one follow-up: so I have to define it in Go, or define, you know, like a set of set functions? You would have to code it up, write it in Go, build it into your component, and then it's compiled with your component, and OTTL is like a package. Great, thank you very much. Yeah, we do not have dynamic function imports yet. There is an open issue about that. If someone really likes Wasm, that could be a cool way to do it.

I'm curious about connecting, like, an orphan span via the transform processor. I know that doesn't work now, but is this a feature that could be imagined to work someday? It's tough to answer that one. I think it would depend on your case. If you have the information around the trace ID that you want to connect the span to, yes, but I think that you would need state around previously seen spans within that trace in order to do that.
The transform processor is stateless, so it wouldn't support that right now. But OTTL could be used in a stateful processor in the future to support that use case.

Do you have any explicit design goals around things like safety? How do I make sure that if a possibly untrusted user has provided some OTTL, it's not gonna blow up the OpenTelemetry Collector? Yeah, so what OTTL makes available is compiled into the Collector, and so you can only write the DSL to interact with what's available. So you wouldn't be able to inject custom code or anything like that; there's no ability to load dynamic functions or anything like that. If someone got a hold of your config and messed with your statements, it would change what the transform processor was doing, but they would have already gotten into your system, and the Collector's already compromised at that point.

So I'm thinking about building a system, maybe, where I can allow a user to author dynamic OTTL, so they can define OTTL at runtime. Are you intentionally not providing Turing completeness to make sure that they can't run something forever, anything like that? Yeah, I would not call the language Turing-complete. It's very constrained. I should mention that there's no state between each statement. You can use the cache, you can put values in a map and then refer to them later, but there's no way to set variables or define functions or update any kind of environmental context.

Hey, how's it going? Great talk. Do you have any mechanisms for validating transforms, where you could sort of give an input and output and say, okay, this is the set of transforms and this is what I expect to happen? Almost like unit testing your transforms. That is a really, really good idea. And the answer is the debug exporter right now, which isn't a great answer. Right now it's like: run it through the pipeline and see what the Collector spits out. It would be nice to have some more rigorous validation capabilities in the package itself. That would be nice, yeah.

So I wanna make sure I'm not mistaken, but you mentioned that the logs transform processor was one of the things being deprecated in favor of this? Yeah, so the logs transform processor uses Stanza, which is another type of transformation package in OpenTelemetry. So the logs transform processor right now is in development, and the main concern around it is that you need to translate things from pdata into a Stanza-specific format and then back into pdata, and the inefficiencies around that conversion are what we're concerned about. We're trying to make OTTL functionally equivalent so that there's not gonna be any loss there.

Yeah, I was wondering if there was anybody who would... I don't know how popular OTel log usage is right now, so maybe not, but the process of moving all of... if you have a lot of complicated expr transformations that you used to do, and now it's being deprecated, was there a warning ahead of time? Did people know this was coming? Is there anybody who's, like, upset by this? Yeah, we haven't started it yet, so nothing's happened. Okay, so this is the warning. Yeah, well, this is a statement that we'd like to get to it. As Austin talked about earlier, statements made in 2019 may not be completed till 2023. So, like, no timeline. It is 2023. Right, so this statement is being made now; maybe it'll be done in 2027. I don't know.
The logs transform processor is currently in development. It's not included in the contrib distribution, so if you're a user of it, you will have had to go out of your way to get it, and, like, more power to you. That's great. When we get to the logs transform processor, we are looking at ways to take in the config and spit out the OTTL-equivalent statements. Like, in the filter processor right now, you can do OTTL statements or you can do the old style of config. We have already written the converter for the old-style config: it'll read it in and turn it into OTTL. And the next step is to spit out that statement, and that'll be part of the deprecation process.

There is also, way back, I think like two years ago when we first started doing OTTL work, a question of logs transform processor versus OTTL: which one should we do? And Dan, you can ask him more about it, but he did a huge comparison between Stanza and OTTL and whether it would be possible. It's all written up in a really good issue. And the answer is: yes, it's possible, but it's a lot of work. So we're still working towards it, which is why nothing's officially marked deprecated, and we haven't done any feature gates or anything like that yet.

So I wanted to add on to that. For things that are using expr, like the filter processor, to add on to what Tyler said: when we convert those to OTTL statements, the plan is, as part of the deprecation process, that the warning message we print out saying, hey, this is deprecated, will also print out: here's the config that you should move to, just copy-paste this and you're good to go. We're trying to make it as painless as possible. So there is a conversion story for expr? Exactly, there is a conversion story. For the filter processor's use of expr... we're getting kind of specific, and we should get off the stage. Yeah, you're right. For the expr-specific filter processor config, maybe not, because that one is a full language, and converting that full language to our full language would be really, really hard. So maybe not that one. But also, with the expr stuff, the filter processor didn't build in too many capabilities. If you've got some really complex situation, I'd love to know more about it. I'll message you. All right, we'll pass on to whoever's next.