Now recording. Welcome to the OpenTracing Specification Council meeting. Everybody, good to see all your lovely faces. We've got a fun presentation today from Jonathan Kaldor and Michael. Boy, can I not pronounce your last name. That's exactly right. You got it. Yeah. Great. From a long line of the can't-pronounce-your-last-name family tree. It'll be on tracing at Facebook. Rather than ramble on myself, I'd love to let Jonathan and Michael introduce themselves and then kick it off. So welcome.

Hi. So, yeah, I'm Jonathan. I used to work on the Canopy team at Facebook; I recently moved to another team, and Michael recently joined the Canopy team. So we're using this as an opportunity to move me off of the advisory board and move Michael onto it. And so, yeah, Michael.

Hi, I'm Michael. I think I'm unmuted. Yeah. So I'm Michael. I just recently joined Facebook and the Canopy team. Before that I was at Comcast for a while, and one of the things I worked on there was our internal tracing system, which was the sort of X-Trace, you know, Dapper-style one. So I've got some experience with both worlds, which is kind of interesting. Yeah. Cool.

So, yeah, let me attempt to share a screen, which is like the last frontier in giving a presentation over the internet. Can everybody see this? Yes. Okay. So we're going to be talking about Canopy, which is Facebook's distributed tracing and analysis system. This is an amalgamation of a couple of talks: we published a paper at SOSP 2017 last year, and two of our engineers gave a talk at QCon in New York. We're focusing this mostly on the instrumentation and representation side of Canopy. We'll talk a little bit about our trace analysis and trace analysis pipelines, and we're happy to talk more about that, but we're mostly focusing on the representation side.

Canopy is an umbrella term for a wide set of things for tracing at Facebook. It encompasses both our single-system and cross-system tracing and analysis. We have instrumentation available in a number of languages: C, C++, Python, Java, and, because it's Facebook, PHP. Other languages are supported through C or C++ bindings. Our instrumentation is integrated into our common RPC stack that's shared across all services, and it's integrated deeply into our www stack, so the overall page load process both on the client and the server, as well as some other common pieces of infrastructure. We're also able to ingest data from other sources: we have tracing in our mobile applications (Profilo, I think, is the name we open-sourced it under), and we're able to ingest data from there and incorporate it into other traces through our backend systems. Canopy also includes an extraction and processing framework: given a trace we receive from some source, we can run custom user code to extract trace patterns and information from the trace and write them to datasets that we can then do aggregate analysis on. And then there's a separate team that works on performance visualizations, both single-trace and aggregate visualizations for these traces.

Great. So can people see this again? For a second. Yeah, PowerPoint is not happy with screen sharing.
Okay, I'm just going to share the whole desktop and hopefully we won't get infinite recursion with the preview windows that also pop up.

Okay, so Canopy is a little bit different from other, span-based tracing systems. We're an event-based system: our backend takes those events and parses them into a higher-level model. Because we're not span based, we also have explicit edges between points. We still enforce that the overall structure must be a DAG, so we don't allow any edges that go backwards in time, but otherwise you can have edges between arbitrary points within your trace. And then we also have the metadata that every other tracing system has; we've layered types on top of it, and I'll talk more about what that means later.

So this is what our overall model looks like. We have five basic objects in our trace. We have the overall trace that encompasses everything. The trace is broken up into a number of execution units, and an execution unit represents a sequence of trace data that comes from a single clock. In practice this usually represents either a single host or a single thread within that host, but it can be used for modeling other primitives as well. An execution unit contains a number of blocks. These are the closest analog to spans: a block represents some duration of time. It contains a start point, an end point, and zero or more other points that may occur within it. A point is just an instant in time that captures some single moment, and edges connect two points together. All of these objects can also have arbitrary metadata attached to them.

So I said we're an event-based tracing model that we parse into a higher-level model in the backend; here's an example of what I mean by this. Suppose we have a simple case with four events, the classic RPC call, receive, response, and complete events: we have a call event, a receive event on some backend service, a response from that backend service, and then the parent service records a complete event when it receives the response from the RPC call. We then take these events and interpret them as: there's some parent block that the call and complete are part of; it calls to some other service, which generates a block with a start and an end; and those both sit within some execution unit as well.

This is the original base instrumentation we've had. We've extended these events over time as we've added new model elements, so as we introduced execution units and explicit edges, we've introduced events that allow users to create these within the trace themselves. One benefit we get from this decoupling of events from the actual model is that we're able to update cross-system instrumentation without having to carefully arrange releasing instrumentation versions to both services at the same time. In practice, given service release schedules, it's impractical to assume that the instrumentation version on both sides of a boundary is going to be the same, so you need some compatibility across the boundary.
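The backend interpretation is what makes this decoupling work: the raw events are just data, and the parser decides what model objects they produce. Here's a minimal sketch of the five-object model and of folding the classic call/receive/response/complete events into it. The class names, fields, and folding logic are assumptions made for this example, not Canopy's actual code.

```python
# Sketch of a Canopy-style object model: Trace -> ExecutionUnits -> Blocks -> Points,
# with explicit Edges between points. Names are illustrative, not the real API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Point:
    timestamp_us: int
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class Edge:
    src: Point
    dst: Point
    edge_type: str = "rpc"   # e.g. "rpc", "function", "continuation"

@dataclass
class Block:
    name: str
    start: Optional[Point] = None
    end: Optional[Point] = None
    points: List[Point] = field(default_factory=list)

@dataclass
class ExecutionUnit:
    name: str                 # typically one host or one thread (a single clock)
    blocks: List[Block] = field(default_factory=list)

@dataclass
class Trace:
    trace_id: str
    units: List[ExecutionUnit] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

def parse_rpc_events(trace_id: str, events: List[dict]) -> Trace:
    """Fold call/receive/response/complete events into blocks and edges.
    The backend, not the client, decides how the events are interpreted;
    this assumes the events arrive well-formed and in timestamp order."""
    trace = Trace(trace_id)
    client, server = ExecutionUnit("client"), ExecutionUnit("server")
    trace.units += [client, server]
    parent, child = Block("parent"), Block("child")
    client.blocks.append(parent)
    server.blocks.append(child)
    for ev in sorted(events, key=lambda e: e["ts"]):
        p = Point(ev["ts"], ev.get("metadata", {}))
        if ev["type"] == "call":          # client issues the RPC
            parent.start = p
        elif ev["type"] == "receive":     # server-side block begins
            child.start = p
            trace.edges.append(Edge(parent.start, p, "rpc"))
        elif ev["type"] == "response":    # server-side block ends
            child.end = p
        elif ev["type"] == "complete":    # client sees the response
            parent.end = p
            trace.edges.append(Edge(child.end, p, "rpc"))
    return trace
```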
And so this decoupling allows us to, say, interpret events on one side of the boundary differently if we update the instrumentation, while keeping the instrumentation on the other side the same. That lets us handle all combinations of cases where service A and service B may have old or new instrumentation, or potentially even different instrumentation entirely.

So coming back to the explicit edges: this is probably the biggest single difference between a span-based model and our model. All of these things can technically be represented with spans, but we found that the benefit of having explicit edges is that we haven't needed to change the structure of the trace to add additional features. I can walk through one example from our current system. We trace through browsers, which includes both the client-side JavaScript that's executing and the PHP running on the server. So you can imagine some JavaScript execution unit recording all the JavaScript that executes during this particular page load. There are actually three separate causality hierarchies that can occur within the trace here.

You have the standard one, where your JavaScript makes some remote call, in this case fetching a resource, and at some point that resource returns and the JavaScript is able to use it. However, we also have a second causality hierarchy, which is the function hierarchy. The function call stack represents relations between parent functions and the child functions they invoke. We found this useful for representing nested blocks: you can have one block that is entirely contained within the execution of a parent block, and we're able to use edges to say that this child block is part of this parent block, without having to represent it as the parent function making an RPC call to a child function that happens to run on the same thread.

The third causality is an interesting one, and it occurs in more and more languages over time, like JavaScript and other continuation-based languages. Here you can imagine a schedule function queues up a future that will be executed later on. In this case our causality isn't necessarily between the root functions we're executing: we've scheduled some function, and then there's some common infrastructure stack that pulls these entries off the queue and executes them later. We need to connect to the actual function that we're executing instead of, say, the parent processing function, because that may also be calling other futures that are not actually connected to the one we scheduled.

So these are all parent-child relationships, ish. But what we found is they represent different parent-child relationships, and having edges, and specifically types for those edges, lets us say that the function hierarchy relationship is different from the RPC hierarchy relationship, which is different from the continuation hierarchy relationship.
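A small sketch of what those typed edges might look like in practice, so that tools can tell the three causality hierarchies apart. The edge type strings, point names, and the `TypedEdge` class are assumptions for illustration, not Canopy's real representation.

```python
# Hypothetical illustration of typed edges for the three causality hierarchies
# described above. Names are made up for the example.
from dataclasses import dataclass
from typing import List

@dataclass
class TypedEdge:
    src: str          # source point id
    dst: str          # destination point id
    edge_type: str    # a core property read by the backend

edges: List[TypedEdge] = [
    # 1. RPC causality: JS issues a resource fetch; the response comes back later.
    TypedEdge("js.fetch_call", "server.receive", "rpc"),
    TypedEdge("server.respond", "js.fetch_done", "rpc"),
    # 2. Function hierarchy: a child block nested entirely inside its parent
    #    block on the same thread; not an RPC, just nesting.
    TypedEdge("parentFn.start", "childFn.start", "function"),
    # 3. Continuation: schedule() enqueues a future; the edge runs from the
    #    scheduling point to the point where that specific future executes,
    #    not to the infra loop that happened to dequeue it.
    TypedEdge("parentFn.schedule", "future.run", "continuation"),
]

# A hierarchy or critical-path tool can then filter by edge type:
function_tree = [e for e in edges if e.edge_type == "function"]
```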
One other example where we found the ability to create explicit edges useful is representing application flow. One of the common tools for understanding our traces is critical path analysis, particularly for browser traces. We ran into a problem where we'd end up with, say, our JavaScript requesting a couple of resources, and our JavaScript thread tends to be fairly busy. So we would get traces that look like this, and we wouldn't necessarily know which resource fetch is on the critical path; you could argue that either one is potentially the blocking resource fetch. However, if we have additional information from the application, say when we actually end up using these resources, we can see in this case that we use resource two immediately, but resource one we don't actually need for some non-trivial amount of time. We could have delayed resource one significantly without affecting our overall time, and it looks like resource two is actually our blocking resource. So in this case we want to represent some application flow that says: we've received the result of this RPC call, but we don't actually need it for some period of time, and we should take that into account when computing the critical path. Here, when computing the critical path through resource one, we can say there's actually slack in the critical path corresponding to the length of this "required for" edge.

This has also allowed us to experiment with different representations of application-based logic. For instance, we've experimented with saying that certain events must happen in order for other events to even be considered. Again, in the page load process there are some synchronization points where we know we won't process a receive event until some other synchronous event has happened. So we can say that this synchronous event blocks anything before it, and it's a prerequisite for anything else that happens afterwards.

So coming back to our metadata: we have the standard string-to-string annotation map that's common among a lot of tracing platforms. These can be attached to any object in the trace; points, edges, blocks, execution units, and the trace itself all have a metadata object associated with them. We've made a distinction between what we call core, custom, and error properties, and there are separate types and separate maps for each of these. The distinction is a little fuzzy between some of them, but effectively a core property is something that's used by our backend to interpret the trace. For instance, the type of an edge is a core property. This also allows us to distinguish between annotations that users add and annotations that we absolutely must have for loading or displaying the trace. Custom is then a general bucket for any annotation data that users add through their own instrumentation. And error properties are typically used for noting errors in trace construction, as opposed to errors in the overall execution of the trace. For instance, we might use an error to indicate that the trace instrumentation never closed a particular block, versus an RPC call returning a particular error.
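Going back to the critical-path slack idea above, here is a minimal sketch of how a "required for" timestamp turns a resource fetch's wait time into slack, so the fetch that is actually blocking stands out. The class, field names, and numbers are invented for the example.

```python
# Illustrative only: when the application records when it first *uses* a
# fetched resource, the critical-path computation can credit slack instead of
# blaming the fetch itself.
from dataclasses import dataclass

@dataclass
class ResourceFetch:
    name: str
    requested_ms: float
    received_ms: float
    required_ms: float   # target of the hypothetical "required for" edge

    @property
    def slack_ms(self) -> float:
        # Time the resource sat unused; the fetch could have been this much
        # slower without delaying anything downstream.
        return max(0.0, self.required_ms - self.received_ms)

fetches = [
    ResourceFetch("resource1", requested_ms=0, received_ms=120, required_ms=400),
    ResourceFetch("resource2", requested_ms=0, received_ms=150, required_ms=150),
]

# The fetch with the least slack is the one actually on the critical path.
blocking = min(fetches, key=lambda f: f.slack_ms)
print(blocking.name, {f.name: f.slack_ms for f in fetches})
```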
The other feature we have is typed counters, which are an explicit, separate type from the string annotation map. These are counters that have a numerical value, a particular type, and also a precision. This allows us to say that 1024 bytes is distinct from 1024 milliseconds, which is distinct from 1024 kilobytes, but it does allow us to say that if a user records 1024 bytes in one place and 1 kilobyte in another place, those two values are actually equivalent. We've also extended this over time to more types as we've needed them. We've introduced sets of strings that can be appended to over time, which we've used in particular on execution units and traces, and also stack frames, for capturing either sampled profiling data or the stack frame at a particular RPC call.

So, putting this all together between the metadata and our events, we've run into some fun challenges in modeling. Going back to our old instrumentation, where we just had call, receive, response, and complete events: each of these has some associated metadata. One problem we ran into was that when we wanted to extend this to blocks and points and execution units, a call event now does more than just create an edge. A call event actually ends up creating a point and an edge to the RPC service you're calling. So there's a question of where the metadata actually applies: does it apply entirely to the point that it creates? Does it apply entirely to the edge? Is there a mixture? We made the decision that a call represents the edge, and the point is a side effect of that, so the metadata gets applied there. But this does mean that when users are using the old instrumentation, they can't actually attach metadata to the originating call point instead. This is why we extended the instrumentation over time to allow more places for this metadata to apply.

And with that, I'll hand it over to Michael. Do you want to try sharing your screen, or do you want me to walk through the slides as you talk? All right, let's see. I feel like this is dangerous either way. I'll give sharing my screen a shot. Yeah. There's a green thing. Oh, the one literally under my hover. All right. Did anything good happen on the other end? Yeah. I can't believe it worked. All right. Okay. What's that? Oh yeah, presenter mode. I know how to use PowerPoint, everyone. Oh, don't do that. Disaster averted. All right. Great.

So, to pick up from where Jonathan left off: it was kind of interesting. Before I came here, I worked with, well, Comcast has open-sourced it, a very span-based tracing system. We called it Money, as in "follow the money," and then we had all these clever things around it, like the Money bank was where all the traces lived. So it was fun. But we did run into some of the modeling issues Jonathan was talking about, and actually two in particular that we ran into there, and then I read the Canopy paper, and then I quit and came here.
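Returning briefly to the typed counters described above: a minimal sketch of how a value, a unit type, and a precision might combine so that 1024 bytes equals 1 kilobyte but never equals 1024 milliseconds. The class, unit names, and scale table are assumptions made for this example, not Canopy's implementation.

```python
# Hypothetical typed counter: numeric value + dimension ("kind") + precision
# (the unit the value was recorded in).
from dataclasses import dataclass

SCALES = {
    ("bytes", "bytes"): 1,
    ("bytes", "kilobytes"): 1024,
    ("time", "milliseconds"): 1,
    ("time", "seconds"): 1000,
}

@dataclass(frozen=True)
class Counter:
    value: float
    kind: str        # dimension, e.g. "bytes" or "time"
    precision: str   # unit the value was recorded in

    def canonical(self) -> float:
        """Convert to the dimension's base unit."""
        return self.value * SCALES[(self.kind, self.precision)]

    def equivalent(self, other: "Counter") -> bool:
        # Different dimensions are never comparable.
        return self.kind == other.kind and self.canonical() == other.canonical()

assert Counter(1024, "bytes", "bytes").equivalent(Counter(1, "bytes", "kilobytes"))
assert not Counter(1024, "bytes", "bytes").equivalent(Counter(1024, "time", "milliseconds"))
```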
But the two places where we thought, hey, this could actually be useful, were these. One is we had situations where we had a trace on a particular system, a bunch of stuff is going on in that system, and we just wanted to attach a profile of what was happening on that system, at various levels, to the trace. The best way we could think of to do that in the span-based model was to have a top-level span that represented the entire scope of the execution, start profiling, end profiling when that thing closes, and try to attach the profile to that top-level span. But then you had to know that span was special, right? That was the one that had the profiling information. It wasn't that bad, but it was actually kind of clumsy as far as the tooling we were building around it went. In the Canopy model, it's more natural to just annotate the execution unit that represents that request handling, because we use that to naturally represent "here's the entire span of processing an individual request," but not a span in the tracing sense.

Another interesting one was if you put some work in a queue and you wanted to understand how long it was in there and when it came out. That's actually pretty naturally modeled by another execution unit with points for enqueue and dequeue and edges for causality. You could model it as a separate span that starts when the work goes into the queue and ends when it comes out, but that span means a very different thing than most of the other spans, which form an actual RPC graph. Your tooling just has to know that that particular span happens to be the one that represents a thing sitting in a queue for a while. So those are a couple of things that we did struggle with from a modeling perspective, and that we were pretty interested in the Canopy paper helping us out with. It's been interesting to see it from one side and now start to see it from the other.

That said, I wanted to move on to talk about what we're doing now and where we're focusing. Facebook has grown quite a bit in the past however many years and we've got a ton of engineering teams. So one of the things we're focusing on is getting the backend APIs that Jonathan alluded to to a point where they're safe, clear, and usable. For us that means sampling isn't enough; we also need rate limiting, and I'll go into that a little later. We also want somewhat tailored, high-quality APIs and instrumentation layers for backend use cases. Where we really want to be is where most of the complexity in dealing with the underlying model is handled in the instrumentation layer, and an end user of the system just has a small set of functions, like "log a point," right?
And the instrumentation layer just says, okay, here's the active execution unit, here's the active block, and we'll put a point on it. Or if you do need to put a new block alongside the default one, you can do that in an easy way. What we really want to do is make tracing on the backend really, really easy for folks building backend services. The PHP instrumentation that Jonathan mentioned actually does that to an extent already, but we expose a lot more of the underlying guts to backend folks right now.

Another thing this ends up being useful for is having different APIs that are good for different situations. We're working with one of our teams that has really stringent perf requirements around memory usage, and they really worry about things like thread contention. The flexibility in the underlying model is going to let us fairly easily create a tailored API: hey, if you have a really high-performance RPC system and you don't want things going on behind the scenes that could cause additional thread contention or memory allocation, use this API. So that's one thing.

The other thing we're working on is a revamp of, if anybody's read the Canopy paper, what I think are referred to as custom extraction functions or something like that. It's essentially a DSL for working with traces that happens to run in our backend. We're working on a revamped, expanded version of that, which will run in a separate set of processes elsewhere and be based on Python rather than a completely custom DSL. Those are really the two things we're working on now. That's super important for us because we tend to look at traces in aggregate a lot, and we often compute summary information about traces. It's the stuff that's covered in the paper, but I think the way we're going to be doing it is going to be different.

So on the safety and API clarity side, this is an overview of what the instrumentation stack really looks like for us. At the bottom layer we've got a layer of sinks that do nothing but serialize the events that Jonathan mentioned and flush them somewhere; we have an internal Kafka-esque system that's used for that. On top of that there's a trace model that just represents the trace model as an object model, but doesn't let you do things that don't make sense, right? You can't create a block on a point, for instance. It just makes it a little easier to work with. When you do things with that model, it will give you pointers so you can keep references to them in your code; it will usually flush things to events right away, but you can do more with it if you need to. And then on top of that, we've got a set of code that deals with creating instrumentation layers.
Because we don't want people to have to worry about things like propagating context, either through thread boundaries in their system or across system boundaries. We don't want people to have to really, really understand the trace model deeply and understand which parts are active; we just want the underlying instrumentation to take care of that. We obviously don't want people to have to do their own rate limiting, because they won't, and then our system will get knocked over, which is not great. The whole idea is that on top of that, we've got this instrumentation kit for building backend instrumentations, we've got a set of instrumentations that are either built or will be built on top of that, and then what we really want most folks to leverage is this high-level API that just lets them do a few simple things, with all of the more complex pieces handled by the instrumentation. At the end of the day it acts more like a logging framework: you log a point, or you create a point, and it goes into the right place on the trace, on the right block or execution unit, and so on. Maybe we expose a little bit of additional stuff at that high level that handles the 80% use case, and if people need to do something more sophisticated, they have to go layers down in this API stack to do it. And like I mentioned earlier, another thing we're talking about doing on top of this is creating a really performant but much more constrained API that just does the RPC trace model for some particular use cases. It's nice that not only does the underlying event and object model give us that flexibility, but the instrumentation we've built up lets us do that too.

So this is a quick overview of the other big chunk of stuff we're working on now, which is borrowed from the QCon presentation of two of the other gentlemen in the room. Joe, Aaron, Edison, say hi. Yeah, we're good at video conferences over here. Basically, like I was saying, this is really a domain-specific stream processing engine and language for processing streams of traces. If you look at the way we tend to use trace data, we do a bunch of different things with it. One of those things is obviously just looking at an individual trace, which is useful if you know which trace to look at. But if you don't even know that, or if you want to compare data in aggregate before and after some event, like a deployment, and see what's happened, or if you just want to compute summary statistics off of something that can only be derived from a trace: that actually ends up being a more common use case for us than looking at an individual trace. So yeah, this is a domain-specific stream processing system for getting at that sort of stuff, and we've got a bunch of things that are built on top of it.
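To make the earlier point about the high-level API concrete, here is a rough sketch of the kind of "log a point" surface described above, where the instrumentation layer, not the user, tracks the active block via thread-local context. The function names and the dict-based trace objects are assumptions for illustration, not Canopy's real API.

```python
# Illustrative only: a minimal "log a point" facade over thread-local context.
import threading
import time

_context = threading.local()

def _active_block():
    # The instrumentation kit, not the service author, tracks what is active.
    return getattr(_context, "active_block", None)

def start_block(name: str):
    block = {"name": name, "points": [], "start_us": time.monotonic_ns() // 1000}
    _context.active_block = block
    return block

def log_point(name: str, **metadata):
    """The whole user-facing surface for the common case."""
    block = _active_block()
    if block is None:                 # the layer decides; users never worry about it
        block = start_block("default")
    block["points"].append({
        "name": name,
        "ts_us": time.monotonic_ns() // 1000,
        "metadata": metadata,
    })

# Typical service code only ever does this:
start_block("handle_request")
log_point("cache_miss", key="user:42")
```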
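And a loose sketch of the extraction-function idea just mentioned: user code consumes a stream of traces and emits summary rows for aggregate analysis downstream. The trace shape, row schema, and helper names are invented for this example; the real system is a Python-based DSL running on dedicated infrastructure, as described next.

```python
# Illustrative only: per-trace feature extraction plus a tiny in-memory
# stand-in for the downstream aggregation step.
from typing import Dict, Iterable, Iterator

def extract_rows(traces: Iterable[dict]) -> Iterator[Dict]:
    """Emit one summary row per sampled trace."""
    for trace in traces:
        blocks = trace.get("blocks", [])
        yield {
            "trace_id": trace["id"],
            "endpoint": trace.get("endpoint", "unknown"),
            "block_count": len(blocks),
            "total_ms": sum(b["duration_ms"] for b in blocks),
            "max_block_ms": max((b["duration_ms"] for b in blocks), default=0),
        }

def summarize(rows: Iterable[Dict]) -> Dict[str, float]:
    """Average total trace time per endpoint; in practice this aggregation
    would happen in an analytics store rather than in memory."""
    totals, counts = {}, {}
    for row in rows:
        totals[row["endpoint"]] = totals.get(row["endpoint"], 0.0) + row["total_ms"]
        counts[row["endpoint"]] = counts.get(row["endpoint"], 0) + 1
    return {ep: totals[ep] / counts[ep] for ep in totals}
```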
At a high level, we've got our internal configuration system that you put this Python-based DSL into, and that's run by a whole set of machines that run those on a per-use-case basis. In the old system mentioned in the paper, we ran the moral equivalent of this all in one set of infra that was also doing a bunch of other things. In this one, the use cases are split out into separate chunks of machines, essentially going back to different users of the system that are going to be doing vastly different things. That way we get some good isolation and they're not stomping on one another. And then we export summary statistics to an internal database called Scuba, and folks build things on top of that. So those are really the two big things we're working on for the time being. Oh yeah, the other important thing about this that I didn't mention, which is different from the stuff in the paper, is that this actually does allow us to do ad hoc queries, which was not possible with the old system. That's super useful, especially when you don't even know what you're looking for yet.

Yeah, I've got one more image after that. All right, let's see. Yes, all right. This is the slide I stole from Joe. So that's the stuff we're working on now. And this is just some stuff that keeps coming up again and again that we're thinking about; I don't know to what extent we'll actually end up tackling it, but I think these are interesting things that we're thinking about and have seen other folks thinking about too, so I thought they were worth mentioning.

One of them is how we safely do more arbitrary context propagation at scale. The interesting thing is that "scale" here is more about the diversity of teams and workloads than the scale of our overall infrastructure. I think anybody who's done this at a big company has almost certainly come across the use case where somebody says, hey, there's this stuff that will magically propagate this ID around, and I want it to propagate my session ID too. So there's the notion of having a more abstract way of using the underlying tracing instrumentation, wherever you've got it, to propagate context, which I know exists in OpenTracing with baggage, and the paper that Jonathan Mace, I think, wrote has some really good stuff in it. But one of the interesting things is that once you open up that capability, how do you keep people from doing really, really bad things with it, such bad things that you end up having to turn it off? That's kind of an open question. And I think it's worth thinking about whether there's a difference between the really common use case of "I just want to propagate an ID within my system boundaries for some sort of session" versus the broader "I need some context that might propagate across a very wide set of system boundaries," just in terms of safety for that sort of context problem.
And then another one, which Jonathan mentioned earlier: one of the interesting things that falls directly out of the Canopy data model is that it's actually really, really good at doing single-node traces as well. We get really detailed traces of mobile clients and www, and folks want really detailed traces of their own backend systems too. But how do you then take that really detailed view of a little piece of the system and work it into an overall distributed trace that's probably broader? You create a lot of noise for people who just want the broad view, but sometimes the person looking at that really detailed view might want to look at something a few layers back. There are a couple of different ways we've talked about modeling that, and some different things we've talked about doing with it in terms of tooling, and so on. But it kind of begs the question of how we actually get a good, solid end-to-end trace while still having these chunks of really, really detailed trace at various parts of the system, because they really are different things with different audiences. So yeah, those are two of the things we're thinking about and may or may not do anything useful with.

Cool. So, anybody have any questions? That was great, but I don't want to ask in this group because then it's going to take forever. Well, Jonathan is available for all questions at any time. So just, you know.

I have a question about metrics and aggregates, actually. Obviously you have an event-based system and you're rolling up some amount of aggregates out of that in your tracing system. But are you doing all metrics extraction based on this system, or do you have a totally separate metrics system? And if so, are you using the tracing system for context propagation, and how do those two things relate to each other?

So there's an independent metrics-based system for operational management. The traces can share some of the data from those metrics. There are some caveats there: the metrics are captured at the system level, whereas, say, we want to capture request-level metrics within a trace, but we can still pull the same overall system metrics, CPU utilization, things like that. And that has a separate aggregation piece, because they're collecting at some regular interval over hosts, whereas we're fundamentally request based. And then there was a question about context propagation, I think.

Yeah, well, it sounds like you actually have things separated between system metrics and maybe your application-level metrics coming out of the tracing system. But to the degree that you want to dimensionalize some metrics, that context tends to get propagated in the tracer; I was just wondering how that relates.
Yeah, so one of the places where there is actually overlap is if you want to understand, say, the overall global efficiency of a particular system, and in particular the resources it's utilizing as a whole. You do need to look at the resources captured through traces: the trace starts at some particular point, you look at the resources used by this particular request, and then you aggregate over all of these requests in some sampled fashion to understand request-based utilization through the system. We're currently using tracing for that. Fundamentally, at Facebook, context propagation and tracing are tied together, for better or worse. And that's where, as Michael said, we end up with use cases where people say, man, I really need to propagate a context, let me turn on tracing, and we're like, you know.

Yeah, and to that point, we don't have a decoupled, generalized context propagation system. How we do that safely with the number of teams we have is an open operational question that we'll have to think about and see if we can tackle at some point. It's interesting when you have this many different folks; it just gets interesting. Yeah, I hear that. Thanks.

Anyone have anything else? So on the safety piece of context propagation, you've given us the problem, and I totally understand the problem. Do you have any thoughts on how you're actually going to solve it?

I have some thoughts. Although the interesting thing is, I realized that when I redid these slides I somehow skipped my slide on rate limiting, which is actually the important safety piece we're tackling now, so maybe I'll just go over that really quickly. The other big thing we're doing with the API cleanup is adding pervasive rate limiting as well, after the sampling. Sampling ends up not being quite enough for us, because if somebody's doing a coin flip that they expect to hit a tiny percentage of traffic, maybe in a region that's being used for some experiment, and then suddenly a lot of traffic fails over there, you can get an explosion of traffic just because of that. So we're adding rate limits on a per-trace and trace-size basis, before we actually start new traces, to cover that. That's one important safety piece.

As far as generalized context prop goes, it's something we've been talking about for a while and starting to think about. I do think that opening up a completely generalized system and saying any engineering team in an organization of our size is free to go attach baggage to this thing is a non-starter. We've got systems that are extremely memory sensitive, where the engineers who own them would rightfully say very loud things. And then if you look at some of the safety work in the paper that Jonathan wrote, I don't have it noted here, but essentially it comes down to having a principled way of capping the amount of data that gets propagated.
But then that has the downside that maybe you really, really rely on a particular piece of data and now it's not there. To the extent that I've thought it through, I do think it's worth thinking about how you separate out the use case where somebody wants to propagate an ID within their system bounds, attach metadata to it after the fact that can be processed by some other system, and have it emit that data, but not have that ID propagated outside the bounds of their particular set of systems; and then separating that out from the generalized context prop, which should be very strictly controlled and really only used for a specific set of blessed things, with a decent amount of process around putting things in up front, and some set of configurations that can't be changed without review by some accountable team. So to the extent that I've thought it through, that's where I've landed, but we haven't really worked on it heavily yet.

I've tossed around the idea of prioritized namespaces in this realm, but I haven't actually implemented any of that yet. Yeah, and I think stuff like that is good. But then, at least in our world, you run into this thing of, well, okay, what if the data in your most prioritized namespace is larger than the amount of data that the most conservative team is willing to accept? So I think you do need to really tightly control the generalized thing, and then figure out how to build the more "hey, if you want to do something within your own system bounds" thing on the same context prop. So I think the notion of having some sort of system bounds that say, don't propagate these pieces outside of this boundary, but do propagate these lesser pieces that somebody has actually gotten buy-in from the most conservative team on, is probably necessary, at least at really large scale.

Yeah, I think this connects back to: you can have a really flexible API, but you also want something higher level and very restrictive that most users interact with. And in this case there are a bunch of very subtle questions around propagation. You may have some session ID that propagates over multiple individual www requests, but each www request should have its own ID, and you need to make sure users understand where the boundaries are, where things cross over, and are able to do it in a safe way. I think it's a very, very open question on our side how to make that work. Yeah, the boundary issue is just pernicious for any form of context propagation. Yeah, big missing piece of the internet right now. Yeah.

Michael, can you elaborate on rate limiting? You said you do it after sampling. So what happens to the trace if you start rate limiting? Yeah, so the way we're planning on doing rate limiting is we're planning on still doing it at trace start, and not killing traces that are in progress.
So, for instance, from the point of view of an individual node: let's say you have a one-in-a-thousand coin flip, but when we configured that coin flip we had two nodes running, and for some reason there are now a thousand running. What would happen is you do the coin flip, the coin flip passes, and then there's an additional rate limit check against a set of centrally configured rate limits that say, essentially, okay, this policy gets to start five traces per second. It would check the rate limit after the coin flip, and the rate limit would then fail. That's also where it would check the trace-size rate limit. So we're only going to do it at start; we're not going to try to get into "hey, could we somehow kill a trace that's too big halfway through," because that's just, you know, yeah. So does that answer the question? Yeah. Cool. Thanks.

I mean, we've done something similar, but not for regular sampling; for specific box-level sampling that people were abusing, we rate limit those. The reason we didn't do it for regular sampling is that we do extrapolations from the statistics we get from the overall traces, and there the probability of sampling is actually very important. If you start rate limiting, you can't do extrapolations anymore. So that's something we'd have to watch out for, because we do that sort of thing in some cases as well.

The intent of the rate limits is just for our own safety. We don't intend them to be things that get hit under normal operation, and we're going to monitor them, right? We'll put monitoring on them, and if somebody hits them in that sort of situation we'll let them know. But it's just something we can't get away without anymore if we want to continue onboarding people without being kind of skittish.

They're also good for new use cases. If we're exposed to dozens or hundreds of teams, we can't manually go in and understand what their limits are going to be, or their volume of data, or number of traces, or their requests per second, and they might not even know. There's just so much overhead in this space, so if we can, for the start case, have a sane bound, here's the number of traces or the amount of data you can send us, then our system doesn't have to fall over and they can iterate on that and move quickly without us. It's also good for the case of, maybe not the brand-new tracing scenario, but maybe I have this distributed request, and somewhere in that request we can't continue on because it didn't have instrumentation or something like that. Now maybe it's a huge service that, like I said, everyone hits, and they want to add a ton of instrumentation, and now their service has 10x the amount of data pumping out that it had before. This gives us a good way of seeing: oh crap, maybe the sampling stayed the same, but the size of all these traces went up significantly, and we need a quick pushback mechanism so our service doesn't fall over.
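A rough sketch of the trace-start check just described: the sampling coin flip first, and only if it passes, a centrally configured rate limit applied at trace start, never to traces already in flight. The token-bucket policy, function names, and numbers are assumptions made for this example.

```python
# Illustrative only: coin flip, then a per-policy rate limit at trace start.
import random
import time

class TraceStartLimiter:
    """Allow at most `rate_per_sec` new traces per second for one policy."""
    def __init__(self, rate_per_sec: float, burst: float = None):
        self.rate = rate_per_sec
        self.capacity = burst if burst is not None else rate_per_sec
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow_start(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# e.g. "this policy gets to start five traces per second"
limiter = TraceStartLimiter(rate_per_sec=5)
SAMPLE_PROBABILITY = 1 / 1000          # the per-node coin flip

def maybe_start_trace() -> bool:
    # 1. coin flip (sampling); 2. only if it passes, check the central limit.
    if random.random() >= SAMPLE_PROBABILITY:
        return False
    return limiter.allow_start()       # in-flight traces are never killed
```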
So those are usually the scenarios: it lets us move quicker and not fall over, and then come back, react to it, and have someone say, okay, what do we do, how do we do this properly?

Yeah, one of our engineers, who isn't here right now, was working on more of a sampling-based approach, where we dynamically change the rates based on how much they're outputting. I mean, we have both, as Yuri said, because of the debug override that we allow in the system, so we still need that case. But yeah, changing the probability is a nice way of doing it, if you can; it seems like it's pretty static right now. Do you take size into consideration? Whether probability works depends on what you care about, right? If all your traces are the same size... Yeah, size is an interesting point. We haven't done that, but we've definitely seen it. Yeah, that's a big one: we started with trace count and we're actually transitioning more to size, because size is the better indicator. We have such diversity in what traces look like. And especially when you're not in the new-tracing-scenario case but updating an existing source that a ton of people hit, it's really hard to understand whether this is going to make us fall over immediately. You might have someone who's 50% of all of our traces but only 1% of all of our data; if they were to increase their data significantly, we might just fall over immediately, and they might not understand that, and we don't have a good pushback mechanism.

Yeah, are you doing the dynamic sampling just by propagating the sampling rates through the baggage? Oh, actually, we have a remote sampler. It queries basically on a minute-by-minute basis, I think. So the ingest side is calculating the throughput from multiple areas and figuring out a good probability strategy there. Yeah, that's a good idea. It definitely adds complexity. I mean, I didn't write it, so. Yeah. Facebook has something similar, where we'll get a constant volume of traces and adjust the sampling rate accordingly with some feedback mechanism running every couple of minutes. And that has the same, you know, there are challenges with it. Yeah. Well, the main challenge is that some workloads are periodic, so when they're quiet, their probability climbs because nothing is coming, and then suddenly, boom, they flood you. Yeah, we've gotten hammered with things like that, particularly around aggressive rollouts, where somebody has a high rate during testing and then it gets rolled to production, and there are four to seven minutes that are not fun. And then the ability to adjust.

Right, I think we're out of time. Yeah, that was it. Thank you so much, Jonathan and Michael. That was a great presentation. We'll be posting it on the internet and see you all next time. The internet. We'll be propagating it. Yes. See you all later. It's been fun.