Linux screen sharing actually works. It's a miracle. All right, and we are now recording. All right, it's five minutes past, so I think we can get this kicked off. I don't know who all is on the call yet. Yeah, let me do a quick intro then and we'll start it off that way. Cool.

So Todd Lipcon... I guess Todd has done a lot of different stuff, but he's the lead for the Apache Kudu project, which is a pretty interesting data store that combines a few different models; it's kind of a chimera of different systems, and it's pretty powerful actually. He's done some incredible performance work making it work on a variety of different workloads, which I'm excited to see him present here today. He's also a committer on a long list of other Apache projects and incredibly well respected in that community, so I'm really excited to hear his talk. I don't have too much more of an intro aside from that, so take it away, Todd.

Thanks, Ben. So yeah, I didn't prepare a whole lot; Ben said this is usually a pretty laid-back, relaxed call, so I threw together a few slides starting about twenty minutes ago. I apologize for the lack of polish here. Also, feel free to jump in; I don't know how many folks are on the call, but if someone wants to jump in and kind of guide the conversation toward stuff that you all find interesting, that's fine by me. I do have a couple of slides, and I also just wanted to show a couple of things live. I guess we've got about 25 more minutes before you've got to get on to other stuff?

If you need more time, that's fine; this is probably more interesting than whatever else we'd talk about.

Okay, well, we'll play it by ear then. Speak up anytime.

So yeah, a 30-second intro to Kudu; I guess Ben already kind of covered it. It's a distributed column store. For those of you who aren't database people, column stores basically organize the data you store in them in a column-oriented manner, so that each column is stored together, and when you want to scan just one column out of, say, a hundred columns in your table, you can do so without having to waste the I/O of reading all the others. That makes it very well suited for analytics. One thing we did, though, is try to also make it efficient for random access. So while Kudu is typically used for analytics, we do have some use cases that are pretty random-access oriented, and we run some benchmarks using the more NoSQL, random-access style benchmarking tools like YCSB that people might be familiar with.

It's also a distributed system, so we use Raft for replication. I imagine people here probably know about Raft; it's basically another implementation of consensus, very similar to Multi-Paxos.

So we do care about latency. I wouldn't say latency is our number one concern; we're not typically running directly web-facing properties off Kudu. But we do usually have end users who are on some BI tool, and they expect queries to come back sub-second, and oftentimes that sub-second query actually boils down to hundreds or thousands of requests underneath. So the tail latency is actually pretty important: one tail outlier at the 99th percentile tends to dominate a lot of workloads. I think people here are probably familiar with that whole idea. There's a great paper that I really like called "The Tail at Scale," from Google, maybe six or eight years ago. If you haven't read it, you definitely should if you're working in tracing.
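To make the column-store point concrete, here's a toy sketch (an illustration of the general technique, not Kudu code): summing one column in a column-oriented layout is a sequential scan of one dense array, while a row-oriented layout drags every other field through memory along the way.

```cpp
// Toy illustration of row-oriented vs. column-oriented layout (not Kudu code).
#include <cstdint>
#include <iostream>
#include <vector>

struct Row {                 // row-oriented: all the columns stored together
  int64_t id;
  int64_t value;
  char other_columns[112];   // stand-in for the columns we don't care about
};

struct Columns {             // column-oriented: one contiguous array per column
  std::vector<int64_t> id;
  std::vector<int64_t> value;
};

int64_t SumRowWise(const std::vector<Row>& rows) {
  int64_t sum = 0;
  for (const auto& r : rows) sum += r.value;  // touches a wide row per element
  return sum;
}

int64_t SumColumnWise(const Columns& cols) {
  int64_t sum = 0;
  for (int64_t v : cols.value) sum += v;      // touches only this column
  return sum;
}

int main() {
  std::vector<Row> rows(1000, Row{1, 2, {}});
  Columns cols{std::vector<int64_t>(1000, 1), std::vector<int64_t>(1000, 2)};
  std::cout << SumRowWise(rows) << " " << SumColumnWise(cols) << "\n";
}
```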
I don't want to talk too much about Kudu itself, though. The way we approached building Kudu was essentially to build a bunch of fairly generic systems infrastructure. People who work at companies like Google or Uber probably have a lot of this stuff in-house already, not open source unfortunately; we started from scratch on a lot of it. So we built a lot of things that will probably seem familiar to people from other companies or other ecosystems, and that are pretty generic to any distributed system software that cares about this kind of thing. For most of this talk you don't need to know anything about what Kudu does; just think of it as a platform for building high-performance, low-latency system software.

So I'm going to jump right into the various things. This is kind of a grab-bag talk; there isn't one arc or story to it, just the various things we do that we've found to be useful.

The first one is pretty simple: request-scoped tracing. This is probably the thing we do that's most similar to OpenTracing. By the way, Kudu is almost all in C++, so this whole talk is about our C++ backend. We've got a macro called TRACE. It takes a little substitution string with dollar-sign placeholders. Pretty much every RPC starts a new trace, we can pass it between threads, and the trace messages get appended to whatever the current trace is. This is not actually a hierarchical trace; it's not really Dapper or OpenTracing style. It's really just a log.

We accumulate this log, and when an RPC finishes we sample it; we have different sampling buckets for different latency profiles. We also have timeouts propagated from clients: whenever a client sends an RPC, it says, hey, my timeout is one second, and on the backend, if we realize we responded to that RPC after one second, we'll always dump the trace for that RPC. So it gives us a pretty good idea of what's happening on the RPCs that take too long. Very, very simplistic, but again, it took like two hours to write, whereas OpenTracing is a much more complicated thing. And it's super lightweight: there's no infrastructure, it's all in-process, and we don't need to hook up any collectors or anything like that. So it's limited in scope; I mean that both in the computer-science sense of the word scope and also in how much it accomplishes. But it's been very, very useful for us.

One thing I didn't put in the slides: for each of these traces we also have a very simple map of counters. So if you look at an RPC trace, it will have the log, and it will also have a bunch of counters. Some of them are pretty generic; for example, our spinlock implementation counts how many cycles were spent spinning and attributes that to the RPC. And then there are a lot more that are specific to the particular request. So if you're doing a write, we have to write to the write-ahead log, and the time spent waiting to write to the write-ahead log becomes a counter on that trace. I'll show some examples in just a minute; actually, what I'll do is show the examples inline while I talk.
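As a rough illustration of the mechanism (a minimal sketch of the idea, not Kudu's actual implementation; the real macro does $0-style substitution and crosses thread boundaries explicitly), the pattern is a per-request trace object that annotations append to, dumped only when the request turns out to be slow:

```cpp
// Minimal sketch of request-scoped tracing: a thread-local trace that
// TRACE() annotations append to. Not Kudu's actual implementation.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

class Trace {
 public:
  void Append(const std::string& msg) {
    using namespace std::chrono;
    auto us = duration_cast<microseconds>(steady_clock::now() - start_).count();
    events_.push_back(std::to_string(us) + "us: " + msg);
  }
  void Dump(std::ostream& out) const {
    for (const auto& e : events_) out << e << "\n";
  }
 private:
  std::chrono::steady_clock::time_point start_ =
      std::chrono::steady_clock::now();
  std::vector<std::string> events_;
};

// The current trace travels with the request; a real version would also hand
// it across thread boundaries when work is passed between threads.
thread_local Trace* current_trace = nullptr;

#define TRACE(msg) \
  do { if (current_trace) current_trace->Append(msg); } while (0)

int main() {
  Trace t;
  current_trace = &t;
  TRACE("inserting row");
  TRACE("waiting on WAL append");
  // Suppose the deadline the client propagated was exceeded: dump the trace.
  t.Dump(std::cerr);
}
```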
So I have another browser here. Here's a server I just started running on localhost, and I'm showing the /rpcz page. It shows the running RPCs and sampled RPCs, but I've never made any RPCs to this server yet, so there's nothing in there. If I go to my little Python shell over here and list the tables, there are no tables in this cluster, since I just started it, but that will have made an RPC.

So if I reload this page, I can see the current RPC connections that are open: where each is from, its state; if there were an RPC currently running, it would show up in this inbound connections list. And then we can see a sampled RPC trace. The trace here... unfortunately my browser doesn't show the newlines, but you can see the time it arrived, how many microseconds it took getting onto the call queue, coming off the call queue, handling, and then 220 microseconds later, queueing a success response. This is a debug build, so all the times are higher than you'd normally expect; in reality this would probably take a few microseconds, not the hundred-odd microseconds it says here. This is very simple; if I do a bunch of these calls, probably all of them will fall into the same bucket, so we won't actually see this change, but we re-sample once a second.

If I go to one of our actual production servers, for an internal use case here at Cloudera, and check out /rpcz, we can see there's a lot more going on. There are a bunch of connections open from a lot of different hosts. In fact, there's one call currently in flight, and you can see the client sent a three-minute timeout on it. This is a Scan call; so far it's been running for 11 milliseconds. There's lots of information about outbound RPCs, too, because the servers talk to each other: this server has an outbound RPC to another server calling UpdateConsensus, and you can see it sent the call and hasn't yet received a response.

If I go down to look at some of the more interesting ones, you can see here a StartTabletCopy, which is one of our re-replication RPCs, and the whole story of what happened. And here are the metrics I mentioned: every RPC has various metrics. Some are pretty generic, so our I/O code accounts for metrics like fdatasync: how many we did and how many microseconds they took; how many microseconds we spent waiting on mutexes; DNS lookups; for some reason this request started a thread, so there's how long it took to start the thread. Every thread pool we use has queue time and run time, CPU run time. I don't know why they're not in alphabetical order here, but you can see there's a thread pool called raft and a thread pool called tablet-copy that this request used. We can see this tablet copy thread took quite a long time; it's actually downloading a bunch of data from another server, so it's a longer request. That's one particular sample that took 82 milliseconds, but if we scroll down you can see another sample of the same RPC that took longer, and if we're lucky we might even find an example of a very long one.

This is pretty useful for finding out, what are the outliers? What happened in one outlier that was different from the others? Maybe it's the fdatasync, maybe it's mutex time. You can go through and see all the different RPCs that we do. That's sort of the simple RPC tracing that we do.
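A sketch of how counters like these can be attached to whatever request is currently running (an assumed structure for illustration, not Kudu's actual code; the ScopedCounterTimer helper is invented here):

```cpp
// Sketch of per-request counters: any subsystem charges time or counts to
// the request currently running on this thread.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct TraceMetrics {
  std::map<std::string, int64_t> counters;
  void Increment(const std::string& name, int64_t amount) {
    counters[name] += amount;
  }
};

thread_local TraceMetrics* current_metrics = nullptr;

// RAII helper: attribute the wall time of a scope to a named counter.
class ScopedCounterTimer {
 public:
  explicit ScopedCounterTimer(std::string name) : name_(std::move(name)) {}
  ~ScopedCounterTimer() {
    if (!current_metrics) return;
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start_).count();
    current_metrics->Increment(name_ + ".us", us);
    current_metrics->Increment(name_ + ".count", 1);
  }
 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_ =
      std::chrono::steady_clock::now();
};

void AppendToWal() {
  ScopedCounterTimer timer("fdatasync");  // e.g. time spent syncing the WAL
  // ... write and sync ...
}

int main() {
  TraceMetrics m;
  current_metrics = &m;
  AppendToWal();
  for (const auto& [k, v] : m.counters) std::cout << k << "=" << v << "\n";
}
```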
Another thing I really like: I've found that oftentimes a single RPC trace won't tell you a whole lot. It'll tell you, hey, this RPC took a long time waiting on a lock, or took a long time waiting on I/O, but you don't really know what actually caused that; it's some cross-request interaction. So we separately have an infrastructure called process-wide tracing. Unfortunately the two aren't really integrated; they were implemented separately and never changed to use the same annotations or anything. But basically these are mostly scoped annotations, and there's a way to draw an arc between two scopes as well, if an async event is fired in one place and picked up in another. Essentially you have a category for each trace event, some human-readable name, and then some set of variables. Again, this is super low overhead when it's not enabled; you enable it explicitly, and when it's on there's pretty high overhead, because we have a lot of these trace scopes.

So I'll pop over to this server I have here, go to tracing.html, and hit Record. You can see there are a bunch of different categories; these are the categories used in those trace annotations. I'll hit Record, make a couple of RPCs to this server, and stop it. Unfortunately it doesn't look great when I'm zoomed way in for the screen share, but you can see on top there's a timeline of CPU usage, and then the various threads down the left. You can see that one RPC worker was actually involved. I called list tables four times; you can see one, two, three, four. If I zoom way in here, I can actually see the timeline: this call started here, method ListTables. It got picked up on this reactor thread, which is our network I/O thread, and it did the parsing of the protobuf. If I turn on flow events... actually, I think the new version of the browser doesn't support these, but it should draw an arrow from here to here, showing that the call was parsed here and picked up by a different thread. And you can see this view actually includes the traces we just looked at: when it was picked up, when it was handled, what the metrics were; in this case it's a pretty uninteresting call, so no metrics. And then responding with success. So again, not a super interesting RPC; it's just a list tables.

But if I go to the tracing page of one of our production servers and record, it fills up the buffer quite quickly. I'll capture a couple of seconds, and you can see there's a lot more going on: a lot more threads, a lot more RPCs, and there are actually some RPCs taking pretty long. If I click on a Scan, I can see it took 705 milliseconds, and I can zoom in and see this is continuing a scan, meaning it started in a previous RPC. It's reading some blocks; it got a cache miss, so that's probably going to block on I/O. It gives you a pretty good idea of what might be going on, and you can zoom in and really see it at a fine-grained level. Here's a cache miss; this one is pretty quick, it probably hit the OS buffer cache. Whereas that one was pretty long; it's probably actually going to disk. It took 12 milliseconds, so it's probably hitting a spinning disk. This is all very useful because you can actually see, across requests, when one thing might be having an impact on another.

We've also been able to see pretty interesting patterns in our thread pools. We used to not have LIFO ordering on our thread pools, so we'd round-robin across all of our workers and we wouldn't get this kind of nice chunking where only a small handful of RPC workers is active; it would round-robin across a hundred threads, which really hurt cache performance and things like that. So this has been very, very useful for us to find process-wide lockups. We found some issues in TCMalloc, for example. We've seen issues in the Linux kernel where the mmap_sem semaphore gets held and then all the other threads block for apparently no reason, but actually they're all blocked on a lock in the kernel. I've found it very useful. It's way more information than you'd actually get from something like OpenTracing, and it captures the cross-request stuff. So I think things like OpenTracing are useful to pinpoint, hey, this server has high latencies; but when you actually want to dig into what's going on on that server, this can be more useful.
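A condensed sketch of the mechanism (not Kudu's implementation, which is lock-free and far cheaper when disabled): scoped begin/end events tagged with a category, dumped in the Chrome trace-event JSON format that trace viewers understand.

```cpp
// Sketch of process-wide tracing: scoped events with category/name/timestamps,
// dumped as Chrome trace-event JSON so a trace viewer can display them.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct TraceEvent {
  const char* cat; const char* name; char phase; int64_t ts_us; size_t tid;
};

std::atomic<bool> tracing_enabled{false};
std::mutex events_mu;
std::vector<TraceEvent> events;

int64_t NowUs() {
  using namespace std::chrono;
  return duration_cast<microseconds>(
      steady_clock::now().time_since_epoch()).count();
}

void Record(const char* cat, const char* name, char phase) {
  if (!tracing_enabled.load(std::memory_order_relaxed)) return;  // cheap when off
  std::lock_guard<std::mutex> l(events_mu);
  events.push_back({cat, name, phase, NowUs(),
                    std::hash<std::thread::id>{}(std::this_thread::get_id())});
}

// RAII scope: emits a 'B' (begin) and 'E' (end) event pair.
struct TraceEventScope {
  const char* cat; const char* name;
  TraceEventScope(const char* c, const char* n) : cat(c), name(n) {
    Record(cat, name, 'B');
  }
  ~TraceEventScope() { Record(cat, name, 'E'); }
};

void DumpJson() {
  std::printf("[");
  std::lock_guard<std::mutex> l(events_mu);
  for (size_t i = 0; i < events.size(); i++) {
    const auto& e = events[i];
    std::printf("%s{\"cat\":\"%s\",\"name\":\"%s\",\"ph\":\"%c\","
                "\"ts\":%lld,\"pid\":1,\"tid\":%zu}",
                i ? "," : "", e.cat, e.name, e.phase,
                (long long)e.ts_us, e.tid);
  }
  std::printf("]\n");
}

int main() {
  tracing_enabled = true;
  { TraceEventScope s("rpc", "ListTables"); }
  DumpJson();  // the output can be loaded into a trace-event viewer
}
```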
Another nice feature: this is actually the trace viewer that's built into Chrome, so I can hit Save here and save a JSON file. We're often working with a customer on premises, and they can record these JSON files and attach one to a support ticket. Then I can load it into any other Kudu server, or even in Chrome: I can just go to about:tracing and load that JSON file from wherever it went, and it will load and display in anyone's Chrome browser. In fact, it might even display a little nicer, because Chrome is probably a newer version of the viewer than the one we've embedded in Kudu itself. So that's the process-wide tracing.

In terms of inter-process tracing, we actually haven't had a big need yet. We don't have a lot of super deep RPC call stacks, at least within Kudu. There are cases where a user application, say if they're building a website, might want to do tracing, in which case we'd want to support it for those consumers. But in terms of Kudu itself, when we get a request, that request is maybe going to wait on one other server for replication, and that's about it. So we haven't had a big impetus to go and do OpenTracing or Dapper or Zipkin or Jaeger or anything like that.

I also wanted to call out the unreasonable effectiveness of log statements. We have this really stupid macro, which I probably wrote in the first week when I started writing Kudu, called SCOPED_LOG_SLOW_EXECUTION. You pass it a number of milliseconds and a string, and it just checks whether the scope you put it in takes more than that number of milliseconds; if so, it logs a statement saying, hey, I took a long time to do X. This has been incredibly useful in customer environments, where they call up and it's, hey, Kudu is being a little slow. Well, why is it slow? I don't know, here are the logs, figure it out. Just having these markers in the logs that say, hey, look, a bunch of threads logged that writing to the write-ahead log took a long time, is a good hint that maybe your write-ahead log disk is slow or overly contended by other applications, things like that. It's super simple, but pretty useful for the amount of effort it took. We just have these sprinkled around our codebase in various interesting places.
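A minimal sketch of what such a guard can look like (an approximation for illustration; Kudu's real macro also takes a log severity and uses its own stopwatch utilities):

```cpp
// Sketch of a "log if this scope ran long" guard, approximating the idea
// behind SCOPED_LOG_SLOW_EXECUTION. Not the real implementation.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

class SlowExecutionLogger {
 public:
  SlowExecutionLogger(std::string what, int64_t threshold_ms)
      : what_(std::move(what)), threshold_ms_(threshold_ms),
        start_(std::chrono::steady_clock::now()) {}
  ~SlowExecutionLogger() {
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start_).count();
    if (ms > threshold_ms_) {
      std::cerr << "Time spent " << what_ << ": " << ms << "ms\n";
    }
  }
 private:
  std::string what_;
  int64_t threshold_ms_;
  std::chrono::steady_clock::time_point start_;
};

// Two-level concatenation so __LINE__ expands, giving each guard a unique name.
#define SLOW_CONCAT_INNER(a, b) a##b
#define SLOW_CONCAT(a, b) SLOW_CONCAT_INNER(a, b)
#define SCOPED_LOG_SLOW_EXECUTION(threshold_ms, description)          \
  SlowExecutionLogger SLOW_CONCAT(slow_logger_, __LINE__)(            \
      (description), (threshold_ms))

void AppendToWal() {
  SCOPED_LOG_SLOW_EXECUTION(50, "appending to the write-ahead log");
  // ... the actual append; a log line is emitted only if it exceeds 50ms ...
}

int main() { AppendToWal(); }
```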
A newer thing we've added is process-wide stack trace collection. We have a thing called the diagnostics log now; I can probably show you that. By default we run with this diagnostics log, which gets put in the log directory; in this case it's a dev build, so it's in /tmp. If I look at that file, it's semi-human-readable, and basically you get stack trace records, by default once a minute with some jitter, so that we don't correlate with any kind of scheduled once-a-minute task. So this one's 45 seconds apart, this one is another 45, this one is a little longer. Then, to make it a little smaller, we do a bit of dictionary encoding of the symbols: here in the stack trace lines, the stacks just have hex addresses, and interleaved with them are these symbol lines which map those hex addresses to particular symbols and function names.

The other type of info we put in these logs is metrics dumps. We have a lot of metrics captured on the server: histograms, counters, things like that; I'll talk about those in a minute. We found that even though customers may have centralized metrics collection, those systems often do a lot of downsampling or aggregation, and it's hard to get down to what happened at this exact minute, or between these two exact minutes: what was the 99th percentile log-append latency on this particular server? The best companies in the world can probably answer that question; most companies can't. But if they just have this really dumb gzipped log in the log directory that you can get from the customer and look at, we can answer it. We have various tools that take these logs, graph them, and calculate various derived metrics. That's been super useful. Again, it's kind of a simple thing, but it works pretty well.

We've also got a script, I think it's called diagnose parse_stacks. If I run parse_stacks on this log, it prints out a lot more information: the stacks, with symbolization, shown by thread, and it groups together threads that all have the same stack. So we have four reactor threads running libev event loops, and it groups them together. It's a lot easier to understand what's going on that way versus seeing hundreds of threads all with the same stack.

So I mentioned these are periodic; we also have triggered collections. We have RPC queues, where an RPC that hits the system goes into a queue waiting to be handled by a handler thread, and we have a pretty tight limit on the length of that queue. If something arrives and doesn't fit, we'll evict an RPC from the queue based on priority and send back a message saying, you need to come back later; essentially doing back pressure on the client by that mechanism. And when we actually do evict something from the queue, we trigger, at that point, a stack trace of all the threads. This has been very, very useful for finding reasons why something is locked up. There's some set of underlying locks that get held, maybe it's a logging issue or a kernel issue or something like that, and then very quickly the RPC queues back up, and that triggers a stack trace. So we have a smoking-gun snapshot of all the threads, right where they're all blocked, and pretty quickly we can pinpoint these issues.

Techniques like that have let us find issues in glog, for example. If you just use the Google logging library in its default mode, there's a mutex around logging, and that mutex can be held while it's actually doing I/O, and the I/O can take a long time. We saw these cases where all the threads end up blocked on glog, so we moved to async logging and things got a lot better. These kinds of techniques, again: pretty simple, but they work really well.
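A sketch of that trigger logic (an assumed shape, for illustration only; the invented DumpAllThreadStacksToLog stands in for the real machinery, which signals each thread to record its own stack and is rate-limited):

```cpp
// Sketch: when the RPC queue overflows, reject the lowest-priority call
// (back pressure) and snapshot all thread stacks, since a full queue is
// often the first visible symptom of a process-wide lockup.
#include <algorithm>
#include <deque>
#include <iostream>
#include <mutex>

struct InboundCall { int priority; /* ... request payload ... */ };

// Placeholder for the real stack-dumping machinery.
void DumpAllThreadStacksToLog() {
  std::cerr << "stacks of all threads would be dumped here\n";
}

class RpcQueue {
 public:
  explicit RpcQueue(size_t capacity) : capacity_(capacity) {}

  // Returns false if the given call was rejected ("come back later").
  bool Enqueue(InboundCall call) {
    std::lock_guard<std::mutex> l(mu_);
    if (q_.size() < capacity_) { q_.push_back(call); return true; }
    // Queue is full: the smoking-gun moment. Capture everything now.
    DumpAllThreadStacksToLog();
    // Evict whichever call is lowest priority, which may be the new one.
    auto victim = std::min_element(q_.begin(), q_.end(),
        [](const InboundCall& a, const InboundCall& b) {
          return a.priority < b.priority;
        });
    if (victim->priority < call.priority) { *victim = call; return true; }
    return false;
  }

 private:
  std::mutex mu_;
  size_t capacity_;
  std::deque<InboundCall> q_;
};
```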
The stack traces are also viewable on a /stacks web page; again, an unreasonably effective, simple thing. If I go to one of our production servers and hit /stacks, it's pretty quick. I call this a poor man's profiler: if I'm curious what a workload is doing, is it scan-heavy, is it doing a lot of I/O, is it waiting on CPU or something, usually just a couple of reloads of this page gives you a pretty good idea of how busy the server is and what some bottlenecks might be. For instance, it's interesting to me to see a hash table lookup in this serialize-row-block call; that's actually a known performance issue that I think we've since fixed. A very poor man's profiler. Reloading again, I also see the std hash table find here in this call, which probably shouldn't be there; we should have something a little faster.

Then /metrics is pretty simple; it's a lot of metrics stuff. We built our own metrics subsystem; we couldn't really find anything good for C++. I think now maybe the OpenCensus project is trying to do a little bit in this area. But we implemented the HDR histogram data structure for high-resolution histograms, and all of our RPCs, as well as a bunch of other things throughout the code base, track latency histograms with it. You can see in this example that this particular Write RPC has two significant digits of precision, we've done some number of them, the mean, all the percentiles. These actually keep the raw bucket counts underneath as well; you can fetch them from /metrics with a special query parameter, and they end up in that metrics log. So given the metrics log and snapshots of the raw bucket counts, you can say, between any two points in time, what were the 99th percentiles of a bunch of different things. We calculate them at the server level as well as for individual tablets, and you can aggregate those raw bucket counts across tablets to say, what's the 99th percentile for this particular table? That's very useful for understanding where the bottlenecks might be or where slowness might be coming from.

Another fun thing we built a couple of years ago, which has found a lot of interesting issues, is the stack watchdog. On important threads, for things like the write-ahead-log append, we use this SCOPED_WATCH_STACK macro. We give it some number of milliseconds; for example, we expect it should not take more than 500 milliseconds to append to the WAL. And then there's a background thread, the watchdog thread, which scans the registry of all the other threads and does a kind of lock-free check to see whether any of them is inside one of these watched scopes. If anybody's been inside one of those scopes for longer than the expected amount of time, it takes a stack trace of that target thread and dumps it to the log. There's some rate limiting to make sure things don't go crazy in the logging. It's been super useful for finding various issues at the file system level, or in glog; TCMalloc has been another case where we found some bugs.

In this example, the write-ahead log at this line of code was stuck for 600 milliseconds, and it captures the kernel stack as well, so we can actually see that inside the kernel it was waiting on jbd2, which is the file system journal: we were waiting to get write access to the file system journal. That's something you might expect for a write-ahead log, but having to wait 600 milliseconds is a little unexpected; maybe the disk is going bad, or just overloaded. In fact, it turned out this was just due to Red Hat Enterprise Linux 6 being really old, with a pretty bad implementation of a lot of this stuff. You can also see the user stack is in a writev call, which kind of makes sense, because the kernel stack is in the writev system call.
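A sketch of the watchdog idea (an approximation, not Kudu's implementation): a thread entering a sensitive section publishes its entry time; a background watchdog scans and flags anyone who has been inside too long.

```cpp
// Sketch of a stack watchdog: watched threads publish "I entered a sensitive
// scope at time T"; a watchdog thread polls and reports overstays. The real
// system captures the target thread's user and kernel stacks; we just log.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

using Clock = std::chrono::steady_clock;

struct WatchedThreadSlot {
  std::atomic<int64_t> entered_ns{0};   // 0 means "not in a watched scope"
  std::atomic<int64_t> threshold_ms{0};
};

constexpr int kMaxThreads = 16;          // fixed registry for the sketch
WatchedThreadSlot g_slots[kMaxThreads];

class ScopedWatchStack {
 public:
  ScopedWatchStack(int slot, int64_t threshold_ms) : slot_(slot) {
    g_slots[slot_].threshold_ms.store(threshold_ms);
    g_slots[slot_].entered_ns.store(Clock::now().time_since_epoch().count());
  }
  ~ScopedWatchStack() { g_slots[slot_].entered_ns.store(0); }
 private:
  int slot_;
};

void WatchdogLoop(std::atomic<bool>* stop) {
  while (!stop->load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    int64_t now_ns = Clock::now().time_since_epoch().count();
    for (auto& slot : g_slots) {
      int64_t entered = slot.entered_ns.load();
      if (entered == 0) continue;
      int64_t elapsed_ms = (now_ns - entered) / 1'000'000;
      if (elapsed_ms > slot.threshold_ms.load()) {
        std::cerr << "thread stuck for " << elapsed_ms << "ms\n";
      }
    }
  }
}

int main() {
  std::atomic<bool> stop{false};
  std::thread watchdog(WatchdogLoop, &stop);
  {
    ScopedWatchStack watch(0, /*threshold_ms=*/100);
    std::this_thread::sleep_for(std::chrono::milliseconds(300));  // "stuck"
  }
  stop = true;
  watchdog.join();
}
```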
So I think that's all the slides I've prepared; I didn't want to go too long. I figured questions and discussion would be more interesting. Is there anything people want to hear more about, or are you curious how the code works, or anything like that?

I've got a question. Thanks, Todd, this is pretty interesting, and it's fun to see. I kind of knew that you would do this, which is why I wanted you to do this, but it's nice to see a presentation about performance analysis and stuff like that that's not just a hundred percent about distributed tracing, because these other techniques are really interesting and relevant. But one thing that comes up in my head, and I think this is actually a fine example: it sounds like in this case the issue had to do with Red Hat Enterprise Linux 6 not being a very good implementation. But a lot of the things you're probably dealing with have to do with contention for some shared resource, whether it's the disk or something else. I'm curious, do you have techniques for understanding the source of load when there's a contention issue, an overloaded resource, that type of thing? What are you typically doing to find the multiple writers, or whatever it is, that are contending for a resource?

Yeah, we don't have any super generic things for that. Specifically for lock contention, our spinlocks are instrumented; I probably should have talked about that here. The spinlocks have some instrumentation where, when a lock is unlocked and the unlocker sees there was a waiter, it collects the stack trace of the unlocking thread. So it kind of knows which holders were causing contention for somebody else. And then we expose that through the pprof web interface. So I'll see if I can actually... yeah, if I go to this special URL, which can be read via the Go pprof tool as well, it will tell us, over this one second, the various stack traces where we had some contention. It needs to be symbolized, if you have the binary. So this is super useful for the generic kind of spinlock contention. It tells us the stack trace and the memory address, so it won't tell us exactly what kind of application-level object was contended, but it's usually fairly clear once you have this data that, okay, I need to zero in on that part of the code to understand where the spinlock contention is coming from. Similarly, you can get a CPU profile from this kind of endpoint. Honestly, I find /stacks to be unreasonably useful for this kind of thing as well.
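A sketch of a contention-attributing spinlock (an assumed shape, not Kudu's code; the invented RecordContendedHolderStack stands in for real stack capture feeding a pprof-style endpoint). The key idea is that the unlocker, not the waiter, records its stack, so you learn which holders cause contention:

```cpp
// Sketch: a spinlock that samples the *holder's* stack when it releases the
// lock while someone was waiting.
#include <atomic>
#include <iostream>

// Placeholder for real stack capture (e.g. via libunwind) into a profile.
void RecordContendedHolderStack() {
  std::cerr << "holder stack sampled at contended unlock\n";
}

class InstrumentedSpinLock {
 public:
  void Lock() {
    if (TryLock()) return;                       // uncontended fast path
    waiters_.fetch_add(1, std::memory_order_relaxed);
    while (!TryLock()) { /* spin; real code pauses and counts cycles */ }
    waiters_.fetch_sub(1, std::memory_order_relaxed);
  }
  void Unlock() {
    bool contended = waiters_.load(std::memory_order_relaxed) > 0;
    locked_.store(false, std::memory_order_release);
    if (contended) RecordContendedHolderStack();  // attribute to the holder
  }

 private:
  bool TryLock() {
    return !locked_.exchange(true, std::memory_order_acquire);
  }
  std::atomic<bool> locked_{false};
  std::atomic<int> waiters_{0};
};
```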
One interesting example: maybe six months ago, we learned that TCMalloc, which is the allocator we use, has fixed free lists for all allocation sizes less than one megabyte, but one megabyte and above actually goes to a central span list, which was implemented as a linked list until very recently. Just by looking at the /stacks profile, we thought, why are half of our threads in TCMalloc's allocate-large path, iterating over a linked list? You can see that just in the stack traces. Then, by digging into the code, we realized that what we thought was a less-than-or-equal-to-one-megabyte threshold was actually less than one megabyte, and all of our arena allocators had been tuned to do one-megabyte allocations. By changing that from one megabyte down to a little bit less than one megabyte, we got rid of a bunch of latency outliers; it's this very stupid thing where we went to 1,020 kilobytes or something. And then we also submitted some patches upstream to TCMalloc to actually cache the one-megabyte allocations, and to make the central free list use a tree instead of a linked list. Those two changes increased throughput on some workloads that did a lot of larger allocations by like 40 to 50 percent. So it all started from seeing something strange in some stack traces, and from there we did the next level of digging to actually understand what was going on.

Awesome, thanks. I've got a question, actually. First of all, great presentation; it's awesome to see all these details. One thing that comes up a lot is that you've got all these different tools for instrumenting various parts of your system: kernel-level stuff, stacks, threads, user logs. Where some of the trickiness shows up, a lot of the time, is in figuring out the right granularity for these things to be relevant. You often have to staple them together somewhere, and that act of stapling them together itself has overhead, and that often seems to be the trickiest part. I was wondering if you had any thoughts on that, or could relate an experience report of trying to figure that part out?

Yeah, I definitely agree. I think we have a lot of systems that are useful in the right hands but that make it hard to know what you should be looking at. So we're trying to document things better; we're starting some runbooks for our internal support team to understand how these things might be useful. In terms of correlating, oh, I saw one outlier, but I wasn't collecting the traces at that time: yeah, we definitely have that. You kind of have to hope that you catch the thing happening that you wanted to see. So it's a little hard.
That's why we started to add more of these features, like the diagnostics log that's just always taking stacks, and that's now on by default. It took a little bit of nervousness to ask, is it actually safe to have this thing taking stack traces once a minute? Because when we first implemented it, we actually found deadlocks in the dynamic loader if you try to take a stack trace of a thread while it's in the loader, and we have an awful workaround to try to prevent that. So there's always risk when you add this instrumentation, either performance risk or bugs. I remember the first time we added the contention profiling, I introduced this awful memory corruption bug where I was writing outside the stack, and it almost got released to customers; that would have been really bad, we'd have had a lot of crashes and things like that. So there's always risk.

But I think for us, it's okay to have even a five or ten percent performance reduction. Our customers are not so performance sensitive; they're a lot more sensitive to stuff being down and not knowing why, or stuff performing badly and not knowing why, and it taking us three weeks to understand what the performance problem is. They'll be a lot more upset about that than if you say, well, you've got a five percent overhead, but we can find that problem in an hour instead of three weeks. It's usually a good trade-off for us. It's probably not the case for every company, but we tend to lean more toward that side of the spectrum, I guess. Exactly what you were getting at; that's sort of our philosophy.

Yeah, that is sort of it. Around trying to figure out the right granularity, that often seems to be part of the trick; beyond it being potentially dangerous, there's always some overhead that comes with this stuff. And sometimes, especially writing databases and C++ stuff, people can be very, very obsessive about maximal efficiency, and then you're saying, we're just going to add five percent overhead to figure out what's wrong with it. It's almost like a cultural issue; sometimes you have to convince people that it's worth it.

Yeah. The best example I can give there: yeah, we always have a five percent overhead, but that five percent overhead has allowed us to pinpoint performance issues that have saved us 40 or 50 percent. I think we've gotten huge gains from things we found using this infrastructure. If we'd never added this five percent, we'd still be stuck way back where we were a year ago, when it was much, much slower. So you spend a little to win a lot.

Cool, great. One last thing that I didn't show is the heap profiling, which is another thing we've turned on more recently. It's not even on on this server, we turned it on that recently. The TCMalloc heap sampling is one of those things that isn't really well advertised; it's quite low overhead, and I think for our next release we're going to turn it on by default. We have a lot of our own internal memory tracking to understand where our memory is going, but sometimes we have this case where the customer says, your internal memory tracking says you're using three gigs of memory, but the RSS according to top is 16 gigs. Where did it all go?
And usually we don't really have any clue. But with heap sampling turned on, even though, again, it might be a one percent overhead, we'll be able to answer those questions a lot better. And when we first turned it on and started looking at some workloads with it, we found huge wins: it was like, hey, why are we even storing that thing? We don't even use it. And we remove it and save 8 megabytes here or 16 megabytes there, and it adds up.

Yeah, makes sense. Right, any other questions for Todd? Looks like that was it. That was a really great presentation, and we'll chop this up and put it on the web, Todd, for other people who are interested in Kudu to check out. It definitely seems like, if you're using Kudu, this is a great rundown of all the things you can do with it.

Yeah, I think the most advanced users probably find it useful; a lot of users probably don't want to care about this stuff, they just hope it works. But we actually use it on the dev team a lot. If anybody has any further questions, feel free; I'm on the Gitter, the OpenTracing Gitter, so feel free to ping me there and I'll check in later today.

Awesome, thank you. Thank you so much. Yeah, thank you for inviting me, Ben. Thanks, everybody.

Okay. So, back to our regularly scheduled programming. We've got a couple of things on the agenda around OpenTracing API questions. The first one someone put on here: trace ID and span ID, how do we make a decision to proceed? Is that Yuri?

Yeah. So I don't remember what the status of the spec is for this one, but definitely a lot of people keep asking and trying to open PRs in the different language repos.

Yeah, I think we should just get moving on it. With my team, we've been really focused on getting the scope and scope manager release for Python out the door, so we haven't felt like we've personally had the bandwidth to also release and manage this in other languages while that's going on. So that's honestly probably the hold-up. Python is supposed to go out this week; there's a little final back-and-forth about naming conventions, but people in general seem satisfied with that API. So my plan was, as soon as that was out, to start pushing on span and trace ID.

So my question is, are we okay with breaking the API in this case? Because it will be a breaking change in many languages, right? It definitely is in Go. And it may clash with existing tracers already implementing those methods, but potentially with different return types.

Yeah, so I would say there are two issues there. One, it's a breaking change for tracer implementers, but it's a breaking change that's backwards compatible: you now have to expose these on your tracer and issue a new version of your tracer, but that tracer will conform to the older API. So it's not like you need to fork and maintain two versions. And for users of the code, it's not a breaking change at all, because it's simply an additive method. So in that sense, I think everyone's fine with it being a breaking change, because it's more of a minor update as opposed to a major break.

The other issue, which is maybe more serious or harder to see, is around naming these methods: should they be called traceId and spanId, or traceIdentifier and spanIdentifier?
Which is a big mouthful, but definitely limits the chances of a collision with a pre-existing method that returns something else. We've seen one example of that, which is the mock tracer: it has traceId and spanId, and it returns something else. But we've been asking around among actual implementers, and no one with a tracer currently binding to OpenTracing has spoken up and said, no, that won't work. So I think that's really the final bike shed there. There's been a lot of push from everyone to say traceId and spanId are nice names, and it doesn't seem to mess up any real code, so let's do it. I am a little nervous that someone will show up too late and say, hey, this messed with me. It's not quite a non-breaking change, because in the absence of these APIs today, people do cast to concrete implementations and use those existing methods, which may return different types. So the end-user code would be affected.

If you're casting... the only case where this is potentially a breaking change would be if you literally had the methods traceId and spanId, with the same capitalization and everything else.

Yeah, which most tracers had, I think.

That's been the question we've been asking around: who literally has this method signature returning something else? And no one has spoken up saying that they do. The other answer is just to name it something slightly different. I think that's the final question that has to get resolved: if you call it a slightly different name, you massively reduce the chance of there being a collision.

But no one called it traceIdentifier, because that's really long to type. It's just, you now have this API we're asking everyone to use, and it's got a funky method signature as a result. So maybe the real ask is, can we do a more exhaustive audit of existing tracers that bind to OpenTracing, and really get active confirmation that it will or will not be a problem?

Well, as far as Jaeger, every single library in Jaeger had a traceId and spanId in the most idiomatic form for the language, right? So in Go it would be like TraceID with the capitalization, et cetera. So we're definitely going to have a clash, and they would return native types rather than strings.

Why didn't you say something, man?

I did not see that question anywhere.

But you said the mock tracer had the same thing, and I would assume most of the tracers had the same thing. Well, then it's just a matter of calling it something else; I think that's the solution. Maybe something that's not as long as identifier, for people who have to type this out manually; that's hard to remember how to spell, and very long. So I really think that's what we need. Span key, trace key...

I'm fine with identifier myself. I mean, code completion; seriously, who types this stuff?

I type it, man. I can't remember if the i goes before the e or how many e's there are. I'm bad at typing and have no code completion sometimes. Anyway, I feel for people. It's also very long on the screen, I don't know. Let's not try to go through the whole issue here.
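To make concrete what's being debated, here's a purely illustrative sketch (an invented shape, not the actual OpenTracing API in any language; the method names are exactly the part that's undecided):

```cpp
// Illustration only: the proposal amounts to adding ID accessors on the
// span context, so callers no longer downcast to a concrete tracer's type.
#include <string>

class SpanContext {
 public:
  virtual ~SpanContext() = default;

  // Proposed additions, names under debate (traceId vs. traceIdentifier vs.
  // something shorter that avoids collisions). Returning strings leaves each
  // tracer free to use 64-bit, 128-bit, or other internal representations.
  virtual std::string ToTraceID() const = 0;
  virtual std::string ToSpanID() const = 0;
};
```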
I mean, I think Yuri's question is, you know, this has been a known issue for a long, long time in some languages, and the natives are getting restless: people file issues frequently about this, and we've kind of concluded that we should add something, but we haven't made progress. I think you're accurately saying that there are basically resourcing issues. It seems like a simple change, because conceptually it is, but it does require a bunch of rollout care because of these issues we're bringing up. So I think the question is, what's the next step? And I would rather not, if possible, get into the PR discussion on this call. But maybe it could be something where... I mean, opening up the PRs without merging them is, in most languages, a very easy thing to do, truly easy. And it could be done without getting everything through: opening the PRs, advertising them, soliciting comments from implementers. That sort of thing could probably be done without a lot of time investment, at least that's my two cents. And the sooner that stuff gets taken up, it would also let the people who keep coming in and filing these issues see that there's in fact something in motion. I don't know how you feel about it, but I think that stuff could be kind of parallelized, so to speak. The PRs are going to have to remain open for a while anyway, just to make sure people see them and get a chance to comment.

I would agree with that. And again, my apologies for being maybe too focused on Python right now, but there has been a long-running PR about this; we could essentially socialize it a bit more, just kind of put it out there, make tracking issues in every language, and announce that it's coming. But yeah, I do think what Yuri just said does mean coming up with a different name for these things, just to make sure we don't collide. So I think that's the final bike shed, but we should move on it very quickly once we've resolved it. And if we go with the approach of just picking a name that has a low chance of collision with anything, there's no reason why we can't get a release candidate out in every language quickly and get people to start binding to it. So I think it'll move very quickly once we do that. I'll try to get moving on that on Monday, actually. And the new Python API should be out, probably on Tuesday.

Cool. Okay, so we've only got about 12 minutes left on the call. I wanted to bring up another issue that I would like us to get moving on as well, which is sort of higher-level APIs for scope management.
So I'm just going to share my screen real quick, just to make it clear what I'm talking about; let me go into presentation mode here.

It mostly comes down to having both scopes and spans. We added a sort of active-span concept to OpenTracing, so that the tracer would be responsible for managing which span was active in which context. If you have some kind of context switching, whether it's threads or some async, userland-level thing, the tracer tracks that using a scope manager. Each context that has a span gets a scope, and you can ask the scope manager for the currently active scope and pull the span off of it. Scopes have to be closed when they're done, and that doesn't always line up with a span being finished, because you may be moving spans from context to context: you might make a span active in one scope, then close that scope and move the span to another scope, and so on and so forth. So it seems that at a lower level, there is a need for this extra concept of a scope manager.

But if you see here, the amount of code you have to write is not totally onerous. This is just two simple functions: start_work, which makes an active span in a scope and puts a tag on it; and then you finish the span in another function, where you pull off the active scope, maybe do a log, and then close it. If you're writing library instrumentation or framework instrumentation, this usually doesn't feel too onerous, because you're writing code inside a plugin or an interceptor, and most of the code you're writing is really focused on tracing; this higher-level concept doesn't feel too bad. Or at least it doesn't to me.

But for application developers, if start_work and finish_work contain a lot of application code, and you're doing quite a bit of this, it gets onerous pretty fast. It's also hard to get application developers up to speed on your team, because there are these extra concepts: you say build_span, start_active, but you don't get a span back, you get a scope. And if you make the span automatically finish when you close the scope, that's nice, but now you're saying scope.close() at the end, and you never touch the span there either. So this adds some cognitive load above and beyond the simpler model we had initially envisioned.

If you look at a simpler API, making some assumptions that you can make when you're writing application code, such as the presence of a global tracer, you can make this a lot more declarative. It's possible to create an API where you just say, start a span, and it's automatically made active. Then you can access it declaratively, because you have access to a global tracer; you don't necessarily have to track the tracer or do any kind of object method chaining. You could just say, hey, tag the current span, log on it, and then when you're done, you can say, hey, just finish this thing.
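A hypothetical side-by-side of the two styles being described (invented names, not a real OpenTracing binding; the slide's start_work/finish_work example is only approximated here, with method bodies stubbed out):

```cpp
// Hypothetical comparison: explicit scope-manager style vs. a declarative
// "easy mode" that assumes a global tracer and an implicit active span.
#include <memory>
#include <string>

// --- explicit style: callers juggle both scopes and spans (stubs) ---
struct Span {
  void SetTag(const std::string& /*key*/, const std::string& /*value*/) {}
  void Log(const std::string& /*event*/) {}
  void Finish() {}
};
struct Scope {                 // one activation of a span in one context
  std::shared_ptr<Span> span;
  void Close() {}              // closing a scope need not finish the span
};
struct Tracer {
  Scope StartActiveSpan(const std::string& /*op*/) {
    return Scope{std::make_shared<Span>()};
  }
  Scope Active() { return Scope{std::make_shared<Span>()}; }
};
Tracer global_tracer;

void StartWork() {
  Scope scope = global_tracer.StartActiveSpan("work");
  scope.span->SetTag("size", "large");
}
void FinishWork() {
  Scope scope = global_tracer.Active();
  scope.span->Log("done");
  scope.span->Finish();
  scope.Close();
}

// --- declarative style: scopes disappear from the common case (stubs) ---
namespace easy {
void StartSpan(const std::string& /*op*/) { /* activates implicitly */ }
void Tag(const std::string& /*k*/, const std::string& /*v*/) { /* current span */ }
void Log(const std::string& /*event*/) { /* current span */ }
void Finish() { /* finishes and deactivates the current span */ }
}  // namespace easy

void StartWorkSimple() {
  easy::StartSpan("work");
  easy::Tag("size", "large");
}
void FinishWorkSimple() {
  easy::Log("done");
  easy::Finish();
}

int main() {
  StartWork(); FinishWork();
  StartWorkSimple(); FinishWorkSimple();
}
```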
So I'm not proposing this precise API; I'm just proposing that it should be possible to produce an API that's this simple. And in order to get application developers more comfortable, I think as a community we should push for providing some more official, ergonomic API; if not looking exactly like this, at least something with this level of complexity. So that's my pitch. I'm going to be pushing for this in the cross-language working group starting next week as well, but I was interested whether anyone had comments on this at this time, or thoughts about how to do it, or any kind of experience reports from working with scopes and active spans in the field.

Well, I just wanted to say that Manuel from Datadog had mentioned that he would love to see something like this as well. So I think this is something... I think it would require a bit of testing, of course, and refactoring and all that, but I think it's overall a great idea.

Yeah, I've heard from several people who couldn't be on this call that they're very interested in something like this. So, you know, we'll have a discussion online on Gitter. But there's also just the general question of, do we need scopes and scope managers, that kind of thing. Pavel, I know you were asking about that; do you have any thoughts on this? Are you there, Pavel?

Yes, I'm here. I don't have many thoughts. Well, I'm writing mostly instrumentations, and there I prefer to pass things around explicitly.

Yeah, I mean, I think this really hinges on whether you're talking about instrumenting stuff in a library versus just trying to get your work done as an application developer. These sorts of higher-level abstractions make a lot of sense for the latter, where we want an easy-mode type of experience; but as Pavel was saying, for very meticulous instrumentation of shared libraries, it probably makes more sense to avoid the globals and stuff like that.

Yeah, I think a side effect of making something like this more official is to make it clear that there would be two style guides. When you're writing instrumentation, there's a style guide that says, don't presume a global tracer: always take in a tracer as an option, and fall back to the global tracer if one isn't given. And basically you wouldn't get to use this cleaner API, because this API makes a bunch of assumptions; the purpose of it is to hide scopes and some of that lower-level complexity that you actually need when writing trickier instrumentation, but don't need in the most common cases that application developers hit over and over again.

Yeah, I would have some comments on start and finish, but tag and log look very nice.

Yeah, start_span and finish_span, maybe. But basically the long and short of it is: can we take scopes and scope managers and make them a concept that, as an application developer, you never have to think about? You're not necessarily even aware that they exist until you get into some tricky situation, and then you dig into the docs and discover there are actually these lower-level APIs you can use to deal with those situations.

I mean, maybe it's a better idea to completely leave out the start and finish, and provide an API only to add metadata if there's an active span or something active.

Hmm. Yeah, well, let's have the discussion on Gitter.
This is basically mainly just a sort of advertisement that we want to get moving on this, and really we should have the discussion in a forum where people in time zones who can't make this call can participate. But if people have ideas about what this kind of API might look like, or if they're already working with application developers who have written something like this, it would be great to start some contrib repos that experiment with it. One nice thing is I'm fairly certain we can write all of this without actually touching the tracer API; I think that would be one of the goals. So there's a lot of room to experiment with different approaches and contribute. I'd just like to get moving on that.

One thing I want to add: when I saw this on the agenda, I thought it would be a different topic, more about high-level APIs for specific operations, like HTTP requests or database requests, which kind of works in a similar way; people often ask for some standard way of doing those things.

Yes, I definitely think we need those as well, and that could get wrapped up in this. For example, if you see tag, where we say some tag key, some tag value: that's fine for your own custom tags, but actually going and finding the constants and gluing them together when you want to do something like, say, log an error or an exception... there's definitely room for higher-level functions that do all that work for you, where you can just pass in the exception and not have to think about how that translates into which keys and values get stuck onto the span. Likewise for something like an HTTP request or a database request, we could probably make some more ergonomic calls that aren't creating a whole bunch of key-value pairs.

Yeah, another thought on this: in the discussion earlier around trace and span IDs, we'd need to make a change like that in some kind of coordinated fashion across languages. But some of the higher-level primitives actually naturally should deviate from language to language. If you're working in a Ruby on Rails environment or something like that, the types of primitives you might want for convenience are actually different from what you'd want in Go, and so on and so forth. And that can actually make this stuff go a bit faster, because when we have to do cross-language stuff, and I think we're now dealing with nine languages or something like that, it's a bit daunting to start one of those projects knowing how much parallel work is going to have to take place. For the stuff Yuri just mentioned around HTTP and things like that, that might need some coordination; but for things that are really just sugar to make it easier to do simple things, I can imagine that happening in a decoupled way across languages, letting language owners make those decisions independently.

Yeah, totally. I think another way of thinking about this is that there's been a lot and lot of work trying to figure out what the correct low-level API for tracers to bind to is, and that work has been slow going.
It's very difficult work, but it feels to me like we're getting to the end of it, and it's starting to gel. Now, between this kind of work and things like getting span and trace identifiers out there to allow people to start building middleware and other things, we're sort of moving up the stack to the application developer zone and the things they would like. And that world is definitely much more opinionated and nuanced; there's room, even within a single language, for more than one way to do this. I think we should probably offer some official version of this at some point, just to lower the cognitive overhead, but I totally anticipate that, you know, in Java there are some people who may want to do this kind of thing with annotations, and some people who may want to use some other declarative strategy. Like you said, in Ruby there are a lot of different metaprogramming-magic approaches to doing things. And what's great about doing these as high-level APIs is that not everyone has to agree: you can have several different approaches here, and they're all complementary with each other, even within the same code base.

And we're basically out of time, so unless anyone has any final comments on this, I would suggest we take these discussions to the cross-language Gitter channel and continue them there.

All right, good call, everyone. Lovely seeing your lovely faces. Ha ha.