Live from Barcelona, Spain, it's theCUBE, covering KubeCon CloudNativeCon Europe 2019. Brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners. Welcome back, this is theCUBE's coverage of KubeCon CloudNativeCon 2019. I'm Stu Miniman, my co-host for two days of wall-to-wall coverage is Corey Quinn. Happy to welcome back to the program first, Ben Sigelman, who is the co-founder and CEO of LightStep, and welcome to the program for the first time, Morgan McLean, who's a product manager at Google Cloud Platform. Gentlemen, thanks so much for joining us. Thanks for having us. All right, so this was a last-minute add for us because you guys had some interesting news in the keynote. I think the feedback everybody's heard is that there are too many projects, everything's overlapping, and it's hard to make a decision, but the interesting piece is that OpenCensus, which Morgan was doing, and OpenTracing, which Ben and LightStep were doing, are now merging into OpenTelemetry, if I got it right. So, is it just everybody holding hands and singing Kumbaya around the Kubernetes campfire, or is there something more to this? It started when the CNCF locked us in a room, told us there were too many projects, and really wouldn't let us leave. To be fair, they did actually take us to a room, and that really started the ball rolling, but conversations have been picking up for the last few months, and at least personally, I'm just really excited that it's gone so well. If you had told me six or nine months ago that this would happen, given the way both projects were growing, and growing very quickly, I would have been a little skeptical. But seriously, this merger has gone beyond my wildest dreams, and it's awesome to unite the communities and awesome to unite the projects. What has the response been from the communities on this merger? Very positive, very positive.
I mean, OpenTracing and OpenCensus are both projects with healthy user bases that are growing quickly, but the reason people adopt them is to future-proof their own software, because they want to adopt something that's going to be here to stay. And by having these two things out in the world that are both successful and overlapping in terms of their goals, I think the presence of two projects was actually really problematic for people, so the fact that they're merging is a net positive, absolutely, for the end-user community. For the vendor community it's similar, almost exactly the same parallel thought process. When we met, the CNCF did broker an in-person meeting where they gave us some space, and we all got together, I don't know how many people were there, like 20 or 30 people in that room. They did let us leave the room. They did let us leave the room, that's true. We were not locked in there, but at the beginning they asked everyone to state what their goals were, and almost all of us really had the same goal, which is just to make it easy for end users to adopt a telemetry project that they can stick with for the long haul. And when you think of it in that respect, the merger seems completely obvious. It is true that it doesn't happen very often, and we could speculate about why that is, but I think in this case it was enabled by the fact that we had pretty good social relationships with the OpenCensus people. I think Twitter tends to amplify negativity in the world, in general, as I'm sure people... Not a controversial statement. Absolutely, the negativity is something in the algorithm, I think, maybe they should fix that. Yeah, exactly.
And it was funny, there was a lot of perceived animosity between OpenTracing and OpenCensus a year ago, nine months ago, but when you actually talked to the principals on the projects, and even just the general-purpose developers who were doing a huge amount of work for both projects, that wasn't a sentiment that was widely held or widely felt, I think. So it has been a very happy thing, a huge relief, frankly, this whole thing has been a huge relief for all of us, I think. Yeah, it feels like the general ask has always been tracing that doesn't suck, and that tends to be a bit of a tall order. The way that the communities have responded to it is a credit to their maturity, and I think it also speaks to a growing realization that no one wants a monoculture of just one option, any color you want so long as it's black, versus 500 different things you can pick that all stand in the same spot, at which point analysis paralysis kicks in. So this feels like a net positive for absolutely everyone involved. Yeah, one of the anecdotes that Ben and I have shared throughout a lot of these interviews is that there were a lot of projects that wanted to include distributed tracing, various web frameworks, and HBase and HDFS were jointly deciding what to do about instrumentation. Yeah, and so they would publish an issue on GitHub, and someone from OpenTracing would respond saying, hey, OpenTracing does this, and they'd be like, oh, this is interesting, we could go build an implementation. Then someone from OpenCensus would respond and say, no way, you should use OpenCensus. And with these being very similar yet incompatible APIs, groups like HBase would sit there and be like, this isn't mature enough, I don't want to deal with this, I've got more important things to focus on right now. And rather than even picking one and ignoring the other, they just ignored tracing, right?
And with things moving to microservices, and Kubernetes being so popular, I mean, just look at this conference, distributed tracing is no longer a nice-to-have; when you're a big company, you need it to understand how your app works and to understand the cause of an outage or the cause of a problem. And when you had organizations like this looking at tracing instrumentation and saying this is a bit of a joke, with two competing projects, no one was being served well. All right, so you talked about incompatible APIs, so how do we get from where we were to where we're going? Yeah, so I can talk about that a little bit. The APIs are conceptually incredibly similar. And part of the criteria for any new language for OpenTelemetry is that we are able to build a software bridge to both OpenTracing and OpenCensus that will translate existing instrumentation, alongside OpenTelemetry instrumentation, and emit the correct data at the end. And we've built that out in Java already, and we've started working on a few other languages. It's not a tremendously difficult thing to do if that's your goal. I've worked on this stuff, I started working on Dapper in 2004, so it's been 15 years that I've been working in this space, and I have a lot of regrets about what we did with OpenTracing, and it is an unbelievably tempting thing to start greenfield, like, let's do it right this time, and I'm suppressing every last impulse to do that. The only goal for this project, technically, is backwards compatibility. Yeah, 100% backwards compatibility. There's a famous XKCD comic where you have 14 standards, and someone says we need to create a new standard that will unify across all 14 standards, and now you have 15 standards.
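The software bridge Ben describes can be pictured with a small sketch. To be clear, every class and method name below is hypothetical, not the real OpenTracing, OpenCensus, or OpenTelemetry API; it only illustrates the idea of translating legacy instrumentation calls onto a new API so both emit one consistent data stream.

```python
# Hypothetical sketch of a compatibility bridge: "legacy API" spans are
# forwarded to a new tracer, so old and new instrumentation coexist.
# All names here are illustrative, not real library APIs.

class NewTracer:
    """Stand-in for the new telemetry API's tracer."""
    def __init__(self):
        self.finished = []  # collected (name, tags) tuples

    def start_span(self, name, tags=None):
        return NewSpan(self, name, dict(tags or {}))

class NewSpan:
    def __init__(self, tracer, name, tags):
        self.tracer, self.name, self.tags = tracer, name, tags

    def finish(self):
        self.tracer.finished.append((self.name, self.tags))

class LegacyShim:
    """Implements the *old* tracing interface on top of NewTracer."""
    def __init__(self, new_tracer):
        self._new = new_tracer

    def start_legacy_span(self, operation_name, **legacy_tags):
        # Translate the legacy call into the new API one-for-one.
        return self._new.start_span(operation_name, tags=legacy_tags)

tracer = NewTracer()
shim = LegacyShim(tracer)

# Old-style instrumentation keeps working...
shim.start_legacy_span("db.query", table="users").finish()
# ...alongside new-style instrumentation, in the same stream.
tracer.start_span("http.request", tags={"route": "/login"}).finish()

print(tracer.finished)
```

Because the translation is mechanical, existing instrumentation never has to be rewritten to participate, which is the backwards-compatibility constraint being described.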
So we don't want to follow that pattern, and by having the leadership from OpenTracing and OpenCensus involved wholesale in this new effort, as well as having these compatibility bridges, we can avoid the fate of IPv6 and Python 3 and things like that, where the new thing is very appealing but it's so far from the old thing that you literally can't get there incrementally. So our entire design constraint is: make sure the backwards compatibility works, get to one project, and then we can think about the grand unified theory of observability. Ben, you're ruining the best thing about standards, which is that there are so many of them to choose from. There are still plenty more growing in other areas. One could argue that your approach to this is non-standard in its own right, but in my own experiments with distributed tracing, it seems like step one is first you have to go back and instrument everything you've built, and step two is, hey, come back here, because that's a lot of work. The idea of an organization going back and re-instrumenting everything they've already instrumented the first time? It's unlikely. Unless they built things very modularly and very portably to do exactly that, it's a bit of a heavy lift. I agree, yeah. So going forward, are people who have deployed one or the other of your projects going to have to go back and do a re-instrumentation, or will they unify and continue to work as they are? I would posit that, I don't know, I would be making up the statistics so I shouldn't, but let's say a vast majority, I'm thinking like 95 to 98%, of instrumentation is actually embedded in frameworks and libraries that people depend on. So you need to get Dropwizard and Spring and Django and Flask and Kafka, things like that, instrumented. For application code, that instrumentation burden is a bit lower. We announced something called Special Agent at LightStep last week, separate from all of this. That's kind of a funny combination.
A typical APM agent will interpose on individual function calls, which is a very complicated and heavyweight thing. This doesn't do any of that; it basically surveys what you have in your process, looks for OpenTracing, and in the future OpenTelemetry, instrumentation that matches it, and then installs it for you. So you don't have to do any of the manual work of basically gluing tab A into slot B, or whatever, which is what most OpenTracing instrumentation actually looks like these days. And you can get off the ground without doing any code modifications. So I think that direction, which is totally portable and vendor-neutral as well, as a layer on top of OpenTelemetry, makes a ton of sense. There are also data translation efforts that are part of OpenCensus that are being ported into OpenTelemetry, which also serve to repurpose existing sources of correlated data. So all these things are ways to take existing software and get it into the new world without requiring any code changes or a redeploy. The long-term goal of this has always been that, because web framework and client library providers will go and build the instrumentation into those, when you're writing your own service that you're deploying in Kubernetes or somewhere else, by linking one of the OpenTelemetry implementations you get all of that tracing and context propagation and everything out of the box. And you, as an individual developer, are only using the APIs to define custom metrics, custom spans, sort of things that are specific to your business. All right, so Ben, you didn't name LightStep the same as your project, but that being said, you know, a major piece of your business is going through a change here. What does this mean for LightStep? That's actually not the way I see it, for what it's worth. LightStep is a product, since you've given me an opportunity to talk about it. Was this move on your part... No, I'm just kidding.
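The division of labor Morgan describes, frameworks emitting spans automatically while application code only annotates its business-specific work, can be pictured with a toy sketch. The `Tracer` below is a stdlib-only stand-in, not the real OpenTelemetry API; it just shows the shape of the developer-facing "custom span" usage.

```python
# Toy sketch: application code wraps only its business logic in a span;
# HTTP/RPC spans would come for free from auto-instrumented frameworks.
# This Tracer is a made-up stand-in, not the real OpenTelemetry API.
import time
from contextlib import contextmanager

class Tracer:
    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name, **attributes):
        start = time.time()
        try:
            yield attributes  # caller may add business-specific attributes
        finally:
            self.spans.append({
                "name": name,
                "duration_s": time.time() - start,
                "attributes": attributes,
            })

tracer = Tracer()

# Only the business-specific part is instrumented by hand.
with tracer.span("checkout", cart_items=3) as attrs:
    attrs["payment.method"] = "card"

print(tracer.spans[0]["name"], tracer.spans[0]["attributes"])
```

The point of the design is that this small annotation is all an individual developer writes; the context propagation and framework spans arrive via the linked implementation.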
But LightStep is a product that's totally omnivorous. We don't really care where the data comes from, and translating any source of data that has a correlation ID and a timestamp is a pretty trivial exercise for us. So we do support OpenTracing. We also support OpenCensus, for what it's worth. We'll support OpenTelemetry. We support a bunch of weird in-house things people have already built. We don't care about that at all. The reason that we're pursuing OpenTelemetry is twofold. One is that we do want to see high-quality data coming out of projects. We said it in the keynote this morning, but observability literally cannot be better than your telemetry. If your telemetry sucks, your observability will also suck. It's just definitionally true, if you go back to the definition of observability from the '60s. And so we want high-quality telemetry so our product can be awesome. Also, just as an individual, I'm a nerd about this stuff and I just like it. I mean, a lot of my motivation for working on this is that I personally find it gratifying. It's not really a commercial thing. I just like it. Do you find, as you start talking about this more and more with companies that are becoming cloud native rapidly, either through digital transformation or from springing fully formed from the forehead of some god, that these born-in-the-cloud companies intuitively are starting to grasp the value of tracing, or does this wind up being a much heavier lift as you start showing them the golden path, as it were? It's definitely grown... Well, I think the value of tracing, you see that after you see the negative value of a really catastrophic outage.
I mean, I was just talking to a bank, I won't name the bank, but a bank at this conference, and they were talking about their own adoption of tracing, which was pretty slow until they had a really bad outage where they couldn't transact for an hour and they didn't know which of their 200 services was responsible for the issue. And that really put some muscle behind their tracing initiative. So, typically it's inspired by an incident like that, and then, you know, it's a bit reactive. Sometimes it's not, but either way you end up in that place eventually. I'm a strong proponent of distributed tracing, and I feel very seen by your last answer. But it's definitely made a big impact. Like, if you came to conferences like this two years ago, you'd have Adrian or Yuri or someone doing a talk on distributed tracing, and they would always start by asking the 100- to 200-person audience, who here knows what distributed tracing is, and like five people would raise their hand, and everyone else would be like, no, that's why I'm here at the talk, I want to find out about it. You go to ones now, or even last year, and they have 400 people at the talk, and you ask who knows what distributed tracing is, and last year over half the people would raise their hand. Now it's going to be even higher. And I think, beyond even anecdotes, clearly businesses are finding the value, because they're implementing it. You can see that through the number of companies that have been interested in OpenTracing, OpenTelemetry, OpenCensus. You can see it in the growth of startups in the space, LightStep and others. The other thing I like about OpenTelemetry is the name. It's a bit of a mouthful, but it's important for people to understand the distinction between telemetry and tracing data on the one hand and actual solutions on the other. I mean, OpenTelemetry stops when the correct data is being emitted. And then what you do with that data is your own business.
And I also think that people are realizing that tracing is more than just visualizing a single distributed trace. The traces have an enormous amount of information in them about resource usage, security patterns, access patterns, and large-scale performance patterns that are embedded in thousands of traces. That sort of data is making its way into products as well. And I really like that OpenTelemetry has clearly delineated that it stops with telemetry. OpenTracing was confusing for people: they'd want tracing, so they'd adopt OpenTracing and then be like, where's my UI? And it's like, well, no, it's not that kind of project. With OpenTelemetry, I think we've been very clear that this is about getting very high-quality data in a portable way with minimal effort, and then you can use that in any number of ways. I like that distinction, I think it's important. Okay, so how do we make sure that the combination of these two doesn't just get watered down to the least common denominator, or that Ben doesn't just get upset and say, forget it, I'm going to start from scratch and do it right this time? I'm not sure I see either of those two happening. On the least common denominator: we're starting, as I was commenting about two years ago, from very little prior art. Yeah, you had projects like Zipkin, and Zipkin had its own instrumentation, but it was just for tracing, it was just for Zipkin, and you had Jaeger with its own. So I think we're so far away from that. In a few years, the least common denominator will be dramatically better than what we have today. And so at this stage, I'm not even remotely worried about that.
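The aggregate analysis Ben alludes to, patterns mined from thousands of traces rather than one trace in a viewer, can be sketched in a few lines. The span data and the nearest-rank percentile choice below are made up for illustration; real products do this at much larger scale.

```python
# Sketch: summarize latency per operation across many trace spans,
# rather than inspecting a single trace. Data is invented for the example.
from collections import defaultdict

spans = [
    {"op": "db.query", "ms": 12}, {"op": "db.query", "ms": 340},
    {"op": "db.query", "ms": 15}, {"op": "http.render", "ms": 40},
    {"op": "http.render", "ms": 42},
]

by_op = defaultdict(list)
for span in spans:
    by_op[span["op"]].append(span["ms"])

def p95(samples):
    # Nearest-rank percentile: pick the sample at the 95th-percentile rank.
    ordered = sorted(samples)
    return ordered[max(0, int(round(0.95 * len(ordered))) - 1)]

summary = {op: {"count": len(ms), "p95_ms": p95(ms)} for op, ms in by_op.items()}
print(summary)
```

Even this toy summary surfaces the kind of large-scale signal (a slow tail on `db.query`) that no single-trace view would show.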
Secondly, on vendor interference in these projects, I really don't see it, both because of what we talked about earlier, where the vendors right now want more telemetry. Like, I meet with them, Ben meets with them, we all meet with them all the time, we work with them. And the biggest challenge we have is just that the data we get is bad, right? Either we don't support certain platforms, or we'll get traces that dead-end in certain places, or we don't get metrics of the same name for certain types of telemetry. And this project is going to fix that, and it's going to solve this problem for a lot of vendors who have, frankly, a really strong economic incentive to play ball and to contribute to it. Do you see this merging of the two projects as offering an opportunity to either of you to fix, or revisit if not fix, some of the mistakes, as it were, of the past? I know every time I build something, I look back and it was frankly terrible, because that's the kind of developer I am. But are you, as someone who's presumably much better at developing than I've ever been, seeing this as the opportunity to unwind some of the decisions you made earlier on, out of either ignorance or because it didn't work out as well as you'd hoped? There are a couple of things about each project that we see an opportunity to correct here without doing any damage to the compatibility story. For OpenTracing, it was just a bit too narrow. I mean, I would talk a lot about how we want to describe the software, not the tracing system, but we kind of made a mistake in that we called it OpenTracing. Really, if a request comes in, people want to describe that request and then have it go to their tracing system, but also to their metrics system, and to their logging stack, and to anywhere else, their security system.
You should only have to instrument that once. So OpenTracing was a bit too narrow. OpenCensus, we've talked about this a lot, built a really high-quality reference implementation into the product, if OpenCensus is the product, I mean. And that coupling created problems for vendors to adopt it, and it was a bit thick for some end users as well. So we are still keeping the reference implementation, but it's now clearly decoupled. So we have loose coupling, à la OpenTracing, but wider scope, à la OpenCensus, and in that respect, I think philosophically this OpenTelemetry effort has taken the best of both worlds from the two projects it started with. Well, Ben and Morgan, thank you so much for sharing. Best of luck, and let us know if the CNCF needs to pull you guys into a room a little bit more to help work through any of the issues. But thanks again for joining us. Thanks for having us. It's been a pleasure. I'm Corey Quinn. I'm Stu Miniman. We'll be back to wrap up day one of our two days of live coverage here from KubeCon CloudNativeCon 2019 in Barcelona, Spain. Thanks for watching theCUBE.