From around the globe, it's theCUBE with coverage of KubeCon and CloudNativeCon Europe 2021 virtual. Brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners. Hello, welcome back to theCUBE's coverage of KubeCon 21, CloudNativeCon 21 virtual. I'm John Furrier, host of theCUBE. We're here with a great guest to break down one of the hottest trends going on in the industry, and certainly around cloud native, as this new modern architecture evolves so fast. Richard Hartman, director of community at Grafana Labs, who's also involved with Prometheus. An expert, fun to have on, and he's going to share a lot here. Richard, thanks for coming on, appreciate it. Thanks for having me. Yeah, we were chatting before we came on camera about humans' ability to handle all this new shift, and the future of observability is what everyone has been talking about. But you know, some say, no, observability is just network management at a different scale. Okay, I can buy that, but it's got a lot more to it than that. It involves data, it involves a new architecture, new levels of scale that cloud native has brought to the table, and everyone is agreeing on that. There's new scale, new capabilities, thus setting up new architectures, new expectations, and new experiences, all happening. Take us through the future of observability. Yeah, so one of the things which many people find when they onboard themselves onto the cloud native space is you can scale along different and new axes which you couldn't scale along before, which is great. Of course, it enables growth. It enables different operating models. It enables you to choose different or more modern engineering trade-offs. The underlying problems are still the same, but you just slice and dice your problems and compartmentalize your services differently. But the problem is it becomes more spread out, and the more classic tooling tends to be built for those more classic setups and architectures.
As your architecture becomes more malleable, and as you can choose and pick how to grow it and along which axes a lot more directly, and you have to, that limits the ability of the humans actually operating that system to understand what is truly going on. Obviously, everyone is fully all in on AI, ML and all those things, but one of the dirty secrets is you will keep needing domain-specific experts who know what they're doing and what that thing should look like, what should be working, and how it should be working. But enabling those people to actually understand the current state of the system and compare it to the desired state of the system is highly non-trivial. In particular, once you no longer have machine lifetimes of months or years, which you had before; those came down to sometimes hours, and when you go to serverless and such, sometimes even into sub-seconds. So a lot of this is about enabling this higher volume of data, this higher scale of data, this higher cardinality of what you actually attach as metadata on your data, and then still being able to query all this and make sense of it at scale and at speed. Because if you just toss it into a data lake and do batch analysis like half a day later, no one cares about it anymore. It needs to be live, or at least the largest part of it needs to be live. You need to be able to alert right now if something is imminently customer-facing. Well, that's awesome. I mean, I totally agree, this new observability, horizontally scalable, more surface area, more axes, as you point out, changes the data equation. Obviously automation plays a big role, and machine learning and AI, great grounds for that.
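The cardinality point can be made concrete: in a Prometheus-style data model, every distinct combination of label values becomes its own time series, so the series count multiplies across labels. A toy sketch, with invented label names and values, just to illustrate the multiplication:

```python
from itertools import product

# Hypothetical labels attached as metadata to a single metric name.
labels = {
    "service":  ["checkout", "cart", "auth"],
    "instance": ["i-01", "i-02", "i-03", "i-04"],
    "status":   ["200", "400", "500"],
}

# Each distinct label combination is its own time series, so the series
# count is the product of per-label cardinalities: 3 * 4 * 3 = 36.
series = [dict(zip(labels, combo)) for combo in product(*labels.values())]
print(len(series))  # 36
```

Add one more label with ten values and the same single metric jumps to 360 series, which is why high-cardinality metadata stresses the index at scale.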
I've got to ask you, just before we move on to the next topic around this: most people that come from the old world with the tooling, that come from that old-school vendor mentality or old-school architecture, tend to kind of throw stones at the future and say, well, the economics are all wrong, and the performance metrics. So I want to ask you, assume that we believe, and we do believe you, so assume that's going to happen. What is the economic picture? What's the impact that people are missing? When you look at the benefits of what this system is going to enable, the impact, specifically whether it's economics, productivity, efficient code, what are some of the things that maybe the VCs or other people on the naysayer side, old school, will throw stones at? What's the big upside here? So this will not be true for everyone, and there will still be certain situations where it makes sense to choose different sets of trade-offs, but most everyone will be moving into the cloud for convenience and speed reasons, and I'm deliberately not saying cost reasons. The reason being, usually, or in the past, you simply had different standard service delineations, and all of the pro-serve, the consulting, your hiring pool was all aligned with this old type of service delineation, which used to be a physical machine or a server, or maybe a service, and you had a hot standby or something, if you go really a few years back. The same things still need to operate underlying what you do, but as we grow as an industry, more and more of this is commoditized, and the same way we commoditized servers and storage and network, we commoditize the actual running of that machine, and with serverless and such go even further. So it's not so much about this fundamentally changing how it's built; it's just that a thing which was previously part of your value-add and of what you did in your core is now just off-the-shelf infrastructure which you buy as much of as you need.
Again, at certain scales and for certain specific use cases this will not be true for the foreseeable future, but most everyone will be moving there, simply because where they actually add value, and the people they can hire, and who are interested in that type of problem, just mean that it's a lot more sensible to choose this different delineation. But it's not cheaper. Yeah, and the commoditization and the disintermediation are definitely happening, totally agree, and the complexity that's going to be abstracted away with software is novel, and it's also systematic. It's just new, and there are systems involved. So great insight there. I totally agree with you. The disruption is happening in the majority of, almost all, areas, in all verticals and all industries. So great point. I think this is why everyone's so excited, and some people are paranoid, actually, frankly, but we cover that in depth on theCUBE in other segments. Great point. Let's get back to where you're spending your time right now: you're spending a lot of time on OpenMetrics. What is that enabling? Take us through that. So, the super quick history of Prometheus, because we need that for OpenMetrics. Prometheus was actually created in 2012, and the wire format, the exposition format, which is used to transport metrics into Prometheus, has been stable since 2014. But there is a large problem here: it carries the Prometheus name, and a lot of competing projects and a lot of competing vendors, of course there are vendors which compete with just the project, simply refused to take anything in which carried the Prometheus name. Of course, this aligned with the FUD strategy which they ran back then. So together with the CNCF we decided to just have a new, different name for just that wire format, for the underlying data model, for everything which you need to make one complete exposition, or a bunch of expositions, towards Prometheus.
So that's it at the core, and that's been ongoing since 2015, '16, something like that. But there are also changes. On the one hand, there is a super careful, super, super careful and backwards-compatible cleanup of a few things which the Prometheus exposition format, stable since 2014, didn't get right, but we also enable new features within this. And as Prometheus chose OpenMetrics as its official format, we also uplift Prometheus, and as I wear both hats, obviously it's easier to get the synchronization. Exemplars stand out, which are completely new, at least outside of certain large search companies, Google, who use exemplars to do something different with their traces. And it was in 2017 when they told me that for them, searching for traces didn't scale by labels, and at that point I wanted to have both, I wanted to have traces and logs also with the same label sets as Prometheus has them, but when they tell you searching doesn't scale, you better listen. So the thing is this: you have your index where you store all your data, or rather the references into your database, and you have these label sets, and they are super efficient and quite powerful when compared to more traditional systems, but they still carry a cost, and that cost becomes non-trivial at scale.
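To make the exemplar idea concrete, here is a minimal sketch of what an OpenMetrics-style text exposition with an exemplar looks like. The metric name, labels, and trace ID are invented for illustration; a real exporter would use a client library rather than hand-rolled string formatting:

```python
# Sketch of OpenMetrics text exposition: a histogram bucket line, optionally
# carrying an exemplar (one concrete observation tagged with a trace ID).

def render_bucket(name, labels, le, count, trace_id=None, value=None):
    """Render one histogram bucket line; append an exemplar if given."""
    merged = dict(labels)
    merged["le"] = le
    label_str = ",".join('{}="{}"'.format(k, v) for k, v in merged.items())
    line = "{}_bucket{{{}}} {}".format(name, label_str, count)
    if trace_id is not None:
        # The exemplar lets a human jump from this high-latency bucket
        # straight to one representative trace.
        line += ' # {{trace_id="{}"}} {}'.format(trace_id, value)
    return line

lines = [
    "# TYPE http_request_duration_seconds histogram",
    render_bucket("http_request_duration_seconds", {"method": "GET"},
                  "0.1", 941),
    render_bucket("http_request_duration_seconds", {"method": "GET"},
                  "1.0", 998, trace_id="3f1a9c", value=0.87),
]
print("\n".join(lines))
```

The ` # {...} value` suffix after the bucket sample is the exemplar: a tiny, per-observation annotation rather than a new indexed series, which is what keeps the cost low.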
So instead of storing the same labels for your metrics and your logs and your traces, the idea is to just store an ID for your trace, which is super lightweight. It's literally just one ID, so your index is super tiny, and then you attach this information to your metrics and to your logs. So you know already that a trace has certain properties, because historically you have this needle-in-a-haystack problem: you have endless amounts of traces, and you need to figure out which of them are useful. Do they show an interesting error state, high latency, some error occurring, whatever? If that information is already attached to your other signals, that's a lot easier, because you see your high-latency bucket and you see a trace ID which is for that high-latency bucket. So going into that trace, I already know it is a high-latency trace for a service which has high latency, it has this and that label, it was run in this and that context, blah, blah, blah. Same for logs: there is an error, there is an exception, maybe a security breach, what have you, and I can jump directly into a trace, and I have all this mental context. And the most expensive part is the humans. So enabling that human to not need to break their mental train of thought, to just jump directly from all the established state which they already have here in debugging right into the trace and back, and just see why that thing behaved that way, is super powerful. And it's also a lot cheaper to store this on the back end for your traces, which in our case, internally, we just run at 100% sampling. We do not throw data away, which means you don't have the problem of finding the super interesting thing and, oh, by the way, the trace just doesn't exist because it was sampled away. And that's the one thing: from day one, this intent to marry those three pillars more closely. The other thing is, by having a true lingua franca, it gave that concept of Prometheus compatibility on the wire its own name and its own distinct concept,
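The jump from a high-latency bucket into its trace can be sketched in a few lines. Everything here is invented for illustration (the trace store, the bucket structure, the IDs); real systems expose this through their own query APIs, but the mechanics are the same: the exemplar carries a trace ID, and the trace store is keyed by nothing more than that ID.

```python
# Hypothetical in-memory stand-ins for a trace backend and a metric bucket.

# Traces stored at 100% sampling, indexed only by a tiny ID -> trace mapping,
# so the index stays small even with full retention.
trace_store = {
    "3f1a9c": {"service": "checkout", "duration_s": 0.87,
               "spans": ["gateway", "checkout", "db"]},
}

# A histogram bucket carrying an exemplar: one concrete observation's
# trace ID, attached to the metric rather than indexed separately.
high_latency_bucket = {
    "labels": {"service": "checkout", "le": "1.0"},
    "count": 998,
    "exemplar": {"trace_id": "3f1a9c", "value": 0.87},
}

def trace_for(bucket):
    """Follow the exemplar's trace ID straight into the trace store."""
    return trace_store[bucket["exemplar"]["trace_id"]]

trace = trace_for(high_latency_bucket)
print(trace["service"], trace["duration_s"])  # checkout 0.87
```

The point of the design is exactly what's described above: the debugging human lands in a trace already knowing it belongs to the high-latency bucket they were looking at, with no mental context switch and no separate search by labels.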
and that is something which a lot of people simply attached to. So just by having that name, it allowed a completely different conversation over the last half decade or so. And to close that... Okay, go ahead, finish, close it up. And to close that point: of course, I come from the networking space, and basically IETF RFCs are the currency within the networking space and how you force your vendors to support something, which is why I brought OpenMetrics into the IETF, to give it an official stamp of approval and an RFC number, which is currently, hopefully, successful. So all of a sudden you can slip this into your tender and just tell your vendor X, Y, Z, okay, you need to support this and that RFC, and all of a sudden, by contract, they're bound to support Prometheus natively. So does the IETF support that RFC yet, or no, is that still coming? So, at the last IETF meeting, which was virtual, obviously, I presented everything to the OPSAWG. There was very good feedback; they want to adopt it as an informational I-D, the reason being it is mostly a documentation of an already widely established standard, so it gets different bits and pieces in the header. Currently I'm waiting for a few rounds of feedback on specific wording, how to make it more clear and such, but it's looking good. Oh yes, while presenting it, they actually told me that they ran the conference with Prometheus and Grafana. Well, that's how you get things done on the old-school internet. That's the way it was. I mean, I talk to Vince and all my friends, and that's the generation we grew up in. I was telling a story on Clubhouse, just at random, that I grew up in the era where we used to pirate software, used to steal software back in the old days, pre-open source. This is how things get done. So I've got to ask you the impact question. The deal with OpenMetrics potentially could disrupt all those startups, so how does this impact all these startups? Because everyone's jockeying for a land grab in the observability space. Is it just that there's
too many people competing for one spot, or do they all have differentiation? What happens to all those observability startups that got minted and funded? So I think we have to split this into two answers. The first one: with OpenMetrics and also Prometheus, we're trying really hard to standardize what we are doing and to make this reusable as much as we possibly can, simply because Prometheus itself does not have any profit motivation or anything. It is just a project run by people, so we gain by users using our stuff and working in the way which we think is a good way to operate. So anyone who just supports all those open standards onboards themselves onto a huge ecosystem of already-installed base, and we are talking millions and millions and millions of installations. We don't have hard numbers, but the millions and millions I am certain of, and that's installations, not users, so that's several orders of magnitude more. So that actually enables an ecosystem within which to move. As to the second question: it is a super hot topic, so obviously the VC money starts coming in from outside. I don't think that everyone will survive, but that is just how it usually is. There are a lot of not very differentiated offerings, be they software, be they as-a-service, be they distributions, where you don't really see much value-add, not much of anything in the way of innovation. So this is more about making it easier to run or taking that pain away, which obviously makes you open to attack by all the hyperscalers, because they can just do this at a higher scale than you. So unless you actually really innovate in that space, and actually shape and lead in that space, at least to some extent, it will probably be relatively hard. That being said... Yeah, when you ride the big waves like this, I mean, you've got to be on the right side of this. Pat Gelsinger, when he was at VMware, now he's at Intel, told me on theCUBE one time, if you don't get it right on these waves, you're driftwood, right? So we've seen this movie
before: when you start to see the standards bodies like the IETF start to look at standards, you start to think there's a broader market opportunity. There's a need for some standards, which is good. It enables more value creation, whether it's out in the open or it's innovative from a commercialization standpoint. These are good things, and then you have everyone who's jockeying around for the land grab, and in comes the standards momentum. You've got to be on the right side of these things. We know what it's going to look like: if you're not on the right side of a standard, then you're proprietary. Precisely. And so that's the endgame. Okay, well, I really appreciate the impact. Final question: as the world evolves post-COVID, as cloud native goes mainstream, the enterprises at cloud scale are demanding more things. Enterprises want more stuff than just straight-up born-in-the-cloud startups, for instance. So you start to see faster, more agility, obviously, with deploying modern apps, and when you start getting into enterprise-grade scale, you've got to start thinking: this is an engineering and computer science discipline coming together. You've got to look at the architecture. What's your future vision of how the next-gen programmable infrastructure looks? You mean as in actually managing those services, or limited to observability? Observability. Observability speaks to the operating system of what's going on in distributed computing. You've got to have good observability if you want to deploy services. So, you know, as it evolves, and this is not a fringe thing anymore, this is the real deal, this observability is a key linchpin in the architecture. So maybe to approach this from two sides: one of the things which, I mean, I come from a very much non-cloud-native background, one of the things which tends to be overlooked in cloud native is that not everything is greenfield. As a matter of fact, legacy is the code word for 'makes actual money'. So a lot of
brownfield installations still make money, and they will keep making money, and all of those exist and will not go away anytime soon. And as soon as you go to industries trying to uplift themselves to Industry 4.0 and all those buzzwords, you get a lot more complexity in just the variety of systems than at just cloud native scale. So it's about being able to actually put all of those data types together, and not just have your, okay, nice, I have my microservice and it's fully instrumented, but if anything happens on the layer below, I'm simply unable to make any headway on debugging. For example, for me, these label sets are so widely adopted that you are able to literally, and I did this myself, go from the diesel genset of your data center, over the network, down to the office, if someone is in there, to the workstation and the pager and such, to the database, to the actual service which is facing your end customers. All of those use the same label sets, use the same metadata, to actually talk about this. So all of a sudden I can really drill down into my data, not only, okay, I have my microservice, but my database, big deal, no: I can actually go down as deep into my infrastructure as my infrastructure goes. And this is especially important for anyone who is from the more traditional enterprise, because most of them will, for the foreseeable future, have tons and tons and tons of those installations, and the ability to just marry all this data together, no matter where it's coming from, because you have this lingua franca and you have these widely adopted open standards, I think that is one of the main drivers in the future. I think you just nailed the hybrid and enterprise use case, operating at scale, and integration of systems. So great job, Richard, thank you so much for coming on. Richard Hartman, director of community at Grafana Labs, talking observability here on theCUBE. I'm John Furrier, your host, covering KubeCon 21, CloudNativeCon 21 virtual. Thanks for watching.