Hi and welcome to what you need to know about OpenMetrics. I couldn't talk about OpenMetrics without also talking about Prometheus a little bit. I don't think I have to convince anyone attending a CNCF event that there's a time before Prometheus and a time after Prometheus. So let's look at the time before Prometheus for a second. Historically, the closest thing we had to a standard for metrics transmission, and other types of telemetry transmission, was SNMP. Many solutions like SNMP are based on ancient technology, for example ASN.1; TLS is also based on ASN.1. And while ASN.1 was great in the 70s, as of today it's just a huge pain to implement and a tremendous security liability. If you ever look into the details, it just gets worse and worse and worse. It was ingenious for its time, but it's really bad today, simply because of implementation complexity and the trade-offs which were forced upon people 50 years ago. Many of those older standards are chatty and slow, or at least one of the two. Many of those data formats are proprietary, or hard to implement, or even both. A lot of the data models encourage per-vendor variations which might follow the letter of the law but definitely not the spirit of the law. In particular in the older Internet, in things like networking and such, you can see this again and again and again: vendors go as near to being out of spec as they possibly can, pushing to the absolute maximum of what they can do to make their own thing more successful and maybe create a little bit of lock-in. Which is just historic baggage at this point. And pretty much all of those older solutions tended to have hierarchical data models, which, again, was maybe fine back then. But as of today that's not really good anymore, with the high scale and complexity of pretty much even smaller deployments.
If you have your region, your data center, your customer, your whatever, and then you need to select by customer, all of a sudden you need to reshuffle your tree, or you need to walk up, go over, and go down again. You can do all of this; it's not exactly hard, but it's just overhead which is not nice. Looking at the time after Prometheus: by now, Prometheus is the de facto standard in cloud native metric monitoring and way beyond. Production industries, Industry 4.0, IoT, electrical power networks, ISPs: it's way beyond just the cloud native space. And by extension, the same is true for the Prometheus exposition format. The ease of exposing this type of data, even by hand, has led to an absolute explosion in compatible endpoints. We have thousands and thousands of exporters and integrations. By our own count, we have hundreds of thousands of installations and millions of users, which only makes counting them harder. We have standard exporters for pretty much everything, and we have libraries for pretty much all the languages. And here the "we" is with my Prometheus hat on; I'm switching over in a second. We as in Prometheus even have an exporter scaffold which makes it super easy to write new exporters, so you can focus on the metrics and what actually adds business value instead of writing yet another HTTP endpoint. And again, I probably don't have to convince anyone who's using Kubernetes or Prometheus that label sets are the way you want to access your data. This is just so much more powerful than hierarchical data models. As usual, when you have success, there's also politics involved. Some vendors were, let's call it, torn about adopting something from a competing product or project which literally has the name in it: the Prometheus exposition format.
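To make the contrast concrete, here is a minimal sketch in plain Python (with made-up label data) of why flat label sets beat a hierarchy: selecting everything for one customer is a simple filter across all series, with no tree reshuffling or walking up and down.

```python
# Hypothetical series, each identified by a flat set of labels rather than
# a position in a region -> data center -> customer tree.
series = [
    {"region": "eu", "dc": "fra1", "customer": "acme",   "value": 1.0},
    {"region": "us", "dc": "iad2", "customer": "acme",   "value": 2.0},
    {"region": "eu", "dc": "fra1", "customer": "globex", "value": 3.0},
]

def select(series, **matchers):
    """Return every series whose labels match all given key/value pairs."""
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]

# Cuts straight across the region/dc "hierarchy" -- no tree walking needed.
acme = select(series, customer="acme")
```

This is exactly the mental model behind a PromQL selector like `{customer="acme"}`: any label can be the axis you slice on.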
Especially the more traditional vendors and the super large companies prefer to support official standards, just for level setting and so that, ideally, everyone plays equally. And we wanted to not lose the installed base of Prometheus, for obvious reasons. We wanted to retain that ease of adoption. We wanted to make upgrading basically invisible to pretty much all users. We also wanted to reject the kitchen sink approach, where you have to support every single last use case you can possibly conceive of and thereby dilute and overburden your standard, your implementation, your what-have-you. Quite on purpose, we decided to do one thing, do it well, and remain focused and opinionated about how to do metrics-based monitoring right. A lot of competing companies have helped shape what has become OpenMetrics today. And the result is an actually neutral standard, one which takes the concerns of pretty much everyone who spoke up into account. There are dozens and dozens of people and companies who helped us get there. We have a few marathon runners: Ben Kochie, Brian Brazil, myself, Rob Skillington, and, honorable mention, Sumeer Bhola, who until relatively recently drove the standard forward. So that's all nice, and that's the history. But what does this mean for you, the user? Well, I alluded to this already: the format is largely the same as the Prometheus exposition format, on purpose. It has cleanups and new features, and we'll see them in a bit, but on purpose it is largely the same thing, just carefully evolved. If you're using the Python client, even quite old ones from 2018, or the Go or Java client libraries, or any of the official exporters, you already have OpenMetrics running in your system. And for quite some time, literally four years, Prometheus has already negotiated OpenMetrics. But this happened in the background while we were finalizing the standard and everything, without impacting any end users.
Most of you might be surprised right now that you're actually already using OpenMetrics at scale, but that's a distinct design goal of what we did. We didn't want to force you to reimplement against a new API every three weeks or six months or something. Your stuff should just keep working, and the change should be largely invisible. Chances are, you don't care all that much about the details of how OpenMetrics evolved or how the Prometheus internals work. You want your stuff to work and you don't want it to break. You don't want this to be a burden. It should just work in the background without being yet another headache you have to take care of. That being said, we do have breaking changes, but there are very, very few. You can see the complete list on the screen. Counters require _total in the time series name. It's already a common convention; it has been part of our naming scheme and of our recommendations, again with the Prometheus hat on, for basically years. I think even before Prometheus joined the CNCF, before the CNCF was even founded, we already had that as a convention, but now it's actually enforced. So for example, if your metric used to be called cpu_seconds, it will automatically be renamed to cpu_seconds_total. It's not a huge change, but you need to be aware of it, and it is considered breaking. Ideally, your counters all end in _total anyway. Timestamps are in seconds, not milliseconds anymore, and that's just for consistency: we use SI base units everywhere else in Prometheus, and so we wanted the same thing in that one specific place. By extension, obviously, in OpenMetrics you use SI base units wherever possible.
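The automatic renaming described above can be sketched in a few lines. This is an illustrative helper, not part of any real client library; the real libraries do the equivalent internally when they render a counter.

```python
def counter_sample_name(metric_name: str) -> str:
    """OpenMetrics counters expose their sample as <name>_total.
    Client libraries append the suffix automatically when it is missing,
    so cpu_seconds becomes cpu_seconds_total on the wire.
    (Hypothetical sketch of that behavior.)"""
    if metric_name.endswith("_total"):
        return metric_name  # already following the convention, unchanged
    return metric_name + "_total"
```

So `counter_sample_name("cpu_seconds")` yields `cpu_seconds_total`, and a counter that already follows the convention passes through untouched, which is why well-named metrics see no visible change at all.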
And while you could always expose an explicit timestamp, it's usually an anti-pattern unless you are in very well-defined circumstances. Which means that, by and large, even though we explicitly mark those things as breaking changes, most anyone will not even be aware that they are breaking. Of course, anyone using standard Prometheus stuff won't be impacted: it either already happened in the background, or you never even used those bits and pieces, on purpose. We also have improvements for interoperability and generic cleanup. We tightened up a few bits of the specification, a little bit of spacing, escaping and such, to make parsers easier to write and thus make parsing quicker and easier. Obviously you run quite a lot of scrapes in any given environment, and so improving the parser has led to quite impressive at-scale improvements in scrape time. We mandate an EOF marker at the end of expositions; formerly, you couldn't be certain whether you actually had a complete scrape or not. You could deduce it, but you couldn't be certain. We allow for nanosecond resolution timestamps; if you do high frequency trading or something, then you would probably need this kind of thing. We allow for 64-bit integers, not just floats. There's new metadata: a unit, which tells you what base unit that counter or that metric is in, and _created, for when metrics were created and reset, which allows some deductions for certain rates and such. We extended OpenMetrics with considerations for push and for pull, and in the considerations section of OpenMetrics you can find quite some text about how, if you need to do push, you can do it in a way compatible with Prometheus.
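The EOF marker makes completeness mechanically checkable. A hand-rolled sketch of what a scraper can now do (a real parser validates far more than this, of course):

```python
def is_complete_exposition(body: str) -> bool:
    """OpenMetrics mandates '# EOF' as the final line of an exposition,
    so a scraper can distinguish a complete scrape from a truncated one.
    Previously you could only deduce completeness, never be certain."""
    lines = body.rstrip("\n").split("\n")
    return lines[-1] == "# EOF"

is_complete_exposition("foo_total 1.0\n# EOF\n")  # complete
is_complete_exposition("foo_total 1.0\n")         # truncated mid-scrape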
You do lose certain properties of how Prometheus operates and what Prometheus assumes to be there. While push versus pull is largely religious, to be honest, pull has a few nice properties which you cannot easily, or not at all, emulate with push unless you maintain state in the middle: staleness and upness in Prometheus are two things that are really hard to rebuild in a purely push-based system. It commonly leads to you needing to replicate parts of service discovery in your scraping layer or in your metrics pipeline layer, just to have a notion of the upness, liveness, or staleness of your metrics. Just something to be aware of. Still, we have all those considerations in, and OpenMetrics explicitly supports push, because that's something users wanted, so we support it; Prometheus itself doesn't support push, OpenMetrics does. The text format is still mandatory, so you have a baseline where you know that things are working, and it also means that debuggability is really easy, because you can always just connect with a web browser and start reading if you need to. But we reintroduced Protobuf for the people who like it. For Prometheus itself, the text format has even been quicker than Protobuf under high load, but there are scenarios where people prefer proto, and that's completely fine. Something which is completely new is exemplars. If you haven't heard of them, they're absolutely ingenious. Basically, they allow you to attach information about traces to your logs and to your metrics. In this case, obviously, I'm focusing on metrics, but it also works for logs. Usually with traces you have a needle-in-a-haystack problem: you need to go into a trace, but you don't know if it's actually relevant, so you need to search through all the stuff attached to the traces to find things which match certain properties.
As opposed to that, if you already have your high-latency bucket, or your error state in your logs, or what have you, and you have an exemplar attached to it, you know exactly what trace, what span, you need to jump to. You can go directly to that trace or span, and you retain all the mental context of why this is relevant to you while delving into the actual trace, instead of basically starting at zero after searching for your needle in the haystack. Exemplars are already widely supported: Prometheus, Cortex, Thanos, Loki, Grafana, and others; there's tons of software which already works with exemplars, and from what I can see, others are also adopting this concept, because it's just insanely powerful. Current state in Prometheus: the Python client library is the reference implementation; it uses the OpenMetrics data model internally, 100%, and has done so for years. So if you want to look at the dirty details of OpenMetrics, or at the reference implementation, look at the Python client. Again, this is on purpose: Python tends to be easy to read and is used for teaching and such, so we chose it as the reference implementation. Go and Java are also supported, so that's basically most of the Prometheus ecosystem right there already. Prometheus will preferentially negotiate OpenMetrics and has done so for years. So again, you are most likely already using it without being aware. Info and enums are first-class features in Prometheus, so you don't need to implement them by hand; you actually have the support in your client libraries. And if you scrape through the Prometheus exposition format, i.e. you don't actively negotiate OpenMetrics, they are exposed in a backwards-compatible manner, which is nice. As for other implementations: Datadog supports ingestion of OpenMetrics and has for quite some time. They even contributed performance improvements to the Python parser. Thank you.
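On the wire, an exemplar rides along after a `#` on the sample line: a small label set pointing at a trace, plus the observed value. A sketch of that syntax (the trace ID here is made up, and this shows only the basic shape; the spec also allows an optional exemplar timestamp):

```python
def sample_with_exemplar(sample: str, value, trace_id: str, observed) -> str:
    """Render an OpenMetrics sample line with an exemplar attached.
    After the '#', a label set (e.g. a trace_id) and the observed value
    let you jump from a high-latency bucket straight to the right trace.
    (Illustrative helper; client libraries do this for you.)"""
    return f'{sample} {value} # {{trace_id="{trace_id}"}} {observed}'

line = sample_with_exemplar('request_seconds_bucket{le="0.5"}', 42, "abc123def", 0.47)
```

Reading that line back, a backend like Grafana can turn the bucket into a clickable link to trace `abc123def`, which is exactly the needle-in-a-haystack shortcut described above.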
OpenTelemetry supports OpenMetrics as a first-class wire format, and OpenMetrics is part of the Prometheus conformance program. That part is important, so now I'm going to put on my Prometheus hat again. The Prometheus conformance program officially launched, or launches, depending on when this is published, on October 14. Of course, that's when the Prometheus conformance program talk, also by me, is going to be published. We have several test suites in the Prometheus org which test for Prometheus interfaces. And anyone, any vendor, any project, any product which wants to get an official Prometheus-compatible mark needs to be compliant with all the interfaces which we define as relevant for that thing. So if you're scraping data, you need to be 100% OpenMetrics compliant to be able to get the Prometheus-compatible mark. There's other stuff, like you need to sign contracts with the CNCF, or rather the Linux Foundation, but you get a nice logo, blah blah blah; more details on this in my other talk. How can you spot OpenMetrics? Well, there are a few telltale signs. In Prometheus exposition format 0.0.4, let's say you have a counter; that counter looks like foo_seconds_total 1.0, and that's it. In OpenMetrics, you can see that we also have unit metadata which tells you that foo_seconds is in seconds. You can also see that the TYPE and the UNIT now name foo_seconds, not foo_seconds_total anymore. By definition, as it is a counter, the _total gets attached to the sample, automatically if you're using one of the client libraries, else you have to do it by hand. And you can also see that you have _created with some timestamp. I think I put a joke in there, but honestly I forgot; you can go look and see what joke I made back then. It's also in the official spec, of course.
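Spelled out, that same toy counter in both formats might look roughly like this (the `foo_seconds` metric and the `_created` timestamp value are made up for illustration; the exact metadata lines are defined by the respective specs):

```text
# Prometheus exposition format 0.0.4:
# TYPE foo_seconds_total counter
foo_seconds_total 1.0

# OpenMetrics 1.0:
# TYPE foo_seconds counter
# UNIT foo_seconds seconds
foo_seconds_total 1.0
foo_seconds_created 1631234567.0
# EOF
```

Note how the TYPE and UNIT metadata use the bare name `foo_seconds`, while the sample itself still carries `_total`, and the exposition ends with the mandatory `# EOF`.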
Yeah, but anyway, you see that the name of the metric is now without that tail, without the suffix, for consistency with the internal data model. If you're not using the Prometheus client libraries, then you need to do this by hand, and it's relatively easy: you add _total to your counters. Even if you're using the client libraries and don't do this now, just add it; either the Prometheus client libraries will do it for you once you upgrade, or you do it by hand, and either way you will not see an impact. If you write stuff yourself, by hand, please send the correct content type, whether you explicitly want to do Prometheus exposition format 0.0.4 or OpenMetrics 1.0; either is fine, but please make sure that you explicitly set the content type correctly. And if you write scrapers or ingestors, please set your Accept headers to negotiate the Prometheus or OpenMetrics format, depending on which you prefer and how far you are in your own adoption story. My recommendation, or our recommendation, that's with both hats on, would obviously be OpenMetrics, 'cause that's the future. There are more resources: we have the category for OpenMetrics, we have the compliance suite for Prometheus, and we have the part of the Prometheus compliance suite which lives in the OpenMetrics repository. And that's it. I hope you have plenty of questions for me. Thank you very much.
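To make the negotiation advice concrete, here is a sketch of the headers involved. The two content-type strings are the ones the respective specs define; the q-weighted Accept header is one plausible way for a scraper to say "OpenMetrics preferred, Prometheus text format as fallback" (the actual HTTP request is left out for brevity):

```python
# Content types defined by the OpenMetrics and Prometheus text format specs.
OPENMETRICS_CT = "application/openmetrics-text; version=1.0.0; charset=utf-8"
PROMETHEUS_CT = "text/plain; version=0.0.4; charset=utf-8"

# An exposer (server side) should set the response Content-Type explicitly:
response_headers = {"Content-Type": OPENMETRICS_CT}

# A scraper (client side) advertises its preference via Accept, e.g.
# OpenMetrics first, Prometheus text format as a fallback:
accept_header = f"{OPENMETRICS_CT};q=1.0, {PROMETHEUS_CT};q=0.5"
```

A hand-written exporter that sends the right Content-Type, and a scraper that sends an Accept header like this, is all the negotiation the formats require.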