So, hello. We're excited to be giving this talk, and thanks for coming. This is how and why you should adopt and expose open-source interfaces, and the slides should be available on the site if you want to follow along. Today, in half an hour, we will go over our journey adopting OpenTelemetry and Prometheus: how we did it and what we learned from it. Our hope is that if you're someone with a system that you're looking to make Prometheus-compatible, this will give you some ideas about what you can do.

First, we'll introduce ourselves. We're both from Google Cloud Monitoring in New York. I'm Shishi. I've been a software engineer at Google for about ten years, and I really enjoy distributed systems and weird query behaviors. And this is Daniel.

Hello, everyone. My name is Daniel. I've been at Google for almost two years. A little fun fact about me: I've been coding for over half my life. It's something I'm very passionate about. I love solving hard problems, and I'm excited to show you how we solved this one. Of course, there are many others who are part of this project, and we just wanted to mention a few of them. I mean, we all know who did the real work.

So we'll start by sharing a little story. Daniel, do you remember what life was like back in 2020?

Oof, yeah. Google Cloud Monitoring was not very fun. Not to say it was primitive, but it was very limiting, and let me show you exactly what I mean by that. Let's start with the ingestion side, the side that exports your metrics and sends them to Google Cloud Monitoring. On the ingestion side, we had a custom push-based API. This means that if you want to send metrics to Google Cloud Monitoring, you have to import our client SDK, and that's a lot of work: you have to do everything programmatically, so it's a very high barrier to entry. On the underlying storage side, we use Monarch, which is Google's planet-scale time-series database. If you're unfamiliar with Monarch, there is a paper available. The reason I bring this up is that it has implications for the API layer. For example, Monarch is schema-ful, which means you have to describe what your metrics look like before you can send them to Google Cloud Monitoring. We will talk more about these pain points later, because they become relevant again. For now, let me talk about the query side. As you can see, you have to learn MQL, or you have to use the UI. So again, this is another investment that users have to make, and it just makes the barrier to entry a lot higher. As far as visualization goes, if you want to actually see your data, you have to build the dashboard yourself. So a lot of people were duplicating the same work: lots of people use the same common tools, and if you want a dashboard for one of them, everyone has to build the same dashboard for the same tool. This was not very user-friendly at all.

So let's stop and think about how this happened. As a reminder, this was 2020. At the time, standardization was still happening; when we started Google Cloud Monitoring, there was no standardization at all. What we did is not unique. Other cloud providers did the same thing: they provided their own APIs, and integration was difficult. Ultimately, users do not want to be siloed. So we thought about what the industry was doing at the time, and the industry was heading towards two tools. The first was Prometheus, and the second was OpenTelemetry. Let me briefly mention what Prometheus is, for those of you who don't know.
Prometheus is a time-series database. It is capable of scraping your applications, and it also has alerting facilities. Ever since Prometheus joined the CNCF in 2016, it's been growing exponentially, as you can see. There are hundreds of exporters, so if you want to use Prometheus with a common tool, all you need to do is use one of these exporters. An exporter is probably available for all the tools you use, so a lot of people already had access to Prometheus. And there are just so many resources available for Prometheus. So it became the de facto solution; it became very popular. If you're changing jobs, no problem: take all your knowledge with you. There's absolutely no vendor lock-in.

Just briefly, on how Prometheus differs from our push-based model in Google Cloud Monitoring: Prometheus offers a pull-based model, which means the Prometheus server actually scrapes your application. You don't send data to the server; the server comes to you. Essentially, this means all of your configuration is centralized within Prometheus, and it also means the client SDKs are radically simpler than they used to be. Then on the query side, if you want to use Prometheus, you do have to learn PromQL, so there is some investment here. However, if you're using an existing exporter, there are probably also pre-built dashboards for you, so you may not have to learn PromQL at all; you could just use those pre-built dashboards. There are many different dashboarding solutions that support PromQL. So the barrier to entry is very low here.

Next, Shishi, can you introduce us to OpenTelemetry?

Sure. OpenTelemetry is a framework for telemetry collection. It started in 2019 as a merger of two prior projects: OpenCensus, which provided instrumentation for metrics and traces, and OpenTracing, which focused on traces. In 2020, OTel was starting to gain traction. Compared to Prometheus, OTel is really useful if you want a few things. For one thing, it lets you send all types of signals, so not just metrics but also logs and traces. It also provides a push-based option, whereas Prometheus provides a pull-based one, among other things.

So here we are, thinking to ourselves: we want to adopt open-source standards, and specifically we want to expose the Prometheus and OTel interfaces. How do we go about that? In the next section of the talk, we're going to go through a case study of how we adopted Prometheus. We did also adopt OTel, but we aren't going to talk about that much here. So Daniel will start us off.

Okay. So if we review everything we've learned so far about Prometheus and OpenTelemetry, it is very different from what we had in Google Cloud Monitoring, right? We don't want to adopt everything at once, so the question was: how can we adopt these tools incrementally? And this is where we arrived. We have this feature matrix, and we've subdivided it into small components. As you can see, this is really just the observability life cycle. Ingestion is the first one, so let's step into that.

Yeah. So again, most users are already using Prometheus, for example if they're coming from on-prem solutions, or if they're using common tools and have access to exporters. So how can we allow customers to keep using those, but instead of storing the data in Prometheus, get it into Google Cloud Monitoring?
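To make the starting point concrete: an application that's already instrumented for Prometheus typically just exposes a /metrics endpoint for the server to scrape. Here is a minimal sketch in Go using the standard client_golang library; the metric name and port are made up for illustration.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter registered with the default registry; promauto wires it up for us.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "myapp_requests_total", // hypothetical metric name
	Help: "Total number of requests handled.",
})

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestsTotal.Inc()
		w.Write([]byte("ok"))
	})

	// The Prometheus server pulls metrics from this endpoint; the application
	// never pushes anything, which is why the client side stays so simple.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The question above is about exactly this kind of target: the instrumentation stays as-is, and only where the scraped samples end up changes.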
We had three options here. Option one: we could write a tool that does exactly the same thing Prometheus does, and go scrape the Prometheus metrics endpoints of all these applications ourselves. But that would be a lot of work, right? We'd essentially have to replicate a lot of the logic Prometheus has, and if Prometheus ever changed, we'd have to change as well. The second option is to just use Prometheus directly, and unfortunately that's not really an option either; Prometheus can't do this out of the box. So we arrived at option number three: why don't we just fork Prometheus and add this feature there? And that is what happened. We created a very small patch on top of Prometheus, and all it does is export metrics to Google Cloud Monitoring. If customers want to use this now, all they have to do is replace their image. It is very easy for a customer to onboard to us because all their configuration just works out of the box. So that's really great; it improves user satisfaction.

We also almost accidentally became scalable here, because we're sending data to a remote location and not storing anything locally. You can spin up as many of these Prometheus instances as you want; you don't have to worry about sharding or anything. Just spin them up, have them scrape different applications, and it all scales. So overall, everyone wins here. The developers who use Google Cloud Monitoring benefit, and we as engineers benefit as well, because this is a very tiny patch that is actually very easy to maintain. And finally, just because we're forking doesn't mean it has to be forever. We're involved in the Prometheus remote-write specification work, which would allow Prometheus to send data to Google Cloud Monitoring directly. So the community wins too, right? Because we're actively contributing back as well.

We actually took this a step further. What if we have new customers who are on Kubernetes and want to use Prometheus for the first time? Can we make this a little easier for them? The answer is yes. We wrote a small operator that essentially spins up a Prometheus instance on every node, and we've turned it on by default in Google Kubernetes Engine, so that new customers can immediately benefit and use whatever they're familiar with.

So after we did that, this is what our feature matrix looks like now. Customers can either ingest using the SDK they're familiar with or use Prometheus, and it all goes to a centralized store, which is Monarch. So that's great. But some customers may want to use PromQL; they may want to use their dashboards or alerting configurations. So what can we do about that? Shishi, can you help answer this?

Sure. As Daniel said, people already have lots of existing queries and dashboards defined with PromQL, and they already know how to use PromQL relatively well. So the question is: how can we lower the barrier to entry? And by the way, this is what AI thinks PromQL looks like. We wanted to solve this problem by having our system expose the Prometheus HTTP API. That API covers the query endpoints as well as supporting functionality like autocomplete, metadata, exemplars, and so on.
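For a sense of what that surface looks like, the Prometheus HTTP API is plain HTTP and JSON, so any tool that can issue a GET can use it. A rough sketch in Go; the base URL is a placeholder, and authentication is omitted.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Placeholder: wherever the Prometheus-compatible API is served.
	base := "https://example.invalid/prometheus"

	// An instant query against the standard /api/v1/query endpoint.
	q := url.QueryEscape(`sum(rate(http_requests_total[5m]))`)
	resp, err := http.Get(base + "/api/v1/query?query=" + q)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON result, same shape as from upstream Prometheus
}
```

The supporting endpoints, for example /api/v1/labels and /api/v1/metadata, work the same way, which is part of what lets autocomplete in existing tools keep working.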
We considered several options for how to implement that. The first was to just run many separate vanilla Prometheus instances. The issue with vanilla Prometheus is that there's no long-term storage, and it can also be really difficult to shard horizontally. We also considered running another open-source system like Thanos or Cortex. With all of these, one issue for us is that we would basically need to run per-customer deployments, which would be pretty difficult to operate and would also require a lot of resources. Finally, we thought we could use Monarch, and the issue with Monarch, of course, is that it doesn't actually expose the Prometheus API.

As an aside on Monarch, if you're not familiar: it's Google's centralized monitoring system. It's a distributed, multi-tenant storage system, and it's already used by lots of teams internally, so it already stores lots of data. And finally, it's something that Google SREs have lots of experience operating, which is useful. We wanted to take advantage of all that, so we ended up choosing Monarch as the data store.

So then the issue we had to solve was: how do we implement the Prometheus query API over Monarch? You would think that should be straightforward. What could go wrong? Well, it turns out there are a few challenges. One of the main things we had to deal with is that Prometheus and Monarch have somewhat different data models, and we had to figure out how to map Prometheus's data model onto Monarch's. I'll give a couple of examples.

The first is that Prometheus is schema-less, while Monarch is schema-ful. What I mean by that is that in Prometheus, you basically never have to worry about your labels. A time series just has a bag of labels attached to it, you can ingest points with whatever labels seem useful at the time, and you can write your queries without really knowing what labels are present on the underlying time series. Monarch, on the other hand, has a strict approach: it has both resource and metric schemas, and there are specific labels associated with every schema. When you ingest a point, it has to carry a subset of those labels, and when you query, you can only reference labels that are actually present. Each approach has its advantages. Schema-less models are really easy to use; the user doesn't have to worry about labels. Schema-ful models aren't really that helpful to users, but they do help us scale the back end, so they are nice for us as implementers of the system.

This is mostly handled through translation. For example, on the ingest side we want to make it look like there's just a bag of labels, so we have a translation layer that compares the points being sent for ingestion against the schemas that are in the system, and it automatically creates and updates the schemas in the background. For the storage system, nothing actually changes. Similarly, on the query side, our translation layer takes the PromQL query sent by the user, compares it to the schemas, and generates a query; in our case, we're translating PromQL to MQL, which is Monarch's query language. That query only references labels that are valid according to the schemas, so again, Monarch sees the same thing it always has, and the translation layer itself handles the other labels according to PromQL semantics. So that's one example.
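As a purely conceptual sketch of that ingest-side idea (this is not the actual translation layer, and every name here is hypothetical): keep a schema per metric, widen it whenever a sample arrives with a label key the schema hasn't seen, and hand only schema-conforming points to the store.

```go
package main

// schema is the set of label keys a metric is allowed to carry.
type schema map[string]struct{}

// schemas is a hypothetical in-memory registry: metric name -> schema.
var schemas = map[string]schema{}

// ingest accepts a point with an arbitrary "bag of labels", updates the schema
// for that metric if needed, and forwards a point that conforms to the schema.
func ingest(metric string, labels map[string]string, value float64) {
	s, ok := schemas[metric]
	if !ok {
		s = schema{}
		schemas[metric] = s
	}
	// Widen the schema transparently; the user never has to declare labels.
	for key := range labels {
		s[key] = struct{}{}
	}

	// Build a label set with exactly the schema's keys, filling in empty
	// values for keys this particular sample did not set.
	conforming := make(map[string]string, len(s))
	for key := range s {
		conforming[key] = labels[key]
	}

	writeToStorage(metric, conforming, value) // storage only ever sees schema-conforming points
}

func writeToStorage(metric string, labels map[string]string, value float64) {
	// Placeholder for the actual storage write.
}

func main() {
	ingest("myapp_requests_total", map[string]string{"method": "GET"}, 1)
	ingest("myapp_requests_total", map[string]string{"method": "GET", "code": "200"}, 1) // schema widens here
}
```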
Another example of where the data models differ is how different kinds of data are represented in Prometheus and Monarch. Monarch has more data types, for one thing: it has strings, distributions, and so on, and until recently Prometheus data was all doubles, though now there are native histograms as well. The example I have on the slide is the difference in the way counter data is represented. Prometheus counter points have just a single timestamp, and where the counter resets is inferred from when the point value drops: that's when it assumes a reset happened. In Monarch, every point has both an explicit start timestamp and an end timestamp. The start timestamp tells you exactly when the counter reset happened, and the point value is basically the change over an interval. This is actually helpful to Monarch because it allows it to calculate rates more accurately in certain cases. There are edge cases where in Prometheus you could have an uninitialized counter, or a reset that happens so quickly that the point value doesn't actually drop, and then Prometheus doesn't know about it. Those kinds of issues are addressed with explicit start timestamps. We wanted Prometheus to also benefit from this improved accuracy, so a good solution here was to contribute back to Prometheus, and the result is what's called the created timestamp. That was work done by Daniel and some others, and there was a talk about it at PromCon earlier this year; if you're interested in more details, there's a link.

All right, now we come to a different kind of challenge: keeping our implementation and API in sync with the open-source PromQL API. If we were to try to translate every query from PromQL to MQL, there would always be some lag whenever PromQL gets updated, for example when new functions are added; we would have to spend time implementing the translations, and they wouldn't be available right away. So our solution is to run a hybrid implementation. We run the open-source PromQL query engine alongside Monarch. When our translation layer gets a query, it decides which parts should be executed by Monarch and which parts should be executed by the PromQL engine. If an operation is really computationally intensive and we think it would benefit from distributed evaluation, we translate it to MQL and send it to Monarch, and then we have the PromQL engine complete any operations that didn't get translated.

Here's an example. Say we got this input PromQL query. The query planner splits it into a part that Monarch will evaluate, shown in blue, and a part that the PromQL engine will evaluate, shown in pink, and it translates the Monarch part of the query to MQL. The Monarch query engine then starts the evaluation: it reads the data in Monarch, evaluates the MQL, and produces a partial result, which gets passed to the Prometheus engine, which completes the evaluation and produces the final result. And when we want to keep things up to date, all we have to do is import the latest version of the PromQL engine.

So what was the outcome of all of this? As mentioned, there are lots of benefits to users in having a drop-in Prometheus API. And because we share the backend, we have a single entry point to all the metric data in that backend: Monarch also holds non-Prometheus metrics, like GKE system metrics and other Google Cloud system metrics, and we can use PromQL to query everything. And scalability we've already talked about.
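As a rough illustration of what "drop-in" means, here's a sketch of querying such an endpoint from Go with the upstream Prometheus API client from client_golang; the address is a placeholder, and authentication is left out.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// The only change from talking to a self-hosted Prometheus is this address.
	client, err := api.NewClient(api.Config{
		Address: "https://example.invalid/prometheus", // placeholder endpoint
	})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// An ordinary PromQL instant query; warnings are non-fatal.
	result, warnings, err := promAPI.Query(ctx, `up`, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```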
And so this turns out to be good both for our users and for us as system providers. So, back to our integration plan: we've adopted Prometheus interfaces for ingestion and querying, and we've chosen not to adopt them for storage. So Daniel, what about alerting and dashboards?

It's great you ask, and that's actually a trick question: we already support them, because we've implemented the Prometheus API directly on top of Monarch, right? For example, here's a little code snippet written in Go that uses the Prometheus API, much like the sketch above. All you have to do is replace the address with Google Cloud Monitoring's URL, and everything works as you'd expect. We also see these benefits with whatever dashboarding tool you want: all you have to do is replace the URL, and since we've implemented a lot of the APIs, autocomplete and everything like that should still work. We've also brought these benefits into Google Cloud Monitoring itself. In the Google Cloud Monitoring Metrics Explorer, you can either use the UI, or, if you write a query, you can decide whether you want to use MQL or PromQL; there's a small box you can check to choose which one you want, and there are also limited conversions between the two. So finally, our feature matrix now looks like this. It's really great, because now you can ingest data any way you want, and you can also visualize the data, or do whatever else you want with it, any way you want.

Last but not least, we haven't forgotten about OpenTelemetry. We effectively already support OpenTelemetry, because there are all these Prometheus bridges. We also have an OpenTelemetry team at Google that is improving this space. We are not part of that team, so we don't know the exact details, but stay tuned. So finally, if you add OpenTelemetry to the list, we have this: whatever users want, they have it.

So let's recap some of the things we've learned today. Let's start with the motivating statement: users do not like to be vendor-locked. They don't want to be siloed, especially when there are high barriers to entry. The next thing we've learned is that open source is enormous. There's a lot available, so why not reuse it? We saved a lot of work by reusing what the community has done, by forking Prometheus as well as using the PromQL library. The next thing we've learned is that you don't need to adopt everything all at once. If you can adopt incrementally, that's really great. We did this, and we got really quick wins early on just by exporting the data to Monarch and having it all in a centralized place. The next thing we've learned is that you are not limited to what open source provides. For example, we went ahead and forked Prometheus. We didn't have to wait to contribute back; we just forked and went ahead with our fork. At some point we'll give it back, but we're not there yet, and that's okay. And finally, the last thing we've learned is that we can benefit the community, and we should. The space evolves a lot quicker if we're involved, and that's our main motivation for contributing. Again, we're involved with the Prometheus remote-write spec as well as the created timestamp in Prometheus. So it's a win-win for everyone. And that is all. Thank you very much. If you have any questions, please stand at either aisle and ask us.

Hi. Great talk. Thank you. I'm wondering how you got around Prometheus remote write's lack of types when sending data to your back end.
Do you just infer it from the metric name?

Yeah. So, we actually double-write some of the metrics in cases where we don't know what the type is, but we don't really use remote write right now; we just use a fork. That is something we're thinking about for the future: if we do support remote write, we can continue double-writing. Unfortunately, this means the user does have to pay double for that, but it's a small workaround we have.

Gotcha. And for your fork, are you doing it off of their write interface, or is it a whole different interface that you're using?

Yeah. I mean, in the fork there are some types: we know what type of metric it is, we know if it's a counter or a gauge or a histogram. And as far as the actual value goes, it's always a double in Prometheus, so that's how we can do it there.

Thank you. You're welcome.

Hi. Thanks for the great talk. How did you translate native histograms into Monarch? One of the things we ran into is something similar: we use a licensed back-end monitoring tool that we send data to from Prometheus, but we had to translate the native histogram metrics back into something the back end is compatible with, and then the whole purpose of encoding the data in the native histogram format is lost. How did you solve it for yourselves?

Yeah. Shishi, do you want to take this one? So I think the native histogram is actually pretty close to what a distribution is in Monarch, so I think we're translating there. Yeah, I think as long as we scrape a metrics endpoint, you can infer all the histogram values and translate them fairly directly to Monarch. We do have a problem with remote write, though, because it's possible remote write doesn't send all of the histogram data at once. So that's something we're still trying to figure out for when remote write support arrives.

Thank you. One other question: when you split the queries between MQL and PromQL and so on, did you notice additional latency just because of that, and how did you solve for it?

Yeah. Translating the query to MQL actually improved our latency quite a lot, just because Monarch is distributed and it's basically more efficient that way, whereas having the Prometheus node evaluate the whole query was pretty expensive for us.

Okay, cool. If there are no more questions, then we can end here. Thanks.