Okay, well, welcome everyone. Thank you for joining. I'll be talking today about stream versus batch: leveraging M3 and Thanos for real-time aggregation.

Just a quick intro about myself. I'm a developer advocate at Chronosphere, where I help out with the M3 open source community as a contributor, and I'm also a member of the CNCF Observability TAG. Prior to Chronosphere, I was a product manager over at AWS.

Running through the agenda for the talk today: we're going to start with the problem statement and an overview of streaming versus batch aggregation. Then we'll go through stream aggregation with M3, followed by batch aggregation with Thanos, and finish with a quick overview comparing the two.

So why does aggregation matter for real-time? Look at this example of a cAdvisor dashboard. cAdvisor is a way to get resource usage and performance metrics for all of your running pods or containers: CPU, memory, your infrastructure-level metrics. It runs as a daemon inside the kubelet, and this particular dashboard is looking at all of the pods in a gateway application. Essentially, it gives you a quick 10,000-foot view of all of your applications.

Zooming in for this example, we're going to look at the highlighted panel, showing CPU usage across all of your gateway pods or containers. You can see it's showing an overview of all of your pods, but if we look behind the scenes at what it takes to produce these results, you can see it can take quite a bit of time for your queries to fully render.
The way cAdvisor is pulling this container CPU usage metric, it's pulling labels across all of the pods in your pod group. As a result, you're getting roughly 51,000 time series, and it's going to take over 20 seconds for your results to render. And that's just to query this one metric; if you did any sort of function on top, like sum or max, you can imagine it would take much longer.

But in most cases you don't need to look at your metrics at the per-pod level; you really just need that aggregated view to see what's happening across all of your pods or containers. So in this example we've taken the same metric, except we've aggregated it against two labels, container name and namespace, and we're taking the sum of the metric at a one-minute rate. By aggregating prior to query, you reduce the load at query time by quite a bit, so you're going to see your results much faster: less than a second, because you've aggregated down to roughly 230 time series.

So that's an overview of how aggregation can help with performance. There are a couple of different ways of doing aggregation, streaming and batch, so let me give an overview for those who might not be familiar with the two.
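To make that concrete, the two queries from the example might look something like the following. This is a sketch: the metric name is assumed to be cAdvisor's `container_cpu_usage_seconds_total`, and the label names (`container_name`, `namespace`) follow the talk's example; newer cAdvisor versions use `container` instead of `container_name`.

```promql
# Unaggregated: matches every per-pod series (~51,000 in this example),
# so rendering takes 20+ seconds
container_cpu_usage_seconds_total

# Aggregated: sum of the one-minute rate, grouped by two labels,
# collapsing the result down to ~230 series
sum(rate(container_cpu_usage_seconds_total[1m])) by (container_name, namespace)
```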
So with stream aggregation, you have data being collected continuously, and the aggregation is performed in memory on the ingest path before being written over to your time series database; you can see that in the little diagram at the bottom. This is typically very useful for information that's needed immediately, like dashboards, since your data is aggregated in real time and available for query as soon as it's written to your time series database.

The batch side is a little different, because your data is collected over time: the aggregation is performed by reading your raw metrics from your time series database and then writing the aggregated results back, so you can see the two arrows pointing from the batch job in the diagram. This is typically meant for large quantities of information that aren't as time sensitive, and the data is aggregated in batches over time.

For the purpose of this talk, we're going to look primarily at how Prometheus and Prometheus-compatible solutions do aggregation, whether batch or streaming. To start, here's how Prometheus itself does aggregation. Prometheus uses what it calls recording rules, which allow you to pre-compute frequently needed or computationally expensive queries and then store the aggregated results back to your time series database. The execution and pre-computation of these rules are done in memory, as a single process, at regular intervals that you set, for example every minute or every 30 seconds.
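As a sketch of what this looks like in practice, here's a hypothetical Prometheus recording rule that pre-computes the aggregated CPU query from earlier. The group name and recorded metric name are illustrative, not from the talk:

```yaml
# rules.yml -- loaded via the rule_files section of prometheus.yml
groups:
  - name: cpu_aggregation
    interval: 1m   # evaluated on a regular interval, cron-job style
    rules:
      # The pre-computed result is stored back as a new time series
      # under the name given by "record"
      - record: namespace_container:container_cpu_usage_seconds:sum_rate1m
        expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (container_name, namespace)
```

Dashboards then query the short, cheap `namespace_container:...` series instead of re-evaluating the expensive expression on every refresh.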
So it's using that cron-job-type process, and this makes it really useful for dashboards: by doing the pre-computation, you get much faster results than if you had to re-evaluate your expression every time it's needed. And of course, since it's supported by Prometheus, you have full access to PromQL.

But typically, if you outgrow a single Prometheus instance, you may want to use a remote storage solution. Some of the most popular ones are Thanos, M3, and Cortex, which are all Prometheus remote storage and PromQL compatible, and they use a combination of batch and stream aggregation. So we're going to focus first on M3 for streaming aggregation, and then we'll get into Thanos for batch aggregation.

Okay, streaming aggregation with M3. Just a quick overview of what M3 is: it's an open-source metrics engine comprised of four main components. There's the distributed, custom-built time series database called M3DB; the M3 coordinator, which is the ingest and downsampling tier; the aggregator, which is optional to run depending on your use case but provides the distributed streaming aggregation tier; and finally an optimized distributed query engine called M3 Query.

M3 was built at Uber and open sourced in 2016 to help with their internal metrics monitoring use cases. It's now used by many other companies, including Chronosphere, and it was designed to be Prometheus remote storage and PromQL compatible.

Here's a high-level overview of what the architecture looks like. On the right side you have instances of Prometheus sending metrics to M3 via the coordinator, using Prometheus remote write. The coordinator, and optionally the aggregator, do any sort of downsampling or aggregation before sending
over your metrics to M3DB. Then on the read side, it's a similar thing: you send any query requests to M3 via the query tier, using Prometheus remote read.

Okay, so streaming aggregation with M3. The way M3 does aggregation is basically by moving the Prometheus recording rule computation to streaming aggregation, and it does this through what are called rollup rules. Rollup rules are essentially M3's approach to aggregating high-cardinality metrics; they solve the same problem that recording rules do, just with a slightly different approach.

How it works, as you can see in the diagram here, is that M3 aggregates across multiple time series, and then the aggregator and the coordinator reconstitute this new rolled-up, aggregated metric as a new gauge, counter, timer, or histogram metric before writing it to M3DB. Once it's written to M3DB, it's immediately available for query.

Showing that visually, there are essentially three main steps. The first step is sending your metrics to M3 via Prometheus remote write. From there, the coordinator, and optionally the aggregator depending on your use case, do the in-memory aggregation on the ingest path. And then from there, the coordinator sends this new aggregated, reconstituted metric over to M3DB for storage.

Okay, so some pros and cons of streaming aggregation with M3.
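Before the pros and cons, here's a rough sketch of what a rollup rule for the earlier CPU example can look like in the M3 coordinator configuration. The field names follow the general shape of M3's documented config, but exact fields vary across versions, so treat this as illustrative rather than copy-paste ready:

```yaml
# m3coordinator config fragment (illustrative)
downsample:
  rules:
    rollupRules:
      - name: "cpu usage by container and namespace"
        # Match the incoming high-cardinality metric on the ingest path
        filter: "__name__:container_cpu_usage_seconds_total"
        transforms:
          - transform:
              type: "PerSecond"   # compute a per-second rate in-stream
          - rollup:
              # Reconstituted metric written to M3DB in place of raw series
              metricName: "container_cpu_usage:rate1m"
              groupBy: ["container_name", "namespace"]
              aggregations: ["Sum"]
        storagePolicies:
          - resolution: 1m
            retention: 720h
```

The key difference from a recording rule is that this runs in memory on ingest, so only the rolled-up series ever reaches the database.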
First, on the pro side, you're going to get really quick query results with this approach, because you're doing all of your aggregation prior to query, so your results are already ready to go. That also means you have very few requirements on the query or read side of things, so you can put those resources to other uses instead, like scaling up to a higher number of alerts or recording rules, because the load on your time series database from the query or read side is now much lower.

On the con side, it can be complex to operate and deploy, and it requires some additional overhead. Additionally, it doesn't support arbitrary PromQL; instead, like I mentioned, the coordinator reconstitutes these metrics as new aggregated counter, timer, and histogram metrics before writing them over to your time series database.

All right, so now we're going to get into batch aggregation with Thanos. A little bit about Thanos, for those of you who might not be familiar: it's a CNCF incubating project, originally built at Improbable and open sourced back in 2017, and it has several main components that we're going to talk about today.
There's the Store, which is essentially the gateway to object storage. Then there's the Query component, which is a horizontally scalable, stateless query, aggregation, and deduplication tier. There's the Sidecar, which is one of the ways to deploy Thanos; it acts as a proxy for Prometheus via the remote read and write APIs. Then there's the Compactor, which is responsible for downsampling and block compaction, and for applying any retention policies. And finally there's the Ruler (or Rule component), which uses the thanos rule command to evaluate any Prometheus recording or alerting rules. Like M3, Thanos was also designed to be Prometheus remote storage and PromQL compatible.

This is a high-level architecture diagram of Thanos. As you can see, we have a few instances of Prometheus with Thanos running as the sidecar. We have the Query component fanning out requests to each of the various instances, which will then pull metrics and deduplicate them inside the query tier. The query tier is then informed by the Ruler for any recording rules you may want to run. Another thing to note is that the sidecar writes metrics over to object storage in blocks of two hours by default, so you're also able to query longer-term metrics.

Okay, but how does Thanos do aggregation?
Basically, the way it works is that your raw metrics data is collected by your Prometheus instances prior to query aggregation. From there, your Query component can perform any sort of metrics aggregation or PromQL queries on top of the metrics that get pulled from your Prometheus instances. And then you have the Ruler, which will evaluate any recording rules you may have before writing the new aggregated time series data back to object storage.

Showing this visually, there are four primary steps here; again, it's a very pared-down view of the architecture, so we're not showing each of the components. First, metrics are collected by the Thanos sidecar and store. Then you have your Ruler, which issues your query or recording rules, and the metrics are pulled to meet that query, using reverse-index querying and reading from storage. The third step is the Query component evaluating the query result on your pulled metrics. And from there, the new aggregated metric is sent over to your object store.

You can also see here that the Query component has two different ways of querying metrics: you can query directly from your Prometheus instances via the Store API for more real-time queries, and you can also access longer-term data through your object store.

Okay, but what are some pros and cons of this batch aggregation with Thanos? On the pro side, it's fully compatible with PromQL, so you're able to run those arbitrary PromQL queries. And there's another pro, especially as it compares to M3:
It's simpler to operate and manage, especially if you're wanting to scale your resources up and down, since you're not having to constantly redirect live-flowing traffic or metrics.

On the con side, compared to stream aggregation, you're basically adding an additional step by having to re-query and then read and write back your results to storage. Doing all of that over the network can be expensive and lead to large resource consumption, especially for larger queries. In addition, you can have slow queries, especially with recording rules or cron-job-type queries: if your Query components are querying a large number of metrics from your various Prometheus instances, it can take quite a while for those metrics to be fully queried. At that point, depending on the intervals at which you're running your queries, you may miss those intervals if it takes too long for your Query component to fully query and aggregate your metrics. Not only that, it could also lead to your Query component being completely overwhelmed. So that's just one thing to note on that side.

So now we're going to jump into an overview of everything we just discussed: how do you choose whether to do streaming with M3 or batch with Thanos? Recapping on the M3 side, I think one of the main pros is that it really alleviates query requirements on your time series database by doing a lot of that pre-computation prior to query, so you can use those resources for additional purposes, like more recording rules or alerts. However, compared to Thanos, it's a little bit more complex to operate and deploy, and it doesn't fully support arbitrary PromQL; instead it reconstitutes these aggregated metrics as
counters, gauges, and histograms.

Then on the batch side, with Thanos: it's simpler to operate than M3, especially when you want to scale your resources up and down, and it fully supports PromQL. However, because it re-evaluates your queries and re-reads and re-writes your metrics over the network, it can lead to large resource consumption and slow query results.

These are obviously two examples of streaming and batch that are specific to M3 and Thanos, but I think they both demonstrate some of the benefits and trade-offs of streaming and batch in general, at a high level. So hopefully you can apply similar learnings to your particular use case.

Finally, if you did want to see a way to do both together: I actually gave a talk on Monday with Rob Skillington, who's the CTO and co-founder of Chronosphere, about how you can use the M3 coordinator to provide streaming aggregation along with a remote storage solution like Thanos or Cortex. So definitely check that out if it's of interest.

But yeah, that's basically all I have, so I think we'll just open it up to questions.