Okay, hello everyone, welcome to our session. I hope everyone is having a great KubeCon so far. So let's get started. Today our topic is an introduction to Cortex: multi-tenant, scalable Prometheus. I'm Ben Ye, I'm a software development engineer at AWS, and I'm a maintainer of Cortex as well as the Thanos project. I'm also a contributor to some other CNCF projects like Prometheus, Argo CD, et cetera. And I have a puppy named Gui. Today with me I have Friedrich.

Yes, I'm Friedrich Gonzalez, I'm a software engineer at Adobe, and I'm also a maintainer of Cortex. What you see there is my puppy. Well, she's not a puppy anymore, she's a Doberman now. I just want to excuse myself: last year's presentation had more puppies, this year's doesn't have that many. I'm sorry for that.

But what is Cortex? Let's talk a bit about Cortex. Cortex is a horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. It's a community project, created in 2016, and it's one of the CNCF incubating projects. It has seen a lot of contributors and has had a lot of maintainers over the years. These are the companies using Cortex, or that have used Cortex. But I want to stop here for a bit: who's familiar with Prometheus? Can you raise your hand? Great. And who's familiar with collectors, OpenTelemetry Collectors? Anybody running one? Great, okay, awesome.

Back in 2016, when Cortex was created and people started working on it, we had this situation with Prometheus: as you might be familiar with, it's an in-memory database, right? So the more metrics you add to a Prometheus, the more its resource needs grow; it needs more memory. So the typical setup is that people run more than one Prometheus: one Prometheus for some applications, another Prometheus for other applications, and you've solved your problem. But this number keeps increasing, and once you have several Prometheus servers, your view of the metrics becomes fragmented: some of your metrics are here, some are there. This is where most people use Thanos: they deploy Thanos, which is able to see all the views, if you're familiar with it. The thing with Thanos, though, is that you have to configure it; you have to decide where to put every metric. That's how Thanos solves that problem. But what does Cortex do differently? Cortex is an API, and this API is able to receive the metrics from all your Prometheus servers. It's able to handle all the cardinality, all the metrics your Prometheus servers are generating, and it handles all those problems for you. I'm going to go into detail on how that works shortly. But I want to mention that the CNCF ecosystem now also includes OpenTelemetry. OpenTelemetry is newer than Cortex, it came later, and most people use it nowadays to collect signals: log signals, trace signals, metric signals. When the OpenTelemetry Collector came out, it was already ready to be used with Cortex. You typically run not just one collector but many collectors, and you can use those collectors with Cortex: you can point them all at Cortex and send all those metrics there.
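To make "it's just an API" concrete, here is a rough sketch of an application pushing OTLP metrics straight to Cortex with the OpenTelemetry Go SDK. It is not the demo's actual code: the endpoint, URL path, and metric name are assumptions (a local Cortex with OTLP ingestion enabled and auth disabled; otherwise you would also send an X-Scope-OrgID tenant header).

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Push OTLP metrics over HTTP directly to Cortex; no Prometheus
	// remote write involved. Endpoint and path are assumptions for a
	// local Cortex with OTLP ingestion enabled.
	exporter, err := otlpmetrichttp.New(ctx,
		otlpmetrichttp.WithEndpoint("localhost:9009"),
		otlpmetrichttp.WithURLPath("/api/v1/otlp/v1/metrics"),
		otlpmetrichttp.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
	)
	// Shutdown flushes any buffered metrics before the process exits.
	defer func() { _ = provider.Shutdown(ctx) }()

	// Record one sample metric and let the periodic reader export it.
	counter, err := provider.Meter("demo").Int64Counter("dice_rolls_total")
	if err != nil {
		log.Fatal(err)
	}
	counter.Add(ctx, 1)
}
```

The same exporter configuration works from an OpenTelemetry Collector's `otlphttp` exporter, which is the more common setup when you run many collectors.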
But it doesn't really end there. Suppose something else is invented in the future, something we don't know about yet: it could also send metrics to that same API, because it's an API, basically. So what are the Cortex key features? What's behind Cortex? It's able to do true horizontal scaling: you don't need to reconfigure Cortex if you need more cardinality or more active series, you just increase the replicas of some deployment or stateful set. The other thing Cortex adds is multi-tenancy: from the beginning, Cortex supported multiple tenants, so you can send metrics from different places and your tenants stay completely separated inside Cortex. That is a key feature of Cortex. Also, Cortex was able to do faster querying from the beginning, because it provides a query caching layer, and that cache layer makes most of the queries faster. And because it can horizontally scale, those tenants I just mentioned can have per-tenant limits, not just one limit for all teams: you can say, for each team, how many resources it gets. So in the scenario I explained before, where some metrics are increasing, not all the tenants are affected, only the ones generating those specific metrics. And last but not least, in addition to the typical Prometheus API, Cortex provides a rich API for managing alerts and rules, so you can create all your rules in Cortex.

But how does this look on the inside? Behind this API, Cortex has this architecture. It starts in the middle, where you see the remote write box: there you have all your agents. Here you see Prometheus, and Prometheus is remote-writing to the distributors. The distributors take care of spreading the samples over the ingesters. They do that using a replication factor of three, which means every metric written to Cortex gets written to three ingesters at the same time. That avoids problems if, for example, some ingester fails at some point. The ingesters, in turn, take care of compacting those metrics, and after two hours they ship them to S3 or whatever kind of block storage you might have. On the right side you have a typical dashboard tool, and it's using the Cortex API for querying. In this case you have a query frontend, which receives those queries and sends them over to the queriers, which are in charge of getting those metrics out of the ingesters, and also out of the storage if those queries cover a long time range. There's also the store gateway, which downloads the TSDB blocks from S3 and makes them available for querying in Cortex. Part of the query layer, as you can see, is also the results cache and the index cache, which allow for faster querying. And on the left side we have the Alertmanager API, which is what I mentioned around receiving and managing those alerts and rules for each tenant. If you are familiar with Thanos, at this point you might be seeing some resemblance, and that's no coincidence: Cortex and Thanos collaborate a lot in the way they're built. And we also collaborate with Prometheus, because we use basically the same code base as Prometheus for most of the features, and we do contribute back.
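To make the distributor's replication factor concrete, here is a toy sketch of the idea. The real Cortex uses a consistent hash ring with tokens and zone awareness; this only shows the shape of "hash the series, pick three ingesters":

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const replicationFactor = 3

// pickIngesters is a toy stand-in for Cortex's hash ring: hash the series'
// labels, then walk the ingester list from that position, taking
// replicationFactor distinct ingesters to receive the sample.
func pickIngesters(seriesLabels string, ingesters []string) []string {
	h := fnv.New32a()
	h.Write([]byte(seriesLabels))
	start := int(h.Sum32() % uint32(len(ingesters)))

	picked := make([]string, 0, replicationFactor)
	for i := 0; i < replicationFactor && i < len(ingesters); i++ {
		picked = append(picked, ingesters[(start+i)%len(ingesters)])
	}
	return picked
}

func main() {
	ingesters := []string{"ingester-0", "ingester-1", "ingester-2", "ingester-3", "ingester-4", "ingester-5"}
	// The same series always maps to the same three ingesters, so losing
	// a single ingester still leaves two copies of every sample.
	fmt.Println(pickIngesters(`up{job="api"}`, ingesters))
}
```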
The Cortex project creates a lot of new features, and one of those recently created is PromQLsmith, a library that generates random PromQL queries to help you find bugs in query engines. This library was used in the Thanos promql-engine and was able to find more than 10 bugs in that repo. But what are the new features in Cortex this year? Back to you, Ben.

Thanks, Friedrich. So I will talk about the new features we've added to Cortex since last KubeCon. Cortex 1.13 was released last November, and we implemented some important features. We also value the operator experience: our main focus is still reducing operator pain, as well as improving the scalability and reliability of Cortex. Next I will talk about some new features and enhancements in that release and in the upcoming release as well.

The first one I want to talk about is called dynamic tenant shard size, and it mainly works in the query path. Before that, I want to explain how the query path works. We have mainly three components: the query frontend, the query scheduler, and the querier. A query request goes into the query frontend, which does result caching and query splitting, and then puts the request into the query scheduler, which is basically a queue of your queries. The queriers work as workers: they pull query jobs from the scheduler queue and execute those queries against the storage nodes. With shuffle sharding enabled, each tenant is assigned a subset of the queriers. So for example, a tenant has two queriers, and each querier is configured with a limited concurrency; in this case, each querier has four query slots, which means it can execute four queries at the same time on that single querier instance. This works well because it helps protect the query path from OOM kills. In this kind of setup, if a tenant has two queriers, it can run eight queries at the same time, and the more queriers it has, the more queries it can process.

With that background, let's take a look at this feature. Say we have a very small Cortex cluster with four queriers and three tenants configured. For each tenant we have a runtime configuration called max queriers per tenant, which is basically the querier shard size, and each tenant has it set to two, which means each gets two queriers assigned. Because of shuffle sharding, you can see some queriers are shared between different tenants. So this is the initial setup, but let's say we have HPA configured on the Cortex cluster, which is quite common, because sometimes we want to dynamically scale up our replicas based on some conditions: for example, if CPU, memory, or maybe bandwidth usage exceeds some percentage, HPA kicks in and more queriers get scaled up. The problem is that even though we scale up more querier replicas, the runtime configuration for each tenant is still two; it's a static number. With the dynamic tenant shard size configuration, we can set the value to be a percentage of the total number of replicas. In this case, if we configure this number to 0.5, then after HPA kicks in, the tenant shard size automatically increases from two to four, so the tenants can actually utilize the new replicas scaled up by HPA. This feature also works for the store gateway, not only the querier.
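A minimal sketch of that interpretation (the names here are illustrative, not the actual Cortex config fields): a value of 1 or more is a static querier count, a value between 0 and 1 is a fraction of the current replica count.

```go
package main

import (
	"fmt"
	"math"
)

// effectiveShardSize interprets a tenant's shard size setting the way the
// dynamic tenant shard size feature is described in the talk: values >= 1
// are a static number of queriers, values in (0, 1) are a fraction of the
// current number of querier replicas.
func effectiveShardSize(configured float64, replicas int) int {
	if configured >= 1 {
		return int(configured) // static shard size, ignores scaling
	}
	// Dynamic: grows and shrinks with the number of querier replicas.
	return int(math.Ceil(configured * float64(replicas)))
}

func main() {
	// Before HPA: 4 queriers. After HPA: 8 queriers.
	fmt.Println(effectiveShardSize(2, 4), effectiveShardSize(2, 8))     // 2 2 (static)
	fmt.Println(effectiveShardSize(0.5, 4), effectiveShardSize(0.5, 8)) // 2 4 (dynamic)
}
```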
So next, let's talk about another feature we implemented to reduce operator pain. This one is about a Cortex component called the ruler. What it does is evaluate alerting and recording rules against the Cortex storage layer, which is the ingesters and store gateways. Some users might configure rules that fetch a lot of data, and they have no idea whether their rules succeed or not; they don't care, they just leave them there. For those kinds of rules, Cortex already has very good query limits to protect the queriers and the storage components. Such a rule may hit some query limit on the storage, return a 422, and stop being evaluated any further. Or, if it gets lucky, it doesn't hit any limit on the storage but fails at ruler evaluation time. The worst case is that it doesn't hit any limits, the rule just gets evaluated, and it eventually times out. In that case the rule runs continuously but never succeeds. That might not be a big problem on its own, but when you have other queries running at the same time, your storage might get overwhelmed, so it's hard to run other queries, and it might impact availability for your other queries, maybe for all the tenants in the Cortex cluster. So yeah, basically, you're on fire. This feature basically allows you to configure, per tenant, which rule groups to disable, so you can stop such rules from being executed.

The next one is something I really like; it's called query priority. As we discussed, Cortex is designed to be a multi-tenant system. It works well, and its main goal is to reduce blast radius and avoid one tenant being impacted by queries from other tenants. But one interesting problem is that we can still have contention within a single tenant. For example, a single tenant might have different query patterns. Say they have ad hoc queries, which might be long-running queries that are very expensive to evaluate, very slow, and might even hit some limits. And they might also have other queries, like health check or probing queries, which run very fast but are more important, because if such a query fails, it triggers an alarm and an operator gets paged. As I mentioned before, there's a queue in the Cortex scheduler, and it's basically a FIFO queue: first in, first out. So if some user runs ad hoc queries that are long-running, those queries fill the queue, and the health check or probing queries that arrive later end up at the back of the queue. What happens then? If there are too many long-running or ad hoc queries, they can cause the queue to become full, and in that case the health check queries are simply rejected. Another situation is that the queue isn't full, but the important queries are just waiting in the queue to be picked up by a querier. A querier has only limited capacity and concurrency, if we don't dynamically scale queriers up. Because those long-running queries are very expensive and slow, the important queries may never get executed until they time out in the queue. So how do we solve this? The problem is that we have a queue, but the queue is ordered only by enqueue time. So what we can do is keep the same queue, but order it by priority instead.
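As a minimal sketch of that idea, using Go's standard container/heap (illustrative, not the actual Cortex scheduler code): the queue pops the highest-priority request first and falls back to enqueue time, so it stays FIFO within a priority level.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

type request struct {
	query    string
	priority int       // higher value = more important
	enqueued time.Time // FIFO order within the same priority
}

type priorityQueue []request

func (q priorityQueue) Len() int { return len(q) }
func (q priorityQueue) Less(i, j int) bool {
	if q[i].priority != q[j].priority {
		return q[i].priority > q[j].priority // max-heap on priority
	}
	return q[i].enqueued.Before(q[j].enqueued)
}
func (q priorityQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *priorityQueue) Push(x any)   { *q = append(*q, x.(request)) }
func (q *priorityQueue) Pop() any {
	old := *q
	x := old[len(old)-1]
	*q = old[:len(old)-1]
	return x
}

func main() {
	q := &priorityQueue{}
	heap.Init(q)
	heap.Push(q, request{"expensive ad hoc query", 0, time.Now()})
	heap.Push(q, request{"another ad hoc query", 0, time.Now()})
	// Arrives last, but jumps the line because of its higher priority.
	heap.Push(q, request{"health check probe", 100, time.Now()})
	for q.Len() > 0 {
		fmt.Println(heap.Pop(q).(request).query)
	}
}
```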
So we introduced a configuration called query priority, and it works pretty neatly: you can define a default priority, and you can define other priorities you want, and you can use a regex or a time window to match a specific kind of query, maybe within a specific time range, for example only the most recent two hours. You want that query to have higher priority than the default, so it gets picked up earlier by a querier. There's also a new configuration called reserved queriers, which lets you dedicate, say, one querier to serving only that type of query. Even though it might be somewhat wasteful in terms of concurrency slots, it ensures those queries get executed successfully.

The next improvement is the multi-level index cache, used in the store gateway component. This is how it works right now: the store gateway can query a remote cache, maybe memcached, and if there's a cache miss, it goes to the bucket backend, maybe S3 in this case. This pattern works pretty well, but the problem is that all the data you need to fetch has to go over the network. For the index this can be quite problematic, because those postings can be very large in your time series database blocks. So the improvement we made is to add another layer, an in-memory LRU cache in the same process as the store gateway, and the store gateway tries to fetch from the in-memory cache first, then goes to the second layer, memcached. This pattern works pretty well in practice, because the queries that go to store gateways usually have the same kinds of patterns, the same kinds of matchers, so the hit rate of the in-memory cache is usually very high. By itself it can handle maybe 90% of requests, which helps us a lot in terms of bandwidth.

With the multi-level index cache you can do something more advanced, using a new feature added in Thanos called the filtered index cache. There are actually three types of index cache items: postings, series, and expanded postings. Expanded postings are the intersection of the postings for your query, and their size is usually much smaller than the actual postings, so storing expanded postings is usually more efficient in a limited-capacity scenario, especially in memory. Let's say you have only one gigabyte of in-memory cache: you can maybe store 10K expanded postings items, but for raw postings maybe only 100. So it increases the hit ratio as well as reducing evictions. At the second layer we can have memcached, which stores maybe series and expanded postings. And at the third layer we can have another memcached cluster with a different configuration: you can configure it with extstore, or actually I don't remember exactly what it's called, but it basically lets you store items on disk, so even though you trade off some query performance, it allows a larger capacity for cached items, and it probably still performs better than fetching from S3.

While working on the multi-level cache, another improvement we made was to the in-memory LRU cache itself. The cache in Cortex is actually the same as the one in the Thanos library, which uses one single lock to read and write all the items, so we noticed the performance can be very poor in a highly concurrent environment. What we did is quite simple: we use a bucketized, striped cache, so that multiple locks handle different items, and the performance is much better than before.
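A minimal sketch of that striped-lock idea (illustrative only, not the actual Cortex/Thanos code, and without the LRU bookkeeping a real cache needs): shard keys across several independently locked maps, so concurrent reads and writes mostly hit different locks.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shards = 16

type shard struct {
	mu    sync.Mutex
	items map[string][]byte // a real cache would also track LRU order and size
}

// stripedCache spreads keys over multiple independently locked shards,
// so goroutines touching different keys rarely contend on the same lock.
type stripedCache struct {
	shards [shards]*shard
}

func newStripedCache() *stripedCache {
	c := &stripedCache{}
	for i := range c.shards {
		c.shards[i] = &shard{items: make(map[string][]byte)}
	}
	return c
}

func (c *stripedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%shards]
}

func (c *stripedCache) Set(key string, value []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key] = value
}

func (c *stripedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.items[key]
	return v, ok
}

func main() {
	c := newStripedCache()
	c.Set("postings:job=api", []byte("dummy postings"))
	v, ok := c.Get("postings:job=api")
	fmt.Println(string(v), ok)
}
```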
We also have some other amazing PRs; we don't have time to go through all of them. I just want to show them and thank all the contributors who help make Cortex more stable. And we have some new features in progress: the partition compactor, for which we have a PR, and also native histograms and OTLP ingestion, which is what we are going to demo today. So I will hand over to Friedrich.

Yeah, so I put together this demo recently. It's very simple; it just extends the OpenTelemetry example, and you can try it yourself. I'm going to show it really quickly. I'm not going to go over all the internal details, but basically it runs Cortex. Let me make this quick: it runs a Cortex image (this feature isn't merged yet), and you can see the last line is just setting up the OTLP endpoint, which tells the application where to send the metrics. That's all you have to do; there's no remote write in this setup, right? And I'm just going to run it. So when we do this, the application is running and it should start sending samples; it's configured to send a sample every 10 seconds, and we'll see in the logs that it's starting to. I've also printed the samples I'm sending to Cortex, and the logs as well. Now let's move over to the application. This application is basically an endpoint: you hit that endpoint and you get a dice roll, and you can send as many requests as you like. And if we move over to Grafana, which already has a preconfigured data source pointing at that Cortex, it should have the metrics. Let's click here. Yeah, it worked. So these are OpenTelemetry metrics in Cortex. That's all I have; back to you, Ben, for your demo.

Yeah, I'm going to quickly show a native histogram demo. What I have is an application exposing a native histogram metric: there's an application which exposes a histogram instrumented as a native histogram, and I have a Docker Compose setup with Prometheus remote-writing the data to Cortex. So I'm going to quickly run it. Cool. And let's go to Grafana, where we should be able to see it. Oh yeah, actually this one. So we can see the native histogram is ingested correctly, and let's try to run a query. Yeah, I think it works. If you are familiar with native histograms, you'll notice that it can run this histogram_quantile query successfully without the kind of `sum by (le)` aggregation that classic histograms in Prometheus require.
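For reference, instrumenting a native histogram with client_golang looks roughly like this. A minimal sketch: the metric name is made up, and on the scrape side Prometheus needs the native-histograms feature flag (which negotiates protobuf scraping) for this to be ingested as a native histogram.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Setting NativeHistogramBucketFactor enables sparse, high-resolution
// native histogram buckets, so no explicit Buckets list (and no
// per-bucket `le` label) is needed.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:                        "request_duration_seconds",
	Help:                        "Duration of requests.",
	NativeHistogramBucketFactor: 1.1,
})

func main() {
	requestDuration.Observe(0.42)
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this in place, a query like `histogram_quantile(0.9, sum(rate(request_duration_seconds[5m])))` works directly, which is the point made in the demo.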
Yeah, I think that's all for our demos. So let's go back to our last slide. Thank you everyone for joining our session. We have our GitHub handles, and maybe our Twitter handles, here as well, so feel free to talk to us if you have any questions. And we're looking for more help with the Cortex project, so feel free to contact us on the Slack channel or anywhere. Thank you everyone. I think we have two minutes for questions. Any questions?

Can you get the mic?

Hello, thank you for your talk. I have one question about the in-memory LRU that you've implemented: does it mean that now we have to be careful about the memory usage of the application itself? That's my first question. And the second one: I'm really confused, because I don't know what the main difference is between Thanos and Cortex. If we want to implement a solution, since the code is almost the same, which one should fit our needs?

Yeah, thanks for the questions. The first question is whether the application can run into memory usage issues when we have the in-memory cache. It's something you can configure: you can configure the max size of your in-memory index cache, in my case maybe one gigabyte, and you can also configure the max allowed size of an item, maybe 10 megabytes or something. So the memory usage should be under control, based on your settings. And the second question is about the difference between Cortex and Thanos. I think nowadays these two projects are very similar, but at the beginning they came from different use cases. Cortex from the beginning was multi-tenant and remote-write based, while Thanos at the beginning was sidecar based, in order to address the federation issue in Prometheus. But the two projects keep evolving: Thanos has a receive mode, which is basically quite similar to Cortex, and Cortex has long-term storage, which uses Thanos code for the compactor and bucket storage. So nowadays, based on your use case, you can choose either of them. But I would say: if you are small scale, or you don't need multi-tenancy or limit-control features, maybe Thanos works well; but if you run a platform team that wants to provide a service to different sub-teams, and you want limits in place and better multi-tenancy features, choose Cortex. Does that answer the question? Yeah, awesome.

We're over time, but I think we can take one more question.

It's probably a tricky one: what are the limitations of Cortex? For example, what's the maximum amount of data that I can store in Cortex?

Do you want this one? I can take the first part. I haven't found any; at the sizes I run, I haven't hit that point yet. But I know there are limits, because I've started to see some issues in the compactor. We've seen issues with active series there before. Those issues have been tackled to a degree, but a billion active series is probably a lot for a single tenant, I would say. And, to be specific, most of the problems I see in terms of scaling Cortex are around the query part more than the ingest part. The ingest part kind of just works, but when it comes to the query part, it becomes harder to get those samples out. And even if you get the samples out, sometimes the dashboarding tool just fails. So one of the things I've found really important, when you're thinking about high cardinality, is whether it's really useful for you. My personal approach, and also that of the team I work with, is that we always look at whether it's really necessary, more than whether it's possible. That's probably why I haven't reached the point where I needed more. Does that answer the question?
Yes, thank you. Good, that was it. Thank you.