But yeah, with that, why don't you start by just introducing yourself a bit? What do you do?

Sure. So first of all, hello to all the folks who are listening to us, and it's great that you have spared some time for the observability meetup on a Saturday morning. Coming back to your question, Joy: I've been working with Razorpay for quite some time now, over three years, and it's been fun working there. We have been solving most of the observability platform problems, where we work out how to scale metrics, how to scale traces, everything at scale. That's what this talk will revolve around: how we solve all of the metrics problems at Razorpay using VictoriaMetrics. We'll discuss what the go-to solution would be, and what the journey was to actually reach the point where we decided, okay, we have to use VictoriaMetrics for scaling metrics.

Cool. Welcome, Viva. I see Piyush is also here; Piyush is our second speaker and he'll be speaking in the second session, welcome. And Viva, thanks a lot for agreeing to speak today, it should be a great talk. We already have around 10-15 people watching on YouTube, so why don't you start with the whole conversation. Over to you. [screencast]

Thank you, Joy, thanks a lot. I'm just screencasting. So hello folks, once again, welcome. We will be talking about VictoriaMetrics at scale today, in this March edition of the meetup. I've already given a brief intro about myself: I'm working as a lead DevOps at Razorpay, and I try to blog a lot of the things that come across as learnings to me, so you can always go and check out my social media handles.

Going forward, here is the agenda for the approximately 50 minutes to one hour. I'll be discussing Prometheus, which is the well-known monitoring solution for any Kubernetes monitoring; whenever you move towards a microservice model and start thinking about monitoring, you go towards Prometheus. We'll talk about the issues that come up with Prometheus and possible solutions for them. Then we'll talk about the main components of VictoriaMetrics: how it works and what all the pieces are. Then we'll talk about what scale we are actually referring to here, because "scale" can mean very different things. And the next most important question that comes up is how much you actually spend on the infrastructure, so we'll definitely cover that. At the end of the session we'll have some time for questions, and I'll try to answer them as best I can.

Okay, so, metrics solutions. Say you're new to Kubernetes and you Google around for how to monitor your Kubernetes cluster: the first thing you get out there is Prometheus. Prometheus comes up as the first hit for any Kubernetes monitoring solution, and there are reasons why Prometheus is so popular in the community. It is very easy to configure; you don't have to do a lot of juggling around if you're just starting out.
There are Helm charts available for easy installation, Docker setups, everything is out there, so you have a very good variety of options. Then you have a lot of exporters available: if you want to monitor your node health you have node-exporter, and for a Kubernetes cluster you have kube-state-metrics, cAdvisor and whatnot. Almost everything coming out in the industry right now emits metrics in the Prometheus format, so there is a large exporter ecosystem and the community keeps moving towards it. And the last point follows from that: being one of the first tools to solve the monitoring problem, Prometheus gained a lot of traction.

Like everyone else, when we moved towards Kubernetes three, three and a half years back, we also chose Prometheus as our monitoring solution, like everyone else would have. But things change with scale. It's very easy to maintain a 10-node or 20-node cluster, but when you go to a scale of more than 100, 200, 300 nodes, it becomes very difficult to manage, because scale changes a lot of things.

Before I get into the issues, I'll briefly go over the basic architecture of Prometheus. On the left side of this diagram you have all your exporters, your kube-state-metrics, node exporters and whatnot. Then there is the Prometheus server, which scrapes all these metrics endpoints according to whatever targets it has been configured with. What it actually does is essentially a curl call to an HTTP endpoint on the /metrics path (the path and everything is configurable, but /metrics is the standard). It goes to /metrics, grabs the metrics, and keeps them in its own local storage, which is the persistent volume storage you see here. All the metrics are then visualized in Grafana. For the alerting side, Alertmanager has been broken out into a separate component, and it does the job of notifying Slack or any webhooks that you configure.

So that is the basic architecture of Prometheus. Now let's talk about the issues and the problems. I believe whoever has installed Prometheus for their monitoring infrastructure will have seen a graph very similar to this one, where you get metrics up to a point, then there's a break, and then your metrics start again. This is a very common scenario, because Prometheus doesn't support HA by design; Prometheus's own GitHub repo and documentation state that by design it will not go HA. It has something called a federation server, which can link multiple small Prometheus servers, but that's still not HA: if your one Prometheus server is down, it's down, and that is why you see those gaps in your metrics if you haven't set things up properly.
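To make the scraping model described above concrete, here is a minimal prometheus.yml sketch. The job names and the node-exporter target are illustrative assumptions, not the actual Razorpay configuration:

```yaml
# prometheus.yml - minimal sketch of the pull model described above
global:
  scrape_interval: 30s        # how often Prometheus pulls each /metrics endpoint

scrape_configs:
  # A static target: a node-exporter exposing host metrics on :9100
  - job_name: node
    static_configs:
      - targets: ["node-exporter.example.internal:9100"]  # hypothetical host

  # Kubernetes pods are usually discovered dynamically rather than listed by hand
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```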
Then there's the problem of long-term storage. With a growing number of metrics, your volume size keeps increasing, and as the volume grows, the WAL grows too, so whenever Prometheus restarts it takes a long time to come back up. You'll find many issues in the Prometheus community about Prometheus taking a very long time to start because of WAL replay.

And then you have a single server doing both scraping and visualization. What I mean by this is that you have only one process doing both jobs: it is scraping all these targets, and it is also serving the queries that come in via Grafana.

Okay, let's continue, because then some of these things happen. These are actual issues reported on the Prometheus open source repository: you get out-of-memory errors, you get high CPU, targets cannot be scraped, and a lot more. The reason is an increasing number of nodes or an increasing number of targets, all landing on the same TSDB storage underneath. It's local storage, and it keeps growing, so whenever you run queries on top of it, it tends to use a lot of CPU and a lot of memory; it is both memory- and CPU-intensive, since it fills memory first and then the CPU ends up blocked waiting on memory.

A similar thing happened to us at Razorpay. We had a lot of downtime on the monitoring infrastructure just because of these things, because Prometheus was not stable enough. We threw hardware at the problem, but that was not the ultimate solution, so we started looking at alternatives to Prometheus. I'll briefly touch on the other solutions, Thanos and Cortex, before we dig deeper.

Thanos was the first alternative we tried. Thanos solves long-term retention by using a remote backend via S3 or something of that sort. The problem we hit, at least in our evaluation, is with how the data lands in S3: without directory-style prefix structures, once you have tens of thousands of small objects in a bucket, the S3 API calls become very inefficient. That's where Thanos became a bottleneck for us, because it had no way of imposing that structure on S3.
So the next candidate was Cortex. Cortex on paper looks very good; it has separate components for each of the different problem areas. But the problem, at least for us, was that it became too much: too many moving components, too many knobs to turn. It became very difficult to manage. And then we came to VictoriaMetrics and started evaluating it.

I'll explain the transformation with this picture. Your entire monitoring infrastructure with Prometheus is this one big shark; it's a monolithic kind of architecture. You are doing everything in one single process: you are scraping from the same server, you are computing your alerting rules on the same server, and you are serving query results from the same server. What was required was to divide it into a school of small fishes, each of which solves its own problem, so each one takes care of one thing. That's what VictoriaMetrics did very well: it divided things into three different parts. One is storage and querying, where you store data and query data; second comes scraping; and third comes alerting. Within storage and querying, since that's a very big part, it divided things again into three different components: vmselect, vminsert and vmstorage. I'll come to the architecture showing where vmselect, vminsert and vmstorage sit. Pictorially, it took the one big shark and split it into small components that each take care of their own respective thing. Now even if one component goes down, the other functionality keeps working; it's no longer the case that when one function goes down, everything else goes down with it. That's the basic underlying idea of microservices.

And here enters the queen, which is VictoriaMetrics, and this is its standard architecture. Looking at the storage and querying side: you have vminsert, you have vmstorage and you have vmselect, and then you can keep your regular Prometheus, or multiple agents, for the scraping part. Prometheus, InfluxDB, Graphite, OpenTSDB, anything of that sort becomes your agent, and this becomes your server side. If you want to insert data into storage, you go through vminsert: it is the entry point for your agents to write data into your storage. vmstorage is the central layer where all your data resides: it takes data from vminsert and writes it to disk, with buffering in between, so vminsert doesn't have to wait on the disk writes. That's how the storage path works. Then vmselect is the component that helps you query that data.
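A minimal sketch of how the three cluster components can be wired together, shown here as a docker-compose file for brevity; the image tags, volume paths and retention value are assumptions, and a production setup would typically run these as Kubernetes StatefulSets and Deployments instead:

```yaml
# docker-compose.yml - sketch of the vminsert -> vmstorage <- vmselect wiring
services:
  vmstorage:
    image: victoriametrics/vmstorage        # the only stateful component
    command:
      - -retentionPeriod=3                  # keep data for 3 months (illustrative)
      - -storageDataPath=/storage
    volumes:
      - vmdata:/storage                     # persistent volume for the data

  vminsert:
    image: victoriametrics/vminsert         # write path: agents -> vminsert -> vmstorage
    command:
      - -storageNode=vmstorage:8400         # vminsert talks to vmstorage on 8400
    ports:
      - "8480:8480"                         # remote-write endpoint for agents

  vmselect:
    image: victoriametrics/vmselect         # read path: Grafana -> vmselect -> vmstorage
    command:
      - -storageNode=vmstorage:8401         # vmselect talks to vmstorage on 8401
    ports:
      - "8481:8481"                         # query endpoint for Grafana

volumes:
  vmdata: {}
```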
Whenever you run a PromQL query from Grafana, vmselect interacts with vmstorage and serves back the data. This is how the reads and writes get segregated.

Now let's talk about the statefulness of the components. The only stateful component in the entire architecture is the vmstorage pods, where your actual data resides. You can have PVs attached to your vmstorage pods, and the data keeps being written there. And the best part: even this stateful component is not a single point of failure. It's a distributed system; you can configure replication so that your data is not only on a single disk or a single node, and even if a node goes down you still have other nodes to serve the data. That's the server side.

I'll summarize the points again. First, deploying the whole thing is very simple; you don't have to fiddle around with configurations, and that's one of the best selling points of VictoriaMetrics: you take the VictoriaMetrics cluster setup, you deploy it, you don't have to tweak many configurations, and everything goes smoothly. Then you have a clear separation between writes and reads, as I've already mentioned: the writes are taken care of via vminsert, which writes to vmstorage, and the reads coming from Grafana go through vmselect. Think of a scenario where all your vminserts are down, or you scale their replicas to zero: you will still be able to query the older data. Why? Because your vmselect and vmstorage are up. So losing ingestion doesn't deny you the capability of querying older data, and by the time your vminserts come back up, the buffered data starts flowing into vmstorage again; you don't lose data. The only catch is that timestamps start differing when your vminsert goes down suddenly.

How does that work? Any remote-write client keeps things in a buffer. With Prometheus specifically, and at this stage we are still keeping Prometheus as the scraper, Prometheus scrapes the targets and uses its own remote write API to write to vminsert. Remote write works on a write-ahead log, a WAL: it keeps track of how much data has already been shipped and buffers the rest. So whatever your WAL retention is becomes the window of time for which you can afford to have vminsert down. That's not infinitely tunable, though, and obviously not something you want to lean on.

Again, all your vminserts and your vmselects are stateless; they can scale independently, virtually infinitely, and whatnot. You can run them with the default configuration; no extra tuning is required for storage or ingestion. And the other big selling point of VictoriaMetrics on the storage side is that it does much higher compression. What that effectively means is that if you are using one terabyte of disk space with regular Prometheus, VictoriaMetrics can store the same data in around 300 GB, and that's the best part of it: the compression it does.
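For the write path just described, pointing an existing Prometheus at vminsert is a one-stanza change. This is a sketch: the URL follows VictoriaMetrics' documented cluster path format with tenant 0, while the service hostname and the queue tuning value are assumptions:

```yaml
# Added to an existing prometheus.yml: ship scraped samples to vminsert.
# Prometheus keeps a WAL for this queue, so short vminsert outages are buffered.
remote_write:
  - url: http://vminsert.monitoring.svc:8480/insert/0/prometheus/  # hypothetical service name
    queue_config:
      max_shards: 30          # illustrative tuning knob, not a recommendation
```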
Some of the highlights. You don't have to change anything on the side where you expose metrics: your applications don't have to change, your exporters don't have to change, everything from your Prometheus setup is supported. On the Grafana side you also don't have to add a new kind of data source; you just change the endpoint of your Prometheus data source and point it at the vmselect pods, or the vmselect load balancer, and you're done. And beyond Prometheus it supports many other agents and protocols, Graphite, OpenTSDB, InfluxDB, and you can even insert data with a plain JSON-over-HTTP call.

As I've already mentioned, vmstorage is distributed with a configurable replication factor. You can have five nodes with a replication factor of three, so the same data sits on three different nodes. Now the question arises: if my data is on three different nodes, how does it deduplicate? It has to dedup, and dedup is built in, via a setting called dedup.minScrapeInterval. This deduplication matters at two layers. One is at the querying layer, which, as I said, is needed because of the replication factor. The other case comes from Prometheus HA: Prometheus doesn't support HA by default, so what you do is run two different Prometheus pods scraping the same data. If you're scraping the same data twice and pushing it to the same VictoriaMetrics, then at the vminsert layer there's a problem, because now it has two near-identical metrics at near-identical timestamps, and that's not acceptable. That is why you have the dedup.minScrapeInterval feature: it compares the timestamps, sees that there are two samples which are similar to each other, and if the timestamps are closer together than the configured interval, it drops one of them and stores only one. That's how the deduplication works.

It also supports multi-tenancy. Organizations have multiple teams, multiple business units, and whatnot, and if you want segregation between business units, teams, or project IDs, you can do it via the multi-tenancy support, using namespaces as tenants.
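A sketch of the two features just mentioned. The dedup flag name is the one VictoriaMetrics documents; the Helm values layout and the 30s interval are assumptions:

```yaml
# Fragment of Helm values for the cluster chart (key names assumed from the
# victoria-metrics-cluster chart): set the same interval on vmselect and vmstorage.
vmselect:
  extraArgs:
    dedup.minScrapeInterval: 30s   # collapse duplicate points closer than 30s apart
vmstorage:
  extraArgs:
    dedup.minScrapeInterval: 30s

# Multi-tenancy needs no extra config: the tenant (accountID) lives in the URL.
#   writes for tenant 42:  http://vminsert:8480/insert/42/prometheus/
#   reads  for tenant 42:  http://vmselect:8481/select/42/prometheus/
```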
So far we have taken care of the storage side and the querying side. Our storage is sorted and our querying is sorted: you query with one component and store with another. But we haven't replaced the agent; our agent for scraping is still Prometheus itself. The thing is, when Prometheus scrapes a lot of targets, and by a lot of targets I mean, say, a microservice that runs around 50-60 pods, with Prometheus monitoring hundreds of these kinds of microservices each running that many pods, then the number of individual metrics keeps increasing, and Prometheus starts taking a lot of memory and CPU just for scraping. It becomes very difficult to manage past a certain point. That is where you can replace Prometheus with vmagent.

vmagent is a direct replacement for Prometheus: you don't have to change any of your configurations, you don't have to change any of your exporters, you don't have to change anything. You just replace Prometheus with vmagent and it works smoothly. The reasons we had to replace Prometheus are, as we've already seen in the issues earlier, that Prometheus is resource-heavy: multiple issues have been reported of Prometheus going out of memory and crashing. One scenario we faced a lot as the number of metrics increased was WAL corruption. And if you go to the Prometheus documentation, it itself says that when you use the remote write or remote read feature, Prometheus starts taking double or triple the memory it would take without remote write. Since using VictoriaMetrics as the server relies on exactly that remote write functionality, this meant a lot of WAL corruptions, even more memory utilization, and the cost associated with it. There were also delays in pushing metrics in some situations where you get network chokes and things of that sort.

vmagent, on the other hand, is very lightweight. I'll give you a scenario: when we were using Prometheus just for scraping, we had to run eight or nine different nodes carrying Prometheus instances, because there were multiple business units and everything. When we replaced it with vmagent, only two or three nodes were more than enough, and even then we had to reduce the instance types because the resources were not being used. That is the scale of resource consumption difference we're talking about: 3x or 4x. It's Prometheus-compatible, so it directly integrates with all of your exporters. It's stateless: as we've already seen, VictoriaMetrics stores data at the vmstorage layer, so there's no storage to manage on your vmagent, and you can run any number of replicas for a single cluster. With deduplication on the insert path, none of your data is duplicated, and you get rid of those gaps in the graphs that I showed earlier.

Then comes vmalert. As I mentioned earlier, we divide things into three parts: storage and querying, scraping, and alerting. We've already discussed what storage looks like, what querying looks like, and how we scrape via vmagent, so the last part is alerting, which until now was still handled at the Prometheus layer. This is replaced by a component called vmalert. What it does is compute all your alerting rules; based on those rules, when something fires it sends a signal to Alertmanager, which in turn sends notifications to your Slack channels. Within the alerting rules you can configure different types of integrations: you can have a runbook URL, your Grafana URL, and whatnot; everything can be configured. So vmalert evaluates the alerts against VictoriaMetrics and sends the firing alerts on to your Alertmanager.
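A sketch of the two stateless replacements just described, again as compose services for brevity. The flag names are the ones VictoriaMetrics documents; the file paths and service URLs are assumptions:

```yaml
services:
  # vmagent replaces Prometheus for scraping: same scrape config, far less memory.
  vmagent:
    image: victoriametrics/vmagent
    command:
      - -promscrape.config=/etc/prometheus/prometheus.yml  # reuse the existing config as-is
      - -remoteWrite.url=http://vminsert:8480/insert/0/prometheus/
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # vmalert replaces the Prometheus rule evaluator: it queries vmselect and
  # forwards firing alerts to the unchanged Alertmanager.
  vmalert:
    image: victoriametrics/vmalert
    command:
      - -rule=/etc/alerts/*.yaml                           # standard Prometheus rule files
      - -datasource.url=http://vmselect:8481/select/0/prometheus
      - -notifier.url=http://alertmanager:9093
    volumes:
      - ./alerts:/etc/alerts
```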
Cool, let's talk about scale. What scale are we referring to? What we are referring to is 2.793 trillion data points at any given point in time. That is the pool of data points that any single query may have to look through to give you back exact results. Our data points are growing at a rate of 958 thousand per second; every second, that's how many new data points are added to the server. And each data point ends up occupying only about a third of a byte on disk. The best part is that even with this much data, only 782 gigabytes of storage are used. Note that this storage is across five different vmstorage pods with a replication factor of three, which means this figure is actually three copies of the compressed data. That is the level of compression VictoriaMetrics supports, and it saves a lot of disk space. Whatever storage type you talk about, EC2 SSDs or anything of that sort, they are still disks, they are not RAM, which means they are slow: the more data sitting behind your file descriptors, the slower things get, and we're making network calls on top of that. So keeping the data small matters.

Now, this is something that will fascinate a lot of engineers: every company wants to save costs, every penny is hard-earned, and there have been costing meetings in almost every organization. So how much do we spend on this entire infrastructure? The best part about vmselect, vminsert and vmalert is that everything is stateless, and that gives us the flexibility to run them on spot nodes. We run all these components on one hundred percent spot nodes, with a variety of spot instance types, and with HPA enabled. Our minimum configuration is just two pods running at off-peak hours, and it scales on load: more people query during the daytime, more people open dashboards then, so the HPA scales up your vmselect pods, and since everything runs on spot it hardly makes a price difference. The only component that runs on-demand is vmstorage, because it's stateful, although even that can still run on some percentage of spot, depending on what your spot architecture looks like. Assuming we have five vmstorage nodes, all on on-demand instances, we spend around 13-odd dollars a day, plus three spot nodes for all the other components. The spot figure also depends on how much headroom you want to leave for other pods to come in.
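A minimal sketch of the autoscaling side of this, assuming vmselect runs as a Deployment named vmselect in a monitoring namespace; the ceiling and the CPU threshold are illustrative, not the values from the talk:

```yaml
# HPA: scale vmselect pods with query load (e.g. daytime dashboard traffic),
# back down to two replicas during off-peak hours.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmselect
  namespace: monitoring           # hypothetical namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmselect
  minReplicas: 2                  # the "two pods at off-peak" baseline from the talk
  maxReplicas: 10                 # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # illustrative threshold
```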
With headroom left for pods to stay up, so that HPA-style scaling doesn't have an EC2 instance going away underneath it, those spot nodes still cost hardly three dollars. Then suppose you have HPA on your vmagent pods for the scraping side: the requirement there is around 500 MB and 10 millicores. That's all scraping needs, and even though I couldn't find a perfectly fitting instance type for that profile, since we don't use t2, it has accounted for two dollars at most. So the total spend for supporting this many targets and this many data points, trillions of data points, is less than 18 dollars a day, which is very nice, and you don't compromise on availability.

Whenever you talk about cost, everyone says availability will take a hit if you optimize too aggressively. That's not the case here. What we have done is make each and every component redundant: we have replication across the data storage nodes, you have multiple vminserts, multiple vmselects, and HPA on the vmagent pods. With all of that, you are still spending a fraction of what you would have been spending for the same amount of data and the same amount of querying on a vanilla Prometheus architecture, or with Thanos, or with Cortex, which costs a lot because of the complexity of the architecture you end up building.

I've attached some references. There's a very good blog on Medium comparing a vanilla Prometheus setup against VictoriaMetrics, with benchmarks, and it looks at Thanos and Cortex under the hood as well, so do check out that blog. It shows, with proper measurements of storage, memory and CPU consumption, roughly a 7x difference, and VictoriaMetrics makes the matching claim on its own website and documentation that via this benchmarking it comes out around seven times more efficient than vanilla Prometheus. Whenever we read such statements, we feel like they cannot be true, and when you actually implement it, your own results will differ based on the hardware you are using, the architecture you are using, and the metrics you're exposing. But from my personal experience as well, vmagent, and VictoriaMetrics as a whole across all its components, has served us very well, and it has actually proved to be one of the lightest stacks available for a Prometheus-style monitoring infrastructure. Yeah, I'm pretty much done with my part; I'll leave the next five to ten minutes for any questions you have.

Thank you, Viva, for giving such a wonderful, very concise talk on VictoriaMetrics and its benefits and comparisons over other toolkits, and thank you for coming here and talking about it. So we have some audience members, both on the YouTube live stream and here on the Zoom call. If anyone would like to pose a question, or maybe a couple of questions, you can unmute yourself and ask Viva something relevant to the talk he just presented. I see some folks here, I see Srikant, Raki, Avinash; if any of you have questions
about the current talk, you can unmute yourself and ask Viva directly; I'm sure he'll try to answer most of your queries.

If nobody else will, I have a question I wanted to ask you. How much time does it usually take to set up the complete set of components, getting things up and running for monitoring, and which components take the most time? Because a concern people have, apart from running Prometheus itself, is that it takes them a month or a month and a half to set things up. So I'd like to know how much time you took, which components were the most time-consuming, and whether there's anything that could be improved to cut that time down.

Sure, thanks for the question. There are a lot of things to consider when you set up a monitoring infrastructure, and that's where the month-or-so estimates come from; there are a lot of considerations folks should really weigh while making this decision. A vanilla setup of VictoriaMetrics, with the agents and everything, takes a couple of hours, not more than that, to actually stand up the entire architecture. What actually takes time is understanding the pieces: when you do the Helm install using the VictoriaMetrics Helm chart, it installs all of the components together, so it takes a good amount of time to understand, okay, what is your organization (tenant) ID, and what is the endpoint that vminsert should receive remote writes on, or that Grafana should read from. That is the first thing you'll want to look up in the documentation; the documentation is pretty good, it's decent enough to have all the details. It will not take more than a couple of hours, or at maximum a day, to set up the entire architecture we've discussed.

Okay, so what about scaling things up, reaching a scale that a normal organization would run at? Is capacity planning one of the time-consuming things, or exporting metrics, or any other stuff that consumes time? I would not say a couple of hours is the total time it takes an organization to go from zero to having a monitoring setup, right? So what are the components there, and what are the time-consuming parts?

Sure. For capacity planning, VictoriaMetrics has some very good guidance: it tells you, in terms of how much data you'll be writing, what you need. But for any startup organization it becomes very difficult to know in advance how many metrics you'll be emitting. It took us a considerable amount of time as well to say, okay, we are emitting this many metrics and this is the granularity at which we are emitting them. There's no straightforward formula for that; it's about tracking things as you go and scaling as you grow. You're not going to throw petabytes of data at it in a single shot; you'll grow incrementally. And within VictoriaMetrics, the only component that takes some care to grow horizontally is the vmstorage nodes, which have some hassles: if you're adding another vmstorage node, you have to add that node to your vminserts and your vmselects as well. That is one mistake we made during our setup. There was a point where we were actually hitting scale and having resource crunches, with storage nodes not responding properly, so we added more storage nodes, but we didn't add those nodes' addresses on the vmselect and vminsert side. Only after some debugging did we learn that this is something you have to keep note of: vminsert and vmselect don't do service discovery, they don't use a service model to identify all your storage nodes. You have to explicitly provide them with per-node DNS resolution, via a headless service.
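A sketch of the headless-service wiring described in that answer, assuming a vmstorage StatefulSet in a monitoring namespace; with clusterIP: None, each pod gets a stable DNS name that can then be listed explicitly in the vminsert and vmselect flags:

```yaml
# Headless service: gives each vmstorage pod a stable per-pod DNS record.
apiVersion: v1
kind: Service
metadata:
  name: vmstorage
  namespace: monitoring            # hypothetical namespace
spec:
  clusterIP: None                  # headless: DNS resolves to individual pods
  selector:
    app: vmstorage
  ports:
    - name: vminsert
      port: 8400
    - name: vmselect
      port: 8401

# Each new storage node must then be added explicitly on both sides, e.g.:
#   vminsert: -storageNode=vmstorage-0.vmstorage.monitoring.svc:8400
#             -storageNode=vmstorage-1.vmstorage.monitoring.svc:8400
#   vmselect: -storageNode=vmstorage-0.vmstorage.monitoring.svc:8401
#             -storageNode=vmstorage-1.vmstorage.monitoring.svc:8401
```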
Cool, cool, that was interesting. So, sort of a corollary to the previous question: what happens on bursty workloads? What happens if tomorrow you release a new service and this particular service suddenly starts seeing a very, very large amount of traffic? How does vmstorage scale when you suddenly go from 3.3 billion events to double that, maybe over the span of a day? Would the storage handle that sort of bursty load, or would it start dropping samples, so that suddenly you have a lot of blank canvas to deal with? What happens in that scenario?

Right. So what happens in that kind of scenario is that you have a component above your storage nodes on both the querying layer and the inserting layer, and what we're really asking about here is the insert side of things. The vminsert pods keep a buffer before sending data on to the storage nodes. The drawback in this particular scenario is that whenever you're querying data from the vmstorage nodes via vmselect, you'll see lag in the data. Suppose the load doubles and your vmstorage is not able to handle it, or is writing very slowly: what will happen is that you'll start missing your last five minutes of data, and then a little later you'll see all of it. It takes some time to backfill, but none of the components go down, because vminsert takes care of keeping things in a buffer; it acts as a queuing system. You can think of it as a Kafka queue, or any queue, where you keep pushing messages and it pops them off as and when the consumer is healthy enough to take them. That definitely comes with the drawback that if the load is too extreme, vmselect will not be able to show you the most recent data you're expecting; that's the only drawback.

Cool, yeah, I think that answers most of what I was trying to ask. I'd still say that in real life there can be situations where, you know, you can always capacity-plan a new service to a certain extent, but real-world events always outgrow most of our best-laid plans; real-world scale is always so different from what we plan for. But still, at least knowing the way this workload has
been laid out, we can have some plan that answers: if I were doing this, how would I want to handle it? Right.