Hey, everyone. Welcome, and thank you for joining us on a Saturday morning and making space for the meetup in your calendar. Glad to have you all here. We skipped the meetup last month due to some scheduling issues, but we are back this month, in August, and glad to have you all here. Can we have a quick round of intros? Is there anyone here on the Zoom call who is joining the meetup for the first time, just so we can say hi to them? Okay, I'll take that as a no. So, to start off the Bangalore Observability Meetup, August edition, we have somebody from Grafana with us today. He is one of the core maintainers of Loki, the logging platform created by Grafana, and today will be one of the first sessions he does for us. The initial proposal I had for him was, hey, we need to do a Loki deep dive. But very correctly he proposed: why don't we start with a smaller intro session first, and then slowly grow the community into deeper and deeper dives. I'm pretty sure we will have multiple sessions with him in the other editions of the meetup this year. And yeah, glad to have you here, Sandeep. You can unmute yourself, give us a quick intro, and then start off with your talk.

Hey everyone, thanks for joining. I am Sandeep, Sandeep Sukhani. I live in Pune and have been working at Grafana for the last two and a half years. I'm a Loki maintainer and have been contributing to it since I joined Grafana. So, without further ado, let's start with the talk. We'll do an intro session today and follow up later with a deeper dive: the intro will be more focused on what Loki is and how it solves the logging problem differently compared to other logging solutions, and the deep dive will be more about the internals, best practices, and so on. Loki is inspired by Prometheus, and we'll see why we say that in a bit. I already introduced myself; you can reach out to me on Twitter or email for any help regarding Loki, or any other help in general. The agenda of the talk: first we'll start with understanding the Loki model and how it works, then we'll look at some of the features and how you troubleshoot an outage with Loki, then we'll do a demo, ingesting logs and querying them back out, and then we'll see how fast Loki really is and how you install and run it.

Before jumping in, let's look at a bit of history. Loki was started by Tom and David in 2018 at Grafana. Then, after some stabilization and feature work, we did the 1.0 release in 2019. Since then we have not slowed down: the team has grown, the contributors have grown, and we have added a lot of features. We did 2.0 in 2020, and just this month we released version 2.3. As I said, the community has grown a lot and we have seen a lot of love from community members. We have more than 350 contributors; yesterday I checked and it was around 398, so we are close to touching 400 contributors, and we average about 100 PRs between releases. So a lot of things get added between releases, and we have over 13,000 GitHub stars. For a project that started as recently as 2018, 13,000 GitHub stars is a very impressive number.
So let's start with the Loki model. Loki takes a different approach compared to other logging solutions. Most logging solutions index all of the logs, so the index contains everything your application is logging, and because of that the index grows to nearly the size of the logs themselves, sometimes even bigger. That causes a lot of problems: imagine ingesting 10 TB of logs and your index growing as big as a couple of TB. With Loki, we index only the labels attached to the streams, so the index is much, much smaller compared to the logs you are ingesting. As an example, with 10 TB of logs you might have just 200 MB of index. That's the Loki secret sauce. This is a comparison of an example log line indexed with Loki versus indexed with Elasticsearch. Elastic would index the whole log line, all the keywords from your logs. Instead of that, Loki just stores the timestamp and some metadata, some labels from your logs, so that it can find the right logs when you need them.

Because we say it's inspired by Prometheus, let's look at an example from Prometheus itself. A sample in Prometheus has a timestamp, a metric name, some labels attached to it, and the value reported by your application. Here the timestamp is a Unix timestamp with millisecond precision, there's a metric name like nginx CPU usage, and the labels attached come from your application: which application is reporting this metric, plus some additional details like its IP, or maybe the name of the host where it's running if you have multiple replicas, and then the final value reported by your app. Similarly, if you are ingesting logs from the same application, the entry looks similar. Here, instead of millisecond precision, we have nanosecond precision for the log timestamp, and the labels say which app the logs are coming from and which instance, plus the actual content of the log. As you can see, the content of the log is not indexed; the index just contains the metadata and the timestamp. All the logs coming from the same set of labels are what we call a stream, so all the logs from the same application form a single stream. Here you can see there are two streams: the first one is app nginx with instance 1.1.1.1, and the next one is app nginx with instance 2.2.2.2. Logs from the same stream are stored together so that queries run faster. We are not going to cover much about how Loki stores the data in this session, so let's move ahead.

Can I ask a question, or should I wait for the end? Yeah, it's okay to ask. So, what happens when you search for the bits which are unindexed? Does Loki pick them up, and how does it go through them? Yeah, we'll look at the Loki query model on the next slide. Cool, thanks. Hey, thanks for the question, but we have a dedicated Q&A session after the talk, so let's wait till then and put our questions in the chat, and I'll make sure they get answered one by one. Go ahead. Yeah, thank you.
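For reference, the two nginx streams from the earlier slide, written out as Loki label sets (the same notation used in LogQL stream selectors), would look roughly like this:

    {app="nginx", instance="1.1.1.1"}
    {app="nginx", instance="2.2.2.2"}

Only these label sets and the timestamps go into the index; the log lines themselves are stored separately as compressed chunks.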
So, this is the power of Loki: it reduces the index to a minimal amount. And to be clear, this is just a model, a proportion we propose between your logs and the index, not a guarantee. If you have 1 TB of logs, you add label selectors and reduce the time frame so that the query only has to touch a fraction of that, and with brute force Loki can then go through logs at 20 GB per second and more. We'll see in a bit how fast Loki actually is.

So, let's look at how it works. This is the architecture of Loki, and you can see there's a read path and a write path. Let's go with the write path first. There are multiple logging clients, and there's also an API which you can use for pushing logs to Loki. When the logs come in, they first hit the distributor, which is the component that does the replication of the logs. There's a consistent hash ring shared between the distributors and ingesters, using which the distributor decides which ingester a stream belongs to, so that all the logs belonging to the same stream go to the same ingester instead of being spread across all the ingesters. After the logs get to an ingester, the ingester keeps them in memory for a configured duration, typically one hour, and after that it flushes the logs into something called chunks and adds index entries for those chunks in the index store. So the long-term storage is divided into a chunk store and an index store. We support a couple of different types of object stores, like S3, GCS, MinIO, and Azure, and there are different index stores supported, like Bigtable, Cassandra, DynamoDB, and BoltDB for running it locally.

On the read path, there's the query frontend, which is responsible for splitting your queries, doing some sharding work, caching query results in the results cache, and forwarding the split queries to the queriers. The querier component is responsible for running the actual queries against the long-term storage and for querying the logs the ingesters still have in memory. The querier first goes to the index store, finds which chunks would have the required logs based on your query, then downloads those chunks, pulls the logs out of them, and sends the results back to the query frontend, which in turn sends them back to whatever component is querying the logs. Here you can see we support different types of caches. The first is the results cache, used by the query frontend for caching the results of a query; when you run the same query again, or a query that partially overlaps a previous one, it just fetches the cached results and only sends the remaining duration to the querier. Similarly, there's an index cache used by the querier for caching the index from the index store, and a chunks cache used by the querier for caching chunks from the long-term storage. You can also see there's a ruler component, which embeds a querier inside it. The ruler is responsible for recording rules and alerting in Loki: you can configure the ruler to build and record metrics out of your logs, and maybe alert on them, based on LogQL queries that run in the ruler component.
So, there's a lot more to this architecture, but because this is an intro session I won't cover much of it; it would be too much for an intro. In the deep dive session we'll go deeper into how each component works, including some of the components not shown here, like Consul or the distributed hash ring. So, let's move ahead. For now, consider Loki a black box; you can even run it as a single binary, and we'll see how you can run Loki in different modes.

To talk about how you get logs into Loki: there's a client called Promtail, built by us, which works a lot like Prometheus. It does the service discovery of your apps that are doing the logging, and you can run it with Kubernetes or as a single binary; we'll see how to run Promtail in a bit. Promtail scrapes the logs from applications and sends them to Loki. There are different ways to get logs out of Loki: you can use Grafana, you can hit the REST API, or there's a LogCLI client, also built by us, which you can use to query the logs from the command line. As shown earlier, there's a ruler component which runs inside Loki; it can talk to Prometheus for sending metrics built from logs, or alert on conditions derived from the logs. For example, if you have a threshold on errors and the number of errors in the last 15 minutes exceeds it, you can alert someone that the error rate is growing. We'll see that in a bit.

So, as I said, Promtail is the agent built by us. It discovers the logs, processes them, and attaches labels to them, the metadata labels we talked about that identify the streams. It also has capabilities to transform the logs: you can rewrite some log lines, drop some of their contents, or drop whole log lines. After processing the logs, it ships them to Loki. On the service discovery side, we support two modes. There's a static mode where you run it as a binary and give it a directory where your logs are stored, and it starts reading and pushing the logs from there. And there's another mode for running it with Kubernetes, where it does dynamic service discovery, the same as Prometheus, and attaches labels to the logs using the Kubernetes API. The static mode is pretty simple, and we'll cover it in the demo. For Kubernetes service discovery we use the same code as Prometheus, so if you give Promtail the same relabeling and service discovery config you use with Prometheus, you get the same labels on your logs as on the metrics from your applications. Here you can see Promtail runs as a DaemonSet on each node; it talks to the Kubernetes API, finds out which pods are running on that node, and based on your configuration it pulls the logs from the containers, attaches labels based on your relabeling rules, and then pushes the logs to Loki. In this example there are two nginx instances running on two different nodes, and Promtail attached the label app nginx and the IP of the node.
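As a concrete sketch of the static mode (not the exact demo config; the job name, labels, and path are illustrative), a minimal Promtail scrape config for reading a directory of log files looks roughly like this:

    scrape_configs:
      - job_name: fileserver
        static_configs:
          - targets:
              - localhost
            labels:
              job: fileserver                        # becomes a label on every stream
              __path__: /var/log/fileserver/*.log    # files for Promtail to tail

Promtail tails every file matching __path__, attaches the labels, and pushes the lines to Loki.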
So we support various clients. Promtail is the one we recommend when you're running on Kubernetes or scraping static logs from your VMs. There's also the Grafana Agent, which embeds Prometheus and Promtail, so you can run the Grafana Agent to collect all the metrics, logs, and traces from your Kubernetes workloads. For other environments there are Logstash, Lambda, Fluent Bit, and Fluentd plugins, and a Docker driver, for the various different places you might want to scrape logs from.

So let's look at some of the features of Loki. The most important part is LogQL itself, which you use for querying the logs out of Loki. The LogQL language is inspired by PromQL because they share a similar data model. The most basic query just gets logs out of Loki. You can see there's no metric name, which is the difference between how you write a PromQL query versus a LogQL query; without a metric name you just provide the labels for your streams. Here you are selecting app equal to foo and namespace equal to dev. We also support regex matchers, similar to PromQL; here the query selects all the logs from any stream whose label value matches a regex, anything followed by bar. We also support filter expressions: using a filter expression you can filter logs based on their contents. This answers the question asked a little while ago: you select a stream and then you filter the logs based on the content. So it would show all the logs which contain "error", or you can chain the filters, for example querying for logs which contain "error" and also "timeout", or all the logs which contain "error" but without the keyword "timeout". We also support metric queries, so from logs you can build metrics. There are many aggregation functions supported in LogQL; these are just some examples. The first one is a rate query, which tells you the rate at which your application is logging errors. The second one is count over time, which counts the number of times your application has logged errors. And the third is a quantile query, which you... Hi Sandeep, you're talking on mute; somehow you got muted, probably by accident. Yeah, thank you so much. Was I muted too long? We lost you for a couple of seconds, maybe ten.

Okay, so I'll start again with the metric queries. As you can see, we support a lot of operators, similar to PromQL; these are just examples. The first one is the rate query, which shows the rate at which your application is logging errors per second. The next one is count over time, which counts the number of errors your application has logged in the last five minutes; that's the window over which it samples the logs. And the next one is quantile over time, which you can use to build latencies from just your logs, something you previously needed metrics for. So if you don't have a metric capturing the latency, you can use just your logs to see the latency; we'll see an example of it in a bit.
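To put the queries just described in concrete syntax (the label names here are illustrative):

A basic stream selector:

    {app="foo", namespace="dev"}

A regex label matcher:

    {name=~".+bar"}

Filter expressions on the log content:

    {app="foo", namespace="dev"} |= "error" != "timeout"

The per-second rate of error lines over five minutes:

    rate({app="foo", namespace="dev"} |= "error" [5m])

The number of error lines in the last five minutes:

    count_over_time({app="foo", namespace="dev"} |= "error" [5m])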
There's another exciting feature called custom retention and deletion, which we just released with 2.3 and have been running in production for about a month, I guess. Custom retention lets you retain logs for a specific duration, and you can even set a specific retention for specific streams: retain dev logs for just 15 days and prod logs for three months, or debug logs for a month and all the error logs for a year, something like that. Besides the per-stream retention you also configure a global retention for all the remaining streams, like retain all my logs for three months but just retain these particular streams for 15 days. So it's pretty flexible. There's also deletion support that we added recently: you can delete logs for specific streams for a specified duration, like delete all my logs from 1st Jan to 31st Jan for logs sent by these applications. And we support cancellation of unprocessed requests; there's a configurable duration for which the delete request is kept untouched, so if you change your mind you can go and cancel the request before it gets picked up for deletion.

These two features are only supported with the BoltDB Shipper index store, which is something we built to get away from hosted index stores when running Loki at scale. Previously, to run Loki at scale you had to use DynamoDB, Bigtable, or Cassandra, but we wanted to remove that requirement and drop one more dependency. So there's something called BoltDB Shipper, which you can run in microservices mode, and it uses just the object store, GCS, S3, Azure, or the local filesystem, for storing both chunks and index, instead of a hosted index store. We support both these features only with the BoltDB Shipper index store because we think that's the future we are going to move forward with; all our internal and prod clusters run with BoltDB Shipper and it's production ready, so you can give it a try.

We also recently released recording rules and alerting support. Recording rules let you build metrics out of your logs: you can configure the ruler to build metrics and forward them to a destination, which can be Prometheus, Cortex, Thanos, or any other Prometheus-compatible store. And there's alerting, which you can use to alert on conditions over your logs: you can write LogQL queries and say, alert me when the error rate exceeds this threshold. You can also do things like, when you do a release, watch the error rate and alert me when there's a problem with the release. You configure an Alertmanager endpoint and the ruler sends alerts to it.
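Roughly, the configuration for these two features looks like this, assuming the compactor-based retention that shipped with 2.3 (the periods and selectors here are illustrative):

    limits_config:
      retention_period: 2160h            # global default, e.g. 90 days
      retention_stream:
        - selector: '{namespace="dev"}'
          period: 360h                   # keep dev logs only 15 days
          priority: 1
    compactor:
      retention_enabled: true

And a ruler rule file looks like a Prometheus rule file with a LogQL expression (the threshold and labels are made up):

    groups:
      - name: example
        rules:
          - alert: HighErrorRate
            expr: sum(rate({app="foo"} |= "error" [5m])) > 10
            for: 10m
            labels:
              severity: warning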
Those are just some of the features; there are many more. We support replication: when we talked about the architecture, there's the distributed hash ring, using which the distributor, the component that does the replication, sends logs to multiple ingesters. You can configure Loki to replicate to n ingesters; internally we run with the replication factor set to three, so each log is written to three ingesters at a time, and if one of the ingesters goes down you won't lose the logs. So it's a safety feature. Loki is also multi-tenant, so you can store logs under different tenant IDs and partition the logs between different teams. For example, your billing team could use one tenant and your booking team a different one, so they won't be able to see each other's logs while still sharing the same Loki cluster.

There's also something called the write-ahead log, which is another safety feature. When ingesters receive logs, they also write them to disk, so if for some reason an ingester crashes, the data it still had in memory and had not yet written to the long-term object store is not lost: when the ingester comes back up, it replays the logs from the write-ahead log and gets its state back to what it had before. This avoids losing any logs if the ingester crashes before it can write them to long-term storage. We also have support for live tailing of logs. And the next one is the zero-dependency mode, where you can run Loki without any external dependency. For the ring we internally use Consul, so Consul is supported, and there's also something called memberlist, which keeps the state of the ring in memory so you don't need an external key-value store; I know it's a lot of info, so I'll skip it for now and we can cover it in the deep dive session. And for the object store you can just run Loki with the filesystem. There are many more features; a lot of them are covered in the Loki docs and on the Grafana website.

So let's look at a typical troubleshooting workflow you'd generally go through. There are so many tools nowadays covering the observability stack: first you get an alert in Slack, then you go to Grafana to see the metrics, maybe then jump to Prometheus, then you go to Elastic to find the logs, then you go to Jaeger to find the traces, and then you go to your application to fix the code. Instead of that, we have all three pillars of observability integrated in Grafana: first you get an alert on Slack, then you go to Grafana, and from Grafana itself you can jump from dashboards to the metrics, and from there directly to the logs without losing the context. From the logs you can jump to Tempo traces, and then you go to the application to fix the code. We'll cover this workflow in the demo.

So I have a demo running already; I can quickly show what's there. I have used the latest release from the Loki repo, and this is the default config that comes with Loki. You can see it's a YAML file, and I'll explain some parts of it, not everything. As we talked about multi-tenancy, you can see there's something called auth_enabled, which enables the multi-tenancy mode. Maybe just doing Ctrl and plus should work to zoom? I tried it; okay, that does not work, we'll have to bear with it. Let's get on with the demo. Yeah, sorry, is there a way to increase the font size from the editor, since I hope this is an editor, right? I don't see one. No worries, let's get on with it; we'll see later. Yeah, so auth_enabled is something you can toggle to enable or disable the multi-tenancy mode. This is the config for which stores you want to use for the long-term storage, and then this is the config for configuring that long-term storage.
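For reference, a trimmed sketch of what such a single-binary config looks like, with BoltDB Shipper for the index and the local filesystem for chunks (the directories and the schema date are illustrative):

    auth_enabled: false
    server:
      http_listen_port: 3100
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /tmp/loki/boltdb-shipper-active
        cache_location: /tmp/loki/boltdb-shipper-cache
        shared_store: filesystem
      filesystem:
        directory: /tmp/loki/chunks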
And yeah, there's also the ruler component config, where you configure the rules and where you want to send your alerts. And then there's the Promtail config; this one is for scraping a file statically from your machine, so this is the scrape config. Here you define some pipeline stages, where you tell Promtail how to process your logs. This one is a regex stage; we support multiple ways to process the logs, like a JSON format and, in this case, a regex format. It tells Promtail to pick out some bits from your logs: the first one is the timestamp of the log, the next one is which stream the log line belongs to, and then the content of the log line itself. These extracted properties are shipped on to the next stage. Here we tell Promtail to convert the stream field into a label, which gets attached to your log streams; here we tell it which format the timestamp is logged in, which is RFC3339Nano; and the output stage sets the content of your log, everything other than the labels. And here you configure the path where Promtail finds the log files. So I have already started shipping logs to Loki.

Let's see it in Grafana. I already have Grafana running locally; I'll just add Loki as a data source. You go to the data sources page, select Loki, and add the endpoint here, then just save, and it tells you whether it was able to connect to the Loki data source. Then you go to Explore and select Loki as your data source. There's something called the log browser, which shows you all the labels from your streams; here you can see the path of the file the logs are coming from and the labels of the stream. So let's go ahead and write some queries. I had added a job label to the stream, and the logs being simulated are from a file server, which logs all the requests it's getting for uploading, downloading, or listing files, and I added some more bits to it for this demo. This is the log line: here you can see the timestamp picked up by Promtail, then some of the labels it added, and this is the actual content of the log, including the timestamp from the log. So I've asked Loki to give me all the logs from the job fileserver. You can add a filter and narrow it further, like show me only the error stream, or show me only logs which contain status code 500. There are various pipeline stages we support for processing the log content at query time: I logged in logfmt format, so you can tell Loki to parse the lines as logfmt, and then all the fields here get converted into labels, and you can process them further, like show me logs with status code 500, or duration greater than five seconds, and it shows all the logs with a duration greater than five seconds. Is the font visible for everyone? I think increasing it a little bit would help, but it's definitely visible to me; I'll leave that to the audience. Yeah, I'll keep it zoomed in a lot.
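The pipeline walked through above would sit under the scrape config, roughly like this (the regex itself is made up; the real one depends on the log format):

    pipeline_stages:
      - regex:
          expression: '^(?P<timestamp>\S+) (?P<stream>\S+) (?P<content>.*)$'
      - labels:
          stream:                   # promote the extracted "stream" field to a label
      - timestamp:
          source: timestamp
          format: RFC3339Nano       # how the timestamp is written in the log line
      - output:
          source: content           # everything else becomes the stored log line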
Yeah, so there are different parsers we support: you can use the JSON parser if your logs are in JSON format, or there's something called the pattern parser, which you can use to describe the pattern your logs follow; nginx has a different format, for example, so you can use the pattern parser to build labels out of those logs. We also support metric queries, so you can say, show me the rate of errors from this job. I'll select the error stream... yeah, so it shows the rate at which your application is logging errors. Or you can build a latency graph, so you can see the latency of the APIs using quantile over time. I'll show the 99th percentile latency: we parse the logs as logfmt because we want to use the duration field from them, and there's something called unwrap, which lets you turn the value of a label into a metric sample, so we unwrap the duration label. Then we give it a time window, and we can see the latency by user; this is showing the latency per user, for all users. You can select a specific user and see its latency, or filter it further, like, I want to see only the latency for upload calls. There's a lot more to LogQL, and we can cover more in the deep dive session; these were just some basic things you can do with it.

Also, to show the jump you can do between metrics and logs: if you have metrics and logs from the same application, you can query the metrics, notice that there are some errors in your application, select the window where you are seeing the errors, and then just change the data source to Loki. It shows all the logs for the same time window, using the same selectors. If you have the same config for your Prometheus and Promtail, you can do the switch easily in Grafana.

Moving back to the presentation. You might ask, why do we need ad hoc queries with Loki, why not just use Prometheus? Well, logs are something very basic to observability: everyone has logs, not everyone uses Prometheus, and there could be reasons you can't use Prometheus with a legacy application because it doesn't support it. You can also convert your logs to metrics at query time. There are limits on the labels you can attach to a stream, both in Loki and Prometheus, so we support converting the contents of your logs to metrics at query time, so that you don't blow past the limit on the number of labels you can have, the cardinality problem. This is a dashboard built using just Loki logs, so you can see the Loki query language is very powerful and full of features.

So you might say: with minimal indexing, how fast is Loki really? These are some numbers from our internal Loki cluster: the cluster is getting around 84 write QPS, the ingestion rate is 84 MB per second, about 7 TB of logs per day, and the write latency is just 8 milliseconds on average and 90 milliseconds at P99. We are doing 250 queries per second, with a latency of 50 milliseconds on average and 250 milliseconds at P99.
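The latency query built in the demo looks roughly like this; the duration and user fields come from the demo app's logfmt lines, and the window is illustrative:

    quantile_over_time(0.99, {job="fileserver"} | logfmt | unwrap duration(duration) [1m]) by (user)

This assumes the field is written as a Go-style duration such as 250ms; a plain numeric field would just use unwrap duration. Adding a line filter such as |= "upload" before the logfmt stage narrows it to just the upload calls.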
Those numbers are in no way a benchmark for how Loki would perform; everyone has a different setup, the usage is different, and it depends how much value you can get out of Loki based on your workflow. To give an example: in 2020, Cyril, one of the other maintainers, who works a lot on improving the query performance side of Loki, tweeted about one of the queries processing 10 GB of logs per second. Fast forward one and a half years, and this was tweeted just last week: the query did 52 GB of logs per second, and you can see the graph where queries exceed about 40 GB per second in terms of the amount of logs processed.

So how do you install and run Loki? You can install Loki using Tanka: we have Jsonnet in the Loki repo which you can use, and Tanka is a tool that builds the YAML and deploys the workloads to Kubernetes. It's the recommended method because we use it internally and it's well maintained; we keep it up to date all the time. Then there's Helm, which you can also use for deploying to Kubernetes. We have Docker images and a Docker Compose setup prebuilt, you can use the binary for running it locally, which is what I did for the demo, and you can also build from source directly. Loki supports different deployment modes. There's the single-binary mode for smaller installations or for testing it locally; Loki has different services, which we saw in the architecture, the ingester, distributor, querier and so on, and the single binary runs all those services in the same process. You can also run multiple single binaries and make them discover each other using the ring. And you can run each component individually in microservices mode, like running just a set of ingesters, a set of queriers, the query frontend, and so on. Then there's the Grafana Cloud free tier, which is forever free: you can ingest 50 GB of logs per month to give it a try, then continue using it if you stay within the limit or upgrade to a paid tier. It also comes with Prometheus, so you get 10,000 series with the free tier.

This is a case study we did with Paytm Insider. I met Piyush at DevOps Days India 2019 when we did a talk there, and he mentioned how they switched from Elastic to Loki, so we did a case study with them. As you can see, they saved a lot of the cost they were paying for managing the Elastic cluster; they say the cost was reduced by 75% when they switched from Elastic to Loki, and their MTTR reduced from 30 minutes to just 10 minutes. So that's it from my side; if anyone has any questions, I'd be happy to answer them.

Okay, thanks for the great talk, Sandeep, and thanks for the extended demonstration; normally we finish demos in a very short period, so thanks for an extended and in-depth demo. We'll take up questions now; I think we have a couple of them. I'll let Satya ask his question first, since we had sort of paused it in the middle of the talk. Satya, can you unmute yourself and ask the question? I think the question was answered, but just to recap: I was wondering, for the content that is not indexed, how does Loki grab it, and what would be the, I don't know if efficiency is the right word for it...
...so what is the speed difference? If it's label-based, if it's been tagged as a label, obviously it's going to be almost instant. How would that compare against something which is hidden deep inside, assuming you're doing 10 TB of logs per day?

Yeah, so as I said, it depends on your workload. If you have a lot of stream churn, then Loki would not perform well, and that's usually because of the kind of labels you're attaching. If you attach labels like user ID, or if you have an ordering application and you attach the order ID as a label, that's not something that is recommended; the labels model is similar to Prometheus, where you wouldn't attach an order ID to your metrics either. As far as the content is concerned, we store the logs in something called chunks, and that is what your index points to. When you run a query, we fetch the chunks from storage, decompress the logs, and go through them. If your chunks are nicely full, that's not a problem: imagine a query that has to go through a million chunks which are only 10% full, versus a hundred chunks which are completely full. That is the difference the kind of labels makes. If you have labels whose values are not bounded, you create a lot of tiny chunks, and queries suffer because they have to fetch a lot of chunks from storage; whereas if you keep the label set bounded, Loki just streams through the logs as it processes them, and you can still build metrics from the content at query time. So it's not like, oh, I missed adding this label, I really wanted the order ID label; you don't have to worry, because query-time metrics are available, as I showed.

I have a related question then. Basically, to leverage the architecture of Loki, we need to have the label selectors as metadata and not inside the payload JSON of the log line itself, because you are not indexing that. If we have a structured log and, let's say, the label value is hidden deep inside that structure somewhere, we'd need to extract it and basically change our logging implementation, then? So, at query time you can tell Loki that these are JSON logs, and then use the contents of the JSON as labels. So only the message part of the log is not indexed; every other key-value there can then be converted into a label? Yeah, you just tell it to treat those logs as JSON, and then all the contents, like if you have JSON and inside it there's a field foo equal to bar, then foo would be available as a label at query time, and you can use it. But how does Loki know which part is the content? Let's think of it as a single-hierarchy JSON which is just a map of values. How does Loki know which part is the content, not to be indexed, and which parts are the labels which need to be indexed? Do we explicitly tell that to Loki during consumption, in Promtail? Not during consumption. The content is just text for Loki, and at query time you tell it what that text is.
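As a rough sketch of that query-time extraction (the labels and values here are made up), a query against JSON logs could look like:

    {app="orders"} | json | order_id="ORD-12345"

The json parser turns every field of the JSON body into a label at query time, flattening nested keys, so something like request.latency becomes request_latency, and none of it has to be in the index.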
So I define the schema at query time itself, not up front? Yeah, you just tell Loki how to make sense of it: if your logs are in logfmt format, you tell it they're logfmt, or if they're in JSON format, you tell it JSON. The content of your logs is just text, the labels are key-value pairs, and then there's the timestamp; those are the three parts of your logs that are stored in Loki.

Yeah, we'll go to Piyush's question. Piyush, can you please unmute yourself and ask your question? Yeah, I think it got answered during the course of the talk, but when the talk started I was wondering whether the label detection is something we manually configure. Based on the talk, what's coming out is that, to give maybe a bad analogy, just like when I'm creating my logfmt format and defining the structure, in a similar way I can define the structure in the Promtail YAML file, like which of the fields I want to index, and accordingly it will serve the queries. So labels are just metadata; the metadata could be the name of the app, or in Kubernetes there would be the namespace. Yeah, yeah, right. Promtail also supports building labels at processing time, so based on your logs you can attach new labels to them, which is also pretty powerful when configuring your labels. Got it, got it. And the content is just text; it can be anything: JSON, random text, logfmt, anything.

Cool, I think that concludes our first talk. Thank you so much, Sandeep, for a really great talk. While it was supposed to be an intro, I think we were borderline deep dive, and I'm excited to hear what the actual deep dive content will be; I think most of us here would appreciate the deep dive talk, and we're looking forward to it. That's it, I think we'll take a short break. But before we do that, I want to ask the audience: does anyone want to do a very short flash talk about the stuff they are doing at their workplace? I see Manoj is here, Satya is here, Piyush is here, so we have a lot of great people, people in leadership positions in their respective organizations. If someone would like to talk about observability in their organization, or if someone is new, just getting started with observability, and wants to highlight even something simple they have been doing, that would really be great. If not, after the break we will end this meetup, because we do not have a second speaker. So we have about ten minutes of break; if anyone does feel like doing a flash talk, please let me know in the chat or DM me on the Telegram group if you are there, and we will make space for you to do your talk. There need not be any presentation or PPT, that's not the format of flash talks; you could open an online whiteboard and just draw stuff out in case you need to explain anything diagrammatically. So, yeah, with that we'll take a short break and come back in five minutes to see if anyone wants to do a talk. If not, we will skip over to the banter session which we always do; we like to catch up after the main session is over and talk about all sorts of tech we are working on, not limited to just observability. And that will be today's meetup, folks. Thank you so much for joining us.
So we'll take a short break and come back. Hi, Navarune, Satya, thanks for joining us today, long time. What have you been up to? We can do a normal catch-up session, and for the people who are joining us for the first time, a short introduction would be super nice. And Manoj, hi, how are you doing, thanks for joining us. Hi Joy, thanks for the welcome. I know I missed the first 30 minutes of the meeting, but good to get on to this forum. No worries, no worries; both the slides and the videos will be properly edited and uploaded by the Hasgeek team, and you'll be able to catch up. I was actually looking forward to hearing from you regarding observability and UX. Yeah, I didn't keep quiet because I had nothing to say; there's a lot of brainstorming going on at this point in time. If you saw my question to Sandeep: in fact, I had a demo with the Loki folks from Grafana a couple of weeks ago; I think Roland, who is also on this call, and I worked together on it. And they were fair enough to say, hey, high cardinality is a challenge, right. And I wanted to kick off this UX discussion; I thought, okay, I'll come here and hear the perspective of people on what UX means to them. I know each of you support your respective organizations in varying capacities, and I think UX is a challenge. Maybe I'll turn on my video so that you can have a face to the voice. So, what is UX to you? I just asked this question: hey, in your implementation, how do you use Loki? Is it for debugging purposes, or is it for detecting issues?

I don't use Loki at my current workplace; I've been on a different stack for a very long time. But when it comes to the UX of observability tools taken as a blanket group, which is maybe not entirely fair, one of the comparisons I can try to make is between the hosted provider implementations, let's say Datadog or New Relic, versus something like Grafana. What matters to me in the UX of these things is not just how beautiful the dashboards look after I'm done configuring everything, but the workflow of getting it running, getting bootstrapped: what is the process of getting a bare-minimum dashboard up and running, how many hoops do I have to go through, how many steps do I have to configure, how much of it is auto-detected, and how much of it is intuitive for me?
One of the good tests, I think, is this: if you take a person who is new to DevOps or to the observability ecosystem as a whole, how quickly does that person adapt to the technology? If you ask them to configure a New Relic dashboard versus a Grafana dashboard, what is the entry barrier, how easy do they find it to get things up and running? That, I think, is a test of good UX for observability tools. Because if you have a tool where you're building dashboards and you have to keep going back to the documentation, look at screenshots in the docs, come back, and only then figure out the right way to do things, that hinders anyone's workflow. The UI itself should have enough affordances, tooltips, links to docs, helpful and well-arranged workflows, which let a person figure things out as they go, with iterations, rather than frequently having to go off, read documentation, and then figure it out. To me that's probably a benchmark of how good the UX is for any tool in this ecosystem, at least.

I think that probably covers one part of it, right: basically, how do you onboard, what does frictionless onboarding look like, and how do you get basic out-of-the-box observability without having to worry about a lot. That also brings up the point of how we create opinionated observability for the organization, especially for the kind of stack we use to build services and applications. I definitely concur with you; that's one of the mindset shifts we're trying to bring, even at Flipkart, just to give an example. There are so many metrics, and I'm sure this is something everyone faces in their organization: I don't know what percentage of the metrics collected and ingested into your system is actually being used. I'd really ask each of you to do a small check and say, is it 80-20, or is it maybe even more skewed? That opinionated observability is the theme I'm trying to push these days at Flipkart: the vitals are few, and there are many trivial things we go chasing that end up killing your metric system or your logging system by trying to make it solve everything. So the user experience often works better when it is very purpose-driven rather than trying to solve everything in the world. We have a metric system internally which tries to solve everything, from very simple, straightforward, high-level metrics all the way to ultra-high-cardinality, hyper-cardinality problems, in a single system, and that kind of kills all kinds of user experience.
Right, so that again comes back to what it's like to use your observability tool, what the experience is for the various users. Onboarding we talked about, but once you come to the application or tool you're using, there are different use cases for different sets of users: some come just to detect issues, and some come to your monitoring or metric system as a troubleshooting or debugging mechanism, and for that they end up adding a lot of tags or dimensions, which eventually leads you to that hyper-cardinality state, and then everything falls over. That's why I asked this question to Sandeep: what is the use case, what is the primary objective of having Loki in your organization? Is it for detecting issues? Is it for debugging? Because viewing logs, I agree, is for a purpose; it's typically for detecting something, or debugging something, or you're trying to figure out a particular transaction which someone reported. Hey, you're from Paytm, right? No, I'm from Grafana. Okay, good. So there should be some, hey, I want to search for this particular transaction. I'll give you a typical example from Flipkart. Someone builds a cart on Flipkart, then they check out, and the package reaches you maybe after two or three weeks depending on where you are and where the item is coming from; then maybe you use it for a day and return it, and it comes all the way back to the warehouse. So that transaction, as it were, runs for weeks altogether, maybe multiple weeks, before it completes. For us a lot of the problems are debugging: of course throughput and all of that is on one side, but tracing through what happened to individual transactions, which run across multiple days, becomes one of the key things, and that is one of the experiences we're looking for: hey, can we trace everything about that particular transaction which happened across multiple weeks?

So if you log some ID with it, like an order ID or something, then you can use the filter to search for the logs using that order ID. I mean, if you consider this order-management workflow, you need a workflow ID across everything in the lifecycle of a cart: there's the order ID, the payment ID, the shipment ID, the return ID, like 15 different intermediary workflows, but you need to group them together, like inside an application we do this with a request ID for tracing purposes. Exactly. Similarly, I think we'd need to wrap the whole workflow inside a single ID, and that's not available everywhere, right? And if you do sampling, then typical tracing doesn't work there, and an order ID, for a company like Flipkart, is a high-cardinality value, right? Yeah, sorry, I think my demo didn't make that part very clear; I can share the screen again and show you. I think last time we had a demo of this with exemplars, where you could literally take one ID from a Tempo trace and see the correlated logs in Loki with that trace ID, right.
I can share my screen again, is that okay? Hey Roland, you could... okay, at this point I think everyone can unmute themselves; I'll consider that we have crossed over into a semi-banter, semi-flash-talk session, so people can unmute, and you can turn on your video if you feel like it; feel free to just be yourselves right now. We are live, but since we're talking about a concrete topic, that's fine.

So, the logs that I simulated don't contain a label called user ID; the user ID is in the content of the log, not attached to the stream as a label. I can build a label out of it at query time, because you don't want to set it as a label on the streams, just like in Prometheus you wouldn't use a user label, otherwise it would explode the cardinality. These logs are in logfmt format, so I tell Loki this is logfmt, and whatever is in the line gets converted into labels at query time. Then I can use those labels for further processing, like if I want to compute the latency per user.

Can you do an entity-ID kind of query, the scenario we just talked about, where an entity ID gets picked up at query time and you can stitch everything together? I don't know whether the demo application has that level of detail. That could be a bit hard with this app, but yes, you can do that: the ID, like the user here, is just more text for Loki; a transaction ID or whatever would also be just text for Loki, and it would convert it into a label. If it's in logfmt, there would be another bit here called transaction ID equal to something, and you can use that transaction ID as a label; it gets converted into a label at query time.

So what does it mean to convert this at query time, and what is the cost for multi-terabyte data? Yeah, I was just getting to that: if you have, say, 15 million transactions at that sort of scale, what's the cost of retrieving all those chunks, filtering out the transaction IDs, and then indexing them well enough that you're able to search and plot it? So either you pay the cost of managing that big an index, or you throw that much compute at the queries and they should be able to process it. It's not like you would pull all the transaction IDs your application is logging and query across all of them, or all the user IDs; you would look at specific logs in a specific window, throw enough processing power at it, and it should be able to do it. Yeah, that's fair, I think, absolutely. But it's still a challenging problem, because then you need to be aware of your ingestion rate and your transaction rate in order to right-size your cluster, because at any point in time an arbitrary query can go beyond the capability of the cluster to handle those chunk volumes. Let's say you have enough compute to do a 30-day window of transaction ID plots, but the moment someone comes in and does a 45-day window, my cluster goes for a toss, because enough compute isn't there anymore. So that could be a real problem, because the sizing feels very arbitrary, and for one more reason: you have data seasonality sometimes.
I think this works for a short window of time. For example, maybe order was the wrong example for me to take, because it can run across multiple days, and then you can't avoid searching through the entire haystack to find the needle, right? I don't think it's a bad idea, but the same thing for something happening live, like on a sale day at Flipkart, for the last two or three hours, is probably still a reasonable thing to stretch this to. That's why I'm saying, don't try to make your system solve all use cases; you need to figure out how to create tiers and focus the problem-solving for each part. I think the long-term storage and search problem, and fast search of logs within a constrained amount of time, are very separate use cases, and the architecture needed to build them is very different. If I want to search millions of transaction IDs, I would rather put that in long-term storage, in a big-data sort of ecosystem, Bigtable or something, and use something on AWS to query it at a later point in time, not strain my live log systems. Exactly, and in that case latency isn't a big issue, right? I can get you this data after, say, two minutes, and that's fine with me. So from a user-experience point of view, when you ingest the log, one of the focuses, at least for us, is how do we segregate, how do we tier the logs, and how many different types of storage we need for that. We'd probably mostly define that by the retention period or something like it, so that the latency can be coupled with the different use cases, and whatever is needed immediately can be partitioned accordingly. I think Loki is a partition-optimized solution: it basically partitions storage by a time window, and it works very well there, but we're talking about clustering of data where you have to do huge indexing.

I think I'll pause both of you for a moment and go to Roland. Roland, you had something to say on the chat; you can just unmute yourself and let's have the conversation. Not much more on top of that. As Manoj said, I work with him; I've been part of Flipkart, building and operating this for about three-odd years now, and our lessons have been what Manoj articulated: it's primarily around scale, and I would say an interesting mix of use cases that spans real-time debugging, historical debugging, and archival search, and as Manoj was saying, it's probably not a good idea to try to solve all of them with one system.
So on the same theme, and I'll go to the rest of the folks who are here, we have Piyush, we have Satya: how do you handle this in your organizations? Do you maintain separate stacks for RCA, which is live debugging, and separate stuff for cold-storage searches, where you're searching, let's say, a couple of months' worth of data to go back in time and look for something?

In our case we run it for real-time debugging, so we have an ELK kind of stack where logs up to three days old are indexed on EFS-backed storage with Elasticsearch running on top, and anything older than that is anyway getting piped into an S3 bucket. On the S3 side, the agent which pushes the logs into the bucket uses a partitioning scheme we defined ourselves, which lets us define the sub-path in the S3 bucket, and then the support team which needs to access logs older than three or four days to answer customer queries uses S3 Select on top of that. S3 Select has been working well for us; even for about 250-300 GB of log search it takes hardly a couple of minutes to respond. And coming to your point, joining the logs and that sort of thing happening continuously in the background is not something we need. So for real time, Elasticsearch, the ELK stack, or in this case Loki, can help, but if you want to join and analyze data over a longer period of time, you will have to store it outside anyway.

So, one of the things about working for a big company is that you have nice platform teams, so we have a team which runs our logging system, which is basically Splunk, and I don't have to care about it. Splunk does some intelligent swapping between hot and cold archives; that's configurable, I think the default is 7 or 14 days, it was revised recently, I don't know, and honestly I don't care, because all I do is go to Splunk, put in my query, and start diving. Splunk handles it intelligently behind the scenes, shuttling the data between hot and cold storage, and presents me the data; my use cases either way are real-time debugging.

On the question about UX: what I want from my observability experience is that when I'm woken up at 3am, I don't want to fiddle with settings; what I want is to start from a base query and get into a good deep dive, and that's where my biggest problem has been with Grafana, primarily because of my lack of experience with it. I'm getting better, but I'd open Grafana wanting to look at the metrics and just go blank; I can't get the PromQL right because it's 3am in the morning. Things have changed now: I have ready-made links, we have runbooks, and those sorts of things help. But writing PromQL at 3am? Yeah, that's not going to work, I'm sorry. Let me honestly tell you what my PromQL experience at 3am is like: try to come up with something, utterly give up, and then what I do is, screw this, I just type node, that's it, sorry, node_load1, so that basically gives me the load of every node, and from there I know what labels are there, and then I start using that to deep dive. But that's, you know, that's
This is a fair point on the UX side of things: how easy is it to query in a language if you are either sleepy or drunk. I completely agree; when you start using Splunk, getting started takes this long, and you're like, what the heck is this, it's not SQL, they have their own query language. I have a question for you, sir, where do you work? I work at Adobe. Adobe, okay, nice, so I don't know how much of a budget, I don't want to ask, you get for Splunk, because I work in a company where Splunk can literally, literally eat up your budget. We are on AWS, so budget is tight, and developers don't really control what they ingest, right? So I know Splunk is the kind of thing that spoils people; it gives you one place where you kind of do everything, that's the experience Splunk gives, right? I don't know who that is, go ahead. So, talking about this again, you have an ELK stack for your 3-day retention data and another path otherwise, so do you ever run into the problem of people complaining, I need one place, a one-stop-shop kind of thing where I want to go to one place, and does it matter where you get it from? No, it has happened, but in our case what happens is, when you're accessing a log, people already know whether they need to query a recent event or they're looking for something archival. Just by looking at the query, or the support ticket, or maybe even the incident that comes from the NOC team, you know whether you need to look at the recent last-one-hour logs, which are available on the ELK dashboard, and if it's a query coming to the customer support team and the logs are like 5 days old, you already know, which is why we have given S3 Select on top of the archive; it's fairly convenient in that sense. I think 3 days is a pretty decent window you're providing. Exactly, and again, AWS has this Elastic File System, EFS, which allows the storage to keep expanding and contracting as you need, so we don't need to forecast that our 3-day logs will be about 30 TB or 40 TB or whatever; EFS allows you to expand and contract. I think 7 days, like 7 working days, is also a good enough window for moderately large startups, though definitely not at enterprise scale, where 7 days would be very large. Even we arrived at a 3-day window after trial and error over almost 7 to 8 months; then we realized this gives a sweet spot of having a low MTTR in terms of responding to the tickets, as well as ensuring our storage and indexing costs don't go too high. Yeah, any resistance you faced during that time, when you came down to 3 days? Obviously, I mean, S3 Select takes about a couple of minutes to scan 100-odd GB of data, right, so initially, whenever you have this kind of new system coming in, there is a lot of inertia, the developers used to complain and all that. Then the way I did it is that we started exposing the cost of the logging clusters to the respective teams. I told them, either you reduce it or you tell me what you want to do, and after a few days everybody came back with, okay, we can manage this, anyway we have to query only 3-4 times a day, so 5-10 minutes extra we are okay to wait. So yeah, it's all about visibility: you go and tell developers how much it costs and what the effort is to maintain all of this infrastructure, and then they start cooperating, essentially.
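To make the archival-query path described above concrete, here is a minimal sketch of what that S3 Select flow might look like with boto3; the bucket name, the partitioned key layout, and the gzipped JSON-lines format are all assumptions standing in for the custom partitioning scheme mentioned in the conversation.

```python
# A minimal sketch of querying archived logs with S3 Select, under assumed names.
import boto3

s3 = boto3.client("s3")

# Hypothetical partitioning scheme: service / date / hour, mirroring the idea of
# an agent that writes logs to a well-defined sub-path on the bucket.
key = "app-logs/checkout-service/2021/08/20/14.json.gz"

resp = s3.select_object_content(
    Bucket="archived-logs",          # hypothetical bucket name
    Key=key,
    ExpressionType="SQL",
    # Push the filter down to S3 so only matching rows come back over the wire.
    Expression="SELECT * FROM S3Object s WHERE s.level = 'ERROR'",
    InputSerialization={"JSON": {"Type": "LINES"}, "CompressionType": "GZIP"},
    OutputSerialization={"JSON": {}},
)

# The response payload is an event stream; 'Records' events carry matching rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```

The point of pushing the filter into S3 Select is that only the matching lines are returned, which is why scanning a few hundred GB of archived logs in a couple of minutes is workable for occasional support queries.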
Yeah, but typically one of the most important views I see developers looking for is, hey, I see a spike now, but can I zoom back, go back to last week this time, or last month, is it seasonality, is it a spike, right? So that is where this 3-day kind of cutoff really hits you. The way you guys are talking, I am really scared for when I move out of Adobe, because you guys are talking about retrieval in minutes; for me it takes less than a second to find anything, and we do, I'm pretty sure, I don't know the exact volume, but over 100 terabytes a day of logs. So now I am really wondering, okay, I have been spoiled by Splunk so much. Yes, you have. You guys are literally talking about 3 minutes, 5 minutes, and I'm like, really? It takes me less than a second to get all of that. That 3 minutes, 5 minutes is mostly on the cold side of things; can Splunk do cold search also within a couple of seconds? Like I said, I don't know for sure, it won't be the same; the hot one will be less than a second, the cold ones will probably be less than 30 seconds, I guess. Invite one of your platform team members next time so that they can talk about how they do all of this; we would all be super, super interested. Yeah, I have used Splunk very briefly at Razer and then we properly moved out of it; I think the cost used to come up even at that point of time, so yeah. I mean, it has not been super difficult; a couple of seconds, 2-3 seconds of latency even for a real-time search is okay. But week-on-week analysis is something that I have seen on Prometheus, not on the logging side of course, but on the metrics side; we saw that with Thanos, where Grafana would be a single interface for both Prometheus and Thanos separately, and then we could basically overlay both graphs together, at least for the metric analysis part. And at Hotstar, one of the critical things was that we did historical analysis of our metrics and scaled based on that, so Prometheus was one of the inputs based on which our entire Kubernetes cluster scaled up and down, and we also did scale testing by replaying the historical API traffic in the scale-test environment, and that helped. But yeah, with logs it's sort of difficult. Right now I'm on Datadog and I'm really thankful that I can very easily do week-on-week analysis, at least going back 3 weeks; I'm getting a lot of data, and this is a good enough benchmark. That's the other thing, once you get started with Datadog, right. So, before we moved to Prometheus and Grafana we were on Datadog; I had used Datadog before as well, and when I moved and saw that they were also using Datadog, I was very happy. Datadog made our life so much easier, especially with the tagging system that they had: just tag your infrastructure, Datadog will automatically pick it up and apply it as labels, and you create a dashboard literally in a second, sort of thing. And then we jumped from there to Prometheus, and at that point we didn't have any dashboards, and that's where, like I said, the whole PromQL struggle started. I think it's just a different trade-off: a managed service costs more but the engineering cost is low; on the other side, for an open-source platform like Grafana and Prometheus your engineering cost is much higher, but your platform cost, since you maintain it yourself, is super, super low.
So again, I think this is also about UX, but it's more towards what is more important to the organization, right? Do you want to build an internal team, a platform team which maintains a low-latency Prometheus and Grafana service that scales and operates well, or do you want to assign that cost to an external platform and just go ahead and get everything done while reducing engineering costs? But when you're in an organization the size of Adobe, where you have literally thousands of teams, it's death by a thousand paper cuts, right? The engineering time for one team might be minuscule, but it adds up. Of course, there's definitely no doubt; I'm pretty sure cost was the reason why we moved at all, though I don't have any insight into it, and I'm not really saying that I regret moving to Prometheus and Grafana; as I've started using it more and more, I've completely built up the dashboards now. We used to do, and I'll probably talk about this next time around, we used to do log-based querying to look at how our deploys are going, things like that, and now, instead of having to run a Splunk search every 10 seconds or every minute, I just have a Grafana dashboard up on one of my monitors with auto-refresh, and since we do, well, not rolling, rather A/B deploys, it's so nice to see the graphs criss-crossing, the way one graph goes down and the other graph comes up. It's just so nice to see; I'm getting used to it. But yeah, those initial three to six months, when we were struggling to cope with PromQL, because in moving away from Datadog, learning PromQL was not the only thing; as with anything, there are lots of other competing priorities. Like, do I spend my time improving my knowledge so that we can resolve our incidents faster, and where do I find the time to improve our existing stuff, our tech debt, the feature requests that are coming in? That was a tough thing to manage, and we had a very, very tight timeline; basically we were told that in six months we're out of Datadog and that's final, and it's like either walk the plank or jump, that's what it's like. Yeah, I mean, these are very tough business decisions; there are a lot of variables floating around, like engineering costs, platform costs, time to deliver, time to get it bootstrapped, all of that. So yeah, as Piyush says, it's all about priorities. Cool, I think that was a good conversation. So we have Nabarun and Sudev with us; would you folks like to contribute anything here, or shall we move on? Are you able to hear me? Yeah, hey Sudev, we can hear you. Hi, how are you doing? Okay, yeah, good, thanks for asking. It's been a good, interesting conversation. I take care of observability at Freshworks; I'm just a lurker trying to learn what's happening around, and this has been my third meetup, learning quite a few things. So yeah, we have an internal system based on Elasticsearch, and we do pretty heavy traffic, somewhere around 40 TB per day of logs. So you're using Elasticsearch? We only use Elasticsearch, there's nothing else; the whole purpose is served by Elasticsearch itself. What's your retention? We have a retention of 30 days, so the data on disk is a couple of petabytes.
We have our own hardware and support from Elastic; that's how we're able to meet the SLA requirements of multiple products within Freshworks. So you're on the proprietary Elasticsearch platform, basically, in that case? Yeah, self-hosted. So I think that's coming up as a theme, right: if you want hyperscale, low-latency scenarios, internal platforms sort of break at a certain threshold of maintenance, where the cost of maintenance is higher than the benefit you're getting out of it, and at that point you have to go for an enterprise-level solution like this. Yeah, with the traffic that we are handling, we are maintaining everything internally. Okay, when can we hear a talk from you, maybe next time around? Yeah, sure, I've been planning for it. Sure, you have a month to go, so let's plan for that together; join the Telegram group. And that goes for everyone watching as well: we have a Telegram group, the link is in the description, I guess, and if not, we will make sure it is there, or the YouTube description has the link to the meetup page and the Telegram link is already there, so it's two hops, but of course, if you're interested, you can manage that. So yeah, folks who are watching, if you're interested, please join the Telegram group and we can continue this conversation over chat. And thanks, everyone, for joining us on a Saturday morning. Thanks, Sandeep, for making the time; all of you have had a hard week, and I'm glad that you turned up on a Saturday. Thanks Manoj, Satya, Piyush, all of you have been regulars here, and thanks Nabarun for dropping by; we just said hi yesterday, and glad that we could have you. So yeah, with that, let's wrap up.