Hi everyone. My name is Muzamel and I work for LinkedIn as an SRE. As you might know, LinkedIn is the largest professional networking site. We have over 365 million members worldwide and we recently crossed 30 million members in India, which is our second largest user base, so we Indians love LinkedIn. As an SRE, apart from the day-to-day role, firefighting is one of the major tasks, and we spend a considerable amount of time troubleshooting. When the site is on fire, or say you have a building on fire, the last thing you want to do is search for the equipment to put the fire out instead of focusing on actually putting it out. So the agenda we are going to cover today is those tools: how LinkedIn came up with them and what the philosophy behind using them is.

When it comes to troubleshooting site issues, and I assume most of you are from a DevOps or operational background, the best place to start looking is obviously the logs. Logs come in many formats: there are service logs, there are nginx access logs, and if you are running a Java application you have GC logs as well.

Now let's talk about the main topic: why we need log aggregation. Think about the time spent identifying where your logs are stored, which machines your application is running on, how it is distributed, whether it is one application or two. If you have a small infrastructure, you probably know where your applications are running and where those logs are stored. But as you grow, you want to provide the best functionality and the best interface to the user, and what is the use of the best application if it is not reliable or stable enough? You need insight into your application and how it is behaving; you need to be able to figure out how it is doing. Having a centralized location for these logs really makes sense when you are scaling, and you need a proper mechanism for accessing it. As Mike mentioned in the previous presentation, having a centralized place makes sense when you have a distributed system.

Before we start, let's look at the objectives we had before we decided on the model we went with. We have a very complex call graph; the picture on the slide is a very simplified representation. If LinkedIn's architecture were that simple, I would probably be the happiest person here, but it's not. To give you an analogy, think of an inverted tree: that is what the call graph at LinkedIn looks like, and I'm pretty sure that is true for most large companies. The second major point was that we should be able to reduce the MTTR, because that is one of the main criteria for how well you are doing as an SRE at LinkedIn. The third thing is that we needed control over production. You need to be absolutely sure about who you give production access to so that there are no adverse effects. In a case where you want to involve, say, the engineering team to troubleshoot something, they probably don't have access to production, and getting them access might increase your MTTR.
Having a centralized location for these logs, where they have access only to the logs, makes real sense for us. We also needed to support multiple use cases. What do I mean by multiple use cases? Say we have a mobile team; they are mostly interested in what kind of devices are accessing our applications: how many Android devices, what version of our LinkedIn application is running on them, how many iOS devices there are, and how we perform on iOS versus Android. This information is in the logs, but making sense of it is much easier when you have it in a centralized location. A similar use case would be the security team, who might be interested in what kind of IPs are accessing our infrastructure. And the third case is us: as SREs we need access to all the information in the logs to see how the application is behaving.

We looked at a lot of solutions in the market, both open source and commercial, but we came up with a design where we use ELK. Let me quickly check how many people know what Kafka is and what ELK is. Okay, there are quite a few people. Kafka is the main transport mechanism we use for collecting these logs, and ELK is Elasticsearch, Logstash and Kibana. We'll talk briefly about each of these.

So let's start with what Kafka is. Kafka is an open source message broker system, now under Apache, originally written in Scala. The aim was to provide a unified, high-throughput, low-latency platform for handling real-time feeds. You have a set of producers that talk to ZooKeeper, which gives them the state and the topic they need to talk to, along with the list of brokers that will be listening for these events. Once that handshake happens, the producers start pumping information to one of the brokers. All of this metadata is stored in ZooKeeper; that is the centralized location where we maintain it. Then you have a set of consumers that initially go to ZooKeeper and ask, hey, I am interested in this topic, can you give me the list of brokers that have this data? ZooKeeper also maintains the offsets. What do I mean by offset? Whenever a new event is generated, the offset is incremented, and you can check at any given point in time where your consumer is, so you know how much data has actually been consumed.

Kafka also supports consumer groups. It is not really a queuing system, but, if you remember from the earlier slide, we needed the ability to support multiple use cases. With Kafka you can have three sets of consumers listening to the same topic, and the state for each of them is maintained in ZooKeeper. One could be my mobile team, looking at all the devices that access us; the second could be security, interested only in the IP addresses coming into the access logs; and the third could be SRE, interested in all the information. We also have a set of partitions per topic. This is a configurable parameter, but at LinkedIn we use eight partitions per topic across eight brokers, and the data is retained in the brokers for a minimum of four hours. Based on the use case this can be increased or decreased, but those are the default parameters.
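As an illustration of that model, here is a minimal sketch using the open-source kafka-python package rather than LinkedIn's internal clients; the broker, topic and group names are made up. Producers publish to a topic without caring which broker or partition the data lands on, and each consumer group tracks its own offsets, so the mobile, security and SRE teams can all read the same stream independently:

    import json
    from kafka import KafkaProducer, KafkaConsumer

    BROKERS = ["kafka-broker-1:9092"]      # hypothetical broker list
    TOPIC = "access-logs"                  # hypothetical topic name

    # Producer side: publish an event; the client works out which broker/partition it goes to.
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send(TOPIC, {"client_ip": "10.1.2.3", "status": 200, "user_agent": "LinkedIn-Android/4.0"})
    producer.flush()

    # Consumer side: each group_id is an independent consumer group with its own offsets,
    # so "mobile-team", "security-team" and "sre-team" can all consume the same topic.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        group_id="security-team",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.partition, message.offset, message.value["client_ip"])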
Just to give you an idea of the scale at which we use Kafka at LinkedIn: this is the most common transport, you could say it's the backbone for us. We rely heavily on Kafka to keep all our applications talking to each other. It's not just that we send logs through Kafka; our monitoring infrastructure also depends heavily on it to emit metrics into our monitoring framework. On a daily basis we generate over 875 billion messages. That is a very large number; from my experience, I have never seen any other transport mechanism that could support this kind of throughput. On a normal day we take in about 200 terabytes of incoming data and pump out 700 terabytes. At peak load we see around 10.7 million messages per second, roughly 18 gigabytes per second of inbound traffic and 70 gigabytes per second of outbound. You might be wondering why 200 terabytes goes into the infrastructure but 700 terabytes comes out: remember that we have multiple consumer groups that can listen to the same topic, and if that information is large, the outbound traffic grows quickly.

Why did we decide on Kafka? It's a near real-time transport mechanism. What do I mean by near real-time? To be accurate, it is not a strictly real-time mechanism: there can be a lag between the time a message is sent to the Kafka brokers and the time it is consumed. In general, at LinkedIn we consider anything under a minute to be within our SLA. It supports multiple consumer groups, and there is no overhead on the client that actually sends these messages to Kafka, unlike a typical Logstash setup. If you have used ELK, you know that if you want to ship logs you need to install an agent on your box; that agent is itself a Java application, it needs its own memory, and then you need some monitoring just to check that the agent is running fine. With Kafka, that overhead simply isn't there. Kafka also scales horizontally very well: whenever the size of a topic or the message volume goes up, we just add more boxes to the infrastructure and we are good to go. As I mentioned, it's not just logs; metrics are sent the same way, so any form of data can be transmitted through Kafka. It also supports REST, which gives us the option of writing clients in different languages: LinkedIn has clients in Java, we have Python and Ruby, and we are even writing clients in Go. And since Kafka was developed in-house, we have a dedicated team that supports this infrastructure.

How does it really work? We have a highly customized log4j library that ships with each of our services. When this library comes online at application start time, it starts a Kafka proxy and starts piping all your messages through it. The library is configurable in the sense that you can decide what kinds of messages go out. By default we send info, warning and error messages; debug is not sent, simply because of the high volume, but this is configurable and can be enabled and disabled through a JMX call. We also append a lot of information along with the log line itself. To give you an example, we have what we call a trace ID associated with each call that happens at LinkedIn.
Whenever a log line is written, this trace ID is appended to it. We also add the server name, the container version and the WAR version we are running. This helps us identify or isolate problems on a particular box or in a particular version. If a new deployment has happened and you want to compare the old and the new version, you can do that easily because this information is already appended to the log line. As a developer, the only thing you need to do is include this library when writing your application and you are good to go; there is no overhead in maintaining or shipping these logs to Kafka. When the service comes up, it can decide what kinds of logs it sends. We basically send the service logs, the access logs, as well as the garbage collection logs into the Kafka cluster.

Let me give a brief idea of what ELK is for people who don't know. ELK is a combination of Elasticsearch, Logstash and Kibana. Elasticsearch is a flexible and very powerful open source, distributed, real-time search engine, built on top of Lucene. It gives you a lot of flexibility since it supports horizontal scaling: say today your log volume is 200 terabytes and you predict it will grow to 300 terabytes, you can just add more servers to your Elasticsearch environment and the indexes get distributed accordingly; there is no manual intervention required when scaling horizontally. Logstash is the transport or ingestion piece of ELK. You can think of it as an ETL: we attach a Kafka consumer plugin that we wrote for Logstash, whose whole job is to listen to a specific topic on those brokers, get the data and hand it over to Elasticsearch. And Kibana is the GUI, the interface through which we look at this data. As human beings we are good at identifying patterns: I could show you a hundred different log lines and you could not make sense of them, but a picture is worth a thousand words. This really helped us reduce our MTTR, because all this information is available in Kibana, and since the dashboards are dynamic, the data is updated in near real time.

Just to give you a sense of the scale of our ELK infrastructure at LinkedIn: we have over 100 clusters. The way we have designed it, each SRE vertical team has its own dedicated ELK cluster. They manage it and own it end to end; they are responsible for what kind of data comes in, how long they want to store it and when to truncate it. This is split across multiple data centers, and that is where another Elasticsearch feature, tribe nodes, plays a crucial role. We have a whole bunch of ELK clusters, one per team, and a tribe node sits on top of them. You can think of it as a federator that makes calls to the individual clusters to get and aggregate the data. Our largest cluster holds somewhere around 32 billion documents and its size is up to 30 terabytes; on a daily basis that ELK cluster takes in about 3 billion documents. So those numbers are realistic for a large company.
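To make the Logstash piece more concrete, here is a rough sketch of the role it plays for us: reading log events off a Kafka topic and handing them to Elasticsearch. This uses the open-source kafka-python and elasticsearch-py clients rather than our actual Logstash plugin, and the topic name, field names and index pattern are illustrative only:

    import json
    import time
    from kafka import KafkaConsumer
    from elasticsearch import Elasticsearch, helpers

    consumer = KafkaConsumer(
        "service-logs",                               # hypothetical topic
        bootstrap_servers=["kafka-broker-1:9092"],
        group_id="sre-elk-ingest",                    # this team's ingestion consumer group
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    es = Elasticsearch(["http://elk-cluster:9200"])   # the team's dedicated ELK cluster

    def to_actions(events):
        for event in events:
            # events already carry trace_id, host and version appended by the log4j library
            yield {
                "_index": time.strftime("service-logs-%Y.%m.%d"),  # daily indices simplify retention
                "_source": event,
            }

    buffer = []
    for message in consumer:
        buffer.append(message.value)
        if len(buffer) >= 500:                        # hand over to Elasticsearch in bulk
            helpers.bulk(es, to_actions(buffer))
            buffer = []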
And if anybody is interested in setting up this kind of infrastructure, those are the kinds of numbers a cluster can be expected to handle at peak load.

So where do we generally use ELK? The majority of the time it's used for tracking exceptions, Java exceptions in particular. Reading an exception log is pretty difficult in Java because the stack trace can span up to hundreds of lines. We use ELK to maintain dashboards for this: whenever a new exception comes in, we can easily track it, because we maintain the whole state. We also use it to trace a call, which is really important for us. Say you want to look at a specific call coming into LinkedIn and trace the entire call span from the front end down to the database. Whenever a call is made within LinkedIn, we attach a unique ID to it, and this ID is passed on to each corresponding application, so using this call trace we can figure out exactly where a call fails. We use it to parse nginx access logs: say we see an increase in 4XX or 5XX responses or in the number of exceptions coming in, we can easily use these parsed logs to investigate. We even use ELK to measure deployment impact. What do I mean by the impact of a deployment? Say you have deployed a new WAR. You probably want to canary it first before it goes to full production, because no matter how good your testing or QA infrastructure is, things don't always pan out the way you planned. So you can easily set up a canary and run comparisons between the canary and a baseline host to figure out whether there is any positive or negative impact on your application after a particular version goes out. And we use it for looking at changing trends: whenever you introduce a new feature into your application, you want to see the response. Is that particular change to your application code or to the call the one that's increasing your latency, or causing a high impact? There are many more use cases, but I just wanted to highlight some of the most common ones.

In conclusion, we need log aggregation to be able to look at our logs historically and make sense of them. This kind of infrastructure also lets us reduce log retention on the servers themselves: you probably don't want to keep a month's worth of logs on your application box filling up its disk. We are even considering whether writing logs to the box is worth it at all once ELK is rolled out across our infrastructure. That is something we are looking at right now, nothing is finalized, but there is a chance we stop writing these logs to the box entirely, so you avoid that I/O and can use it for the application layer instead. I believe I finished much faster than planned, so that's all I had; happy to take questions.

Why use Kafka over something as simple as, say, rsync? What advantage did you get by using Kafka?

By using Kafka you do not have to run any other service on the box. You gave the example of rsync; I'll give you the example of the Lumberjack agent that comes with Logstash.
That is something you need to configure and run on top of your application to do this, but with Kafka you just incorporate one of our libraries and you are good to go. You do not have to set anything up on top of your application.

But Kafka would also involve setting up some kind of infrastructure, a Kafka cluster?

Yes, but that is a separate infrastructure that your application doesn't have to care about. The box where you run your application doesn't need anything else running on it; it is free to run just your application, and Kafka has its own infrastructure to which you send. Check.

Hi, Nenad here. I have two questions; one is technical, the other is more of a product or legal issue. The ELK stack itself is pretty bad at handling access control, so how are you doing it across teams? You mentioned every SRE team has their own ELK cluster. How do you ensure that, for example, customer support can access only business logs, as they're called, while a developer can look at all the logs in the system? That's the first question. The second question is, what do the different data lifecycles look like across teams? Some teams will say they need data for only three months, some will say they need it for two years. What are the different patterns you've seen, and the main gotchas or pros of the popular ones when scaling them? Thanks a lot.

Yeah. When we really want to control access, we run nginx as a proxy, and it supports LDAP authentication, so we use that to control who has access to these instances. We also run ELK inside a controlled environment, and there are ACLs and firewall rules in place to make sure that only a specific set of users, like SREs or developers, has access to it. Going back to the second question about retention of logs: again, it varies by use case. I know the mobile team keeps data for over a year to see the change in patterns every time a new version of the application goes out; you want to see how it behaves and get progressively better with each new version. Another team I know of, security, also maintains data for over a year. And there are teams that are not interested in retaining data for a year at all; they keep it for 15 days. So it varies from use case to use case. What happens is, if we go with the mobile example, you need a fairly large infrastructure to support it. One of the biggest problems we have faced with Elasticsearch is the recommendation that your JVM heap should not exceed 32 GB, so there is a limit to how much data it can hold in memory. In that case you need to keep adding more servers, but since Elasticsearch is a distributed environment, you can easily add servers horizontally and you will not see any adverse effects. The only adverse effect we have seen is with vertical scaling, where we add a large disk: even if it's an SSD, we have seen very bad performance. So the recommended model is horizontal scaling rather than vertical.

Hello, have you come across a tool called PubSubHubbub? There is a publisher and a subscriber communicating through a hub.
Is there something in Kafka where it talks to a hub and distributes across it?

Yes, Kafka is a pub-sub model. If you remember my first slide about Kafka, you see there is a set of publishers that publish messages to a broker within Kafka. Let me go back. Yeah, so this is the pub and sub part of it, and you do not have to worry about which broker you are sending the data to.

So you mean the broker is the hub here?

Yeah.

And if the broker has nothing to send, does it sit in an idle state, or is it a wake-up model where an event comes in and it hands the event back?

If there is no data coming in, it's effectively on hot standby; whenever new information comes into Kafka, it is always there, ready to listen. In practice we have never found a case where an entire Kafka cluster sits idle, because with around 875 billion messages a day going through it, that has simply never happened to us.

So, last question: have you heard of this tool, PubSubHubbub, and compared it against your requirements versus Kafka? It's a Google tool.

I'm not sure about it; I haven't looked at it. Okay, thanks.

Couple of questions. You mentioned that you have clients written in Python as well that push data to Kafka. I checked around a month back, and the Python producer was not ZooKeeper-aware like the Scala and Java producers. Second question: is there a better way of doing replication in Kafka? The last I know of is MirrorMaker, which is itself tedious to set up.

Yeah. Going back to your first question: are you using the 0.7 version of Kafka or 0.8? 0.8 should be ZooKeeper-aware.

You mentioned that you have clients written in Python as well that push data into it; the Python producer I saw on the Kafka side was not ZooKeeper-aware.

From what I understand it supports it, but I would have to check, because honestly speaking I'm not an expert on Kafka; it should be aware of ZooKeeper and maintain state for you. For the second part, about replication, could you repeat the question?

The question was whether there is an easier way of doing replication other than MirrorMaker.

No, that's the best way that we recommend, and it's part of the Apache documentation. Even at LinkedIn we use MirrorMaker for aggregation between different fabrics and different data centers. Thanks.

Yeah, I have two questions for you. First one, you talked about log4j customization. (Can you speak a bit louder?) You talked about log4j customization for the log format. Is this something related to standardizing the log format for Kafka, and what kind of customizations are you talking about in log4j? The second question is, how does a client know which broker to go to for a given topic?

Okay. To answer your first question, the customization in log4j is basically about supporting a logging standard. Since you have an array of applications, you probably want them all to log in a standard format. The second customization we have done is the ingestion into Kafka.
When the log4j library comes up, it establishes a Kafka proxy that is responsible for appending these logs to the Kafka brokers. For the second part, can you repeat the question? So yes, ZooKeeper is responsible for maintaining the state. If you go back to my earlier slide, you will see that ZooKeeper has the information about each topic and which brokers are responsible for those topics.

Hi. As far as I know, when you create an index in Elasticsearch the number of shards is fixed; you cannot change it once the index is created. Looking at your volume and peak situations, two questions: how do you estimate the number of shards, and if you see that it's not able to handle the load, how do you change the situation and increase the number of shards?

Okay, so there are two situations. One is where your index is already created and, say, has X shards associated with it; there is no way in Elasticsearch to increase those shards, and the best thing would be to re-index with the new shard configuration. The other thing is that Elasticsearch supports dynamic templating, so using a template you can increase the shard count from X to Y for the next index that gets created, and whenever that new index is created you have that number of shards available.

Just curious to know about your Kafka cluster size, that's one thing. And the second thing: the modified log4j module used at LinkedIn, is it open sourced or not?

I'm not sure of the exact cluster size we have, but I believe we have more than a thousand nodes supporting Kafka, a thousand servers just supporting the Kafka infrastructure. And the log4j library is not open sourced as of now; I'm not sure if there is any plan to do that either.

Hey, over here. So I have a couple of questions on Kafka and the whole ELK model. One part is that you use customized log4j to send the logs over the wire, right? Have you come across situations where the Kafka cluster itself was flaky and the logs could not be shipped, and you had to retry and push them again? Do you fall back to log files if there is a network problem or a problem with the cluster, or do you simply discard those logs? When is it okay to lose the data? What model do you adopt there?

We go with the latter: we make a best effort to send out these log messages, but we do not want to adversely impact the application by retrying. Our philosophy is that it's a log line, not actual data loss, so we would discard that message, and the subsequent messages will go through once the network is healthy again.

Okay. The other question is more about the unique ID. You mentioned that you generate a unique ID for every call that happens at LinkedIn and that it is appended into the log. What is the logic for generating the UID across components? A request will go through the API and other layers of services; how do you make sure the same UID flows through all the places?

What happens is, when an application gets a request at the front end, it has the logic for generating a unique ID. That unique ID is then passed on to, say, the business logic layer, and that front end probably needs to talk to ten different business logic applications.
So when it makes that call, it passes this unique ID along, and that is how we are able to build a complete call trace.

Okay. Do you use 0.8, or 0.8.2, the latest from Apache Kafka?

We are on 0.8 as of now.

Okay, so that's why you are still using ZooKeeper to maintain the offsets, right?

Yes.

Okay. And what log format is this, JSON or plain text?

It's plain text.

Plain text, okay. Actually, we were using the ELK stack on our production systems and we found it very slow; there was a lot of performance impact on the system, so we decided to use it offline, for offline tracing. But if you want to trace in real time, what solution would you suggest?

Again, the reason we went with Kafka as the transport mechanism is to avoid any overhead in the application itself. And yes, there were some cases with the older versions of Elasticsearch that were relatively slow, but the latest version, 1.4, 1.4.2 I believe, has fixed a lot of issues, so we have seen a considerable increase in performance.

Hello, over here. So Kafka's message delivery semantics are at-least-once, but when you're pushing out, say, business metrics, you want exactly-once delivery semantics. I sort of understand why it is at-least-once, but at LinkedIn have you found ways to put a wrapper on top of it for use cases where you need exactly-once semantics? For example, you don't want metrics to be counted twice in business use cases, because that would be disastrous.

As this is a publish-subscribe model, we generally never get duplicate events, and Kafka has some logic within itself to avoid duplication. But if there is any duplication, we assume the application or the endpoint that is consuming the data needs to be intelligent enough to handle it. Because we wanted high throughput and low latency, we try to keep as little load on the Kafka infrastructure as we can.

Hey, here in the center. You mentioned log4j is your primary log input. What about other applications, say Apache or nginx, that write directly to files, or applications that write to syslog? How do you capture those inputs? And the second question: if you're running a Logstash agent on each of your servers, it will consume resources if the application starts writing logs at a high rate; how do you handle that?

I'll answer the second one first. By default, as I said, we only log info, warning and error messages. Debug is not recommended; it is only enabled when you want to troubleshoot, and only per box, per application. It's not recommended to send debug over Kafka, but if you have to for some reason, it can be configured through a JMX call to that application and you can send those messages; it's just not recommended within LinkedIn. And your first question was? Can you please repeat it?

Yeah, it was about inputs other than log4j: files and syslog.

So, as I said, we are working on a Python client, and we are even looking at using the logging library within Python to send logs directly to Kafka, but we are not completely there yet. Once we have that, we will probably open source it.
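To make that idea concrete, here is a rough sketch of what such a Python logging handler could look like. This is only an illustration using the open-source kafka-python client, not LinkedIn's internal library, and the topic and field names are made up. It mirrors the log4j behaviour described earlier: metadata is appended to each record and delivery is best effort.

    import json
    import logging
    import socket
    from kafka import KafkaProducer

    class KafkaLogHandler(logging.Handler):
        """Ships log records to a Kafka topic instead of (or in addition to) local files."""

        def __init__(self, brokers, topic, app_version="unknown"):
            logging.Handler.__init__(self)
            self.topic = topic
            self.app_version = app_version
            self.producer = KafkaProducer(
                bootstrap_servers=brokers,
                value_serializer=lambda v: json.dumps(v).encode("utf-8"),
            )

        def emit(self, record):
            try:
                event = {
                    "ts": record.created,
                    "level": record.levelname,
                    "logger": record.name,
                    "message": record.getMessage(),
                    # metadata appended to every line, like the custom log4j library does
                    "host": socket.gethostname(),
                    "version": self.app_version,
                    "trace_id": getattr(record, "trace_id", None),
                }
                self.producer.send(self.topic, event)
            except Exception:
                # best effort: drop the line rather than impact the application with retries
                pass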
Have you had any issues where your Kafka service was actually backed up by a lot of messages from the client side? That's the first question. The second: the log shipper landscape has changed; before, people used the Java Logstash agent, and you talked about not wanting to go with that. There are a lot of projects that have come out, like Mozilla's Heka, and recent work on a language called Rust, which is more native and might be faster for actually shipping a log. Have you looked at such projects, like Mozilla's Heka or Packetbeat, and compared the trade-offs with shipping through Kafka?

To answer your second question: we are not looking at options other than Kafka, because this is something we want to keep improving. Kafka is a LinkedIn baby, so in that sense we will probably keep using it and keep improving it; we have not looked at the other options available in the market. On the first part: yes, there were instances of lag, but it was on the consumer side. Every time we have seen this, the brokers were able to handle it because we have special hardware set up for them: all SSDs, with over 128 GB of memory. So the brokers themselves have never been the problem; the only problem we have seen is on the consumer side, when consumers are too slow to consume the messages. That's why we set up four hours of retention, to give consumers time to catch up; if they do not, those messages are dropped.

Hey, two questions. First one: what message size does the Kafka broker support? And the second: what if a broker goes down? How is that handled?

From what I know, there is no limitation on the size of the message that goes in. And to answer your second question: we keep a replica of each of these brokers, so we can easily switch from a broker that has gone bad or gone down to a standby broker and your messages will go through. Since all the offsets are stored in ZooKeeper, you are not going to lose any of your information.

So is it a manual switch?

Right now there is an option of switching automatically, but since you would really want to know why your broker went down, because that is a bad thing to happen, we tend to use a manual switch. Thanks.

It's the primary storage for us. Sorry, I didn't get the second one. Yeah, we have a dedicated team, like I said, that supports it, so we keep an eye on how this infrastructure is behaving.

Okay, so if I want to use Kafka and I want to know whether it is backing up, that it's going bad or not provisioned enough, is there monitoring for that sort of thing built in, say dumping data into Graphite or something like that?

Since Kafka supports REST out of the box, you can use those REST APIs to monitor the offsets. This information is stored in ZooKeeper: it tells you where the consumer is and where your tail is. The head and tail are the oldest and newest events stored within the broker, so you can easily use that information to set up your own monitoring. But out of the box, I don't believe we emit this kind of information.
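As a sketch of that kind of monitoring, here is one way to check consumer lag, assuming the newer kafka-python consumer APIs rather than the ZooKeeper and REST offset lookups described in the talk; the topic and group names are made up. The idea is the same: compare the group's committed offsets against the brokers' latest offsets and alert when the gap grows.

    from kafka import KafkaConsumer, TopicPartition

    TOPIC = "service-logs"                            # hypothetical topic
    consumer = KafkaConsumer(
        bootstrap_servers=["kafka-broker-1:9092"],
        group_id="sre-elk-ingest",                    # the group whose lag we want to check
        enable_auto_commit=False,
    )

    partitions = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]
    end_offsets = consumer.end_offsets(partitions)    # the "tail": newest offset per partition

    total_lag = 0
    for tp in partitions:
        committed = consumer.committed(tp) or 0       # how far this group has consumed
        total_lag += end_offsets[tp] - committed
    print("lag for group sre-elk-ingest:", total_lag)

    # Feed this number into your monitoring system; a steadily growing lag means the
    # consumers cannot keep up and messages may age out of the four-hour retention window.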
One last question. I understand that having a pub-sub mechanism allows you to multiplex receivers without the publisher needing to do anything about it. But if you just have one entity that the logs are supposed to go to, would Kafka still be of any use, or give any significant advantage over, say, just using a lightweight Lumberjack shipper on the client?

The main reason we don't use something like Lumberjack is that you are adding another application to a box that is already running your production applications. You have an agent sitting on top, you need to explicitly configure which files it should read, and that is going to increase your I/O. It also maintains a state DB to figure out which logs have already been sent, so you are increasing the I/O, CPU and memory utilization of the box.

Isn't there an advantage in the sense that you don't need to modify your application at all? For example, if you want to monitor the response times of your nginx servers, you don't need to rebuild nginx or anything; just by looking at the log files you can do it.

Yeah, again, it depends on the use case. But we feel we need to provide more flexibility to the user, whether a developer or an engineer: he should not have to worry about what he needs to set up for these logs to be shipped out to Kafka. His main concern should be writing his application, not worrying about these things. That's the philosophy.

So if I have a web application and I want to use this, that means I import one of the Kafka libraries in my application, configure that library, and just log from my application. That's all I have to do, right?

Yeah, that's it. Okay, thanks.
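For what that last exchange describes, the application-side wiring might look something like the following, assuming a handler along the lines of the earlier sketch; the module and class names here are hypothetical, not an actual LinkedIn library.

    import logging
    # hypothetical module containing the KafkaLogHandler sketched earlier
    from kafka_log_handler import KafkaLogHandler

    handler = KafkaLogHandler(
        brokers=["kafka-broker-1:9092"],   # the Kafka cluster the dedicated team runs
        topic="service-logs",              # hypothetical topic name
        app_version="1.2.3",
    )
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)            # info/warning/error by default; debug stays local

    # from here on, the application just logs; shipping to Kafka is transparent
    logging.info("profile page rendered", extra={"trace_id": "abc-123"})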