If you didn't invite people. Hello. So, after this presentation, we have a kind of community meeting here. It will be pretty informal. We might sit here on stage, so if you have questions, or want to talk about the roadmap, Fluentd, Fluent Bit, Anurag and myself will be pretty happy to talk about that. Take it as a more casual conversation, if you want. We'll have this session for about 20 or 30 minutes before the coffee break, so maybe it's a good time for feedback, and we can talk more during the break. Thanks. And if you have production scenarios you want to run by us, we'd be happy to take a look: questions about monitoring, questions about all sorts of stuff. Happy to chat about any of it.

Hello. We have the first question here. He's curious: the aggregator is a singleton, going direct into the persistent store, so some of the functionality of the aggregation is moving into the collector area. Yeah. And my question was: in Kubernetes, which is a managed environment where resources scale automatically, so you can spin up more and more, some people run collector and aggregator together. Is there still value in having a separate aggregation level, compared to just getting the same functionality from a single container?

I can base the answer on the use cases that we see. It depends. For example, if I talk about Calyptia customers, it depends on the company size, what they are trying to accomplish, and what security requirements are involved, OK? For example, if you have a cluster with hundreds of nodes and these nodes are talking to a cloud service, most of the big companies don't want to share secrets with each node, because of security reasons, right?
Yeah, there might be solutions like Vault or something that you can use to distribute the secrets, but some security policies and security teams might say you should not expose secrets to the nodes, right? So you have to go through an intermediate layer, and that's why sometimes an aggregator or intermediate layer is required for that case.

Another case when the aggregator is needed: for example, I remember one case from a financial institution. They said, we want to receive all the data in one aggregator, because we have so many teams internally, like internal customers, that don't follow our policies on how to structure the data, right? And we want to apply all the data sanitization in one place, in our aggregator, before hitting the destination or the final storage.

The other case is: my cluster or my distributed system is growing so much, today I have a thousand nodes, tomorrow I have 10,000, no matter if they're bare-metal machines, Kubernetes or not. If all of these agents are talking to my local deployment of a storage database, sometimes this database cannot handle the load, right? And they don't have all the tools in place for that, because at that point you need to manage the database like a cluster, scale it up, scale it down, and it's really complex, because sometimes you don't know when your load is going to increase; you have to predict it. So what they do is put in the aggregator, because they say: in the aggregator I will have a huge buffer on disk, and I will make sure that the traffic stays kind of linear, always sending the same amount of traffic over the network, I don't know, 100 megabytes per second, always the same. And if the data gets delayed, that's fine. So it depends, case by case. For example, on the Fluent Bit side we are seeing the same. Users of Fluent Bit are always looking for more performance, and they're doing aggregation now with Fluent Bit.
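The aggregator pattern described above, many agents forwarding to one intermediate node that buffers on disk and ships at a steady rate while holding the backend credentials, can be sketched as a Fluent Bit aggregator configuration. This is a minimal sketch: the hostnames, paths, and size limits are placeholders, and the environment variables for credentials are assumptions, not part of the original discussion.

```ini
[SERVICE]
    # Enable filesystem buffering for the whole pipeline
    storage.path              /var/log/flb-buffer
    storage.sync              normal
    # Cap how many buffered chunks stay resident in memory
    storage.max_chunks_up     128

[INPUT]
    # Receive records from node-level agents over the forward protocol
    Name          forward
    Listen        0.0.0.0
    Port          24224
    storage.type  filesystem

[OUTPUT]
    # Only the aggregator holds credentials for the backend,
    # so individual nodes never see the secrets
    Name                      es
    Match                     *
    Host                      elasticsearch.internal.example
    Port                      9200
    HTTP_User                 ${ES_USER}
    HTTP_Passwd               ${ES_PASSWORD}
    # Large on-disk backlog so delayed data is kept, not dropped
    storage.total_limit_size  50G
```

The `storage.total_limit_size` on the output is what gives the "huge buffer on disk" behavior mentioned above: if the destination slows down, chunks accumulate on disk up to that limit instead of being lost.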
And yeah, we're getting this demand from OpenSearch users and others, and also from security folks who want to make sure that the data transits in a safe way, without sharing secrets or putting other components of the environment at risk. Thank you.

Any question is valid, if you want to talk. If you want to blame us because we broke something, that's valid too. Okay, we have one question there, if you can, thank you. I feel a little bit guilty in these positions. Yeah. I got it.

Yeah, a little question around efficiency. We are in the health industry, and in our business it sometimes is more important that things are logged; logging is more important than business logic. For instance, prescriptions: we have to log every prescription so we can go back and verify it. Right now we are writing to a file, reading it into Fluentd and streaming it to Splunk. But we have also looked at streaming it via TCP. And the question is: is it more efficient, or much more efficient, to stream it rather than write it first, or is there real overhead in the processing of the log input? What do you think?

Okay, so the use case is health, medical information; data safety is a priority above performance, so you cannot lose data. And you're using Fluentd right now. Yeah, we are using Fluentd. And have you seen Fluentd CPU go very high at some point, or seen it unable to handle the load? No. So it's a theoretical exercise really, but it's more about which way we are going to develop the platform, because we are making a platform to take on services. So do we have to suggest that they use streaming of logs, or is it okay to write to disk in large production? I'm going to give you an answer that might be mixed with some suggestions, right? So this is like, if you are thinking about a platform, this platform might be there, from an architectural perspective, for two or three years, right?
Because you will not change this in a few months once it hits production, right? I would suggest, if you're worried that data must be saved pretty quickly and in an efficient way, maybe use Fluent Bit instead of Fluentd. Fluentd is efficient, but when you hit a certain load it might struggle a little bit, and Fluentd is also a single-threaded process, right? So if you have more data to write, it struggles to send more data over the network. In the Fluent Bit case, that is optimized even more, because when the input plugin receives the data, no matter if it's tail, TCP or any other source, this data by default will be in memory, but you can enable the filesystem as a secondary layer where you keep a copy of the data as you receive it. Now, why is this mechanism in Fluent Bit more optimal than in Fluentd? Because we use a similar approach to databases: we use memory-mapped files, so we reduce the number of system calls that are required to sync and write the data to disk. We found that to be several times faster and less CPU-intensive when writing data to disk. And that happens in one thread in Fluent Bit. Second, on the output side, the output plugin, Splunk or Elasticsearch or most of them, runs in a separate thread, so that blocking step will not exist there, right? So if you are writing a platform that relies on data safety and I/O, I would go with Fluent Bit, my suggestion, because of threading, because of memory-mapped files; it's more optimized for those use cases. Now, once you get the data in using that first safety step, getting the data safely on disk and being prepared to ship it out, I think it doesn't matter if you choose Splunk or S3, because you will have a backup of the data, right? One of the problems is synchronization.
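As a rough sketch of the setup just described, assuming Fluent Bit's classic config format: the input keeps a filesystem copy of each chunk (persisted via memory-mapped files), and the output runs in its own worker threads so network delivery does not block ingestion. The paths, the Splunk endpoint, and the token variable are placeholders, not values from the discussion.

```ini
[SERVICE]
    # Chunks are persisted via memory-mapped files under this path
    storage.path  /var/fluent-bit/buffer
    # 'full' syncs after each write: slower, but safest when data loss
    # is unacceptable
    storage.sync  full

[INPUT]
    Name          tail
    Path          /var/log/app/*.log
    # Keep an on-disk copy of every chunk as it is received
    storage.type  filesystem

[OUTPUT]
    Name          splunk
    Match         *
    Host          splunk.internal.example
    Port          8088
    Splunk_Token  ${SPLUNK_TOKEN}
    # Deliver from separate worker threads, independent of ingestion
    Workers       2
```

With this layout, a slow or unreachable Splunk endpoint stalls only the output workers; the tail input keeps landing data safely on disk in the meantime.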
Sometimes data is generated so fast and rotated so fast that the agent cannot pick up that data in time, right? It depends on how noisy your application is. But yeah, I would base my suggestion on that. I don't know if that answers it correctly, but if I'm thinking about a platform, and about I/O and data safety, I would implement Fluent Bit with these features enabled: filesystem buffering and threading in the output. And where Fluentd would use, I don't know, for a minimal use case, something like 100 megabytes of memory, Fluent Bit might use 10.

Okay, great answer. And I have another one, around the operator. For metric collection, are you planning to add custom resources so developers can create collections like in the Prometheus Operator? There you can create a ServiceMonitor that automatically updates Prometheus's scrape configuration. Do you plan to add this to Fluent Bit, so you can automatically configure collection of metrics?

This sounds weird, but I personally have not used the Fluent Operator. I think Anurag has more experience with that; maybe he can answer that part. Yeah, I think with the Fluent Operator's roadmap on metric collection, most of it's geared towards, I think they said Q3, Q4, and that's the KubeSphere team. My guess is, when you look at the metric collection features that exist today, they're really geared towards having very firm knowledge about what you want to collect, whether that's Node Exporter or a Prometheus endpoint. With the operator, the intention, at least, is to be able to templatize that a little bit more. And of course, one of the other benefits with the operator is all of the service discovery you get, and that also applies to logs, right? Wildcards and everything there. I think there's a lot to gain from what the Prometheus Operator has already done in that sense.
And in some cases it might do some of the same feature sets, but in others it would require so much implementation that, hey, why not just use the Prometheus Operator in that case? So, a bit of a mixed answer. But yeah, I'd say the way we like to look at it is, we try to do what makes a lot of sense with Fluent Bit, but not try to make a 100% replica of, say, Node Exporter. We don't have full Node Exporter capabilities; that team has done amazing work building that full functionality, and we've brought over some of the best pieces of it, but not the entirety.

And actually, I wanted to add to one of the pieces from the previous question, too, about whether you should stream the data or buffer to disk: either way works really well. If you think about Kubernetes, all of that is being written to disk anyway, right? It's all a container log, and we continually update the offset, so in the event that something happens in the DaemonSet or the process dies, when it restarts it knows exactly where to resume from. And maybe it's my own bias, but sometimes when you know exactly where you left off, there's a file there, you feel a little safer, right? That being said, the chunks are the same exact data, just formatted a bit differently, so either way works. Yeah, all good.

I think we have two minutes for one more question, because we have the coffee break. Anyone else? Or if you want to ask for some enhancement, happy to hear about it. No feature requests? Yeah, it looks like IBM is thinking about a new feature request. Yeah, yeah, yeah.

I saw in one of the slides about the Fluent Operator that they have multiple CRDs, custom resource definitions, for each source or output or filter. Why don't you combine them into one? I mean, what led you to the decision to have multiple CRDs? I have a question around that, because in general, either you read from it or you don't.
I don't know if it leads to a situation where you need to vary your parameters. Yeah, it's a good question, and I don't want to speak on behalf of the KubeSphere team, but one of the thoughts, at least, that I would have around it is that you are essentially constructing a single pipeline. The Fluent Operator takes the distinct pieces, those sources, filters, and outputs, and just bridges them all together. And while, yes, that could be done with a ConfigMap, you might want to update the sources independently and not affect the remainder of the pipeline. So I think that was one of the intentions, of course not trying to put words in the KubeSphere folks' mouths, but yeah, I think that was one of the big things. And also, if you look at how some of the services interact with those CRDs, they are very much like dropdowns: hey, you click your source, click your filter, click your output, and I think that works well when each of those is a separate custom resource.

So you combine all these CRDs to form one config. But if you want to change, let's say, one of the sources, and let's say in Fluentd you don't have the option to dynamically reload your config, you will have to restart anyway, right? So how does that affect your CRDs? If you had only one, you would have to construct the config again. Yes, so Fluentd does support dynamic reload, but I get what you're saying: even if you change four or five sources, the process still needs to be updated. Yeah, I think that's absolutely true. I think, again, it was probably more for the convenience of having each of those objects independently managed. Modularized. Yeah, because if you think about it, as one giant object of configuration you would have to parse it apart and turn each piece into an individually configurable element, versus just having them as individual configurable elements from the start.
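To make the modular-CRD idea concrete, here is a hedged sketch of what independent Fluent Operator objects look like. The API group, kinds, and field names reflect my reading of the KubeSphere fluent-operator project and may differ between versions, so treat this as illustrative rather than authoritative; the names and endpoints are placeholders.

```yaml
# A source defined on its own; the operator discovers it by label
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  name: tail-containers
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  tail:
    path: /var/log/containers/*.log
---
# An output defined independently; editing one object does not
# require touching (or re-rendering by hand) any of the others
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: es-output
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  match: "*"
  es:
    host: elasticsearch.logging.svc
    port: 9200
```

The operator then stitches the labeled objects into one rendered Fluent Bit pipeline, which is the "bridging them all together" behavior described above.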
So, like, you can have many filter plugins, right? So how do you support the parameters as they come along? Yeah, good question. I think with the filters, all that object does is house the match in it, and so it treats everything as one giant pipeline. I don't necessarily know if it's doing anything too unique in the sense of saying, okay, this belongs to this label, or this label, or this label. I could be 100% wrong, but that's at least my initial understanding of it. Okay, thank you.

Okay, that was pretty good. We have community meetings, by the way, every two weeks on Thursdays. We rotate between a North America / late-European time and a European / Asia-Pacific time. Mickey has led one of those meetings before; I lead some, and we have Pat who leads others. We'd love to have more folks join us. We talk about the roadmap, what we're building, and help with specific use cases. So yeah, we always encourage more folks to come. And those are all recorded, so you can watch them on YouTube as well. Okay, I think let's go ahead and break for coffee, and then we'll circle back here after 15 or 20 minutes. Just to double check: yeah, we get back here at 3:10 for the next session. So, about 15 minutes. Okay.