From around the globe, it's theCUBE, with coverage of KubeCon and CloudNativeCon Europe 2021 Virtual. Brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners.

And welcome back to theCUBE's coverage of KubeCon, CloudNativeCon 21 Virtual. I'm John Furrier, host of theCUBE. We're here with a great segment with an entrepreneur who is also the creator and maintainer of Fluent Bit, Eduardo Silva, now the founder of Calyptia, a startup that's going to commercialize and offer enterprise-grade Fluentd and Fluent Bit. Eduardo, great to have you on. Thanks for coming on theCUBE.

Thanks for having me here. I'm very happy to share some of the story with you and talk about whatever you want.

Exciting trends happening with CNCF, KubeCon, CloudNativeCon, cloud native: a lot of data, a lot of management, a lot of logging, a lot of observability, a lot of end-user contributions and enterprise adoption. So let's get into it. First, give us a quick update on Fluentd. Anything upcoming to highlight?

Yeah, well, Fluentd is actually turning 10 years old right now, so it's the most mature project that we have for log management and log processing in the market. And we're really happy to see that, despite being a project that was started 10 years ago, its adoption continues growing, along with the ecosystem from a plugin perspective and the companies adopting the technology. That is really great. It's very overwhelming, and I'm actually really happy to take this project and continue working with companies and individuals.

So where are we now with Fluentd? One of the things people are facing, not because of the tool itself, is that they have more data and more microservices all the time. The systems are scaling up. So it's about performance, right? And performance is critical.
If you're slow in processing data, you're not getting the data at the right time when you need it, right? People need real-time queries, real-time analysis. So from a Fluentd perspective, we are going to focus a lot on everything that is about performance, I would say for this year and maybe the next one. I would say you won't see many new features in Fluentd itself as a project; it will be mostly about bug fixing and performance improvements.

Yeah, I definitely want to dig in with you on the data and logging challenges around Kubernetes, especially with end-to-end workflows and the different environments it sits in the middle of. But first, before we get there, just take a minute to explain to the folks not that savvy with Fluent Bit: what is Fluent Bit? Real quick, explain what it is.

Okay, I will start with a quick story about this. When we started Fluentd, we envisioned that at some point, and I'm talking about six years ago, all this IoT, embedded, or edge stuff would arrive, and for that, Fluentd could be too heavy. If you have a constrained environment, or you want to process data in a faster way without all the capabilities, we said at that time that Fluentd might not be suitable for that. So Fluentd was no longer a single piece of software; we wanted to say Fluentd is an ecosystem. And as part of that ecosystem we have SDKs where people can connect applications to Fluentd, but we also said we need something like Fluentd that is lightweight and faster. Fluentd is written in Ruby, right? With the critical parts in C. But since it's written in Ruby, there are pros and cons in how you process the data and how much you can scale. So we said, if we're going to dig into embedded or small constrained environments, let's write a similar solution, but in the C language.
That way we can optimize memory, we can optimize I/O, and all those kinds of needs can be addressed. And we started this project called Fluent Bit. Fluent Bit nowadays is like a lightweight version of Fluentd. It started for embedded Linux, but after a few years, people from the cloud space, talking about containers and Kubernetes, started to ask for more features in Fluent Bit, because they had Fluentd, but they also wanted Fluent Bit, because it was lightweight. And nowadays we can see that Fluent Bit is well established in the market; Fluent Bit is getting around two million Docker Hub pulls every single day. The traction of the project is incredible, and it's mostly used to collect logs from files and from systemd. In most Kubernetes environments it's able to process all this information and append metadata, and it solves the problems of: how do I collect my data? How do I make sure the data has the right context and metadata? And how do I deliver this data to a central place, like a cloud provider or any kind of storage?

That's great. And I love the fact that it's written in C, which gives, I'll say, more performance in the code, less overhead; it gets deeper, closer to the system, and people know C is high performance. Quick stats, though: how old is the Fluent Bit project? What version are you on?

Fluent Bit, I'm not sure if it's turning six or seven this year, but likely six.

It's been around for a while.

Yeah, yeah. We just released 1.7.3 this week, and we have done more than a hundred releases. Actually, the release cycle of Fluent Bit is pretty fast; sometimes it has releases every two or three weeks. The iteration in the cloud native ecosystem is quite fast. People want more features and more fixes, and they don't want to wait a couple of months for the next release. They want to have the container image right away to test it out.
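The collection flow Eduardo describes, tailing container log files, reading systemd, enriching records with Kubernetes metadata, and shipping everything to a central store, maps onto a few sections of Fluent Bit configuration. A minimal sketch; the paths, hostname, and tags here are illustrative, not from the interview:

```ini
[SERVICE]
    Flush        5

# Tail container log files written on the node
[INPUT]
    Name         tail
    Path         /var/log/containers/*.log
    Tag          kube.*

# Read the systemd journal
[INPUT]
    Name         systemd
    Tag          host.*

# Enrich container records with pod/namespace metadata
[FILTER]
    Name         kubernetes
    Match        kube.*

# Deliver everything to one central place (Elasticsearch here)
[OUTPUT]
    Name         es
    Match        *
    Host         elasticsearch.example.com
    Port         9200
```

Each `[INPUT]` tags its records, filters match on those tags to add context, and outputs match again to decide delivery, which is the collect/contextualize/deliver split described above.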
And actually, since we as a project work with most of the top providers, like AWS, Microsoft Azure, and Google Cloud Platform, the demand for these fixes and improvements is on a weekly basis.

You guys get a lot of props. I was checking around on the internet; you're getting strong reviews on logging for Kubernetes. A couple of releases ago you had performance improvements for Google, AWS, LogDNA, PostgreSQL, and other environments. But the question I'm hearing from folks is: I have end-to-end workflows, and they've been steady, they've been strong, but as more data comes in, and more services connect to it, from network protocols to other cloud services, the complexity of what was once a straightforward end-to-end workflow is impacted by this new data. How do you guys address that? How would you speak to that use case?

Well, we have taken an approach where we are agnostic about where the data comes from and the format it comes in. For example, in the common use cases we have now, data comes in different formats; every single developer uses their own logging format, and it comes over different channels: TCP, syslog, or other services. So it's very varied how we get this data, and that is a big challenge: how do you take data from different sources, in different formats, and unify it internally? And then, if you're going to talk to Elasticsearch, for example, Elasticsearch uses JSON; if you're going to talk to Kafka, they have their own binary protocol. So we are kind of the backbone that takes all the data, transforms it, and tries to adapt it to the destination's expected payload. From a technical perspective, yeah, it's really challenging. It's also challenging that nowadays, well, two years ago people were fine processing, I don't know, 5,000 messages per second, but nowadays they want 10, 20, 40,000.
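The fan-out to destinations with different payloads is expressed as additional `[OUTPUT]` sections; Fluent Bit's output plugins handle each destination's encoding. A hedged sketch, with made-up host and topic names:

```ini
# Same unified records, two different wire formats:
# the es plugin emits JSON, the kafka plugin speaks
# Kafka's own protocol.
[OUTPUT]
    Name     es
    Match    app.*
    Host     elasticsearch.example.com

[OUTPUT]
    Name     kafka
    Match    app.*
    Brokers  kafka.example.com:9092
    Topics   app-logs
```

Because both outputs match the same tag pattern, each record is delivered to both destinations, adapted per protocol.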
So from an architectural perspective, yeah, there are many challenges, and I think the teamwork between the maintainers and the companies has provided a lot of value. And I think the biggest proof of that is adoption, right? With big adoption you get more issues reported and more enhancement requests.

All right, so if I get this right: you've got different sources of data, with collection issues on the front end, then you've got some secret sauce with Fluent Bit inside the Kubernetes clusters, and then you deliver it to multiple services, databases, and cloud services. Is that right? Is that the key value proposition? Did I get that right with Fluent Bit?

Yeah, but more than the technical implementation, or the value of the technical implementation, I would say the value is two words: vendor neutral. When you go to the market and talk to banking institutions, hospitals, or any big company, most of them are facing this concept of vendor lock-in. They use a vendor database, but they have to get married to its tooling. And I'm not going to mention any vendor name, right?

You can name it.

Well, actually it's pretty fun. For example, the business model of this company that starts with S and ends with "plunk": you pay according to the amount of data that you ingest, and the default tools ingest all the data. But in reality, if you go to the enterprise, they say, yeah, I'm ingesting all my data into Splunk, or X provider, but of the 100% I'm ingesting, which I'm paying for, I'm only using the service to query maybe 20% of that data. So why am I ingesting the extra 80%? I don't need it, right? And this is a real use case: they use Splunk, which is really good for querying and analyzing data, but they say, 80% of my data is just archive data.
I will need it maybe in a couple of months, so I just want to send it to Amazon S3 or some other archive service. So the value for users is: I want a vendor-neutral pipeline, where I, as the user, decide where to send the data and when to send it, and I can also control my bill. I think that is the biggest value. If you go to the market, you will find other tools for logging or tools for metrics, there are a ton of them, but I think none of them can say they are vendor neutral; not all of them can offer this flexibility to the user. So from a technical angle it's performance, but from an end-user angle it's vendor neutrality.

Okay, so I have to ask you, with the CNCF projects that are going on and the community around Fluent Bit, you have to have those kinds of enhancements and integrations, not only for performance improvement but for extensibility. Enterprises want everything, right? They make things very complicated; they have very complicated infrastructures. So if they want some policy, say data ingestion policies, or to take advantage of no vendor lock-in, how is the community responding? What's your vision for helping companies? Now you've got your new venture, and you've got the open source project. How do you see this evolving, Eduardo? Because there are use cases that don't need all the data, but you need all the data to get some of the data, right? You have a new paradigm of coding, and you want it to be dynamic and relevant. How do you see this evolving?

Yeah, actually, I'm going to give you some spoilers, some news before we announce it. So, end users have a lot of problems: how to collect the data, process the data, and send the data. We just talked about that. Performance is a continuous improvement, because you always have more data and more formats. That's fine.
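The 20/80 Splunk-versus-archive split described above comes down to tag matching: each output only receives records whose tags it matches, so the user, not the vendor, decides what goes where. A sketch under assumed tag names and credentials:

```ini
# Only the queryable slice goes to Splunk (the expensive path)
[OUTPUT]
    Name          splunk
    Match         app.critical.*
    Host          splunk.example.com
    Splunk_Token  ${SPLUNK_TOKEN}

# The ~80% archive slice goes to cheap object storage instead
[OUTPUT]
    Name          s3
    Match         app.archive.*
    bucket        my-log-archive
    region        us-east-1
```

Swapping either destination for a different vendor means changing one `[OUTPUT]` section, which is the vendor-neutrality argument in practice.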
But one critical thing people say is: hey, I want to put my business logic in the pipeline. Think about this: we are the backbone for data, but we also provide capabilities for data processing, because you can grep the data or make custom modifications to the data. One thing we did a year or two ago is add stream-processing capabilities, kind of like SQL for Kafka, but we have our own SQL engine embedded in Fluent Bit. So while the data is flowing, without any database, any index, or anything, we can do data aggregation; you can put some business logic on it that says: for all the data that matches this pattern, send it to a different destination; otherwise send it to Kafka, Splunk, or Elastic. So that is what we have now: stream-processing capabilities.

Now, what is the spoiler, and where are we going next? There are two major areas. One of them is distributed stream processing: the capability to put this intelligence on the edge. By the edge I mean, for example, a Kubernetes node, or a constrained environment. Kubernetes on the edge is something that is going on; there are many companies using that approach, but they want to put some intelligence and data processing where the data is being generated, because there's one problem: once you have more data and you want to query that data, you have to wait to centralize all of it in your database or service, and there's latency, sometimes hours, because the data needs to be indexed. But what if you have a hundred nodes, and each one is already running Fluent Bit? Why don't you run the queries there? That is one of the features we are working on.

And now, talking about the challenges from the spoiler perspective: people say, okay, I love this pipeline, I know Fluent Bit has a good architecture, but the C language is not my thing, right?
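The in-flight SQL described earlier is configured through Fluent Bit's stream processor. A hedged sketch; the stream name, tags, and field are made up for illustration:

```ini
# streams.conf - referenced from the main config's [SERVICE]
# section, typically via a Streams_File entry.
[STREAM_TASK]
    Name  error_router
    Exec  CREATE STREAM errors WITH (tag='app.errors') AS SELECT * FROM TAG:'app.*' WHERE level = 'error';
```

Records produced by the query re-enter the pipeline under the new tag (`app.errors` here), so a separate `[OUTPUT]` with `Match app.errors` can route just that slice to a different destination, without any database or index in the path.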
I don't want to code in C; nobody likes C, let's be honest about that, and there are many concerns about security. And not just that, it's true that it's really easy to mess things up in C. So we said, okay, our next level is that this year we're going to provide the ability to write your own plugins in Wasm, WebAssembly. With a WebAssembly interface, you can write your own plugins in Go, Rust, or any language with WebAssembly support, and compile that implementation to native Wasm that Fluent Bit can understand. So the C language will no longer be a blocker for you as a developer, or as a company that wants to put more business logic into the pipeline. That is one of the hot things coming up. We already have some POCs, but they're not ready to show, so maybe we can expect something for KubeCon US at the end of this year.

Well, great stuff. By the way, from a C standpoint, old-timers like me used to program in C, and not a lot of C courses are being taught now. But if you do know C, it's very valuable. Again, to your point, developers are focused on coding the apps, not so much the underlying layer, so I think that's key. I'd like to ask you one final question before we wrap up: how do you deploy Fluent Bit? Do you put it inside the cluster? Is it scripts? What's the architecture, real quick? Give us a quick overview of the architecture.

Fluent Bit is not just for a cluster; you can run it on any machine: Windows, Linux, a VM. It doesn't need to be a Kubernetes cluster. When we created Fluent Bit, Kubernetes was quite new at the time. If you talk about Kubernetes, you deploy it as a DaemonSet, and a DaemonSet is pretty much a pod that runs on every node, like an agent. Or you can run it as a service on any kind of machine.

Oh, and one thing before we wrap up: I missed mentioning something from the spoiler part.
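On the WebAssembly direction mentioned above: a Wasm filter plugin is essentially a function that receives a log record and returns a modified one. This plain-Go sketch shows only that transformation logic, it is not the actual Fluent Bit Wasm ABI, and the field names are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterRecord illustrates the kind of business logic a Wasm
// filter plugin would carry: take one record (here serialized
// as JSON), inspect it, add or rewrite fields, and return the
// modified record. The real plugin interface also passes the
// tag and timestamp; this sketch omits them.
func filterRecord(recordJSON string) (string, error) {
	var record map[string]interface{}
	if err := json.Unmarshal([]byte(recordJSON), &record); err != nil {
		return "", err
	}
	// Hypothetical business rule: flag records whose "level"
	// field marks them as errors.
	if lvl, ok := record["level"].(string); ok && lvl == "error" {
		record["alert"] = true
	}
	out, err := json.Marshal(record)
	return string(out), err
}

func main() {
	out, _ := filterRecord(`{"level":"error","msg":"disk full"}`)
	fmt.Println(out)
}
```

Written this way in Go or Rust and compiled to Wasm, the same logic would run inside the pipeline without the author ever touching C.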
We have big news these days: Fluent Bit used to be mostly for logging, right? And in Fluent Bit specifically, we got a lot of feedback from years back saying, you know what, I'm using Fluent Bit as my agent for logging, but I have other agents for metrics, and sometimes it's quite heavy to have multiple agents in your environment. So now Fluent Bit is extending its capabilities to deal with metrics natively. The first version will be available around this week at KubeCon: we will be able to process host metrics or application metrics and send them to Prometheus, in OpenMetrics format, in a native way. So we are extending the Fluent ecosystem to be a better citizen with OpenMetrics, and in the future also with OpenTelemetry, which is a hot thing coming up this month.

Everyone loves metrics; that's super important. Having the data is really, really important as day-two operations and GitOps, all this stuff, are happening. Eduardo, thank you for coming on and sharing the update, and congratulations on the new venture. We'll keep following you and look forward to the big launch, but Fluent Bit is looking good. Congratulations, thanks for coming on.

Thank you so much. I hope you enjoy the conference.

Okay, this is theCUBE's coverage of KubeCon, CloudNativeCon 21 Virtual. Soon we'll be back in real life at the events, extracting the signal from the noise. Thanks for watching.