And I got my first badge, you know, so I expect to be coming every year. It's been an exciting event, people are very nice, a lot of DevOps, sysadmins, and developers. Great. So this topic is very important, I think, for everybody. Not just DevOps, not just sysadmins, but also developers, because we as developers generate many problems for DevOps, right? You know that. Sometimes I'm creating my own application, this application is logging some information, and the next day I change the format, and at some point the DevOps person has to create a new script. And we have 10 applications, 20, all of them with different formats. It's a mess. So in this talk we're going to talk about our experience developing a solution to fix this problem, to make things easier for developers, for DevOps, and of course for the real business. Fluentd was born a while ago, like four years ago; it's not that long. But before we continue, I would like to take some time: if you want to send some comments, or say this presentation sucks, or it's good, please use both hashtags, okay? We have some t-shirts for people who give us feedback during the session, so we'd really appreciate it.

Well, let's get started. My name is Eduardo Silva, I work for Treasure Data. We are the company behind Fluentd, but Fluentd is fully open source, as are many other projects we have. We do cloud analytics, and when you do cloud analytics you say, okay, we are going to manage data for a bunch of people; but how do we collect that data? That's where Fluentd was born. And Fluentd, like our other projects such as Fluent Bit and Embulk, is fully open source under the Apache license. The slides will be available later, so you can grab my Twitter handle, my blog, or the projects I'm involved in.

We're going to start by talking about logging. Logging is pretty important; it has many advantages. For example, you can see the application status and you can perform debugging: when you hit some issue, the first thing you're told is, please review the logs, if you have them. You can find anomalies, you can troubleshoot. And of course, logging can be done locally or remotely; when things start scaling, in the end it's mostly remote. From a business point of view, logging is important because it helps you make better decisions. Sometimes your application not working well is a problem for you as a developer or as a DevOps person, but from a business perspective it could be affecting customers, so things get serious. Logging is not optional; nowadays, logging is a must.

And there are a few assumptions we need to question first. When we start doing logging, we usually do it on the file system, and everybody says: let's do some logging here, in this path I have enough space, right? Your hard disk will never be full. You got it: it has happened to you. And sometimes you say: I'm writing this to disk, but the message I'm writing is so short that it will never block. What does blocking mean? My application is running, at this point I'm writing to disk, and I expect that write call to return. But sometimes it waits a second, two seconds, three seconds. That is blocking. At some point you say: the log messages that I'm writing are human readable, which means everybody can understand them. False. It's very hard to understand log messages unless you are interacting with them on a daily basis and know what is going on.
And without conventions, the next day maybe you have an intern, the next week a new developer, and everybody is writing new log messages. At some point you want to filter that information, and that becomes a problem. And of course, you assume that your logging system will scale. It doesn't matter: we as a company built this great app, and if we have 100 users we'll be fine, nobody cares. Then sometimes your application goes viral, and you get 1,000, 2,000, 3,000 users. And guess what happens to your logging mechanism? It starts to get stuck, it starts to get blocked. It should work, but sometimes it doesn't.

So let's talk about the concerns we have. If the log information increases, it means our data is increasing. If we have different message formats, it's more complex to parse the data. Even if you invoke some system call and the operating system says, please write this message to this file, and that call returns, it does not mean your data was fully stored on the hard disk, or the SSD, or whatever, because the kernel still needs to flush that data from the kernel buffers. Now, if your application is running multiple threads, every one of those threads is trying to write to the file system, so of course you need locking, some mutual exclusion between them, so you can write the messages. And locking is not good: it solves the problem, right? But it does not help you scale. Now, if you have multiple applications, it means multiple logs; and if we put those applications on multiple hosts, it's a mess. There's a point where you cannot manage the information anymore; it gets very complex. So logging matters. It's really beneficial, but it needs to be done right. There are many solutions, and I'm not saying we have the best one of all; we're talking here about our experience with the community plus the company's customers.

So when you think about logging, you have many input sources. If you have some web application, maybe your Apache web server is generating logs. But in common environments nowadays, you don't have just one application. Maybe you have Apache, you have an Nginx in front doing caching, and behind that you have PHP through FastCGI. So we already have three things, and each one is generating logs, logs, logs. Plus you have your custom application, written in whatever language, generating its own logs. From an administration point of view, how do you manage to look at all this information and get some results out of it, some statistics? And what happens when you get an error? For example, a user on the mobile app hits an error: where did that error happen? On the front end with Nginx, on the back end with Apache, on the PHP side, or in my custom application? I think you are getting the point.

And this is what we usually do, not too far from here: we have a bunch of scripts written in Python, Ruby, Perl, all of them running through crontab or something similar, because we want to get that data and push it to some kind of service. For example, maybe we want to do some archiving in Amazon S3. Maybe we want to do some big data stuff with Hadoop, so we push the data to the Hadoop file system. Or some relational database like MySQL, or NoSQL like MongoDB, Redis, or whatever.
So this is the current scenario, and it's a problem for most companies. I was at the Fluentd booth and we talked with many people; some of them said, oh yeah, I've heard about this topic, and others said, yeah, this is what we do, and it's a problem, because I waste hours on it. That shouldn't be happening these days. Another thing: you can get the data, but then you have to parse the data, and that is quite expensive. One of the most expensive things when working with services and data is parsing strings, because everything here is a string; only a few people write logs in binary formats. So how do we solve this with high performance?

So you've got the problems. The first one is: we have different inputs or sources of data, each one in a different format. OK, I can collect them; now, how do I parse the data? That's where Fluentd was born. Fluentd was born in 2011 to solve all of these kinds of problems, with performance and with flexibility; not trying to compete with everybody, but to integrate with everybody. As you may know, Fluentd is an open source data collector. That means it collects data, it allows you to unify that data at some point, and it pushes that data to any database or cloud service. The tangle of scripts is what we had before; with Fluentd we get something much cleaner, something you can use for real. And it is used for real: we have more than a thousand users.

What about the features of Fluentd? Well, it's high performance; I could be lying, so try it yourself. It has built-in reliability: we don't want you to lose data. There are some solutions on the market that collect the data, but if they cannot push the data to some database, because the node is down or you have an outage on your network, the data is lost. That cannot happen, and that's why Fluentd has mechanisms for it. It manages structured logs. That means, for example, if you have a web server log where you have the IP address, the date, the method, the URL, and all the fields of an HTTP request, Fluentd takes that information and decomposes the message into JSON format for you. That means structured information: it's like making a set of key-value pairs out of each row of logs that you have (there's a concrete example of this just below). And it has a pluggable architecture: the community has built more than 300 plugins, plugins to read from syslog, syslog-ng, Apache, Nginx, MySQL, everything.

And what does the internal architecture look like, in a general overview? Going from left to right, of course, we have the inputs: the input plugins get the data in. Then we have the parsers, because we need to parse the information. At some point, maybe you want to filter that information, because you don't want all of it: maybe you just want the HTTP requests that come from the States, or maybe you want to discard everything coming from China or India, for statistical reasons. Once the data is filtered, it's buffered. That means that when Fluentd collects the information, it buffers it for some seconds; you specify the time. Once the buffer times out, it flushes the data to an output plugin. So the basic thing here is that Fluentd works with inputs and outputs. And of course, it supports formatters: a formatter is a way to reshape the format of the information that's going out. So, simplified, this is how it works internally: an input plugin gets data in, the data is filtered, and then it goes to the buffer.
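As a concrete illustration of the structured-log idea above, here is roughly what a single Apache access-log line becomes once it has been decomposed into key-value pairs. The field names follow Fluentd's stock Apache parsing, and the values are made up for illustration:

    # A raw Apache access-log line:
    192.168.0.1 - - [28/Feb/2015:12:00:00 +0900] "GET /index.html HTTP/1.1" 200 777

    # Roughly the structured record Fluentd produces from it:
    {
      "host":   "192.168.0.1",
      "user":   null,
      "method": "GET",
      "path":   "/index.html",
      "code":   200,
      "size":   777
    }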
And the data gets even better, because Fluentd also adds a timestamp: the time at which it received this message. So this is no longer just a log line; it's an event. An event has properties: it has a time, and it has a tag. What is a tag? It's how Fluentd internally identifies where this data is coming from. And then you have your records in a structured way. And it can be filtered, of course.

As an input, Fluentd can do more than parse log files; it can also behave as a web server. I can start Fluentd with an HTTP server input and make all my apps push their logs to my Fluentd, and then Fluentd inserts the records into a database. Or we can just listen for syslog or syslog-ng; we have plugins for everything. And on the output side, you can send the data to a file, to Amazon S3 if you want to do some archiving (it's quite expensive, but it works), or maybe to your local instance of MongoDB, because in the end what you want is to query that information. When we do buffering, we can do it on the file system or in memory. Nobody uses memory, trust me: everybody who's in production uses the file system, because if something happens, nobody wants to lose data. You could try memory, but it's up to you. And when the data is going out to the output plugins, if you have, for example, a meg of data, Fluentd splits it into small chunks, because in the end everything is a record with a time and a tag, right? Small chunks that are flushed out, in parallel if you want. So it tries to optimize and increase the throughput over the network.

Well, here is a more complete picture. We have the record, right? The time and the record data. It goes to the internal router. Do you remember that we have a tag here? With a tag I can say, for example, if I have one Fluentd running: Fluentd, please read the Apache log files on this path, and also read an Nginx log on this other path. But what's the difference? I say: for everything coming from Apache, add the tag apache; for everything coming from Nginx, add the tag nginx. And then, after buffering and splitting, before the data goes to the output, the router will say: OK, everything coming from Apache, maybe I want to push to Elasticsearch, and everything coming from Nginx, I want to send to Amazon S3. So you can split and copy the data inside one Fluentd instance however you want (there's a configuration sketch of this routing just below). And this is the common pattern we try to solve: you have many inputs and many outputs, and everything goes through buffering, filtering, and routing. It also supports simple forwarding, which means just receiving the data and passing it along. That's the basic stuff of Fluentd, right?
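A minimal sketch of that tag-based routing, assuming the v0.12-era "type" spelling and illustrative paths, hosts, and bucket names (none of these come from the talk's slides):

    # Tail two log files, tagging each stream differently
    <source>
      type tail
      path /var/log/apache2/access.log
      format apache2
      tag apache
    </source>

    <source>
      type tail
      path /var/log/nginx/access.log
      format nginx
      tag nginx
    </source>

    # Route by tag: Apache events go to Elasticsearch...
    <match apache>
      type elasticsearch
      host localhost
      port 9200
    </match>

    # ...while Nginx events are archived in Amazon S3 (credentials omitted)
    <match nginx>
      type s3
      s3_bucket my-log-archive
      path logs/nginx/
    </match>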
Now we're going to walk through a simple configuration file. The source section means where the data comes from, and the type means the plugin. The tail plugin watches a file and keeps reading the new data that is appended to it; it's like tail on the shell, but as a plugin. I'm giving it a path where the data comes from, and I'm specifying the format of the data, so you don't need to teach Fluentd how to parse it: just use the right format if one already exists for you. And I add a tag, backend.apache. We're not going to focus too much on that one. Then we have a match section: it means that for every event whose tag matches backend.*, with everything, right? We are going to insert it into a MongoDB database called fluent, in the collection named test (a collection is MongoDB's name for a table). And what we also have here is a source of type forward. That means we can listen for events coming from another Fluentd, because maybe your architecture is quite big, and what you can do is have the Fluentd instances talk to each other.

So here we have a bigger example, where we have many sources, each one running its own Fluentd. Here forwarding comes into play, because we are forwarding the records to a central aggregator Fluentd, which later can insert the data anywhere: Treasure Data, Amazon, Google Cloud Platform, everything. And we also have the common case of the Lambda architecture. Are you familiar with the Lambda architecture? OK, a few people. Sometimes when you manage data and you want to query that data, you have two paths. One is real-time queries: you have a bunch of recent records, maybe from the last three or four weeks, and you perform real-time queries over them. But when you have data from a year or two years back, querying it is more complex, so you need a different mechanism to process that data. That is the Lambda architecture: you distribute the same data to a real-time engine on one side and to a big-data engine on the other. Here is an example: we have Elasticsearch, which is pretty good for real time, and we have Hadoop, which is really good for MapReduce and that kind of query.

And how can that be implemented in the configuration? If you look at the source, it's pretty much the same as the previous one. Now look at the match we have here: the output type is copy. What does copy mean? That for each record we make a copy to two kinds of stores. The first one is Elasticsearch, using the Logstash format. And the other copy is sent to the Hadoop file system, located at this host, this port, and this path. What I'm trying to explain here is that it does not matter what kind of data you get: you can store it however you want. And that is quite good (a configuration along these lines is sketched below).
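Putting those pieces together, the configuration described above might look roughly like this. The paths, host names, and ports are illustrative assumptions, and the "type" spelling again follows the v0.12-era syntax:

    # Tail the Apache access log and tag the events backend.apache
    <source>
      type tail
      path /var/log/apache2/access.log
      format apache2
      tag backend.apache
    </source>

    # Also accept events forwarded from other Fluentd instances
    <source>
      type forward
      port 24224
    </source>

    # Lambda-style fan-out: copy every backend.* event to two stores.
    # (The simpler first example instead used a single store here:
    #   type mongo, database fluent, collection test.)
    <match backend.*>
      type copy
      <store>
        # Real-time side: Elasticsearch with Logstash-compatible indices
        type elasticsearch
        host localhost
        port 9200
        logstash_format true
      </store>
      <store>
        # Batch side: HDFS, written through WebHDFS
        type webhdfs
        host namenode.example.com
        port 50070
        path /log/backend/access.%Y%m%d.log
      </store>
    </match>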
Now, who is using Fluentd in production? Well, LINE, GREE, Slideshare... and this one was not intentional, I don't know what happened here, trust me. Really. Well, this is off the record: we have been talking with the Microsoft people and they are using Fluentd in a project, so I added the image before coming here, and I don't know what happened. Really, I apologize to the Microsoft people here; it was not intentional, and I will fix it before sharing the slides. These are slides, you know, things happen. Well, Google's Kubernetes is using Fluentd too. So Fluentd is not just about collecting data and pushing data: it aims to unify the logging layer. Across my whole architecture in my company, where I have different kinds of data, how can I unify everything? That's where Fluentd comes in. And of course, we at the company use it a lot: we do one thing, analytics, that pays the bills, and we make everything else open source, and Fluentd is one of those projects. With Fluentd we collect around one million events per second; that is something like a hundred times the number of tweets sent around the world per second. Of course, we have multiple Fluentd instances. I cannot share more details because I'm restricted on that, but it's quite good and it works.

So far we've been talking about servers, about mobile applications. But there are different scenarios. What about the Internet of Things, right? Devices connected to each other, previously better known as embedded; IoT is just the new market name. The IoT space is growing a lot, on the order of billions of devices. They care about different things, about connectivity, but in the end all of these kinds of devices are doing the same thing: sharing information, sharing data. And they're generating logs. The difference is that in most cases the logs are not being stored on the file system, because it's very restricted, right? They are being dispatched somewhere. So they need logging. When we talk about the Internet of Things nowadays, there are two very big consortiums, with different companies behind each, because companies say: OK, I have my IoT devices, but I need to partner with other companies. Everybody's trying to come together to have a complete ecosystem. On one side we have the AllSeen Alliance and on the other the Open Interconnect Consortium, each one with its own implementation. So each camp is creating its own framework: on one side IoTivity, and on the other AllJoyn. That is how the devices communicate with each other. But it all needs logging. So how do we collect the data properly from these devices? Of course, Fluentd is not suitable for that, because Fluentd needs at least 40 megabytes of memory to run properly, and you cannot waste that kind of memory on a really small device. It doesn't matter how cheap it is; you can't.

That's where a new project was born, called Fluent Bit. We just made a big release this week; I don't know if you read the linux.com website, but there's a full post about Fluent Bit. Fluent Bit is a solution based on the experience of Fluentd, rewritten from scratch, but in the C language. Fluentd is made in C plus Ruby; this one is made fully in C. And of course, it's fully open source; there's the Twitter handle if you want to follow us. Great. So Fluent Bit allows you to collect data in many ways, but also, if you are a developer of embedded applications, it allows you to dispatch the data to different outputs, same as Fluentd. And it's made to collect data from all kinds of sources: sensors, signals, radios (we support XBee), operating system information. When you have your embedded system running, you sometimes want to measure how much CPU is being consumed, because, think about it, in embedded, power consumption is an issue. So you want to see how much you're consuming; maybe something is consuming half of the CPU time, and that is really bad, so you want to bring that down. And of course, it can run on telematics or automotive systems.

So when we thought about Fluent Bit at the beginning, we asked: how should it be developed? We said: in C; it must support plugins; and it must have an integration with Fluentd. Because maybe you have a full architecture, with a lot of IoT devices generating data, and you want to merge that information at some central point. That's why it supports Fluentd. So it's a generic solution, right? This is how Fluent Bit works: in this case, Fluent Bit resides on the embedded device. This is just an example, where the data source can be an IoT framework like the ones described previously, an XBee device (an XBee is a small radio device on the market), or other Linux interfaces. Once it gets the data in, it flushes the data to Fluentd, and after that, you can flush it anywhere. This was really good; people were very happy.
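A rough sketch of that device-to-aggregator flow. The flags follow Fluent Bit's later command-line syntax and the aggregator address is a made-up assumption, so treat this as illustrative rather than what the talk showed:

    # On the embedded device: collect CPU usage and forward every event
    # to a central Fluentd listening on its forward input (port 24224)
    fluent-bit -i cpu -t cpu_metrics \
               -o forward -p host=192.168.1.10 -p port=24224 \
               -m '*'

    # On the aggregator, Fluentd only needs a forward source:
    #   <source>
    #     type forward
    #     port 24224
    #   </source>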
But then we got a lot of feedback from the embedded people: hey, I don't want to use Fluentd. People from embedded are very strict about some things. So we started supporting direct output to different kinds of services. What you see here are the actual output plugins supported by Fluent Bit. The thing is, we don't do buffering on the file system yet, for resource reasons, but some users are asking for that at the moment. They say: it doesn't matter that I'm using a really small board, it has one gigabyte of file system, so please use 100 megabytes of it. Recently, we added support for Elasticsearch. When you collect metrics from an embedded system, or any kind of system, you'd like to have some visualization of that data. Elasticsearch is very good for real-time queries, and Elastic provides a very good tool called Kibana. You know Kibana, right? Cool. Kibana allows you to make graphics out of your data so it makes some sense. So we added support for that, and you can make some graphs of the CPU usage of your system.

Now we are going to jump to another context: containers. Who here is using containers? Who's using Docker? Ah, yeah, well, Docker is much the same: Docker works on top of Linux containers. Not the same, but it provides the right interface. When you run containers, it means you're deploying applications, and not just once, maybe multiple times. We hear about use cases where people have a hundred containers or more; cloud services may have thousands. So how do you collect the logs of those applications? Well, we worked with them, because Docker, in its version 1.6, implemented the logging layer: they knew that logging was very, very important. We got together with them and said: OK, we can build a driver for Fluentd, natively, in Docker. One of our colleagues wrote the driver, and after a lot of iteration and work, the solution was merged. What that means is that starting from Docker 1.8, the Fluentd support is there. We wrote a Go driver for Docker; if you get the new version of Docker, it will be there. So when you deploy your application, you can use the fluentd logging driver, you specify the tag, and it will flush the data to a Fluentd service automatically (there's a sketch of the command line just below).

And who's using this heavily right now? OpenShift. The people from Red Hat: in the new versions of OpenShift, they're deploying everything with Docker and Fluentd, because Docker solves the problem of containers, how to manage them, and they're using Fluentd to solve the problem of logging. And the Docker output is pretty good, because when the container is running, it emits some useful metadata. You know that each container has its own ID, so you get the time, the source this data comes from (stdout or stderr), the container ID, the container name, and the log message. And each container often generates many of these. So if you're writing an application that's going to be deployed on Docker, just stream your messages to the standard output (as JSON if you want, for convenience), and you're set: you no longer need to take care of logging. And in the end, you get many Docker instances flushing their data to Fluentd.
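A minimal sketch of that command line. The image name and address are placeholders; the option names follow Docker's documentation for the fluentd driver (in the very first release, Docker 1.8, the tag option was spelled fluentd-tag):

    # Send the container's stdout/stderr to a Fluentd instance instead
    # of the local json-file log; Fluentd must be listening on a
    # forward input at the given address.
    docker run --log-driver=fluentd \
               --log-opt fluentd-address=localhost:24224 \
               --log-opt tag=docker.myapp \
               my-app-image

    # Each log line arrives in Fluentd as a structured event carrying
    # container_id, container_name, source (stdout/stderr), and log.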
And for the people who love Node.js, I added this, because people often ask about it. They say: OK, I'm deploying my really cool Node.js application... oh, I'm not logging. So we implemented a package called fluent-logger. It's already available on the npm registry, so you can get it and start using it. You just create the instance, configure the tag prefix, where the Fluentd is, and the timeout, so at some point it's going to push the data out, and then it's just one line to send a message. So if you have an environment where people create Node.js applications, try to use this, because you're going to unify the logging of all those applications (a short sketch of the API is just below).
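A minimal sketch of that, following the fluent-logger README; the tag prefix, label, and record fields are made-up examples:

    // npm install fluent-logger
    var logger = require('fluent-logger');

    // Point the logger at Fluentd and set a tag prefix and timeout
    logger.configure('myapp', {
      host: 'localhost',
      port: 24224,
      timeout: 3.0
    });

    // One line to emit an event; it arrives in Fluentd
    // as a structured record tagged 'myapp.access'
    logger.emit('access', { method: 'GET', path: '/index.html' });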
Well, that was my short presentation. I hope you enjoyed it. I would like to know if you have some questions.

[About modifying the data] OK, you're asking whether, when I get the data in, I can modify it. Well, there are two ways. When you get the data in, you can specify one format, and if you want, you can change the format inside Fluentd: you can say, I want to modify these fields. And also, for each record or event that you get in Fluentd, you can modify it in the filter step. You can say: please append this key-value pair to the message, which is a pretty common use case. And if you have a custom kind of message for which a plugin does not exist, you can write your own regular expression for it. So you can always deal with your data, but not binary data. And we can listen on TCP, UDP, or tail log files. Oh, sorry, please continue. Yeah, exactly: people create their regular expressions and can assign key names to each value that matches the pattern. And people use it when, for example, they have an Nginx as a front end and behind it many upstreams for different applications, backends, microservices, whatever. They want to filter what is going to which site, and not aggregate everything in one place, but in several.

[About reliability between instances] Yeah, right now that's in version 0.14, which is coming out in a few weeks. We are testing it a lot, because changing things like the protocol and the internals affects thousands of users. So the solution is there, we are testing it, it's not failing yet, but wait for 0.14. What was that? Yeah, you can make a match with a copy to different Fluentd outputs, because they talk with what is called the forward protocol. And that's the big difference: other solutions in the market allow you to get the data and insert it somewhere, but if that fails, they cannot do anything. Fluentd allows you to balance the connections and have a failover mechanism, and you can send to many Fluentds, and those Fluentds will take the record and put it anywhere.

[About how chunks are split] Sorry, I didn't hear your question. There? Here? Yeah. OK, they are not reassembled. Let me start over: when you get each record, it becomes a unit, an event of data, right? No, no, what we split on is the number of rows, let's say, the number of events. If I have a hundred, maybe I split them ten, ten, ten. And why is that? Because we need to store chunks. When something happens, we send a chunk; if a chunk fails, we need to retry. Or maybe we have some parallelization, with different threads pushing the chunks between them. Thanks.

OK, any other questions? Anybody else? OK, your question is about performance and throughput. The performance, and this is the generic answer, always depends on your data. Receiving data in a fixed format over TCP is not the same as getting data from the file system, because each step adds a small overhead. We do our best to collect data as fast as possible and to make a reliable system that delivers that data as fast as possible. But if something happens, there will be some delays. It's near real time, but it's not real time. So I cannot say it will work really, really fast in every case, because it depends on your data and on the configuration. Maybe you're inserting the records into your Hadoop file system, but there's a problem in your network and you need to retry some TCP packets or something. It's complex to answer, but our experience is that we get good results. That's why it's being used in production, and why it is replacing other solutions in the market. And the very good thing is that it's fully open source: if you hit some performance problem, we can fix it.

[About load balancing between Fluentd nodes] Each one gets the addresses when it starts, so it can balance, and the setup allows you to define which kind of balancing method you're going to use. So it's pretty flexible on that: we're not just sending to one node and getting married to it; it will balance as you set it up in the configuration. You can set round-robin, and there are other methods.

[About where parsing happens] Yeah, the parsing happens on the input, OK? For example, I didn't mention this, but internally we use a data format called MessagePack. Have you heard about it? Well, everybody knows JSON, right? OK, JSON is a string format, and when you want to parse a string format, it's quite expensive, because you don't know where each key starts and where each key ends, and the same for the values, and what kind of values they are. Internally, and for forwarding, we use MessagePack, which was created by the creator of Fluentd, and which is a binary version of JSON. The binary version says: OK, this field starts this way, it's a map, it has ten keys. That's it, and you can jump through it. So when the messages go between one Fluentd and another Fluentd, they use the forward protocol, and the forward protocol uses MessagePack. So it does not redo the same parsing that happened at the beginning; it's optimized for performance. And Fluent Bit does the same: internally, Fluent Bit is just MessagePack (there's a concrete byte-level comparison just after this Q&A).

[About BSON] Yeah, it's different. BSON, from MongoDB, is pretty much similar to MessagePack. The difference I have seen personally, and this is my own claim, is that BSON is just used by MongoDB and a few others, while MessagePack has been widely adopted: you can see there are bindings for Python, Ruby, Perl, C, C++, and more. So it's widely used, but it's pretty much the same idea of a binary JSON.

[About Scribe] Scribe? Scribe is a whole solution, right? It provides you everything to solve this. Most people switch to Fluentd because of cost and because of flexibility, because there comes a point, when you're using something like that, where you say: OK, we need some extension, and improving it is a bit hard. Fluentd, as an open source project with full documentation and a lot of contributions, is easy to extend: if you know a bit of Ruby, you can easily write a plugin. So I think it depends on the person. I won't say that one is ten times better, because it depends on the use case; I'm talking about what we have seen with customers and with people around the community.
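To make the MessagePack point concrete, here is the well-known comparison from the msgpack.org front page: the same small object in JSON and in MessagePack. Because lengths and types are explicit in the binary form, a parser can jump across fields instead of scanning for quotes and delimiters:

    JSON, 27 bytes:
      {"compact":true,"schema":0}

    MessagePack, 18 bytes (shown as hex):
      82                        -> map with 2 entries
      a7 63 6f 6d 70 61 63 74   -> "compact" (fixstr, length 7)
      c3                        -> true
      a6 73 63 68 65 6d 61      -> "schema" (fixstr, length 6)
      00                        -> 0 (positive fixint)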
[About what you need on the input side to use Fluentd] The input can be anything that generates some kind of log messages. For example, the Apache web server generates logs, Nginx generates logs, MySQL generates logs. Or you can hook up your own system: let syslog, or syslog-ng, push data to Fluentd. On that server, you need at least 40 megabytes of memory, and all the dependencies come as Ruby gems, written in C plus Ruby. So if you go to the Fluentd website, you can get the source code of Fluentd, and you also get some instructions. For example, if you are using Ubuntu long-term support, or Debian, and you don't want to build Fluentd and pull in all the dependencies, you instead get what is called td-agent. td-agent is a packaged version of Fluentd for enterprise servers. It's pretty much the same thing, but td-agent is a long-term-support version, and it's free too. People who are serious about production use td-agent.

OK, thank you so much. Thank you. If you have some questions, there's a booth, booth number five, tonight. I have some stickers here if you want to grab some, and business cards. And for the people who asked questions, please come here: I have some t-shirts.