All right, I see familiar faces. This is what it must feel like performing in a high school theater, having parents around. I've been giving a lot of talks this year, and I didn't give any talks until this year. My most recent one was at Heroku, where I made two huge mistakes. One was that my live demo sort of worked, which is actually worse than totally not working, because you're almost there, right? The other was that my presentation crashed towards the end. So I have a fix for both of them. This time there's no live demo, so you guys don't have to worry or get nervous. I'm certainly less nervous. The other problem was that I had the wrong combination of operating system and presentation software: I was using a Mac, but I had Microsoft PowerPoint, which apparently is not a good combination. So this time I'm on Keynote, but I'm on Yosemite, so that might be an issue.
All right, a quick show of hands. How many of you have heard of Fluentd? Yeah, I cheated, because I just gave away stickers. Some of you just got stickers, and if you want stickers, there are more stickers here. How many of you use Fluentd in dev or production? Excellent, there are two people who are brave enough to admit it. So today I'm going to give a quick overview of what Fluentd is, with use cases, and hopefully talk about the internal architecture a little bit, as well as where we're headed in the next year or so.
So who am I? I am Kiyoto Tamura. That's how you can harass me publicly. I work for a company called Treasure Data as a developer relations person, and what that means is I get to come to a cool event like this and listen to other really smart people talk about open source. I also happen to be a Fluentd maintainer, but staying true to my Japanese heritage, I'm going to start with a bit of self-deprecation. Except my self-deprecation is actually just true.
So first of all, I'm a Ruby noob. I didn't know that Ruby had lambda until this morning. Matz talked about static typing. I like that as an idea, but I think that would make Ruby a little too hard for me to program in, so we'll see. The other thing is I'm a Fluentd noob, too. That's the contributor graph with the top six people, and I'm not even the sixth. The natural question to ask at this point is: why is this sixth guy, or sixth person, giving a talk and not the other five people who clearly did more work, right? Well, I asked them and they had all these excuses. Actually, the guy to my right, or left, is my boss, the CTO, and because he's a CTO he's actually busy, unlike me. And I thought, you know, San Diego sounds really cool. I come from this really expensive city with a lot of rain called San Francisco, and it's really nice to be here. It's my first time here and also my first RubyConf. Actually, for whom is this the first time here? All right, I feel less lonely now. Keep clapping. I didn't mean that literally, but that works too, because then I don't have to talk and you guys can just clap for the next 45 minutes, right?
So what's Fluentd?
This is probably the most straightforward way of describing it, and I'm going to try to unpack it, because it sounds like just another piece of distributed software. It is first and foremost a data collection tool. If you have logs or any kind of event data, think about Fluentd. When I say event collection or data collection, usually you have a system like this: there are a lot of scripts somewhere, written by people who are no longer with you, and they're really brittle. This is not really anyone's fault, because logging is not anyone's priority when you're shipping version one, but eventually it becomes really painful. Fluentd tries to (clearly I didn't practice this) solve this problem by unifying it. Of course, this is a highly optimistic picture. It actually takes months to get there, but it does help you get here.
The second point is that it's extensible. There's this guy called Sadayuki Furuhashi who wrote the first version of Fluentd, and the core philosophy of Fluentd is that the core program should be reasonably small and should only do the most important things, the things that nobody wants to deal with but that are important. I'm going to touch on each one a bit more, but the idea is error handling, message routing, making sure that you use up all the CPUs that you have, stuff like that. The rest is delegated to the users, who actually have very specific use cases: reading data from particular data sources; parsing some gnarly custom formats; buffering the data the way you really want to, with control over that; and writing out data to the destination you care about, because frankly, logging software is pretty vacuous without a destination that you want to use. And finally, formatting the data. So I'll go through those too.
So one way to think about it is that the core concerns itself with just the common cases, and the plugins are where the use-case-specific stuff is delegated to.
It also tries to be reliable, and this is in two senses. One is transferring the data reliably. Depending on what kind of log data it is, it's very important that you don't lose it. The way we think about it is that traditionally, when you try to move data from point A to point B, the first thing you naturally do is some version of rsync. This is not the worst solution, right? rsync is available everywhere. But the real problem is that data transfer does fail, and when it fails, especially when you're doing it in daily batches, you start to have this issue: oh, the load failed last night because of a stupid encoding issue, so now I'm going to have to do two days' worth. And on the second day: oh, the data format changed downstream, so I have to do it for three days, and four days, and five days, and before you know it you're just permanently behind. The way Fluentd tries to deal with this is to think of log data more as a stream rather than a collection of files, and to get the data moving in smaller bits and pieces, so that if a certain transfer fails, and it does, it can handle it much better because it's a smaller thing to worry about. I'll talk a little bit more about this when I go through the buffer plugin.
The other thing is a reliable process. This is a sort of meta-reliability, because it's not about the code per se but about the process around your data engineering. Look at this. This is another rendition of the diagram that I had earlier. Which is more reliable, this or this? Here you have no idea what's going on where, whereas here all the data sources go to Fluentd, and it's Fluentd's job to decide where the data should go. Fluentd has this notion of tagging the data and routing the data based on the tag, and I'll show a bit of a config file later to explain this point. The diagram that I like to use, or the pseudo computer science argument that I try to use, is to go from M times N to M plus N, meaning that if you have M data sources that need to talk to N storage systems or backends, don't make it multiplicative. You shouldn't have M times N data flows. You should have M plus N data pipelines. With 10 sources and 5 backends, for example, that is 15 pipelines instead of 50.
So that's a super quick overview of what Fluentd is, and now the fun stuff, which is: how the hell is the thing used out in the wild?
The very first one is simple forwarding. You have some files, and it's always in files, but you also have a new mobile app that you want to track user behavior on, and you want to correlate the two and come up with something really smart. It's pretty simple to do in Fluentd. This is an actual configuration file (I haven't tested it) that listens to the log file under /var/log, that's the top left, and it also listens on a TCP port and sends that data to a backend. Here it's MongoDB. I don't work for MongoDB, but it's actually one of the most popular plugins out there, along with S3 and, I think, Elasticsearch. So it's really simple, right?
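The tag-and-route idea can be sketched in a few lines of plain Ruby. This is a toy model, not Fluentd's code: `TinyRouter` and its methods are hypothetical names made up for illustration. The one thing taken from Fluentd itself is the documented pattern convention that `*` matches exactly one dot-separated tag part and a trailing `**` matches zero or more parts.

```ruby
# Toy model of tag-based routing. TinyRouter is a hypothetical name,
# not a Fluentd class; only the "*" / "**" convention mirrors Fluentd.
class TinyRouter
  # "*" matches exactly one tag part; a trailing "**" matches zero or more.
  def self.tag_match?(pattern, tag)
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if pattern_parts.last == "**"
      pattern_parts = pattern_parts[0...-1]
      return false if tag_parts.length < pattern_parts.length
      tag_parts = tag_parts.first(pattern_parts.length)
    end
    return false unless pattern_parts.length == tag_parts.length
    pattern_parts.zip(tag_parts).all? { |p, t| p == "*" || p == t }
  end

  def initialize
    @rules = [] # ordered [pattern, output] pairs; first match wins
  end

  # Registers an output (any block) for tags matching the pattern.
  def match(pattern, &output)
    @rules << [pattern, output]
    self
  end

  # Sends the event to the first matching output, or reports :unmatched.
  def emit(tag, record)
    rule = @rules.find { |pattern, _| self.class.tag_match?(pattern, tag) }
    return :unmatched unless rule
    rule.last.call(tag, record)
  end
end

# M sources only need to agree on tags; N outputs only need patterns.
mongo_rows = []
router = TinyRouter.new
router.match("backend.*") { |tag, record| mongo_rows << [tag, record] }
router.emit("backend.apache", { "path" => "/index.html", "code" => 200 })
router.emit("mobile.click", { "button" => "buy" }) # no rule for this tag
```

Because sources and outputs meet only at the tag, adding a new source or a new backend means adding one rule rather than one connection per pair.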
The input is where you define the tag, which says where the data should go, and the output tries to match on those tags. Here it's backend-dot-something, and if it matches, the data is flushed out to that output.
All right, so that's the first use case. The second one is basically more of the same. This is a very common use case, especially with companies that deploy thousands of servers. The reason is that when you have a lot of data, you want to separate the concerns between the nodes that are ingesting the raw data and parsing it and the ones that are actually sending that data to the backend, because both tend to be rather CPU intensive. One user that I know has 2,500 servers running Fluentd, and it usually comes out to about one aggregator for a hundred leaf servers that are forwarding data. You can pretty much send it anywhere, to all these services, and that's because the users contributed the plugins to write to those systems.
The other one, my favorite, is what I call a lambda architecture, because it's cool when you can use a Greek letter. How many of you have heard of the term lambda architecture? Cool. All right, so let me explain that. The idea is basically that you can use multiple storage systems to do both batch analytics and more real-time computation.
It was coined by this guy called Nathan Marz, who wrote Storm, the real-time computation engine. Elasticsearch and Hadoop here are just examples, and it could be something that you rolled out on your own. But the idea is that Elasticsearch is going to deal with something that you want to keep track of in real time. Let's say the data source is Apache logs: you want to know what fraction of your requests are 404s. But you also want to store all the raw data in Hadoop so that some smart person can run a very fancy MapReduce job and come up with some kind of user behavior insights. So Fluentd sits in front of both systems, bifurcates the data stream, and writes it out to both places.
It's pretty straightforward to do that. The left-hand side of this Fluentd config is actually the same as before, and on the right-hand side... I think it's your right. Or is it left? I'm left-handed, so I tend to get confused. Okay, the left-hand side is the same, and the right-hand side changed to use the copy plugin, which copies the data stream so that you can send data to both Elasticsearch and Hadoop.
And here's another one: CEP. How many people know what CEP stands for? It's okay.
I didn't know until about three weeks ago: complex event processing. There's a whole field of real-time computation, but the idea is that you want to apply a series of computations to a data stream. Again going back to the Apache example, you want to see a correlation between the site response time and some other data stream. There are systems that deal with this. Actually, the guy sitting over there is going to give a talk about one such open source solution called Norikra, but there are a whole bunch of others, including some proprietary ones. Then you can also tack on some kind of visualization, for example something you hacked up with D3. The point really is that Fluentd itself is flexible enough to talk to a backend of your choice, and you can hack away and put together some system in front of it as well.
The last one is fairly recent. How many of you know Docker? OK, this is a stupid question. How many people have used Docker? Excellent. I actually use it to debug a lot of stuff for Fluentd. One of the things about Docker containers is that there hasn't been a very standard solution for aggregating the containers' logs themselves, and that's one new use case for Fluentd that I've started to see over the past few months. Actually, Fluentd, alongside Elasticsearch and Kibana, was adopted by Kubernetes, which is the orchestration tool for managing a cluster of Docker containers. So that's yet another new use case where having logging middleware that's highly programmable is really helpful.
All right, now the fun part: architecture. This is also right about where my knowledge starts to get very shaky, so ask questions to harass me.
At the highest level, this is what Fluentd looks like as the data goes through the system. The input deals with accepting inputs, and you can write your own parser to parse them. The data is buffered, and the output handles where to send the data depending on the tags that the input has generated for it. Finally, you can format it for certain outputs. A good mental model to have right now is that the first two mostly deal with getting data into Fluentd, and the other three basically deal with outputting data to external systems.
So, input plugins. There are a lot. The usual suspects, UDP, TCP, HTTP, are all supported. Another really common one is something called in_tail, which is basically the tail command but a little smarter and more flexible. The idea is that you receive logs and you assign a tag, and the tag is how you know where the data should go. So if the tag is apache-dot-something, it should go to wherever you want the Apache logs to be stored. One important thing is that input plugins are non-blocking, in that each has its own event loop that is independent of Fluentd's core event loop, so if something starts to go haywire, it should not affect the rest of Fluentd.
And here's actually my favorite and probably the most common input plugin.
It's called the tail input. Part of the reason I like it is that until I started to work with Fluentd, I thought tailing a log file was the simplest thing possible, but there's some evidence to the contrary, right? First of all, this class is called NewTailInput, if you can read that, and the reason it's called NewTailInput is that we did a total rewrite. There's an old tail input. Also, towards the end it says "a little more code". Do you know how much more? This is actually only 20% of it. There are some more comments down there, and there are actually 700 lines of logic dealing with inodes and making sure that whenever a file is log-rotated, it looks at the new file as well as the old one, so that it doesn't miss lines that are still being appended to the old one. This is one plugin where I know the interface, but the implementation I would never touch myself. I delegate that to more informed maintainers.
Another, less complex plugin is this one.
It's the TCP input, and I stripped out some lines. Again, this looks really simple, but I'm cheating again, because much of the work is done by the superclass, the base input. So here's the base input, and I'm just excerpting the main method, on_message. Again, if you get rid of the bookkeeping, all it does is this: it has a parser instance variable (sorry, not a class variable), the parser parses a message, and there's a time and a record, which get emitted into Fluentd with the tag that you assigned.
So this is a good segue to what the parsers are. Fluentd has configurable parsers, and most of the time a parser is handled by, or coupled with, the input. Some inputs just have it as part of them. It parses data into JSON, which is the common data format throughout Fluentd, and the usual suspects again are supported out of the box. Here's one example of what a parser plugin can do for you. That is the TCP input plugin from the previous example, and there you specify a format parameter that is a regular expression. That is actually invoking the regular expression parser, which is part of the core. There are some other named formats, so if you just write apache2 there, it parses the Apache combined log format, and if you write syslog, it does the sensible syslog parsing. How many of you know about the grok parser? There's a grok-like parser too, and I had to mention that because I wrote it, but it's third party, so use it at your own risk. And this is what it looks like in the regular expression parser. That's the standard stuff.
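What a regular-expression parser does with a `format` like that can be sketched in standalone Ruby. This is an illustrative stand-in, not the parser in Fluentd's core: the class name `TinyRegexpParser`, the `APACHE_LIKE` pattern, and the `time_format` keyword are all assumptions made for this example. The technique itself is just named captures: each capture name becomes a field, and the `time` capture is converted to an event timestamp.

```ruby
require "time"

# Hypothetical stand-in for a regexp parser plugin; not Fluentd's code.
class TinyRegexpParser
  # Illustrative pattern for an Apache-style access log line (an assumption,
  # not the pattern Fluentd ships for its apache2 format).
  APACHE_LIKE = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d+) (?<size>\d+)\z/

  def initialize(regexp, time_format:)
    @regexp = regexp
    @time_format = time_format
  end

  # Returns [unix_time, record] on a match, or nil if the line does not parse.
  def parse(text)
    match = @regexp.match(text)
    return nil unless match
    record = match.named_captures          # capture names become field names
    time_text = record.delete("time")      # the time field becomes the event time
    time = time_text ? Time.strptime(time_text, @time_format).to_i : Time.now.to_i
    [time, record]
  end
end

parser = TinyRegexpParser.new(TinyRegexpParser::APACHE_LIKE,
                              time_format: "%d/%b/%Y:%H:%M:%S %z")
line = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777'
time, record = parser.parse(line)
# record now maps "host", "user", "method", "path", "code", "size" to strings.
```

Note that every captured value is still a string here; converting `code` or `size` to integers would be an extra step.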
It does the named capture and all kinds of busy work, and emits back the JSON.
The next one is the buffer. Buffering is really important, because it's amazing what happens when you try to send bits and bytes over the Internet, or even when it's local or within the same data center. The key here is to make it configurable, so that if you want to buffer to disk, that's supported, and if you care about performance over the risk of data loss, you can buffer to memory. The internals are pretty simple. A buffer is basically just a queue of chunks, and a chunk is an adjustable unit of data. I think the first hurdle you run into when you start deploying Fluentd on lots of machines is tuning the buffer parameters. We try to make the defaults reasonable, but again, it depends on your network conditions, how your cluster is architected, and so forth. The point is that it's pretty configurable, so it should be able to meet your needs as long as you're patient enough to tune the parameters.
The buffer is actually very much tied to the output, because the way it works is that when an output plugin tries to output the data, that's when you really buffer the data. For example, if you're trying to send the data to S3 on AWS, you want to make sure that if your network goes down, the buffer is there and you retry at a later time. Some of the plugins are not buffered, especially the ones that don't involve writing to external systems, and a lot of those came about because originally Fluentd didn't have a dedicated notion of a filter.
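The "queue of chunks" picture of the buffer, together with the retry-later behavior, can be modeled in a few lines of Ruby. This is a deliberately small sketch under stated assumptions: `TinyBuffer` and `chunk_limit` are hypothetical names, chunks are sized by record count rather than bytes, everything lives in memory, and there is no retry back-off or retry limit.

```ruby
# Toy model of a buffer as a queue of chunks. TinyBuffer and chunk_limit are
# hypothetical names; a real buffer would also persist chunks and space out retries.
class TinyBuffer
  attr_reader :queue

  def initialize(chunk_limit:)
    @chunk_limit = chunk_limit # records per chunk: the adjustable unit of data
    @queue = []                # full chunks waiting to be written out
    @current = []              # chunk that is still accepting records
  end

  # Adds a record; once the current chunk is full it is moved onto the queue.
  def append(record)
    @current << record
    return unless @current.size >= @chunk_limit
    @queue << @current
    @current = []
  end

  # Hands the oldest chunk to the block (the output's write step).
  # If the write raises, the chunk stays queued so a later flush retries it.
  def flush
    chunk = @queue.first
    return :empty unless chunk
    yield chunk
    @queue.shift
    :flushed
  rescue StandardError
    :retry_later
  end
end

buffer = TinyBuffer.new(chunk_limit: 2)
%w[line1 line2 line3].each { |line| buffer.append(line) }

buffer.flush { |_chunk| raise IOError, "network down" } # chunk is kept
uploaded = []
buffer.flush { |chunk| uploaded.concat(chunk) }          # same chunk, retried
```

A failed transfer only costs one small chunk's worth of retrying, which is the contrast with the daily-batch approach described earlier in the talk.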
So a lot of people used outputs to actually filter data, and that's part of the reason why this is the most common type of plugin that the community has contributed. There are about 300 plugins for Fluentd, and I would say 200 or more of them are outputs. So if you're thinking of using Fluentd and you're like, "Oh, I use XYZ source and I use this storage service," chances are good it's already supported. And if not, it's pretty easy to get started. There's a lot of documentation and sample code online.
Here's an example of an output plugin. This is a pretty basic one that's bundled in the core. It writes to a local file, and again, I'm stripping out a lot of the code, but the key here is that it writes a chunk to a file handle, and that's it. So if you decide to write your own, let's say you have your own new NoSQL database (I guess that's not a joke; I mean, you can write it), all you need is basically to get hold of or write a Ruby client for it, and do what you would do to insert data into that system inside the write method.
Finally, sometimes you want to format your data differently from how Fluentd does by default. In the past, in its early days, this was usually the output plugin's concern. But we realized that a lot of people, for example, wanted to output newline-delimited JSON, and that was pretty much done by each output plugin. So we're trying to make that more pluggable. With version 0.10.49 and above, you can use formatter plugins for certain output plugins. Right now it's S3 and file, but hey, pull requests are always welcome. And again, I felt like I should write some code, or at least copy and paste it.
So here's one formatter. Because the data is passed around as JSON, and some people apparently just want to get a particular field, there's a formatter called the single value formatter, which I've always found to be pretty funny. The idea is that the record is the data, and by default it gets the field called message, but you can configure that. So if you have JSON with a whole bunch of fields and you just want to output one, you can use this formatter. There's also CSV, there's TSV, and you can come up with your own, some kind of key-value pairs. It's pretty simple coding in Ruby.
So that's a quick tour of Fluentd, and this is the current architecture. But we're going to have a new dedicated filter plugin in the next version, and this is going to make a lot of filtering, or filter-like functionality, a breeze. Those things have been possible for the past two and a half years, but they were not easy to wrap your head around, because you get the data, put it through this particular plugin, then change the tag so that it goes through a different route. It's quite complex. But in the next version, the major update that we're releasing, filtering is going to be really easy.
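To make the formatter idea concrete, here is a standalone Ruby sketch of a single-value formatter next to a newline-delimited JSON one. These are illustrative classes, not Fluentd's plugin code: the class names are made up, and the `message_key` and `add_newline` options and the `format(tag, time, record)` signature are assumptions modeled loosely on the description in the talk.

```ruby
require "json"

# Illustrative formatter: outputs just one field of the record.
# Names and options are assumptions for this sketch, not Fluentd's API.
class TinySingleValueFormatter
  def initialize(message_key: "message", add_newline: true)
    @message_key = message_key
    @add_newline = add_newline
  end

  # tag and time are accepted for symmetry with other formatters but unused.
  def format(_tag, _time, record)
    text = record[@message_key].to_s
    @add_newline ? "#{text}\n" : text
  end
end

# Illustrative formatter: one JSON document per line.
class TinyJsonLinesFormatter
  def format(_tag, _time, record)
    "#{record.to_json}\n"
  end
end

record = { "message" => "hello", "host" => "web1" }
TinySingleValueFormatter.new.format("app.log", 0, record)
# => "hello\n"
TinySingleValueFormatter.new(message_key: "host", add_newline: false).format("app.log", 0, record)
# => "web1"
TinyJsonLinesFormatter.new.format("app.log", 0, record)
# => "{\"message\":\"hello\",\"host\":\"web1\"}\n"
```

Keeping the formatter separate from the output means the same file or S3 writer can emit either representation without changing how it writes chunks.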
So if you have something like "I want to get this data, but I want to get rid of those particular fields before I output it to this third party, for compliance reasons," or "I want to get this data, but filter out everything except data coming from those particular host names," those things are going to be very, very simple to do.
Here's my pretty rough-looking roadmap. Version 0.12 is the one that's going to come with the filter. Also, we're going to make a pretty big API change in the next major version, or minor-major version, 0.14, and our goal is to really consolidate a lot of those changes and release the true version one. It's kind of funny to talk about version one, because we have a lot of production users now, many of them users I never thought Fluentd would have three years in. But the goal is to have the true version one next year, probably in the first half. Right now, at least at Treasure Data, there are two people on Fluentd core, and two people is a lot of people short of doing everything we want to do. So we're always looking for someone to join us, either as part of the community or, if you want to get paid to do this, we're always hiring for that role.
So, some goodies. This is one thing I'll get to plug, which is a UI. I think the other main Fluentd guy at Treasure Data just released a version of our package that has this in it. The idea is that up to this point, Fluentd has been pretty much driven by the command line, which is great.
I love the command line, but I wish I had had this tool when I was first learning how to use Fluentd. The idea is to make it much easier to administer and test out Fluentd locally.
Another one is that Treasure Data packages Fluentd for all the major Unix-y platforms, and we also do the QA, so a lot of people who use Fluentd in production tend to use this one. If I'm not mistaken, Masa, the main Fluentd package person, just released the latest version that bundles the UI that I just showed.
I know this is RubyConf, but I had to talk about a gopher, because the gopher is the animal of the year. One issue that I ran into with people who wanted to use Fluentd is that we don't have very good Windows support. There's an experimental branch, but we've never had a lot of confidence in recommending it for high load or mission-critical payloads. So we decided to write a very lightweight, less feature-rich agent in Go that compiles natively for different architectures. It's definitely less mature, so it's somewhat paradoxically situated right now: we think it's going to run better on Windows, but right now I'm not a hundred percent sure. It's fully open source. It's under fluent/fluentd-forwarder, and pull requests, contributions, and bug reports are all appreciated.
And that's pretty much it. I guess everyone is ready for the break. Maybe I'll give you a head start, then. Thanks a lot.