I guess we're still looking for our speaker. How are you all doing, by the way? I didn't come with any jokes, so maybe we can all meet each other. This is a first for me; I haven't had this happen here before. We're trying to reach somebody, so sit tight; in a couple of minutes, if we still don't have anybody, we'll move to another room. I'll check and get back to you real quick. Sorry about this: we're ten minutes in and our speaker isn't here, so you might want to find an alternative track, or just hang out here and meet everybody. Now might be a good time to hit the expo floor and grab some t-shirts downstairs as well; all the SCaLE shirts and swag are just in front of the expo hall door. Have a good one, and hopefully we'll see you somewhere around here.

All right, I assume you can hear me all right. Since I'm the post-lunch slot, I'm going to give another couple of minutes for stragglers to make it in, and then we'll get started. All right, it looks like the rush to find seats is over, so let's go ahead and get started. Today we're going to talk about exemplars: a walkthrough of what they are, why they matter, and maybe some information about where they're headed. My name is Joel Gruen, I'm a product manager at Chronosphere. If you don't know Chronosphere, we're an observability company that solves problems at internet scale, especially around managing observability data at scale; a lot of people come to us for that reason. If you want to come down and visit us, I'm happy to give you more details then.

During this session we'll talk about exemplars, and you would think that if the subject is exemplars I would dive right in and define what an exemplar is right out of the gate. I'm not going to do that. I promise we'll get to it, but I'm going to set a little context first, so you understand what they are, why they came to be, and how they're used in today's observability world.

So let's jump into life before exemplars, which was not that long ago, since exemplars are pretty new. You're on call, it's after midnight, and you get paged: the authentication service your team is responsible for is generating a spike in errors and you need to figure out what it is. I assume that's not an unfamiliar scenario to any of you. You get the page, you jump out of bed, and you're hoping it's something you can just downtime and go back to bed, but it looks like it might be a real issue, so you need to dig in, and that usually means heading to dashboards. You look around your dashboards trying to find some indication of what might be going on inside the system, you make some hypotheses, and you start guessing, a lot of it based on your previous knowledge and how you understand the nature of your system, but you can't really get to any kind of conclusion. So then you jump into traces, and the problem here, before exemplars, is that you're not bringing any context with you. You pop open Jaeger, though it could be Zipkin or any other tracing application, and you just start looking and writing queries, and you don't really know how to tie what you were seeing on the dashboard to what you're trying to find in your tracing data.
That's frustrating and not very productive. Hopefully, in some reasonable amount of time, you're able to resolve whatever you need to resolve and get back to bed, or at least hold things off until the morning when you can really dig in deeper. If the scenario I just described is reflected in this room, my apologies; it's not a great way to be on call, and I'm sure you have the sympathy of everybody else here. But it's the reality for lots of systems that don't have the ability to take the context of one system and carry it back and forth to another.

So then we moved toward integrations. There are a lot of systems out there now providing integrations based on having two or more of those signals joined together: metrics and traces, logs and traces, metrics and logs, and some have all three. The idea is that if the systems are together, making that contextual jump from one to the other gets a lot easier. A simple example is Prometheus and Loki: in a perfect world, you find a relevant trace ID inside your logs, you jump straight to that trace, and it helps solve your problem. There are a lot of issues with that, though. It assumes you're not sampling, for example, so that the trace you bring over from your log is actually there. On the metrics side it's even more complex. A lot of what's happening is that you're trying to take two data sets, all of the traces your system is producing and all of the metrics your system is producing, and find some connective tissue. Often that's done using time, a time window, and then hopefully your data is in great shape and you can find a label or tag and cross-reference them. If you see something in one system, a correlating label in the other, and a timestamp, you've got something to go on and you can jump across. That's what a lot of people are trying to do now with those data sets, but it comes with challenges. If you're looking at errors on your dashboard, going back to our previous example, that graph is a roll-up, an aggregation of a bunch of errors across a bunch of Kubernetes clusters, and a lot of the metadata you would use as connective tissue gets wiped away in the process of aggregating. And as we talked about, if you don't have all of the data, if your system cannot scale to store 100% of your traces, there is a real chance the trace you're actually looking for wasn't sampled and isn't in your system anymore.

That's the one-in-10,000 problem. Especially when you're looking for latency issues: there are lots of companies that, because of sheer scale, sample only 1% of their traces. In fact, I know of a company that samples one tenth of 1% of their traces. If we use the 1% rule, you're keeping only 1% of your traces and you're looking for a p99 issue, so you've got a 1 in 10,000 chance that the trace you pull will actually be representative of the problem you're trying to solve. So even with these integrations, we still haven't gotten to the point where we feel confident that, as we do our investigations, we're finding the trace that will help us unlock the root cause of the issue.
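A quick back-of-the-envelope version of that number, assuming the sampling decision is independent of latency:

```go
package main

import "fmt"

func main() {
	// Assume head-based sampling keeps 1% of traces, chosen independently
	// of how slow a request was, and the behavior we care about lives in
	// the slowest 1% of requests (a p99 latency issue).
	sampleRate := 0.01   // fraction of traces kept
	tailFraction := 0.01 // fraction of requests above the p99 threshold

	p := sampleRate * tailFraction // 0.0001
	fmt.Printf("chance a trace is both kept and a p99 outlier: %.4f (1 in %.0f)\n", p, 1/p)
}
```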
So now we can jump into exemplars. Exemplars are an effort to make a more methodical, programmatic connection between your traces and your metrics. Typically, within your metric data you can now attach some other piece of information, and typically that is a trace ID. In the graph on the right, those little diamond shapes (I'm not sure how they look on screen, but you can see the little dots) represent exemplars attached to the metric samples coming in, the metrics being the underlying green graph. So now, when the graph we're looking at is showing us that something is wrong, we have a way to dig in and make that leap over to further investigation using trace data.

Exemplars work great in lots of scenarios, especially when your data is fairly homogeneous: most of your traces are going to look the same, the problem isn't overly complex, and there aren't a lot of variables. That way you have a much higher chance that the single exemplar identified as relevant to that point on the graph is actually representative and will help you solve the problem you're looking at. It's a great way to jump back and forth, because you get to bring context with you. In the first scenario you had to just open up Jaeger and start searching around; now you have a way to connect those two sessions, if you will, and find a better way to manage the investigation. The other good thing about exemplars is that they're getting a lot of support in the open source community right now. OpenTelemetry is spending a lot of energy on them, and the Prometheus 2.27 release, which is now almost a year and a half old, included some experimental code for exemplars, so as a developer you could go in and tag a trace ID, or whatever other piece of information you wanted, inside your Prometheus metrics. That open source support is a great thing for exemplars going forward.

So now let's run through the scenario again. It's still a little after midnight, you still get paged, but now, when you go to your dashboard and look at the errors in your authentication service, trying to figure out what's going on, you've got exemplars to help you. Instead of just opening up Jaeger, you're leaping directly over to whatever your tracing application is, and off you go. In this particular case we'll take the spike in errors, use the exemplar you can see circled, and jump over to Jaeger with what we hope is a trace associated with that point; we hope it has the information we need. And that's where exemplars are today. The problem, and the reason we need to go further, is that I still have a bunch of questions. It was the authentication service for my Android app that was erroring out in the middle of the night, but I don't know if it's an Android-only problem or also an iOS problem: I'm only looking at one trace, and that one trace is only going to show me one of those two operating systems. I don't know if it's all of my Android versions or just a single Android version. Again, a single trace doesn't give me the context to answer these types of questions.
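For a concrete sense of what that developer-side tagging looks like, here is a hedged sketch using the Prometheus Go client; the metric name, label, and trace ID header are made up, and exemplars are only exposed when the scrape negotiates the OpenMetrics format:

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var authLatency = prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Name:    "auth_request_duration_seconds", // hypothetical metric name
	Help:    "Latency of authentication requests.",
	Buckets: prometheus.DefBuckets,
}, []string{"status"})

func handleAuth(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	traceID := r.Header.Get("X-Trace-Id") // however you obtain the active trace ID

	// ... do the actual authentication work ...

	obs := authLatency.WithLabelValues("ok")
	// If the observer supports exemplars, attach the trace ID to this sample.
	if eo, ok := obs.(prometheus.ExemplarObserver); ok && traceID != "" {
		eo.ObserveWithExemplar(time.Since(start).Seconds(), prometheus.Labels{"trace_id": traceID})
	} else {
		obs.Observe(time.Since(start).Seconds())
	}
}

func main() {
	prometheus.MustRegister(authLatency)
	http.HandleFunc("/auth", handleAuth)
	// Exemplars are only served when OpenMetrics is enabled on the scrape endpoint.
	http.Handle("/metrics", promhttp.HandlerFor(prometheus.DefaultGatherer,
		promhttp.HandlerOpts{EnableOpenMetrics: true}))
	http.ListenAndServe(":8080", nil)
}
```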
Is it across all of my application deployments, or is it limited to a specific geo, or a certain region inside AWS where the problem is occurring? I need more context, and the reason we need to go beyond exemplars is that one single trace isn't enough data or context to help me figure out whether I've found the error I need to be searching for, or whether I need to keep digging to really get to root cause. As we just said, exemplars are great for homogeneous situations, where you have fairly high confidence that all the traces look alike and the error you're looking for is likely to be inside the one you pull. And there's the secondary problem we talked about: through open source, a developer can now make that tie, but you're asking a developer who's instrumenting metrics to make decisions, in advance of a problem, about the trace that should be associated with that problem once it happens. There's a high probability that the developer won't have the full context. Maybe they'll get it right; maybe they know the shape of the data really well and the exemplar will be right on the nose. But there's a real chance that the exemplar they chose to attach to that data point won't be relevant enough to help you get to root cause and the bottom of the issue. So you end up looking for the needle in the haystack: you've got an exemplar, but is it the right one, does it have all the context, can it help you answer the question you really need answered? Probably not.

So what do we do? Let's talk about going beyond exemplars. The first thing we want is a system that doesn't force me to sample down my data; I want all of my traces whenever possible. That's not feasible for every organization, for a variety of reasons, but you want to make sure the system you're using at least allows you that flexibility, so that when you pull up a trace you don't run into the one-in-10,000 problem: you have a realistic belief that the trace you're looking for is there because you were able to keep it. Then you also want the flexibility to go across the entire population of traces. We want to look at a set of traces instead of one trace, and use that to drive the investigation. Rather than being handed a trace that was predetermined by a developer weeks or months or years before, I want to look at the traces that are relevant at that particular moment in time on my dashboard, and use that to get the context I need to really get to the bottom of it.

Let's walk through what that might look like in a demo; I'm just going to refresh this real quick. This is a dashboard, I got up in the middle of the night, and I can see that there are some errors here. In the same way that exemplars work, the power of this is that I can click into the point and it takes me to traces I know are relevant. But instead of a single trace, what I'm looking at is a statistical analysis of all of the traces that occurred, given the span filter I added by clicking on that point in the graph, in and around that timestamp. I'm now looking at a set of traces, the more the better, and a statistical analysis of all of those traces.
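That idea of analyzing a population of traces rather than one exemplar is easy to sketch. This is not any particular product's API, just an illustration with a made-up Span type and tags:

```go
package main

import "fmt"

// Span is a stand-in for whatever span or trace record your backend returns.
type Span struct {
	Tags  map[string]string
	Error bool
}

// errorRateByTag groups a set of spans by the value of one tag and returns
// the error percentage per value: the kind of breakdown that tells you
// "most of the failing traces come from one geo".
func errorRateByTag(spans []Span, tag string) map[string]float64 {
	total := map[string]int{}
	errs := map[string]int{}
	for _, s := range spans {
		v := s.Tags[tag]
		total[v]++
		if s.Error {
			errs[v]++
		}
	}
	out := map[string]float64{}
	for v, n := range total {
		out[v] = 100 * float64(errs[v]) / float64(n)
	}
	return out
}

func main() {
	spans := []Span{
		{Tags: map[string]string{"geo": "japan"}, Error: true},
		{Tags: map[string]string{"geo": "japan"}, Error: true},
		{Tags: map[string]string{"geo": "japan"}, Error: false},
		{Tags: map[string]string{"geo": "us-east"}, Error: false},
	}
	fmt.Println(errorRateByTag(spans, "geo")) // map[japan:66.66666666666667 us-east:0]
}
```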
That error was on this demo-music front-end service, and I can see that play-track is the operation that was erroring out, so I've already added that to my span filter. Now I'm looking at a whole set of traces that have that characteristic, and I'm able to draw some conclusions about it. Obviously I can see that the error percentage is 100 for that particular operation. It might be interesting to find out that this elastic-fe track operation is also erroring at one hundred percent, so I might want to investigate there, and I might care about some other information here too. I can also look at the metadata that came along with that: this is an analysis of the metadata of all the traces in the sample set I created with a simple click on my Grafana dashboard, and I can see some things that might be interesting. I know I care about production, because it's the middle of the night, and if it's in a staging system or some other environment I just don't care, so I'm going to add that to my filter, rerun this, and see what kind of information I get. Now I'm starting at the top and narrowing my search down based on the factors I know. And the interesting thing for me, let's say I care about error percentage because that's what I'm searching for, is that for this geo, Japan, 73% of all the traces in my sample coming from the Japan region are erroring. That's an area I want to investigate further, so with a simple click I add that to my span filter. Again, the funnel is getting smaller; I'm narrowing down the scope of what I need to understand to find the right trace. I may find that this is the spot where I feel confident I now have the right traces. I can scroll down and look at the most relevant traces based on the filters I just added. I can see they're all resulting in errors, because that's what I'm trying to solve for, and maybe I care about the ones taking the most time, so I'm going to start my investigation by clicking into this one and digging in.

So now, instead of just having a single trace based on an exemplar, I've been able to start with the same idea, linking the data from my metrics into my traces, and use the aggregation of the data in and around that time, with as little sampling as possible, to narrow down to something I have high confidence in. I'm now looking at a trace that has information relevant to the problem I'm trying to solve, I can see the critical path in here, and I can see a bunch of other things we won't spend time going into. Where we want to go with tracing is the ability to not just hope we magically pick the right trace, but to bring in a group of them, do some statistical analysis of them, and use that to filter down to the place we want to go.

With that, I'll let you know that Chronosphere is in booth 215. Come on by; we're giving away an Oculus Quest 2 headset, so you'll have a chance to win that, and there are a lot of smart, fun people down in the booth ready to engage and answer any questions you might have. And with that, I'll open the floor for questions you might have for me. Yeah, it's something that we believe is not supported by OpenTelemetry today; it is an area that we believe OpenTelemetry can and should move toward, to give users of traces that broader context to help solve problems.
Yeah, and to be clear, exemplars certainly have their place, and in the right environment they can be quite powerful. As we talked about earlier, they already solve a problem: they help you move back and forth between the two signals. But too often you find yourself jumping into an exemplar that's not the right one, jumping back out, clicking another spot on the graph, and jumping into another exemplar. What we want to do is take that whole area around it, aggregate it, do some analysis of it, and help you find the answer that way. Last call for questions... okay, thanks.

Hello, can you hear me? Thanks. Since we have three minutes, I have some shirts here if you want; please just come up. They're all large, since you are the first ones; we have more in the booth if we run out, but yeah, they're large, though not really, really big large. Oh, we have some female sizes too; don't tell the others. And we have some stickers. Is anybody here new to logging or to Fluent Bit, or are you already using the technology? Just asking so I know how deep to go in the session. Okay, we'll have to wait for the AV person, because it's not putting the video in place; we've got some video problems, for some reason it's not taking the video input. How do we switch the input? Oh, thank you. Now it's fine, thank you so much. Oh, but I cannot see my own screen now; let me check. You know, I recently switched to a Mac to avoid this problem from Linux, and now I'm facing it on the Mac. If I connect a display and mine turns off, how do I turn it on again? Oh, I know what it is: there's an application that manages the screen, so I'm going to quit that first; it always turns off the screen when I connect something. There we go. Well, thanks for your patience; technical problems happen any time.

So, most of you are familiar with the technology. When we give these sessions we don't know if the audience is new to observability or not, but pretty much everybody here knows Fluent Bit, so I will do a quick overview of observability, not a deep dive, then jump into the internals of the project and where the project is going, and then have a good Q&A around architectures, data patterns, and everything that might help with your deployments. My name is Eduardo Silva, one of the creators of Fluent Bit and now the founder of a company called Calyptia, which is the enterprise layer on top of this open source project, and a CNCF maintainer inside Fluentd, which is a graduated project.

One of the things with observability is that the question is always: what is the state of my applications? That's why this kind of tooling exists. Even if we think about individual servers, if you want to monitor an application, we typically go to journald or the syslog and try to see how the logs are behaving; and from the other angle, if you want to get signals around metrics, which are numerical, we try to go with top, ps, or netstat:
everything that gets processed through the /proc filesystem. That's fine for single-node environments, but nowadays we don't want to SSH into a machine to see how it's behaving; we want to extract the information in a different way and maybe centralize it, mostly because we're dealing with distributed systems. Everybody now deploys mostly on containers or multiple VMs, one application gets replicated across your stack, and it's really hard to SSH into each one of them, so that approach is no longer practical. How do we solve this problem? One option is the different tooling and services that ship logs and metrics from each single node, and one architectural approach is to centralize all of this information in one place, so the user can perform the actual task, which is data analysis. Everybody does logging and metrics not because it's fun but because you have to, in order to understand the state of your own applications.

When we talk about observability, we start talking about the different signals that exist. One of them is metrics. Logs came first, and people used to do metrics as logs; users were shipping certain metrics as logs in the past, and now we have native payloads for metrics. We also have more advanced signals, traces, which require certain instrumentation inside the application and were created for a different purpose than logs and a different purpose than metrics. If you have an architecture where one microservice depends on other microservices, you want to understand the workflow, how the application behaves from the first request to the last across different endpoints, and that's why traces are really important in observability.

From a project perspective, in the CNCF stack and outside of it, there are many projects. You will find Prometheus for metrics; eBPF, where you can get metrics directly from the kernel and expose them to user space; OpenTelemetry, which is trying to solve logs, metrics, and traces from both a specification and an implementation perspective; Jaeger for tracing; Fluentd, initially for logs; OpenMetrics, which is kind of a subproject of Prometheus for the metrics spec; and also Cortex. When we get into the observability journey we always hit the question: which tool should I start with? The answer is always "what kind of problems do you want to solve," but it's not that easy, because most of these tools run as agents or services, and once you deploy them, maybe at a certain scale, say 100,000 nodes, it's not that easy to start switching agents, switching configurations, and connecting the dots across your whole observability stack. At the end of the day this is the first mile: how do you get the data, and how do you send the data to a different place? Every user, every customer, has a different kind of backend; some prefer Splunk, some use OpenSearch, or Prometheus-compatible systems, or some other solution. This session is mostly about Fluent Bit, and not just the project itself but the role it's playing inside the CNCF and the cloud native environment. One of the visions for Fluent Bit initially was to be a high-performance version of Fluentd for embedded Linux.
I'm talking about five or six years ago, but it quickly evolved into a solution for containers, because it was more performant, lightweight, and could do roughly ten times more than Fluentd. Fluentd has certain restrictions: it's single-threaded, it's Ruby, it does the job, but at a certain scale, for today's demands of data management and the scale of data we have, it's sometimes not enough. And the way we see Fluent Bit has been evolving over the last two or three years, because if you go into an environment, it's not like you have one solution in place for everything. You might have Prometheus endpoints where you expose metrics, while in another environment maybe you're experimenting with OpenTelemetry, and then you start adding a certain feature set for each project, and you don't have something uniform in your stack. What is uniform is that most companies try to stay open source and vendor agnostic. That's where we are now. The next step would be: are we going to unify everything in a single protocol? It's hard to say; it didn't happen with logs. We're still dealing with syslog, which was created, I don't know, 30 or 40 years ago. The way we see Fluent Bit is that it's becoming, I don't want to call it a proxy, but something that connects the dots between different implementations. Fluent Bit started as a high-performance solution for logs; two years ago we started extending the scope to metrics handling, not to replace Prometheus but to integrate with Prometheus, managing and having its own native schema for metrics; and now we are doing the same for OpenTelemetry. So no matter what you have in your environment, you can bridge syslog-style logs, or Prometheus to OpenTelemetry, with the same agent you have now; you don't need to deploy anything else. And one of the values of this open source, agnostic approach is no vendor lock-in: you don't get tied to a specific framework or a specific implementation.

Fluentd has been around for ten or eleven years, Fluent Bit for six or seven. Who's using Fluent Bit? Most cloud providers: if you deploy any kind of Kubernetes cluster on EKS or GKE, you will find Fluent Bit running in there. And as maintainers, we're seeing that the industry is trying to get rid of having multiple agents in the environment. When you have multiple agents for different needs, which historically is why they were created, it's really hard to maintain. You hit the problem where you're upgrading your operating systems because of security reasons or end of life, and then the next problem is: what about your applications and their dependencies? Observability also has a dependency there: are your agents able to be portable across different distributions? We have many users that still run CentOS 6. Yes, they're going to move to Rocky Linux or something else, but that transition takes one to two years, so it's really important that the tools they're using can run on different distributions, even the ones that are not supported anymore, because it's a lie to say that everybody will migrate to the latest version in production; that doesn't happen easily. One of the highlights of Fluent Bit is that it's production-grade, enterprise-ready, and high performance; I've already said all of that. As for the ways you can consume it:
We have the upstream version, which you can get as packages or as the container image. AWS has its own distribution called AWS for Fluent Bit; the difference is that they have their own specific goals and custom plugins in their images, for their own Amazon needs, but it's the same upstream version, there are no changes, and we actually work pretty much together: there are maintainers at AWS who work on Fluent Bit. At Calyptia, our company, we have our own Fluent Bit edition that we call LTS, a private edition with long-term support, mostly for banking institutions that need to run 18 to 24 months of security updates because they cannot follow upstream releases; we release a new Fluent Bit every two weeks and sometimes things can break. And the Google Ops Agent is a newer agent in the market for Google's cloud needs, which bundles OpenTelemetry for metrics and Fluent Bit for log management. The concept is pretty simple: we always try to standardize, take any kind of information from any type of source, and be able to send the data to any type of destination that is new in the market or that the user needs. Nowadays we see more than two million pulls from Docker Hub every single day, which is an insane number of deployments, and it's growing. One of the problems with more adoption is that you get more bugs and more enhancement requests, and developers don't grow exponentially, actually it's the opposite; but it's a good challenge: more adoption, more problems, but also more fun.

Now, getting into more specific terms about logs, metrics, and traces. Logs have been supported from the beginning. We deal with structured messages, schema-less, but we also provide the option to process data as soon as it gets into the pipeline: we can run filtering to enrich it with Kubernetes metadata or AWS metadata, or, if you want to bring your own business logic into the Fluent Bit pipeline, you can run a Lua script. Lua allows you to modify the records, do any kind of enrichment, or drop the data you don't care about before sending it out to a destination.

Metrics started at the beginning of 2021, when we defined a schema for metrics. What's the difference? Logs are a different type of payload: metrics are pretty specific, while logs are just a bunch of key-value pairs. Metrics are well defined: they have a type, like a counter or a gauge, you can have histograms, and they also have dimensions, meaning labels. All of that has been implemented in Fluent Bit in a way that our own internal data format is compatible with OpenMetrics, which is the Prometheus format, and with OpenTelemetry, so any metric we get from any kind of source, Prometheus or our own internal metrics, can be exposed in the different formats the user needs. From an integration perspective, we have a Prometheus scraper, so if you have applications running with Prometheus endpoints, you can use your same Fluent Bit agent to scrape the metrics and expose them wherever you want. We have a replication of the node exporter project from Prometheus inside Fluent Bit, so we can collect the same local metrics as that external Prometheus project does. And we can do Prometheus exporter, or Prometheus remote write. Prometheus remote write was never really intended to be a protocol for transferring metrics publicly, but most vendors, New Relic or Google Cloud for example, accept data over Prometheus remote write, and that's why we implemented it in Fluent Bit: if you have the agent, you can connect to any kind of vendor that implements those protocols on their endpoints.
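A hedged sketch of what that looks like in config terms: scrape an application's Prometheus endpoint with Fluent Bit and push the metrics out over remote write. The hosts, port, and URI here are placeholders, and the exact option names should be checked against the current docs:

```
[INPUT]
    name            prometheus_scrape
    host            127.0.0.1
    port            9100
    metrics_path    /metrics
    scrape_interval 10s

[OUTPUT]
    name   prometheus_remote_write
    match  *
    host   metrics-backend.example.com
    port   443
    uri    /api/v1/push
    tls    on
```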
OpenTelemetry is the new big thing. The industry is trying to standardize how to collect and receive data in a unified format that most vendors can take advantage of, meaning that if today I'm a user sending my data to company A, I can switch to provider B without any disruption: I just change the protocol, just change the endpoint. Everybody who supports OpenTelemetry lets me switch between vendors, and that gives freedom to the user. OpenTelemetry, as a specification and a framework, tries to cover logs, metrics, and traces, and the way we approached it is that we implemented the OpenTelemetry protocol, OTLP, inside Fluent Bit. Nowadays this is experimental: on the input side, where we receive data, we support metrics and traces right now, and on the output side we support logs, metrics, and traces. This is all a pretty new implementation, from a few months ago, but we are already getting contributions back. For example, Linkerd is really interested in getting tracing support into the input side of Fluent Bit. Fluent Bit is deployed widely, and Linkerd lives in Kubernetes clusters; they are instrumenting applications with OpenTelemetry but they don't want to switch agents, so they are extending and contributing back to Fluent Bit to be able to flow this new kind of information.

Now, as part of the internals: I know this wasn't the intention of the initial proposal for this presentation, but we know the SCaLE audience is very technical and likes to deep dive into how things work, so I'm going to talk a bit about how this works internally. Fluent Bit, as I said, was designed with performance in mind. All the data that gets into Fluent Bit is managed internally with a kind of binary serialization: even if we receive JSON, it gets a binary representation inside Fluent Bit. Once you get the data, that data needs to live somewhere, and we use different buffering mechanisms, either in memory or on the file system, which is a kind of hybrid mechanism. One of the problems you have as an agent is that it's not as simple as "get the data and send the data out": when you want to send the data out, the network fails, or the endpoint is down, so what do you do when that happens? That's why we have buffering mechanisms. And what if the agent crashes, do you lose the data? No, not if you enable file system buffering, because then you have a backup of it. We don't have unlimited resources on a machine, but we try to provide the right layers so the agent can survive different failure scenarios.

Here's a log workflow, and it's pretty simple. An application triggers a message, and that message could be unstructured: any raw text message that we can understand, but which for the computer is just an array of bytes. Think of an Apache log file, for example: that becomes multiple lines in a file. When we do logging, what we do is centralize this in one central place, process the data there as a unit, and then be able to send the data to the destination where the user is going to consume it for analysis.
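A minimal sketch of that collect-process-ship workflow in Fluent Bit's classic config, with a Lua filter standing in for the enrichment step mentioned earlier; the paths, tag, field names, and script name are made up:

```
[INPUT]
    name   tail
    path   /var/log/app/*.log
    tag    app.*

[FILTER]
    name   lua
    match  app.*
    script enrich.lua
    call   add_env

[OUTPUT]
    name   stdout
    match  app.*
```

```lua
-- enrich.lua: add a label and drop records we don't care about.
function add_env(tag, timestamp, record)
    if record["level"] == "debug" then
        return -1, timestamp, record   -- -1 drops the record
    end
    record["env"] = "production"
    return 1, timestamp, record        -- 1 means the record was modified
end
```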
That destination could be anything: Splunk, Amazon S3, OpenSearch, Elasticsearch. The agent has a lot of work to do. It has to collect all these kinds of events and provide mechanisms to parse the data, because you don't get a structured binary representation, you get raw data. Sometimes, if you're running in AWS for example, you want to enrich your logs with the metadata for that host, like the hostname, or maybe you want to append a label to those records, because when you do your analysis you want to group the data not by patterns but by your own labels or certain metadata. It has to manage buffering, and it has to be able to send the information to different destinations, what we call the output destinations or backends. From a project perspective we try to support most of the backends available in the market, and I think that between inputs, filters, and outputs we have around a hundred plugins available. And it works like this: input, processing, output; from a usability perspective it's no more complex than that.

On the internal side, what the input plugins, the input interface, care about is I/O: networking and internal metrics. The engine internally cares about parsing, filtering, serializing data, doing buffering, routing (routing meaning deciding where this data is going to end up), scheduling how this data will go out, and, if it fails, how it's going to be retried. That's how we prepare for different failure scenarios, but the user needs to define the retry logic: "if things go wrong, this is how I plan to move forward." The output plugins have a different set of tasks, mostly around network setup and formatting the data. Remember that I said we serialize the data in our own format; the output plugins need to take that binary data and format it into the expected format of the destination. For example, Elasticsearch expects JSON, plain JSON but with a fixed schema, so we do all of that translation on the output plugin side. That's the main job of Fluent Bit.

From a data serialization perspective, we use a specification called MessagePack. MessagePack was created by the creator of Fluentd, actually before Fluentd, and the goal is that you can have these structures of key-value pairs of different types in a very optimized way. It's not compression; it's just a binary representation that saves you a bunch of bytes. As you can see in the last example, imagine any kind of representation in JSON that has a null value: null is four bytes, and you don't need four bytes. I was actually thinking about this today: what would happen if JSON had specified that instead of "null" you just use one byte? Multiply that by thousands of messages per second and imagine how many bytes you can save; those savings are a cost for the company, and that's why people sometimes prefer a binary encoding over JSON when transferring data.
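As a rough illustration of that size difference (this is not Fluent Bit's actual code, which uses the C msgpack library; it leans on a third-party Go MessagePack package):

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

func main() {
	// A record with a null field, like the example in the talk: JSON spends
	// four bytes on "null", MessagePack encodes nil as a single byte (0xc0).
	record := map[string]interface{}{
		"log":    "connection reset",
		"stream": "stderr",
		"user":   nil,
	}

	j, _ := json.Marshal(record)
	m, _ := msgpack.Marshal(record)

	fmt.Printf("json: %d bytes, msgpack: %d bytes\n", len(j), len(m))
}
```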
When the records get into Fluent Bit, we serialize them with MessagePack and put them into the concept of chunks. A chunk is a group of multiple records, serialized in binary, and it has a tag. A tag is like a label, something representative of that information, because when you're going to do routing (routing meaning "I'm going to send this data to some place") we use the tag to decide where the data will go. When the data gets in, by default we have a form of buffering in memory: we start putting all these chunks in memory while they are being processed, like a queue, and at some point the other part of the engine starts pulling them out and passing these chunks to the output plugins, which take care of shipping the data to the right place. This is a fast mechanism because it's just memory usage, but the problem is that it's not persistent, and the bigger problem is that you don't have unlimited memory. What happens if your destination is down and you keep ingesting data? It's going to consume all your memory, but before that the kernel will do something really nice, which is kill the process; it's not all-you-can-eat. One of the solutions to this problem is a hybrid buffering mechanism where we use memory and the file system at the same time. We make the engine write these chunks to disk, but the good thing is that it's not a normal I/O operation with read and write system calls: we manage all of this with memory-mapped files. We get a memory representation of the file, we write to memory, and the kernel makes sure to synchronize the changes to disk. It's a strategy that databases mostly use, but in general, if you enable file system buffering in Fluent Bit, what you're getting is memory-mapped files, which is really performant and has less overhead than using normal system calls against the disk. Fluentd, for example, uses normal read/write for its file system buffering, and that adds a lot of latency and has some performance penalties. This mechanism is really fast, and I would say 99% of Fluent Bit production environments are using this kind of setup.

Fluent Bit 2.0 is coming at the end of September, and there are many good things coming with it. I don't know if I mentioned that at the beginning Fluent Bit was just a single-threaded process, and the way it used to scale was with a main event loop: when it's time to ship data to a destination, it creates a coroutine, which is like a lightweight thread, and does a lot of suspend and resume operations on the output plugins. Think about Windows 3.1, where you had just one CPU but could still do multitasking: the kernel hides all the context switching behind the scenes, and in Fluent Bit we took the same approach in the code. After some time the performance was great, but it was not what companies needed, so we implemented threading on the output side. When it's time to take the data, convert it to JSON payloads, and do all the networking and TLS handshakes, all of that work is now deferred to an output thread, and nowadays, if you're running Fluent Bit, by default most of the output plugins run in a separate thread. What's coming with Fluent Bit 2.0 is threaded input plugins, because the problem we faced on the output side we're now facing on the input side: most users now use Fluent Bit as an aggregation tool, we receive many connections, and in order to scale and process all that data we need to take advantage of the CPUs.
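To make the buffering discussion above concrete before going further, enabling that hybrid memory-plus-filesystem mode looks roughly like this in config; the paths and limits are only examples, not recommendations:

```
[SERVICE]
    storage.path              /var/lib/fluent-bit/buffer
    storage.sync              normal
    storage.backlog.mem_limit 16M

[INPUT]
    name         tail
    path         /var/log/app/*.log
    storage.type filesystem   # chunks become memory-mapped files backed by disk

[OUTPUT]
    name                     es
    match                    *
    host                     elasticsearch.example.com
    storage.total_limit_size 1G   # cap how much buffered data this output may keep
```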
A receiver machine might have, I don't know, 30 or 40 CPU cores, so if we do the work in separate threads we can offload the work for the different destinations. Also, one of the requirements we got some time ago: we have supported Golang output plugins for a while, and we got the same request for the input side, from a user and developer experience perspective: how can I extend this pipeline, how can I add my own protocol or my own handler? So right now we have native Golang input plugins. You can write your own Go plugin, it gets compiled as a shared library, and you just run Fluent Bit and tell it "the shared library file is here, this is a Go plugin, run it," and it will be ready to go. As I said, OpenTelemetry is also something that is going to be shipped officially; you can find the OpenTelemetry work right now in the 1.9 series, but we call it experimental. We have some users, and we're making sure this lands solidly in the 2.0 release.

Another interesting feature is TAP. I don't know if you're familiar with the concept of tapping in observability. Imagine you've set up a configuration, the data is flowing, and at some point you say, "I'd like to know what kind of data is flowing through this input plugin." How do you query that? You can go to the database, but that data has already gone through filters and on to the destination. How can you get a snapshot of the data in real time? That is TAP. TAP is a new HTTP endpoint we will have in Fluent Bit: you just issue a request, and Fluent Bit creates a window, from that minute or whatever you set, where it gives you a copy of the data flowing through an input plugin, so you can see what type of data is going through. Most of the problems you find with configurations are not in the config itself; they show up when you store the data, then try to do some troubleshooting, and discover your data has the wrong format. Why? Something changed, because you don't control who's sending the data; maybe the developer on the other side changed something or enabled some flag on the tool, and the structure of the data changed. TAP is a solution that allows you to inspect these issues and do real-time debugging of the data pipeline.

And, well, in observability my suggestion is always to try to be fluent, like water. Try OpenTelemetry, try Prometheus, but always make sure things can go smoothly; I think Fluent offers a good example of that. This is not about replacing tools; the opposite: in observability the point is to be able to integrate with the others and with the new projects that are coming. One of the features I picked to show (I know this is too small, let me check the presentation) is the ability of Fluent Bit to scrape metrics the way Prometheus node exporter does, so I would like to show you a simple configuration that mimics node exporter. Can you see it, or is it too small? Better now? Okay. In Fluent Bit you define a pipeline with inputs and outputs; this is a classic-mode configuration, and we also support YAML mode. In the input we have a plugin called node exporter metrics, which mimics the Prometheus node exporter: it gathers all the metrics from the host where it's running and exports them in Prometheus format, so you can use them on your normal Grafana dashboards.
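The config on screen isn't fully readable in the recording, but based on the description (node exporter metrics in, Prometheus exporter out on port 2021, and the five-second scrape interval mentioned later in the Q&A), it presumably looks something like this:

```
[SERVICE]
    flush 1

[INPUT]
    name            node_exporter_metrics
    tag             node_metrics
    scrape_interval 5

[OUTPUT]
    name  prometheus_exporter
    match node_metrics
    host  0.0.0.0
    port  2021
```

With that running, something like `curl http://127.0.0.1:2021/metrics` should return the host metrics in Prometheus exposition format, which is what the demo shows next.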
What we're doing in this example is publishing those metrics on a Prometheus exporter endpoint inside Fluent Bit. So it runs like this: we've got Fluent Bit running, and it's exposing the metrics on port 2021, so we can scrape this Prometheus endpoint and get all the Prometheus information, in the same format that node exporter produces, but using Fluent Bit. Let me put some color on it, prom color, there you go, and let me scroll up: you get the same metrics, but by using Fluent Bit and talking the Prometheus protocol. Of course you can do other things too. Maybe you don't want to do this kind of networking stuff: you can ship the same metrics using stdout, and you'll start getting the metrics on standard output once they are collected. So we're going beyond logs; metrics is something that is being heavily used right now, and the next big thing is traces, which we are shipping now. I wanted to show you what is going on with the project and what is next on the roadmap, but also open the discussion for questions you might have. I'm sure some of you are Fluent Bit users, so please just raise your hand; I was instructed to pass this microphone around.

Okay, the question is that there's some confusion in how to explain the Prometheus configuration: are we pulling and getting the data from somewhere, or are we exposing it? I'm going to explain it this way, just running the help for Fluent Bit. The input plugins for Fluent Bit are there to collect or receive information from some source. This plugin that we have here, the one I used in the configuration, is node exporter metrics, and it is essentially a copy of the Prometheus node exporter project, but inside Fluent Bit: Fluent Bit is the one that gathers all the metrics from the proc filesystem, creates the internal metrics payload, and gets it ready to ship. We are not using the Prometheus tool; this is purely Fluent Bit. Scraping, I would say, locally, from the proc filesystem; we are not using node exporter natively. And the way I'm exposing this information is by using the Prometheus exporter output plugin, which allows others to scrape my metrics; this is the pull model, others pull my information. But we can also do push, by using remote write. We can expose any type of metric, and we can also scrape metrics: for example, if your own application, Go or Node.js, has a Prometheus endpoint, we can use the Prometheus scrape plugin, where Fluent Bit will scrape your metrics and get them into the pipeline, and from there you can do whatever you want.

Sorry, I didn't hear the question: how often, the interval? By default I think it's two or three seconds, but you can configure the interval for scraping; node exporter metrics has its own options. Let me check: node exporter metrics here is using a scrape interval of five seconds. Okay, the next question is whether we have any plans to integrate other types of input plugins, external ones, in different languages. Yes, Golang is something we're shipping now for September. Filters are an interesting question, because in filters we support Lua scripting, and most users are okay with that.
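Since extending the pipeline in other languages came up: for reference, a Golang plugin on the longer-standing output side looks roughly like the sketch below (using the fluent-bit-go package), and per the talk, the new input side follows a similar register-and-callback pattern. This is a sketch with an assumed plugin name, not taken from the talk:

```go
package main

import (
	"C"
	"fmt"
	"unsafe"

	"github.com/fluent/fluent-bit-go/output"
)

//export FLBPluginRegister
func FLBPluginRegister(def unsafe.Pointer) int {
	// Register the plugin under a name that the config can reference.
	return output.FLBPluginRegister(def, "go_stdout_sketch", "Example Go output plugin")
}

//export FLBPluginInit
func FLBPluginInit(plugin unsafe.Pointer) int {
	return output.FLB_OK
}

//export FLBPluginFlush
func FLBPluginFlush(data unsafe.Pointer, length C.int, tag *C.char) int {
	// Decode the MessagePack-encoded chunk handed over by the engine.
	dec := output.NewDecoder(data, int(length))
	for {
		ret, _, record := output.GetRecord(dec)
		if ret != 0 {
			break
		}
		fmt.Printf("[%s] %v\n", C.GoString(tag), record)
	}
	return output.FLB_OK
}

//export FLBPluginExit
func FLBPluginExit() int {
	return output.FLB_OK
}

func main() {}
```

If I have the mechanics right, this is built with `go build -buildmode=c-shared` into a `.so` file, which Fluent Bit then loads as an external plugin.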
There are some users who say, "I'm going to implement my Golang input plugin, but I don't need a filter, because I'm going to implement my filtering logic inside my own plugin," so the demand for Golang filters is pretty low. And actually, running Golang code is really heavy, meaning the context switch between C and Go is really expensive. That's why we're implementing threaded input plugins, so your Golang input plugins will run in a separate thread and will not affect the main pipeline's performance. I would say there is a performance impact for sure, but I'd refer you to a document that AWS wrote about that: initially they had their own output plugins written in Go, and after some conversations and some beers we convinced them to write them in C. They did the conversion to C, and the performance went up. After that there hasn't been much research on performance in that area; remember that Golang input plugins are pretty new, so we're just starting to get all this feedback, and we're making sure to run them in a separate thread, so your code will be running pinned to a different CPU and will not affect the performance of the main pipeline. The downside, not the problem but the downside, if you use a Golang input plugin or a Golang output plugin, is that you're using your own APIs and your own external packages; that's one of the issues most languages have. That's why in Fluent Bit we prefer to implement almost everything from scratch using our own APIs, because we know how to optimize memory and how to optimize I/O; when you're running Go, you're on your own. But you can write plugins faster and prototype faster, and, well, AWS has many customers running the Golang plugins while others run the C plugins.

Yeah, sure, please, we have time. Does it stop retrying when sending the data out, when flushing? By default we have a retry policy of three retries, two or three by default. You can make it unlimited if you want, but by default it's three, and you can set the retries to disabled or off, I think that is the option, or just set a different number. It's exponential, but it has some jitter, so the chunks won't retry at the same time; it will be a random value, but incremental based on a factor, so if you have ten chunks that need to be retried, they won't all be retried at the same time, otherwise you could saturate the endpoints. Next question: does the Prometheus scrape input have dynamic configuration? No, right now the config is pretty static: you just set the endpoint and that's it. There are some requests around that, because with Prometheus endpoints sometimes you're scaling up and have more of them, so how do you do service discovery and all of that? I think for Kubernetes that is solved, you should use Prometheus with service discovery, but outside of Kubernetes, yeah. If there are no more questions we'll leave it here; thanks so much for coming. And somebody needs a shirt; who didn't get a shirt? All of you? Okay, awesome, thanks.

So how do I mount this, just like back here? Okay. Hello, can you hear me? Awesome. Wow, this is pretty loud. No, I think it's fine, I'm just naturally kind of loud. And how do I get my stuff on this screen? Press a button if your slides are not displaying... I'll have you... yeah, I know it, I just want to
share my screen. Press a button if your slides are not displaying... and the buttons... this button... do I have to... oh, there you go, maybe I just didn't push it hard enough. Okay, cool. It's a little hard to read, isn't it? Turn off the light maybe... it's so fuzzy... I know, I'm trying to see. Okay, so the view is near the edit button... okay, view, light show, yeah. Welcome, come on in, Carlos, good to see you in person finally, my friend. Where's your necklace? You snuck in. Carlos and I used to work together. Should I give people two more minutes, or should I just start? Okay, well, the recording is running, so I guess I'll go for it.

Today I'm going to talk about an open source data pipeline for analytics and observability: a combination of a real-time platform like Kafka with an analytics database called Apache Druid. Basically, the idea here is to create immediate intelligence. Essentially, what that means is you're building a data pipeline, so we'll look at various use cases, go through the architecture of Druid, and I'll explain how it's a very fast database, basically designed just for analytics. Then we'll talk about Kafka itself, the integration of the two platforms, and how to also use it for observability use cases. Quick show of hands: has anyone here heard of Apache Druid before? Oh, okay, half the room knows Druid, I like this. All right, so for the other half of the people in this room who have not used Druid before: Druid was built as a real-time analytics platform. It's designed for low-latency querying, and it breaks up data into chunks of time, so every query you do is on time. Big picture, broadly speaking, I view Druid as a combination of Cassandra and Prometheus: a time series plus a columnar database, and we'll get into that in a minute.

Apache Druid was started over 10 years ago by its founders, Vadim, Gian, FJ, and Eric, at a company called MetaMarkets; it's an open source project they started to do analytics on ad tech data. Fast forward to today: there are over 10,000 community members, 400-plus active contributors, and over 450 code releases in the last decade, and now over 1,000 companies are using Druid, from digital-native firms to Fortune 100 companies. Just to name a few off the top of my head: Target, one big company you've probably heard of, Charter Communications, Cox Communications, and a lot of major banks and financial institutions use it for things like fraud analytics. So it's used by a lot of companies, and it's designed for just one thing: hyperscale. It's a fast, scalable database designed for real-time ingestion of streaming data that is queryable just a few seconds later; from a Kafka producer application to a dashboard, 50 milliseconds, so you're basically able to query data essentially instantly. And the real use case is combining real-time and historical data: if you have data in S3 or Hadoop or a data lake somewhere and you also want to cross-tabulate it with a Kafka stream, like a clickstream, you have the facility in Druid to do so. Basically, the idea behind Druid was to create just an analytics database. It's not ACID compliant, it doesn't do transactions; it's just for querying that data.

So what are some of the use cases? Look at things like risk and fraud. When it comes to exploring risk and fraud, you need to look at a lot of fast-moving transactions, credit card payments for example, you need to run a lot of queries very quickly, and
you need to look at many different things, like a customer profile, something that falls outside of the normal purchasing profile. Maybe they live in the middle of the Midwest, and all of a sudden they bought a coffee one minute and two seconds later they bought something in Sri Lanka, like a boat and a house; obviously that's a risky, fraudulent transaction, and you need an engine that allows you to query all this data very quickly so you can lock down those cards. Next, you have data-driven applications, websites and apps. If you've ever looked at a Unity or a Twilio, they have an analytics page that they serve to their customers, and that's what a lot of data-driven applications need: a layer that lets their customers slice and dice and explore information. Basically a UI, not for processing orders, but for powering a dashboard, like a Grafana, a Superset, a Tableau, a Looker, a Power BI; all of these things can be made exponentially faster with Apache Druid as the core technology underneath. Another use case is networking. A good example would be Cisco: you've heard of ThousandEyes, a subsidiary of Cisco, and they use Druid for network observability. At Splunk, Eric Tschetter, who was one of the founders of Druid, was building the Splunk Observability Platform on Druid. You can also look at Charter Communications, Cox, the major cable companies: they use Druid for querying their whole network platform, so they can determine within milliseconds, 50 or so milliseconds, whether there's an anomaly in the network, like a spike in traffic or a drop in traffic. That's the use case around network observability: you need to look at a lot of data in less than a second and trigger some kind of alarm, because a service provider has SLAs they need to adhere to.

Clickstreams, yeah, this one is a lot of fun. I did an evaluation with a major dating company, and they basically run what are called experiments: they present a new experience to customers, and we would ingest all of the customer data, I'm talking five terabytes a day, and they wanted to slice and dice this information to understand how a particular user experienced each experiment. Each experiment was something like 50 features, so we're ingesting 50 features, terabytes a day, and they want to query this data in less than a second in a dashboard. That's what clickstream analysis is: you take a user profile and you link it with a clickstream, which could be what they're clicking on on a website or what they're doing in a mobile app, and you need to correlate and make sense of both the real-time data, the Kafka stream, and the historical data, the user profile information. When you need to join a lot of data and slice and dice it in less than a second, that's the use case we really focus on. And finally, digital advertising: like I said, Druid actually came out of MetaMarkets. One of our competitors is called Rill Data, another entity out there that works on Druid, and the founders of Druid left MetaMarkets to start Imply as an open source company; we're backed by some very good VC firms, and a lot of large companies are backing Druid itself, which we'll get into a little bit later.
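To make the clickstream case a bit more concrete, a query of roughly that shape might look like this in Druid SQL; the datasource and column names are hypothetical, and the quantile function assumes the DataSketches extension is loaded:

```sql
-- Hypothetical clickstream rollup: events, errors, and tail latency per
-- experiment variant and country over the last hour.
SELECT
  TIME_FLOOR(__time, 'PT1M')            AS minute,
  experiment_variant,
  country,
  COUNT(*)                              AS events,
  SUM(is_error)                         AS errors,
  APPROX_QUANTILE_DS(latency_ms, 0.99)  AS p99_latency_ms
FROM clickstream
WHERE __time > CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1, 2, 3
ORDER BY events DESC
LIMIT 100
```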
So what makes Druid so fast? Druid has four strategies that make a really big difference. First, the data in each time period of Druid's timeline uses a columnar format to compress the data. Second, it constructs indexes on those time-partitioned columns to enable fast filtering; that's a big part of what sets Druid apart from the query engines that run on data lakes. Third, all of the operations happen on compressed data as a priority. And finally, it does the work of constructing the final result very late in the process. So: first compress, then take secondary indexes, operate only on compressed data, and then materialize the result for the user. You're not reading every single row, you're reading a fraction of a fraction of a fraction of your columns, and the promise of the whole platform is extreme speed.

This is the layout of a Druid segment. A typical Druid segment has a few million rows in it; this one has eight, because that's how many we can fit on the slide. This segment has five columns, and in Druid we store data oriented by column. The first column here is time; as you can see it's an integer. The next column is called artist, and there are three components to it. Then there's a column called city, and since this is ticket sales we also have price and count, so across the screen we have time, artist, city, price, and count. The artist column is a string, and again there are three components to it. In the data component there's only one entry per row, and it's just an integer: the first three are zeros, the next couple are ones, and the last two are twos. What this is saying is that this is a dictionary-encoded string column: the zeros mean the first three rows of this data set have artist AC/DC, the next rows mix in Kylie, and then there's Silverchair, which we won't really cover in this presentation. This is a form of compression: it makes the data section smaller, especially when the number of unique values is low relative to the number of rows in the segment, which is common for a lot of columns.
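Here's a toy illustration of that idea, not Druid's actual code, just a few lines showing what a dictionary-encoded column plus its row-level index look like; the row values are made up in the spirit of the slide.

```python
# Toy illustration (not Druid code) of how a string column like
# "artist" is dictionary-encoded and bitmap-indexed inside a segment.
rows = ["AC/DC", "AC/DC", "AC/DC", "Kylie", "AC/DC", "Kylie",
        "Silverchair", "Silverchair"]

# Dictionary section: sorted unique values mapped to integer codes.
dictionary = {v: i for i, v in enumerate(sorted(set(rows)))}

# Data section: one small integer per row instead of a string per row.
encoded = [dictionary[v] for v in rows]

# Index section: for every dictionary code, the set of row ids that
# contain it (Druid stores these as compressed Roaring bitmaps).
index = {code: {i for i, c in enumerate(encoded) if c == code}
         for code in dictionary.values()}

print(dictionary)   # {'AC/DC': 0, 'Kylie': 1, 'Silverchair': 2}
print(encoded)      # [0, 0, 0, 1, 0, 1, 2, 2]
print(index[0])     # rows containing AC/DC: {0, 1, 2, 4}
```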
Low cardinality isn't universal, but it's common enough that this is quite helpful. And it doesn't just give us compression, it also lets us operate on compressed data: you'll see in a bit that when you do a group-by on artist or on city, we actually use the numeric keys to do the grouping rather than the strings, because we can process numbers faster than strings. The next section is the dictionary, which tells us what each dictionary code means. The third is the index, and the index helps us filter: there's a single entry in the index for every entry in the dictionary, corresponding one to one, and each entry is the list of rows that contain that dictionary value. So the first entry of the index is the set of rows that contain AC/DC, and so on. These are also stored compressed, using an algorithm called Roaring bitmaps, and we operate on them in compressed form too: if we need to combine filters, we combine two compressed Roaring bitmaps into another Roaring bitmap. The last two columns on the far right of the screen, price and count, are both numeric columns, stored the same way as the timestamp column.

So let's do an example query. Here's a simple one: we want to look at all the ticket sales for AC/DC, break it down by city, group by city, and get the total price in every city. A query like this is quite typical of what you'd run on a system like Druid: some filtering, some grouping, and then an aggregation. These are the kinds of queries that power the analytic experiences we're trying to build, and we want to be very flexible in the kinds of queries we support, any grouping, any aggregation, so we're not going to pre-compute or pre-cache anything; we just focus on computing whatever you ask for, with the resources we have, as quickly as possible.

The first thing we do is resolve the filter. We go to the dictionary and look up the code for AC/DC. The dictionary is stored in sorted order, so we don't have to read the entire dictionary; we just binary-search through it to find the value, and we see that AC/DC is code zero. The next thing we do is retrieve the index for that value. That's the first row in the index section, because it's the first entry in the dictionary, and we get back a compressed bitmap with bits zero, one, and two set, meaning the first three rows contain that value. So now we know which rows contain this value, and this is actually super important: this is one of the ways we start to differ from a standard query engine on a data lake. A lot of them don't do secondary indexes, because using a secondary index involves a lot of random access, and that requires the data to actually be stored on the server. We'll discuss this more in a bit, but Druid pre-fetches all of this data before the querying happens, so the data is stored locally, typically in memory or on SSDs, and because it's a mix of SSDs and memory we're able to use these indexes, and that's why we can afford to read only the indexes first.
The next thing we do is the aggregation: we group by city and sum the price. The way we do it is to go to the data section of the city column; there's no need to touch the city dictionary or index yet. We know from the artist index that we only need the first three rows, and their city codes are two, one, and two, and the first three prices are 1800, 2912, and 1933. So we build a little array and use that array to aggregate. The reason we use an array is that there aren't many cities, only three, so we build a three-element array and anything we see for one of those cities goes into its slot. If there were a lot of cities, more than we'd want to allocate an array for, we'd use a hash table; we choose between array and hash table based on how many groups we anticipate. Anyway, we've chosen an array because there aren't many cities, we aggregate into those three slots, and other cities, like Melbourne, just keep a null value. And that's the end of it: we don't have to read the rest of the city column's data section, because we know from the artist index there can only be three rows that contain anything we care about. We also don't need to read the city dictionary yet, because we don't care what the actual city values are yet, we're just aggregating. Only now, after the aggregation is done, do we read the dictionary to see what the values are. We do this after aggregating because typically, in a group-by, the number of grouped output rows is much smaller than the number of rows we read, so it pays to defer the dictionary lookup until very late in the query. That's the idea of operating on compressed data as much as possible.

Now look at what we've done. We've done a binary search through the dictionary section of one of the columns, so we haven't read the entire dictionary, only pieces of it, plus a single random access to that column's index section. We've read only the sections of the city and price columns that we needed, based on the artist index, and we haven't read past the first three rows because we didn't need anything there. We've read only two values out of the city dictionary, using random access. We've been very economical about how much data we read from disk and memory and how much we transfer to the CPU, and this is where the performance comes from: being extraordinarily economical about how much data we read and process. Once each segment has been processed this way, we have a partial result per segment, and we just merge all the partial results together. That's how everything works in a Druid cluster, and it's how we can query billions upon billions of rows in less than a second; remember, we only touch a fraction of a fraction of a fraction of the data. That's the key differentiator of Druid.
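If you want to run that same example yourself, here's a minimal sketch using Druid's SQL HTTP API; the datasource name ticket_sales and the router URL are assumptions for a local quickstart setup.

```python
# Minimal sketch of the example query above, sent to Druid's SQL
# HTTP API. Assumes a quickstart cluster with the router on port
# 8888 and a hypothetical datasource called "ticket_sales".
import requests

sql = """
SELECT city, SUM(price) AS total_price
FROM ticket_sales
WHERE artist = 'AC/DC'
GROUP BY city
"""

resp = requests.post("http://localhost:8888/druid/v2/sql",
                     json={"query": sql})
for row in resp.json():
    print(row["city"], row["total_price"])
```

Druid plans that SQL into the same filtered, grouped segment scan described above; from the outside you just see the merged result.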
So, going back to where we were: now that you know what the segment structure is and how query processing stays efficient, let's look at the distributed architecture. Processing many segments in parallel means faster response times, which means more scale. Indexers produce segments in real time as they consume from a stream, and while a segment is being built you can also read from the indexers as the real-time data comes in; we'll come back to that in a second. Historicals hold the indexed segments, distributed across multiple machines, which allows parallel processing of the data, and you also have a second copy of each segment stored on a different node for resiliency, so a data node can be lost and you can still fully recover. And thirdly, your segments are replicated a third time to the cloud as a backup, your HDFS, your Azure, your data lake, so there's high availability for production use cases. Together, indexers and historicals respond to queries on both real-time and historical data, all in parallel, all independently scalable and independently resilient.

So now let's talk about my favorite topic in the world, which is Apache Kafka and real-time data. How many of you have used Kafka or heard of Kafka, show of hands? Okay, everyone here has used Kafka, great. Kafka scales horizontally just like Druid, mainly because Kafka has just one type of server, the broker, which serves topics for consumption. Each topic is distributed as partitions across various brokers to handle horizontal scaling of consumption, so partitioning is how Kafka scales the servicing of its data across the cluster. And just like with a Kafka consumer application, you can have one indexer running for each Kafka partition. It's also worth noting that you can replicate indexing jobs for streaming, so if a streaming ingestion task fails it can fail over to another indexing service. Generally speaking you only need one indexer to service about 12 Kafka partitions, but depending on the size and volume, and the kinds of modifications you might be doing, maybe some fancy transformations during ingestion, you might need more than one indexer for those 12 partitions; you could have up to 12. It really depends on the size, variety, and velocity of the data you're dealing with; the point is you can be as parallel as you want. The same goes for the query side: because you're actually querying the Kafka consumers in real time, you can distribute those query jobs across the indexer services, so queries hit the ingestion tasks and you can query the data before it's even written to a historical node. But the real magic of Druid is that the data is stored as a database, so you can query data well beyond, say, a seven-day Kafka retention period; in a lot of the demos and use cases I cover we have Kafka data that's been stored for weeks, months, even years. That's the power of using Druid with something like Kafka: it gives you real-time insight into your data and lets you keep exploring it. Something else interesting to note: every 15 minutes, Kafka, sorry, Druid, can run a compaction job. This is configurable, so you can compact the data coming in from Kafka automatically, say every five minutes, every hour, or every day, and it can be a scheduled job that really compresses the data as it arrives.
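As a rough idea of what wiring a topic into Druid looks like, here's a hedged sketch of a Kafka supervisor spec posted to the overlord (through the router). The topic, datasource, columns, and broker address are all hypothetical, and exact spec fields can differ a bit between Druid versions.

```python
# Hedged sketch of a Kafka ingestion supervisor spec. All names here
# (topic, datasource, columns, broker address) are hypothetical.
import requests

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "clickstream",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user_id", "page", "city"]},
            "metricsSpec": [{"type": "count", "name": "events"}],
            "granularitySpec": {
                "segmentGranularity": "HOUR",   # one segment per hour
                "queryGranularity": "MINUTE",   # roll raw rows up to minutes
                "rollup": True
            }
        },
        "ioConfig": {
            "topic": "clicks-enriched",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "taskCount": 1,             # one indexer can serve ~12 partitions
            "useEarliestOffset": True
        },
        "tuningConfig": {"type": "kafka"}
    }
}

# The router proxies the overlord's supervisor API.
requests.post("http://localhost:8888/druid/indexer/v1/supervisor",
              json=supervisor_spec)
```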
So what does an analytics pipeline look like? There are a couple of different patterns. I think you're all familiar with ksqlDB, a streaming job you can run on Kafka: you'd have two topics, run a streaming join between them, and write the output of your query to another topic, enriched or enhanced, and then Druid reads from that enriched topic. Generally speaking that's the industry best practice when it comes to analyzing Kafka data, but I've seen a lot of other setups as well. Apache Beam, which I think a lot of people have used. Spark jobs are extremely popular in the industry: using Python, Spark, and pydruid you can query Druid directly, join the data in Spark, and then have Spark jobs write directly into Druid. Apache Flink: a lot of people have done streaming joins with Flink, so you can enrich a topic and then Druid picks it up and reads from there. You can also write to S3 and have Druid read from S3, but that isn't really the point of this talk. The point of a streaming pipeline is to do streaming analytics on your data, write the results to a destination topic, and then have Druid store it and analyze it as you need it; the capabilities here are pretty flexible. You can also run Pulsar, by the way, if there are any Pulsar people here; it works with Apache Druid as well. Overstock is a huge fan of Pulsar together with Druid, I know those folks are pretty cool.

The next thing I want to do is talk about contributing to the open source community around Apache Druid. There are about 150 companies listed on the Apache Druid website: Splunk, Nielsen, Expedia, Swisscom, Zscaler, WalkMe, Dream11, GameAnalytics, Outbrain, Twitter, and many more. If you're currently using Druid, send an email to community@imply.io; we're very happy to showcase you in the open source community, and we have open source roadshows going on all the time. The goal behind our venture-backed firm was really to build up the open source project behind this technology, so this is one of our big initiatives. Next, we have a Druid forum, so go to the forum, ask your questions, tag away, create some content, explore information. And there's the Druid project site: if you go to the Apache Druid website there are a lot of fantastic tutorials on how to get started with Druid, how to ingest Kafka data, how to read from example data sets, how to deploy it in production. We have Kubernetes examples of Druid, so if you want to run it on Kubernetes, be my guest, that's fully available to you. And there are a lot of pull requests; this is a very vibrant and active open source community, and we'd love to see contributions from people like yourselves. This is an open source conference, so let's all embrace the future, which is distributed analytics. We're also looking for people to just talk about how they're using Druid, or how they're interested in picking it up. And last but not least, we offer free public training, completely free, a three-to-five-day course on learning Druid; you don't have to pay anything, it's on learn.druid.io, see the link at the bottom, put it in your phones, and start learning Druid.

So that's my talk. Do you want to see a quick demo of Druid, or go straight to Q&A? Show of hands if you want the demo. Okay, cool, so demo first, then Q&A.
All right. So basically, whenever you connect data to Druid you have what's called an ingestion task, and this is where you define it. This one is connecting to a Kafka topic, so it's just streaming data; as you can see we have clicks and sessions, so it's a sessionized streaming join of data. This is a UI as well, so once you're in here you can click through and start defining your columns, what they look like, and you can start filtering them; I've written some pretty crazy JQ filters on data before, it's pretty fun. We've introduced a new feature called nested columns, and we also have something called the multi-stage query engine, so if you don't need sub-second queries you can join data, pull data straight from the cloud, join and aggregate it, put it into Druid, and then query it; there's a lot of interesting stuff there. So this is Kafka data, it's all columnar, you can transform it if you like, define data, filter data. Here's where you configure the schema, so you give it the different data types; sorry, that was scrolling too much. There, you can partition the data, and the interesting thing is you can choose different segment granularities: you can store segments of one hour, one day, one month, one year. You can optimize your data to be queried down to, say, the hour, which means all of your data rolls up to an hour, so you can really tune how much rollup you get; I did one POC with a customer where we needed to roll up 99% of their data. And finally, this is your ingestion spec; this is what it looks like when you finish setting things up in Druid. Okay, did I put a link? The Druid quickstart: here's the quickstart, and there are a lot of different tutorials, loading files natively, Kafka, Hadoop, querying data, rollup, configuring retention, updating existing data, compaction, deleting data, and learn.druid.org. If you want to see what we do at Imply: we build UIs on top of Druid, so if you just want to slice and dice your information and explore it, this is what we do for various customers, and that's our contribution to Druid and why we built it, to power very fast analytics; this is gigabytes of data just coming in all the time.

We're about halfway through the talk, so let's have a conversation. Questions? Yes, you've been waiting patiently.

Yeah, so it depends on what you mean by updating. When a lot of customers say updating, they usually mean updating an actual row. The way you do updates in something like Druid or Apache Pinot is you give it the latest record in time, and then when you query you aggregate per minute, per hour, per second, and the latest value is what gets surfaced. That's how you update in an analytics engine: you just insert a new value for that particular timestamp and it rolls up to that value. It's not an in-place update; it's a rollup over time that gives you the most up-to-date value, so it's an aggregation, if that makes sense. I think it also depends on how big an update you're talking about. If you're rolling up into a particular segment of time, and you segment your time by, say, an hour, and you're always adding data to that one hour, that's okay; but if you're aggregating at second-level segments, well, you really shouldn't be keeping segments of seconds or minutes, it should be more like one-hour segments. So the trade-off I'd give you is: if you're updating at the granularity of a day or an hour, that's fine; if you're talking about a minute, that's bad, and especially if it's petabytes of data, that's when you start hitting issues. Honestly, I don't even know if you can do segments on a second basis; that's definitely at the theoretical end, but I could talk to some other companies doing this and see what they think.
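A hedged sketch of that "insert the newest record and aggregate" pattern in Druid SQL, using the LATEST aggregator; the datasource and columns here are made up for illustration.

```python
# Hedged sketch of "update by inserting the newest value": instead of
# updating a row in place, insert a new event and let the query pick
# the latest value per key. Datasource and column names are hypothetical.
import requests

sql = """
SELECT account_id, LATEST(status, 128) AS current_status
FROM account_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY account_id
"""

rows = requests.post("http://localhost:8888/druid/v2/sql",
                     json={"query": sql}).json()
print(rows)
```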
Yeah, that's fine, that's totally fine; that's actually the use case I love about Druid, because when I was doing clickstream analysis in Kafka the problem was late-arriving data. You have five-minute window analytics, and if your data arrives outside that window you have to write business logic to handle it. Kafka and Druid are perfectly suited for that case, because at the end of the day, if that data comes in a month later I don't really care; it still belongs to the same time segment, and it's going to land in that one-hour segment from a month ago. That's a perfect use case for Druid; Druid is designed exactly for that. Any other questions? Come on now, let's have a conversation, the whole point of this is to chat. Yeah, go ahead.

Yeah, I mean, I've seen use cases where people have multiple time columns; I haven't made sense of it yet, to be honest.

Yeah, so there is something called approximations, and to be honest and transparent with you, I haven't used approximations in a POC with a customer yet, but in theory you can trade away a certain amount of confidence on fast-moving data: whether you want a precise, perfect calculation with an extreme degree of accuracy, or whether a 90th or 95th percentile estimate is good enough. So it depends on how accurate you need the calculation to be for the amount of data you're crunching; that's the trade-off. I can't really speak to it in depth because I haven't used it yet, but I'm happy to have a conversation afterwards, and you can speak to some of the engineers who work on percentiles and approximations.
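For reference, this is roughly what that approximation trade-off looks like in Druid SQL, assuming the DataSketches extension is loaded; the datasource and columns are hypothetical.

```python
# Hedged sketch of approximate aggregations in Druid SQL. These
# functions come from the DataSketches extension, which has to be
# loaded on the cluster; datasource and column names are made up.
import requests

sql = """
SELECT
  APPROX_QUANTILE_DS(response_ms, 0.95)   AS p95_latency,
  APPROX_COUNT_DISTINCT_DS_HLL(user_id)   AS approx_users
FROM network_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
"""

print(requests.post("http://localhost:8888/druid/v2/sql",
                    json={"query": sql}).json())
```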
Basically, the best way, and I don't know if I quite understand your question, but I can show you in Druid itself what's going on. If I click on Services I can look at my cluster, and I can see how much data is stored and where. This is where all the data lives: you have your data nodes, and then these are the middle managers, which actually run jobs, whether it's ingestion or queries or updates and deletes. Now on the broker side, which is where queries land, we can actually cache queries in memory; that's part of what I meant by infinitely scalable. So if you're running the same query every day at 5 p.m. for a million users, your broker nodes will cache that query, across a hundred broker nodes if you have them, so for the use case you're describing the query result is in memory and gets returned without any compute cost. Did I answer your question, or was that a bit off?

Yeah, disk, yeah, disk will be a little slower, truth be told. There are data tiers you can configure in Druid, so there are two things you can do. One is query laning: if you want a query to be served sooner you can give it priority in the laning of queries; that doesn't particularly make the query itself faster, but if you have a heavy load it's one way of prioritizing queries. The second is tiering. For example, when I did a POC with Synetics, they needed just a week of data queried in real time, predicting store sales and powering their Superset dashboards, and then they wanted to store the data for a year; they don't query that older data often, but they want it there to analyze, slice and dice, and explore. That's one use case. In your particular case, if you're querying data from a year ago every day at a certain time, I'm sure it'll be cached, but it all comes down to how you configure it: you can have a hot tier, a warm tier, a cold tier, it's all about how you configure the cluster for that particular use case, if that makes sense.
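A hedged sketch of how that hot/cold split is usually expressed: retention rules posted to the coordinator, telling it which tier should hold which time ranges. The tier name, replica counts, and datasource are assumptions, and the historicals would need matching druid.server.tier settings.

```python
# Hedged sketch of retention/tiering rules for a hypothetical
# "store_sales" datasource. Tier names and replica counts are
# assumptions; historicals must be started with matching
# druid.server.tier values for the "hot" tier to exist.
import requests

rules = [
    # Keep the last 7 days on the fast "hot" tier, two replicas.
    {"type": "loadByPeriod", "period": "P7D",
     "tieredReplicants": {"hot": 2}},
    # Keep everything older on the default (cheaper) tier, one replica.
    {"type": "loadForever",
     "tieredReplicants": {"_default_tier": 1}},
]

requests.post(
    "http://localhost:8888/druid/coordinator/v1/rules/store_sales",
    json=rules)
```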
Go ahead. Transaction locking? So, Druid is not a transactional database; as far as I know it's not atomic and all that, it's just analytics. But you can use it to do analytics on transactions; I think American Express uses Druid for fraud detection, as far as I know. What was your question, sorry? Let me explain it as a data architecture, because you're asking the right question, but Druid won't block the transaction; it will be a query. It absolutely is, and that's what AmEx does. Basically, all these financial transactions come into Druid, and they're querying it every second across different user profiles and accounts, and when a query hits a logic gate, like, hey, this query returned a result that should trigger some kind of warning, then an email, a Slack message, a ServiceNow alert goes out, and that triggers a whole chain of events back to the transaction processing system: it notifies the customer, notifies the team, and so on. So the answer to your question is yes, but it's not Druid itself doing it; it's a system that's querying Druid. At the end of the day Druid is a query engine, so you need a tool that queries Druid, if that makes sense. Think of it like Grafana: you can have Grafana run cron jobs against it, for instance.

Yeah, absolutely, come on, more questions, I want some lively conversation here. Yeah: we're actually working on a proof of technology using a certain benchmarking standard, and we're going to publish that within the next couple of weeks to a month or two. We can connect personally on that; I can set you up with the director of product engineering who's working on that benchmark criteria. Are you using Druid in production or evaluating it right now, and what are you comparing it against? So, there's a slide I always use, just give me a second, I ask this of all of our, sorry, no, no, I mean, this is why I love this conference, I can just run my mouth. Okay, there it is.

So whenever I talk to a customer I always ask these four questions. Do you need interactive queries: sub-second OLAP queries, sub-second analytics, a point-and-click UI? Yes or no, and if you don't know, that's okay. Next, do you need unlimited scale: high concurrency, petabytes of data, and do you need to ingest a lot of data? That's another thing to think about. Next, do you need to combine real-time and historical data: do you need to look at a clickstream of Kafka data and then also read the same Kafka data from a year ago, or correlate it with your S3 data, your Hadoop data, your Spark data? And finally, do you need something that's cost-efficient? Doing something like this on a cloud data warehouse or lake is incredibly expensive; cloud data warehouses are not designed for concurrency, they're not designed for a thousand users. I don't want to name other systems, but they're not built for concurrency, that's the broad implication. So for your use case, ask yourself these four questions, and if they make sense for you then Druid is a good fit; if not, then maybe something a little slower but a little cheaper will work. Those are things to think about. Any other questions?

Yeah, that's a good question. When I was doing things like Kafka, whenever you're doing key-value bucketing of your data, it's the classic pigeonhole problem, right: you don't want to put all of your data in one bucket, you want it fairly evenly distributed, and the more evenly distributed the data, the more flexibility you have. In Druid it's partitioned by time first, so as long as your data is roughly, well, the truth is I don't know; it's something I've got to go learn, to be honest with you, but that's a very good question.

Yeah, so there are secondary indexes in Druid. Time is the primary index, and then, depending on how you design the columns, the first column is indexed first, the second column second, the third column third, and so on, so it's the order of the columns that really sets the priority of the indexing. If a particular column is most important to you, I would move it to the left, right next to time; that's the simplest best practice with Druid.
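If you take that advice, it just means ordering the dimensions list in your ingestion spec deliberately; here's a tiny hedged sketch with hypothetical column names (the listed order is the order the dimensions are stored and sorted after time, which is the knob being described here).

```python
# Hedged sketch: the dimension order in a dimensionsSpec is the
# storage/sort order after __time, so put the column you filter on
# most to the left. All column names are hypothetical.
dimensions_spec = {
    "dimensions": [
        "customer_id",      # most heavily filtered column goes first
        "store_id",
        "product_category",
        "region"            # rarely filtered, so it goes last
    ]
}
```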
Yeah, that's a good question. You could hire someone, sorry, someone who knows Druid; there are multiple things you have to manage, right: you have your query nodes, your data nodes, your middle managers, and you need someone to run them. We provide open source Helm charts, so it's easy to get up and running, and there's an open source community. I'm with Imply; we offer managed services for this technology, we have professional services, we have software that makes it easy, serverless, all that good stuff. So the bottom line is that operating it shouldn't be the limitation for you; the real question is whether you have a use case that requires sub-second performance on massive amounts of data with a massive number of users hitting a dashboard. If you answer yes to those three things, then you need something like Druid.

Yeah, that's a good question. Have you used something like Postgres or Redshift before? They're different systems. With Postgres I'm sure you've all run into this: you try to join a million rows against another hundred million rows. I ran into this issue six years ago. We downloaded a data set, it was just 20 gigs, not a huge data set, but we wanted to query something like a million rows against a hundred million rows, and I ran the query and it never finished. I mean it never finished, and my roommate at the time said, why don't we just try Redshift, and I said, what's Redshift? So we partitioned on time in Redshift and we got the query done in about five minutes instead of running all week, which was pretty cool. Going from MySQL to Redshift is an order of magnitude better because it's parallelizable; at the time MySQL was single-threaded, so there's no way one thread could chew through a million rows on one side and a hundred million on the other. So something like Cassandra, Redshift, or Druid horizontally scales the compute and gives you more parallel processing of your data. Generally speaking, you need Druid when you need less than a second, very fast, like 50-millisecond query times. If you're doing financial analytics, for instance, that's a very good use case for Druid, or network analytics. You know Spark jobs, right: Spark is very powerful, you can do what's called a fact-to-fact join in Spark, joining a billion events against a billion rows and putting it all together, but that query is going to take you a few minutes or a few hours; it's a batch job, basically. Druid is designed for real-time kinds of queries, so different technologies serve different use cases, and that's something you need to explore. The three things to remember: massive concurrency of users; massive amounts of data, and gigabytes a day is a good place to start, if you're loading in a couple of gigs a day, whether real-time or batch, that's where Druid comes in; and needing to query that data in less than a second. Did I address your question, or open it up to a broader conversation? Okay, cool.

Yes, we are working on that, we are working on that. So that's the thing: Druid is not fully ANSI SQL yet, and that's something we're working on. Because we have our own native SQL dialect we're not a fully ANSI SQL database, so some of those functions, like window functions for instance, aren't quite there yet, but we do sort of have it; Eric Tschetter is working on the workflow around it, and it's coming to the open source version in a couple of months, I think a month or two, but don't quote me on that, although this is being filmed, so I don't know. Re-aggregations? Okay, I haven't tried that, to be honest, but we can look it up. To be transparent with you, I come from a streaming background, so to me it's KSQL joins, streaming joins; all these different flavors of SQL are different for me, but there are different ways of aggregating information, and we also have JSON-type querying in SQL. The best way is just to explore Druid: try it out, try our SaaS, try Polaris, set up a 30-day trial in the cloud, put in your data, and see what you can and can't do; that's the best way to understand it. More questions? We've got ten more minutes. Nope? Come on. Yeah.
Yeah, so that's interesting; I can give you the pros and cons of each system. Snowflake, generally speaking, is cheap because they store all the data in S3 and you're just querying it off of S3, so you're not storing that data locally on disk, on SSD drives and a lot of memory, and for that reason you could consider it cheaper. But it's cheaper when you have one user. It starts very cheap, then with 10 users the cost goes up, with 100 users it goes up again, and at a million users, forget it. Snowflake doesn't really scale for that concurrency use case, so that's something to think about. Druid is designed for concurrency: yes, it's a little more expensive initially because you pay for more underlying hardware up front, but when it comes to concurrent queries on your data it should be cheaper in that regard. Sub-second performance is also key; Snowflake and the other cloud data warehouses have special pricing to get them that fast, and concurrency is not something they're really architected for, so as a consequence they're more expensive for concurrency-heavy use cases.

I couldn't hear you, what are you asking? Yeah, we have a Helm chart, we have a Helm chart, so minikube, Kubernetes, deployed on AWS, deployed on Azure. I'm pretty good at Kubernetes, so I can get you up and running in about five minutes if you want; we can talk about that. Helm charts make it easy. Docker Compose, not exactly; just use minikube, minikube's better anyway, the world's moving to Kubernetes, that's the new fact. But if you're looking at a cloud data warehouse, just ask yourself one question: do I want this powering a dashboard? If you want it powering a dashboard that all of your employees are going to use, then a cloud data warehouse is not a good database for that kind of use case; you'd want to use Druid. But if you have a data science team of five people that needs to run some queries here and there on a lot of data, like Spark jobs, that's where a cloud data warehouse or a data lake makes a lot of sense; that's the kind of use case where those really shine.

Yeah, questions? No more questions? Should we conclude? You look like you have a question, what's your question? Okay, okay, yeah, let's talk. Cool, thank you, Bessam. No, he's not; thank you, Bessam. Bessam and I worked together for three years at Instaclustr; it's a very good vendor for open source, you can go see them downstairs. Yeah, do you have a question? Okay, you're here bright and early. All right, there are no more questions. Thank you. What's your name? Peter? Let's chat, okay.