Test, test. Hello. Yeah, so like I said, this is a link to a GitHub repo. It contains all the slides we're going to talk about, as well as a little getting-started-with-TriggerMesh project I made for you, if anybody's interested. I'll bring this back up later in the slides.

So, about me: my name is Jeff Neff, and I'm a software engineer at TriggerMesh. Hobbies: competitive FPS games (first-person shooters), decentralized technology (so, Web3 stuff), traveling, and high-performance cars. Doesn't have to be cars, but yeah.

So you're probably wondering: what is TriggerMesh? We'll start there. TriggerMesh is an integration platform: basically a set of tools you can use to create integrations. Here we have an e-commerce website, mobile applications, and point-of-sale and inventory applications feeding into Amazon, and we have Google Pub/Sub, IBM MQ, and Kafka. Any of these could be talking to any of the others: these emit events into TriggerMesh, and we receive them into consumers, also called targets or sinks. You're going to hear me refer to those as targets and these as sources. It's very easy to rewire any of the sources to sink into any of the available targets. So basically, we're providing you a platform to easily integrate things. And we run on Kubernetes, so we're ready for enterprise-grade production, and everything is open source, obviously; we wouldn't be here otherwise.

Some advantages of TriggerMesh: everything we do, and everything we're going to look at today, goes through a declarative API, written in YAML. If you're familiar with Kubernetes, you'll be very familiar with us. We're architecture-agnostic: we can run on-prem, we can run in the cloud, we can run across multiple environments.
It doesn't matter. Modern architecture: like I said, we're built on Kubernetes. And we have a self-service mindset. You can go to the GitHub; everything is open source, there for you to build and play with as you see fit. You don't need us to get started building your own stuff. We're built on extensibility, with a plug-and-play architecture: if you don't like a piece, you pull it out, plug something else in, and build your application that way. And it's all open source software.

Let's look at some use cases, things you could do with TriggerMesh. Here we have IBM MQ going into Amazon Web Services. This is an interesting one, because a lot of people are still tied to legacy architecture, and they have the issue of "hey, we paid a lot of money for this and we'd like to extend the life cycle of our investment." With TriggerMesh, it's easy to take IBM MQ and throw it into not only Amazon, but any of our sinks, or targets.

Here we have multi-cloud going into Splunk. Let's say you're running three, four, five cloud environments and you want all of that data logged in Splunk for easy observability; that's very easy to do in TriggerMesh. Reduce integration costs: instead of going through pricey platforms, not to name anybody, we can do it for free here, right? Create real-time alerts: I like IoT things, ATMs, anything that needs fast real-time alerts; it's a great platform for that. And streaming data, running serverless on-prem: streaming data is basically real-time alerts, and running serverless on-prem means a function-as-a-service environment, similar to AWS Lambda. So instead of sending off to Lambda, you can basically run it locally.

So I'd like to talk about how TriggerMesh works, and what it's comprised of. The basic building block for TriggerMesh, as we can see here, is an integration, which we refer to as a bridge.
So a bridge is composed of a combination of these components, and we're going to look at the majority of them now: what they look like, how you create them, and how to work with them.

We'll start with: what is a source? A source is a component that generates events. (I'm sorry, I'm having a hard time with this mask.) So here we have an SQS source, and we can see the API version. Are we familiar with Kubernetes? Raise of hands? Most of us, okay, so we're pretty familiar with the YAML manifest, but I'm still going to quickly go over it. We're going to see a common pattern of apiVersion, kind, metadata, and spec in everything we look at. In this one, we're pulling from the sources.triggermesh.io API, a kind of AWSSQSSource, and then we name it "sample." Then in the spec (just making sure you can see) we have various configurable parameters: an ARN, some receive options, a visibility timeout. Some auth parameters are also stored in a Kubernetes secret. And then a sink: where are we going to send these events? The sink is a reference; here we're referencing a broker named "default" from the Knative eventing API. Any questions before I move on? Feel free to stop me at any point.

So what does the source create? We send events using a specification called CloudEvents. CloudEvents are really nothing more than HTTP requests with some special headers; there's an exact specification of that in a CNCF working group. What we're looking at here is a prettified version of a CloudEvent. We can see some attributes: the content type, the ID, source, type, and then the data. If you've seen HTTP requests, this probably looks pretty familiar. If not, this next one definitely should.

This is creating a CloudEvent through a curl command (a curl request).
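To make the source manifest concrete, here is a sketch of roughly what the SQS source described above looks like. This is illustrative and from memory: the ARN, secret names, and exact auth field layout are placeholders and may differ between TriggerMesh versions.

```yaml
# Hypothetical sketch of the SQS source manifest described above.
# ARN and secret names are placeholders; check the TriggerMesh docs
# for the exact auth field names in your version.
apiVersion: sources.triggermesh.io/v1alpha1
kind: AWSSQSSource
metadata:
  name: sample
spec:
  arn: arn:aws:sqs:us-east-1:123456789012:triggermesh-queue
  auth:
    credentials:
      accessKeyID:
        valueFromSecret:
          name: awscreds
          key: aws_access_key_id
      secretAccessKey:
        valueFromSecret:
          name: awscreds
          key: aws_secret_access_key
  sink:
    ref:
      # The sink is a reference to an addressable object, here a broker
      # named "default" from the Knative eventing API.
      apiVersion: eventing.knative.dev/v1
      kind: Broker
      name: default
```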
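As a sketch, a minimal CloudEvent sent by hand looks roughly like this as a raw HTTP request; the host and the attribute values here are placeholders:

```http
POST / HTTP/1.1
Host: broker-ingress.example.local
ce-specversion: 1.0
ce-type: io.triggermesh.example
ce-source: curl-client
ce-id: 1234-5678
Content-Type: application/json

{"message": "Hello scale"}
```

The `ce-` headers carry the required CloudEvent attributes (spec version, type, source, ID); everything else is an ordinary HTTP POST.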
This is the minimum for a CloudEvent: it's a POST request, and it contains an ID, a spec version, a type, and a source, plus a content type and some data, right?

So now that we know how events are made and what they look like, how do we consume them? We consume them through what we call targets, which are consumers, or sinks; there are a couple of commonly used names. We're looking at SQS again, and we can see we're pulling from a different API now, targets.triggermesh.io. The spec looks pretty familiar; we're just missing the sink, because this one isn't going to emit events, it only consumes them. And you can imagine that if we linked the SQS source to the SQS target and gave them the same queue, we'd basically be making a loop, right? Cool.

So now, on that same note, how do we tell the source that we want a particular event when we're working with a broker? (We'll go over what a broker is on the next slide. Actually, I think these slides are out of order; we need to talk about this first.) A broker provides a discoverable endpoint for event ingress. Basically, its job is to take events, know who those events should go to, and make sure those subscribers actually get them. How the broker knows which events to send to whom is through triggers, which we'll look at next. The TL;DR here: a broker is event-delivery transport, okay?

So back to triggers. A trigger is how we tell the broker who wants what. If we look here, we have a trigger, and it's saying: send all the events from the broker "default" that match the event type io.triggermesh.example to the AWS SQS target named "example-target," from the targets.triggermesh.io API. So this gives us a way to do simple filtering, or we could omit the filter and send all the events.

Well, you own the broker, and the broker could be Kafka; it could be Pub/Sub.
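The trigger being described, in Knative form, would look something like this sketch; the names are the ones mentioned above, and the exact spec layout is illustrative:

```yaml
# Sketch of the trigger described above: route events of type
# io.triggermesh.example from the "default" broker to the SQS target.
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: example-trigger
spec:
  broker: default
  filter:
    attributes:
      type: io.triggermesh.example
  subscriber:
    ref:
      apiVersion: targets.triggermesh.io/v1alpha1
      kind: AWSSQSTarget
      name: example-target
```

Omitting the `filter` block would forward every event flowing through the broker to the subscriber.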
It could be pubs of I'm sorry Right Well, if you're allowed to talk to the broker. Yes, sir. No, it's okay if you So a common issue when we're trying to make things talk that don't normally talk to each other is data transformation, right? or Even things that would natively talk to each other. Maybe we want to insert some business logic in there before they talk to each other So trigger mesh exposes components that make the majority of these transformation situations easy We do that both through exposing function-based components and declarative language-based components and we'll look at examples of both of these So a declarative language component basically enables us we're going to look at these in detail, but basically this enables us to store the properties that we care about So destructure the object into variables Remove the pre-existing event that came in Or we could keep it, but here we're removing it and then here we can restructure the event as we see fit here we're adding a Object provider with a property name. We're hard coding that core Oracle cloud infrastructure and then here provider dot account ID and We're assigning that variable account ID Which we've pulled in from data dot additional details dot tenant ID from the incoming event, right? Does that make sense? cool and Then Now we can also do function-based So in several different languages we exposed function-based components. This enables you to write insert Logic in your favorite language without having to compile a container. It can just be in the bridge manifest, right? So it's easy if you have to just remap something or you know a quick little function instead of having to Build that entire container and then you just have this little service manifest not descriptive It can be all in that one manifest, which we'll look at So this is a node Here's an example of Python It's a very similar Basically, we've just changed the runtime and then the code is Python, right? 
And then I have one more example here, in Ruby.

TriggerMesh also makes it really easy to bring your own services. So if the targets, sources, or transformations we expose don't meet your needs, you can bring any system, in any language, that can send or receive HTTP events and can be Dockerized. Not a lot of requirements there; that covers most things, right? In tech preview, we also have a meta-controller that turns containers into API objects. In English, that means you can take your containers and, with a very small, simple manifest, use Koby to create addressable services without having to write your own Kubernetes controller. So, questions?

Okay, so now, bringing it all together. What we have here is pretty much the simplest bridge you can make. We don't have a broker or a trigger or anything; we just have one source sinking directly into a target, and that's it. (This is honestly the most I could fit on the slide; that's why. But we are going to look at a full manifest here in a second.) If we look at this, though: from the flow.triggermesh.io API we have a kind of Bridge, and in the spec we have "components," which is an array of objects. So, we know what these two do, right? Twilio, if you're not familiar, is a service for connecting with text messages (and I believe email too, but in this case it's text messages). So with this bridge, when you receive a text message, it's effectively sunk into SQS. But let's say we wanted to insert a function: we just insert the function object here, change the source's sink to the function, and the function's sink to the target, and that's it. If that makes sense.

Yes, sir: it's all containers. Each of the objects we've looked at runs in its own container, in the namespace where you specify the deployment. Yeah, or a cloud cluster, or our SaaS; it doesn't matter.
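The simplest bridge just described could be sketched like this; the field layout of the Bridge wrapper is from memory, and the component names are placeholders:

```yaml
# Sketch of the simplest bridge: one Twilio source wired directly
# into one SQS target, no broker or trigger. Names and the exact
# Bridge CRD layout are illustrative.
apiVersion: flow.triggermesh.io/v1alpha1
kind: Bridge
metadata:
  name: twilio-to-sqs
spec:
  components:
  - object:
      apiVersion: sources.triggermesh.io/v1alpha1
      kind: TwilioSource
      metadata:
        name: sms-in
      spec:
        sink:
          ref:
            apiVersion: targets.triggermesh.io/v1alpha1
            kind: AWSSQSTarget
            name: sqs-out
  - object:
      apiVersion: targets.triggermesh.io/v1alpha1
      kind: AWSSQSTarget
      metadata:
        name: sqs-out
      spec:
        arn: arn:aws:sqs:us-east-1:123456789012:example
```

Inserting a function between them is then just a third entry in `components`, with the source's sink pointed at the function and the function's sink pointed at the target.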
We're going to look at an example of building a Kubernetes cluster in Docker Desktop, and we're going to apply some stuff; if that doesn't answer your question, please bring it back up. Anyone else? Okay.

So, just two quick case studies. A security group came to us, CSNF: they're comprised of several cloud providers and a group of security professionals, and they have basically created a JSON schema and gotten everybody to agree to standardize security events on that one schema, okay? They then wanted to deploy a proof of concept using this schema, because the idea is to take in events from multiple cloud providers, normalize them, run them through event decoration and enrichment, and then split those events off into both Sentinel and Splunk. But they also didn't want to be married to Sentinel, so they wanted the platform to be "whatever your SIEM is, whatever your cloud environments are." With TriggerMesh, this was really easy. This was actually my project, I think; it took me like a week or two, and only that long because we had to add two components we didn't have. Other than that, it was all just wiring together pre-existing components.

Yes, sir? Well, CSNF is not a complete project; this was a proof of concept. So it hasn't been security-audited, but it will definitely be, because it's a group comprised almost completely of security professionals.

And actually, I wanted to show you the full manifest for that. Let me bring it up and pull it over here. So, this is the proof of concept. It's going to be a lot, and we're going to go over it quickly, but I just want to show you what a whole integration looks like, right? And we're not using the bridge wrapper.
Everything's just separated with the three-dash YAML document markers. But anyway: we have Azure Defender events coming in via Azure Queue Storage; "Bumblebee" is a transformer; and then those go into Sentinel and then Splunk. They actually split; nothing goes out of Splunk or Sentinel. And we have CloudGuard events coming in through a Kafka source, normalized, then Sentinel and Splunk. Aqua Security events come in through Pub/Sub, get normalized, and this one gets enriched by a Docker Hub decorator, and then again Sentinel and Splunk.

So this one starts out with a broker here, and this manifest is also shared in the GitHub repo (I saw some of you taking pictures). We have the broker here, and then we have a debugging trigger here, which, as we can see, has no filter; this is just so I can see all the events that flow through the bridge, right? Next, if you're not familiar with Kubernetes secrets, this is what one looks like, and this is how we define and store our secrets. We're not married to that, but it's a common way it's done.

Now we have an Azure Queue Storage source, which should look familiar; it's just Azure now instead of Amazon, so we have different parameters there. We have AWS credentials, an AWS SQS source feeding in, and one more Azure Service Bus source. And then, this is where it gets interesting, if any of it is interesting at all: right here, in the transformation. Here's a real-world example of using the declarative language to remap and restructure these events. Instead of writing a function, where maybe the next person I hand it to doesn't write the language I wrote it in, so they have to rewrite it, or maybe I don't hand it to a developer at all but to a security professional: with this, it's very easy to come in and change the properties. We don't want accountId to come from tenantId anymore? That's fine, we just change it. Right, and then we've remapped it.
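The declarative remap being described (store the tenant ID, drop the original payload, rebuild a provider object) would look roughly like this sketch, based on the operations named in the talk; exact CRD fields are illustrative:

```yaml
# Sketch of the declarative transformation described above:
# 1) store data.additionalDetails.tenantId in $accountId,
# 2) delete the incoming payload,
# 3) rebuild the event with a hard-coded provider name and the
#    stored account ID. Field names are from memory.
apiVersion: flow.triggermesh.io/v1alpha1
kind: Transformation
metadata:
  name: normalize-oci
spec:
  data:
  - operation: store
    paths:
    - key: $accountId
      value: data.additionalDetails.tenantId
  - operation: delete
    paths:
    - key:            # empty key: remove the whole incoming payload
  - operation: add
    paths:
    - key: provider.name
      value: Oracle Cloud Infrastructure
    - key: provider.accountId
      value: $accountId
```

Changing where `accountId` comes from is a one-line edit to the `store` path; no code, no container rebuild.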
We didn't have to recompile anything. We didn't have to build a Docker container. Nobody had to read any code.

So yeah, we're going to breeze through the rest of this, because it's pretty much the same thing over and over: a trigger bringing Aqua Security events into their respective transformation (yep, it's just taking those events and remapping them to the standard they agreed on), and then another trigger to a transformation. That happens three times, and then down here at the very bottom we're going to see our debugging service, which is just an event display, and then our Google Cloud Pub/Sub source... target, sorry. And then... no, that is a source; I don't think that's supposed to be there. I'm looking for Splunk... here it is, yeah. Now we have Splunk, and we should have Sentinel right above this... nope, that's Splunk, and here is Sentinel. So from those transformations the events come out, and you see we're asking for a different event type, because once those events go through the transformation they come out as a different event type. That way we know who wants to grab what, and building this way, it's very easy to extend: you just add a service that grabs that event type, right? And if you wanted something else to grab that information too, you could just deploy it and subscribe it to the same event type; they would both receive the event.

Yes, I just wanted to show you a full manifest. The order they're consumed in? So, if we go back up to the top: let's say we have a source that goes to the broker, and two subscribers. They would both receive the event at relatively the same time; the event is split at the broker and sent to them somewhat simultaneously.

Which way am I... there we go. For this one we're not going to look at a broker, or a bridge; I'm going to keep it really simple.
This bank needed to build a modern banking architecture. They needed to integrate with multiple third-party SaaS providers, and they had lots of data streams, event streams, that they needed to bring in. Using TriggerMesh, we were quickly able to incorporate over 180 event streams from 10 cloud-native banking platforms. And it looked, yeah, something like this mess, which I probably could have cleaned up; I apologize for that.

Okay, so now I wanted to play a little bit; it looks like we still have some time. Let me change my display, because I'm not going to be able to type while looking at that thing. Okay, cool. So now you should be able to see... you can? Okay.

So, going back to this one more time for the GitHub repo. We're going into... this is the repo that... ah, I messed you up. There we go, sorry. We're still struggling a little bit up front. I can't give you the link, but it's just "Jeff Neff scale talk notes."

So what's in here? A folder called "demo," and I made this little guy: it's a TriggerMesh quickstart. What's in here is a bash script to spin up TriggerMesh on a brand-new cluster in Docker Desktop. So your only requirement is Docker Desktop; probably most people have that, and if you don't, it looks like this, and it's available on all platforms. And then you need to have Kubernetes enabled. To do that, we just go here to Settings, Kubernetes, Enable Kubernetes. I did that ahead of time; we're going to do everything else pretty much together, but I didn't do this part live because it's like 15 to 20 minutes of us just watching it spin. But I can show you that it is a brand-new cluster. (Typing one-handed is interesting.) Yeah, it's nine hours old; I just built it this morning, I think. So in this repository, here's the same GitHub repo we were looking at. We have a nice README here, and...
...it's not the one we want. Yes, it is; I was just scrolled to the bottom. So if you don't have Docker Desktop, you can install it there. Then we need to enable Kubernetes, which we just did, and then we just run this little script right here. (I think I'm in the wrong directory... yeah. Okay.) What this is going to do is install all of the prerequisites for TriggerMesh and spin up the namespaces, all the services, the controllers, everything. You'll have the full TriggerMesh installation ready to go. Well, it's a demo environment, but it's the production installation. It takes, I don't know, five minutes; it really depends on your laptop.

I've also provided here, if we look at "next steps," a list that's basically kind of an intro to working with TriggerMesh. We start off with debugging, then we learn sources, targets, transformations. (I didn't get to write these last three, sorry.) And this first one we're actually going to do together, because I thought that would be fun.

Then here, under "TriggerMesh components," we have a set of sample bridges you can reference if you're building, and then we have most, if not all, of our components (I'm pretty sure I got them all in here), separated into different folders: debugging, routing, a list of sources, and a list of targets. And what each one gives you is the prerequisites. So a secret, right? We need this secret: name, key, data, password (the name itself is actually irrelevant), and then how to describe and deploy that target.

So this guy is almost done. And what we're going to be looking at is step one, which is debugging: how do we create debugging events, and how do we consume events? Because we really can't do anything if we don't understand how to create or how to consume. So what we're going to create here, once our cluster is done, is a PingSource object, which is just a dummy, a debugging, eventing object.
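The debugging pair about to be created can be sketched like this: a Knative PingSource emitting on a cron schedule into Sockeye, an event-display service. The image tag is an assumption; check the Sockeye releases for a current one.

```yaml
# PingSource: emits a small JSON payload every minute to its sink.
apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: triggermesh-ping
spec:
  schedule: "*/1 * * * *"
  contentType: application/json
  data: '{"name": "triggermesh"}'
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: sockeye
---
# Sockeye: a Knative Service that prettifies and displays incoming
# CloudEvents in a browser. Image tag is a guess.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sockeye
spec:
  template:
    spec:
      containers:
      - image: docker.io/n3wscott/sockeye:v0.7.0
```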
Let me make this a little bigger for you. The PingSource will emit some data for us on a cron schedule and send it to a sink, and then here we have Sockeye, which is just a debugging service that's going to prettify and show us our events, right?

So if we open up this window, we can see that the TriggerMesh install is done; took us two minutes. And now if we go into this folder and apply this file (you have to spell correctly), we can see that those objects were created. Then if we fetch the pods (I need to make this a little smaller), we can see Sockeye running here. We won't see a PingSource pod running here. It is running, but in a different namespace, because it's multi-tenant: multiple people can subscribe to it, and it lives in one namespace. But if we run "get ksvc," for the Knative services, this one is set up to be exposed externally, so we can grab its URL, and then in about a minute we should see an event come through with the name "triggermesh."

While we wait that minute (we are going to wait it), I'm going to talk about this right here. The README here is basically what we just did, but I've also provided some working practices, so you can start playing with rerouting things and extending the bridges, right? And then the solution to each working practice is there too.

Come on, bud... there we go. So yeah, this is a CloudEvent. We just made it. Good job, us.

So what do we do from here, right? Step two: we're going to introduce an actual source. So let's go ahead and clean this up. Okay, so now let's bring in a real source. We've been looking at the SQS source; let's use it, right? We're going to need a Kubernetes secret containing an AWS access key ID and a secret access key (we can see that from here, right?). So we're going to need to go get some AWS creds, and then we're going to need an ARN, and obviously an SQS queue. So let's go do that real quick. Here we have AWS: let's make a queue, we'll name it "scale," and we'll grab the ARN here. Which way was I... this way?
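The secret just mentioned is a plain Kubernetes Secret; a sketch, with the key names chosen to match whatever the source manifest references (placeholder values, obviously):

```yaml
# AWS credentials secret referenced by the SQS source.
# The key names must match what the source's valueFromSecret
# references expect; values here are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: awscreds
type: Opaque
stringData:
  aws_access_key_id: AKIA_REPLACE_ME
  aws_secret_access_key: REPLACE_ME
```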
Okay, and then we need to go (I'm pretty sure I already did this) to IAM, and we need some access keys. I think these right here are actually already valid. These should be good; let's try them. You probably already know how to make access keys. (You can have these two, it's a great account. Big billing cycle... no, it's a demo account.)

Okay, so now our AWS SQS source is wired up, everything's done, and for Sockeye there's nothing to configure, so we should be good to go. So let's go into this directory and apply this bridge. And this time we will see both pods in the namespace; watch. Sockeye is going to spin up for us because it's a prerequisite to SQS, and we can see SQS is happy. So now let's refresh this, because it's a new deployment.

Okay, and so now if we go back to SQS (is somebody already in my Amazon account? That would be funny), and we go back to our queues... did that not create? What happened to it? You were here with me when we made it, right? Okay, that's weird; let's do it again. Yeah, that was weird. Okay, so we're going to need to redo that and plug in the new ARN. I don't know how I got an ARN if the queue wasn't created, but... oh, good catch. Well, that's okay; we'll use this one, because I already deleted the other and copied it.

Okay, so now if we publish a message, we should expect to see this message in Sockeye... but I forgot to refresh because I tore it down. Sorry, sorry. (And Chrome does not like being resized.) Okay, there we go; let's do this again. And now we have an SQS message. So here we took it into Sockeye. We can see the body parameter here, "Hello scale," plus attributes that our sources provide; we call it the emitted CloudEvent context, or emit context. So if we don't want all this extra stuff and just want the body, we can do that. But yeah, we can see here that com.amazon.sqs.message is the event type, it came from the source, and here's the ARN of the queue we created it at. So that's cool.
Now we've created a source, but let's finish it off, right? We need to go into a real target. So what I thought we'd do for step three is a Pub/Sub source to SQS: effectively, we're going to sync Pub/Sub and SQS, right? So we need... we already have that as a source; we need a target. So we come down here to TriggerMesh components. Let's start off with the source: we need a Google Cloud Pub/Sub source. So let's look at it. It looks like we're going to need a topic and a service account key, and then we provide a sink. That's easy; we can do that, right?

So, Pub/Sub: we're going to want this guy, and then we're going to need a service account. If you use Google, I'm sure you know how to make a service account. (This is another account you can have; two clouds for free with this talk, that's great.) Okay, so we're going to put this service account key in here, and then we're going to need a Pub/Sub topic. So let's go here, and let's go to Pub/Sub. I already have one, so let's use that one; it's just default, nothing special, right? So we need this topic name right here. Okay, and now we're going to have to change this, so let's delete it so we don't forget it.

And then we need the target now. So let's go grab, from our bin of components, AWS: we're going to need this. We already have a secret, so we don't need that; we'll just reuse our other one. And then we need the SQS target, which we have right here. So let's grab this guy and put him down here, and then we're going to go grab our other AWS key and add it in here. There is a difference: this one is uppercase. Why did you do that to me? You called it "AWS," not "AWS creds." Okay, and then now we can also go steal our ARN from earlier and throw that guy in. And yeah, now we've got to do this one thing real quick.
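The source-plus-target pair being assembled here comes out roughly like this sketch; topic, ARN, and secret names are placeholders, and the exact auth field names may differ between TriggerMesh versions:

```yaml
# Google Cloud Pub/Sub source sinking into an AWS SQS target.
# All names and credentials below are placeholders.
apiVersion: sources.triggermesh.io/v1alpha1
kind: GoogleCloudPubSubSource
metadata:
  name: pubsub-in
spec:
  topic: projects/my-project/topics/default
  serviceAccountKey:
    valueFromSecret:
      name: gcloud-sa-key
      key: serviceaccount.json
  sink:
    ref:
      apiVersion: targets.triggermesh.io/v1alpha1
      kind: AWSSQSTarget
      name: sqs-out
---
apiVersion: targets.triggermesh.io/v1alpha1
kind: AWSSQSTarget
metadata:
  name: sqs-out
spec:
  arn: arn:aws:sqs:us-east-1:123456789012:scale
  awsApiKey:
    secretKeyRef:
      name: awscreds
      key: aws_access_key_id
  awsApiSecret:
    secretKeyRef:
      name: awscreds
      key: aws_secret_access_key
```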
So we need to grab the apiVersion, we need the kind, and we need the name, because the sink is a reference. It's nice having addressable objects: now I can tear down and replace this SQS target as many times as I want, but this reference is always going to point to it, right? If it's the same name, in the same namespace, from the same API and kind, then you'll always reach it.

So it looks like... no, go away... looks like we are done here. So let's save that. Did we clean up? Yeah, okay. Let's see if we got it right: get ksvc... oh, we've got Pub/Sub to SQS. The Pub/Sub controller was acting up this morning, but he looks happy now. So now we should expect that when we post to Google Cloud Pub/Sub, whatever we post lands in the SQS queue, right? So let's see if we did this right. We'll go to messages, publish message, okay, and publish. And now let's go here and poll for messages... that's cool, we did it, right?

So if we look here (let me just expand this), this time the data is base64-encoded. We'll decode it, just so you don't have to trust me... there we go: "Hello scale there." So that's cool. We didn't do that encoding; that's just how we get it. The way we build sources is that we try not to have an opinion: if Google emits data in a certain structure, we give you that structure. We're not going to make you learn some arbitrary structure that we made up, right? So this is how it's provided, and this is how we move it along.

So yeah, that pretty much wraps up what I have. I wanted to do these other steps, but I didn't think we would have time. Anyway, we've got 20 minutes, so: questions? Yes, sir. Yeah, so functions look like this; we have a couple right here. So you're asking: if we wanted to throw a function in here, how do we do it?
This is... okay. And now, effectively, we have one more thing to change; we need to do this. Okay, and now... this guy returns an event, so we'd actually have to use a broker for this one. But we could grab this one. I think all of these are actually meant to be built with a broker... no, this guy should have a sink property. Yeah, he does. So let's do that.

We didn't build with a broker today, but it's better to build with one, because the broker provides guaranteed delivery, and it can also provide dead-letter sinks. What we've been doing is sinking directly from a source to a target; when you go through a broker, depending on which broker you use, it can provide a persistence layer, and it also provides assurance that the subscriber actually got the message. And if it didn't, you can configure a dead-letter queue, so you can view the events that were not delivered, for whatever reason. That's why most of our components are designed to work that way.

But anyway, here we go. Let's undo that, and then get rid of... we're just going to do one here. Oh, this is an example of all of the operations at once; that's why. Yeah, this is a horrible example. Let's go grab a better starting point; I grabbed that one from the docs, and it's not a good one. So let's go grab a better one... the "birthday cake" manifest; this one I know works. Sorry about that; I didn't test every component, I added 500 of them. And we only want one property. Okay, and then let me line this up, for my OCD.
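The broker-with-dead-letter-sink setup just described would look roughly like this in Knative terms; the dead-letter service name is a placeholder, and TriggerMesh's own broker may configure it differently:

```yaml
# Sketch: a broker whose undeliverable events are rerouted to a
# fallback sink after a few retries, so failed deliveries can be
# inspected instead of silently lost.
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
spec:
  delivery:
    retry: 3
    deadLetterSink:
      ref:
        apiVersion: serving.knative.dev/v1
        kind: Service
        name: event-display   # placeholder dead-letter consumer
```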
Okay, and so now we have to change this back (the sink, because we changed it again, my fault), and then this guy... did we look at the data? Yeah, we did. So the data looks like this. So let's just take... let's take data.data. Okay, that's what we want, and we don't want it to be that. Then we'll clear the event, and then we will make a key of "Hello" and map it to that data property, or variable, that we've created. And then we're going to rename the event type to "triggermesh.scale," which we don't actually have to do, because we're not using a broker, but okay.

And then we need this to go here, so let's change our reference. I'm pretty sure this controller should be here... stop. Did we tear that one down? I can't remember. I don't think we did... no, we didn't; let's tear it down. Let me do it.

Okay, so if we did this right (which I think I did; I'm not a hundred percent on data.data, but I'm pretty sure that's the right property), we should expect that when we post to the Google Cloud Pub/Sub source, the event will go through the transformation, as specified by the sink, and then when it comes out of the transformation, it should go to our SQS target. So let's give it a try: publish message... no, we put a "hello" in there; let's do a "goodbye," nobody ever says goodbye. Okay, so let's see. I didn't even check if the pods spun up... everybody's there, so that's a good sign.

I did get that wrong; I knew I was going to miss something. That's okay: we did go through the transformation, I just got the property wrong. I think it should be... I swear I looked at it right, though: data.data... unless it's just data. Okay, let's try that; let's just try data. Send one more of these bad guys off. That's going to be confusing... so I hope I got it right, because I... no, that's Pub/Sub. Yeah, I don't know why I can't grab that property, to be honest with you.
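The live-demo transformation being edited here would sketch out something like this; the `data.data` path is the speaker's own guess at the Pub/Sub payload location, the sink placement is an assumption, and field names are illustrative:

```yaml
# Sketch of the demo transformation: store the (assumed) data.data
# property, drop the rest of the payload, re-emit it under a "Hello"
# key, and retag the event type. All paths and names are tentative.
apiVersion: flow.triggermesh.io/v1alpha1
kind: Transformation
metadata:
  name: demo-transform
spec:
  context:
  - operation: add
    paths:
    - key: type
      value: triggermesh.scale
  data:
  - operation: store
    paths:
    - key: $payload
      value: data.data      # the property the demo struggles to locate
  - operation: delete
    paths:
    - key:                  # empty key: clear the incoming payload
  - operation: add
    paths:
    - key: Hello
      value: $payload
  sink:
    ref:
      apiVersion: targets.triggermesh.io/v1alpha1
      kind: AWSSQSTarget
      name: sqs-out
```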
I think I would have to grab the raw event to make sure. That's my data ID, just for fun, and then I'm done. It's got to be that one, right? Yeah, I'm failing here; I've got something wrong. There's, like, a zero somewhere where there's supposed to be a one. But this is how you add it, right? We just plug the component in and reroute how we're routing things. Right now we're not using a broker, so we go straight into the transformation, and out of the transformation we go into the target. That's it. Yeah, that's going to bug me; I guarantee it's going to work in five minutes when I'm done.

Yes, sir? Oh, like if you were to compile an executable and it would just run? That's interesting. No, I've never considered that, but maybe soon. I would have to think about the implications of that one, but good question. We don't expose that, no, sir. All right, I'm going to wrap up. Well, thank you, guys; I appreciate your time.

All right, so I'm going to go ahead and get started. If anybody needs to grab any coffee or tea, it's in that room over there; otherwise we're going to go ahead. Yeah, definitely go grab it, it's still in there. They also have a box of Randy's Donuts, I noticed. I saw someone eating one with my own eyes and thought, hmm, that looks tasty. Maybe after; I'll debate it after this.

All right, so we're going to start off with unlocking value from time series data with open source tooling. My name is Zoe Steinkamp. I'm a developer advocate at InfluxData; we are the leading time series database platform. As I just said, I'm a developer advocate. Before I became a dev advocate, I was a front-end software engineer for about seven years, mainly working in Angular and React. Eventually I realized that this was a little more fun and interactive with other people. As a developer you kind of get siloed into your team and don't ever talk to anyone else.
I wanted to do a little bit more. So, the session agenda today is going to be a lot of overviews. There's an InfluxData overview, which is going to cover why you need a time series DB in general, what it does, and what's different, plus an actual overview of our platform and how it works. I'm going to do a demo on our cloud platform, because just harassing everybody with slides can get a little boring, and it's more interactive and fun to show you when things don't work. Then InfluxDB and AWS, and how we work together: yes, we're on the cloud marketplace, but there is more to the relationship than just that. And one of our newer releases, which is InfluxDB at the edge. For those of you who are not aware, "edge", by our definition, is a device that occasionally has connectivity issues with the internet. Finally, questions, resources, and action items. One action item I'll re-announce at the end: anybody who asks a question gets some awesome socks. I've got a whole pack over here, and if I have extras you can just grab them; I don't want to take them home.

So, InfluxData is where developers build IoT, real-time analytics, and cloud applications. Marketing. InfluxData at a glance: we were founded in 2013. Our HQ was in San Francisco before COVID killed it. We were already a pretty much fully remote company; people would come in for free lunch on Thursday and otherwise not show up to the office, but that's where our CEO hangs out. Our founder is in New York, though, just to clarify. They like to feud about it.

"We build where developers build real-time applications": what that basically means is that when you are dealing with real-time data, we're sitting there with you. That's what we focus on. We are one platform, so one API across multiple clouds and on-prem; I'll go into that in a little bit. Just some quick fun numbers:
We have over 1,300 customers and about 600,000 daily active open source deployments. Those are a few of our customer names, some of the bigger ones. We are a PLG-driven usage and subscription model. People always like to ask, since we're open source, how do you make money, and how will you not die as a company? That's how we are going to make money and not die. PAYG users: I'm trying to remember what that actually stands for now that it's come up, but it's basically people who pay us for usage. That is how we make money: on the cloud platform, you pay as you go and as you use. It should just be called pay as you go.

So, the rise of time series. Here's a good little comparison between different databases. One big thing to note about time series databases is that we are less a competitor to other databases and more a complement for this specific data type. You are not going to store the records you would keep in a SQL database in InfluxDB; it doesn't make sense, and it's going to be a very bad experience for all of us. But sometimes you are storing your time series data in SQL, and you find out the hard way, because it gets very large, unruly, and angry. That's where we come in. We are specifically focused on events, metrics, and timestamped data.

I always like to use IoT devices as a really great example of this. They're super noisy; they send you data almost every single millisecond, and that gets very big very fast. You want to monitor them in real time. You do not want to find out that your IoT device crashed three days ago.
That's not helpful to you; you want to know it crashed right now. On that same note, you also sometimes want to downsample your historical data, which isn't common in other DBs. Normally you want to store things for the long term, but in a time series database, sometimes you don't. You're like: okay, I monitored this today, it didn't break, whatever, so I'm going to aggregate it down to just one data point per day or per hour, so I'm not storing that super fine-grained data. And sometimes you just want to drop it altogether: it's been 24 hours, goodbye data, I don't need you anymore. That's obviously not as easy to do in some of these other databases, because that's not what they're for. Which is great; I don't want to lose my customer data, that would be bad, and that's why it goes in a SQL DB. But sometimes I don't need to care about my Plant Buddy IoT system and how it was doing 20 days ago.

So, some key drivers for time series applications: access time series data from assets and applications.
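Downsampling as just described (collapsing fine-grained points into one aggregate per hour) can be sketched in plain Python. In InfluxDB itself this would be a Flux `aggregateWindow()` call inside a task; the stdlib helper below is only an illustration of the idea:

```python
from collections import defaultdict
from statistics import mean

def downsample_hourly(points):
    """points: (unix_seconds, value) pairs -> one mean value per hour bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts // 3600].append(value)   # group readings by the hour they fall in
    return {hour * 3600: mean(vals) for hour, vals in sorted(buckets.items())}

raw = [(0, 10.0), (30, 20.0), (3600, 40.0)]  # two points in hour 0, one in hour 1
print(downsample_hourly(raw))  # {0: 15.0, 3600: 40.0}
```

The payoff is storage: thousands of per-second readings shrink to one row per hour, and the raw points can then be expired entirely.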
So, what we have here is our Telegraf tiger, which I'll go into more, but basically Telegraf is an open-source ingestion agent. And when you actually get that data into InfluxDB, you need to start analyzing it: things like performance, availability, and security. And finally, you actually act on it.

This is a great little photo of where some of this data might be coming from. We actually are used in some of these products; we are in the Tesla Powerwall, and I'll go into that. The other one I don't mention as often is the virtual side, things like server monitoring. A little less physical: you can't hold it. Obviously, you guys, you can kind of hold a server, but that's not the point. The big thing to keep in mind: there are lots of different server monitoring options out there, as anybody who's ever been to the internet knows. The big thing for us is that those are normally built for very specific use cases, and sometimes they can be very rigid in what they monitor and process. We are better for customers who really need customized use cases.

One platform, one API. As I've already mentioned a few times, we have our cloud, which is where I'd say a lot of people end up going; it's free to start out and pay-as-you-go from there. Our Enterprise edition is mainly for clients who require on-prem solutions. And finally OSS, our open source, which you can just download off GitHub and run locally. Or, if you want to be difficult, you could run it in a cloud where we already offer our own cloud, but hey, why not make things more complicated.

As I mentioned before, here is one of our customers using us in the IoT space. When I show off these use cases, it's mainly to really drive home what we're used for and where we're purpose-built. So here, Tesla is using us to monitor the
health of their solar and power bank products. Obviously, these are devices that send in a lot of data, and in general they can use their historically aggregated data to tell when things are not quite right. If you've ever bought a solar panel, it normally tells you "I'm expected to give you this amount at this time of year." If it's not doing that, that's normally a pretty good indication that something is broken. So they can use that historical data not only to sell you solar panels but also to tell you when things have gone wrong. They need to store that data, and they store it in InfluxDB.

Disney+ is one of those server-monitoring cases I was talking about. They're obviously streaming a ton of stuff: you want to watch your movies, you want to watch The Mandalorian, and you want to watch it fast without things crashing on you. They are monitoring the performance of some of their CDNs with InfluxDB, and they specifically chose us because they wanted very custom metrics for their servers.

This is one of our open source use cases, which, just to clarify, are kind of hard to get, because if you're an open source user you're not really, per se, talking to us; you just downloaded it off GitHub. Luckily, though, Mike has been talking to our community a lot because he had a few issues getting going, so we're actually super familiar with his use case. Basically, what he's doing: a good majority of the birds of prey in the UK have little tags on them, basically IoT edge devices, and he is monitoring those birds. He's getting more data from the tags than they used to get, and they're specifically trying to look for the birds whose tags, I guess you could say, go dark.
Unfortunately, a lot of the birds have been killed during COVID, and they kind of want that to stop. So they are using the tags, our platform, and our map graph to keep track of all these birds of prey, so that hopefully their population goes up and we stop hurting them.

So now I'm going to go into the overview. We are three things: our API and tool set for real-time apps, the high-performance engine, and our community and ecosystem. When it comes to the tool set, we have a RESTful API, which is available across the platform. We support multiple query languages: Flux, InfluxQL, and SQL. We have our open source integrations, which include Telegraf, our client libraries, and our plugins, and then our cloud delivery on the three major cloud providers.

The time series engine: the big thing to know is that it's specifically geared for high cardinality and high throughput. We're getting a lot of data coming in, and it processes it super fast; that's the first important part of the database. And it doesn't drop any data, that being the second.

Finally, the community and ecosystem. We actually have quite a large community, and I'll go into this later, on Slack and the like. These are some of the tools that we're built up with; I'll go over them a little more in depth when I do the demo, along with the cloud marketplace. Visual Studio Code, by the way, is on here because we offer a plugin that lets you write Flux queries in Visual Studio Code. It was asked for by the community, so they didn't have to come onto the InfluxDB platform or the open source UI to do their work; they could do it straight inside VS Code.

So, how it works. This is a really great graph, architecture diagram, whatever you want to call it, that shows what I've been talking about. Basically, the platform is everything underneath the blue line.
That's everything the platform comes built in with. Obviously, you could piecemeal some of these out; that's why some of them are slightly separated. But the whole point here is that we're not just a database. There are actually quite a few pure time series databases out there, and they do the job, but we are a little bit more than that. We are, yes, the database, but we're also good for things like visualization; we have our own alerting system, as well as things like downsampling, which I mentioned earlier and will go more into; and then of course the client libraries, the open source ingestion agent, etc. Yes, I don't know where I'll share the slides, but I can. I'm also just going to leave this slide up really quick.

When it comes to getting your data into InfluxDB, these are the main options you're going to have. The big two I'm going to mention today are the client libraries and the agent-based push, a.k.a. Telegraf. This slide goes over why you would use one or the other, or rather the use cases for each.

Telegraf, to really drive home what this is: Telegraf is fully open source. It has about 300 input plugins, and a lot of those input plugins are actually supported by either the company or the product they're for. For example, AWS has a few plugins with Telegraf, and a lot of their own engineers are the ones who work on them. It's also used by plenty of our competitors, but, you know, it's open source; that's how it works. It's pretty much completely driven by the community. We are the caretakers; we make sure nothing malicious or crazy is going on, but luckily we don't have to write everything ourselves.
Thank goodness. It's also simple to configure and extremely flexible.

Now, this I haven't really talked about much, but Flux is our internal querying and analysis language. Basically, Flux is to InfluxDB what SQL is to SQL databases. It lets you query specifically on time series data; it was built and written for that data type, so it's a lot faster at handling it. In my opinion it reads a lot like Python, or JavaScript, as some people have told me; regardless, it's pretty readable and pretty straightforward. Even in this one right here, for those who can't read it in the back, it's saying: a time range (go through this range of data), a filter (I want to see only the city IoT data with a measurement of bicycle), and finally an aggregateWindow, a really powerful tool which will aggregate down that data. Like I was talking about before with aggregation and downsampling: that's how you would do it.

These are just some examples of what you can do with Flux. Obviously you can query, pretty straightforward. It also has some things some of you might recognize from SQL, namely joins and pivots. It also lets you analyze with things like anomaly detection, downsampling, and correlation. And finally, action: we do have a GUI for our alerts and notifications, but technically you could also write them in Flux. Then there are tasks, which are basically just cron jobs.
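The three-stage Flux query described on the slide (a `range()`, then a `filter()`, then an `aggregateWindow()`) can be mimicked over a plain Python list to show what each stage contributes. This is an analogy, not Flux itself, and the row shape and field names are made up:

```python
from statistics import mean

# Fake rows standing in for time series records (timestamps in seconds).
rows = [
    {"t": 0,   "measurement": "bicycle", "v": 4.0},
    {"t": 60,  "measurement": "bicycle", "v": 6.0},
    {"t": 60,  "measurement": "weather", "v": 99.0},
    {"t": 120, "measurement": "bicycle", "v": 8.0},
]

# Stage 1 -- range(): restrict to a time window.
in_range = [r for r in rows if 0 <= r["t"] <= 121]
# Stage 2 -- filter(): keep only the "bicycle" measurement.
bikes = [r for r in in_range if r["measurement"] == "bicycle"]
# Stage 3 -- aggregateWindow(every: 2m, fn: mean): one mean per 120 s window.
windows = {}
for r in bikes:
    windows.setdefault(r["t"] // 120, []).append(r["v"])
result = {w * 120: mean(vs) for w, vs in windows.items()}
print(result)  # {0: 5.0, 120: 8.0}
```

Each stage narrows or reshapes the stream, which is why Flux queries read as a pipeline of steps rather than a single declarative statement.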
This is a GUI view of what a task might look like. What some people will do is, once their data comes in, run a task that says: drop these specific columns, clean this up, and aggregate it down to the minute or the hour. Then they'll put that new data into a bucket, and that's the bucket they'll actually run all their other queries on, or they might start separating things out. But basically, tasks are cron jobs that you can run as often as you need or like, and they manipulate your data.

This, again, is the GUI version of the checks and alerts; it's just a little easier to see, and I'll demo it as well. It's pretty straightforward. A check: you're just seeing if data goes above, below, or around a certain threshold that you dislike. You pick an endpoint where you would like that notification to be sent; in the GUI we have PagerDuty, REST, and Slack. And finally a rule, which basically just says how often you want to be notified and at what point. With this one, it's saying: when status is equal to critical, please send me a Slack message and let me know.

So I'm going to go ahead to the demo. Oh no. And of course it logged me right on out. Here we go. All right, so one of the nice things about being in here is that I can actually show things, and it's a little simpler to understand. So here we have loading in data. I'm actually going to start over here: we have our buckets. This is pretty straightforward; this is where you actually put your data. You can name them as you like, and you can also start to tag them. As you can see, I have a bunch of relatively random ones, I suppose you could call them. And then, when it actually comes to getting your data in, we have things like file uploads.
So that's really great if you already have historical data. We do have ports if you want to bring it over from, say, a SQL DB or something, but in general, sometimes you just want to do a CSV upload. The client libraries are a really powerful tool. This one's actually brand new here; I'm going to go into... oh no, I guess they all got updated. All right, very well. Basically, what this does is walk you through how to get it all set up. But the end result is that all you have to do, pretty much, is install a library. That's how you would do it in Python. You go ahead and export your token, and you initialize your client with these lines of code; it's pretty straightforward, about six lines. And then you can immediately start writing your data.

This one's already preset to my bike bucket, because it's been selected up here at the top. Basically, I'm writing to the bucket with a point, and I'm giving it a tag, a field, and a measurement. That's pretty much all the code you need to get going. It really doesn't take long to get this all set up; it's actually pretty quick, I've done it in less than ten minutes. And every single one of these languages has that option. So it's pretty straightforward: you download the library, and in six lines of code you can just start going. Then there are native subscriptions, which include MQTT, and all of our plugins, which I was mentioning before. I will be using one of the Telegraf plugins for the demo, one of the less exciting ones. As you can see, most of them, I would say, are of the professional variety, but occasionally we get some fun ones, like Counter-Strike: Global Offensive.
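What those six client-library lines ultimately produce is InfluxDB line protocol: `measurement,tag=value field=value timestamp`. A minimal formatter for a point like the bike one in the demo (a sketch only; the real client libraries handle escaping, type suffixes, and batching for you):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one InfluxDB line-protocol point (no escaping -- sketch only)."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

# A point with one tag, one field, and a nanosecond timestamp:
line = to_line_protocol("bike", {"city": "la"}, {"speed": 12.5}, 1672531200000000000)
print(line)  # bike,city=la speed=12.5 1672531200000000000
```

Seeing the wire format makes the tag/field/measurement trio from the demo concrete: tags go with the measurement before the space, fields after it, then the timestamp.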
Sometimes people really like to track their gaming in here. So, let's see, I'm going to go ahead and add data to this one. I'm going to use a Telegraf agent, and specifically I'm going to use the system one. What this does is just monitor my computer. I'm not going to configure it in any way; it can just be straightforward. Normally you would get a few different input options, where you could set things like how often you want the Telegraf agent to send data or how big the file should be. It's got a lot of options, depending on the plugin you use; for this one, it's so straightforward it's not necessary.

Then I'm going to run this in my terminal and start it up, and it will immediately give me some data for the demo. It will probably just say that my MacBook is running pretty heavy. I don't know; I'm not running Zoom, so it should be relatively happy. I've done this "listen for data" a few times, and it never seems to work for me. It works for everybody else but me. But normally I do still end up getting data in here, so let's go into this. Yeah, you can see I'm in my system test; I can check some CPUs on here on my MacBook. Let's see, this might need to be a little smaller, because the data is only just coming in. Oh, there's my little data point. I might just need to pick something else. It's being difficult; we'll come back.
That's all right. Sometimes it just takes a little bit of time for this to all load in, obviously. One other thing to note: we offer API tokens. This is the token you would use with the client libraries I mentioned earlier, and they can also be used in an HTTP request as well. One thing to note: we allow an all-access token, which can get into all of your buckets and data and destroy everything, or we give you the option of a custom token, where you can pick permissions for when you don't trust people, or even yourself. So that's just one thing to note.

Let me see if this is working now. There we go, much better. Yay, data. So basically, this is one of our visualizations. It's a pretty straightforward one, just called Data Explorer. It's great if you just want to really quickly go in and see stuff. We have a few different graph types. Most of this data is going to look terrible in most of them, outside of, you know, the line graph. I mean, I guess the scatter looks okay; it's not very exciting. But one really cool thing I want to show off: as you go through this in the GUI, filtering down, when you switch over to the script editor it actually gives you the Flux code. This is how I write Flux code and pretend I can write it. It's really great and powerful, and it also gives you all of these Flux functions on the side, for people who are more adventurous than me and actually want to find some of this information.
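Using one of those tokens in a raw HTTP request means setting an `Authorization: Token <token>` header on the v2 write endpoint, `/api/v2/write`, with line protocol as the body. A sketch of assembling (not sending) such a request with only the standard library; the host, org, bucket, and token values are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder org/bucket/host; precision=ns says timestamps are in nanoseconds.
params = urlencode({"org": "my-org", "bucket": "bike", "precision": "ns"})
req = Request(
    "https://example.cloud.influxdata.com/api/v2/write?" + params,
    data=b"bike,city=la speed=12.5 1672531200000000000",  # line-protocol body
    headers={"Authorization": "Token MY_ALL_ACCESS_TOKEN"},
    method="POST",
)
print(req.get_full_url())
print(req.get_header("Authorization"))
```

An all-access token would authorize this against any bucket; a custom token scoped to just the `bike` bucket limits the blast radius if it leaks.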
I'm partially joking; I do know how to use this, for the most part. Another thing you can do is view your raw data tables. As you can see from here, this is a pretty good example of how the data looks. It's got an RFC 3339 timestamp, which reads a little funny but for the most part makes sense. I'm getting values for my CPUs, my loads, and my overall system. And yeah, it's kind of cool to see the raw data.

So let's go into slightly more exciting stuff, like notebooks. We have two options when it comes to visualization. There are dashboards, which you can imagine as the static "I want to answer the same question every day" option, like server monitoring. And you can think of notebooks as the more fun "I have a different question every afternoon, I'm going to keep going back to the data" option. Notebooks are a lot more fluid and scrollable, and they're also friendlier if you're working with multiple people, like co-workers. One really nice thing I like about this is that it comes with sample data, which I think is really cool because it lets you play around with the system without having to upload your own data. Maybe you just don't have any, or you're just a little curious. I'm actually going to do this one, because it shows off the mapping. Let's give it a sec here. Okay, cool. I'm going to go ahead and run it and scroll down. This one is specifically built for our map data, and I'm going to go ahead and customize it. Aha, there have been some earthquakes here recently, in the past hour.
This is even more fun if I do the past 24 hours. Ah, there's the San Andreas fault line. So yeah, this is a really cool little thing you can use to check things out. Like I said, this one is specifically for our map data, because it already has latitude and longitude in it, but you also have the option of using data like air sensors or the Coinbase Bitcoin price. That would be a kind of sad graph.

What's also really cool down here is that you can start to add things. I could add a table, I could add another graph, I could add a note, something like "Jay, stop touching my notebook, leave it alone." And then I can also do things like a query builder or a Flux script. The query builder is the UI version of Flux, and the Flux script is pretty straightforward. As you can see, this one just kind of, whoa, don't get caught in the map, it just expands, and then you can scroll down, so it's a lot easier to read your data that way.

You can also set up alerts and tasks right here as well, which is kind of nice because everything is in one spot, one place. So you could name your notebook something like "notify of LA earthquakes", and then do another notebook, something like "notify of San Francisco earthquakes", and change the data: you could set the geodata to only cover the region around LA. Just a quick example of how these might work out. I'm not going to build out a whole dashboard, because that would take way too much time.

Let's see if we can get this Plant Buddy one up, if I do the data for the past 30 days, because I've definitely run it recently. There we go. Like I said before, this is great for overall static data. As you can see, I have a little gauge here which says the plant is drowning, plus the average soil moisture, soil temperature, and air temperature.
This is all from an at-home plant monitoring system. But IoT devices from Amazon send weird, sketchy information which needs to be cleaned up, and I haven't done it, which is why the light says 1000. I don't know what a light level of 1000 means, but it means the plant was getting light. This is some of the fun of IoT sensor data in general, just FYI; it happens even at a much larger scale.

These are the tasks. Whenever you create a task off the dashboards, it creates this weird gibberish title. Really makes a lot of sense; makes it so easy for me to understand. But if you write your own task: this one we specifically wrote for Plant Buddy. Basically what it's doing (I'm kind of waiting for all the colors to come up) is sending to an endpoint at Twilio. We specifically coded this in Flux, which is why we have Twilio as an option. The secret here is that if you code in Flux, you can send your data anywhere, not just PagerDuty or Slack, though that does involve a few extra steps. Basically, what we're doing here is saying: if the plant's moisture is below a certain number (I'm looking for where that number is... oh, there it is, if it's below 30), please go ahead and send Zoe, specifically me, a text that says "I am thirsty and require water." It's really funny when I'm at the booth running this, since it's only set to run every eight hours or something, and I'll get a text later in the day like "I'm thirsty, give me water," and I'm always kind of like: who is texting me? Where is this coming from?
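The Plant Buddy task above boils down to a threshold rule: if the latest moisture reading drops below 30, fire a notification. In real life this is a Flux task posting to Twilio; the plain-Python stand-in below only models the logic, with the threshold and the message taken from the talk and everything else illustrative:

```python
def check_moisture(readings, threshold=30, notify=print):
    """Alert on the most recent moisture reading if it falls below the threshold."""
    if not readings:
        return None                      # no data yet, nothing to decide
    latest = readings[-1]
    if latest < threshold:
        notify("I am thirsty and require water")   # stand-in for the Twilio text
        return "critical"
    return "ok"

status = check_moisture([55, 41, 28])    # drying out over time
print(status)  # critical, since the last reading (28) is below 30
```

Running this on a schedule, the way a task does every eight hours, is what turns a one-off query into an alert.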
And then eventually I realize it's the plant from earlier, because we never watered the plant at the booth. Let it be known: that poor thing is always about to die.

So that's one example of a really cool little task. Like I said before, the other big one is going to be downsampling. Actually, let me see if I can open up my Visual Studio Code so I can show you guys. It should automatically open to the project I was already in, with any luck. This is our very fun little Plant Buddy one. Let's see, where is it... I'm looking for something in particular; I don't know if my task is in here. Oh man. Well, regardless, I'm not going to hunt through all of this, but basically, in here, when we do our edge-to-cloud data, we aggregate our data down to be by the minute instead of by the second, because originally the plant sensors send data by the second. What we do is aggregate and downsample it down to the minute. And honestly, this is a houseplant sitting in my house; it could be by the day and that would be a perfectly fine amount of data, but it definitely didn't need to be by the second. We did it for our own sake, and also because I never empty out my data, and I don't pay for anything because I'm an employee, so I'm secretly storing a ton of data on InfluxDB for free.

And along those lines, this is the GUI of our alert system. Obviously, the task before was being used as an alert; it was a whole Flux file of basically an alert combined with a task, which is not uncommon. But one thing to note is that we have our deadman and our threshold checks. The deadman is literally to make sure that everything is operating and not dead. If your data source stops sending data, it will actually alert you: "Hey, I haven't heard anything for, like, 10 minutes. What's up?"
Obviously, this is not great with edge devices, but with normal devices that you expect to be constantly hooked up, it's super useful to know that something is wrong. A threshold check is probably the more common one. With this one, let's go ahead and grab my system data. I don't know what's normal for CPUs; it doesn't look like it's doing anything, actually, so we're going to pick load instead. There we go; we've at least got some movement. With this, I could go ahead and set something like: the OK level is... and I can actually move this, or no, I guess I can't. So I could just set this down here to be, like, three, and I guess that's normal. I could do info when it's, like, five, and then warning when it's, like, eight or nine, something like that. There we go. As you can see, my CPUs are nowhere close to these wonderful lines, because the data is all the way down there. And currently I've got the critical; let's set it to, like, zero or one. I can set this critical for if it's below a value. There we go.

As you can see, this is pretty easy to set up, and you do not have to do all four of these, by the way; I just like looking at all the pretty colors. You could just do one or two. But basically the gist is: you pick out your data, and you say, hey, if it gets above certain points, please let me know, whether that's just an info, a warning, or a critical. And just a note: you do not have to notify on these. You could just have checks running, and basically what they do is they produce...
Let's see if we can get this to come up. No, that's not what I wanted. They produce... a history, there it is. This one won't have any, because I haven't run it recently, but basically checks produce a history that you can read through. What some people actually do is put checks on their data, and then occasionally a person will come in, once a day or once a week, and just look through the check history. It's kind of like a logger for them; they're not actually doing anything real-time with that data. I know it seems kind of weird, because we're all about real-time monitoring, but people use us for whatever they want to use us for in the end. So that is an option.

Like I said before, the notification endpoints are pretty straightforward. You set up one that you like, you give it the incoming webhook URL, and that basically just says "send the data onward to this." And then the actual notification rules, which you can set to something like "when it changes from..." or "when it's equal to...": please go ahead and send me a message. And send me a message every minute, obviously, so I can get angry at you; or maybe every two days, so that nobody knows anything's broken. It's all a secret.

Variables: pretty straightforward. These are just all the tags you see on some of your data. This is great if you're dealing with a lot of different types of data. I don't really use it, because I just create gibberish data, so I have gibberish tags. Templates are actually really cool. They're basically pre-built monitoring setups, like the server monitoring ones, that you can build out with; they're already pre-built for you.
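The two check types walked through above, the multi-level threshold and the deadman, each reduce to a few lines of logic. A sketch using the demo's made-up thresholds (InfluxDB evaluates these in Flux on a schedule; the numbers and the "below critical" twist from the demo are illustrative and omitted here):

```python
def classify(value, ok=3, info=5, warn=8):
    """Threshold check: bucket a value into the demo's four levels."""
    if value < ok:
        return "ok"
    if value < info:
        return "info"
    if value < warn:
        return "warn"
    return "crit"

def deadman(last_seen, now, timeout=600):
    """Deadman check: 'dead' if no data has arrived within the timeout (seconds)."""
    return "dead" if now - last_seen > timeout else "alive"

print([classify(v) for v in (1.2, 4.0, 6.5, 9.9)])  # ['ok', 'info', 'warn', 'crit']
print(deadman(9_400, 10_000), deadman(9_000, 10_000))  # alive dead
```

A check alone just writes these statuses into the history; it's the notification rule ("when status becomes crit, post to Slack") that turns them into messages.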
So for example, let's see: we have a Docker Hub one that seems kind of useful. Cool. It already comes pre-built with a Docker Hub status and a few of the values that you might want to see in your dashboard. Basically, you install this, and it automatically starts you out with the Telegraf configuration for Docker Hub, it gives you the dashboard with all the pre-built values, and it also gives you a label, docker hub, and a bucket, docker_hub. Basically it sets everything up for you, which makes it a lot faster to get going. These are all open source, by the way. Some of these are built by our employees, and some are built by people who just really wanted to build their own and share it. But in general this is a really great way to get started and get going. Some of them are even more fun and not serious, like the Fortnite player performance one. Definitely not serious. But most of these are going to be, you know, boring job stuff, unfortunately. Moving on from that: labels are very similar to variables. Secrets don't matter; they're a secret. So basically that is the overall demo for the cloud platform. I'm going to go back into this, but now we have a much better idea of how this all works, how some of the options work, and the capabilities that you have. And just to clarify, because I don't think I mentioned it: that cloud UI I was just showing looks exactly the same as the open source UI. So if you go onto GitHub, download the open source version, and run it locally on localhost, that is pretty much exactly what it's going to look like. I just don't want to run things locally right now.
My laptop is a little old and struggles. So, InfluxDB and AWS. Here are a few of our AWS input plugins. With these we offer, specifically, AWS CloudWatch alarms and statistics. We also have an Amazon ECS input plugin, as well as the AWS Kinesis consumer input plugin, which reads from a Kinesis data stream. I don't think I've ever pronounced Kinesis before; I've said most of the other AWS names, but not that one. We also have a few AWS output plugins. That includes the CloudWatch output plugin, which will send logs and metrics to AWS CloudWatch. The blue one is the Timestream output plugin, which writes metrics to the Amazon Timestream service. And finally the Amazon Kinesis output plugin, which is an experimental plugin still in its early stages of development; basically it batches up all the points into one PUT request to Kinesis. Kinesis, thank you. Sorry, guys: Kinesis. And then finally, Telegraf AWS processor plugins. The only one we have currently is the AWS EC2 metadata processor plugin, which appends metadata gathered from AWS IMDS to metrics associated with EC2 instances. Another thing that we're currently working on together with AWS is Greengrass. For those who are not aware, AWS Greengrass, otherwise known as AWS IoT Greengrass Core software, runs on Windows and Linux-based distributions such as Ubuntu or Raspberry Pi OS, and basically what it does is gather data for those IoT use cases. This is a rough model of how that would look: you deploy InfluxDB as an open source component, you connect the telemetry components, you write the data to InfluxDB, and finally you can display that data in Grafana. Obviously we have our own visualization, which I showed you, but many people like to use this with Grafana, and we do have an output plugin for it in particular. And finally, with this one in particular,
they're actually replicating the data to InfluxDB Cloud. We're going to be having a GitHub code example for this; we're currently working on it. We're working with Greengrass on a plugin and integration, so right now that isn't fully complete. This is just a very small example of how they used it about a year ago, and it's going to be updated; it just wasn't updated in time for this conference. So: at the edge. This is something that we are really, really excited to be talking about, because we've been in development on it for, like, forever, but definitely for at least six months at this point. Basically, every industry has an edge. Normally, when we think about edge devices, we think about a tag on a hawk or something in the middle of nowhere, maybe out on a farm. But the reality is almost every industry has an edge device. These are just a few examples that you can see. Obviously, these are things that would come in and out of Wi-Fi, especially things like customer mobile apps, which are definitely known to do that. And these are all examples of where you might need edge-to-cloud replication. So one of the powerful new features that InfluxDB is offering is basically this: when you're running an InfluxDB open source instance, you are going to have a bucket on that instance, and in that bucket we are going to have a disk-backed queue which will store that data.
So even when you have lost Wi-Fi, when you've lost interconnectivity, it will still have that data stored there. And with that data you can obviously run Flux queries locally, like I do for my Plant Buddy. But once you are reconnected, it will then batch-send up to the cloud, while keeping all of your data stored safely on your local device. What this basically solves is the obvious problem of trying to store stuff locally when it's not really configured properly: you lose your data, or it double-writes; it reconnects a few times and just can't keep up or keep track, and it ends up double-writing a bunch of stuff, which is not helpful. This was built specifically for going from an open source instance into a cloud instance, with edge data replication in mind. So what we basically have here are two APIs, and two CLI commands, that define the remote connection and then define where you want that data to be replicated. Basically you say: these are my remote buckets, these are the buckets that need that disk-backed storage, and this is where I want that data replicated when everything reconnects. I actually use this a lot with our Plant Buddy demo at the booth, because sometimes my laptop goes to sleep, or we lose Wi-Fi because it's a conference center and, unlike here, there aren't routers all over the floor half the time, so the internet is sketchy at best. And this is actually really helpful, and it's actually really funny.
I'll go to lunch, and our graph will have a big gap in it, because the plant's data is no longer being sent up: I went out to lunch and the computer turned off. I reconnect it, and all the data just gets pushed right up. Now, I'm dealing with obviously a very small amount of data. We do have limits to how much you can push; obviously you can't just send terabytes of data up at once. But in the end it will all end up in the cloud, as long as it's connected long enough and the data is not ridiculously massive. You can find all that information on our website, I am sure; I just don't have the exact details. But basically, as I already kind of talked about, this is just adding value to the InfluxDB ecosystem. We work a lot with IoT devices and, in general, edge devices. Like I said, it can even be something like a plant demo at the booth. You never know when you're going to lose interconnectivity; to be honest, outside of maybe a server farm or something that is really interconnected, a lot of things can lose Wi-Fi, and then you can lose data. So a lot of our users are actually really excited about this, because they're going to use it for things that we wouldn't necessarily consider edge devices, but more as just a backup: "I'm kind of nervous that I might lose my data one day, and this is a great backup solution." These are just some of the nice things about it: it's durable, with that native queue; it's automatic; and it's not super complicated code to get going. It's literally, like, two CLI lines.
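Those "two CLI lines" are roughly the following. This is a hedged sketch: the exact flag names should be verified against `influx remote create --help` and `influx replication create --help` for your version, and the URL, token, and IDs are placeholders.

```shell
# 1. Define the remote (cloud) connection the local instance replicates to.
influx remote create \
  --name edge-to-cloud \
  --remote-url "https://<your-cloud-region>.influxdata.com" \
  --remote-api-token "$CLOUD_TOKEN" \
  --remote-org-id "$CLOUD_ORG_ID"

# 2. Define which local bucket gets the disk-backed queue, and which
#    remote bucket the queued data lands in once connectivity returns.
influx replication create \
  --name plantbuddy-replication \
  --remote-id "$REMOTE_ID" \
  --local-bucket-id "$LOCAL_BUCKET_ID" \
  --remote-bucket-id "$CLOUD_BUCKET_ID"
```

After this, writes to the local bucket are queued on disk and batch-forwarded to the cloud bucket whenever the connection is up.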
It's super simple, and it's very flexible. This is literally the IoT edge example of my Plant Buddy; this is the plant right there. This is supposed to be an example of the fact that I am running a Flux query here to downsample into the downsample bucket. What's funny here is I'm actually getting my data into my first bucket, called plantbuddy, which I am not actually backing up per se, because I'm immediately running a Flux query to downsample that data, and that's the bucket that's actually being replicated up into Cloud. So that's the other powerful tool there: you can go ahead and analyze, manipulate, or downsample your data right there on the edge device, and then only upload the version you actually want, the one you've already made smaller or cleaner. So, these are fun. This is for trying it yourself: this is a QR code to our cloud website, and the other one is to our open source and our community. That's where you can find all of the code, including for some of our examples; right here you can see my Plant Buddy project hanging out. But this is also where you'll find stuff like Telegraf, our InfluxDB platform, et cetera. One quick shout-out I want to give is to InfluxDB University; they pretty much helped me sponsor this course. It's a great spot to learn more about Telegraf, or Flux, or an InfluxDB deep dive in general. It's a great resource for going over some of the more advanced topics and really becoming familiar with the platform. And it's completely free.
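Going back to the downsampling step just described, a hedged Flux sketch of that edge-side task might look like the following. The bucket and task names are hypothetical stand-ins for the demo setup: raw data lands in plantbuddy, and the aggregated copy goes to the bucket that the replication is actually configured on.

```flux
// Illustrative downsampling task: aggregate the raw plant data into
// 10-minute means and write it to the bucket that replicates to cloud.
option task = {name: "downsample-plantbuddy", every: 10m}

from(bucket: "plantbuddy")
    |> range(start: -task.every)
    |> aggregateWindow(every: 10m, fn: mean)
    |> to(bucket: "plantbuddy_downsampled")
```

The point is that only the smaller, cleaner bucket crosses the wire, while the raw data stays on the edge device.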
You don't have to pay anything for it. These are just a few further resources. I normally put these all on one slide just so they're a little easier to access, but basically: the getting started guide, the forums, and the Slack. One thing to note: the forums are great if you have a question and you think somebody else has already asked it, because they're very easy to search, whereas Slack is kind of a disaster zone. But Slack is great if you just want to interact with the community, or you have a new question; that is a perfect spot to go. Or you just want to talk to us. That's fine too; we're there. We also have our GitHub, which, again, as I just mentioned, is where all the code lives. Also, if you want to file issues, even if you want to first talk to us about it in Slack, you can just go straight to GitHub; we accept them there too. Our book, which talks a little bit about the original reasoning behind Influx; it's not as doc-y as our docs, it's a little bit more fun and readable. Our blogs, which cover not only the use cases for our big customers but also some of the more fun ones, like Plant Buddy, or one of our co-workers who has been monitoring their grill for smoking meat. I really want to try it at this point, because I'm tired of reading about it. And then, obviously, InfluxDB University. So yeah, if there are any questions, you're welcome to ask. You are correct, we have C#; I'm afraid we do not have C++. That being said, though, there might already be a request for it, or you could request it. We are open source.
I swear some of these have already come from some of our users, for sure. Sorry, so can you clarify: are you talking about wanting to schedule something on your own machine to run? You could probably use the client libraries. You could write something that would specifically pull that data into, maybe, a Python file or a cron job, and then write it up to Influx. That would probably be the fastest way to go. Now, some people will actually create Telegraf plugins just to get around that, but if it's especially closed, for security reasons, I would say the client libraries are probably going to be your fastest bet. Yes, that's right. So one thing to note is that you can't piecemeal this; don't worry, it's a super, super light GitHub package, but it all does come pre-built in. I'm going to go back to just the main page. Are there any other further questions? Yes, I can. There we go; I'm going to full-screen this as well. You have a question? So, I'm not personally super familiar with Elastic, but from what I understand, some people do put some of their time series data in Elastic, and it works out okay. I think it just depends on the data that you're putting in, because obviously Elastic does deal with a heavy amount of data, since they're mainly used a lot for searching in logs. And that is one thing to know: we're more events-based than we are log-based. We're not quite as good for that yet; we're rolling out some new features which will make us a little bit better on that front. So that is one thing to keep in mind. I'm sorry.
That doesn't fully answer the question. I think that happens a lot in general: people will use other databases to solve their time series data problems, and I will say, sometimes it works just fine. And that is one thing: if you came to me and you were like, "I'm not dealing with any pain points," I would be like, that's great, then stay where you are. I mean, clearly it's working. Because I've definitely had people come up to me like, "Well, why should I switch?" And I'm like, well, are you in pain? And they're like, well, no. And you're like, well then don't. Why give yourself extra work? Who needs that? But if it is giving you pain points, you can definitely look into Influx. You can always ask us, or ask on the community, and just be like, hey, I have this issue, will this solve this issue? And we'll be honest with you. We'll be like, actually no, this is not going to help you at all, or, actually, this will solve some things. It's one of those things where there's nothing perfect out there in the world, and with databases in general there are always going to be pros and cons, and some jobs are just better dealt with by other solutions than us. We like to be honest and upfront about that, because, I can tell you right now, it is a lot to move data off of databases. It is extremely painful and it is not fun, and it's probably the biggest reason why sometimes users are really angry at us: because they're moving their data from somewhere else and we have to help them, and it sucks. So it's best to get it right the first time, for sure. Any other questions? And if there are no other questions: everybody who asked, and everybody else in this room, you're welcome to come get some awesome socks. Look, it has a database on it. They're very unintrusive, and they're one size fits all feet.
We are non-discriminatory on feet. But yeah, everyone have a fantastic time this week, or this week slash weekend, I guess; we're going to bleed in for sure. Oh, that makes total sense, because our main, I don't know what you'd call it, slogan is "time to awesome," which basically means we want to make developers' lives as easy as possible, so the time to awesome is faster. And that is a big thing: because we're open source in general, we listen to the devs. We listen to people when they come to us with complaints, and that's how we build things out. I mean, that's where edge data replication came from: people doing janky solutions and complaining, and us going, we should fix this, guys; we should just make this a lot easier. That's where every tool came from. It all came from "Hey, I'm a C++ developer, where is my plugin?" and us saying, we can help you with that. Or it was "I really need to do alerting, and I'm doing it on some other weird thing," and we're like, let's just put this all on the platform so you stop hacking around. But awesome. Yeah, like I said, don't be afraid; come on up and grab some socks.

Anyway, I'll jump some slides. I'm just going to talk if we're not recording, because I think everybody can hear. We are? Well, if we are recording, I'm not live. Before, it was hot; how's that? Is that too much beard noise? Okay, great. So I'm going to tell you a little bit about what I've observed.
I work for a company called DataStax. Anybody ever heard of DataStax? We've got one, we've got two. Rags, I'm so glad you raised your hand just now, because that would be very confusing if you hadn't heard of it. So DataStax is the company that primarily cares for the Apache Cassandra project. Up until maybe five years ago, there was this idea in computing that you have one of two types of problems: you're either trying to make impossible things possible, or you're trying to make hard things easier. And they're two very different types of work. For a long time, Cassandra was about making impossible things possible: I need first-order data on the east coast, the west coast, and in Europe, and I need it to be mostly in sync, because I can't make people in India pay a tax to access my system because it's slower. It's a hard problem. Has it gotten any easier? More people do it. There are a lot more NoSQL databases in the world; they're all a little bit different. But I've been working with Cassandra for a little while, and this is kind of me gelling my ideas, from a data perspective, about how I've seen people build applications that geo-scale. I don't know, I might be coining that term. We talk a lot about vertical scaling or horizontal scaling, but that's all still like a piece of butter in the middle of the toast. Right? Like a big stack of butter in the middle of the toast, and the toast is very big, and it's a long way to spread that vertical or horizontal scale. Geo-scaling is a different problem, and I think we're all going to live there. I think the line between a cache and a database is blurring, as applications aren't just about serving people in one factory, or serving people well in one region. So that's me.
That's a much better picture of me; when you think of me later, think of that picture. DataStax: we want you to use Cassandra. That's the shorter version of that. I've talked in this framework a few times, about the three characteristics I want to see when I see a big distributed system. "Big" is such a weird word, it's so subjective, right? What does "big" mean? This is, you know, the account correlation behind Capital One, all over their assets. This is, I haven't personally gotten to work on this, but this is Netflix's measurement of what you're watching. This is data that has this ubiquity to the human experience across the globe, right? And it doesn't make sense for it to sit in Arlington, Virginia, or Alexandria, or wherever you first designed it. It avoids these complicated bottlenecks that create that penalty. I'll try to break that down a little bit, and then you guys can call me out if I'm just being a crazy person. We need repeatability, and I bet we're going to talk about Kubernetes on that one. The more surprise we experience, the harder it is for us to maintain and resolve issues. So the more common things are, the more repeatable they are, the better. And we need observability. I'm not going to talk about observability, not in this presentation; there's a whole track about observability. It's a big deal, get good at it. Cassandra was super successful with the people who would read the system logs that came out of Cassandra, and hard for everybody else, and that was an observability problem. We've tried to do a lot of work on that; but make sure you can see what's going on before you get started. Autonomy: so, ideally, in this big magical globally distributed system, we'd have a bunch of workers doing things. But the more those workers have to interact, the more bottlenecks are in our system. And that can be okay in a single data center, where our mean latency between two machines is what?
Less than a millisecond. It's nothing. But if we have to build consensus between us-east and us-west, that's a hundred milliseconds, right? That's time. That bottlenecks and locks and adds complications and forces you to run into problems when you start to hit scale, because anything that slows you down stacks you up. Okay, I'm going weird here: I actually did just throw up a slide with the Oxford English Dictionary. Anybody ever look at the Oxford English Dictionary? Anybody know the history of the Oxford English Dictionary? That's close to the copy my dad had when I was a kid; that's the compact Oxford English Dictionary, it came with its own magnifying glass. We talk about big data and these distributed computing problems, and we think that they started in, like, 1973. Well, a group decided to make a dictionary of the first occurrence of every word in the English language, and it's 400,000 words. It's big; this is, like, post Middle English. And so some editors sat down and tried to go through everything anybody ever wrote in English and find the first occurrence of words, and they thought it would take ten years. It didn't; it took a lot longer than that. But the thing that actually allowed them to fix it was one of the first big-data distributed workloads that I remember hearing about. Because what they did was make a list of words that they cared about, and they mailed it to every reader; they put it up in bookshops. And all of those readers would come in and say, oh, I found an interesting version of the word "abacus" in this book, and they'd all mail it back in, and they'd correlate it. And then the editors, with the whole field having worked with high autonomy, could do that last little bit to find the best reference and the book. This would never have been completed if not for that, and this is, to me, a major thing that humans were able to pull off. To bring that back to this whole cloud data
topic: our developers spend a lot of time talking about autonomy in their code, and they do a lot of work on it. There are people who are big proponents of the actor pattern. There are people who like to talk about how everybody else just doesn't see the value of functional programming; I'll do that later if you want me to. We have this idea of shared-nothing architectures, or microservices, and these are all about isolating things out. So all these people can talk to this beautiful lambda, and there's no connection between those lambdas, right? That's perfect autonomy. Until it hits the database. Databases are beautiful; an incredible amount of work went into relational databases, into what it means to relate data, how these things correlate, and how we can keep them perfectly consistent. But it also destroys our isolation. Anybody here familiar with the CAP theorem? Rags is. If you ask database people, they'll tell you they know what the CAP theorem is, and they'll either tell you they like it or they hate it, or why it's stupid or why it's amazing. But the thing about the CAP theorem, and there's a reason I bring this into a discussion from the AWS perspective, into this big distributed-cloud discussion: the CAP theorem is worth reading. It's a couple of pages, it's nothing, and it's all about network clocks. The CAP theorem wasn't about databases; it was about distributed systems. It said: look, if you're going to build a big distributed system, there are three things you might want. You might want perfect consistency: if I get a birthday invite from Tokyo, and somebody from Berlin asks me if I've been invited, there should be no delay between the time I got the invite and my answer that I was invited. That's consistency. Remember I mentioned that hundred-millisecond delay? That's entropy working against consistency.
You don't see it in a single region, but as soon as you start to spread that butter out, you have to deal with it. The other thing that would be ideal is perfect availability, right? We want the system to answer our question every time, because if the database doesn't answer, or, see, I just did it again, if our distributed system doesn't answer, then it's broken, it's down, right? And the last thing that CAP said we have to contend with is this idea of partition tolerance. It turns out that computers break, and the more computers you use in a system, the more you're likely to have to contend with that. So if you have one copy of your data somewhere in a big distributed system, when that partitions off or breaks, you have no means of dealing with that. And CAP was one of these "pick two" theorems, right? You can be consistent and available, you can be consistent and partition tolerant, you can be available and partition tolerant. It's not really true; that's actually one of the criticisms of CAP. It's really about consistent versus available. I say that database people talk about this a whole lot, because a lot of us at the data layer say: my data layer is consistent, the truth exists here, everything is linearizable, I'm ACID-compliant, right here down. And so this is perfect, and let no one impugn the quality of my data. That's not true. Truth is up here; it's coming in from the outside. And those lambdas that we talked about have different execution times, they have retry logic, they have lots of things that change the way they execute. So if you enforce truth down here, you ignore what the truth was up here. And this is one of the reasons why I would argue, oh wait, I forgot my FUD slide about geo-scale. Let me scare you a little bit more. We talked about that latency between regions, right?
So take that consistency promise, if we just enforce it right here where I live. What about the hundred milliseconds it took for somebody over here to send their message? It got to the database later, and they lose. Consistency says they matter less, right, than the person who was here, because they have to go further over the wire, and truth only exists in our database server, so it doesn't matter who was actually first. So the first thing, before we get into a bunch of really boring slides: I just want you to take a minute, if you're thinking about building distributed systems, and step back from your fear, from that reflex toward perfect consistency at the database layer, and start to think about what it would mean to embrace an available, partition-tolerant database: a database that's going to take the reads and take the writes. I'm going to talk to you a little bit about how we would build for that. I've got way too many slides; I'm probably going to blow through them and get to the other good stuff at the end. But a couple of things to think about when you're building this way, if I've convinced you at all. First is the idea that you should prefer events to transactions. If someone increases something a little bit, don't try to update the whole row; capture that the thing was increased. We'll show this a little bit. You should prefer denormalization to joins, because every time you do a join, you have to build a consensus spread across a bunch of nodes. And if you call that from the other side of the world, a lot of times it's got to come back; I mean, you might be lucky and have read replicas and be able to do it on a read. But you start to do that on a write, and you're locking a bunch of resources into a dance
that they have to resolve. Treat the data layer as a journal; reconcile state at read time. If anybody's ever, and this is a dated reference, if anybody's ever kept a checkbook and reconciled it, or built financial software: people in finance already know this. They don't give you money because their database says you have x dollars; they give you money if their database said you had x dollars x amount of time ago, because they understand the latencies in their system. They know that just because the database says it doesn't mean that something couldn't be clearing from two days ago. That leads us to this next point: understand the potential system latencies and how they impact certainty in your logic. I think this is a pretty big deal, because again, we just check the box and say "my database is consistent, so everything's fine." But again, the truth is out there; the check is in the mail. And we need to deal with data as of when it was written, when it entered our system, instead of when we received it, in a lot of cases, particularly when we're doing geo-scaling. And then the last one. Oh, I didn't know I wanted that slide; now I do. You guys have probably heard in the news that a bunch of social media networks have gotten launched, for different people, for different reasons. I can monitor these social media networks and tell you whether or not they're going to crash, and I do it with one simple heuristic: do they have an auto-incrementing key? If they have an auto-incrementing key, they are going to fail. Every time. Because auto-incrementing keys are the worst version of consistency: for every row I write, I have to get consensus across my entire data layer to build an auto-incrementing key. It's a smell, and if you want to scale, that's a good place to start thinking about the point I'm making, why that's dangerous, and how you could deal with it. So, Cassandra. How am I doing? I feel kind of wordy, but we're doing okay.
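The ideas above, prefer events to transactions, treat the data layer as a journal, reconcile at read time, and avoid keys that need consensus, can be sketched together in a few lines of Python. All the names here are hypothetical illustrations, not anything from the talk's slides: instead of updating a row in place, each change is appended as an event with a UUID key (mintable on any node with zero coordination, unlike an auto-incrementing key), and the current state is folded from the journal at read time.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    event_id: str  # a random UUID: no cluster-wide consensus needed to mint it,
                   # unlike an auto-incrementing key
    account: str
    delta: int     # capture the change that happened, not the resulting total

def record(journal, account, delta):
    """Append an event; concurrent writers never fight over a row lock."""
    journal.append(Event(str(uuid.uuid4()), account, delta))

def balance(journal, account):
    """Reconcile state at read time by folding the event journal."""
    return sum(e.delta for e in journal if e.account == account)
```

A real system would also carry a written-at timestamp on each event, so readers can reason about "x dollars, x amount of time ago" the way the finance example does.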
Anybody asleep yet? Anybody have questions about that? Somebody must have wandered over from the Postgres session and be deeply offended by what I just said. Nobody? Cassandra deals with this availability-first design by saying: look, we've got a ring of nodes. The ring is this idea that, say we're in a data center, we have fifteen computers we can use for this problem, and each of them is a node, so we're going to coordinate them into a Cassandra data center. This is going to get horrible too, because "data center" is such an overloaded term, like "cluster" or whatever. But we've got this ring, and we want to write data into it. We'll deal with this a little bit; we're going to look at something. I don't want to join the Hilton Honors network; maybe I do, we'll see what happens with my slides. When data gets written, and this is really all Cassandra is, you know, with a little bit extra: when data gets written, I wonder if I can just sign in for guest Wi-Fi, this will go down here and we'll all be aware of it the whole time. When data is written to a Cassandra data center, that data can actually be written to any of these nodes. It doesn't matter. That node will act as a coordinator, send the write to the (typically three) replicas that own it, and then return to the writer that it worked, based on how durable a promise the writer wants. It doesn't actually change the way it writes, but you can say: I only care if one replica gets it, confirm it then. Which is roughly the equivalent of: I'm going to hand out birthday invitations. Hey, did you hand out all the birthday invitations? Well, I can guarantee two of them got out; the rest is going to happen in the background. That's actually a horrible analogy, but it's going to happen in the background.
It just depends on how much I want to audit that process as a writer. That write goes into replicas, and that write also immediately goes to other data centers. That could be us-east and us-west; that could be on-prem and in the cloud. There's time in that replication, and we don't wait for it. It's a hundred milliseconds. Or, again: understand your system latencies when you're modeling. Don't write here and then immediately read there; give it a hundred milliseconds if you can wait for that to happen. But those writes are replicated. The reads come in and can read from one replica, or often they'll read from quorum, half plus one, so they know they're going to catch anything that was written. And that means any time we read or write, we involve three nodes, and three nodes means nothing ever gets more complicated as we add nodes. I recorded a podcast for AWS, "Containers in the Cloud," anybody know about it? It actually seems pretty good; I don't think enough people watch it. We just ran a 1200-node Cassandra cluster on EKS with near-linear scalability. You add a node, you get the resources; you add a node, you get the resources; you add a node, you get the resources. It turns out that at about 1200 nodes the gossip protocol blows up and you don't get any more resources; bad things happen. But we continued to be able to add nodes, and we weren't getting 10 percent back, we were getting 90 percent back with each node we added. You also get single-digit millisecond latency. Data is shared in real time on that backplane, and any of the nodes can deal with writes or reads.
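The "half plus one" quorum arithmetic behind that is simple enough to sketch in a couple of lines of Python (illustrative only): with a replication factor of three, a quorum is two, and because any two quorums out of three replicas must overlap in at least one node, a quorum read always sees the latest quorum-acknowledged write.

```python
def quorum(replication_factor):
    """Half plus one: the number of replicas a QUORUM read or write involves."""
    return replication_factor // 2 + 1

def quorums_overlap(replication_factor):
    """The consistency guarantee: any write quorum and any read quorum
    must share at least one replica."""
    return 2 * quorum(replication_factor) > replication_factor
```

This is why the cluster can keep growing without reads or writes getting more expensive: each operation still only involves a quorum of the three replicas that own the row, no matter how many nodes are in the ring.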
This is actually kind of profound, because if you're used to working with consistency-oriented databases, they'll tend to have a write leader, and that write leader will throttle any write activity. Again, we talked a little bit about some things you need to do if you do things this way. But Cassandra, because of this, can take these massive write workloads, because it only needs to involve a little bit of the cluster, and it doesn't have to single-point it at that leader for each write that comes in. I'm gonna lose my... all right. So, I alluded to this already: it means availability across the globe, right? We have that replication. We can be hybrid. We can be — don't tell the AWS guys, I can't believe you put Google in your example — we can be multi-cloud. We can be multi-region, which is the most common version of this that you see. And that means that we can keep the data near the input or the consumer. That's why I say databases and caches are gonna get a little fuzzy, particularly as we get further out to the edge, right? But that means high availability and low latency for consumers. I know a company — and it happens to be my company, and this is super embarrassing because we're supposed to be good at this — who siloed the control plane for one of the applications they released on the east coast, and then sold that application in India. And it took a hot minute to realize this was miserable for them. Horrible experience. We should probably have known better, given our background, but, you know, people make mistakes. So I'm gonna go through this kind of quickly. Are there SQL people here? Are there people that build databases here? Does anybody actually design third normal form? And the short version of this is: when I build a relational database, I describe the data and I model the data. Was that like a half hand?
Okay. When I build relational, I'm building this sort of perfect notional form of the data — this platonic ideal of how the pieces of the data come together. And when I'm doing Cassandra query modeling, I'm doing some of that, but the other thing that I'm doing is I'm actually looking at how people access the data, and trying to reduce the number of replicas that people have to talk to when they talk to the data. I really feel like I've got a bunch of good slides here — that I'm happy to make available — that talk about what this looks like in a data model, but I really feel like I'm gonna hear some snoring if I go heavily into it. I'll stop on this one, though. When you're modeling data, Cassandra has this idea of a primary key that speaks to the uniqueness of a row. It has this idea of a partition key that tells us what node that row lives on — right, because the big thing we need to do is traffic-route: if I get a read, who do I ask? If I get a write, where do I put it? It has the ability to control how large a partition is. This is probably worth calling out. Someday I need to write a book describing all the different NoSQL databases as, like, office politics, so you can actually get an idea of how all these play together — because this is all pretty esoteric if you haven't gotten into it. But there are key-value stores — anybody know a key-value store? No, no, no — this is interactive now. What's a key-value store? Can you throw one out? Redis — awesome key-value store. There are key-document stores — anybody know a key-document store? What's that? Mongo — yeah, the king of the key-document stores. And then — what's that?
And then there are key-columnar stores, and Cassandra is one of the key-columnar ones — I think based on the Bigtable paper, and the Dynamo paper that came out of Amazon. Key-columnar says: I want a pointer to a block of data, because I might want to be able to aggregate on that block. Key-document says: I want a pointer to a sort of nondescript bunch of data that's not just one thing. And key-value says: if I get a key, I want the cached value or whatever back. Like, they all have different uses. But in Cassandra, understanding that partition keys help decide the size of the partition block is a big deal. High level: when we're designing for this sort of availability-first, we're looking at the data requirements, we're identifying entities and relationships, but critically, we're identifying access patterns in the data. How will users query it? Because we want autonomy in these queries. We want to talk to as few things as we possibly can, because that's how we get speed back and that's how we scale geographically. This is codified into some modeling principles: know your data, know your queries, nest your data, and — most importantly — duplicate your data. Gosh. I'm so lucky nobody here is a big database head, because they're going to hate the duplicate-your-data stuff. It's what we do in caches all the time, right? Caching is duplicating data. And some days, if you do this on Cassandra, you don't actually have to put it in Mongo and then use Redis as a cache. So we bring that together into a logical data model that results in a physical data model.
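A minimal sketch of what "know your queries" and "duplicate your data" buy you: the same rows are written into two query-shaped tables, each partitioned for one access pattern, so each query reads exactly one partition. Table names, column names, and the clustering order here are invented for illustration — real Cassandra does this with two CQL tables (or a materialized view), not Python dicts — but the mechanics are the same.

```python
from collections import defaultdict

# The same events, duplicated into two query-shaped "tables".
events = [
    {"venue": "Moscone", "year": 2019, "artifact": "badge",  "id": 1},
    {"venue": "Moscone", "year": 2019, "artifact": "tshirt", "id": 2},
    {"venue": "Javits",  "year": 2019, "artifact": "badge",  "id": 3},
]

artifacts_by_venue = defaultdict(list)  # Q1: partition key = venue
artifacts_by_year = defaultdict(list)   # Q2: partition key = year

for e in events:                        # one logical write, two physical writes
    artifacts_by_venue[e["venue"]].append(e)
    artifacts_by_year[e["year"]].append(e)

# Clustering columns decide on-disk order inside a partition:
# here year DESC, then id ASC so rows stay unique and don't overwrite.
for partition in artifacts_by_venue.values():
    partition.sort(key=lambda e: (-e["year"], e["id"]))

# Each query touches exactly one partition -- one replica set, no join.
print([e["artifact"] for e in artifacts_by_venue["Moscone"]])  # -> ['badge', 'tshirt']
```

The cost is that every write lands twice — which is exactly the "duplicate your data" trade the big-database heads hate, and exactly what a cache in front of a normalized store was already doing for you.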
This is way too deep for what we're doing. Here's a logical model of things coming together: the idea of a venue, and a digital artifact. This is actually kind of a fun process one of our guys put together, where we start with the notion of what artifacts-by-venue is, and then we sort of work in the partition key based on the queries we're getting, and then we want to order how it goes to disk with clustering columns, and then we decide that that order should be the other direction. And then, lastly, we use another clustering column to get down to the idea of uniqueness in a row, so we don't actually overwrite every venue name in a year — because that would be pretty useless. This is that same process again, with a conceptual model around IoT. I hope this is actually making some sense, because I'm going through it way too fast, but I do want to stop here, because again: we're in this world where we came up with a conceptual model, but then we had to look and think about how we actually query and interact with the data. Which is the big lesson I want to give you, and I think I made that point a couple of minutes ago: when we're going for autonomy, we need to build the tables in such a way that we don't have to do things like joins to get data back, because again, at scale, we start to jackknife as that happens. We've got great examples of data modeling. Again, you know, last session of the day, so I realize this is a bit of a tough sell for people staying awake, so I'm blowing through. But DataStax Academy, too, is a site that we have where we do a ton of work on modeling and dealing with this. If you are going to go availability-first in your application design — and I think you should think about it if you have to go to geo scale — take the time to go through this, understand the difference. There was a process that I watched over and over again that was a little bit heartbreaking, where people would realize the power of Apache Cassandra, realize that they needed it,
deploy it using what they learned from Oracle or Postgres or whatever, and then rebuild it eight months later, because they hurt themselves very badly on it — because they didn't take the time to understand, really fundamentally, why they were doing it. I'm going to skip through this; I want to talk about the fun stuff. But I will pause on the key takeaways: know your data, know your queries, nest data, duplicate data. A query should be satisfied from a single partition if at all possible. Primary keys give us data uniqueness; they control how our data is distributed, and they control how data is queryable. There's a dark side here, that you mentioned earlier, which is: it can be very hard to experiment with Cassandra data, because you're modeling everything off specific queries, and as soon as you need to do analytics or get a different view of the data, it can be really tricky. That's a thing we can talk about in another conversation. There are things like change data capture in modern Cassandra databases that make it really easy to push into stuff like Elasticsearch, or drive your data elsewhere. I'm a big poly-data person, where I want a search index and I want a time-series index — just like the Influx folks were talking about before — depending on the query. It's not harder to build this way, but you really do have to — you know, Bane breaks your back, and you go up to the Himalayas for a week and learn how to be Batman all over again. So it really was a couple of weeks — is that reference too dated now? Okay. I'm going to jump to the good stuff. So let's talk about repeatability, because here I can give you another anecdote. Does anybody know about Eli Whitney and the cotton gin? Does anybody know about a guy named Brunel? This is weird. In the history of interchangeable parts... every time I touch Kubernetes,
I think about Brunel. Maybe because it's a nautical analogy. But in, like, 1830, there was a need in one of the British shipyards to produce a bunch of blocks — like 30,000 of these blocks. And up until that point it was bespoke and artisanal: there were people that would sit down and whittle these things out and build them. Which was great, because when they broke you could just throw them away — and it took forever to do. And Brunel was a machinist who actually designed some machines to build these blocks. And this is important in the history of interchangeable parts because — there's a dark side to the labor reduction that happens in this quote — but not only was Brunel's team able to complete this, they were able to make it better, because of "the uniformity of the product, the ease and the celerity" with which they were able to do it. It so blew out the old method. But the other thing that happened, too: the parts of the blocks that Brunel made — when you're on a ship in the ocean, things break — they were interchangeable. You could rebuild them. You understood there were no surprises in the way these blocks were put together, and that gave a lot more ability, even to the sailors on the ships, to adapt to problems in a way they hadn't been able to before. They didn't have to whittle a new block. I alluded to this when we were talking earlier, because I was going to go deeper into it: that's Kubernetes. If you are not paying attention, pay attention — Kubernetes is doing that now, if you're building big distributed systems. We talked about autonomy. We're talking about repeatability now.
This is where it happens. This is the ad slide instead of the science slide: we'll do this for you, if you want to play around with it. We've got this awesome implementation of Cassandra on Kubernetes called Astra DB, and when you find out how expensive it is to run Cassandra when you just want to play with it — because we really tell you to run, like, six beefy servers, because you should for real workloads — go play with Astra until you're ready to actually run it on six machines, because you have a workload that can't go down. Kubernetes leads me into something that we have been doing pretty recently. I'll go through this and then we'll pretty much be wrapped up, but this is the repeatability part of this presentation. We have doubled down on the idea of running Cassandra on Kubernetes. A lot of people are afraid of stateful workloads on Kubernetes. Don't be — lots of awesome things are going on around this. But a lot of the headache around putting something like this on Kubernetes is resolved when somebody gets together and writes a really nice Kubernetes operator for you. And that Kubernetes operator knows how to take a CRD that you upload to it and turn that into something useful, expressed in pods, on that Kubernetes cluster. That's K8ssandra. It's the K8ssandra operator and cass-operator that drive Cassandra. K8ssandra is production-ready. It's cloud-native. It's Apache Cassandra. It runs on Kubernetes, and it's also multi-cluster — which, just for the Kubernetes nerds, is actually super cool, because Kubernetes is not multi-region by definition.
It's not — that's not what it's designed to do, and it shouldn't do that. Other products have been put into that space. The K8ssandra operator, if you give it the keys, will actually build you multi-Kubernetes-cluster Cassandra clusters to span the globe — us-east, us-west, Europe, wherever. One K8ssandra operator will rule them all. The other thing that happens a lot when people want to go do a Kubernetes deployment is they just get the primary thing running on Kubernetes, and they forget that you have to come back to work on Monday and it's got to continue to do things. K8ssandra doesn't just come with Cassandra. It comes with Prometheus and Grafana for monitoring — observability; it's very important, there's a track, pay attention to it. It comes with Medusa, which manages backups — because, you know, backup and restore, that's something people like to do occasionally with data. It comes with Reaper, which is an anti-entropy mechanism for distributed data in Cassandra — I feel like that's either a long discussion or just "trust me," and we'll stay with "trust me" for right now. It also comes with this awesome project called Stargate. We have this whole deck of slides that I blew through that were all in, like, this SQL dialect — Stargate actually allows you to talk to Cassandra in GraphQL, REST, gRPC. And this is all in open source. Like — I mean, you know, come set up an account on Astra, or look at our DataStax Enterprise — but before you do that, run it in open source. And don't bother to learn CQL — unless you're... where's the C++ guy? I was thinking about him when I was thinking about this — unless you're the C++ guy and you just gotta get that speed. K8ssandra relies on the idea of a control plane and data planes. Really, all I'm telling you here is: remember how I told you K8ssandra can run — do I need to go back so you can see the command?
Okay, it is a Helm chart. Typically what I'd recommend is distributing the K8ssandra operator with a Helm chart, and then, when we talk about creating Cassandra clusters out of that K8ssandra operator, it's just a CRD that you're applying. It's just a kubectl apply. I'd really want to see that as, like, GitOps or something, but... So we create a control plane, and that control plane is responsible for creating clusters. It's a helm install of the K8ssandra operator — there's not much to it, assuming you have a Kubernetes cluster, like EKS, if you've heard of it. I can vouch for running lots and lots of Cassandra nodes on EKS; in fact, our Astra database-as-a-service also runs on EKS. Once we have that controller in, we can materialize a couple of data planes. If you're playing with this, you don't need to do all this stuff if you're just going to do one — this is the big, complicated example where you're running in multiple regions. You can basically skip this step until you want to go big; you can even skip this step when you first play. But we do actually have to set up some keys, and some consensus between the two, when you're going big. And then, what I really want to show is this. We saw this earlier with your presentation: it's just a YAML description of a Kubernetes resource. It's just a message to the operator that says: hey, man, I want a Cassandra data center — and oh, by the way, can you actually split it across availability zones, and tell Cassandra about that, so it replicates the data in a smart way? And I want eight nodes, and I want to use standard storage — or I want to use something else, because I don't want to do it through EBS; we've done it through EBS, it's fine. I want to configure the heap size.
You don't need to do any of that. I want to pass specific stuff down into the Cassandra config — you don't need to do that either, because you don't have to think about the Cassandra config all that much anymore. But when we have that in place, we just deploy. When I did that thousand-node test, I actually structured it as: I'm going to do 15 nodes, and then I'm going to grow that to 50 nodes, and then I'm going to grow that to 150 nodes. And the process of growing 15 to 50 nodes — anybody want to guess what it takes to requisition 35 additional Cassandra nodes from K8ssandra? I updated the YAML and applied it. And then Cassandra likes to wait three minutes between each node, so I watched a lot of YouTube videos, and then I came back later, and I had it. And I kept doing that until I had 800 nodes. And then, when I had 800 nodes, I realized that I actually needed to pay attention to the subnet size — and I hadn't, and I'd run out of IP addresses, because of the way EKS uses IP addresses. And then — this is really fun; this is either the worst thing you've ever heard or the best — it was Friday night, about seven o'clock.
I didn't want to bother the rest of the team. For this example I had an 800-node Cassandra cluster — I was getting near petabytes of data with the data density I was dealing with — and I didn't want to ruin anybody else's weekend. So I did a helm uninstall of K8ssandra, and I had autoscalers set up in EKS. Five minutes later, my 800 nodes of m5.4xlarge — like, these cost more than a dollar an hour — five minutes later, they were gone. And the idea of what that would have looked like ten years ago, when I would have had to start calling contractors to come pull stuff out of the data center... the flexibility of being able to do that is profound. And I knew I could recreate it, too. And when I recreated it, there wouldn't be any config drift; I wouldn't have to deal with one node being misconfigured because I had three teams working on it. And it would take me a thousand times three minutes to do it, because of the way Cassandra likes to bootstrap. If you're curious about deploying a multi-region cluster, I feel like Rags might be talking about that tomorrow — are you talking about multi-region clusters tomorrow? Multi-cloud — even better. And maybe a little call-out, too: even if you're never going to use Cassandra, you should take a look at this project. This is actually a really cool implementation on Kubernetes, and understanding what the engineering team did here is worthwhile — and this is all open source. We like to pretend like all you have to do is helm install K8ssandra. It's a little bit more complicated — there's three commands, really. Oh, and it's an apply. But it's so shocking compared to the days that I spent with — if you've ever gotten, like, a multi-terminal, where you could open a terminal and then echo the command to 15 different servers? Have you done that with 500 servers? Every day that I went to a client to look at what was wrong with their 100-node Cassandra cluster, the first thing I had to do was use a script —
I didn't write it anymore — that went and compared all the configs on all the servers, because that was inevitably going to be the problem. It's declarative with Kubernetes. So, if we go back to those three big tenets: autonomy — look at availability first; don't be afraid to let go of consistency, because consistency, like the cake, is a lie, right? And repeatability — look at Kubernetes; it's time. And observability — go see some sessions tomorrow. That's it. That's all I got. I don't have any socks, but I'll answer questions. "So when you're multi-region, you have to do the work to allow the pods to communicate with each other?" What's that? I've seen it done a couple of different ways — and Rags can actually probably talk to that a little better than I can — but you do have to have each IP of each pod be addressable from the other Kubernetes cluster. Yeah — whether you're paying for a fiber connection through somebody, or (don't do this) putting it on the public internet and letting them talk, or setting up some sort of VPN and iptables. Every node needs to be able to talk. One of the nice things about EKS is that it assigns, like, first-class — I don't know what the word for this is — IP addresses through that CNI for every pod, and it makes it a lot easier to do that connectivity. Rags, if you're curious — I bet he'll talk about it tomorrow, too; he's done a ton with it. Yeah. The cool side of that is we've seen multi-cloud, but we've also seen cloud-to-cloud migrations with zero downtime for the application — just none — because of the stuff you saw with replication in Cassandra. Anything else? I didn't hear any snoring, which is a good sign... nothing? Cool, I explained everything; you guys are all high-availability experts. This is just some of the stuff I've seen as I've worked with these applications.
I hope it's useful to you. And it definitely is through the lens of Cassandra, because that's kind of my chosen home right now, but I do think it's worth considering, and I do think it's gonna matter more and more as we move from silos to fabrics for our apps and our data. All right, cool — don't clap again when I say thanks; thanks, already got that. And I think that's the end of our day. "But the reference, to me, really, is Civilization — because in the Civilization video game it's 'replaceable parts,' I think it was, the manufacturing of parts." Yep — Henry Ford. Yep. And we're talking about Ford, and we're talking about Eli Whitney. But like, when I look at the history of it, Brunel doesn't make just one thing. "...except new hardware, things like that. We're not really growing. The question is, how do I put this onto Kubernetes, instead of going back to the same setup?" It depends on what you're using in DataStax — also, our Kubernetes stuff is in one frame. Are you using Search or Analytics, or just pure Cassandra? I don't think so. I believe we should be. Hit me up — there should be contact information, but it's matt.overstreet@datastax.com. There's a little bit of a bridge between open source and DSE, but I think with Kubernetes — I think K8ssandra can run the DSE version, and there's just a little bit of a headache if you're moving out of it.
Also, let me know, because we're trying to build an argument that, basically, in a future update, we need to go put Kubernetes underneath all the Cassandra installs in the world — Kubernetes underneath them as a control point, even if we don't advertise it to the customers. You're going to give me these hardware boxes? I'll put Kubernetes on them and manage it. Literally, you upgrade — I don't know if it's going to happen in DSE 7, but literally you upgrade to DSE 7, and part of the upgrade is — oh, I can't think of the name of it — kubelet, or something along those lines, that just sort of lives underneath them as the executor, and we can manage it that way. Because then some of that code, Kubernetes is going to run for us — like, why reinvent the wheel on that? But hit me up, and we'll talk about it a little bit, and I can give you some warning on it too. All right, thank you. "Can I add you on — where are you going to be?" Yeah, it's just Matt Overstreet.