Perfect, so I'm going to go ahead and get started. We're going to be going over the time series tech stack for the IoT edge. Really quick: my name is Zoe Steinkamp, I'm a developer advocate for InfluxData. Before I became a developer advocate, I was a front-end software engineer, mainly working in React. Some interests outside of coding and technology would be traveling, cycling, cooking, and gardening. You can ask me about any of these and I'd probably talk your ear off. So first things first, we're going to do a bit of a dive into the InfluxDB platform, mainly because we need to explain how this works with the edge. Oh, there's someone washing the window, sorry; at first I thought a pigeon hit it.

So what is time series data? Time series data is any data that is timestamped in some way, whether that be every hour, every minute, or every second. More specifically, time series data comes from many sources. It can come from a physical space, like devices and sensors on a factory floor, in a field, on a vehicle, or even in space. It can come from software, including VMs, containers, microservices, applications, storage and compute resources, et cetera. And it can come from actual applications, be that crypto, financial and stock markets, agriculture, energy; it's very broad. Increasingly, it is the combination of multiple data sources that brings intelligence to these applications.

When we describe InfluxDB, we normally describe it as a platform. Most people know us for our database, but one thing to know is that we also have our API and tool set for real-time applications, our time series engine for real-time data workloads, and finally our community and ecosystem. When it comes to our platform, we normally break it out into the three layers you can see here. The first is the data sources: the actual devices that transmit time series data, as I just mentioned a few of before.
Then we have the core InfluxDB platform. That includes Telegraf, our open source ingest agent, which I'll talk more about in a second; our client libraries, which allow you to write in your preferred coding language to get your data into InfluxDB; and native ecosystems as well. From there, you have the ability to start collecting, transforming, down-sampling, and alerting off this data, and that can go onward to feed application workflows, infrastructure insights, and actual IoT actions.

As I just mentioned, we have Telegraf. It has over 300 plugins available and it's completely open source. We normally like to say that we're the caretakers of Telegraf; in fact, it's used by quite a few of our quote-unquote competitors. Pretty much every single plugin is taken care of by its original owner, whether that be a company, a platform, et cetera. There are a few in here that we take care of ourselves, a few more fun ones like Minecraft or CS:GO; those were built by our engineers in-house, who just really wanted to monitor their gaming. But for the most part, the plugins are maintained by their owners so they stay updated.

Next, Flux. When it comes to actually querying your data in our platform, we have Flux, which you can think of like this: Flux is to InfluxDB what SQL is to MySQL. It's a language for doing the work you need to do on your data. Flux allows us to work more efficiently, specifically on time series data. It runs at the database level as well, so it doesn't require any third-party or client application tools to actually query the data. These are just a few examples of some of the things you can do. The ones I normally highlight are things like custom aggregation, which lets you change the window of time you're viewing the data over. You can also define custom functions yourself that run over tables.
Some of you might recognize a few of these if you've done SQL development: the joins, the map, and the pivot. And finally, there are more advanced features like covariance and smoothing, which are a bit more useful in data science use cases. This is what I mentioned before, the client libraries: these are the languages we happen to support, so you can upload your data in the language you feel comfortable coding in.

Tasks. Tasks are actually pretty important, so I'm going to quickly go into what they are, because they're a big core part of edge data replication, or at least they used to be. Tasks mainly allow you to clean up your data, as we like to call it. That means maybe you want to drop, say, certain columns, you want to drop information, or you want to aggregate that information. Specifically, when it comes to aggregating, we normally call it down-sampling when you archive historical data. With time series, and IoT devices especially, you'll find that devices send a lot of data, down to the millisecond, and that gets quite large over the course of an hour, heaven forbid a day or multiple days. So what a lot of people do is down-sample their historical data for storage. They'll say: I don't need every microsecond of detail, I just need to know roughly what this value was each hour, and they'll store that. Maybe a week later, they'll down-sample it even further into a smaller data set for cheaper storage. And sometimes they just drop it all: drop everything that's more than 24 hours old, get rid of it. That's also something that down-sampling and tasks can do.

Next, checks and alerts. Checks and alerts are exactly what the name says; this is the UI for it, a GUI that makes it easier, but basically you can check on things. Specifically, you can do a threshold check or a deadman check. Deadman is as morbid as it sounds.
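As a quick aside before finishing checks and alerts: the task-based down-sampling and retention behavior just described can be sketched in plain Python. This is only an illustration of the idea, not the Flux task itself; the function names and the hourly/24-hour settings are made up for the example.

```python
from datetime import datetime, timedelta, timezone

def downsample(points, window):
    """Aggregate (timestamp, value) points into per-window means.

    `points` is a list of (datetime, float) tuples; `window` is a timedelta.
    Returns {window_start: mean_value}, mirroring a task that keeps one
    value per hour instead of every raw reading.
    """
    buckets = {}
    for ts, value in points:
        # Align each timestamp down to the start of its window.
        start = datetime.fromtimestamp(
            (ts.timestamp() // window.total_seconds()) * window.total_seconds(),
            tz=timezone.utc,
        )
        buckets.setdefault(start, []).append(value)
    return {start: sum(vs) / len(vs) for start, vs in buckets.items()}

def apply_retention(points, now, max_age):
    """Drop points older than `max_age`, e.g. everything over 24 hours old."""
    return [(ts, v) for ts, v in points if now - ts <= max_age]
```

A task would run logic like this on a schedule, writing the aggregated output to a separate bucket.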
It means that your data source is dead; it's literally just monitoring to make sure data keeps coming in. Thresholds are more what you'd be familiar with: if your IoT sensor goes above or below a value threshold, please let me know. You set the level to warning, critical, or maybe just info, if you only want to know what's happening. From there you pick an endpoint. We offer PagerDuty and Slack, as well as HTTP, which pretty much opens the door to any type of application. And from there you set a notification rule, which basically says: please alert me if it goes from info to warning, or from warning to critical. It's pretty straightforward overall.

This next slide is that exact same use case, but in Flux. I just wanted to show that you can also do this outside the UI. One thing to note is that this gives you a lot more functionality, because, as I've mentioned before, you can use Flux to clean up the data and do some other analyzing and manipulating, et cetera. We also have a lot more endpoints when you do it this way.

Then there's dashboard creation, which is visualization. For a lot of you it will probably remind you of Grafana, and a lot of people do actually use InfluxDB with Grafana; this just happens to be our in-house visualization. These dashboards will probably remind some of you of the dashboards you use when monitoring a server. It's a very similar thing, with a few differences. We have a few out-of-the-box dashboards, but for the most part they're custom built. A lot of people like that, especially if they're monitoring things that require custom dashboards. We also have what we call notebooks, which are more interactive. A dashboard is something you'd visit every day when you want to see the same answer to the same question.
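Before moving on, the threshold check, deadman check, and notification rule just described can be sketched in a few lines of Python. This is a conceptual illustration, not the platform's implementation; the thresholds and function names are invented for the example.

```python
def threshold_status(value, warn=30.0, crit=40.0):
    """Threshold check: map a reading to a status level."""
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"

def deadman(last_seen, now, timeout):
    """Deadman check: fire if no data has arrived within `timeout` seconds."""
    return (now - last_seen) > timeout

def should_notify(prev_status, new_status):
    """Notification rule: alert only on escalations such as ok -> warning
    or warning -> critical, not on every repeated reading."""
    order = {"ok": 0, "info": 0, "warning": 1, "critical": 2}
    return order[new_status] > order[prev_status]
```

The endpoint (PagerDuty, Slack, HTTP) is then just where the notification goes once `should_notify` returns true.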
You can think of notebooks as playing with the data: you want to answer different questions and work through them as a workflow. They're also a lot better if you have multiple coworkers collaborating. These are just some of the basic visualization types; I'm not going to go super in-depth into them, they're pretty straightforward. And here are some more advanced visualization types that we offer, things like maps and histograms.

So now that we have a brief overview of the platform, let's go a little deeper into the actual tech stack for getting edge data into the cloud. Really quick, some advantages of edge-to-cloud replication. The biggest edge advantages are, one, reducing the bandwidth cost of sending high-fidelity data to the cloud, and two, network resilience against intermittent failures in connectivity. Imagine you're in a remote region, or things only get uploaded every couple of hours; that's pretty much what we're talking about here. This hybrid solution also provides the flexibility to move some of the more mundane tasks, such as down-sampling, to the edge. As I just mentioned, this can be great if you don't want all of your raw data stored and you already know you want to start cleaning it up: you can do it at the edge instead of in the cloud, and it can save you time and money and just make things a little easier.

Here are some stacks you would find in the cloud. Some of these are things like FogLAMP and EdgeX Foundry, which are IoT edge frameworks built to help control and run devices as well as standardize APIs and protocols. They both have APIs for retrieving data and sending it northbound to a system like InfluxDB. Ditto creates digital twins in the cloud; you can use InfluxDB for data replay with it, and it also supports running data simulations on a virtual machine.
For outbound, we have a few options which I've already mentioned: you can use Grafana, or Python and pandas, and we actually have a client library for that one in particular. And then there's a very straightforward one, Node.js, React, and Giraffe, which is something you might see in a phone app or a website.

When it comes to stacks on the edge, some of the stacks include the TIG stack, which is what we call Telegraf, InfluxDB, and Grafana; they're used together so often that they actually have their own little name. There's also Tiguitto, a Docker Compose script that sets up Telegraf and an MQTT broker to handle that common setup; it builds on top of the TIG stack, which is why it has a similar, in my opinion kind of funny, name. Then there are the client libraries, which I mentioned earlier; those let you code it up in the language of your choice, most commonly Python in our case. Then EdgeX plus InfluxDB plus Grafana. These next three, EdgeX, AWS Greengrass, and Azure IoT, are all frameworks to get data from edge sensors up to the cloud, and they all use InfluxDB on their edge gateways to provide a local dashboard. I'll be going more in-depth on the final option, though, which is specifically InfluxDB open source to the cloud.

I realized I actually hadn't mentioned this, so just for reference: we are completely open source, and you can do everything on the open source version. We also have a cloud option. It's basically pay as you go, and it's free when you start and as long as you keep yourself below a certain usage level. That's the one we're mainly going to go through. So this is a very straightforward example of OSS to cloud.
I do want to clarify that, as of about a week ago, this is considered the old way of doing edge-to-cloud replication on our platform, but it's the approach you'll see a lot of people using, so I'm going to go over it really quick. The big thing is that you're going to need a cloud account. After you sign up, you basically just create a bucket, create a token, and make sure the token has write permissions. From there, you scope your destination bucket in a task that basically says: hey, take this data from here and send it on up. As you can see here, we also happen to be filtering: we've got a range of 10 minutes, we want these two pieces of data, and we just want to push that up to the AWS cloud. In the example we've just put random gibberish for the org ID, token, et cetera.

But again, this is the old way people used to do it, and there are some limitations. It does offer an easy way to write data from the edge to the cloud, but the big thing is that it's sent over HTTP. There's no built-in functionality for failure handling, which is not great, I can assure you, and there's also no built-in functionality for batching, retries, or parallelization. The to() function really should only be used to consolidate data from OSS to cloud if you meet these conditions: you intend to down-sample your data first before writing it, to limit the size of the request; and you have a very small number of devices, maybe only one or two sensors. You can't really use this for large workflows. It's not going to work; it will probably cause a network failure, which is something we saw quite often when people did this. And you shouldn't be trying to overcome writing a large amount of data with micro-batching, generating a really high request count. Kind of a similar storyline there.
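The core weakness of that old approach can be sketched in Python: every point is pushed upstream the moment it's queried, and anything sent while the network is down is simply gone. This is a conceptual sketch, not the Flux to() implementation; the `send` callable stands in for the HTTP write to the cloud.

```python
def naive_forward(points, send):
    """Forward each point upstream immediately, with no queue and no retry.

    `send` stands in for the HTTP write to the cloud. Any point sent while
    the network is down is lost: nothing buffers or retries it.
    """
    delivered, lost = [], []
    for point in points:
        try:
            send(point)
            delivered.append(point)
        except ConnectionError:
            lost.append(point)  # gone for good
    return delivered, lost
```

With an intermittent connection, every outage window translates directly into missing data downstream, which is exactly the failure mode the new replication feature addresses.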
But as I said before, as of this month, slash this week, we are now excited to announce the availability of Edge Data Replication, which we've been working on since, I swear, during COVID. This solution unifies time series data processing between edge and cloud environments, so developers can provide consolidated intelligence across widely distributed environments. So let me go ahead and show some of this.

Specifically, it uses durable, disk-backed queues and buffering to withstand planned and unplanned disruptions in network connectivity. In addition, this feature configures replication at the bucket level, so developers and operators can precisely define which sets of data to replicate and where to send them. Replication also happens on write: when time series data arrives at the edge and matches a replication rule, OSS immediately mirrors the data to the remote bucket defined in the configuration. When InfluxDB OSS cannot send the data to the remote instance, all events start to buffer in the local durable queue until connectivity is restored. The size and retention of the durable queue are configurable. Sorry guys, I've got a lot of words that kind of jumble.

These are just some industry use cases we were talking about when we originally started developing this months ago, and I just wanted to put them up here. As you can see, if you're dealing with something like retail, this would be in-store loss prevention devices, which can have intermittent internet connectivity. I can tell you right now, my husband used to work for Chipotle; they always seemed to lose internet just randomly, and then their cash registers didn't work. Not great. Manufacturing is pretty straightforward: a lot of the time manufacturing happens in fairly remote regions that aren't necessarily super well connected. And then banks as well.
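The buffer-and-replay behavior just described can be sketched in Python. This is only a conceptual model under stated assumptions: the class name is invented, the queue here lives in memory with a simple size cap, whereas the real feature persists the queue to disk with configurable size and retention.

```python
from collections import deque

class ReplicationQueue:
    """Sketch of replicate-on-write with a bounded buffer.

    Each write is mirrored upstream immediately; if the upstream `send`
    callable raises, the point is queued and replayed in order once
    connectivity returns.
    """

    def __init__(self, send, max_size=1000):
        self.send = send
        self.queue = deque(maxlen=max_size)  # oldest entries drop when full

    def write(self, point):
        self.flush()           # retry anything buffered first, in order
        try:
            self.send(point)
        except ConnectionError:
            self.queue.append(point)

    def flush(self):
        while self.queue:
            try:
                self.send(self.queue[0])
            except ConnectionError:
                return  # still offline; keep buffering
            self.queue.popleft()
```

The key contrast with the old to()-based approach is that an outage fills the queue instead of dropping data, and the backlog drains automatically on the next successful write.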
That's going to be things like ATMs and customer mobile apps, which obviously come in and out of connectivity, et cetera. Tangibly, though, to get into the in-depth features of this and how it's going to work: we have two new API endpoints, remotes and replications, and two new CLI commands, happily also named remote and replication. Each replicated bucket also, as I've already said, gets that disk-backed queue for buffering data safely in case of disruptions.

This is just a beautiful little design I built up. As you can see, Telegraf puts the data into InfluxDB OSS. On disk we have our buckets and our queue, and we also have Flux. Flux is kind of optional; that's if you're down-sampling in particular, which I'll go into. And then from there, it uploads into InfluxDB Cloud. I'll show the commands for the CLI. I'm not going to go as in-depth on the API, but it's pretty much the same; it does the same things, it's pretty straightforward.

So, as I just mentioned, let's go ahead and get started on the setup. This is influx remote create. With this, I'm going to clarify the assumption that you have an InfluxDB OSS instance running and a cloud account; that's the use case we're going into. In your local InfluxDB OSS instance, you use the influx remote create command to create a remote connection to replicate your data to. You have to provide the following: a remote name, the InfluxDB Cloud URL, and an API token. The token has to have write access; it can't actually write anything if you only give it read access, which does you no good. And obviously your organization.

Next is replication create, which creates the actual replication stream. With this one, you just give it a name, your remote connection ID, and your local bucket ID.
That's the bucket on your OSS instance that it writes from. And then your remote bucket ID, the one the replication will write into. Once the replication stream is created, InfluxDB OSS will replicate all writes to your specified bucket up to the one in your cloud account, the one we call the remote InfluxDB bucket. You can also use the influx replication list command to view information such as your current queue size, your max queue size, and your latest status code.

Some important things to note here. Only write operations are replicated; other data operations, like deletes or restores, are not. In InfluxDB OSS, large write request bodies are written in their entirety. When replicated, write requests are sent to the remote bucket in batches, with a current maximum batch size of 500 kilobytes. This can result in scenarios where some batches succeed and others fail. We do actually have a task option for that, which basically says: hey, if this last thing failed, go back to the last failure. I'm not going to go in-depth into it, but I want you to know that we do have some failure handling already built in.

In some cases, as I've mentioned before, you might not want to write that raw, high-precision data to a remote InfluxDB or cloud instance. To replicate down-sampled or processed data instead, you create a bucket on the OSS instance to store the down-sampled or processed data, then create the InfluxDB task that down-samples or processes the raw data and stores it into that new bucket. So on the OSS instance, the original bucket is getting the raw data, and you also have another bucket, called downsampled or whatever you want to name it. That's where the task runs a query like this one, which aggregates the data down from every second to every 10 minutes.
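As an aside on the batching just mentioned: splitting a stream of line-protocol writes into size-capped batches can be sketched in Python. This is an illustration of the idea only, not InfluxDB's batching code; the 500 KB constant comes from the talk, everything else is invented for the example.

```python
MAX_BATCH_BYTES = 500 * 1024  # the current replication batch cap

def batch_lines(lines, max_bytes=MAX_BATCH_BYTES):
    """Split line-protocol lines into batches no larger than `max_bytes`.

    Each batch is sent as one request, so one batch can fail while the
    others succeed -- the partial-failure scenario mentioned above.
    """
    batches, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + line_size > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        batches.append(current)
    return batches
```

Because each batch is an independent request, failure handling has to track which batch failed, which is why the replication feature records the latest status code per stream.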
And then from there, you replicate that down-sampled bucket. In theory, you could still create a replication stream for the original data too, but the whole point of this is to make it more streamlined and easier. You can see how these few lines of code make what used to be quite difficult a lot easier. It's a lot less writing, it's a lot more failure-proof, and it's not going to be as difficult for you to deal with intermittent connectivity issues. Basically, this helps solve a lot of the big problems we deal with in IoT edge-to-cloud.

So I'm going to quickly go over an IoT edge example. One sec. This is the architecture diagram for what is currently our Plant Buddy demo, sitting back at our booth. We have a project where we want to monitor the health of, in this example, a household plant. And specifically, we're monitoring this data at the edge because when we go to conferences, we tend to lose wifi connectivity, and then we lose the data for our plant. So we needed edge-to-cloud replication. We also happen to down-sample our data before we send it into our cloud account.

To break this down: we have our plant, and on it a microcontroller with about four sensors. From there, we use Telegraf to get the data into our OSS instance, and we're actually running a Flux down-sampling task on the moisture check. We bought these really cheap sensors off Amazon, and they give us a bunch of gibberish data, so we like to clean them up a bit. Also, we just don't want to store it down to the second. I'm sorry, but a household plant just doesn't require that. So we tend to aggregate it down to more like a minute, or every 10 minutes.
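The "cleaning up gibberish sensors" step can be sketched with a sliding median filter in Python. This is one plausible cleanup technique, not necessarily the one Plant Buddy's Flux task uses; the window size and values are made up for the example.

```python
import statistics

def median_filter(readings, window=3):
    """Smooth noisy sensor readings with a sliding median.

    Cheap moisture sensors occasionally emit wild one-off values; a median
    over a small trailing window discards those spikes while still
    tracking real changes in the signal.
    """
    cleaned = []
    for i in range(len(readings)):
        lo = max(0, i - window + 1)
        cleaned.append(statistics.median(readings[lo:i + 1]))
    return cleaned
```

Running a filter like this at the edge, before down-sampling and replication, means the glitches never consume cloud storage or trigger false moisture alerts.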
Luckily, as an employee, I basically have an unlimited free account, but I think even I'm reaching my limits at this point. From there, we put the data into a down-sampled bucket on that OSS edge instance, and it sits in the replication queue whenever we lose connectivity, as we tend to here at the conference, or just in general when I've disconnected my laptop from the device. From there, we send it on up to InfluxDB Cloud, and we actually have it going to a web application right now. We also have some alerts and notifications on it: we alert through Twilio with a moisture notification to let us know if the plant gets too dry or too wet. Well, it's never too wet, because we never water the poor thing, so it's always too dry. And from there we have our UI for it. I'm going to quickly plug that you can come and see this at our little booth. There's the plant, and there's the ridiculous number of cords and sensors attached to it. It's a lot of fun to see in real life.

This was actually inspired by one of our clients who is doing this at a much bigger scale. They are, I'm going to call it a farm; it's kind of a farm-ish factory in New York that produces lettuce and tilapia together. It's obviously a very intense system: the fish feed the lettuce, and the lettuce are not fed to the fish, they feed humans. They have a ton of IoT sensors, and their farm tends to occasionally lose connectivity; it's just not well connected. I'm sorry, but farms are just kind of dead zones, I guess, even ones in New York. So they're using edge-to-cloud replication, and they're also specifically down-sampling because, like us, they're dealing with IoT sensors that send a lot of gibberish they don't want, and they don't need the data down to the millisecond either.
Another quick one I wanted to mention is a company called Bebox. They're out in, I do believe, Africa, if I remember correctly. They have geographically dispersed solar panels, about 85,000 solar rooftops providing power to very small villages in remote parts. From what I understand, they occasionally get internet connectivity from satellites, kind of like the Starlink ones, or occasionally reconnect to the main grid. Basically, they were originally dealing with a lot of data loss. They were already doing edge data replication before we built out everything for it; they've been doing this for over 10 years and already have a few solutions of their own. But I still like to mention them, because this is an ideal use case.

In general, when it comes to edge-to-cloud, I think it's always good to store somewhat at the edge, because you might just have connectivity problems or data loss in general, just through life. Even if that's not a problem you commonly deal with currently, it could become one, and it would suck to lose your data. So this is an overall great use case for that.

I do want to mention some further resources as well. As I've already said many times, the getting started page is where you can create your InfluxDB Cloud account. We also have our community forums as well as our Slack. The community forums are more something you can search through if you're having issues; our Slack community is a lot more active, and a lot of our employees are on there. We're fully open source.
So obviously, on our GitHub we get a lot of issues filed, but plenty of people just come to us on Slack to ask questions, make comments, or even complain. We're open to all; we don't discriminate. We also have a book. It goes more in-depth into how the time series database was built, some of our core theoretical components, the ideas behind how it was built and how we're building going forward. There are our actual docs, which go more in-depth; that's also where I grabbed the code commands for the remote and replication stream creation, which are in our docs now. We have blogs where we write about these kinds of things, including this feature, and in general some of our client use cases, all that good stuff. And finally, InfluxDB University, which is our brand new free university course where you can learn more about the different components of the platform.

And yeah, I managed to end a little bit early, which is fine; I'm sure no one minds a little bit of extra time back. If you have any questions, you're welcome to ask. And like I said, I'm actually not going to end on this slide, I don't like it. Go ahead and come by our booth if you have any more questions. We also have some really awesome socks and stickers, and like I said, you can see a little bit more of Plant Buddy. We actually have all the code on GitHub, specifically to show it off. I didn't want to go super in-depth into that code here, but I can do it at the booth. But yeah, thank you all for coming.