This past May, theCUBE, in collaboration with InfluxData, shared with you the latest innovations in time series databases. We talked at length about why, for many use cases, a purpose-built time series database is a superior alternative to a general-purpose database trying to do the same thing. Now, you may remember that time series data is any data that's stamped in time, and if it's stamped, it can be analyzed historically. When we introduced the concept to the community, we talked about how, in theory, those time slices could be taken every hour, every minute, every second, down to the millisecond, and how the world was moving toward real-time or near-real-time data analysis to support physical infrastructure like sensors, IoT equipment, and other devices. Time series databases have had to evolve to efficiently support real-time data and emerging use cases in IoT and beyond, and to do that, new architectural innovations have had to be brought to bear. As is often the case, open source software is the linchpin of those innovations.

Hello, and welcome to Evolving InfluxDB Into the Smart Data Platform, made possible by InfluxData and produced by theCUBE. My name is Dave Vellante, and I'll be your host today. In this program, we're going to dig pretty deep into what's happening with time series data generally, and specifically how InfluxDB is evolving to support new workloads and demands, particularly around real-time data analytics use cases.

First, we're going to hear from Brian Gilmore, director of IoT and emerging technologies at InfluxData, and we're going to talk about the continued evolution of InfluxDB and the new capabilities enabled by open source, both generally and in specific tools. In this program, you're going to hear a lot about things like the Rust implementation of Apache Arrow, the use of Parquet, and tooling such as DataFusion, which are powering a new engine for InfluxDB. These innovations evolve the idea of time series analysis by dramatically increasing the granularity of time series data, compressing the historical time slices, if you will, from, for example, minutes down to milliseconds, while at the same time enabling real-time analytics within an architecture that can process data much faster and much more efficiently.

After Brian, we're going to hear from Anais Dotis-Georgiou, a developer advocate at InfluxData, and we're going to get into the whys of these open source capabilities and how they contribute to the evolution of the InfluxDB platform. Then we're going to close the program with Tim Yocum, director of engineering at InfluxData, who's going to explain how the InfluxDB community actually evolved the data engine in mid-flight and which decisions went into the innovations that are coming to market. Thank you for being here. We hope you enjoy the program. Let's get started.

Okay, we're kicking things off with Brian Gilmore. He's the director of IoT and emerging technology at InfluxData. Brian, welcome to the program. Thanks for coming on.

Thanks, Dave. Great to be here. I appreciate the time.

Hey, explain why InfluxDB needs a new engine. Was there something wrong with the current engine? What's going on there?

No, no, not at all. For us, it's been about staying ahead of the market.
If we think about what our customers are coming to us with now, requests like SQL query support and things like that, we have to figure out a way to execute those for them in a way that will scale long-term. And we also want to make sure we're innovating, staying ahead of the market, and anticipating those future needs. So this is really a transparent change for our customers. We'll be adding new capabilities over time that leverage this new engine, but initially the customers who are using it are going to see just great improvements in performance, especially those working at the top end of the workload scale, with massive data volumes and things like that.

Yeah, and we're going to get into that today, and the architecture and the like. But what was the catalyst for the enhancements? When and how did this all come about?

Well, like three years ago, we were primarily on-premises, right? We had our open source product, we had an enterprise product, and shifting that technology, especially the open source code base, to a services basis, hosting it through multiple cloud providers, was a long journey. Phase one was that we wanted to host Enterprise for our customers, so we created a service where we simply managed and ran our Enterprise product for them. Phase two of this cloud effort was to optimize for multi-tenant and multi-cloud, to be able to host it in a truly SaaS manner where we could use some type of customer activity or consumption as the pricing vector. That was the birth of the real first InfluxDB Cloud, which has been really successful. We've seen, I think, something like 60,000 people sign up, and we've got tons of enterprises as well as new companies, developers, and of course a lot of home hobbyists and enthusiasts who are using it on a daily basis. Having that big pool of very diverse and varied customers to chat with as they're using the product and giving us feedback has pointed us in a really good direction in terms of making sure we're continuously improving, and then also making these big leaps, as we're doing with this new engine.

All right, so you've called it a transparent change for customers, so I'm presuming it's non-disruptive. But I really want to understand how much of a pivot this is. What does it take to make that shift from time series specialist to real-time analytics, and to be able to support both?

Yeah, it's much more of an evolution, I think, than a shift or a pivot. Time series data is always going to be fundamental, the basis of the solutions we offer our customers and of the ones they're building themselves on the raw APIs of our platform. The time series market is one we've worked diligently to lead, especially when it comes to metrics: sensor data and app and infrastructure metrics. If we're being honest, though, I think our user base is well aware that the way we were architected leaned much more toward those backwards-looking, historical types of analytics, which are key for troubleshooting and making sure you don't run into the same problem twice.
But we had to ask ourselves: what can we do to better handle those queries from a performance and time-to-response standpoint? Can we get to the point where result sets come back so quickly from the time of query that we can shrink that window down to minutes, and then seconds? And now, with this new engine, we're really starting to talk about a query window that could be returning results within milliseconds of the data hitting the ingest queue. That's really getting to the point where, as soon as your data is available, you can use it: you can query it, you can visualize it, you can do all those magical things with it. Getting all of that to a place where we're saying yes to the customer on all of the real-time queries and the multiple-language query support was hard, but we're now at a spot where we can start introducing it, to a limited number of strategic customers and strategic availability zones to start, but to everybody over time.

So you're basically going from what happened to, and you can still do that obviously, but to what's happening now, in the moment?

Yeah. If you think about time, it's always in the past, right? In the moment right now, whether you're talking about a millisecond ago or a minute ago, that's pretty much "right now" for most people, especially in these use cases where you have other components of latency induced by the underlying data collection, the architecture, the infrastructure, the devices, and the highly distributed nature of all of this. So yes, getting a customer or a user to the point where they can use the data as soon as it is available is what we're after here.

I always thought of real time as before you lose the customer, but in this context, maybe it's before the machine blows up.

Yeah, operational real time is different. And that's one of the things that really signaled to us that we were heading in the right direction: just how many operational customers we have. Everything from aerospace and defense, where we've got companies monitoring satellites, to tons of industrial users using us as a process historian on the plant floor. If we can satisfy their demands for a real-time historical perspective, that's awesome. I think what we're going to do here is start to edge into the real time they're used to in terms of the millisecond response times they expect of their control systems, certainly not of their historians and databases.

Are these innovations available to InfluxDB Cloud customers only? Who can access this capability?

Commercially, and today, yes. But I want to emphasize that's for now; our goal is to get our latest and greatest and our best to everybody over time, of course. One of the things we did here was double down on our commitment to open source and availability. So anybody today can take a look at the libraries on our GitHub, inspect them, and even try to implement or execute some of it themselves in their own infrastructure. We're committed to bringing our latest and greatest to our cloud customers first for a couple of reasons.
Number one, they're big workloads, and those customers have high expectations of us. Number two, it also gives us the opportunity to monitor a little more closely how it's working, how they're using it, and how the system itself is performing. Being careful, maybe a little cautious, in terms of how big we go with this right away limits the risk of any issues that can come with new software rollouts, and we haven't seen anything so far. It also gives us the opportunity to have meaningful conversations with a small group of users who are using the products. Once we get through that and they give us two thumbs up, it'll be open the gates and let everybody in. It's going to be an exciting time for the whole ecosystem.

Yeah, that makes a lot of sense, and you can do some experimentation using the cloud resources. Let's dig into some of the architectural and technical innovations that are going to help deliver on this vision. What should we know there?

Well, foundationally, we built the new core on Rust. This is a newer, very popular systems language. It's extremely efficient, but it's also built for speed and memory safety, which goes back to us being able to deliver it in a way that we can inspect very closely, but then also rely on the fact that it's going to behave well even when it encounters error conditions. We've loved working with Go, and a lot of our libraries will continue to be implemented in Go, but when it came to this particular new engine, that combination of power, performance, and stability is why Rust was critical.

On top of that, we've also integrated Apache Arrow and Apache Parquet for persistence. For anybody who's really familiar with the nuts and bolts of our backend, our TSI and our time-structured merge trees, this is a big break from that: Arrow on the in-memory side and Parquet on the on-disk side. It allows us to present a unified set of APIs for those really fast, real-time queries we talked about, as well as for very large, historical, bulk data archives in that Parquet format, which is also cool because there's an entire ecosystem popping up around Parquet, in the machine learning community for example. And to get that all to work, we had to glue it together with Arrow Flight. That's what we're using as our RPC component; it handles the orchestration and the transportation of the columnar data. We're moving to a true columnar database model for this version of the engine, which removes a lot of overhead for us in terms of having to manage all that serialization and deserialization. And, to that point again, it blurs the line between real-time and historical data: it's highly optimized not just for micro-batch and batch workloads, but for true streaming as well.
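To make the Arrow, Parquet, and DataFusion combination Brian describes a little more concrete, here is a minimal Rust sketch built on the open source datafusion crate, which re-exports arrow and parquet. It builds an in-memory Arrow record batch, persists it to a Parquet file, and then queries that file with plain SQL. The schema, field names, and file path are illustrative assumptions for this example only; the new engine wires these same pieces together very differently at scale, with Arrow Flight moving the columnar data between components rather than a local file handle.

```rust
// Illustrative sketch only; not InfluxDB's internals.
// [dependencies] datafusion = "*", tokio = { version = "*", features = ["rt-multi-thread", "macros"] }
use std::fs::File;
use std::sync::Arc;

use datafusion::arrow::array::{Float64Array, StringArray, TimestampNanosecondArray};
use datafusion::arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::parquet::arrow::ArrowWriter;
use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In-memory columnar representation: one Arrow RecordBatch of points
    // (a hypothetical "cpu" measurement with a tag and a field).
    let schema = Arc::new(Schema::new(vec![
        Field::new("time", DataType::Timestamp(TimeUnit::Nanosecond, None), false),
        Field::new("host", DataType::Utf8, false),
        Field::new("usage_user", DataType::Float64, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(TimestampNanosecondArray::from(vec![1_000, 2_000, 3_000])),
            Arc::new(StringArray::from(vec!["hostA", "hostB", "hostA"])),
            Arc::new(Float64Array::from(vec![0.42, 0.87, 0.51])),
        ],
    )?;

    // On-disk persistence: stream the batch into a Parquet file.
    let file = File::create("cpu.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, None)?;
    writer.write(&batch)?;
    writer.close()?;

    // Query layer: DataFusion runs plain SQL over the Parquet file.
    let ctx = SessionContext::new();
    ctx.register_parquet("cpu", "cpu.parquet", ParquetReadOptions::default())
        .await?;
    let df = ctx
        .sql("SELECT host, avg(usage_user) AS avg_usage FROM cpu GROUP BY host")
        .await?;
    df.show().await?;
    Ok(())
}
```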
Yeah, it's funny, you mentioned Rust. It's been around for a long time, but its popularity is really starting to hit the steep part of the S-curve. We're going to dig into more of that, but is there anything else we should know about, Brian? Give us the last word.

Well, first, I'd like everybody watching to take a look at what we're offering in terms of early access and beta programs. If you want to participate, or if you want to work with the new engine in early access, please reach out to the team. I'm sure there's a lot of communication going out, and it'll be highly featured on our website, but reach out to the team, because, believe it or not, we have a lot more going on than just the new engine. There are also other programs, things we're offering to customers in terms of the user interface, data collection, and so on. If you're a customer of ours and you have a sales team, a commercial team you work with, reach out to them and see what you can get access to, because we can flip a lot of stuff on, especially in cloud, through feature flags. If there's something new you want to try out, we'd just love to hear from you. And then our goal would be that, as we give you access to all of these cool new features, you would give us continuous feedback on these products and services: not only what you need today, but what you'll need tomorrow to build the next versions of your business. Because the whole database, the ecosystem as it expands out into this vertically oriented stack of cloud services and enterprise and edge databases, is going to be what we all make it together, not just those of us who are employed by InfluxData.

And then finally, I would just say, please watch Anais's and Tim's sessions. These are two of our best and brightest. They're totally brilliant, completely pragmatic, and most of all they're customer-obsessed, which is amazing. Honestly, there are no better takes on the technical details of this than theirs, especially when it comes to the value these investments will bring to our customers and our communities. So I encourage you to pay more attention to them than you did to me, for sure.

Brian Gilmore, great stuff. Really appreciate your time.

Yeah, thanks, Dave. It was awesome. Looking forward to it.

Yeah, me too. I'm looking forward to seeing how the community actually applies these new innovations and goes beyond just the historical, into real time, a really hot area. As Brian said, in a moment I'll be right back with Anais Dotis-Georgiou to dig into the critical aspects of the key open source components of the InfluxDB engine, including Rust, Arrow, Parquet, and DataFusion. Keep it right there. You don't want to miss this.