Hi, this is Swapnil Bhartiya, and welcome to another episode of T3M. Our topic of this month is data, and today we have with us once again Eric Sammer, CEO of Decodable. Eric, it's good to have you back on the show.

Thanks for having me back.

When we look at data, today's focus, because data is a very big topic, is mostly about building real-time data pipelines. But before we jump into the topic, of course, I've hosted you so many times, but it's always a good idea to remind our viewers: what is Decodable all about?

We're a stream processing platform. That means that we connect to source systems, we acquire data, we do whatever kind of processing makes sense for a particular use case, and then write that result into everything from event streaming systems to data warehouses, data lakes, and real-time OLAP database systems.

I would like to understand a bit about the evolution and origin of streaming and real-time data, because there are a lot of terms that we use there. From the early days versus today, what was the driver? Why do we need streaming data and real-time data pipelines? Of course, we'll then talk about the other aspects, but I do want to understand a bit about the history and origin.

Stream processing and real-time data are not necessarily new; they've been around for quite some time. I think the thing that has changed over the last couple of years to bring it to the fore again is that the use cases people are seeing are just becoming more interesting. In retail, it's about package tracking and delivery tracking. It's about what's currently in stock and inventory management. It's about engagement of customers, and logistics, again, delivery tracking and just knowing where things are, supply chain management, and those kinds of things. So I think it's really that the use cases have finally caught up to the technology and made the technology interesting for people again. And I think that is what has brought things like Kafka and Apache Flink and all these other open source projects to the fore.

Since you brought up Apache Flink and open source, I want to understand a bit about that. Of course, if you look at a lot of data-specific projects, a lot of them came from Berkeley; I think I once called them the whole mafia, because most of them came from there. So talk a bit about the importance of open source for Decodable, and which open source projects you folks consume and contribute to.

I think these days most enterprise data infrastructure has either some open source or open core component, or a set of de facto standards that are critically important to that particular product or service. Snowflake, for instance, has recently gotten support for things like Apache Iceberg as external tables. At Decodable, we are heavily based on Apache Flink, which is a ten-year-old project that came out of a university in Berlin and is all about the processing and real-time transformation of data. So Apache Flink, and Debezium, which is a change data capture project for pulling data out of relational, operational databases for things like replication and, eventually in our case, stream processing.
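As a rough illustration of the kind of pipeline being described, here is a minimal sketch in Flink-style SQL: a Debezium-based change data capture source feeding a continuously maintained aggregation that is written to a Kafka topic. The table names, hostnames, credentials, and connector options below are hypothetical placeholders, not Decodable's actual configuration, and some required options are omitted for brevity.

```sql
-- Illustrative only: a Debezium-style CDC source declared in Flink SQL.
CREATE TABLE orders_source (
  order_id    BIGINT,
  status      STRING,
  order_total DECIMAL(10, 2),
  updated_at  TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector'     = 'postgres-cdc',        -- Debezium-based change data capture
  'hostname'      = 'orders-db.internal',  -- hypothetical operational database
  'database-name' = 'shop',
  'schema-name'   = 'public',
  'table-name'    = 'orders',
  'username'      = 'flink',
  'password'      = '********'
);

-- An upsert sink: here a Kafka topic, but it could equally be a warehouse,
-- lake, or real-time OLAP table.
CREATE TABLE orders_by_status (
  status        STRING,
  order_count   BIGINT,
  total_revenue DECIMAL(10, 2),
  PRIMARY KEY (status) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic'     = 'orders_by_status',
  'properties.bootstrap.servers' = 'kafka-broker:9092',
  'key.format'   = 'json',
  'value.format' = 'json'
);

-- The "processing" step: an aggregation over the change stream, kept up to
-- date as source rows are inserted, updated, or deleted.
INSERT INTO orders_by_status
SELECT
  status,
  COUNT(*) AS order_count,
  CAST(SUM(order_total) AS DECIMAL(10, 2)) AS total_revenue
FROM orders_source
GROUP BY status;
```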
In terms of the criticality of these projects and the health of the community: quite frankly, the more robust the open source project is, the more there is a community and ecosystem around it. From the customer's perspective, this is a good thing because they're not locked into Decodable or any other vendor, whoever they may be, with respect to their data, which gives them leverage, quite frankly, over people like us. And for us, it creates this enormous ecosystem of connectors, of adjacent technologies, and these kinds of things that allow us to better serve the use cases that customers are looking for. So we are pretty active in those two communities. And then there's a bunch of what I would call, I don't want to say ancillary communities, but other open source projects that are critical to us around file formats and all these other kinds of things, which I think have been adopted, like I said, as de facto standards in data warehousing and data platforms in general.

So when we look at real-time or stream processing, is Flink the ideal choice? Is it the most suitable for most of the use cases that Decodable is targeting?

Yeah. Apache Flink has been the foundation of some of the most sophisticated real-time businesses. Some of the noteworthy open source users of Apache Flink are people like Uber and Lyft and Netflix and Stripe, various folks whose entire businesses are predicated on the idea of being able to perform analysis or take action in real time, which gives us quite a bit of confidence that this really is the right platform. So Apache Flink is absolutely the de facto industry standard for real-time stream processing, which is one of the reasons why we chose it. And of course, like I said, the ecosystem around it and the support for all these other technologies is absolutely critical, I think, for both us and customers.

When you look specifically at real-time data pipelines, how challenging is it for engineers or teams to build a whole strategy there? Or is it something easier, as easy as, say, setting up a LAMP server? Let's talk about how complicated or challenging it is in reality, and then we'll talk about what folks like Decodable do to help them.

I think a lot of people get twisted up around data strategies and these kinds of things that are very abstract and hard to pin down. Having looked at hundreds, maybe thousands, of these kinds of data platform projects at this point, or the AI and ML stuff that comes out the other side of them, which people are probably even more interested in, I would say: start at the end and work backwards. What do you want to get out of this? What is the cost savings or revenue growth opportunity, or whatever it is that you're measuring? Start there and work backwards rather than boiling the ocean and thinking about data and data strategies. Because I think the data is a means to an end, right? The actual thing that's meaningful is what happens out the other side.
So I think that, to me, is what makes a lot of things clear and helps people cut through some of the busy work, or the ideological debate about how you structure these platforms and strategies and those kinds of things. And then I think it is absolutely overwhelming when you look at the stack of technologies that some people are advocating for in order to just get off the ground, even with the simplest use cases. I mean, there are 20, 30, 40 boxes on a whiteboard with arrows between them. I think that, more so than anything, it's about rallying around a small set of technologies: pick one thing for each function in a platform. And again, I would never take your eye off of what you are looking to get out of this, versus the logo bingo card that I think a lot of data platforms wind up turning into. From our perspective, having a way to capture, process, and ingest data into analytical systems, and having jobs that run either on that stream or in those analytical systems to power applications, is an easy way to distill this down without getting too wrapped up in the hundreds of different logos and projects that potentially get into this. That's probably not the most actionable advice, but I think that, from our perspective, reducing the number of platforms that people have to deal with, which for Decodable means stream processing, ETL, and ELT in a single platform that sits between all the different systems, sort of like the network in between different applications and hosts, winds up reducing the complexity of the overall platform. We think that's important for people to be able to be successful.

When we talk about data, there are a couple of critical things that come up. Data integrity and security are important, and so are backup, high availability, and disaster recovery, because an application can go down and come back, but if the data is gone, that's critical. Talk a bit about the scope of Decodable when it comes to, once again, security and the other aspects associated with data.

Decodable's role in this, like I said, is to really sit in between a bunch of different systems. So when you think about business continuity and disaster recovery and availability, typically there are a couple of things. One is, what's the durability of this data? Once it's captured, is it going to be around if the lights go out, and then, secondarily, if an entire cloud availability zone or region disappears? For us, that means capturing the data, persisting it in a way that is resilient to temporary failures that we will eventually recover from, and then facilitating the transfer of that data and the fanning out from one cloud region to potentially multiple cloud regions, such that if you lose a region or availability zone, you actually still have that data. It's a relatively sophisticated set of constraints to solve for. And I will say that, like most things, it's always a cost versus risk tradeoff, right? So it really depends on the criticality of the data and the use cases. If you're talking about purchase and transaction data, that data is critical for a retailer, and then there might be other kinds of data where you're not necessarily willing to make that investment.
So I do think that you need a framework for thinking about these kinds of things. For us, the way we help facilitate that is that we are the system that is reliably capturing this data and then not just processing it, but moving it around between, like I said, various availability zones and regions of the cloud. So we can be part of a larger strategy. I won't claim to be a backup or disaster recovery solution, but we can be part of a larger strategy around how you deal with the loss of, say, us-east-1 in Amazon or something like that, those kinds of situations. And sometimes it's not just disaster recovery; there are global businesses that have many, many regions, plenty of presence throughout the globe, and need to centralize parts of that data in a system like Snowflake or Databricks for analysis and model training and those kinds of things. So I think all of those things work in concert to come up with a larger deployment architecture around how you structure and get to a platform that is serving not just the use case, but solving for some of these other concerns around security and disaster recovery and those kinds of things.

What kind of cultural shift are you seeing when it comes to data?

It's a really interesting question. I think, look, at least half of this challenge around data and data infrastructure is a people problem, not a technology problem. And I think the big shift that we are seeing, which is consistent with the larger trend in enterprises generally, is this notion of self-service. That means enabling the people who need to do the analysis, who need to build and train the models, to be able to acquire the data that they need, assuming they have the rights to do so, when they need it, without having to hop between team boundaries and file tickets and fill out forms and those kinds of things in order to get their job done. And for us, that means making things like relatively sophisticated stream processing available in the languages, in this case probably SQL or a programming language, that the team that knows what they're trying to do with this data can work in without having to, like I said, hop between team boundaries. So I think this notion of self-service is the difference between projects that take days and weeks versus months and years. And obviously there's the ML and AI trend and all of the sophisticated analysis that people are doing around this data. I won't claim to be an expert in the analysis; I'm more on the infrastructure and data platform side. But we know that those people get the best results out of those efforts when they are as close to the data as humanly possible, and that means being able to give them the right tools to get their job done. And I believe that's largely centered around this notion of self-service.
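As a small illustration of what that kind of self-service, SQL-based stream processing can look like, here is a sketch of a windowed aggregation in Flink-style streaming SQL. The package_scans table, its columns, and the one-minute window are hypothetical, chosen only to echo the delivery-tracking use cases mentioned earlier.

```sql
-- Hypothetical self-service query: per-minute package scan counts by region,
-- written directly by the team that owns the analysis rather than filed as a
-- ticket to a central data team. Assumes a streaming table `package_scans`
-- with an event-time column `scan_time` that has a watermark defined on it.
SELECT
  window_start,
  window_end,
  region,
  COUNT(*) AS scans
FROM TABLE(
  TUMBLE(TABLE package_scans, DESCRIPTOR(scan_time), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end, region;
```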
What advice do you have for how organizations should approach data? And not only whether they really need real-time data pipelines or not, but also, once again, as you said, beyond buzzwords like data strategy and things like that, how should they approach it appropriately? Because eventually everybody will be using some form of generative AI or data in some way, so that not only are they prepared, but they're also, once again, more efficient with whatever they are doing.

I think this is going to be largely centered around the organizational component of it and then the technology component of it. On the technology component, obviously I have a horse in this race, so I'm a little bit biased, but I think it's about having a small set of powerful primitives: a data platform that is made up of real-time ingest and processing, obviously some kind of analytical database or system for the historical storage and processing of data, and then whatever tools go around that, whether it's programming languages and notebooks and model training and all the other data quality and governance tooling that you need. I would say make sure that you understand how all those pieces fit together, not just as logos on a whiteboard, like I said, but really and truly understand how the team is going to be able to work with those tools in order to accomplish what they're trying to accomplish on the other side. And then on the organizational side, it's really about enabling self-service and making sure that people are really clear about what the goals are, not just what we're asking them to do on a daily basis, but ultimately what they are goaled around, so that they have a really solid understanding of why they are doing what they're doing. Then, when they go to actually do that work, they can make reasonable trade-offs around all of the different attributes and especially focus on the most important parts, whether it's data quality, whether it's timeliness, or all the other drivers that are going to be super critical to making any particular project successful. So if you're a chief data officer or a CIO, those are the kinds of things that I would be thinking about. I think the technology and the vendors that you bring in to facilitate that are probably the easy part of building that. Obviously, like I said, I think we're a big part of that, and there are plenty of other technologies that go along with it. But that's how I look at it and where I've seen people be the most successful.

Eric, thank you so much for taking time out today and, of course, talking about data, Flink, and, of course, Decodable. And as always, I would love to have you back on the show.

Thank you. Really appreciate it. Thanks so much.