Hello and welcome to theCUBE and the Analyst Angle. I'm so excited to be here today talking to you. I'm Rob Strechay, one of the lead analysts here with theCUBE and SiliconANGLE Media. We're going to be talking about data products, governance, and modern data platforms, and how all of this is really shifting. I'm really excited to be joined by Lior Gavish, the CTO and co-founder of Monte Carlo. We're going to have some fun discussions and really get into how the modern data stack, governance, and all of these pieces are moving, and why it's important to understand where everything is and to have observability into it. So thank you for joining me.

Thanks, Rob. It's great to be here.

Yeah, so why don't we jump into it? Both of us have been in this space for quite a while, you even longer, and we talk to a lot of customers. What we see is that there are different pieces of the modern data stack, and it keeps evolving. You have Databricks buying companies like MosaicML, there's more AI, and everybody is talking about LLMs. I think one of the things that seems really confusing is what the modern data stack is and how it's evolving. How are you seeing it from the customers you talk to?

So from my perspective, the modern data stack is a set of tools, mostly cloud-based, that really started maybe 10 years ago and allowed companies to handle data and get value out of it in a way that had never been possible before: basically putting a lot of data in the cloud, transforming it, aggregating it, manipulating it in various ways, and then producing various kinds of data products, if you will, out of it. And with the advent of the cloud came all these tools that work together, whether it's data warehouses like Snowflake and BigQuery, and increasingly Databricks and Redshift, or the tools around the BI layer, the orchestration tools, the ETL tools that have been built around it, and dbt to drive transformation.
All these tools are best in breed. They do what they do really well, but they also work really well together and give people a way to build a data stack that performs well out of the box, requires much less maintenance, and lets people really focus on deriving value and insight from data. So the modern data stack has essentially been a boon for data teams, which is very exciting. Where I feel it's evolving right now, or where we're maturing, is this: after we've crammed all of our data in, built a lot of things, and tried to make sense of it all, we see companies going back to the foundations and the basics and asking how to productize all of this. How do we create data products that are trusted, discoverable, and usable, and that really drive impact in the business?

Yeah.

That adds a new layer of complexity and questions, and it's a very exciting time. Most recently, one of the biggest ways people believe they can add value is by working with AI and LLMs, and that's probably the exciting new frontier of the modern data stack. So all of these things are super exciting for me as someone who's part of the industry.

Yeah, I think you hit on a really good point. I was having a conversation with Dave Vellante, one of the other analysts and one of the co-founders of theCUBE, and we were saying that Snowflake and Databricks get a lot of the press, but there are still a ton of people using BigQuery, especially the ones that were using Google Analytics to push their data into it and trying to do customer 360. Are you seeing a good spread among Snowflake, Databricks, and BigQuery?
Are you starting to see some of the other ones from Microsoft? There's also Redshift over on the AWS side that we hear about every now and again. What are you seeing out there in what people are building? Are they using one or several of these? How is that working?

Yeah, absolutely. I'm not a modern data stack snob. I'm not going to say that a certain tool is or isn't part of the modern data stack, and I think that's what we're seeing out there too. People end up choosing the best tool for the job. Many times it's Snowflake, many times it's Databricks, many times it's both, but there are also good reasons to use BigQuery. It's a great data warehouse, and for people who have a lot of their data coming out of GCP, it makes a lot of sense to go with BigQuery. So we're seeing a good balance among all of these, and, like you alluded to, many companies actually use multiple tools, because even within the same company, different teams have different needs, different strategies, and different personas making the choices, and that drives a lot of the technology decisions. So we're definitely seeing all of them.
Yeah, I think that's what we're hearing as well. In fact, when we looked at some of the data from ETR, our partner that does research in this area, it showed an overlap between Databricks and Snowflake: something in the neighborhood of 44% of Databricks customers also have Snowflake. I bet if we had put BigQuery in as well, you would see yet another level of overlap. I tend to agree with you: it's the right tool for the right purpose at the right cost. People have different agreements; maybe your agreement with Google is up after two years and you want to go to Redshift because AWS gives you a good deal, or what have you. I'm not a snob on that either, so right tool for the right thing. But you hit on a really good point around personas. There's been a rise in data products and data product management, the rise of the data product manager; I actually did a podcast on it back in January. Having been head of product at a couple of different companies myself, I look at it and see that you have to build these data products out and productize them. What are you seeing in the data product realm? What is becoming more important for your customers?

Yeah, absolutely. First of all, maybe let's define data products. The idea of a product is that you build something, put it in the hands of other people, and they find value in it; they get something they need out of that product. I think data products always existed, but if you look back 10 or 20 years, the form was a binder that someone spent one, two, or three months preparing before the end of the quarter and physically submitted to the executive team. That world has changed quite a bit.
Today we put data in the hands of a lot of people across a lot of organizations. Every single person is making decisions from data, often from dashboards in Tableau, Looker, or something else. But increasingly it might be a large language model that's interacting with people, or a data set that a data scientist uses for a certain analysis, or an application that automates things for the business, whether it's marketing spend, pricing, risk, or the like. So data products have taken a lot of different shapes, and they need to be built differently, because once you have so many products in the hands of so many people, you really have to think about scale and how to expose them in a way that's repeatable, trusted, and findable. It's the same way we think about any product in the real world: you need to make sure people know it exists when they need it, you need to make sure they understand how to use and consume it, and you need to make sure it has the level of trustworthiness and quality the job requires. That, I think, is a new set of tools, or a new set of disciplines, that data teams are increasingly adopting to support those data products, make them a success, and really drive value in the enterprise.
Yeah, I've been on this kick around data products and governance and being able to understand them, because, like you said, trustworthiness matters. We're seeing that with all the people tweeting about the entropy within ChatGPT, where an answer that took two seconds is now taking longer or coming back wrong, even for analysis. To me, that's where you start to lose people: if you can't do it repeatedly, over and over again. With data products, you have to know the lineage; you have to understand where things came from and how. I started my career as a DBA a really long time ago, and when I would go and run these reports it was, like you said, manual. We had a dot matrix printer. I was at a university, and I was the DBA for the person who shipped out all those huge mailings people get, and we had to print the labels. Something would change: somebody would change the format of the address or add something, and it would change the way the labels printed. I then had to go back and track down why something changed and how the data coming into our data product, which at the time was this mailing list, had changed. You hit on a really good point, because for me it was nearly impossible from where I sat in that stack. And we're talking about the 90s, so I'm dating myself a little, when I was using very rudimentary SQL tools to understand that. But like you said, things have changed. How are they evolving? Because it's got to be more complex at this point.

Yeah, absolutely. The level of complexity has increased: the number of data products, the number of different data sources feeding them, and the number of different stages the data goes through before it gets into a product.
At Monte Carlo, at least, we believe there are good analogies to learn from in how reliability and trust are handled in software systems, which are just as complex or sometimes even more so. There are common methodologies around that, usually called DevOps: a set of practices for delivering software in a way that's reliable and trusted, and a set of tools that come with it. First and foremost, and a topic that's near and dear to my heart, is observability: the idea that at all times you can collect a lot of information from the data stack, specifically metadata, logs, metrics, and statistics, and really understand its health. Is it working as intended, doing what it's supposed to do, and delivering information that's accurate and fit for purpose, or not? If you can tell at every point in time whether the system is healthy, you also know when it breaks, and you can react and solve the problem proactively rather than impacting your customers, the people who use your products. Because that is how you lose trust in a product: it breaks over and over again in ways its builders don't understand or don't know about.

Yeah.

And so, you know...

And that would seem to fit in with governance. Having that observability, being able to understand where the data is, whether it's healthy, and whether the product is healthy, is almost the foundation.

Yes, we believe it's the cornerstone of any governance initiative, and we're increasingly seeing the market adopt it. By now we've served over 250 enterprises that use observability as a core piece of their stack and as a core part of how they deliver trusted data products, and not just data products.

Yeah.
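To make the observability idea a little more concrete, here is a minimal sketch of two common data-health checks computed from table metadata: freshness (has the table been updated within its expected interval?) and volume (is today's row count far outside its recent history?). The function names, the 24-hour interval, the 3-standard-deviation threshold, and the sample numbers are illustrative assumptions, not Monte Carlo's product logic or API.

```python
import statistics
from datetime import datetime, timedelta, timezone


def is_stale(last_updated: datetime, expected_every: timedelta, now: datetime) -> bool:
    """Freshness check: True if the table hasn't been updated within its expected interval."""
    return now - last_updated > expected_every


def volume_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Volume check: True if today's row count is more than `threshold` sample
    standard deviations away from the historical mean (a simple z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # needs at least two data points
    if stdev == 0:
        # A perfectly flat history: any change at all is unusual.
        return today != mean
    return abs(today - mean) / stdev > threshold


# Hypothetical metadata for a single table (assumed values, for illustration only).
now = datetime(2023, 6, 28, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=26)
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]

print(is_stale(last_updated, timedelta(hours=24), now))  # True: 26h since the last load
print(volume_anomaly(daily_row_counts, 400))             # True: far below the usual volume
print(volume_anomaly(daily_row_counts, 1008))            # False: within the normal range
```

The point of the sketch is the one Lior makes: once health is measured continuously, a broken load shows up as a failed check before a downstream consumer notices it.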
No, I think back to my time on the Snowplow front, seeing how people, like you're saying, built data products on top of the first-party data we would deliver into a data warehouse, a data lake, a data mesh, or whatever fancy name you want to use, and how they would really look to drive outcomes: hey, I want to reduce churn by making sure I reach out after these incidents happen, say from a customer service perspective. Or, to your point about data products, they would analyze their data products and how those data products actually perform, building a data product on top of a data product to evaluate its health. And it would seem that where Monte Carlo is coming from is really digging in and showing how the data product is built out. That's really what you're announcing this week as well.

Right. From the get-go, Monte Carlo really tackled that problem: data lands in the data warehouse and gets transformed through multiple stages, often by multiple teams in the company, before it lands in an application that might be designed to reduce churn. How do you understand how all these things fit together, how they work in tandem, and whether they're all performing the way they're supposed to? That's something Monte Carlo tackled from the start. What we're announcing today is the data product dashboard, which for the first time gives people a succinct, clear view of all the issues related to a particular data product. They can go into the tool and define what constitutes the data product.
That is, which objects actually make up the data product, whether tables, BI reports, or models, and then really understand not just what's going on with those specific objects, but what's going on with everything upstream of them, all the way back to the tables that land in the warehouse from a source system. That allows people to do two things. One is to understand and prioritize the issues that truly matter. At one extreme, you can try to solve only the problems that show up at the most downstream level, the data product itself, but then it takes a long time and is really, really hard to find the underlying issues. At the other extreme, you could try to monitor and track every single table you've ever had, and that's not scalable either. But if you can narrow down to the issues that are impacting your most critical assets, your most critical products, it gives you focus and clarity about how to spend your time and resources. The other thing it lets you do is communicate with stakeholders. From a trust perspective, there's a huge difference between you finding out that the addresses were wrong and the person who built the system telling you, hey, I know the labels are wrong, hold on, I'm working on it, and I'll let you know as soon as it's fixed. That communication makes a huge difference in terms of trust. You can have the exact same number of problems, but if you're able to communicate about them, you'll get a lot more trust and a lot more adoption.
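The "define the product, then look at everything upstream of it" idea can be sketched as a simple graph traversal. This is a minimal illustration that assumes the lineage graph is stored as a plain dictionary mapping each asset to the assets it reads from directly; the asset names, the incident format, and the functions `upstream_closure` and `incidents_for_product` are hypothetical and are not how Monte Carlo's data product dashboard is actually implemented.

```python
from collections import deque


def upstream_closure(product_assets: set[str], upstream_of: dict[str, list[str]]) -> set[str]:
    """Return the data product's own assets plus every asset upstream of them.

    `upstream_of` maps each asset to the assets it reads from directly.
    Breadth-first traversal; the `seen` set avoids revisiting shared parents
    (and protects against cycles).
    """
    seen = set(product_assets)
    queue = deque(product_assets)
    while queue:
        asset = queue.popleft()
        for parent in upstream_of.get(asset, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def incidents_for_product(product_assets, upstream_of, incidents):
    """Keep only incidents on assets that are part of, or feed into, the data product."""
    relevant = upstream_closure(product_assets, upstream_of)
    return [inc for inc in incidents if inc["asset"] in relevant]


# Hypothetical lineage: churn dashboard <- features <- staging <- raw source tables.
lineage = {
    "churn_dashboard": ["churn_features"],
    "churn_features": ["stg_support_tickets", "stg_orders"],
    "stg_support_tickets": ["raw.support_tickets"],
    "stg_orders": ["raw.orders"],
    "marketing_report": ["stg_campaigns"],
}
incidents = [
    {"asset": "raw.orders", "issue": "freshness: no new rows in 26h"},
    {"asset": "stg_campaigns", "issue": "schema change"},
]

churn_product = {"churn_dashboard"}
for inc in incidents_for_product(churn_product, lineage, incidents):
    print(inc["asset"], "->", inc["issue"])
```

In this toy graph, only the `raw.orders` incident is reported for the churn product; the schema change on `stg_campaigns` feeds only the marketing report, so it drops out. That filtering is the prioritization benefit described above: issues are ranked by which products they can actually reach.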
The data product dashboard lets you do that: you can look at the data product and understand whether it's healthy, and if it's not, you can communicate that to the right stakeholders, work to solve it, and update them on status. So it's really designed to help teams both prioritize and build trust around the data products they're creating.

Yeah, and when I got to take a look at it, what I thought was neat is that you're able to calculate importance, giving each asset a metric based on how often it's used, how many different places it's used, and things like that. That gives people visibility they just can't get otherwise, because where we're seeing people build data products, they're actually building data products on top of data products. You get these hierarchical, interwoven data products, so certain tables get really hammered by a certain set of products, and if some change happens in that exact table, it can be catastrophic for a number of different products. From what I can tell, that's what you're really focused on: helping people understand that.

Yeah, absolutely. If you're a software engineer building a production system, the data you create is going to be used downstream by a bunch of different data products, and most of the time you're oblivious to it. You don't know whether it's being used at all, and even if it is, who's using it and for what purpose. The ability to map a change in a data set created by a software application to the critical assets it affects, and to show how, gives you a much better understanding of what you need to do about it and how to prioritize issues.
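Rob's description of an importance metric, based on how often an asset is used and how many places use it, can be approximated with a simple score. The sketch below is a hypothetical illustration, not Monte Carlo's actual formula: it assumes per-table query-read counts are available and adds a weighted count of transitive downstream dependents taken from the lineage graph. The weight of 10, the read counts, and all function and asset names are made up for the example.

```python
from collections import defaultdict, deque


def invert(upstream_of: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn 'asset -> what it reads from' into 'asset -> what reads from it'."""
    downstream_of = defaultdict(list)
    for child, parents in upstream_of.items():
        for parent in parents:
            downstream_of[parent].append(child)
    return dict(downstream_of)


def downstream_reach(asset: str, downstream_of: dict[str, list[str]]) -> int:
    """Count distinct assets that depend on `asset`, directly or transitively."""
    seen = {asset}  # seeded with the asset itself so a cycle can't count it
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for child in downstream_of.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) - 1  # exclude the asset itself


def importance(asset, upstream_of, reads_last_30d, reach_weight=10):
    """Hypothetical score: recent query reads plus a weighted count of dependents."""
    reach = downstream_reach(asset, invert(upstream_of))
    return reads_last_30d.get(asset, 0) + reach_weight * reach


# Hypothetical lineage in which one staging table feeds two separate products.
lineage = {
    "churn_dashboard": ["churn_features"],
    "churn_features": ["stg_customers"],
    "pricing_model": ["stg_customers", "stg_orders"],
    "stg_customers": ["raw.customers"],
    "stg_orders": ["raw.orders"],
}
reads = {"stg_customers": 120, "stg_orders": 40, "churn_features": 15}

tables = ["stg_customers", "stg_orders", "churn_features", "raw.customers"]
ranked = sorted(tables, key=lambda t: importance(t, lineage, reads), reverse=True)
print(ranked)
```

Here the shared table `stg_customers` ranks highest because it is both heavily read and feeds the churn dashboard and the pricing model, which is the "data products on top of data products" situation discussed in the conversation.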
The problem is that a lot of things are going to change, but knowing which ones truly matter is difficult, and having that visibility is something our customers absolutely love.

Yeah, and to your earlier point about being able to communicate and be transparent, the data product manager, the data engineers, and the data developers can all see this and understand how it all works together. To me, being able to see it visually was one of the best parts, because a picture is worth at least a thousand words, if not more. One of the things I loved was that it really spoke to me on that level, in how it brought those different personas together. I think there's still going to be a massive skills gap out there, and that's where this kind of tooling and observability comes in, much like the observability I've dealt with in the infrastructure space with Kubernetes, which involves very complex systems that are similarly interconnected and interwoven with service meshes and the like, except here it's data meshes and data warehouses. There's a good analogy there, because without that visibility, you don't want to be the last person to know the labels were wrong. You want to be the first person to know, before you run 10,000 of them and then have to throw them all out. I think that must be where your customers are really focusing: how do we make this transparent within the teams, and also help people understand, hey, hold this or revert it and go fix that before the end customer of the data product sees it?

Yeah, absolutely, it's a game changer.
It changes the whole conversation in the company, from everybody sending angry Slack messages to the data team, to the data team actually understanding what's going on, proactively communicating, and building trust. Some of our customers are very diligent and run periodic surveys that measure different aspects of how people in the company perceive data, and all of the companies that do that have seen a massive increase in trust in data within a few months of implementing an observability tool, because it creates that communication. By the way, that's regardless of whether they have more or fewer issues. Just being able to own it, know about it, and communicate about it makes a big difference in how data is perceived and how valuable it becomes to the business.

Yeah, that totally makes sense. Again, this is one of those topics that's really key, and I think people are super interested because there are so many more applications that are data applications, or data-based applications, built on multiple data products, and those products could be features of the data apps. How they get consumed, to your point, is one of the things that new data product manager role is really focused on: tying it back to the business and understanding that, hey, this data product has this revenue associated with it, because that data product manager's butt is on the line. Are you seeing that changing in your customers as well?

Yeah, absolutely. I think data teams aspire to transition from fulfilling tickets and requests and doing ad hoc analysis to a place where they can provide products that are usable, consistent, and consumable.
Everybody loves building rather than reacting to requests, and they want to truly understand what they're building, why they're building it, and how it's impacting the business. Whether there's a person whose title is data product manager or someone informally doing that, it's a critical role for many of the teams we work with: connecting the business needs with the data and creating something that's sustainable, repeatable, and trusted.

Yeah, no, it totally makes sense. So if people want to learn more about this, what should they do?

We're at MonteCarloData.com, so please visit our website. I'm on LinkedIn if you want to reach out, and I'd love to hear from you.

Awesome. Well, thank you, Lior, for coming on, and I really appreciate you helping bring this together. We're going to be talking about data products here on theCUBE quite a bit, because it's one of the places we really want to dig in and understand; there are a lot of different opinions about what a data product is and how it's really handled. So thank you, and keep it right here with theCUBE and the Analyst Angle, and we'll bring you more information. Thank you, and have a good day.

Thanks, Rob.