Hello and welcome to theCUBE and the AnalystANGLE. We're here to have a CUBE Conversation. We're going to talk about data products: what they are, what they're going to be, and what people and customers are really talking about. I'm so excited to be joined today by Shinji Kim, the founder and CEO of Select Star. Thank you for coming on. You're a CUBE alumnus; you've been on at least twice before, so we're excited to have you back today, especially on this topic, because I think you and I have seen multiple different parts of what a data product is, how the modern data stack is evolving, and what people should be looking for as they start to have these conversations about data products within their organizations. That's where I want to kick off, because we may not always see eye to eye on things like this, especially on what a data product is. So why don't you give your perspective on what a data product is, and then we'll see how closely you and I agree on that.

Sure. Well, first of all, thanks for having me here. I'm excited to be on theCUBE again. When we met last time and talked about data products, what we really talked about was the scope of the definition. I have noticed that some people say "data product" even when they are referring to a data set, which I disagree with. As someone who has a background in software engineering and product management, I believe that if you are calling something a product, it has to have some type of job to be done, a purpose that it's built for. Most of the time, data sets primarily reflect a state or product, or could be an entity or fact. At the same time, I believe a product needs to have certain features or attributes, different aspects that are there for a specific purpose.
So when we talk about data products, my definition is one level higher than just the data sets. It's usually something built on the data sets: a set of data is either part of it or is baked into a model, or it could be a model itself. Most of the time, I feel like a lot of data products are things like your recommendation model, fraud detection model, or personalization model, designed for a specific purpose. And because it is like a product, it can be used for different features and purposes, and it can be augmented, iterated on, and applied in different places. So that's my definition.

I think that's a great way to look at it. Being a product guy, a product person, myself, I look at it as: you have to be able to do something with it. There has to be some value you get out of that product, something I can achieve with it. And then, to your point about going up to the next level, is that a data product feature or is that a data product itself? Does it stand alone? Some people say that once you get to visualization, maybe then it's a data product. I think visualization is potentially a way to consume the product. I could argue both sides of that fence. But where do you come down on what is usable? Because if it's a feature, like you said, a fraud detection model, let's sit there. Fraud detection, I think, is pretty good as a product, because it tells you whether there's fraud or not, and it's usable and consumable. Then there are things like recommendation engines that could theoretically be a product: hey, it gives me a recommendation. But is that a product, or is that part of a product, part of a site that I sell something on?

Sure. I mean, that's why I say this is a data product. It's something that can be baked into a feature, like at the, you know, Furorovia website, or it could be part of the app.
So I feel like a lot of data products have a lot of applications they can get attached to. And because of that, the data product itself may not be a product feature, but it powers the product feature, and that's why we call it, I guess, a data product.

Yeah, and I think that's why I keep going back and forth between data product and data product feature. I think the word "feature" seems to be missing in our hierarchy of-

That's true. And also, when we say "feature," that is a very ambiguous word, because we use "feature" for machine learning models too, as in which features are we using, right? So let's not get confused about that. But from the product development perspective, the other aspect of data products is data applications, because I think that is another type of work that can be considered a data product. If it's a very simple application, we call it a CRUD application, right? The user enters all the data, and the application gives you back the data that has been entered by all the different types of users. A data application is, I guess, by default a CRUD application, but it can have more than that. It will analyze, or run some models on top of, the data that's already been entered and then give you different results. So I think when data products are being utilized, they usually power data applications.

Yeah, I love that. I love that we're saying that as we build data products, you're really building a data app, and the data app then gets exposed in some way, shape, or form to whoever or whatever the end user may be.
And I think it's still an evolving hierarchy, because here's the one that always throws a wrench at me. I say, okay, I built this data product, and then somebody goes, well, I took a subset of your data product, I took another complete data product, and I joined them together to make a third data product that sits higher up in the hierarchy, built on top of two different data products. Now, does that mean I built a super data product? Or is it like, I took these features of these data products and started them?

I would say that rather than a hierarchy, it's more of a decoupled way of looking at it, because you can always take different components out of a data product and make other products, just like in software you can take different modules and create a new application. The data application, I feel, is also not necessarily a subset or a superset of the others, but more or less decoupled. Data applications will always have data sets. Does it utilize or work with a data product? I think that's the question here. And I think it does. When I think about data apps, this is the whole Retool space and a lot of other internal applications that are built on top of databases. And in a way, it's where a lot of the data on Snowflake is also moving, with the Snowflake acquisition of Streamlit.

Yeah, I think that's very interesting. I was at a dbt community meetup, and we were talking about data products. Some of the end users of dbt were there, and they were talking about how they're building data products within their companies. And they started to get into, to your point, actually transforming that data once you have it, and where it became a data product.
To your point earlier, it's not just a data set; it's "I transformed the data into a data product." In their case, they were using dbt Core, but they transformed it and modeled it in a certain way.

Actually, we're now getting to the notion of data as a product: treating data as a product, and equipping the data team with a product management mindset and software engineering principles, so that you are maintaining the inheritance or the structure of the data, you are defining requirements for data models, things like that. I mean, you were talking to end users of dbt.

Yeah, yeah. One of the companies, and I think they're out there as a reference, was Rapid7, who are developing applications that they then sell to their end-user customers, and their data team happens to use dbt Core under the hood. What was interesting about it, and I think it goes back to treating it as a product, is having data product managers as part of the team and really starting to say: here's working backwards from the customer, here are the requirements that I have, here's what they need, and we have to figure out the best way to present that data to them. And that customer is an internal customer, who could be the UI team or the analysis team.

I mean, in a way, that's the question of what the business needs are, and hence how the data model should be designed or how the data should be represented for the decision we want to drive or the analysis we want to do.
That, I think, has always been the job of data analysts and data teams overall. But I do see a lot more data product managers and data platform product managers, and this, I think, is adding more of a center-of-excellence type of layer, where the data product managers can see not just one request but multiple requests, and hence can take a lot of those into consideration in data modeling, and also in how they maintain the data models so that they can reuse the ones they already have, with a good data contract around how they can be used, added to, or changed. I think that's where data as a product really comes in.

Yeah, that makes so much sense to me. It's treating it as a product. In fact, I was down in New York City two weeks ago meeting with a number of different data teams and some venture capital folks who were investing in, I guess you could say, startups. It was sponsored by Alex Hutchins, who runs a recruiting firm called DataWorks; they specialize in recruiting data teams into corporations. A lot of the teams we were talking with were really trying to wrap their hands around how to get a product-focused mentality into their companies, because I think they see it as a challenge. I'm a product manager at heart, and I get it, I understand how you formulate the requirements, but that doesn't come naturally to data teams. How do you see people addressing it? I'm seeing people recruit in people who are more like me, who have been product managers, but maybe not even in the domain that company is working in.

Got it. Yeah, if I try to think about the history or the experience of other data product managers that I've worked with in the past, and obviously these are mostly just starting out, because I would say it's a fairly new role in a lot of companies.
I think a lot of them do come from more of a data analyst background, and this is something that I find with a lot of data leaders. They initially start in the position of supporting the business by presenting, analyzing, and helping make business decisions with data. But as they do more of that work, they try to optimize and make their own internal processes better. And I think a significant step towards that is to say that we should treat data as a product. Hence, we should have a way to manage data better. We should catalog all of it. We should put contracts in place, and whenever there is a request for data, these are the processes it should go through. If there will be a transformation layer, or if the data lands in this database, it should have this SLA, all of that stuff. I think that comes from a lot of people who are already ingrained in data as data engineers or data analysts, but who are starting to really think about, or are being exposed to, all the different ways the data needs to be used. So yeah, I still think it's something that a lot of data analysts and data engineers can move into. It does require branching out: talking to more end users and learning more domain-level knowledge, beyond just optimizing SQL queries.

No, I think it is one of those questions: how do you train people up into that product management mindset, versus how do you bring people in and teach them the data aspect? Way back in the day, I was at Manulife Financial, and John Hancock Insurance in the US, in internal IT. This was before we had this. I worked with the actuarial group, and we had a grid of computers where we would do the numbers and come up with all the science behind it and all that fun stuff, before it was called AI.
What I needed was to be able to help them build out that grid and build out the capabilities to run their algorithms and actually come up with those numbers. But me knowing the business was more important than me knowing the data science behind it at the time. So I think it's going to be interesting to see how this grows out, because data teams really need to have that business focus. Some do and some don't. Like in every software group, it varies who wants to be into the business and who doesn't.

Okay. Yeah. I think a lot of data people do, especially people who are facing the stakeholders and providing those insights and analysis. I also think it is important for more business stakeholders and domain stakeholders to become more familiar with the data models and how their analysis gets put together. I think that's the only way to really reduce the gap so that the data team and the business team can collaborate well.

I agree. I agree. So what's new with Select Star? What's going on? Everything good? Rowan.

Yeah, it's good. It's been busy. I can't believe it's already a week ago, but we just came back from Snowflake Summit and Databricks AI Summit, which were really amazing shows. We launched two major new features. The first one is AI documentation. We used to have automated documentation, so AI is additive on top of that. Our original automated documentation was about filling in the descriptions of columns and tables whenever we see duplicated tables, or whenever we recognize data that's been transferred as is, because that's one of the details our column-level lineage tracks. So you can basically write documentation once, and then we can apply it everywhere else.
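The "write once, apply everywhere" idea described here can be illustrated with a minimal Python sketch. It is not Select Star's implementation; the function name `propagate_descriptions`, the column identifiers, and the edge list are all hypothetical, and it assumes lineage is available as (source, target) pairs for columns copied as is.

```python
from collections import deque


def propagate_descriptions(descriptions, passthrough_edges):
    """Copy a documented column's description to undocumented columns
    reachable through as-is (unchanged) lineage edges.

    descriptions: dict mapping column id -> human-written description
    passthrough_edges: iterable of (source_column, target_column) pairs
    Existing descriptions are never overwritten.
    """
    downstream = {}
    for src, dst in passthrough_edges:
        downstream.setdefault(src, []).append(dst)

    result = dict(descriptions)
    queue = deque(descriptions)  # start from every documented column
    while queue:
        col = queue.popleft()
        for nxt in downstream.get(col, []):
            if nxt not in result:  # only fill in missing documentation
                result[nxt] = result[col]
                queue.append(nxt)
    return result


# Hypothetical example: one documented raw column, copied through two layers.
descriptions = {
    "raw.orders.customer_id": "Unique identifier of the customer who placed the order.",
}
edges = [
    ("raw.orders.customer_id", "staging.orders.customer_id"),
    ("staging.orders.customer_id", "mart.daily_orders.customer_id"),
]
docs = propagate_descriptions(descriptions, edges)
```

After the call, `docs` holds the same description for all three columns, while any column that already had its own description would have kept it.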
So that propagation is what we had, but it still requires some effort from data teams to write some documentation in order for us to propagate it. Now, with the amazing developments in LLMs, we wanted to try out what it would be like to have AI write documentation from scratch, if there were no human documentation whatsoever. What context and prompts can we give in order to get an 80%-plus guesstimate of the column documentation? Because in a way, a lot of table and column names, which are required, unlike the comments or descriptions, are already descriptive enough. And given that we process so much metadata, one part that we saw as key is the SQL query that creates those tables, because now, with dbt and the whole transformation layer, you can actually see how each table, view, or reporting layer is getting generated. So it was really cool to see how close we can get even when there is no prior documentation whatsoever. And now we can help our customers really kick-start their data documentation and data dictionary. It's really more a matter of reviewing the documentation that AI has written for you, and then, as you start augmenting it, you can make the rest of your documentation much richer and more accurate.

Yeah, we see that with AI all the time, anywhere LLMs are being applied, especially with a specific or segmented language model type of deal: it's always enhancing what people are doing. So how close can I get, and then augment the knowledge? That makes total sense.

Yeah, so this is just a baby step towards adding more features and functionality using AI, but I'm really excited to continue embedding AI into other places in Select Star as well. The other feature that we released is Snowflake cost analysis.
A lot of our customers, especially on the enterprise side, are huge customers of Snowflake, and because we manage all of their metadata, one of the things we have always gotten asked about is what the usage of the data means for them in terms of spending on their cloud data warehouses. So we felt this was a very interesting angle. With lineage and popularity, we can tell you, based on the dependencies, that this is a table that's being queried a lot, but it's queried by this Tableau dashboard, and we can let you know whether that Tableau dashboard is actually being viewed by your business stakeholders. Because if it's not, then what's the point of continuing to update the table? Similarly, from the cost perspective, we wanted to show that relationship: this table is being utilized a lot, and hence it is definitely worth this much cost. But here is a table that is being queried from different places and is very expensive. Is it actually being utilized? In terms of popularity, are there a lot of people using it, or is it just a few people writing very bad queries? Being able to see those types of anomalies is one part. And then, in general, for companies that are looking to optimize and manage their cloud data warehouse costs better, the first step is just slicing and dicing the data, segmenting it by user, by team, by warehouse, by dashboard, table, or query. It's basic-level segmentation, but when you start doing that, you can start seeing which top queries are driving most of the costs, and what time of day or day of the week is causing this, on which warehouse. These types of insights are, I would say, also first steps, but they are very interesting to a lot of our customers, so we're very excited to roll that out, and we are extending it to the dashboard level.
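As a rough illustration of the segmentation and anomaly idea described above, here is a minimal Python sketch. The record fields (`user`, `team`, `warehouse`, `table`, `credits`), the view counts, and the thresholds are hypothetical sample data, not Select Star's or Snowflake's actual schema.

```python
from collections import defaultdict


def cost_by(records, key):
    """Total credits per value of `key`, most expensive first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["credits"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def expensive_but_unused(records, table_views, min_credits, max_views):
    """Tables whose total cost is high while downstream usage is low."""
    return [
        table
        for table, credits in cost_by(records, "table")
        if credits >= min_credits and table_views.get(table, 0) <= max_views
    ]


# Hypothetical query log: one row per query, with the credits it consumed.
query_log = [
    {"user": "ana", "team": "growth", "warehouse": "BI_WH",
     "table": "mart.daily_orders", "credits": 4.0},
    {"user": "ben", "team": "growth", "warehouse": "BI_WH",
     "table": "mart.daily_orders", "credits": 1.5},
    {"user": "cho", "team": "finance", "warehouse": "ETL_WH",
     "table": "mart.legacy_rollup", "credits": 9.0},
    {"user": "ana", "team": "growth", "warehouse": "ETL_WH",
     "table": "mart.legacy_rollup", "credits": 0.5},
]

# Hypothetical popularity signal: dashboard views that depend on each table.
table_views = {"mart.daily_orders": 120, "mart.legacy_rollup": 2}

per_user = cost_by(query_log, "user")
per_warehouse = cost_by(query_log, "warehouse")
flagged = expensive_but_unused(query_log, table_views, min_credits=5.0, max_views=5)
```

The same `cost_by` helper can be pointed at `team` or `table`, which is the basic slicing by dimension mentioned in the conversation; `flagged` lists tables that cost a lot but are barely viewed downstream.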
The cost side is also interesting because Snowflake themselves have also released a new feature. They call it SKI or something like that, or SPI, the Snowflake Performance Index, right? For us, it's really about showing cost not just at the table level but also by application. How much is this application costing you on Snowflake? How much is this specific dashboard costing you on Snowflake? Things like that.

No, I think that's fantastic. You also hit on another topic that we'll have to circle back to another time, data contracts, because that's a whole other one; we will probably spend another episode on that. But I really appreciate you coming on and sharing your perspective. I think it's going to be a continued conversation around data products. Thank you very much, Shinji, for joining me, and thank you all for watching and joining us here on theCUBE, where we're breaking out the signal from the noise and really bringing you what is going on, especially here on the AnalystANGLE. Thank you, take care.