Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DATAVERSITY. We would like to thank you for joining this DATAVERSITY webinar, "The ABCs of Treating Data as Products," sponsored today by data.world. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A, or if you like to tweet, we encourage you to share questions via Twitter using the hashtag #DATAVERSITY. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just a note: Zoom defaults the chat to send to just the panelists, but you may absolutely change it to network with everyone. To find the Q&A or the chat panels, you may click those icons found in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.
Now let me introduce our speaker for today, Tim Gasper. Tim is the VP of product at data.world and co-host of the web show and podcast Catalog & Cocktails. He previously served as director of product at Janrain, head of product and marketing at Bitfusion, and VP of product and global offerings manager at Infochimps. Tim has over 13 years of product management and product marketing experience and is a writer and speaker on entrepreneurship, lean startup methodology, analytics, and AI. And with that, I will give the floor to Tim to get today's webinar started. Hello and welcome.
Thank you so much, Shannon, and hey everybody from all over. I see folks chiming into the chat with their locations. This is so cool. I always love working with you, Shannon, and DATAVERSITY, and getting to look at all this awesome content and meet all these great people.
And today we have a really exciting topic, which I'm very happy to present to you all. It is inspired by a lot of the trends and excitement around data mesh, but specifically zooming into one very important aspect of it, and my favorite aspect of it, which is data products. Folks like yourselves, and really all across the data space, are trying to understand what's up with data products. How do I benefit from data product thinking and literal data product creation? What is the process by which we go about it? When should I do it and when should I not? So we're going to hit all sorts of interesting topics today related to the data product concept, and specifically a framework that my co-host at Catalog & Cocktails and I came up with: the data product ABCs, which is a framework to think about how to manage and build those data products. So I'm looking forward to going through that with you today, and without further ado I'll go ahead and jump in.
So first of all, who am I to talk? I appreciate the introduction, Shannon. Yes, I've been at a lot of different companies, always in the data, AI, and analytics spaces. I've been working and writing on data, and you can find me on Twitter at timgasper, on LinkedIn at in/timgasper, and also at timgasper.com. I am the co-host of a really awesome show called Catalog & Cocktails. We do it live every Wednesday and we also push out to all of the popular social platforms and podcast platforms. We're really excited to be in the top 2.5% of global podcast listenership, which is a very oddly specific statistic, but we're really proud of it. We would love to have any of you join us, and please like and subscribe.
We talk about anything related to data management and analytics, and the whole focus is honest, no-BS, non-salesy conversation. We're not pushing specific technologies; we're just asking hard questions about governance, DataOps, knowledge, analytics, AI, and much more. So check it out. We're at over 100 episodes now; I think we just did episode 105 last week, so pretty awesome. Speaking of awesome shows, you should also check out all the great content coming out of DATAVERSITY. There's actually an awesome podcast that Shannon has been putting together around careers in data. Juan and I were super lucky and honored to be among the guests that came onto that show, so definitely check out some of those links on the left-hand side if you're curious, and not just for us but for all the other great guests on that podcast. So thank you, Shannon, for putting together some really great content there, giving people a view into not just the different careers you can have in data but also interesting things about data through all those great chats.
So, who is data.world, and why are we talking to you all today? data.world is the enterprise data catalog for the modern data stack, and we're powered by a knowledge graph. What that means is that we were born in the cloud, and everything that you put into data.world's catalog and governance platform lives in a connected network. If you care a lot about that and you're thinking, "Oh my gosh, I want to take advantage of semantics, the same technology that the World Wide Web is built on," awesome. For those of you who are thinking, "You know what, that's the engine in the Ferrari, and I just like to drive the Ferrari, press the gas pedal, and hear the engine go vroom," that's cool too.
And ultimately what matters is that driving experience, which is really what we think of as adoption. So we're really focused on making sure that people can find, understand, trust, and access data at scale. We have over 1.8 million users using our catalog. We're really focused on fast iterations, we do over 1,000 releases a year, and we're not just a metadata catalog: we're also able to provide federated data access through our platform. So thank you to all of our customers. You can see some on the right-hand side there, and there are many more; these are just the ones that let us use their logos, and we really appreciate all the kudos that we get from them. Also, a quick shout-out to Snowflake. We're a very deep technology partner with Snowflake: we are a Snowflake Ready technology partner at the Premier level, as well as Powered by Snowflake. So we're very close partners with them, as well as with a variety of other players in the space. We think it's important for a catalog to work with many different technologies, but especially with your data warehouse.
So, let's jump into the content here. This whole idea of data products has been around for a while now, but it has really picked up steam as the data mesh movement has become all the buzz. Data mesh really became popular, let's say, about a year and a half ago; that's when it started to emerge onto the scene, and now it's just blaring excitement slash hype. For those of you that aren't as familiar with data mesh, it's really all about how we scale our data investments, not just from a technology perspective but especially from a social perspective. It's a socio-technical paradigm. The problem is that there are monolithic data infrastructures, and the issue isn't necessarily that we can't scale the technology; it's that these systems don't scale socially across a larger enterprise.
And in general, we're treating data too much like an afterthought. When people say things like "data exhaust," or even "data is the new oil," which sounds like something that is pulled from the ground and consumed, that is the previous paradigm of thinking. Going forward, we want to tap into the untapped value of data, turn it into a reusable and sustainable asset, and overcome the barriers that overly centralized platforms and processes put in the way of being effective with data. The quintessential example of data management gone wrong is this: you've got a centralized governance team and a centralized data engineering team. All things go into that group and all things come out of that group, and ultimately what happens is that you wait for a very long time (those data bread lines) and you try to go around them. So you get shadow data IT sprouting up all over: "What, that division bought Snowflake? Oh, but we already consolidated around Databricks." You get all sorts of problems like that, and it's out of this morass around trying to scale data both socially and technically that data mesh became exciting. It's borderline religion now, and that gives me kind of a chuckle, but I'm just glad we're having the conversation.
When you look at the big data movement or the AI movement, a lot of these previous waves of hype and excitement have been very technology-centric. I'm very excited that we're finally getting hyped about something that is not just about technology but also about the social challenges around data that we need to solve in our organizations. So kudos to all of us for getting excited about something that's about people and process, not just technology.
So, just double-clicking a little bit more on data mesh. There are four main components, or tenets, of data mesh that try to solve that problem of monolithic technology and social structure in our enterprises.
The first is domain ownership: empowering the folks across the business, wherever they may be, to create and steward the data, because they are the experts. They are the ones who are closest to it and therefore the best placed to manage it. So it's really about democratizing and federating the management of data to the different domains in your organization. Depending on what industry you're in and how you think about your business, domains might be your business units, your functional areas, or your products. There are a lot of different ways to think about domains, and that's a webinar all its own. But domains are essentially how you think about and organize your business based on areas of expertise and management. So the first tenet of data mesh is domain ownership.
The second tenet is data as a product. That is about managing data with end users and end stakeholders in mind, really thinking about the user experience of data and the surface area of data.
The third piece of data mesh is self-serve data infrastructure as a platform. That's probably the most technology-centric piece of this whole cycle. If you're going to spread responsibility for the data to all the different parts of the organization (the domain ownership), and if you're going to ask them to treat and manage data, and run its life cycle, more like a product, then you had better provide them the tools and the self-service infrastructure to be able to do that.
These become the sort of modern platforms that, if you're in an organization that has a data platform team or an IT organization focused on data, are usually that team's mandate: to choose these different tools and provide them to the broader organization, so that when people are building out pipelines or trying to solve certain analytical or AI use cases, they've got the right tools for the job. The things that often sit in this category are a catalog or governance solution; your data integration, ETL, or collection solutions; data storage and compute (your Snowflakes, your Databricks, your Azure Synapse, your Google BigQuery, all that type of thing); and transformation and modeling tools. If you're leveraging something like dbt as a more modern, SQL-oriented modeling environment for your data warehouse, then dbt might fit in that layer, or maybe you're using more of an ETL platform for that type of work. Then there are analytics and AI tools, and things like data observability and data quality. These all fit into the self-serve data infrastructure as a platform bucket, and as you're building out data products, some combination of these things is going to help you.
And finally, the fourth piece of data mesh, perhaps the least well understood, and therefore one that companies like us at data.world have been leaning into to try to create more clarity, is federated computational governance. If you're doing the domain ownership, the data as a product, and the self-serve platform, then you need to figure out how to scale the governance aspect of this, which is the safety, the interoperability, and the enablement around all of this data.
You have to find the right balance between centralization and decentralization to make that work. Certain things, such as policies, principles, and interoperability standards, should probably be centrally managed. If you have a steward council or a governance council, or a chief compliance officer in partnership with a head of governance, these are the types of folks that might provide some of that centralized organization. But for everything else, you really try to empower the different domains to manage governance locally, in a way that makes sense to them. That's where the federated aspect comes from. You try not to have all governance be centrally managed. You create the core language and the core rules of the road, things like interoperability standards, and then the rest, for example creating data testing rules, should be democratized to the domains instead of having one central office create all the data tests for the entire organization. That's just one example.
Now, one last thing on federated computational governance. Why does the word "computational" show up there? That's essentially a nod to the idea that this should be technology-enabled if possible. If you're using an identity management platform, an access management platform, or things like AWS IAM, these are all examples of ways to manage the rules around access, visibility, and capability programmatically and through policy. Because they're policy-driven and code-driven, they can be computed against. So the whole point of that word is that, as much as possible, governance should be automated, not manual. These four things come together to really enable data mesh.
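To make the "computational" point concrete, here is a minimal sketch, in Python, of an access rule written as data that code can evaluate. Everything in it is an assumption for illustration: the `AccessPolicy` class, the `is_access_allowed` function, and the role and purpose names are hypothetical and do not correspond to AWS IAM or any other real product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """A hypothetical access rule, written as data so code can evaluate it."""
    data_product: str
    allowed_roles: frozenset
    allowed_purposes: frozenset


def is_access_allowed(policies, data_product, role, purpose):
    """Centrally agreed rule of the road: no policy means no access."""
    policy = policies.get(data_product)
    if policy is None:
        return False
    return role in policy.allowed_roles and purpose in policy.allowed_purposes


# Each domain registers its own policies (names and values are illustrative).
POLICIES = {
    "orders": AccessPolicy(
        data_product="orders",
        allowed_roles=frozenset({"analyst", "data_engineer"}),
        allowed_purposes=frozenset({"operational_analytics"}),
    ),
}

if __name__ == "__main__":
    print(is_access_allowed(POLICIES, "orders", "analyst", "operational_analytics"))
    print(is_access_allowed(POLICIES, "orders", "analyst", "marketing_targeting"))
```

In this sketch the deny-by-default rule plays the part of the centrally managed standard, while the contents of `POLICIES` are what each domain would maintain for itself. The first call is allowed and the second is denied, with no person in the loop.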
And today, the piece that we're especially going to zoom into is data as a product, which is that idea of keeping end users and stakeholders in mind and treating your data with some of the best practices of product management, so you can get the maximum benefit and value out of your data.
We're actually going to start with the takeaway first. Hopefully what you take away from this whole talk is that consuming data to solve critical business problems should be as easy as buying a product on your favorite e-commerce platform. Let's take Amazon, for example, and let's say you're searching for a water bottle. There are a lot of options out there, and you need to be able to figure out which one's the best. To determine a good product you look at a lot of factors: the reviews, the price, the images, the documentation around it. When you click on those different things, you get to see all the information about that product. There's also the importance of search and discovery in being able to find these products. And on top of all of that, there's a lot of duplication: there are a lot of different water bottles available on a place like Amazon, over 10,000 results, as you see at the top left there. That starts to go into life cycle, which is that this is a marketplace, and may the best product win. The ones that don't win should probably go away. They should fade away and we shouldn't be maintaining them anymore; those vendors should not exist.
So that's a bit of an analogy for how data product management can be a different framework for working with data in your organization, and more specifically for the end-user experience. The end-user experience of a data product should be like a data shopping experience or an analytics shopping experience.
So let's keep that in mind as we go through all of this. What are the data product ABCs? We've been working with a lot of customers over at data.world, and Juan and I have also talked to over 100 guests now on Catalog & Cocktails: some vendors, but also a lot of people at various companies, like WPP, like Rapid Six, and like many others, that are building out data platforms and data capabilities at varying scales and then bringing those capabilities to market. We have found some trends around where data products succeed and where they hit stumbling blocks and start to fail. As we started to write these things down, words like accountability, expectations, knowledge, and downstream all started to show up, and we began to see a bit of a pattern. So we leaned in and fully embraced it, and it became the data product ABCs, or really ABCDE. If you're curious about that QR code there, it'll take you to some collateral with more detail on the data product ABCs. I think it's either a white paper or some documentation, so feel free to check that out.
Basically, the data product ABCs are accountability, boundaries, contracts and expectations, downstream consumers, and explicit knowledge, and we see this as a really valuable framework for how to think about and build out effective data products. The caveat that Juan and I always give when we go through the data product ABCs is that this is an agile framework. So please poke holes in it, ask questions, and contribute ideas to it, because our goal is not for this framework to be fixed in time but for it to evolve.
So let us know what your thoughts and questions are, because we want to incorporate them. Several folks have already provided ideas that we've incorporated. You won't know that, because you'll just see the latest framework, but a lot of these came out of collaboration with the broader data and governance community.
So let's start off with the first one. The first one is accountability, and it's really important because accountability is about who. It's about the people. Behind all the data, all the analytics, and all the projects in your organization, at the end of it all or the beginning of it all, there are people: people who made those decisions, people who have those ideas, people that get called when things break and go wrong. Those people are at the center of everything that's going on from a data perspective, and that needs to be captured and understood. A data product can't exist if it doesn't have some sense of accountability. Think back to the Amazon analogy: every one of those products has a vendor behind it, and those vendors have people working at them. So accountability has to be built in. When you order one of those products, a person or a machine is going to fulfill that in the warehouse, then it's going to get loaded onto a truck, and a person is going to drive it over to your house. There are people involved in the entire process, and the same is true with data.
So some of the key questions that you should be asking when you think about accountability are: who is the owner? Or, if you don't like the word owner, there are other great terms out there, like trustee, or ambassador, which is one that Laura Madsen pushes. She's the person who wrote Disrupting Data Governance, a great book by the way, if you're interested.
Who is the owner, trustee, or ambassador that is responsible for the data? Who does the buck stop with? That's usually not enough. Usually there's more than one who, and if you stop with the owner you may not be going far enough. You should ask some other questions that have a who attached to them. Who defines the requirements? For this data product (let's pick order data, for example: people are ordering things), who decides that we should add new columns to that table? Who decides that new use cases should be handled? Is that person the same as the owner, or is it actually somebody else? Is the owner really more the documentarian, while some other person is actually evolving that table, or multiple tables, or that API? A data product could be an API, a streaming queue, a bus, a data set in a data warehouse, or a lot of different things.
Who fixes the product when it breaks? Who does it go to? Who's on call? Is the person who's on call different from the person who fixes it when it breaks? Do you have an internal help desk where folks handle the issue before a data engineer gets assigned? And then who's in trouble if the data is mishandled? What is the liability scheme or the compliance scheme, and who's that person?
There are probably a number of other questions that you can be asking inside your organization around accountability. One recommendation that I would have is to consider the least number of people that you could list while still being complete. It's probably around four or five people that are in play here. There's usually a more technical person, probably the person who's fixing it when it breaks. There's a more business-oriented person who's maybe the SME or the one defining the requirements.
There's usually a person from a liability standpoint, and then there's usually even a fourth person involved, maybe a top user or the person you ask about how to use the data. It's important to capture that. That's the accountability structure, and it's important to the data product.
When you think about other roles in your organization that may be related to this, here are some examples of accountability roles that might be in play. From a strategic perspective, there might be certain executive owners: an executive steward or an executive champion, a program manager, a technical architect, or a project manager who has some accountability around this data. There may be certain consumers that have accountability over the use cases or the analysis of the data. There might be some folks that are more on the producer side, whether that's a data steward, a data engineer, or a data product manager. This just gives you some ideas of the roles, and again, going back to that razor: what is the minimum number of folks that you could list as accountable for a product? Identify the smallest number that is still complete, compelling, and useful.
The second piece of this framework is boundaries, and just to really emphasize the point, I put all my bullets in a box and I also put a cat in a box, so I hope you enjoy. Some questions here are: What is the data? What are the columns? Where will it live? Is it in Snowflake, in S3, in Databricks? And what isn't it? This is where we get a little bit into abstractions, but it's important because a product has a surface area. A product has features and capabilities, and it has some things that it doesn't do. And it's important to be clear.
So for example, if a data product is an API, but it's a SOAP API, not a REST API, then you've got to be clear on that. You have to be detailed on what it is and what it isn't. If that data includes these seven columns but excludes these other three columns, that's what it is and what it isn't. And it's not just what's in the box, but also what the inputs and outputs are. What are the interfaces? Is it accessed through JDBC? Is it accessed through an API? Do you consume it as a stream? Are you able to download it? These are all important aspects of the boundaries of that product. And finally, how do you balance the roadmap against some of the other organizational priorities and considerations?
These are all aspects of how to think about the boundaries and the present day, the now, of a given data product. Think of it as building that box, and asking what holes or interfaces that box has. Or, if you want another analogy, think of it as a castle. The castle has a wall around it so that it's clear what it is and where it is, but it also has a drawbridge that goes across the moat, and maybe it has a few drawbridges. What are the walls, what is inside the castle, and what are the drawbridges that go across the moat? This is all about the boundaries of that data product. It's what gives it its definition.
The third piece is contracts and expectations. I'll go through the questions, and at the end I'll comment on the difference between a contract and an expectation. Examples of things that fall into this category are: what are the data constraints, definitions, and tests around the data? Is a phone number supposed to look a certain way for it to be considered a phone number, so that otherwise it's outside the boundaries of that?
Is there a certain test that's supposed to pass? Say there's a test that checks for null values: this column should never be null, it should never have any nulls in it, and it should always be 100% complete. Here are three examples of contracts and expectations. What are the service-level availability and the service-level objective? Is it supposed to be up 99% of the time, or 99.9% of the time? If it goes down, how fast can we expect it to come back online? What are the sharing agreements, or the consented uses and policies, around it? Am I only able to use this in a certain context because we have a contract with the other entity that we got this data from? I need to make sure that I'm following that agreement. So I might be allowed to use it for operational analytical purposes but not for marketing targeting purposes. That starts to get into consented purpose, which is a connected piece of consented uses.
There's a lot of privacy legislation out there now. I think Gartner says that even though about 25% of the world's population is covered by these laws today, by the end of 2024 it's actually going to be 75% of the world's population. So these modern privacy laws are expanding very rapidly. In the US alone, we've got a lot of conversations about this: there's CCPA, and other legislation has just been passed that applies within its own area of jurisdiction. You've got Canada with its own laws. It's becoming pretty clear now that we need to follow these even if you're geographically constrained in what you're doing. So we have to have the right purposes, and the data product can only be used for the use cases that comply with those consented purposes, which can get quite granular. It could go down to the row level in terms of how those consented purposes apply.
Then think about the performance and scale of the product in terms of the contracts and expectations. If it's an API, should I expect that 99.9% of the time that API will respond within 500 milliseconds, or not? And at what scale am I allowed to do things? Am I allowed to hit that API 10,000 times a minute, or only 100 times a minute? What about the maintainability details? What about the data quality and how it's being maintained? Knowing what efforts are being made around it can help me understand whether to trust this data. And what are the security constraints around it? Does it have to be encrypted? Are certain columns hashed? These are all things that might be true about that data.
And what's different between a contract and an expectation? Well, an expectation is not a firm commitment. An expectation says that in most cases you should expect it to work this way. So an example might be the 99% SLA we mentioned. Maybe that's an expectation: we're aiming for it to be available 99% of the time. A contract would be: if we break that, we're in breach. So a contract is a stronger commitment than an expectation is. For those of you that are keeping track of what's buzzalicious in the data community right now, there's a lot of talk around data contracts. Most of those conversations go pretty far in terms of the strictness of the data contract, almost programmatically enforcing data contracts. That's kind of an extreme.
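The distinction between a contract and an expectation can be sketched in code. The following is a minimal Python illustration under stated assumptions: the function names, the `ContractBreach` exception, and the phone number pattern are all hypothetical choices made for this example, not part of the data product ABCs or of any particular data contract tool. The contract check fails loudly, while the expectation check only reports what it found.

```python
import re


class ContractBreach(Exception):
    """Raised when a firm commitment (a contract) is violated."""


# Assumed phone format, for illustration only: an optional "+", then a digit,
# then 6 to 14 more characters that are digits, spaces, or hyphens.
PHONE_PATTERN = re.compile(r"^\+?[0-9][0-9 \-]{6,14}$")


def enforce_not_null(rows, column):
    """Contract: the column must never be null, so any null is a breach."""
    missing = sum(1 for row in rows if row.get(column) is None)
    if missing:
        raise ContractBreach(f"{column}: {missing} null value(s)")


def check_phone_format(rows, column="phone"):
    """Expectation: report values that do not look like phone numbers."""
    return [
        row[column]
        for row in rows
        if row.get(column) is not None and not PHONE_PATTERN.match(row[column])
    ]


def allowed_downtime_hours(availability, period_hours=365 * 24):
    """Hours of downtime a given availability target permits per period."""
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    rows = [
        {"user_id": 1, "phone": "+1 512-555-0100"},
        {"user_id": 2, "phone": "not-a-phone"},
    ]
    enforce_not_null(rows, "user_id")   # passes silently
    print(check_phone_format(rows))     # lists the values that look wrong
    print(allowed_downtime_hours(0.99), allowed_downtime_hours(0.999))
```

The last function puts numbers on the availability question: over a 365-day year, a 99% target permits roughly 87.6 hours of downtime, while 99.9% permits roughly 8.8 hours. Whether either figure is a contract or an expectation depends on what happens when it is missed.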
I think that it's better to think about contracts and expectations in a broader way, because that allows you to find the right balance. You don't want to put too many contracts on the people who are trying to create and manage your products. Think of contracts like handcuffs: if you put too many handcuffs on somebody, you're going to really slow them down. Or think of it like molasses. There's an agile balance here. You want to establish contracts where you really need something to be true (if you're building applications that are customer-facing, for example, you really need some strong contracts), but only as many as you need, and let the rest just be expectations.
All right, next: downstream consumers. Who are the current consumers of this data product? Who's the user, who's the customer? This is D because we're going in alphabetical order here, but really this is the most important part of a data product: who is the customer, who uses the product, and not just who uses it currently but who could use it. Connected to that is the question of what use cases have been considered for this particular data product, because that incorporates both the current consumers and the potential consumers. What is the value of this product? That can go even so far as monetization: does it have a monetary value? If not a monetary value, then what are the benefits of it, and how could it help those consumers in those different use cases? How is it going to evolve over time? Are you going to be adding new columns in the future? Is a new use case going to be supported? Are you going to add consents to it in the future? Are you going to add hashing to one of the columns' values so that it can be more widely available to the organization, whereas right now it's got PII in it and so you need to ask for permission first?
How is it going to evolve to provide more value to consumers over time? That's related to the roadmap. And what's the user experience of the data? This connects to some previous points as well. So, downstream consumers: a data product that has no consumer isn't a very good data product and probably shouldn't exist.
Explicit knowledge. This is really where you take everything that we just talked about and wrap it all together, and it is especially oriented around documentation. The more automatic documentation you can create, great, but manual documentation is important too. This is where you focus on the meaning of the data product and of the different ingredients or components that make it up. If it's data oriented, what is the schema? If it's graph oriented, what is the ontology? And likewise if it's conceptually oriented. How is it related to other data products? Is it conceptually related? Is it a duplicate, essentially a copy of another data set that has just been moved from one place to another? There are conceptual relationships like synonyms and antonyms, and more specific technical relationships, like whether this table can be joined with another. That's an important relationship that should be explicit, not implicit. And then, where's the documentation? Tell me how to use this data.
Not all products are created equal. If your organization has five or ten products, maybe you can manage them all with a high level of fidelity and investment. But if you have 100, 200, or 300 data products, that might be a pretty large surface area.
So consider which data products are most important to invest in, in more detail, across these five different areas, versus ones that are a little less critical. And think about life cycle: which data products can we actually retire? Going back to that marketplace example, thinking about the Amazon marketplace, products that are successful are going to continue to be successful, and those that are not successful should go away; they should be retired. So there's a little bit of a capitalistic aspect to data products.
So let's bring this all together. Here's an example. It's a little wordy, but I want to show it to you all in one go so you can see what this looks like. Let's say you create a data product around user data. The accountability may be that the product domain is responsible. The technical steward, for when something goes wrong with the data pipeline, is Alice. The business steward, for any questions about the meaning of the data, is Bob. And the manager, a member of that product domain who is gathering the requirements and managing the roadmap, is Charlene. So there's our accountability structure.
Boundaries: this data product is going to contain data about users from January 1, 2022 onwards. Users are defined as people who have activated their account. This data product is going to live in our cloud data warehouse. So these are some examples of boundary aspects.
Contracts and expectations: this data product will have a list of all the users. It will contain the unique internal ID, date created, and so on. All the data should be complete; in other words, there's no reason for missing data. This is the definitive number of users. This data product is near real time, with up to one hour of lag. You can see how some of these are articulated: some are not really a contract and are more of an expectation.
This can only be used for internal company purposes. If any of this data, including aggregates, needs to be shared outside the company, the data product manager needs to be consulted. So we're giving people some expectations around usage. And then the data is available in the cloud data warehouse, where it can be accessed through SQL or through the BI tool.
Who's going to use this data? Who are the consumers of the data product? The customer success team is the main consumer of the data. Marketing and sales teams could use this data, perhaps for cross-sell, so the implication is that they are potential users who may not be using it today. The marketing team wants to use this for personalized offers, but that's not a target use case yet, so that's another future opportunity. Depending on the requirements, more attributes can be added. The customer success team wants to use the data through the BI tool, so this shows you how the downstream consumers interact with it.
Finally, explicit knowledge. A user is any person who signed up for the system starting on January 1 and has activated their account. The schema is, and here imagine a schema definition or a data dictionary. A user can be associated with exactly one email, so that's some more explicit knowledge about this data. And finally, the user data product can be joined with the user activity data product. That's an example of relationships and connections that, instead of being implicit, are now explicit.
Again, this is just a selection, just an example, but it gives you an idea. Think about your own organization. Think about the data assets and the analytic assets that you're building, and consider how a product-oriented approach using these ABCs could help you have higher-quality products, more valuable products, and more discoverable and understandable products.
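To make the worked example above a little more concrete, here is a minimal Python sketch that records the five ABCs for the user data product as a plain data structure and checks two of the stated items (completeness and the one-hour lag). The class, field names, column names, and sample rows are hypothetical and only illustrate the idea; the talk does not prescribe any particular format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataProduct:
    """Hypothetical descriptor holding the five ABCs of one data product."""
    name: str
    accountability: dict[str, str]
    boundaries: list[str]
    contracts_and_expectations: list[str]
    downstream_consumers: list[str]
    explicit_knowledge: list[str]
    required_fields: tuple[str, ...] = ()      # columns that must not be missing
    max_lag: timedelta = timedelta(hours=1)    # "near real time, up to one hour lag"


user_data = DataProduct(
    name="user_data",
    accountability={
        "domain": "product",
        "technical_steward": "Alice",
        "business_steward": "Bob",
        "manager": "Charlene",
    },
    boundaries=[
        "Users from January 1, 2022 onwards",
        "A user is a person who has activated their account",
        "Lives in the cloud data warehouse",
    ],
    contracts_and_expectations=[
        "Contains all users, with no missing data",
        "Near real time, with up to one hour of lag",
        "Internal use only; consult the data product manager before sharing",
    ],
    downstream_consumers=["customer success (main)", "marketing and sales (potential)"],
    explicit_knowledge=[
        "A user is associated with exactly one email",
        "Can be joined with the user activity data product",
    ],
    required_fields=("user_id", "date_created"),  # assumed column names
)


def missing_values(product: DataProduct, rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row index, field) pairs where a required field is missing."""
    return [
        (index, name)
        for index, row in enumerate(rows)
        for name in product.required_fields
        if row.get(name) is None
    ]


def is_fresh(product: DataProduct, last_loaded_at: datetime, now: datetime) -> bool:
    """Check that the most recent load is within the allowed lag."""
    return now - last_loaded_at <= product.max_lag


# Hypothetical sample data to exercise the checks.
rows = [
    {"user_id": 1, "date_created": "2022-01-05"},
    {"user_id": 2, "date_created": None},
]
now = datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc)
problems = missing_values(user_data, rows)
fresh = is_fresh(user_data, now - timedelta(minutes=30), now)
```

Keeping the descriptor next to the checks is one way to make the contract items testable while leaving the looser expectations as plain documentation.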
It may also help you manage the surface area and the life cycle: to see which products should live, evolve, and be invested in, which new products should be created, and which products should go away and be retired. That's the product life cycle.
This next section is about how we build data products. Hopefully, after that previous section, you're saying, OK, this framework could be really helpful in thinking about how to manage data products, how to create a quality framework around them, and how to document them. But how do we actually go about building them? We think this is where governance plays a really important role. At data.world we really push this concept of agile data governance, where you're iterating on governance around your data, your data products, and your analytics, combining safety with value and knowledge. It's focused on adapting the best practices of agile and open software to data and analytics.
What that means is you've got, first of all, the team: the people who are going to help build those data products. If you're a more forward-thinking organization, or if data products are part of your revenue stream, then perhaps you have data product managers. At data.world, and also at Catalog and Cocktails, we're seeing a pretty big rise in data product managers, not just in companies that sell data or have externally facing data products, but internal data product managers as well, who work with data architects, data engineers, and analysts to create a more product-centric approach to data pipelines and to the data products that are getting created internally. But it could also be more traditional roles like data architects, SAs, DBAs, data engineers, data ops folks, and many others. These are your data producers.
And then you've got your data consumers. These might be your analysts, data scientists, machine learning engineers, BI teams, and business professionals. Both of these groups, producers and consumers, need to work together in order to develop these data products. Think of a software product manager. A software product manager who is doing a good job doesn't just come up with a product out of the ether and say, "I think it would be awesome if this product existed." Imagine that product manager doesn't talk to any consumers and doesn't do any testing. They just go get some engineers, build a product, and hope people will come. Sometimes you hear of amazing successes where that happens, but most of the time you do not, because it's not grounded in what the consumers want. The consumers are not influencing it, there's no feedback loop, it's not lean, and it's not iterative. That's why we think producers and consumers working together is so important. And these roles are not fixed to individuals: sometimes a producer may be a consumer in another context, and vice versa.
What's really important is focusing on the use case, and that requires you to understand what the consumer is trying to do. Let's develop a backlog of the business questions that we're trying to answer with our data. If existing data is already being used to solve this, then maybe it's about identifying which data is already being used for this use case and making that a part of this. Ultimately, end user business value is the driver. So consider cataloging your questions and backlog, and even cataloging the people, in your data catalog. Think a little more expansively about the role a data catalog could play in developing this backlog of business questions. Next, you've got to curate data assets. Sometimes that means creating new data assets.
Oftentimes it means identifying the assets already within your organization that are being leveraged to answer this question, formalizing the boundary around them, and identifying these five data product ABCs for them. Here it's important to collaborate, release early, and do peer reviews. Really define and document what that product is: contextualize it, link it to policy, clarify, and refine. This is applying the data product ABCs. And capture the knowledge as the work is done. If possible, especially when you're creating new data products and new data assets, ask the data engineers and the folks producing the data to document as they go, so it doesn't have to be a post hoc activity that a data steward or a business analyst attempts all the way at the end. The folks producing the data and working with the consumers are in the best position to capture a lot of that explicit knowledge.
Next, enable. This is often a missing piece, and it's an exciting topic to think about. We talk so much about data ops, and we now talk about data as a product, so it's great that that's becoming more popular. But a concept that isn't really talked about is data marketing. Emily Pick over at data.world, a senior product marketer on our team, has been evangelizing a lot around this topic. The whole idea of data marketing is this: go back to that Amazon analogy, where you put that water bottle into the marketplace. Just putting it there doesn't mean that a lot of people are going to see it, even if you document it well and even if it has some good reviews. Those things increase the chances that it will create value, but it still may not.
What happens on the other end is that people are doing SEO, they're doing SEM, they're doing content marketing. There's all this marketing and enablement activity happening to actually get people to buy that water bottle. So think about your own organization, when you have a data product that you want people to use and you're confident that it's good. A product manager does more than just make the data product; they also help to evangelize it. Training, evangelism, community, accessibility, being hands on: these are all important aspects of enabling consumers to know about the product, use it, and apply it to the right use cases.
Then you're entering the phase where consumers find it, understand it, and trust it. They use it to do their analysis and answer business questions. This is what we call the development of working insights. On the producer side, this is about learning and iterating. You want to measure how people are using the data product and audit it. You want to provide advice and assistance, improve the documentation, and improve the product. Fast, safe enablement of end users is the goal. Based on that measurement and that advice, it all feeds back: either we're improving our data products, creating new data products, or retiring data products that shouldn't exist anymore, or in some cases even merging data products or splitting them into different pieces. This is all part of a portfolio and an iterative process to deliver the most value for your organization given the resources and the priorities that you have. We think this is a really powerful process for developing data products, even if you're a little skeptical about the idea of data products and you're just thinking about data governance in a more general way.
So this can be helpful as well for thinking about how to achieve governance outcomes in days or weeks, not months and years. We really think that being fast, incremental, and iterative is super important, because the traditional approach to governance, or to developing data products, is more of a waterfall approach, where it could take many months or years to get to that first step of value. The data product approach and the agile data governance approach let you focus on doing data product number one, data product number two, data product number three, and number four, and iterating your way towards value for the organization. We're going to be much more agile and much more effective, and we create far more value for the organization.
That gets us to outcomes like this one. One of our customers at data.world is a company called OneWeb. You may be familiar with SpaceX and their Starlink system, which is a constellation of satellites that provides satellite internet. OneWeb is also creating a satellite internet service. By building data products through data.world, Snowflake, and the rest of their stack, OneWeb was able to monetize data in their environment. At the end of a 12-month period, they had paid for their entire data stack by monetizing two tables that they had. That's obviously more of a data monetization example, but it shows you the power of this data product approach to not just have conceptual value but to turn into real dollar value.
What are the metrics? Consider some of these questions. How many of your employees are searching for data on a regular basis? How many of your employees are doing self-service analysis with the data?
How many data apps are being built to change the way the business runs? How is the data being used in various tools: are people finding it in the catalog, using it with BI tools, using your data science tools and notebooks on it? What are the most common types of data that employees are using to deliver business impact? And do you have a data community, internally or externally, and how many people are active within it? These are all examples of metrics that can help you understand the impact of your data product: how much people are using it and how valuable it is.
And what is the actual addressable market within your company? One thing that product managers think a lot about is total addressable market, or TAM. So what is the TAM within your organization for this particular asset? Is it very valuable to a couple of people? Is it somewhat valuable, but to a lot of people? Is it hugely valuable to a lot of people? There's almost a quadrant here, based on the size of the audience and the amount of value. That will help you get to more of an ROI model, almost an accounting around your data products. You can think of it as having a financial model around your data. The point of that is not to make our lives harder as data professionals or data leaders. Should we all start hiring accountants now? There are actually some people who think you should; I'm not going that far. But data accounting is important because we have limited resources and limited time. When somebody comes to you with a frivolous request for a new governance process, a new policy, or a new data product, and you don't think you should work on it, this gives you a framework to say, look, the data doesn't support that this is going to be useful.
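The audience-versus-value quadrant described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the `UsageStats` fields, the thresholds, and the quadrant labels are hypothetical, and in practice the inputs would come from whatever usage metrics your catalog and tools expose.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Hypothetical usage summary for one data product over some period."""
    product: str
    active_users: int    # size of the audience actually using the product
    value_score: float   # assumed 0.0-1.0 rating of value per user


def quadrant(stats: UsageStats,
             audience_threshold: int = 50,
             value_threshold: float = 0.5) -> str:
    """Place a data product in an audience-size versus value quadrant."""
    broad = stats.active_users >= audience_threshold
    valuable = stats.value_score >= value_threshold
    if broad and valuable:
        return "high value, large audience"
    if broad:
        return "modest value, large audience"
    if valuable:
        return "high value, small audience"
    return "low value, small audience"


# Hypothetical portfolio used only to exercise the function.
portfolio = [
    UsageStats("user_data", active_users=120, value_score=0.8),
    UsageStats("legacy_export", active_users=3, value_score=0.1),
    UsageStats("pricing_model_inputs", active_users=4, value_score=0.9),
]
placement = {stats.product: quadrant(stats) for stats in portfolio}
```

A table like `placement` is one simple input to the portfolio conversation: which products to invest in, and which might be candidates to retire.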
It's data about the people doing data. We're getting pretty meta here, which is exciting, but this is how metrics can really help us get more aligned. And I think wearing this product manager hat is going to empower all of you, whether it's a formal hat or just an informal one. It's a really good framework for thinking about things.
So, all together, the takeaways. Consuming data to solve crucial business problems should be as easy as buying a product on your favorite ecommerce platform. Hopefully today you've seen, through these examples and this framework, how you can achieve more of this ecommerce type of approach through a pretty simple framework of recognizing and identifying things, as well as documenting them. Focus on accountability, boundaries, contracts and expectations, downstream consumers, and explicit knowledge, and not necessarily in that order. As you heard me mention with downstream consumers, if you don't have a consumer of a product, it's not really a product. All of these are important aspects: the ABCs of data products.
And then finally, leverage the best practices of agile software development to create and manage your data products. You can learn from how software product management delivers software products in a fast, agile, high-quality, and responsive way, and apply those same practices to the world of data, both literally in terms of developing your data products and in the overarching governance and process around them. How do I implement just-in-time governance that moves the bar as much as possible toward an empowered model, one that federates governance to the domains as much as possible while doing the minimum valuable things in a centralized way to make sure we're doing things safely and effectively? So, to learn more about data mesh governance, data products, and data mesh:
We have a really great white paper called An Actionable Framework for Governing the Data Mesh. If you don't want to remember the rather long URL at the bottom here, just go to data.world, go to Resources, and you can find it in our reports and tools area. It goes into some great practical examples and more details on how to apply the data product ABCs, as well as the best practices around agile data governance and data mesh. You should definitely check that out. And please come and check out our show, Catalog and Cocktails. We talk about topics like data products, governance, and master data management, and the other day we had a VP of AI from Samsung on to talk about the future of AI. So come and check it out. We'd love to have you. It's an honest, no-BS, non-sales conversation, and we drink cocktails during it, so we encourage you to bring your cocktail with you and tell us what you're drinking in the comments. Thank you so much for your time today. Hopefully this was very interesting to you all, and I'll pass it back over to you, Shannon, to see if there are any comments or questions that we should help address.
We have great questions coming in. And just to answer the most commonly asked questions, as a reminder, I will send a follow-up email by end of day Thursday with links to the slides and the recording, along with anything else requested. And I recommend the podcast that you're promoting, Catalog and Cocktails. I still can't believe you guys do it live. It's so brave. It's awesome.
Thank you for the support. It's a ton of fun.
First question coming in here, Tim: do data products include the raw data, where the data itself is the product?
Great question. So, is the data product inclusive of the raw data that's part of it? Is that the question?
Perfect, yeah.
So this is where an interesting analogy comes into play. The answer, I would say, overall:
Yes. For example, if you have, let's say, five tables, and those five tables joined together give you customers, then that customers data product is those five tables in combination with the A, B, C, D, and E that you saw in this presentation today. That becomes the data product. The one caveat is an overarching question that I think is a little bit academic but also very practical at the same time: what is a data product, and what is not really a data product? The analogy that I've found useful is an assembly line that's building laptops. That laptop assembly line might have circuit boards, wires, and a bunch of other pieces like the frame of the laptop, and all of that comes together to create the laptop. So is the circuit board a product? Yes, but the consumer is the factory, or the company, that builds the laptops. They procure the circuit boards from somebody. So yes, that circuit board is a product, but it has a specific consumer that isn't necessarily the end consumer. Is the laptop a product? Yes, the laptop is definitely a product. Is the frame of the laptop a product? Maybe. Is half a circuit board a product? Probably not, because it's not really functional if it's just half the circuit board. I know that's a little bit of an academic question, but it helps you think about what is and what isn't a product. So within our data environment, I think an important question is: is every table a data product? No. I think we should be pretty clear about that, to ourselves and across the space: not every table or every dashboard is a data product.
It's a data product when it has a clear consumer and has some application of these five A, B, C, D, E things to it.
Tim, are there any metrics on the staffing required to achieve the ideals discussed here? Are there any rules of thumb for staffing planning?
That's a very good question, Shannon. The rules of thumb here, I think, are these. First of all, you can implement more of a data-as-a-product approach with the team that you have today. Think about who's already wearing this hat, acknowledge it, and give it some kudos and respect, and maybe a little bit more, whatever makes sense given your culture and your resource constraints. For example, I have found that there are often certain data architects and data engineers who take extra ownership over the data products that you have. Those are the data engineering leaders and managers who are often reaching out to other parts of the business, showing lots of curiosity and empathy for the broader business, and really taking ownership over the use and the roadmap of the data. Those are people who are already wearing this hat. You may find that they are also the individuals who tend to contribute a lot to documentation or who are mobilizing the documentation around that data. So they're wearing that hat, and let's give them some respect and a nod for the work they're doing here, which really could be a role all its own. They're going above and beyond to do that. However, I would say that from a resourcing position, we're seeing more companies, both across the portfolio of companies that we work with and in the industry, starting to invest in data product managers, even for internal use cases.
Even if you have only a couple, and they're only focused on the highest-value, most important data products, that can be a huge boon, because now you've got dedicated resourcing focused on how people are going to use this, what is going to be most valuable, and what the roadmap is, and on prioritizing that backlog for the broader data team to work on. So consider how you could be leveraging folks like data product managers more effectively in your own organization, if you're not already.
And I'm going to see if I can slip in at least one more question, if not two, in the last few minutes that we have here. How do you scope your data products? At what level are these products typically defined: at a subject area, or for a particular strategy?
Good question. I would say, simply, scope it using that framework of MVP, which is very important in software: minimum valuable product. What is the minimum valuable product that solves a particular use case or problem? You saw on that one slide that I showed a business question, so it can be as simple as, how many customers did we have last year? What is the data set that lets them answer that question? That can be your first data product. So I would say start small, and don't try to be too ambitious. It's good to know your broader domain landscape. I know a lot of companies invest in creating their domain topology and things like that. That's valuable work, and you should do it, because it helps you create your stewardship system and structure. But when it comes to data products, don't try to envision all 200 data products that you're going to have. Start with the first one, solving the first business question.
Okay, I think I can slip in one more here. For a shared data product with multiple downstream consumers, what are the effective ways to handle conflicting requirements, or different prioritizations being requested on enhancements to the roadmap for the data product?
That's a great question, and if there were a one-size-fits-all answer to it, then as VP of product at data.world I could just automate my job and disappear. I wouldn't even need to exist. This is a hard problem to optimize for, and unfortunately there's no one-size-fits-all answer. I think that's why I'm so excited about more formally recognizing the role of a data product manager, because that's a messy problem: multiple stakeholders, multiple use cases. Maybe there's a perception that one use case is a multimillion dollar opportunity, but it's also higher risk, so how do you assess that? Ultimately, you need to get the right stakeholders in the room to talk about it, you need to build out that roadmap, and you have to have some trust that the data product manager, whether formal or just informally recognized, is going to make the right call and drive that roadmap forward with continuous improvements to the data product. I know that's a little bit of a non-answer, but hopefully it gives you a bit of a framework. Product managers have to deal with this all the time in the software realm, and the same difficulties, and opportunities, exist in the data realm.
And thank you so much. There are so many great questions coming in. I'll get the rest of the questions over to you, but I'm afraid that is all the time that we have slated for this webinar. Thanks to all of our attendees for being so engaged in everything we do. We really appreciate it, as always. Tim, thank you so much for another great presentation. I love working with you, and thanks to data.world for making these webinars happen.
Awesome.
Thank you so much, Shannon, and thanks, everyone, for joining. This has been great.
Thanks, everyone. Have a great day.