Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We would like to thank you for joining this DataVersity webinar, Five Things to Consider About Data Mesh and Data Governance, sponsored today by Data.World. Just a couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataVersity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. And just to note, Zoom defaults the chat to send to just the panelists, but you may absolutely change it to network with everyone. To find the Q&A or the chat panels, you may click the icons found in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now let me introduce our speakers for today: Tim Gasper, Juan Sequeda, and Paul Gams. Tim is the VP of Product at Data.World and co-host of the web show and podcast Catalog & Cocktails. He previously served as Director of Product at Janrain, Head of Product and Marketing at Bitfusion, and VP of Product and Global Offerings Manager at Infochimps. Tim has over 13 years of product management and product marketing experience and is a writer and speaker on entrepreneurship, lean startup methodology, analytics, and AI. Juan is the Principal Scientist at Data.World. He joined through the acquisition of Capsenta, a company he founded as a spin-off from his PhD research in computer science at the University of Texas at Austin. His goal is to reliably create knowledge from inscrutable data.
His research and industry work has focused on designing and building knowledge graphs for enterprise data integration. Paul has worked in development and technical sales with database systems for over 35 years, and he has been at Snowflake since 2017. He is currently the technical lead for Snowflake's data governance and data security partners. And with that, I will turn the floor over to our speakers to get today's webinar started. Hello and welcome. Awesome. Thanks so much, Shannon. We really appreciate you and the DataVersity team giving us that warm introduction. And I know that we're very, very excited to speak to you all today about data mesh and data governance, coming from our unique perspectives, both from Data.World, working in the catalog space, as well as from Snowflake, really providing that data cloud. So thank you for having us. And Paul and Juan, anything you want to chime in as we kick things off here? Yeah, thanks again for having us. We're really excited to talk about data mesh and the importance of data governance, and excited to talk to all of you today about it. Likewise, I'm very excited for this conversation and also to be able to provide our point of view. I think data mesh is something that we're all hearing about, and our goal here is to provide our honest, bold point of view of how we should be thinking about data mesh and data governance. So let's kick this off, Tim. Yeah, that's a great point. There's a lot of hype and interest around data mesh, and we need to bring clarity to it. And hopefully this webinar really does that for you today by focusing on five key aspects that you may not have considered. So before we jump into the content, really quickly, let's introduce who we are and why we're talking to you today. Paul, do you want to start us off with Snowflake? And then I'll talk a tiny bit about data.world.
Sure. For those of you that haven't heard of or don't really know what Snowflake is, we call ourselves the data cloud. And that's really what we are: a single platform, with lots of workloads that all run on that single platform, and the goal being no data silos. All your data accessible by all of the users for all of the analytics that you need. Awesome. And data.world is the modern data catalog. We are 100% cloud based, built on a knowledge graph, really tapping into that technology that companies like Facebook and Netflix use in order to understand and make their data very valuable. And we're trying to make data discovery, governance, and analysis easier. And I think what's really exciting, and why we love working with the folks over at Snowflake, is that when you've got a really great data cloud environment and you've got a great data catalog, all sorts of things are possible to improve access to and the power of your data, including really helping address needs around data mesh. So we're very excited to talk to you today about some best practices and bringing some of our customer experiences into the mix as well. So why data mesh? Why are we talking about this today? And why is this such a big deal? Well, it's clear that over many, many years, as we've tried to wrap our arms around our data and as big trends have happened, such as big data and machine learning in recent years, simply treating data as something that we just put in one place, assuming the right thing is going to happen, isn't working. And it's not necessarily a technology problem, although there are factors that affect our challenges around data that are technology related. Actually, a lot of it is a people and process problem. And so in just the last couple of years, this idea of data mesh has really come to the fore and put a spotlight on how data, in terms of technology, people, and process, all has to work together.
And the big problems that data mesh is trying to solve are, first of all, that when you take a monolithic or an isolated approach to data, it doesn't scale, particularly socially. And often data is being treated as an afterthought. You may have heard adages like data exhaust, or even data is the new oil, which take a very narrow and singular perspective on data instead of really thinking about how to make data a first-class citizen. And why do we care? Well, the number of systems is exploding. Think of all the different SaaS tools you have to use every day. The complexity of data is expanding. And yet the importance of getting value out of our data is more central than ever. And so we want to prevent processes and teams from becoming a bottleneck for the business. And we want to tap into the value of data more. And so data mesh has really captured the excitement of a lot of folks, both technical and non-technical: hey, maybe we can scale data socially. Maybe we can put data in the center. Maybe we can find a way to empower the broader business around the data and really get more value out of it. And we're excited to talk more about that with you all today. And really core to data mesh are the four data mesh principles. Zhamak Dehghani, who is really the primary thought leader around data mesh and has really popularized it, highlights these four principles throughout her works on data mesh. And you can particularly see a couple of the seminal articles, the blog posts that really started this whole trend, on the martinfowler.com site. And the first of these four data mesh principles is domain-centric ownership and architecture.
It's really about empowering the teams that know the most about the data to take accountability for that data, own the cleaning and the refinement of that data, and also be involved in the metadata and governance around it: things like compliance, lineage, and discoverability. Data as a product is the second data mesh principle. And that's really about thinking of your data not as exhaust, or just something that you have to deal with incidentally, but rather putting it in the center. And considering the end users: who is using the data? What are they using it for? And is it easy for them to use? Is it discoverable? These are really important aspects of providing good experiences around data. The third principle is the self-service data platform. And this is the most technology-oriented of the four. It's about having great tools and platforms that empower people to do what they need to do around data. So if you're empowering these different domains to create and develop data products, do they have domain-agnostic, easy-to-use, low-maintenance tools that let them do that kind of data and metadata work? And can they really repeat patterns? Can they take the best practices of the past and leverage them into the future? And then finally, the fourth key principle of data mesh is federated computational governance, or federated governance. This is really about recognizing that some aspects do have to be centralized, identifying what those key aspects are, like interoperability, standards, and constraints, making sure that those things are consistent across your organization, and really thinking of ways to automate those policies. Can you use some of the great tools, technologies, and capabilities that your self-service data cloud or catalog provides to make developing, managing, and maintaining those domains and those data products much easier?
And I know, Juan and Paul, you all have been working in data mesh quite a bit as well. Which of these principles stand out the most to you? And is there anything that you'd like to add on this particular slide? Yeah, I'll be talking a little bit about federated governance and how important that is in order to really put out the right data products and make sure they're governed in a similar way across the board, so that it's all secure and it can all be trusted. It's really important for that to be taken very seriously so that you really can get these data products out correctly. And what I want to add is, if I had to choose one, the one that I'm really excited about, the big game changer here, is that second one: data as a product. It's the idea of bringing product thinking into data. We do this for so many things. We do it for software, right? We should be doing it for data. And I think we're really excited to share our very specific point of view. We'll be presenting what we call the data product ABCs, our concrete way of thinking about what data as a product actually looks like. So I'm very excited about this one. I think this is a game changer right now. Awesome. Yeah, I can't wait for us to get into that. As we talk about these, obviously that fourth one is really important, and it affects all four principles. Paul, what are the big challenges around governance that folks really have to focus on as they're stepping into this world of data mesh? Yeah, these are challenges that we see all the time. The fact that the data is everywhere. It's in a lot of different formats. It's in a lot of different systems. It's siloed. It's secured in different ways. If you want to try to bring all of that data together, there's got to be a way to govern all of it.
And it's really complicated to understand all of the different challenges you have around all of the data; different lines of business within an organization will think of the same data in different ways, and you need some way to manage all that complexity. And then the last one: just doing security and governance and getting it correct. Regulations keep changing, and a rigid approach breaks down. So you really need to make sure that you can understand your data and the policies around it in order to secure it correctly. Yeah, it's one thing to actually wrap your hands around your security environment, and then the regulations change, they evolve, and now you have to do it all over again, right? How can you be dynamic to that? Yeah, and it's really important to first know your data, right? In order to be able to build the right data products and to govern them correctly, you really have to understand them. You have to know what data you have and how it applies to the business. You have to be able to classify the data and understand what's sensitive, what can be shared, and what needs to be shared in a secured way or masked in some way. And then you need to be able to understand who's using that data, track its usage, track its lineage, and know where that data came from. Once you've got that understanding, then you can really protect your data. You can secure the sensitive data, whether with policies or with encryption or however you protect your data, in a standardized way. And then that leads you to be able to unlock your data: allow that data to be shared as a data product while knowing that you're not giving away anything that you really shouldn't be giving away. Yeah, 100%. And I know a lot of people talk about find, understand, trust. And actually, I like know, protect, unlock even better, because you get that additional aspect around protect, and actually getting access to your data securely but in a democratized way. Right.
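To make that know, protect, unlock flow concrete, here is a minimal Python sketch. The column names, the keyword-based sensitivity rules, and the masking scheme are all illustrative assumptions of ours, not any Snowflake or data.world API:

```python
# Illustrative sketch of "know, protect, unlock". The sensitivity hints and
# masking rule below are assumptions for demonstration only.

SENSITIVE_HINTS = ("ssn", "email", "phone", "dob")

def classify(columns):
    """Know your data: tag columns that look sensitive by name."""
    return {
        col: "sensitive" if any(h in col.lower() for h in SENSITIVE_HINTS) else "public"
        for col in columns
    }

def mask(value):
    """Protect: redact all but the last two characters of a value."""
    s = str(value)
    return "*" * max(len(s) - 2, 0) + s[-2:]

def unlock(row, tags):
    """Unlock: share a row with its sensitive fields masked."""
    return {
        col: mask(val) if tags[col] == "sensitive" else val
        for col, val in row.items()
    }

tags = classify(["customer_id", "email", "region"])
shared = unlock({"customer_id": 42, "email": "ada@example.com", "region": "EMEA"}, tags)
```

In a real platform the classification and masking would be enforced by the data cloud's own policy mechanisms rather than application code; the point here is only the shape of the flow: classify first, then protect, then share.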
So, governance, as you've heard Paul mention, presents a lot of challenges. And on top of all of that, a lot of times we try to approach this in a very top-down, cumbersome way, and really with defense in mind. And obviously that's critical; we don't want to get in trouble with these different industry or governmental regulations. But if you only focus on that, you create additional bottlenecks, which is one of the fears that data mesh tries to address: to really democratize, push more responsibility out to the different parts of the business, and scale things. And so when you think about the future of data governance and what it needs to be, it really needs to be more about the rules of cooperation and collaboration, the process of doing data and analytics work together, and capturing knowledge in real time, so that you can actually move through that know, protect, and unlock flow more quickly and more collaboratively, and take a use-case-driven approach. And think about how catalogs fit in as well, right? Governance is about discoverability, not just about data protection, and catalogs help you understand and trust. You're trying to eliminate silos, and catalog environments can help share and make data accessible to everyone. You really want to accelerate time to value, not create more bottlenecks and burdens. And then finally, you don't want to have to install software in order to address your needs around data governance and around data mesh. And I think that's one of the most exciting things about things like Snowflake and data.world: the fact that these are cloud-native approaches that are highly available and can scale in all sorts of ways that your organization needs to scale. And so with that, we want to transition into the five things to consider about data mesh and data governance.
These are really important points, some of which you may know and some of which may be a little surprising to you. So we'll dive straight into the first: what is the scope of your data mesh? A lot of folks think about data mesh and it can be quite daunting, right? They look at their existing environment and it looks something like this. And if you're like most of us, your environment is actually probably about 30 or 50 times more complicated than this. And that can be pretty challenging when you're thinking about what to actually focus on when you're trying to develop your data mesh, right? And one of the big aspects of that first principle of data mesh that's really important here is the concept of domains: the idea that there are topical or functional areas within your business where expertise is consolidated, where you're going to be creating your data products, and where experts are going to take accountability for the data. And this is actually an image from that martinfowler.com website, put together by Zhamak Dehghani, that uses Spotify as an example. And in the example of Spotify, which, for those that aren't familiar, is an online music streaming service, they have various domains in their data and in their organization: around artists; around podcasts, which are a very unique medium and a unique area of expertise; around the users, the customers themselves; and then around the music, the files and the information and the genres of the different pieces of music and videos that are on their platform. And each of these domains has unique technologies, unique areas of expertise, and unique parts of the business that work on them. But each of them has a relationship with the others; so, for example, a user profile might need to pull information on the music that somebody is watching or listening to in order to really be complete.
And so when you think in your own organization, there are these domains that exist, whether they're functional oriented, maybe they're oriented more around the products that you bring to market, perhaps there are some things that are in the middle or in between. And these domains come together to form sort of the overall area of your business and products actually tie into that. So you see things like O and D in here, these are different products that might be connected to these different domains that they have ownership around. So if we map that to this diagram, you can kind of see an example of this here. And I know that Paul, this is an example that the Snowflake team kind of uses to walk through kind of mapping domains to your data environment. Do you want to walk through some of the main points on this one? Yeah, sure. There's a couple of really important points here. So the first is that the domain is really the, I think the way it's described is the output port. You have a set of data schema that's the data product. And the stuff to the left of that, that's really how I put it all together all the ETL processes or whatever processes I use to build that data product. Each one of the domains can do that differently. And that's not really what needs to be exposed out to the consumers really. The consumer just cares about the final product. The other important part is the point is that domains can be made up of other domains. So in this example, I have a customer domain, I have a help desk and support domain and orders domain and the customer 360 domain is really just made up of all of those. And the expertise there is how to bring those other domains together and make that available to those consumers. Last point is that domain is really just the API. You know, the data product is really just the API. So in a lot of cases that's SQL, right? 
That's the ability to look in a SQL database, but it could also even be something like a Kafka stream, where the marketing promotions domain, which reads from other domains, is just building a stream that's getting sent out to certain consumers. And across all of this, the part where you might want to have some centralization is around some standards, some federated governance, and the catalog that allows all of your end users to find the data that's in these domains. And something I want to add to this: one thing we want to be very careful about is what I always say, don't boil the ocean, right? I think we're very tempted to go figure out what all these domains are right now and start mapping them out. I think this is when we really need to start thinking about who's asking to solve their problems at a very crucial state right now. So let's try to identify what those crucial pain points are right now and map those out to the existing domains. And that leads me to the second point that I want to make: we're not going to go create these domains. These domains already exist implicitly. So part of the work here is to identify those existing domains. And depending on your organization, maybe they'll be easier to identify, maybe they'll be self identifying, maybe there's going to be some stuff that's a mix. And I think this is what helps us realize that data mesh is a sociotechnical paradigm shift. This is the social aspect we go do. So two things to remember. One, don't boil the ocean. And second, these domains: you're not going to create them, you're going to identify them. They already exist within your organization. I think that's a great point. Paul, I loved how you talked about how not everything is a product, and products can be many different things, right? It can be an API, it can be a data set, it can be a data mart, it can be many different things.
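The "output port" idea Paul describes, each domain hiding its internal pipeline and exposing only the finished product, with some domains composed from other domains, could be sketched roughly as follows. The domain names echo the customer 360 example from the slide, but the code itself is entirely hypothetical:

```python
# Hedged sketch: a domain hides its internal ETL and exposes only an output
# port; the data product is that interface, not the pipeline behind it.

class Domain:
    def __init__(self, name, build):
        self.name = name
        self._build = build  # internal pipeline, not exposed to consumers

    def output_port(self):
        """The only thing consumers see: the finished data product."""
        return self._build()

# Two source domains, each with its own (hidden) way of producing data.
customer = Domain("customer", lambda: [{"id": 1, "name": "Ada"}])
orders = Domain("orders", lambda: [{"id": 1, "order_count": 3}])

# A domain can be composed of other domains: customer_360's expertise is
# joining the output ports of customer and orders into one product.
def build_customer_360():
    by_id = {r["id"]: dict(r) for r in customer.output_port()}
    for r in orders.output_port():
        by_id.setdefault(r["id"], {}).update(r)
    return list(by_id.values())

customer_360 = Domain("customer_360", build_customer_360)
```

The design point is that consumers only ever call `output_port()`; whether a domain builds its product from raw tables or from other domains' ports is invisible to them, which is what lets each domain choose its own internal tooling.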
And Juan, your point around don't boil the ocean and identifying domains is really important. And you'll probably get it wrong, right? A lot of companies start off thinking their domains are one thing, or that one particular domain or use case is the most important, and then they find out partway through that it doesn't make sense. And it's important to be dynamic in realizing that. Now, as you think about not boiling the ocean, it's important to think about how you approach your scope and how your domains can affect your data catalog, right? And so if you're really focused on use cases that are more on the right side of that chart, more focused around data consumers discovering assets, then perhaps what you're doing is developing more of an analytics catalog. And that has implications on the scope of what you include there. Maybe you're really looking more at the observability of the pipelines and things like that, and you're trying to understand more of your upstream or your source domains; then perhaps you're doing more of a data platform catalog. Or maybe you're trying to be comprehensive and cover everything, and really get even more into something like compliance and sensitivity; then perhaps you're building out more of a granular resource graph. Obviously, project the terms onto this chart that make sense to you, but this gives you a little bit of an idea of: hey, am I starting from the top here? Am I starting from the bottom? How do I make sure I approach this in a way that doesn't try to boil the whole ocean? So that brings us to our second key point: who are the key stakeholders when it comes to data mesh and to data governance? And for this section, we're actually going to keep it pretty straight and simple.
We have this chart here of some of the different key stakeholders that we see as really relevant to implementing and maintaining a data mesh and a data governance type of solution. You see at the top here data leadership, right? It's so, so important to have data leaders involved in the initiative, in the program, because if you don't have that buy-in, and we see this over and over and over again, there isn't enough tying it to the strategy, there isn't enough prioritization to really make sure that it's front and center, and there isn't enough of a push to really affect the culture. And there's a ton of change management and culture shifting that has to happen to really move to more of a mesh mindset, to really think about data as products, and to really think about discovery and knowledge first as you're building out your data mesh and catalog type environments. It's important to think about the interplay of consumers versus producers, right? There are the people producing the data, and there are the people consuming the data for various use cases, and it isn't always clear cut. In some use cases, a consumer becomes a producer. Somebody does some prep on their laptop and then republishes a new, modified data set; that consumer is now a producer. So this interplay is important, and people wear different hats at different times. You then have folks that are a little bit more formally in that governance function, right? Whether it's data governance people, privacy folks, or security folks, some of these folks may make sense to be centralized and be more a part of that centralized approach to standardization. But the more you can push responsibilities around governance, privacy, and security into the different domains, the more you're going to tie the implementation of these things to the unique aspects and use cases of those different groups and of those different domains.
A lot of times that top-down approach is a little too divorced from the reality of how the data is being used and what the actual meaning behind the data is. And you can actually achieve not just better access to data but better security and governance if you bring that responsibility closer to those different groups. And then finally, you've got platform teams: folks that are working on the underlying technology and working on things like automation, right? DataOps and DevOps type approaches to really make governance more automated and leverage some of the great capabilities that things like your Snowflake or your data.world platform might provide. Juan and Paul, anything that you'd want to add on the stakeholder side of things? One thing, just on privacy and security, is making sure that you get the right people involved early in the process. For a lot of customers that we work with who are migrating maybe their on-prem systems to the cloud, or starting to bring in additional data sources that have maybe some more sensitive data, the earlier you start understanding what the requirements are for storing that data and making it available, the shorter your timelines will be before you get those data products out, and the easier it'll be for everyone. An important point to realize is that there are many stakeholders here, right? So it's important to identify who those stakeholders are. And so actually, a question to the audience, and I'd appreciate it if people want to go chat and have the conversation: what other stakeholders would you consider that are not on this list? I'm very curious to see what the audience has to say. Great question. Yeah, feel free to chat into the chat there. This brings us to our third point, which is a pretty meaty one: where should we standardize and productize data? Okay, so I'm convinced of this domain approach. I'm convinced around data as products, but what does that really mean?
What really is data as a product? What does it really mean to standardize, and where do I standardize? So with that, I'll pass it over to Juan to talk a little bit about data products and how to standardize them. Yeah, so if we go back and look at one of the first writings from DJ Patil, the former US Chief Data Scientist, he has a book in which he describes a data product as something that facilitates an end goal, right? And then Zhamak Dehghani describes it as something that is valuable and usable. And the way I like to think about it is the same way we do shopping on our favorite e-commerce website, right? You go to that e-commerce website because you have an intention, something that you want, something that you need and you're going to go buy, and you go search for that. And then you find a bunch of results. The platform gives you ways to navigate that and filter things out. It gives you the ratings and reviews, it gives you sponsored content. You can click on things, you can see how that product is being used, who bought that product with another product, and so forth; you can see the feedback. What is important here is that the same kind of experience you go through when you buy something on an e-commerce website, we should be thinking of data as that same type of experience. So what we have been working on, together with Tim and folks at data.world, is what we're calling the data product ABCs, right? And this is a framework that we're putting out there to define what it means to have data as a product. So A is accountability: who owns this? Who is responsible for this data product? Who fixes it when something breaks, right? Who's following up with all the feedback? Who's defining the requirements for what this data product is, and so forth? B is boundaries. What is this data product? What isn't it? What is it supposed to be? What is it planned to be? What is it not?
Where is it going to live? Is it going to have a SQL interface, an API interface, and so forth? What are the inputs and what are the outputs? That's boundaries. C is for contracts and expectations. What are we expecting from this data product, right? What are the constraints? What are the tests that are being run on it? What are the quality guarantees? What are the SLAs and SLOs around this? How should this data product be used? How could it be used? How should it not be used? What is the performance, right? How is this maintained? Is this being updated, and how often? Is this real time? Can it be used for different types of scenarios? These are the contracts and expectations. D is about downstream consumers. Who is actually going to be the consumer of this, right? This is a very, very important one, because we're not creating a product just for the sake of having some data out there. We need to understand who the consumers and users of those products are. What are the use cases? What are their expectations, what do they need today versus what they're probably going to need tomorrow? Who are the potential consumers of these data products? And E is about having explicit knowledge. I think this is one of the parts that we are missing in the current world that we live in, what I call the data first world, where we need to start shifting into what I'm calling the knowledge first world, where we have knowledge as a first-class citizen. Let's understand what the schema around this data is. What does it actually mean? How is it documented? Give me examples. How is this data product related to other data products? So this is, in essence, a framework to start establishing what a data product should be.
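One hedged way to picture the ABCs above is as the metadata record a catalog might capture for each data product. The field names and the `customer_360` example below are our own rendering of the framework for illustration, not a published standard or any vendor's schema:

```python
# Sketch of the data product ABCs as a catalog metadata record (illustrative).
from dataclasses import dataclass, field


@dataclass
class DataProductSpec:
    name: str
    accountability: str          # A: who owns it and fixes it when it breaks
    boundaries: dict             # B: what it is/isn't, interfaces, inputs/outputs
    contracts: dict              # C: SLAs/SLOs, quality checks, allowed uses
    downstream_consumers: list   # D: who uses it today and likely tomorrow
    explicit_knowledge: dict = field(default_factory=dict)  # E: schema, docs, examples

    def is_publishable(self):
        """A product with no owner or no contract isn't a product yet."""
        return bool(self.accountability and self.contracts)


spec = DataProductSpec(
    name="customer_360",
    accountability="customer-domain team",
    boundaries={"interface": "SQL view", "inputs": ["crm", "orders"]},
    contracts={"freshness_sla_hours": 24, "quality_checks": ["no_null_ids"]},
    downstream_consumers=["marketing analytics"],
    explicit_knowledge={"docs": "linked in catalog"},
)
```

The `is_publishable` check encodes the anti-pattern discussed in the webinar: a bare table with no owner and no expectations attached shouldn't be called a product.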
So when you're thinking about delivering data as a product and bringing in product thinking, think about the accountability, the boundaries, the contracts and expectations, the downstream consumers, and the explicit knowledge. Back to you, Tim. I'm so excited about this framework, because a lot of folks are very confused about data products, right? They're like, what does it mean to have a data product? This gives you a roadmap to really say, hey, do I know these things about this data product? Can I define them? If I can't, I should, right? And it also puts a little bit of rigor around data products. Because there's this anti-pattern where it's easy to just say, oh, well, that table in this database, that's a data product, and this table over here, that's a data product, right? But if you don't really have ownership around it, if you don't really have expectations defined around it, if you don't really know what the right use cases are that are supported versus not supported, and you haven't documented it, it's really hard to say that that's a product, right? So if you're going to have fewer products and better products, that's a good thing for your organization. And these data products can be different shapes and sizes, right? You heard about things like an API being a data product; a data set can be a data product. Even things like aggregations, where you're combining different data products together to create either new or just combined data products, these are all potentially data products, and they all can benefit from having this kind of ABC approach applied to them. And I think what's really exciting, and Paul's going to go into this, is thinking about how this fits into your own data infrastructure and some of the considerations around how to implement this practically from an architecture perspective. So Paul, over to you. Yeah, here's an example.
This shows the Snowflake data cloud, but it really could be data products coming from all kinds of different places. What you can see, I love the little owl, is that data.world is cataloging all the assets across the organization, including the external sources that are used for the data products, for the different domains, but it's all centrally managed by the catalog, or by an exchange, or a combination of those things, so that everyone gets access to all of those products. Go to the next slide. And the nice thing, when you're thinking about these, is then you start thinking about how do I group these products? If they're in a database, do I group them by database, with a database per domain? Do I have a schema per domain? In the Snowflake data cloud, you also have the ability to have each domain in its own Snowflake account, and they can each even be in a different cloud, in a different region. It doesn't really matter. The consumers can be in different clouds, different regions. The goal is that I know where all my data is, I've got a centralized way to find it and discover it, I make it available to all my consumers, and they're all using the same standards around governance and security. One of the things I wanna chime in here is to note what I call the two lenses of a catalog. Actually, you can see here the two owls of data.world, one on the outside and one on the inside, and that's on purpose. For the outside one, you wanna think about the data catalog for the audience of a data producer. So they're gonna be cataloging the raw technical data from all those sources, and the data teams, right? The teams for each domain that are taking all those data sources to deliver that data product, they need to use a catalog to understand where all the tables and columns across my sources are, right? Where is the sensitive data?
Let me understand the lineage across all the existing application sources that you have. Once those domains start generating those data products, those data products need to be cataloged too. And that's the second lens of a catalog. It's the catalog for the data consumers. And that's what you see in the center right there, right? You wanna have that inventory of all those data products. So when the consumers come in, they go into the catalog, and they're gonna look for those data products that they're actually gonna be able to use and consume. Very similar to that same experience that you have on an e-commerce website, when you're finding the product that you're actually gonna buy. So there are two experiences of the catalog. One is for the producers, which is gonna be much more technical, to be able to do things like technical inventory, having lineage, identifying sensitive data, and so forth. And then that second lens of the catalog, which is: here are the data products that have those ABCDEs, right? You know exactly who's accountable, you know what the roadmap is, how it can be used, and how it's connected with other data products. So that's something very important for us to realize here. Yeah, and that's a great point, because on the left side, you've got your data producers, your data engineers, that really need to take raw materials and turn them into finished goods that can then be used by the consumers. So I love that analogy, Paul. I think that's the way to think about it, right? You have the engineers, the producers, who are taking those raw goods and turning them into a product, and that's cataloged into the lens for the consumers. Yeah, I think that's a great point, regardless of whether that data product lives a little bit further upstream, or it's something that's really at the tail end, very end-consumer facing.
I think what's also exciting about this is really trying to separate technology as an enabler, right? So one aspect of that is the global nature of how you can implement something like Snowflake, not just global, but also multi-cloud. And I know on this next slide here, there's also a bunch of specific pieces of functionality that Snowflake has that work really well in conjunction with a catalog offering, and help to implement this in a well-governed way. Yeah, and the catalog is really key to a lot of this. So on the left, we talked earlier about knowing your data. In Snowflake, we have the ability to tag items as sensitive data, or really, you can tag them in any way, and automatically classify sensitive data. Within Snowflake, you can see how objects were built, you can see the lineage, and you can see what's dependent on them. So if I have, for example, a table, I wanna know what views, what stored procedures, all those things, are dependent on that object. And then I can see who is actually accessing that object. And that's really important. If I'm a producer, I really do wanna know who's using the data that I produce, what's useful to them and what's not useful to them. We're just trying to make it easy, and that's all surfaced in the data catalog. For protecting your data, there are policies at the row level; there may be some data that, for example, people in the US are allowed to see, while people in EMEA see different data. You can mask data, you can tokenize it so that you basically encrypt particular sensitive items, or mask based on a different column. And all of that allows you to securely share that data. And when we talk about secure data sharing, as in the picture before, it really means not moving the data around, but keeping it all within that data cloud and sharing it live, so that as it changes, or as the new feed comes in for the data product, the consumers see it in real time.
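The masking and row-level policies Paul describes are Snowflake features; as a language-neutral illustration of the underlying idea, here is a toy Python sketch of role-based column masking and region-based row filtering. The role and region names are hypothetical, and this is not Snowflake's actual policy syntax:

```python
def mask_email(value: str, current_role: str) -> str:
    """Toy column masking policy: the stored data never changes; the
    decision is made per read, based on who is asking."""
    privileged = {"DATA_STEWARD", "COMPLIANCE"}  # hypothetical role names
    if current_role in privileged:
        return value
    _local, _, domain = value.partition("@")
    return "***@" + domain  # redact the local part, keep the domain

def allowed_rows(rows: list[dict], viewer_region: str) -> list[dict]:
    """Toy row access policy: viewers only see rows tagged with their region."""
    return [r for r in rows if r["region"] == viewer_region]
```

So `mask_email("jane@acme.com", "ANALYST")` yields `"***@acme.com"`, while a privileged role gets the raw value back, which mirrors the point that the same shared data can look different to different consumers without ever being copied or modified.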
Yeah, I think these are really powerful capabilities here. And in our previous conversation, Paul and I and a couple of others were chatting about how, whether you're thinking about data mesh or other types of architectures as well, Snowflake, or in general your data lake, data warehouse, data cloud type environment, can serve as that computational platform environment. And then something like your catalog, like data.world, can be that experience layer that helps you find, understand, and surface a lot of this information as well. So that's a great combined approach to addressing how do we standardize things, how do we build these products, and then how do we make these things discoverable, accessible, and protected, all in a very smart way. So we've got two more key points to hit here, and we'll move a little quickly because we've got about 10 more minutes of formal content before we move into our Q&A. The fourth point is: who is responsible, right? We mentioned it in those ABCs of data products, but this is a really key question. So we're gonna zoom into it and bring it front and center. Whether you're gonna call them data stewards or data owners or data producers, data custodians, data trustees, side note, there are so many terms for the people that are helping with your data and taking ownership and accountability around data. Personally, I'm a big fan of the data product manager term, even though it's a little newer and still kind of coming into the limelight. Really, it's all about accountability, right? And accountability ideally sits with the fewest number of people possible, right? Because the more you spread accountability and responsibility, the more it's kind of like who's on first, right? So it gets more complicated, but some of us work at much larger or more complex organizations.
And these questions, like who defines the requirements, versus who fixes it, versus who owns the roadmap, versus who has the expertise, can sometimes be separated. And so you may see that, even though the owner is the data steward or the data product manager, the data custodian is really the person who fixes it when it breaks, right? And that's sort of like a data engineer or an IT person or a DBA, or whoever that might be, right? Who has the expertise? Maybe the steward and the data custodian have some of that expertise, but you've got some key subject matter experts, some SMEs, that are the experts around that data. So think about, in your own organization, who are the people that fit the answer to these questions, and what are the fewest number of critical hats needed in order to really accomplish appropriate, comprehensive, smart accountability around your data? And one thing that I would really encourage you all to think about is how to minimize the number of middle people in this overall flow, right? One of the things that data mesh is trying to move away from is the idea that there has to be this bottleneck of data engineers working in a centralized team, trying to grapple with all these inbound domains on the producer side, and all these outbound consumption patterns from the data consumers. Basically, they're a ticket-taking organization that is not given enough context, right? How can you get more direct data producer and data consumer collaboration, where engineers are either working on that centralized data platform, working on that set of self-service capabilities that empower the producers and consumers, or embedded with the domains, right? They are working side by side with the producers. Maybe they are the producers, and they're the ones working directly with consumers.
So think about how you can remove some of this playing the telephone game when you think about accountability and responsibility. And if we look back at that architecture example that Paul showed earlier, with the domains mapped out, you should also think about not just how you can associate responsibility with domains, but whether you can shift some of that responsibility more to the left of the diagram, right? So for example, rather than having a bunch of customer expertise down on the right-hand side, if you know that you've got a customer system, that customer domain on the top there, where a lot of that customer expertise and the actual people who are using your CRM or other systems are operating, can you shift more of that responsibility to them and empower the people that are making the data? Not only are they often gonna have that expertise, but also, when things break, they're the ones who have to fix it, right? And the more that we can associate the SLOs, the SLAs, and the operational value of the data with the folks that manage those systems, the better data reliability we're gonna have, and the more we'll minimize data downtime across the entire organization. One thing I do want to add is that, as you said, Tim, and I agree, we want to be able to shift ownership towards the left, but that may not always be possible, right? So the answer here is always "it depends." If you are a domain creating products that combine existing data products, then you're probably already living more towards the right. So again, we have to understand the social dynamics within our organization. Where do we have technical expertise? In an ideal world, I think we will start seeing domains, like what we're seeing here, that have their own data teams. So you'll have a data team associated with each domain.
But your starting point may be very centralized, and you're starting to go decentralized. So this shifting will happen little by little. The moment that you start seeing more data teams living in each domain, that is when we will start seeing more of this shifting towards the left. But again, it depends on the culture of centralization and decentralization within your organization, and whether you aspire to be more centralized or more decentralized, and finding that balance. I think that is also one of the key takeaways that we want you to have: you need to look for that balance between centralization and decentralization. And I think one other really, really important point goes back to what's all the way on the left, which is the data sources. Traditionally, we've all seen attempts to take all those data sources and then figure out what all that data is. And really, what data mesh is doing for you, by having data products, is letting the owners or the domain experts really figure out those sources, pull them together, and make them usable by consumers. And that's really why you want to move as much to the left as possible, where you can. Yeah, great points here. And hopefully this provides you a bit of a recipe book to approach this within your own organizations, in terms of where your domains are, how they map together, and where responsibility can be associated, so those people that are working with that data on the left-hand side can help to make the lives better of everyone in the organization, including and especially the folks that are to the right of them. Last point here: how to be agile. And one of the things that we want to recommend as a big final takeaway here is that you can apply agile software practices to your governance and your cataloging approach as well.
And one of the things that we want to advocate is this idea of agile data governance: the idea of creating and improving data assets, and improving your knowledge and management of data, in a collaborative and iterative way. This is about empowering the usage of data safely. And it adapts the deeply proven best practices of agile and open software to data and analytics. We won't go into the details of this; feel free to go to data.world and check it out, we have an ADG white paper and some webinars and things, but this is the chart that we really direct people towards, a non-vendor-specific approach to how to iterate around your data. It's about building a use case backlog, really identifying the key questions that are gonna help you understand all sorts of things, like which consumers to focus on and which domains to focus on first; curating those data assets or those data products; and then focusing on really creating an environment where producers and consumers can work together, and then taking those learnings and feeding them back in, right? So try to get those data products out there quickly, and then make them better and better and better. Don't boil the ocean; don't take three years and a massive waterfall approach to trying to address your governance needs, because ultimately, if you take that waterfall-driven approach, this top set of bars here can take years, and sometimes you never get there, right? They say the average tenure of a CDO is about two and a half years. And if you can't get your catalog or your governance program in place in two and a half years, then the reset button gets hit, right? So you have to take this iterative, use-case-driven approach. So with our key takeaways, I'm gonna pass it over to Juan. This is actually a fun connection to what we do on our Catalog and Cocktails podcast, where we end with our takeaways. So Juan, take us away with your takeaways.
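The use case backlog idea above can be sketched in a few lines of Python. The scoring rule here (value divided by effort, highest first) and all of the example questions and domains are hypothetical illustrations of one way to pick what to curate first, not a method prescribed in the webinar:

```python
# Each backlog item is a question someone is trying to answer with data,
# mapped to the domain whose data product would answer it.
backlog = [
    {"question": "Which customers churned last quarter?", "domain": "customer",  "value": 8, "effort": 3},
    {"question": "What is our on-time delivery rate?",    "domain": "logistics", "value": 5, "effort": 5},
    {"question": "Which SKUs drive repeat purchases?",    "domain": "sales",     "value": 9, "effort": 8},
]

def next_sprint(items: list[dict], capacity: int = 2) -> list[dict]:
    """Pick the highest value-per-effort questions for the next iteration."""
    ranked = sorted(items, key=lambda i: i["value"] / i["effort"], reverse=True)
    return ranked[:capacity]

for item in next_sprint(backlog):
    print(item["domain"], "->", item["question"])
```

Running this picks the customer and sales questions first, which also tells you which two domains (and data products) to curate in the first iteration, rather than boiling the ocean across all of them.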
All right, well, first we discussed what the scope is, right? Let's identify those domains. And guess what? You're already doing that work, right? They already exist. So let's go identify those domains, and identify the priorities you wanna solve. Are you doing more analytics type of work? Do you really need to go all the way and map out all the resources within your enterprise and how they're connected? So that's number one, the scope. Number two is: who are the stakeholders of your data catalog? I think always, always, we need to have data leadership involved. We need to have executives involved in this. And second, there's a plethora of different types of stakeholders: consumers, producers, governance, privacy, security, platform, and there could be more. Please be aware of all those stakeholders, to make sure that you are bringing in all the particular roles that are needed. Third, this whole idea of standardization: the data products. We are presenting here today our data products ABCs: accountability, boundaries, contracts and expectations, downstream consumers, and explicit knowledge. And this can happen in so many different places, right? At the consumption layer, the data management layer, the data-producing system. So identify where this ownership is going to occur. Fourth, when you think about who is responsible, again, this is when we really need to focus on the accountability. Who's the owner? Where are the requirements gonna come from? Who's gonna fix something if it breaks? What are the roadmaps? Where is the expertise around these data products? And finally, we need to be agile. We need to empower the usage of data safely. And we can do this by identifying the backlog of questions that people are trying to answer, mapping them to particular domains, and going through the whole agile process.
Let's have sprints, peer review, collaboration, and iteration. Those are the five main takeaways we want everybody to have. And then, to wrap up and get ready for some questions: if you're interested in learning more about data mesh and data governance, we just released this white paper. You can go to data.world, under Resources, and you can find our take on what data mesh is, right? In addition to the four pillars, we really like to tell people it's about data as a product and finding that balance between centralization and decentralization. And you can find more details about the data products ABCs framework that we have in here. Awesome, thanks so much for the great takeaways there and the action step here. And just as we transition to questions, thank you, Paul, so much for joining us. You bring such great expertise from the Snowflake side. Really excited to be partners with you all. And with that, we'll pass it back to you, Shannon. Thank you all for this great presentation. It's been very informative. And just to answer the most commonly asked question: I will send a follow-up email to all registrants by end of day Thursday with links to the slides and links to the recording, along with anything else requested throughout. So diving in, and if you have questions for any of our speakers here, feel free to put them in the Q&A section. Is data mesh suitable for organizations with low data management maturity who are just starting out on their data governance journey? Is it too big a leap to start with if the organization is very siloed currently and data management is immature? I think that's a super excellent question. It is one that not just the three of us are thinking about, but really the whole industry is thinking about related to data mesh. When is it right to do data mesh? And when shouldn't you do data mesh, right?
And obviously there are gonna be some situations where maybe you're a really small company, or you're just looking within a particular department, and you're like, hey, you know what? I don't really have the political position or situation right now to be trying to push data mesh across the organization, or maybe it just doesn't make sense yet because of the scale of what you're doing. And that's okay. Not everyone has to do data mesh. Not everyone has to break it into 10 or 20 or 100 domains. Do what's right for your organization, I think, is really key. That being said, what's really exciting about data mesh in my mind, and Juan and Paul, I'm curious if this connects with your own philosophies on this as well, is that data mesh ends up being a pretty scalable model. So even if you only really have one or two domains, even if your accountable or responsible individuals are pretty consolidated and there's only one name associated with each of these products, even if you only have two or three products to start, that's okay. You can start small. The model actually facilitates starting small and growing in this organic way over time. Paul, Juan, would you say the same thing there, or add to that? Yeah, the key is really about when to get started. I think you put it perfectly, Tim. Starting small, with even a single domain, just to get the methodology right, just to really understand what's important around governance, around defining the products. Who are the consumers? What are the use cases? All of that, you know, we're calling it data mesh, but you should be doing it anyway. And if you do it in such a way that you can push the producing down to the people that really have knowledge of the data, then that really is important and helpful. Yeah, I wanna add to this.
First of all, from my honest perspective, let's take the words "data mesh" out for now and replace the question, where data mesh was, with: should I treat data as a product in my organization? I think you should. I think everybody should start treating data as a product. And I think everybody should start bringing product thinking into data. Now the question is, how do we start doing this? This is where, if your executives or leadership say, we need to start treating data as a product, that's the type of leadership mandate that you wanna have. That's one thing. Another tactic that we've seen is: you know what, there are people who are really desperate to get answers to their questions, and they probably have the famous shadow IT. I think this is an opportunity to take those folks who are already doing that work, who already have that type of knowledge, and say, hey, let's take this ad hoc process and figure out what it could look like to start generating products, so we can start standardizing a little bit what this product is. And it becomes that first, what I call, iron thread, and it becomes an example for other places. This is a paradigm shift. It is not just about technology. It's a socio-technical paradigm shift, and change is hard. I was talking to one of my colleagues, and I remember saying: when we wanted to go to the moon 50, 60 years ago, people thought that was crazy. Who are the crazy people who wanna be the astronauts in your organization, who are like, I'm up for the challenge of this change? Go find those folks who are ready for that, and test this out first with them. Bring in treating data as a product, and have it be an example, so people can start getting it, and it starts kind of a snowball effect. I love it.
And there have been a lot of requests for the link to the white paper, and I put it in the chat there so everybody has access to that as well. And there's also the link for the e-book, as requested. I will be sure to get these links to everybody in the follow-up email as well. If you have additional questions, feel free to submit them in the Q&A section. We've got just a couple of minutes left here. So, Tim or Juan, I'm not sure if you can answer this question: is Microsoft's common data model also using data mesh architecture, or are they two different concepts? I can talk about that. So a data model is a part of it, but a data product is really the model plus the data. So you might use the data model as a building block for your domain, if there's a common data model, for example, for finance or banking or something like that. I might use that in building my data product, but my data product actually includes my data and all the sources and how I turn them into that finished product. Yeah, I think that's a great point, Paul. The one thing that I would add is that the common data model was originally developed around the way that Dynamics 365 is put together. So it's a great starting point, obviously, if you're using Dynamics as your CRM. It's also great inspiration for how you might design your domains, or how you might approach a semantic layer for some of your data. Like if you're thinking about what your glossary might start to look like, and how you might organize it, that kind of thing. So I think it's great inspiration for that kind of stuff, but obviously one size rarely fits all. So figure out what makes sense for your own organization. Yeah, and I want to add to that. This has to be done very carefully. Let's not boil the ocean, right? These existing common data models are great inspiration.
And I think once you start finding the centralization versus decentralization balance for you, there are models that you want to be able to centralize. For example, our definition of what contact information and telephone numbers are. There may be regulatory reasons we need to have that centralized, right? For GDPR and so forth. And then different domains can reuse those existing models as is, or they can extend them themselves. And then domains probably will also create their own models. And then what we also see is that there's gonna be friction between different people creating different stuff. And we say, that's fine, embrace that friction. The world is complex. We cannot expect to simplify it. And that friction is probably significant, because there is some energy going on there, and that's where we should focus. And that's how we know that there's priority. So I think it's again about finding that balance between centralization and decentralization, and don't boil the ocean. I love it. Well, thank you all again for this great presentation. Lots of kudos going on in the chat here, and lots of requests for the links. And again, I'll stick the links to the white paper and the ebook into the chat there for y'all, and I'll get those in the follow-up email, which I'll send out to all registrants by end of day Thursday, with links to the white paper, the ebook, the slides, and the recording from this session. Thank you all, and thanks to data.world for sponsoring today's webinar. And thanks to all of our attendees for being so engaged in everything we do. We just loved it. Thank you guys so much, and you all have a great day. Yeah, thank you so much for having us. Thanks a lot. All right, cheers, everyone.