Here we go. Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Officer of DATAVERSITY. We'd like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight, sponsored today by Informatica. Today, William will be discussing competitive analytic architectures, comparing the data mesh, data fabric, data lake house, and data cloud. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that the Zoom chat defaults to sending messages to just the panelists, but you may absolutely change that to network with everyone. And as always, we will send a follow-up email within two days, containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now, let me turn it over to Vivin from Informatica for a brief word from our sponsor. Vivin, hello and welcome.

Thank you, Shannon, and good day to everyone. Let me start by sharing the screen. And Shannon, if you can just confirm you can see my screen, then I can get started. Yeah, looks great. All right. Good morning, good afternoon, good evening, everyone. My name is Vivin Nath. I lead the Cloud Data Engineering product line at Informatica. I want to take about five to eight minutes to talk about some of the data challenges many organizations face, and then double-click on the lakehouse architecture before handing it over to William. So let's get started.

For organizations looking to drive digital transformation, data maturity plays a big role. Most organizations start by asking the descriptive and diagnostic questions of their data, and as their data maturity improves, many strive to be more predictive and prescriptive by leveraging that data. But many don't get beyond the descriptive and diagnostic stages, and they don't end up reaching the autonomous decision-making stage that they strive for. In fact, only 20% of companies are successful in their digital transformation journey. And why is that? Because many organizations have silos in their applications, in their data, and in their people, and that creates friction. For instance, an organization's ERP app does not share data with its CRM store. The organization's structured data could be in a relational database, whereas the unstructured data could be in an object store, and both of them have different views of the data set. Likewise, organizations have many different teams and many different lines of business that all use different technical stacks. All of this creates friction and silos. That's why, to break down the data silos, the application silos, and the people silos, at Informatica we launched the Intelligent Data Management Cloud (IDMC), which provides a single unified platform for all your data management needs. Informatica Cloud breaks down data silos by connecting all data sources to data targets, whether they be data lakes, data warehouses, or database systems, on-premises or in the cloud. Likewise, Informatica Cloud helps break down people silos by democratizing access to data for all data consumers with end-to-end data governance.
And we also break down application silos by connecting all applications, across on-premises and SaaS apps, with over 250 native connectors. To enable data democratization across the organization, it is also imperative to cater to different users with varying skill sets and data literacy. That's why we offer no-code tooling for data analysts and low-code tooling for data engineers and data scientists. We also make it easy to get started with our no-code data integration offering, with CDI Free and a pay-as-you-go offering. So if you want to start small, you can start with the free offerings. And if you have more advanced data engineering, big data use cases, we offer an advanced cloud data engineering offering as well. And if you're a data scientist or a data engineer and you are more comfortable with low-code tooling, we provide INFACore, which provides a lot of the Informatica Cloud capabilities in a low-code manner that you can consume as part of your favorite notebook experience, like a Jupyter notebook, for instance. And then finally, we have the ModelServe capability, where you can operationalize any machine learning model on the go.

Now, we understand there are many different data architecture patterns, and IDMC provides you one platform that can be used for any of them, whether it be lake house, data mesh, or data fabric. In the interest of time, I will not go through all the data architecture patterns, which William will be covering, but I'll just double-click on how you can use IDMC with the lake house pattern. A lake house architecture pattern, as you know, is a relatively newer architecture that combines the use of data lakes and data warehouses. The lake house concept brings together both the data lake and the data warehouse, and it tries to abstract away the storage layer for all your data management needs, so you can use one data layer for that. It decouples storage from compute, allowing you to scale to as many concurrent users and as many concurrent queries as you need, while allowing organizations to keep ingesting more and more data and to store any amount of data.

So let's look at how Informatica's cloud offering allows you to implement a lake house architecture that can be used for your BI analytics or data science and ML applications. The orange boxes are where you can see how the Informatica Cloud products are used. First, before loading the data, data practitioners need to understand the origin of the data. They need to understand the data attributes, the relationships, the lineage for better data governance, and that will also help provide a complete picture for the business. This is where our data catalog and governance products help you discover and identify the right data. Once you've identified the right data, you typically have a lot of different on-premises, SaaS, and streaming sources, and you want to ingest and replicate the data from these applications and sources to your data lake. It is important to note that it is not enough to just perform a bulk load, as these source systems are usually live and you have changing records on the source system. You also need change data capture. So remember, when you're doing replication, think about both ingestion as well as CDC, which you can do with our data ingestion and replication product.
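To make the incremental-ingestion idea concrete, here is a minimal, vendor-neutral sketch of one common CDC-style pattern: track a watermark and pull only rows that changed since the last run. It is an illustration of the pattern, not Informatica's product; the `orders` table, `updated_at` column, and file layout are hypothetical, with SQLite standing in for the live source and CSV files standing in for the landing zone.

```python
# Watermark-based incremental ingestion: pull only rows changed since last run.
# All names (orders, updated_at, landing paths) are hypothetical stand-ins.
import csv
import sqlite3
from pathlib import Path

LANDING = Path("landing/orders")
WATERMARK_FILE = LANDING / "_watermark.txt"

def read_watermark() -> str:
    # The watermark is the highest change timestamp already ingested.
    if WATERMARK_FILE.exists():
        return WATERMARK_FILE.read_text().strip()
    return "1970-01-01T00:00:00"

def ingest_increment(conn: sqlite3.Connection) -> int:
    LANDING.mkdir(parents=True, exist_ok=True)
    watermark = read_watermark()
    # Only changed rows -- the "CDC" part, as opposed to a full bulk load.
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if not rows:
        return 0
    batch_file = LANDING / f"orders_{rows[-1][2].replace(':', '')}.csv"
    with batch_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "amount", "updated_at"])
        writer.writerows(rows)
    WATERMARK_FILE.write_text(rows[-1][2])  # advance the watermark
    return len(rows)
```

Running `ingest_increment(sqlite3.connect("source.db"))` on a schedule lands one file per batch of changes; a managed replication product does the same job, plus log-based capture, schema drift handling, and delivery guarantees.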
Once your data is ingested into the data lake, you might want to transform it, enrich it, and cleanse it. This usually involves loading data from the landing zone to the data enrichment zone to the enterprise zone. These zones also go by different terms; some call them the bronze, silver, and gold tiers. You can use our ELT on Spark or ELT on SQL capability, which is available as part of our advanced data integration and data quality offering. In many scenarios, you may load the cleansed data set into another cloud data warehouse for downstream business analytics and reporting. If so, you can again leverage our ELT on SQL capability with the advanced pushdown capabilities of Informatica Cloud. Many of you may also have data science and ML use cases for which you are consuming the data from the data lake, and you could be using, say, a Jupyter notebook. There you could use something like INFACore, which gives you many of the out-of-the-box data integration capabilities that I mentioned, in a low-code manner, so you can consume it from your notebook interface itself and run it alongside your Python code, for instance. And lastly, if you want to operationalize machine learning models with the data fed from your data lake, you can use the ModelServe capability, which will help you operationalize any machine learning model at scale as part of your data integration pipeline itself. Now, I want to leave you with a few assets and guides before handing it over to William to talk about the different data architecture patterns, and feel free to reach out to me or connect with me if you have any questions. With that, I want to hand it back over to Shannon.

Vivin, thank you so much for this great presentation, and thanks to Informatica for sponsoring this webinar and helping to make these webinars happen. If you have questions for Vivin about Informatica, please feel free to submit them in the Q&A portion of your screen, as he will be joining us for the Q&A at the end of the webinar today. Now let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations. His strategies form the information management plans for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. And with that, I will turn the floor over to William to get his presentation started.

Hello and welcome. Thank you, Shannon. And thank you, Vivin, for that lead-in. We have a wonderful sponsor aboard. I am excited to be bringing you this hot topic, or topics, as the case may be. There's seldom a database vendor interaction that I have these days, and I have a lot of them, that doesn't go into one of these areas. And I'm pretty sure it's the same for a lot of you. Now, there's been a lot of talk about all these things. Certainly the lake house has been around a while, but it's not like this is the hall of fame, where ideas get vetted by a committee and eventually graduated into "okay, this is the data lake house now, this is what we've got to go by." Not like that at all. As a matter of fact, everything I hear is just a consultant or vendor interpretation of these things. And of course, you're hearing mine today.
However, I think there are good ideas in these broader ideas, and that's why I'm talking about them. Not to grade you out on whether you're doing a proper data mesh out there or not, but really to bring you some ideas from that world that hopefully work within your environment to meet your business goals, which is what it's all about. And this presentation today is brought to you by Informatica, but also by experience. These are some of the clients that I've had a chance to work with to varying degrees, a lot of insurance and healthcare and financial recently, but a lot of different industries here. And yes, they're all talking about this. So let us talk about it.

These are distributed data architecture patterns. I've been in the industry a long time. I've been through some things with the data warehouse, and I think some of you have been there as well. You recall the big idea early on was the big data warehouse in the sky that everything went into, and if you weren't doing that, you weren't doing it right, that kind of thing. I think we're kind of going through the same thing now with the data lake. Data lakes are great and we need them, and that's another webinar where I can talk about that, but they have shortcomings in a straightforward implementation. So what happened was different shops started to do different things with their data lake that were more accommodating to reality, much as the data warehouse went through. Only, for whatever reason, it took the data warehouse world a decade or two to come to some realizations, while we're a lot quicker now with the data lake, maybe learning from the experience with the data warehouse.

So what are some of those shortcomings that we have to get around when we implement a vanilla data lake? They're monolithic and centralized, with coupled pipeline decomposition and hyper-specialized ownership. The goal is to get in alignment with the operational systems, which, by the way, not just with the data lake, sit largely on domain-specific boundaries. And there's the idea of central data teams: not just having the data distributed but having the data teams distributed, and they are siloed as well. So we have to really learn these domains before effectively integrating or building out the architecture for a domain. There's some specific knowledge that is required. And the idea of a global data lake, for a large company anyway, is probably just not in the cards.

So what we want to do is accommodate some realities here. Now, there are pros and cons to following architectural patterns. Theoretically, it is science. There are some things that have bubbled to the surface that make sense. As a matter of fact, I think that in this presentation I'm presenting all of that, or most of that, for each of these architectural patterns, and the rest is specific interpretation by this vendor, that consultant, et cetera. But there are some commonalities, and I don't think I'm wading into any controversial ground today by talking about them at this level. Some more pros of architectural patterns: decisions are addressed that you were unaware of. "I wasn't thinking I would need that, and here it is in the science," if you will, of the data mesh and so on. And they are understandable. I think they're pretty understandable; I think you'll think so too after today. The cons: you can lose focus on the business priorities.
And I've seen this happen time and time again, where maybe I as a consultant will get called to build a data mesh. And okay, great, but why are we doing that? What is the business goal? And sometimes I lose people in that conversation. So keep your focus on the business priorities even as you build out these things. Yes, there's some reason why we call them data mesh, data lake house, et cetera, because it forms some commonality within the tech group. But the business group may not be on board with that. They've got their business priorities, and we need to support them. Another con: it may not be right for you. You may make a wrong decision. You may think, "we've got to have a data mesh," and latch onto that, and maybe that wasn't the right one for you. So be careful with that. Again, pull ideas from these patterns, and know it can take longer if you're really trying to adhere to them fully. So those are some of the cons.

So here I am, a consultant and an analyst, saying you don't have to follow all the rigor in the science. Let's see what John McEnroe has to say about that. "You cannot be serious, man. You cannot be serious." Okay, I am serious. I am serious. I am trying to say: make sure that you meet your business goals with whatever it is that you're doing. And we are unbuilding here, apparently.

All right, here we go. These are not mutually exclusive, by the way, what we're about to go into: the fabric, the mesh, the lake house, and the cloud. Of course, there are combinations. No one size fits all. As a matter of fact, as a consultant I find it really impossible to sit abstractly, knowing nothing about you and your shop, and say, well, this is right for you. To me, it really depends upon where you're coming from. What technologies do you have implemented? What architectures do you have implemented? And how well are they implemented? What about your skills? Where's your skilling at? That all comes into play when we're talking about where we're going with our architecture. So again, no one size fits all, and probably what's best for many of you is a combination of these things. Let's keep that in mind as we go through the ideas here.

I'm going to spend some extra time on the lake house. I feel like that's the entry point, if you will, into these distributed architectural patterns, and it's certainly the one that's getting the most discussion out there. So I want to help people get into their lake houses. What does that look like? Okay, Databricks coined the term data lake house, and consequently many other vendors coined their own terms or sort of discouraged the use of the term, but I think it stuck. And I think that we're kind of past that now and it's a thing. So I'm going to go ahead and say data lake house. Thank you, Databricks. And it's really a combination of the data lake and the data warehouse. That's it, conceptually, which is a great idea on the face of it, right? Combining great things. But that does create points of integration, which are points of failure as well. The big value here, to me, is the drill-through paths that have been created between the relational database and the cloud storage data lake. And these drill-through paths, many of these vendors have been working on for years to make sure that they're right and that they're presenting you with the lake house.
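As a toy illustration of what a drill-through path means, consider the sketch below: the query originates in the "warehouse" database, but the join reaches through into an attached "lake" database. This is an analogy built on SQLite (real lake houses do this across a relational engine and cloud object storage); the file names and table schemas are hypothetical.

```python
# Toy analogy for the drill-through path: the warehouse answers the query but
# reaches into an attached "lake" database. Files and schemas are hypothetical.
import sqlite3

conn = sqlite3.connect("warehouse.db")             # curated, modeled data
conn.execute("ATTACH DATABASE 'lake.db' AS lake")  # raw, cheap-storage data

# The query originates at the warehouse; the join reaches into the lake.
sql = """
SELECT d.customer_name, SUM(e.amount) AS raw_event_total
FROM dim_customer AS d
JOIN lake.raw_events AS e ON e.customer_id = d.customer_id
GROUP BY d.customer_name
"""
for row in conn.execute(sql):
    print(row)
```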
As a matter of fact, in a few slides, speaking of the extra time I'm going to spend on the lake house, I'm going to go through what four major vendors out there are doing to give you a data lake house, and what you can utilize from them to have your own data lake house. So, lakes. Sometimes that's just a term used for cloud storage. You've heard me say this time and time again, and it's still true: we're still not solid on that term. You can have a data lake for one purpose, one application, if you will, and that's okay, because it's on cloud storage. I hope that made some sense. Data lakes emerged to handle raw data in a variety of formats on cheap storage, for data science and machine learning, but they lacked critical features, some of which I already discussed. And these are critical gaps relative to the world of data warehousing. In data warehousing, you can have transactions if you want, and you can clearly enforce great data quality. In the lake, the lack of consistency makes it almost impossible to mix appends and reads, and batch and streaming.

By the way, it's a real skill, and we could probably spend a whole webinar on this topic: what goes where when you have both? You can't put everything everywhere. What do you put in the warehouse versus what do you put in the lake? And definitely, definitely, definitely some things go in both physically. There's a hundred ways you could do that. It's a skill.

There are a few key technology advancements that have enabled the lake house. Those metadata layers between the warehouse and the lake that are set up to handle the drill-through paths, that's number one to me. New query engines were designed on the lake house, providing high-performance, SQL-like execution on the data lake. Access has been permitted there and exploited for data science and machine learning tools. And the lake is sort of an endpoint for data; it's not really a system of distribution, although some use it for that. Usually a data lake is the terminal point for data, and you don't have to worry about offloading into other systems, so you can do different things with it. All of the major data platform vendors, we're talking the Snowflakes, the Amazons, the Googles, and Databricks clearly, have converged their messaging around the concept of the lake house architecture, which takes the best attributes of the data warehouse and enables them to run on platforms with data-lake-style storage architecture, specifically cloud storage. Most queries in this architecture will originate at the data warehouse and then do what I call a reach-through, as necessary, into the data lake. That has to be smart. It has to get the right data that matches up to the query, which wasn't even originally run on the lake. So that's where a lot of the IP went to make lake houses possible.

The principles of the lake house: managing all data everywhere; handling any format and making all those formats easily accessible; adaptable, actually multi-tier, storage; facilitating the continuous flow of data, so that under the covers, behind the scenes, if you will, the data is flowing appropriately into the warehouse and the lake on a repeated basis while what's happening at the surface is something else; and handling various tasks. All the open storage formats have got to be possible in this architecture. Flexible storage. The ability to separate compute from storage, and that's pretty much widespread now.
That makes it easy to scale storage, which is necessary as part of the lake house. Support for streaming, not just data integration from an ETL/ELT perspective, but streaming as well; that's especially useful for the data lake. And diverse workloads are definitely supported as part of the lake house. Now, by building a lake house, organizations can streamline their overall data management process with a unified, modern platform. The lake house can take the place of individual solutions by breaking down the silo walls and really just making it a lot easier for the end user to get to the data that they need. Now, what I'm finding is that while it's true the initial lakes were built for the data scientists, and they still largely are, they're being opened up. The lake is being opened up. Some shops out there are creating lakes to sort of replace the warehouse. Now, I'm not necessarily in favor of that, but it is happening. So we're getting more and more users per capita on the lake than we are on the warehouse. Now, they're both growing. They're both growing by leaps and bounds, don't get me wrong, but there's a little extra emphasis on the lake these days, in my opinion, looking at different shops. There's less administration with a lake house, better data governance, simplified standards, and better cost effectiveness with all of the things that we're able to do with the lake house architecture.

So now let's drill in just a little bit on some of the major vendors and what they're doing. Let's start with Redshift. What Redshift has is external tables. They push many compute-intensive tasks, like predicate filtering and aggregation, down to what they call the Redshift Spectrum layer. Spectrum queries use a lot less of your cluster processing capability than other types of queries. Now, there are some considerations to make this all work. Your Redshift cluster and the S3 bucket must be in the same AWS region, and Spectrum doesn't support some things, like enhanced VPC routing with provisioned clusters. This means that the routing of traffic between the cluster and other VPCs must be handled manually. And unless you are using an AWS Glue Data Catalog that is enabled for AWS Lake Formation, you can't control user permissions on an external table. So there's still some ways to go with this, but a lot of shops are implementing the external tables, and this is their lake house. The limitations are on the screen here. It can't perform update or delete operations, but you can insert. So it's good for those read-only types of environments.

Snowflake also uses external tables for this. As a matter of fact, I think most of these database vendors do. They have this stage concept. You see it there in my workflow: create the stage, create the external table, create the cloud object storage event notification, and then get an automatic refresh. What's that stage? The stage is like a registry for the metadata. It requires a little bit of maintenance on your part. You can either manually refresh it or you can create a workflow that triggers the updates to it, and that keeps everything in sync. You can manually partition your Parquet files. You can more efficiently query and analyze the data, as well as reduce storage space, with this type of architecture. It helps ensure the lake house continues to be scalable, efficient, and secure. Partitioning really helps when it comes to Snowflake lake house environments.
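Here is a sketch of that Snowflake workflow (stage, external table, refresh), submitted through the snowflake-connector-python package. The account details, bucket URL, storage integration, and the path-based partition expression are all hypothetical; the exact `SPLIT_PART` index depends on how your files are laid out under the stage.

```python
# Sketch of the Snowflake external-table workflow: stage -> external table ->
# refresh. Names, credentials, and the path layout are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="analytics", schema="public",
)
cur = conn.cursor()

# 1. The stage registers where the files live -- the metadata "registry".
cur.execute("""
CREATE STAGE IF NOT EXISTS lake_stage
  URL = 's3://my-bucket/events/'
  STORAGE_INTEGRATION = my_s3_integration
""")

# 2. The external table reads Parquet in place; a partition column derived
#    from the file path lets queries prune files.
cur.execute("""
CREATE EXTERNAL TABLE IF NOT EXISTS events_ext (
  event_date DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 2))
)
PARTITION BY (event_date)
LOCATION = @lake_stage
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = TRUE
""")

# 3. AUTO_REFRESH relies on cloud event notifications; you can also refresh
#    the metadata manually, as mentioned above.
cur.execute("ALTER EXTERNAL TABLE events_ext REFRESH")
```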
Now, what about BigQuery? Yes, here it is again: external tables. Run BigQuery analytics on data stored in S3, Azure Blob Storage, or Google Cloud Storage. So it's going to connect them all. Now, this maybe starts to raise the question: well, should I be mixing different types of storage, like S3 with BigQuery? You can. Certainly, I put it right here: you can. Most shops stick to, in this example, Google Cloud Storage. But everybody has a little bit of S3; it's sort of standard now. So it's good to know that BigQuery Omni will work with it. To query external data, you need to create a BigLake table that references the Amazon S3 or Blob Storage data. So there's some workflow to this, which I don't show you all of here, but it involves creating an AWS IAM policy, an IAM role for BigQuery, adding a trust policy, creating a schema with the location specified as the other cloud's region, and creating that BigLake table. And then finally, you can query as normal. There are some limitations to this, which we find somewhat limiting: 20 gigabytes per query result, 1 terabyte per day. And there's only a couple of regions, well, one in AWS and one in Azure, that this is working in today. This is bound to change. So that's BigQuery.
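A sketch of that BigQuery Omni workflow, using the google-cloud-bigquery client, is below. The project, dataset, connection name, and S3 URI are hypothetical, and the AWS-side steps (IAM policy, role, trust policy, and the BigQuery connection itself) are assumed to be done already.

```python
# Sketch of the BigQuery Omni / BigLake workflow described above.
# Project, dataset, connection, and bucket names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Dataset created in the AWS region where the S3 data lives.
client.query("""
CREATE SCHEMA IF NOT EXISTS `myproject.aws_dataset`
OPTIONS (location = 'aws-us-east-1')
""").result()

# The BigLake table references the S3 data through the AWS connection
# (the IAM policy/role/trust-policy setup happens before this step).
client.query("""
CREATE EXTERNAL TABLE `myproject.aws_dataset.sales`
WITH CONNECTION `aws-us-east-1.my_s3_connection`
OPTIONS (format = 'PARQUET', uris = ['s3://my-bucket/sales/*'])
""").result()

# Then query as normal, subject to the result-size and per-day quotas noted.
for row in client.query(
    "SELECT COUNT(*) AS n FROM `myproject.aws_dataset.sales`"
).result():
    print(row.n)
```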
And finally, I'm going to cover Synapse. I'm a little bit remiss that I don't have the founding fathers of this idea, Databricks, in here, but I don't. Let's look at Synapse. They have two types of external tables. They have Hadoop external tables, which is kind of the old way, where you read and export data in all your formats; that's available in the dedicated SQL pools, but not in the serverless SQL pools today. Or you can have native external tables that read and export data in all the formats, CSV, Parquet, et cetera, available in the serverless SQL pools that they have. A lot of these vendors, well, I shouldn't say all, but I think maybe all, have their serverless options now. Great for wading into the technology and getting a feel for it, maybe not the most price-effective way to handle large-scale production workloads, but they definitely serve their place in the spectrum of things. For native external tables, public preview is in the dedicated SQL pools, Parquet only, and writing and exporting data using CETAS with the native external tables is available only in the serverless SQL pools, not in the dedicated SQL pools. So they all have some limitations of their own. And here you can see I kind of broke it out in terms of those external data sources, so you can get a better spec on which one to use.

Now, let's move on from the lake house. I hope that was exciting to you, and that you see some possibilities there. A lot of shops have implemented lakes and warehouses, and without putting them together, one plus one doesn't equal three, right? So there are a lot of great possibilities for you in the lake house. I could almost blanket-recommend a lake house, or approaching a lake house, because no matter where you are, it's adding a few things to make what you have even more valuable.

So, data mesh. This is an overall data architecture environment that you see here. Now, a data mesh is conceptually pretty simple. Think of the components of the architecture: let's say we were building a single architecture for the business, which is ridiculous, but let's say you're doing that. As you do that, you have this one lake, one warehouse, and so on. What the mesh is saying is that that's not realistic. You're going to have multiple. Let's have multiple lakes, multiple warehouses, maybe multiple Kafka environments or data integration environments, that work on more or less a domain boundary. And "domain" is kind of squishy when you're talking about all this stuff, by the way. But anyway, they can all be working on domain boundaries, and there are some things that we come together on to create some efficiency in this architecture.

So the data mesh architecture decentralizes and decouples components by the domain that I mentioned before, recognizing that context matters, such as the concept of a customer being different across the company. And we all know that to some degree the concept of a customer does differ, at least in the attributes that different domains are interested in. So this creates some flexibility in design, and it creates a lot more self-service. That's the big thing with the data mesh: self-service. Adaptive governance as well, governance that is more localized, whereby it's not so hard to stay in compliance with the laws and regulations if you're keeping up with them, and you can keep up with the ones that make sense for the data that you're assigned to. That is part of the data mesh as well. This concept, the data mesh, is not attached to a vendor per se. It's more or less consultant-led, so really ground-up. The mesh aims to decentralize every component based on domain, like marketing, sales, HR, et cetera, although those don't play out exactly like that in most shops. There can be multiple marketing domains, if you will, sales, HR, et cetera. As a matter of fact, it's probably a whole other webinar for us to talk about building out your domains. If you're concerned about that, look to conceptual data modeling, because that's where that can happen.

So this is a very flexible design for self-service. Some of the principles: domain ownership, where domains offer a bounded context and a team owns the domain. Data as a product, where data sets are exposed via APIs in the catalog; see the sketch after this section. Federated governance under global standards of quality. So lots of self-service in here. This is frequently compared to a microservices approach to data architecture. The benefits: data democratization. We're all talking about that, and this is a manifestation of it, broadening the access of data beyond where it sits today. Reducing data silos and operational bottlenecks. Cost efficiencies, moving away from batch and utilizing cloud data platforms and streaming pipelines for real-time collection. And it reduces technical debt, where the distributed architecture reduces the strain on the storage system, providing more accessibility to the data via APIs. Distributed architectures can reduce the strain on that storage system by spreading out the load, allowing multiple systems to access the same data simultaneously. So you can decouple the access layer from the storage layer in this type of architecture. You get interoperability by standardizing domain-agnostic data fields. Those fields that are not specifically attached to a domain, that are very universal across the company in nature, can remain that way, remain universal. A lot of times when I'm talking about the data mesh, I can't help but think about master data management, which, by the way, I hope is part of your architecture here. But we do realize that much data is quote-unquote mastered in the warehouse and the lake and so forth, and that distributed nature is necessary in many cases.
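Going back to the "data as a product" principle above, here is a minimal sketch of what publishing a data product into a shared catalog might look like: the data set ships with an explicit contract, an owner, a schema, a freshness SLA, and an endpoint. All names and fields here are illustrative assumptions, not any particular mesh implementation.

```python
# Minimal sketch of "data as a product": each domain publishes a data set with
# an explicit contract into a shared catalog. All names are illustrative.
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str
    domain: str               # the owning domain team, e.g. "marketing"
    owner: str                # an accountable person or team alias
    schema: dict[str, str]    # column name -> type: the published contract
    freshness_sla_hours: int  # how stale consumers should ever see it
    endpoint: str             # where consumers fetch it (API, table, share)

CATALOG: dict[str, DataProduct] = {}

def publish(product: DataProduct) -> None:
    # Federated governance could enforce global standards here (naming,
    # required fields, quality checks) before listing the product.
    if not product.owner:
        raise ValueError("every data product needs an accountable owner")
    CATALOG[f"{product.domain}.{product.name}"] = product

publish(DataProduct(
    name="campaign_responses",
    domain="marketing",
    owner="mkt-data-team",
    schema={"customer_id": "string", "campaign_id": "string",
            "responded_at": "timestamp"},
    freshness_sla_hours=24,
    endpoint="https://data.example.com/marketing/campaign_responses",
))
```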
This is also great for security and compliance. Some use cases: BI dashboards, anything that's pushing the data out into the domains, again, is great for that. So if you're a very decentralized company, I think that's the clue that the data mesh might be something great for you. And the data mesh is something that, no matter where you are, you can get there with some effort. I don't think any architecture that I've seen can't go there, any centralized architecture, that is. And many companies are now in a centralized architecture for their data lake stuff. And hopefully you can see in the presentation, in the diagrams and so on, the steps to take to enable more possibilities with these ideas. So for the data mesh: multiple warehouses, multiple lakes, multiple intake layers, et cetera.

Another use case for data mesh is when you have specific customer experiences, where the customer data is really what's allowing the business to better understand its users and provide more personalized experiences. This has been observed in a variety of industries, from marketing to healthcare. So I'm going to call out the data mesh as being number two here, after the lake house, in terms of concepts that are at least trying to be implemented out there. And then machine learning projects, number three. Standardizing domain-agnostic data enables the data scientists to quickly integrate data from different sources, reducing the time spent on data processing and allowing more models to be implemented in their domain, in production, faster. And this accelerates automation, which is a focus for many companies today.

Now, I apologize, I didn't know how to represent the data fabric. It's just sort of there, no matter what your architecture is. Your architecture might be based on data mesh, data lake house, or a centralized concept such as in the picture here today, but there's a fabric there, under the covers, behind the scenes. The fabric provides common shared services, connectivity, and application portability. It's all about using metadata to automate the management and governance of all data. More automation is possible by providing a unified, centralized platform for data management, integration, and governance, all of this stuff that we're looking at here. As a matter of fact, I can't help but think about data virtualization as a concept that is similar to the data fabric. This is very metadata-driven across the board. Patterns in this metadata can start automating data quality rules and data analysis. So as we think about AI, which I frequently do, and I think about how a lot of the analysis that we do could be done by AI, this idea of the data fabric will facilitate that by providing the AI, or the analysts, as the case may be, access to all data everywhere. Automated data analysis uses algorithms to process large amounts of data and identify patterns and trends in it. This can help organizations make informed decisions faster and with greater accuracy. What this also allows is a lot of AI intelligence in the data fabric. AI can define data governance on its own. It can see what's necessary, create the data governance patterns, and implement data governance rules across the board. This is great for thinking about security, and for wanting to feel better about masking and encryption happening everywhere that it needs to happen.
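To make the "patterns in metadata can automate data quality rules" idea concrete, here is a toy sketch: instead of hand-coding checks per data set, checks are derived from catalog metadata. The metadata shape and column descriptions are hypothetical, a stand-in for whatever a real fabric's catalog holds.

```python
# Toy illustration of metadata-driven automation: derive data quality checks
# from catalog metadata instead of hand-coding them per data set.
import re
from typing import Callable

# Metadata a fabric's catalog might hold about one data set (hypothetical).
catalog_metadata = [
    {"column": "customer_id", "type": "string", "nullable": False},
    {"column": "email", "type": "string", "nullable": True,
     "pattern": r".+@.+\..+"},
    {"column": "age", "type": "int", "nullable": True, "min": 0, "max": 130},
]

def build_rules(meta: list[dict]) -> list[Callable[[dict], bool]]:
    """Derive row-level checks from metadata; no per-dataset hand coding."""
    rules: list[Callable[[dict], bool]] = []
    for m in meta:
        col = m["column"]
        if not m.get("nullable", True):
            rules.append(lambda row, c=col: row.get(c) is not None)
        if "pattern" in m:
            rx = re.compile(m["pattern"])
            rules.append(lambda row, c=col, rx=rx:
                         row.get(c) is None or bool(rx.match(row[c])))
        if "min" in m or "max" in m:
            lo, hi = m.get("min", float("-inf")), m.get("max", float("inf"))
            rules.append(lambda row, c=col, lo=lo, hi=hi:
                         row.get(c) is None or lo <= row[c] <= hi)
    return rules

rules = build_rules(catalog_metadata)
row = {"customer_id": "c-123", "email": "pat@example.com", "age": 42}
print(all(rule(row) for rule in rules))  # True for a clean row
```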
The fabric utilizes a continuous flow of data over all metadata assets to provide insights and recommendations. Very often it will be combined with the data mesh. Now, you cannot do a mesh or a fabric to maximum effectiveness without enterprise MDM. I already kind of alluded to it, and I have to come back to it and stick it in here for you, because I'm a big proponent of it. I've seen it do great things. When you try to answer a question like "how many customers do we have?" and you have multiple customer databases and so on, it's all the more important, with a fabric or a mesh or one of these decentralized architectures, to be able to answer that question, and master data management is what can do that.

Now, you might be thinking, well, what can be automated? What kind of analysis is he talking about that can be automated? There's a lot of automated analysis possible today. Text analysis, image analysis, video analysis, audio analysis; for those things I just mentioned, it's almost ridiculous to think that you're going to have a person going through that sort of thing. Speech recognition, automated data mining and machine learning, automated predictive analytics, automated natural language processing, and automated anomaly detection. These things should just be happening in the fabric of your organization, facilitated by metadata.

Data fabric principles: be intelligent, be automated, unify disparate systems, embed governance, strengthen security and privacy, and provide more data accessibility to everybody. A fabric can allow decision makers to view all data more cohesively, to better understand whatever lifecycle they're interested in, the product lifecycle, the customer lifecycle, et cetera. It counteracts the problem of data gravity, which is the idea that data becomes more difficult to move as it grows in size. Now, none of these patterns is an excuse to have a half-hearted data architecture underneath. You still want to be solid there. You still want to have a solid data model. You still want to do data quality to a standard, okay? Even your column naming and so on, the oldies but goodies, if you will. They're still there, and we still want to do them well. This is additive to a great foundational architecture. All of these things are, okay? Keep that in mind.

Fabric architectures operate around the idea of loosely coupling data in platforms with the applications that need it. One example of a data fabric architecture in a multi-cloud environment is where one cloud, like AWS, manages the data ingestion and another platform, such as Azure, oversees data transformation and consumption. Then you might have a third vendor, like IBM or Oracle, providing analytical services. The fabric architecture stitches these environments together to give you that unified view of data. The benefits are many: integrated intelligence, where semantic knowledge graphs, metadata management, and machine learning are used to unify data across various data types and endpoints; democratization of data, there it is again, where the data fabric architecture facilitates self-service of applications, another way to get there, and broadens the access of data beyond wherever it is today; and better data protection, because of the ability to consistently apply data security rules, and the ability of AI to generate those rules and other data governance types of rules. So there are many use cases for data fabric. Okay, here are some of them. Fraud detection, okay. Fraud can happen anywhere.
Fraud these days is often the product of, how shall I say, a coordinated team approach that happens across the globe. And in this team approach, it can be really hard to detect if you're looking at data in isolation. So anytime that's the case, a data fabric is going to help. The same goes for data for preventative maintenance: by integrating across various sources, data scientists create a holistic view of everything, whether for preventative maintenance, data discovery, customer profiling, or risk modeling. And that is what I have to say about the data fabric.

Okay, we're three down. Now, the last one sort of recently emerged, and this one is tied somewhat, quite somewhat, to a vendor, Snowflake, all right? And the rest of the market is a little slow to it, and I shouldn't even say slow, it's just part of the evolution of the data cloud that it's something we're not hearing as much about as the others yet, but I think it fits right in there. I think it's the fourth seat at the table, if you will. It's interoperable with the others, just like the others are interoperable with it. And similar to the others, there is some science that has emerged to fit the reality of the patterns for cloud usage that have emerged over the last decade now, right? And again, this is another way of trying to make the most of a decentralized architecture.

So what I want you to know about the data cloud, there's a couple of things. One is that it works across clouds. So you can have multiple clouds that are somewhat working together. That's number one. But number two, to me, is very interesting, and something that we're not all doing enough of, which is data exchange. This is the ability to share and exchange data with your subsidiaries, partners, or third parties, or the general users that are out there on the internet. The data marketplace provides live access to ready-to-query data with a few clicks. And I have recently rather been living in that world of providing third-party data of all stripes, syndicated data, as well as data that is going to be used for AI algorithm generation, because we need a lot of data for that. Anyway, I could talk extensively about that. That's a market that is quite emerging today. And the data marketplace market is more about connecting with your partners, not necessarily public data, but there is a lot of public data in there. Now, you might be thinking, well, what's the use of that? There's a lot of use for that. This gives every enterprise the opportunity not only to utilize the data, but to monetize their own data. I predict that you as data professionals, which is mostly who I get on these webinars, 25% of you are probably going to be working on your company's data products in the next few years. This is an integral part of the new economy, and the ideas are going to emerge from us, the data professionals, the ones that know where we sit with data and the possibilities for data. So I'm always injecting this idea into my webinars and talks and so on: be assertive about the possibilities as a data professional, the business possibilities.
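Since Snowflake is the vendor named here, a sketch of the data exchange idea using Snowflake secure data sharing may help: the provider grants a share read access to specific objects, and the consumer mounts it as a database, with no copies moved around. Submitted via snowflake-connector-python; the database, table, and account names are hypothetical.

```python
# Sketch of data exchange via Snowflake secure data sharing.
# All object and account names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
)
cur = conn.cursor()

# Provider side: create a share and grant it read access to specific objects.
cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share")
cur.execute(
    "GRANT SELECT ON TABLE sales_db.public.daily_orders TO SHARE sales_share"
)

# Add the partner's account; they see live data, no copies moved around.
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_acct")

# Consumer side (run in the partner account): mount the share as a database.
# cur.execute("CREATE DATABASE shared_sales FROM SHARE my_account.sales_share")
```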
Now, if you research this idea of the data cloud, it can be rather frustrating, because the term is used, even by Snowflake, I should say largely by Snowflake, to mean different things. It can mean a data cloud within a single cloud instance, like AWS and all the architecture that you have there, including of course the Snowflake implementation. That's a data cloud. But really, I like to think of it as all of the clouds working together, and that's the other common definition of the data cloud, which I think will stick and which I think is where a lot of the value comes from. Because having a great architecture with a great database in one cloud is one thing, but now we're back to the beginning of the webinar, where you want to do something with that. You want to make it into a lake house, a mesh, a fabric, et cetera. So a data cloud is combining all of the clouds that you may have.

So, in summary, the distributed data architecture patterns are not mutually exclusive. You should not think of them that way. You can do multiple, again. I don't have a bullet on it, but don't get hung up on making sure that you're ticking off all of the things that somebody out there said you need to do to be a data mesh. Your executive management is not going to grade you on how well you did a data mesh. It's about how well you supported the business, and that's what we're here to do. So if you're doing that, great; hopefully there are some ideas in here to make you do it better. The lake house is all about drill-through pathing. Think of it that way and you're 80% of the way there in terms of understanding it. The data mesh architecture decentralizes and decouples components by business domain, as we saw. We saw in the picture: the warehouse was like that, the lake was like that, the data integration was like that. So that's the data mesh idea. The data fabric provides common shared services, connectivity, and application portability, making more automation possible through patterns in metadata. Those are some of the keywords when it comes to the fabric: automation and metadata. And finally, the data cloud, which, I'm saying, is a concept that applies more broadly than Snowflake, even though they coined it. It allows organizations to unify and connect to a single copy of all of their data and external data. So that's some of the promise from these decentralized architectural patterns. You might have seen some overlap. Yeah, there's some overlap out there, for sure. Could the world have simplified this to a couple of concepts for us? Yeah, probably so, but they didn't, and here we are. And that's why we're still taking the best of all of these for our implementations. We are implementing more than one of these ideas and thriving by doing so. And that has been my part of today. I'm going to turn it back over now to Shannon for some Q&A. I'll look forward to it.

William, thank you so much, as always, for another great presentation. If you have questions for William or for Vivin, feel free to submit them in the Q&A portion of your screen. And just to answer the most commonly asked question: a reminder that I will send a follow-up email by the end of Monday with links to the slides and links to the recording. So Vivin, coming in first was a question about some Informatica products. What is the difference between BDM with Spark and PowerCenter? Is it better than PowerCenter? What are the benefits of using BDM with Spark jobs?

Yeah, I can take that. So BDM with Spark, that's the old name. It was rebranded as the Data Engineering product a while back.
And that was meant for big data use cases in the on-prem world, especially if you are in the Hadoop ecosystem. That was specifically the forte of BDM. But since then, the world has evolved; a lot of the big data workloads have moved over to cloud data lakes and data warehouses. And that's where I would recommend using Informatica's advanced cloud data integration. It's one cloud data integration product for all your workloads, whether big data engineering or small or medium data workloads, or just connecting across applications, because it gives you the power of ETL, ELT on Spark, ELT on data warehouses, ELT on SQL. So it's one product for all your data integration and engineering needs.

Perfect, thank you. Next question: when distinct domains of business data are coming together with a merger of business units, and you're consolidating that data into one database inside a data lake house, will it lend itself to data mesh, or does that sort of consolidation go against the objectives of data mesh? William, you want to kick us off there?

It used to be that when there was a merger, there was the big war-room project of getting all the data together in one data warehouse, or whatever the case may be. We don't see that so much anymore. We see: okay, let's leave it alone and keep it working, get some integrated value out of what is going to continue, and maybe do some master data management over the top to build out some enterprise data layers, the things that matter. So when it comes to mergers and acquisitions, the questioner is right that the data mesh becomes the thing to look at. And the thing to acknowledge is that we're going to have multiple of each of these, and we've just got to make them work together better. So I'm looking to the mesh architecture for something like that. And it doesn't mean that we're not meshing lake houses and clouds and standard data warehouses and so on and so forth. But yes, when you're leaving things in place and there are multiple of them in the architecture, that is the mesh. Vivin, anything you want to add?

Yeah, I concur with what William just said. We have seen more and more in acquisition patterns that enterprises will end up supporting both ecosystems, both data clouds. So the data mesh architecture blends well with that particular pattern. And thereby, you know, make sure that you have the tooling necessary to support those multi-cloud patterns.

I love it. So we've got about five minutes, so I can slip in as many questions as I can here. Does the data mesh architecture pattern require domain-specific resources, for example, data lakes, for it to be effective? What would be the considerations to using a data mesh architecture pattern with enterprise centralized resources? Would not the domain delineation be at the warehouse or data mart level?

I think you're losing something there if you don't have domain experience, because that's a lot of what the mesh is about. The mesh is about creating domain-specific architectures that work together to create the enterprise architecture, and to create those domain-specific architectures you want to have people with great domain knowledge that are building specific to the needs of that domain. If you just have skilled, say, tech practitioners that don't know the domain, that's one thing, and you may need some of that, but you want to be sure that you have the domain influence on what is happening there, because let's face it.
Again, here I am with another "it used to be this way" kind of thing, but it used to be that we would gather a lot of requirements, and we would kind of ferret them out of everybody in the organization, and that would go pretty well. But anymore, every department and domain has their team that just knows, because they work there, and that's better. They know what can't just be communicated in a JAD session, a requirements-gathering session, if you will, and that's very valuable, and that is what the mesh is all about. So I would definitely have some domain-specific resources on that team.

Yeah, and just to add to that, I'll take an example, right? Let's say you have an e-commerce firm, and you have a team of data scientists building a sophisticated ML model. It probably makes sense for the domain expert, in this case the data scientist, to develop and train the model for, let's say, the cart checkout experience, and they will use something like a customer 360-degree view or an MDM offering, right? And another team, maybe on the business side, wants to use this model to present the recommendations in a promotional email. The person doing that will be part of the marketing department. So it's a different domain, but all of them would be connecting to the same MDM or customer 360, right? So you can leverage a data mesh architecture for those kinds of use cases.

Perfect. So I think we've got just a couple of minutes here. If data mesh is great for a decentralized company, does that mean it's not good for a single view of the client, a client 360 view?

You can have both. If you are not that decentralized of a company, and maybe you're not even that large of a company, you may not want to bother stepping into mesh principles. I don't think they're going to help you that much. But most organizations, large or small, have some decentralization to them today, where the mesh can really help out. We'd have to look at your specifics to understand this, of course, or you would have to look at them to understand that. But I think, yes, what you could put over the top, what I recommend, is master data management. That's where you have your enterprise data. When you have the mesh, I think master data management is an essential part of it, so that you do have that commonality, but you also have that customization, if you will, out in the domains. That's what I like to do to make sure I strike the right balance. Vivin, anything you want to add there?

Yeah, I was tempted to go back to the previous example, which was exactly this case: you have different teams working on the same data set, one team is building recommendations and the other team is sending promotional emails, and it's a data mesh architecture, but they all have a single view of the client using something like an MDM. So it can work.

Perfect. Well, that's perfect timing; that brings us right to the top of the hour. I'm afraid that is all the time we have for the webinar. So William, thank you so much, as always. And Vivin, thank you so much for joining us this month. Thanks to Informatica for sponsoring and helping to make these webinars happen. And of course, thanks to all of our attendees for attending and engaging in everything that we do. We really appreciate it.
Again, just a reminder: I will send a follow-up email by end of day Monday with links to the slides and links to the recording from this webinar. Thanks, everybody. I hope y'all have a great day. Vivin and William, thank you. Thank you. My pleasure. Thank you.