In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. That sucks." Let's face it, more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need, when they need it, in a context that's relevant to advance the mission of the organization. Now, that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, or simplifying processes — in thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way.

We've heard the prognostications about the possibilities of data before, and in fairness, we've made progress. But the hard truth is that the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes, were broken and left us wanting for more.

Welcome to "The Data Doesn't Lie, or Does It?", a series of conversations produced by theCUBE and made possible by Starburst Data. I'm your host, Dave Vellante, and joining me today are three industry experts: Justin Borgman, co-founder and CEO of Starburst; Richard Jarvis, CTO at EMIS Health; and Teresa Tung, Cloud First Chief Technologist at Accenture.

Today we're going to have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies — big lies, little lies, white lies, and hidden truths — and we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promise of data? Can, and will, an open ecosystem deliver on these promises in our lifetimes?

We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. You're very welcome.

Okay, let's get right into it. Now, here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin?

Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata — of course, Teradata is the pioneer of that central enterprise data warehouse model — one of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. Those companies were acquiring other companies and inheriting their data architecture. So despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie.

So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, serve the business as best as possible from that standpoint. What is your experience?
Yeah, I mean, I think I would echo Justin's experience, really, in that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts — people who really understand healthcare data, from pharmacies or from doctors. And so although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place.

You know, Teresa, I feel like Sarbanes-Oxley actually kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize?

I think you've got to have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certain very core data sets: having a centralized set of roles and responsibilities to really QA, right? To serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise, you're not going to be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're going to collaborate with your partners — partners that are not within the company, right? External partners. We're going to see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
So, you know, Justin, you guys, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've got to go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess, question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that build cubes and the like? And/or have you had any success in sort of decentralizing that data model with your constituencies?

Yeah, so we absolutely have got rock star data scientists and data guardians, if you like — people who understand what it means to use this data, particularly as the data that we use at EMIS is very private healthcare information, and some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience, from a set of rock stars, to help a more decentralized business that needs to understand the data and to generate some valuable output.

Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team and that's the most cost-effective way to serve the business — otherwise, I've got duplication. What do you say to that?

Well, I would argue it's probably not the most cost-effective, and the reason is really twofold.
I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you for many, many years to come. I think that's the story of Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. And so by not only acknowledging that data is generally decentralized — and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws — I think the reality is the data mesh model basically says data is decentralized, and we're going to turn that into an asset rather than a liability. And we're going to turn that into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher-quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two models comparing and contrasting.

Do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just going to rip and replace their existing infrastructure. Maybe they're going to build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases?
What's your take on that?

Listen, I still would love all my data within a data warehouse. We'd love it mastered, we'd love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date — I would say it's a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's going to be a new technology that emerges that we're going to want to tap into. There's not going to be enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for. But you could have this new mesh layer that still takes advantage of the things I mentioned: the data products, and the systems that are meaningful today, and the data products that actually might span a number of systems — maybe either source systems from the domains that know them best, or the consumer-based systems or products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those is useful for a different part of the business, and the mesh actually allows you to resolve them.

So Richard, let me ask you. Take Zhamak's principles back to those: you've got domain ownership and data as a product. Okay, great, sounds good, but it creates what I would argue are two challenges. Self-serve infrastructure — let's park that for a second. And then, in your industry, one of the most regulated and most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about?

Well, it absolutely depends on some of the tooling, and the processes that you put in place around those tools, to centralize the security and the governance of the data.
And I think, although a data warehouse makes that very simple because it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means we know exactly who's got access to which data fields and which data tables, and then everything that they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is, and securing that in a common way, is still a valuable approach. And you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and having invested quite heavily in making it possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users.

Yeah, so Justin, we always talk about data democratization, and up until recently, there really hasn't been a line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing the analytic queries with data that's dispersed all over the place. How are you seeing your customers handle this challenge?

Yeah, I mean, I think data products are a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners — the people who know the data the best — to create data as a product, ultimately to be consumed.
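[Editor's note] The single audited security layer Richard describes can be sketched in a minimal, hypothetical form: one gateway that checks permissions and records an audit entry for every access attempt, regardless of which underlying source holds the data. All names here (MeshGateway, the in-memory "sources") are illustrative assumptions, not EMIS's or any vendor's actual implementation:

```python
# Sketch of a single security/audit layer in front of decentralized sources.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MeshGateway:
    # permissions: user -> set of (source, table) pairs that user may read
    permissions: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)
    sources: dict = field(default_factory=dict)  # source name -> {table: rows}

    def grant(self, user, source, table):
        self.permissions.setdefault(user, set()).add((source, table))

    def query(self, user, source, table):
        allowed = (source, table) in self.permissions.get(user, set())
        # Every access attempt is audited in one standard shape,
        # regardless of the underlying storage technology.
        self.audit_log.append({
            "user": user, "source": source, "table": table,
            "allowed": allowed, "at": datetime.now(timezone.utc).isoformat(),
        })
        if not allowed:
            raise PermissionError(f"{user} may not read {source}.{table}")
        return self.sources[source][table]

gateway = MeshGateway(sources={
    "pharmacy_db": {"medications": [{"drug": "aspirin", "dose_mg": 75}]},
    "gp_records": {"visits": [{"patient": "anon-1", "date": "2022-05-01"}]},
})
gateway.grant("pharmacist", "pharmacy_db", "medications")
rows = gateway.query("pharmacist", "pharmacy_db", "medications")
```

The point of the sketch is that the audit trail and the permission check live in one place, while the data itself stays wherever it already lives — an unauthorized read of `gp_records` would still be logged, then refused.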
And we try to represent that in our product as, effectively, an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so we're really trying to build on that notion of data democratization and self-service, making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself.

Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there.