Welcome back, everyone, to theCUBE's live coverage here at Open Source Summit 2023. I'm John Furrier with Rob Strechay, breaking down all the content. We're winding down day three. This is where we bring in the community, get expert opinions, and riff on big problems and big opportunities. We've got a great guest here: Lauren Maffeo, author of Designing Data Governance from the Ground Up. Lauren, great to see you. We ran into each other yesterday. Yes. You're from the Massachusetts area like us, the Boston area. I am, let's be honest. No, but you're not here because of that. That was the extra bonus points. You're here mainly because you're doing some cutting-edge work. You wrote a book about designing data governance. Now more than ever, there's a lot more design and systems thinking going into architectural decisions around data, not just in North America but globally. Rob, you're no stranger to data; you deal with it. Let's get into it. What's the book about, real quick, and then we'll get into some of the questions. Sure. As you mentioned, I wrote a book and published it with The Pragmatic Programmers called Designing Data Governance from the Ground Up. The subtitle is Six Steps to Build a Data-Driven Culture. I wanted to write this book because I have been in tech in some form or fashion for a decade, and I've been working directly and indirectly with organizations on data strategy and AI trends for the last seven years: first as an analyst at Gartner, and now in my current full-time role as a service designer at Steampunk, a human-centered design firm building design and technical solutions for the US federal government. The common theme I've noticed over the last seven years is that we as a society have moved very rapidly toward consuming and ingesting more data than ever. It is powering more AI technologies than ever, but data maturity, seven years later, is still at an all-time low.
It's actually decreasing as the volume of data in the world increases very rapidly. When I was at Gartner, basically all of their thought leadership walked clients through the steps they would need to take to become a data-mature organization. But in the four years I was there, the numbers on who is actually mature at managing their data didn't move. Then I became a practitioner, working with data architects and engineers to design systems, processes, and services that would help ingest and share that data through various interfaces. In many cases I was working with clients whose job it was to manage data, and I was again stunned to see that many of their processes were manual. They took days to complete. There was no automation, no process mapping, none of that. The workflows powering this data were still so inefficient. I realized that being data-driven is really not a technical challenge. It's often framed that way, but it really is a cultural shift. When we talk about digital transformation, I think being data-driven is a huge part of it. I think there's a lot we can learn from the security sector and how it has become cyber-first, but we're not there yet with data. That's why I wanted to write the book. Awesome, yeah. I think that's really interesting, because my last startup was Snowplow, an open source platform for data collection, specifically first-party data collection. It was pretty much one of the alternatives to what they call a customer data platform, built on things like the data lake technology from the folks down here, or Snowflake, or the other databases out there.
And I think you hit on something really important: a lot of times people have been collecting data using Google Analytics, which has more recently been outlawed in, I think, up to six EU countries now. And Google has basically come out and said it has no plans to address the EU's concerns to get past that. So as more and more data is collected, where is it being collected? Where is it being stored? Sovereignty. Sovereignty is a big, huge problem. For instance, there are certain rules in France where, even if you're an intergalactically large commercial operation, you still have to keep certain data in France, and then there's the right to be forgotten. I think the GDPR-style rules that are coming to the States are really interesting. So my question, the long way around, is this: data governance seems to be popping up more and more. Are you finding that people outside government are coming to you and saying, hey, we're trying to figure out how to do this and what's going on in Europe? What I see more than anything is people jumping on the AI bandwagon. They are so hyper-focused on the type of AI they want to deploy, which frameworks they're going to use, which containers they're going to run on the back end, and the questions of who is consuming this data and which pipelines it's going through are all an afterthought. I can't count the number of times I've heard people say, we will do data governance later, we will do human-centered design after deployment, and I die a little inside every time, because doing those things is my job. Why do they say that? Is it because they're lazy, they don't have the money, they don't have the expertise? What's the reason? I think it ultimately goes back to what you touched on, which is data ownership.
That is the biggest problem I see. Especially when I'm promoting this book, people ask what the biggest challenge with data governance is, and it's that no one wants to own anything. They don't want to own decisions about it, and they don't want to ascribe ownership to others. We were talking right before we went live about how in the US we, as citizens, have basically no federally protected consumer rights when it comes to our data. That's very different from citizens of Europe, who have many more rights to their data. In the US we are seeing patchwork legislation come up at the state level, in California, and Virginia is starting to do more. But in terms of why people don't care, first of all, it is long work; it's a long game you have to play. The other thing is that I think we're not looking at this as a human-centered problem. I have my own bias in looking at it that way as a human-centered designer, but until we make that shift, I don't see things moving. Yeah, and what I think is really interesting is that some of these people feel it's almost better to be the ostrich with its head in the sand, because then they can ignore the fact that, oh, I am collecting data from certain countries where GDPR now applies to me, even though I'm in the US. I think that's going to be a wake-up call for a lot of these organizations when they start to realize that, hey, we're using this AI technology. I'm working with another startup right now that's about finding all of your data and ascribing business metrics, business value, and metadata to it, not just file metadata, so that you can comply with GDPR. In fact, we have a customer running it for that exact reason, because they use the data for AI.
And I think they're a little bit ahead of the curve because they've done a lot more in the European Union. So I think your book comes at a really good time. The question it brings up for me goes back to Amazon, for instance. I remember the days when cloud hit the scene. Everyone thought they were ready for the cloud. Then when the pandemic hit, the people who needed the cloud the most but weren't in the cloud got hamstrung, because they didn't have the core competency, the muscle, or the ownership of understanding how cloud works. The people in the cloud pivoted quickly: hey, we're agile. So I think data ownership is a similar problem, and people who try to get into the AI business without it are like people trying to do cloud without the expertise. I think AI is going to force everyone to the table, and I want to get your thoughts on how you see that happening. Say someone has been pushing data ownership; it's been like that barge of garbage that couldn't land in New Jersey and just floated around. Remember that? No one wanted it. I don't want ownership; I'm going to get fired. It's a big task, and there's not a lot of love in that position. But then look at what happened in the past 10 years. Snowflake doesn't like being called a data warehouse. They want to be called the data cloud, but they're basically a data warehouse in the cloud. Databricks, same thing. Teradata, an old-school data warehouse, is trying to get into the cloud. So this has to change. What's your thinking on the mechanisms for managing data, the software, the data infrastructure? What's teed up nicely for this next AI wave, and what isn't? I think there is a lot of potential in the concept of data mesh architecture. It's relatively new on the scene. It was founded as a concept by Zhamak Dehghani at Thoughtworks four years ago.
It promotes a data-as-products mindset, where individual data domains have clear owners and are managed as products within a single data lake, all hooked up to one mesh catalog that is updated in real time. Then you hook your consumer apps up to that mesh catalog so they can consume the data they need and put it out for users on the business side. I gave a BoF session on data mesh architecture yesterday at the summit, and we talked about the challenges of implementing it. The reality is that this is very new. Nobody is an expert in it. It is costly to implement, which is another big barrier for organizations today. It's less about the tools that are available and more about making them less costly and more scalable. As another example, I helped a client implement AWS SageMaker for MLOps, and they sell it as a super easy tool: you just find your data set, you don't need to code, you don't need a PhD in stats, you can just go in. So I thought, I want to experiment with this; I'll set up a free trial and see what I can do. There is no free trial. It costs money the second you start using it, and that is of course a big roadblock for anyone without a large budget. So on the technology side we've seen a lot of innovation; where we're not moving the needle is governance. What I would say is that your culture really determines the tech that comes into your organization. If you don't have that foundation of being data-driven, you're not going to implement the right tools. I think the maturity curve is interesting. I'm discouraged to hear that the numbers are low. It's so bad on the maturity side, because we were riffing at KubeCon about the following aspiration, now an aspirational fantasy hallucination, as we call it here on theCUBE. We were riffing on the fact that everyone's shifting left for security. Where are they shifting with data?
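As a rough illustration of the data-as-products idea Lauren describes, here is a minimal Python sketch in which each domain registers a data product with a named owner in one shared catalog, and consumer apps resolve data through that catalog instead of reaching into storage directly. The class names, fields, domains, and lake paths are all hypothetical and are not taken from any particular data mesh product; a real mesh catalog would also track schemas, access policies, and real-time updates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset managed as a product by one domain (hypothetical model)."""
    domain: str
    name: str
    owner: str     # accountable person or team; must not be blank
    location: str  # where the domain stores it, e.g. a path in the lake


@dataclass
class MeshCatalog:
    """One catalog that every domain registers its data products in."""
    products: dict[tuple[str, str], DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Reject ownerless products: clear ownership is the point of the model.
        if not product.owner.strip():
            raise ValueError(f"{product.domain}/{product.name} has no owner")
        self.products[(product.domain, product.name)] = product

    def find(self, domain: str, name: str) -> DataProduct:
        # Consumer apps look data up here rather than guessing storage paths.
        return self.products[(domain, name)]

    def owned_by(self, owner: str) -> list[DataProduct]:
        # Answers "who owns what?" directly from the catalog.
        return [p for p in self.products.values() if p.owner == owner]


catalog = MeshCatalog()
catalog.register(DataProduct("sales", "orders", "sales-data-team", "lake/sales/orders"))
catalog.register(DataProduct("marketing", "campaigns", "marketing-ops", "lake/marketing/campaigns"))

orders = catalog.find("sales", "orders")
print(orders.owner, orders.location)
```

The design choice worth noting is that registration fails without an owner, which encodes the ownership problem discussed in this conversation as a hard rule rather than an afterthought.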
What if developers decided where the data is stored? Why should someone else decide the governance? What if developers had guardrails, like DevOps for data? What's the question there? That's a good question. It flips the script. What if developers could decide where to store data? What would that look like? Yeah, my biggest takeaway from the conference this week is how much progress the open source community has made on security in the last five years. I was at this same conference in Vancouver five years ago, and I don't remember anybody talking about security. I was a correspondent for opensource.com for several years, and when I wanted to do news roundups about cyber breaches involving open source, I was sometimes told, you know, our audience doesn't really want to read about that, which means they didn't really want to address it. OpenSSF is very young, but the fact that it exists five years later shows how much progress has been made on security. So many talks at this conference have been about security, and I can't help but notice that we have so much innovation in tech, in data storage, in machine learning, all of this stuff, and yet we are really in the infancy of figuring out... And nobody wanted to talk about data at KubeCon. Remember when we brought up AI? First of all, the call for papers went out in November, so a little early. AI hit hard in January, but they don't think about data other than log files. We're so not there yet, and that has been a surprise to me again at this conference. It's great to see the progress on security, but it really hits home that we're still in the very infancy of figuring out what governance looks like, even in open source. But I will say, I think there's no better model for how to be data-driven... Explain this concept of being data-driven. Yeah, open source. I know where you're going with this.
Yeah, open source, I feel, is about cross-functional collaboration: bringing people from various organizations, roles, and skill sets together around a single vision to bring a project to life that is bigger than any one person. I think there's no better model for how to implement data governance in an organization, because there is too much data today for one person or one team to own it. If you look at a typical enterprise, it's usually the CIO, and data is one of many decisions in their portfolio. They might have a CDO underneath them, but governance is not going to survive on a top-down model where the CIO's shop owns all of the data. You really need to... Or someone gets forced to take it on. Who wants to own this? Don't pick me. Nobody wants it. And I think the other thing open source does well is that there's a history of gamification, a history of rewarding people, and by the way, not only with money. I talk about this in the book: if you want people to serve as stewards on your data governance council, you have to reward them in tangible ways. They have to be honored, compensated, and promoted for the hard work they are doing to bring data governance to life. Otherwise, what's in it for them if they're left holding the hard work with nothing to show for it? So Lauren, if I hear you correctly, what you're saying is that open source has a track record of governance, in an environment that can be hostile, by the way, with vendors wanting to come in and manipulate. It's back to old school: okay, we want to be pure bottom-up with a little bit of top-down, maybe a little bit of a dictatorship in there, but mostly community-driven. Yes, community advocacy and protection, sometimes at the expense of the bottom line with really difficult trade-offs. Who does that better than people in open source?
It's efficient, very efficient: diverse, efficient, scalable. Yes. So I think there's a lot of opportunity for an open data project at the Linux Foundation five years from now. Who knows who's going to lead it, but I see the writing on the wall. If we're talking five years from now, I hope something like that has come to fruition, but what hit home for me here is that we're not there yet. In all honesty, I have never been to KubeCon, and I've wondered what the appetite for data is there. It sounds like in Amsterdam it was pretty low. Well, some of the thought leaders, like Justin Cormack at Docker, and Matt Butcher, whom we had on, were riffing on this idea of the developer, and they both shared that consensus: yeah, why should someone else decide where the data is going to be stored if it's going to be developed on? We were riffing on ChatGPT, which everyone kind of sees as the future, as an example: a prompt is essentially a call, and that gets tuned, and tuning is self-healing, so that scales with automation. They understand automation, so their view of data is, oh, it's a tool. I think that's going to be cool, and I think it'll be fine, but right now they're more operational, and security is a big deal. They don't want hallucinations. They don't want any cracks in any kind of foundation. That being said, the conversation then shifted to AI coming in. The conversation we've been having here is: are the open source foundations, and all foundations, prepared for the tsunami or tornado of AI, with more velocity, more volume, more misinformation, more code? Is the foundation set up organizationally to handle this next-gen phenomenon? What's your reaction to that?
I can't think of a better organization to be prepared for it, purely because the Linux Foundation was founded as the home for all of these open source projects, with contributors, advisory councils, leaders, and GMs, and that is really what you need to build the bones of a really solid house when it comes to data. So when I look at who is doing this work, I do think someone like the Linux Foundation is perfectly positioned, if there is the will and the interest. I think that's a really great opportunity. It's interesting, because in the BoF session yesterday we were talking about whether being a startup is a detriment to doing data governance, because you don't have the volume of data that others do, and you don't have the staff that others do. There are certainly barriers to being smaller in this space, but one of the really great benefits is that you don't have as much technical debt. You don't have as much bad data in your possession. You might not have been consuming data for that long, which means data destruction is not as big an issue as it might be for other organizations. In other words, you might not have held on to data for five years past an expiration date that you now have to worry about. That is a huge challenge for enterprises: they've been consuming all of this unchecked data for so long that the retroactive work to go in and fix it is huge, and I think that alone is a barrier for people. So, Designing Data Governance from the Ground Up: Six Steps to Build a Data-Driven Culture. Yes. What's been the biggest reaction to the book? What are people saying? I'm really relieved to hear from practitioners who say they needed something like this, because when I was pitching this book to The Pragmatic Programmers, it was born from an idea.
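To make the point about holding data past its expiration date concrete, here is a small Python sketch that flags records whose retention period has lapsed so someone can review them for deletion. The record fields, the `RETENTION` periods per category, and the reference date are hypothetical; real retention periods come from an organization's legal and regulatory obligations, not from code.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data category (illustrative only).
RETENTION = {
    "web_analytics": timedelta(days=365),
    "customer_pii": timedelta(days=3 * 365),
}


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    collected_on: date


def expired(records, today):
    """Return records held longer than their category's retention period.

    Records whose category has no defined retention period are returned too,
    because unknown retention is itself a governance gap worth reviewing.
    """
    flagged = []
    for r in records:
        limit = RETENTION.get(r.category)
        if limit is None or today - r.collected_on > limit:
            flagged.append(r)
    return flagged


records = [
    Record("a1", "web_analytics", date(2021, 3, 1)),
    Record("a2", "web_analytics", date(2023, 1, 15)),
    Record("c1", "customer_pii", date(2018, 6, 30)),
    Record("x1", "sensor_logs", date(2022, 9, 9)),
]

for r in expired(records, today=date(2023, 5, 12)):
    print(r.record_id, r.category)
```

A startup running a check like this from day one has a short list to review; an enterprise running it for the first time against years of unchecked data is facing the retroactive cleanup Lauren describes.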
I was talking to Brian MacDonald, who's an editor for PragProg, at All Things Open in Raleigh in 2019. About a year later, he very kindly made an intro to an editor at PragProg, and I pitched the idea for the book. They are a technical publisher. They publish very niche technical books on Rust, Angular, all of these very specific topics. The editor said, well, we'll take a look at this manuscript, but I don't know if people are going to be interested in this. So whenever I hear from practitioners in data, engineers or architects or CDOs, who say this really resonated with me, this is what I've been trying to say for years with no luck, that feels really validating, because it shows that this is a genuine challenge and something leaders are concerned about. I really wrote the book for people who say, I understand that I need governance, I just don't know what to do, and I have very little time to figure it out: what are the six essential things I need to get started? It was meant to be that primer, because data governance is going to look very different depending on your tech stack, whether you're on Snowflake versus Databricks versus whatever, or whether you have a warehouse versus a data lake. All of that is very nuanced, but there are fundamental things every organization should be doing, and that is what this book was meant to cover. Data as code, as we've been calling it for years on theCUBE, is like infrastructure as code. At some point it has to be programmable, integrate with code, and be easy for developers to interface with. We think DataOps is a huge trend. It's funny you mention that some people might not like it. I've been in the database business since I was in college; it was one of my degrees. It's gone from completely boring, no one cares, to the sexiest position, the best position. Then it goes boring again. It's boring until the next hot data topic.
Now AI is the hottest thing, and that's data. Security is a data problem, AI is a data opportunity, and now it's great to be a data geek. Yes, and I've gotten some bizarre reactions. When I say AI is just data, people, including very intelligent people with PhDs, sometimes say, what are you talking about? It's got its own consciousness, it's going to overtake us, all of this stuff. That just goes to show there's a fundamental misunderstanding about what AI is in the first place, and until we create more literacy around it, we're not really going to move the needle. What I'm afraid is going to move the needle is some catastrophe with data. If you look at why cyber advanced, why standards and secure pipelines advanced, it's because there was breach after breach after breach, and they're only getting worse. The worst offenders in cyber breaches are internal employees, so organizations said, we can't afford not to do this anymore. I do think that's what's going to happen with data. Matt from Fermyon was saying we're in this weird moment of exploration and euphoria, and we said on theCUBE that there's going to be a plane-in-the-Hudson moment for data where people go, whoa. There is. And that's going to look like a catastrophe, some sort of event, mass murder. I mean, something has to happen. What I think is going to happen is that personally identifiable information will get leaked at such a massive scale that organizations will have no choice but to take this seriously. I read an article by a historian years ago about how the arc of the universe bends toward progress following catastrophe. That sounds very severe and dark, until I looked at what he was talking about and thought, oh, that's actually true.
If you think about innovation in tech, innovation in cyber, innovation in everything, it often follows an enormous problem, a problem to be solved. And you talked about not having rights to our data. We also don't have rights in cyber. We're being attacked below the red line, and just because there are no physical troops on our shores doesn't mean we're not being manipulated. So as citizens we are very vulnerable. Very vulnerable, yeah. I think you hit the nail on the head with that one. The catastrophe, I think, is already primed. We're in front of the wave, and the wave is coming. I think the catastrophe could be the bankruptcy of a company that gets fined over one of these large leaks, and that'll be a very interesting moment for all of these industries. You also hit on a really good point that I find very interesting in this community: there's not a lot of product management of these projects. I'm not talking about scheduling and stuff like that, but more about how some groups do a better job than others. Data product management has really become a big topic. Are you seeing people come to you and say, hey, where do you see this working well? I am, and I agree with you. That's really what data mesh is all about. It promotes a data-as-products mindset, bringing that product management ethos to governing and managing your data as a consumable product for both your colleagues and your customers.
The point I always make is that there's no other role besides someone in a senior data role where this is accepted. If you were a product manager and said, I'm going to figure out my product strategy after we ship, or a VP of sales who said, I'm going to have my sales strategy for inbound and outbound leads after the quarter is over, no one would allow that. So when I hear people say, oh, we're going to come up with a data strategy, including a governance plan, later, again, no one else is allowed to do that. It's totally backwards. Oh, we'll do that later. Backup and recovery, we'll figure it out later. Hey, Lauren, we've got to go. Great chat. Congratulations on the book. I'm glad you brought up Zhamak. She was on our Supercloud panel. She's now at Nextdata; she started a new company around data mesh, which we're big fans of. Let's keep in touch, let's follow up, and congratulations on the book. Thank you very much. Okay, this is theCUBE. Day three coverage is wrapping up soon. We'll be right back with more. I'm John Furrier with Rob Strechay. Stay tuned.