Okay, welcome back to our SuperCloud 2 event, live coverage here of our stage performance in Palo Alto, syndicating around the world. I'm John Furrier with Dave Vellante. We've got exclusive news and a scoop here for SiliconANGLE and theCUBE. Zhamak Dehghani, creator of Data Mesh, has formed a new company called NextData.com, NextData. She's a CUBE alumni and a contributor to our SuperCloud initiative, as well as to our coverage and to Dave Vellante's breaking analysis on data, the killer app for SuperCloud. Zhamak, great to see you. Thank you for coming into the studio. Congratulations on your newly formed venture, and continued success with Data Mesh.

Thank you so much. It's great to be here, great to see you in person.

Yeah, finally. Wonderful. Your contributions to the data conversation have been well documented, certainly by us and others in the industry, with Data Mesh taking the world by storm. Some people are debating it, throwing cold water on it. Some are thinking it's the next big thing. Tell us about Data Mesh and the super data apps that are emerging out of the cloud.

I mean, Data Mesh, as you said, you know, the pain point that it surfaced was universal. Everybody said, oh, why didn't I think of that? It was just an obvious next step, and people have been approaching it and implementing it, I guess, over the last few years. I've been involved in many of those implementations, and I guess SuperCloud is somewhat a prerequisite for it, because Data Mesh, and building applications using Data Mesh, is about sharing data responsibly across boundaries, and those boundaries include organizational boundaries, cloud and technology boundaries, and trust boundaries.

I want to bring that up, because your venture, NextData, which is new, just formed. Tell us about that. What wave is that riding? What specifically are you targeting? What's the pain point?

Absolutely.
Yes, so NextData is the result of, I suppose, the pains that I suffered from implementing Data Mesh for many organizations. Basically, a lot of the organizations that I've worked with want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs. Yet they want discoverability and governance. So they want to have policies implemented. They want to govern that data. They want to be able to discover that data, and yet they want to decentralize it, and do that with a developer experience that is easy and native to a generalist developer. So we tried to find, I guess, the common denominator that solves those problems and enables that developer experience for data sharing.

Since you just announced the news, what's been the reaction?

I just announced the news right now!

So what's the reaction? But for people in the industry that know you did a lot of work in the area, what has been some of the feedback on the new venture, in terms of the approach, the customer's problem?

Yes, so we've been in stealth mode, so we haven't publicly talked about it, but folks that have been close to us have, in fact, reached out. We already have implementations of our pilot platform with early customers, which is super exciting. Of course, we're a tiny, tiny company, so we can't have many of those, but we're going to have multiple pilot implementations of our platform with real-world, global, large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now.

Zhamak, when I think about your role at ThoughtWorks, you had a very wide observation space, with a number of clients, helping them implement data mesh and other things as well, prior to your data mesh initiative. But when I look at data mesh implementations, at least the ones that I've seen, they're very narrow.
I think of JPMC, I think of HelloFresh. Not surprisingly, they generally don't include the big vision of inclusivity across clouds and across different data stores. It seems like people are having to go through some gymnastics to get to the organizational reality of decentralizing data, and at least pushing data ownership to the line of business. How are you approaching, or are you approaching, solving that problem? Are you taking a narrow slice? What can you tell us about NextData?

Yeah, absolutely. Gymnastics is a cute word to describe what the organizations have to go through. And one of those problems is that the data, as you know, resides on different platforms. It's owned by different people. It's processed by pipelines that who knows who owns. So there's this very disparate and disconnected set of technologies that were very useful when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem, the cost of integrating these technologies into a cohesive developer experience is what's missing. And we want to focus on that cohesive, end-to-end developer experience to share data responsibly in these autonomous units. We call them data products, I guess, in data mesh: units that encapsulate the computation, the data they govern, policies, and discoverability.

So I guess, as I heard the expression in the last talks, you can have your cake and eat it too. So people have their cake, which is data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates data, computation, and APIs to get to it, in a technology-agnostic way, in an open way, and then sit on top of and use existing Snowflake, Databricks, whatever exists.
The millions of dollars of investments that companies have made — sit on top of those, but create this cohesive, integrated experience where the data product is a first-class primitive. And that's really key here: the language and the modeling that we use is really native to data mesh, in that I'm building a data product, I'm sharing a data product. And that encapsulates: I'm providing metadata about this, I'm providing the computation that's constantly changing the data, I'm providing the API for that. So we're trying to codify and create a new developer experience based on that, and develop it both from the provider side and the user side, connected to peer-to-peer data sharing with the data product as a first-class primitive concept.

Okay, so the idea would be that developers would build applications leveraging those data products, which are discoverable and governed. Now, today you see some companies, take Snowflake, for example, attempting to do that within their own little walled garden. They even, at one point, used the term mesh, then they pulled back on that, and then they sort of became aware of some of your work. But a lot of the things that they're doing within their little insulated environment support that governance, and they're building out an ecosystem. What's different in your vision?

Exactly. So we realized that, you know, and this is a reality: you go to organizations, they have Snowflake, and half of the organization happily operates on Snowflake, and the other half says, oh, we're on, you know, bare infrastructure on AWS, or we are on Databricks. This is the real, you know, this SuperCloud that's written up here, it's about working across boundaries of technology. So we try to embrace that. And even for our own technology, with the way we're building it, we say, okay, not everybody is going to use the NextData data mesh operating system. People will have different platforms. So you have to build with openness in mind.
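The data product container Dehghani describes — data, computation, API, metadata, and policies bundled as a first-class primitive — might be sketched roughly like this. This is a minimal illustration only; all class, field, and method names here are assumptions, not NextData's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DataProduct:
    """Autonomous unit bundling data, computation, metadata, and policies."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # Policies travel with the data product and gate every access.
    policies: List[Callable[[str], bool]] = field(default_factory=list)
    # Provider-side computation that shapes rows as they are published.
    transform: Callable[[dict], dict] = field(
        default_factory=lambda: (lambda row: row)
    )
    _rows: List[dict] = field(default_factory=list)

    def publish(self, rows):
        # Provider side: the computation lives with the data it produces.
        self._rows.extend(self.transform(r) for r in rows)

    def read(self, consumer: str):
        # Consumer side: enforcement happens at the data product boundary.
        if not all(policy(consumer) for policy in self.policies):
            raise PermissionError(f"{consumer} denied by data product policy")
        return list(self._rows)

orders = DataProduct(
    name="ecommerce.orders",
    metadata={"domain": "ecommerce", "owner": "orders-team"},
    policies=[lambda consumer: consumer.endswith("@example.com")],
)
orders.publish([{"order_id": 1, "total": 42.0}])
print(orders.read("analyst@example.com"))  # policy passes, rows returned
```

The point of the sketch is the shape, not the storage: the same container interface could front Snowflake, Databricks, or an S3 bucket underneath.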
And in the case of Snowflake, I think, you know, I'm sure they have very happy customers, as long as those customers can be on Snowflake. But once you cross that boundary of platforms, then that becomes a problem. And we try to keep that in mind in our solution.

So it's worth reviewing that basic concept of data mesh: whether it's a data lake or a data warehouse, an S3 bucket, or an Oracle database, they should all be inclusive inside of the data mesh. We did a session with AWS on the startup showcase, data as code. And remember, I wrote a blog post in 2007 called data is the new developer kit. Back then they called them developer kits, if you remember. And we said at that time, whoever can code data will have a competitive advantage.

Aren't the machines going to be doing that? Didn't we just hear that? Well, hey Siri, hey CUBE, find me the best video for data mesh. There it is.

This is the point. What's happening is that now data has to be addressable, for machines and for coding, because you need to call the data. So the question is, how do you manage the complexity of making things as promiscuous as possible, making the data available, as well as then governing it? Because it's a trade-off. The more you make open, the better the machine learning, but yet there are the governance issues. So you need an OS to handle this, maybe.

Yes. Well, the mental model for our platform is an OS, an operating system. Operating systems have shown us how you can abstract what's complex and take care of a lot of complexities, but yet provide an open and dynamic enough interface. So we think about it that way. We try to solve the problem of policies living with the data, and enforcement of the policies happening at the most granular level, which is in this concept of the data product. And that would happen whether you read, write, or access a data product. But we can never imagine what all of these policies could be.
So our thinking is, okay, we should have an open policy framework that can allow organizations to write their own policy drivers and policy definitions, encoded and encapsulated in this data product container. But I'm not going to fool myself and say that, you know, that's going to solve the problem that you just described. If I look into my crystal ball, what I think might happen is that right now the primitives that we work with to train machine learning models are still bits and bytes of data. They're fields, rows, columns, right? And that creates quite a large surface area, and attack area, for, you know, the privacy of the data. So perhaps one of the trends that we might see is this evolution of data APIs to become more and more computationally aware: to bring the compute to the data, to reduce that surface area, so you can really leave the control of the data to the sovereign owners of that data, right? So, that data product. So I think that evolution of our data APIs will perhaps become more and more computational. You describe what you want, and the data owner decides, you know, how to manage the data.

That's interesting, Dave, because it's almost like what we just talked about, ChatGPT, in the last segment with Howie Xu, who's a machine learning guru and has been around the industry. It's almost as if you're starting to see reason come in. So the data reasoning is like, you start to see not just metadata, but using the data to reason, so that you don't have to expose the raw data. So almost like, I won't say a curation layer, but an intelligence layer.

Exactly.

Can you share your vision on that? Because that seems to be where the dots are connecting.

Yes. This is perhaps further into the future, because just from where we stand, we still have to create that bridge of familiarity between that future and the present. So we're still in that bridge-making mode.
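The open policy framework idea — organizations plugging in their own policy drivers, with enforcement at the data product on every read, write, or access — could look something like this minimal sketch. The driver names, signatures, and registry are invented for illustration.

```python
# Registry where an organization plugs in its own policy drivers.
POLICY_DRIVERS = {}

def policy_driver(name):
    """Decorator registering a custom policy driver under a name."""
    def register(fn):
        POLICY_DRIVERS[name] = fn
        return fn
    return register

@policy_driver("pii_masking")
def mask_pii(action, row):
    # Transforming driver: hide PII fields on read access.
    if action == "read":
        return {k: ("***" if k == "email" else v) for k, v in row.items()}
    return row

@policy_driver("region_lock")
def region_lock(action, row):
    # Rejecting driver: refuse rows outside the allowed regions.
    if row.get("region") not in {"eu", "us"}:
        raise PermissionError("row outside allowed regions")
    return row

def enforce(action, row, policy_names):
    """Apply each named driver in order; drivers may transform or reject."""
    for name in policy_names:
        row = POLICY_DRIVERS[name](action, row)
    return row

row = {"email": "a@b.com", "region": "eu", "total": 9.5}
print(enforce("read", row, ["region_lock", "pii_masking"]))
# email is masked on read; an out-of-region row would raise PermissionError
```

Because the policies are just named, encapsulated drivers, they can travel with the data product container rather than living in a central gatekeeper.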
However, by just the basic notion of saying, I'm going to put an API in front of my data — and that API today might be as primitive as a level of indirection, as in: you tell me what you want, tell me who you are, let me go process all the policies and lineage and insert all of this intelligence that needs to happen, and then today, I will still give you a file. But by just defining that API and standardizing it, now we have this amazing extension point, where we can say, well, in the next revision of this API, you not just tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and compute on your behalf? And you can kind of evolve that, right? Now you have a point of evolution toward this very futuristic, I guess, future, where you just describe the question that you're asking, from the chat.

Well, this is the SuperCloud. I have a question from a fan, I've got to get it in, from George Gilbert. And his question is: you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has, and he wants your feedback on this, is that data product app devs get exposed to more complexity with respect to moving data between data products, or maybe it's attributes between data products. How do you respond to that? Is that a problem? Is that something that is overstated, or do you have an answer for that?

Absolutely. So I think there is a sweet spot in getting data developers, data product developers, closer to the app, but yet not overburdening them with the complexity of the application and application logic, and reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating within.
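The "level of indirection" evolution described above — from an API that returns a file after policy checks, to one that accepts the computation itself — can be sketched as follows. The function names, the `compute` parameter, and the in-memory data are all hypothetical stand-ins.

```python
# Toy data product state: who may access it, and what it holds.
DATA = {"orders": [{"total": 10.0}, {"total": 32.0}]}
ALLOWED = {"analyst"}

def query_v1(who, dataset):
    """Today's API: tell me who you are, I check policies, you get the data."""
    if who not in ALLOWED:
        raise PermissionError(who)
    return DATA[dataset]              # raw data still crosses the boundary

def query_v2(who, dataset, compute):
    """Next revision: also tell me what logic to run at the data."""
    if who not in ALLOWED:
        raise PermissionError(who)
    return compute(DATA[dataset])     # only the answer crosses the boundary

total = query_v2("analyst", "orders",
                 lambda rows: sum(r["total"] for r in rows))
print(total)  # 42.0 — the raw rows never leave the data product
```

The standardized API is the extension point: `query_v2` changes what crosses the boundary without changing who controls the data, which is the surface-area reduction Dehghani is pointing at.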
Because what's happening right now is that data engineers — with a ton of empathy for them and for the high threshold of pain that they can deal with — have been centralized. They've been put into the data team and have been given this unbelievable task of: make meaning out of data, put semantics over it, curate it, cleanse it, and so on. So what we're saying is: get those folks embedded into the domain, closer to the application developers. These are still separately moving units. Your app and your data products are independent, yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application, but yet have them separate from the app, because the app provides a very different service: transactional data for my e-commerce transaction. The data product provides a very different service: longitudinal data for the variety of intelligent analysis that I can do on the data. But yet it's all within the domain of e-commerce, or sales, or whatnot. So a lot of decoupling and coupling creates that cohesive architecture.

So I have to ask you, and this is an interesting question because it came up on theCUBE all last year. Back in the old server data center days, and in cloud, Google coined the term site reliability engineer, SRE, for someone to look over the hundreds of thousands of servers. We asked the question of the data engineering community, who have been suffering, by the way, I agree: is there an SRE-like role for data? Because in a way, data engineering, that platform engineering, they are like the SRE for data. In other words, managing the large scale to enable automation and self-service. What are your thoughts and reaction to that?

I get it. Yes, exactly. So maybe we go through that history of how SRE came to be.
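The app/data-product split within a single domain — a transactional service and a longitudinal data product, decoupled as units but coupled by domain context — might be illustrated like this toy sketch. The class names, the event hook, and the e-commerce example are assumptions for illustration.

```python
class EcommerceApp:
    """Transactional service: serves the current state of orders."""
    def __init__(self, emit):
        self.orders = {}
        self._emit = emit             # domain event hook to the data product

    def place_order(self, order_id, total):
        self.orders[order_id] = total
        self._emit({"order_id": order_id, "total": total})

class OrdersDataProduct:
    """Analytical unit: longitudinal history for the same domain."""
    def __init__(self):
        self.history = []

    def ingest(self, event):
        self.history.append(event)

    def lifetime_revenue(self):
        return sum(e["total"] for e in self.history)

# Separately moving units, wired together only by domain events.
dp = OrdersDataProduct()
app = EcommerceApp(emit=dp.ingest)
app.place_order(1, 10.0)
app.place_order(2, 32.0)
print(dp.lifetime_revenue())  # 42.0 — decoupled units, coupled by domain
```

Each unit answers a different question (current state versus history), yet both stay local to the e-commerce domain, which is the cognitive-load localization being described.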
So we had the first DevOps movement, which was: remove the wall between dev and ops and bring them together. So you have one cross-functional unit of the organization that's responsible for "you build it, you run it." So then there's no "I'm going to just shoot my application over the wall for somebody else to manage." So we did that. And then we said, okay, as we decentralized and had this many microservices running around, we had to create a layer that abstracted a lot of the complexity around monitoring, observing, and running a lot of services, while giving autonomy to this cross-functional team. And that's where the SRE, a new generation of engineers, came to exist. So I think if I just look at—

Hence Borg, hence Kubernetes.

Hence, hence, exactly. Hence chaos engineering, hence embracing the complexity and messiness, right? And putting engineering discipline around embracing that, yet giving a cohesive, high-integrity experience of those systems. So I think if we look at that evolution, perhaps something like that is happening: by bringing data and apps closer, and making them these domain-oriented data product teams, or domain-oriented cross-functional teams, full stop, and still having a very advanced, maybe at the platform level, infrastructure-level kind of operational team. They're not busy doing two jobs, which is taking care of domains and the infrastructure; they're building infrastructure that embraces that complexity and interconnectivity of these data products.

So you see similarities.

I see it, absolutely. But I feel like we're probably in the earlier days of that movement.

So it's a data DevOps kind of thing happening, where scale is happening, good things are happening, yet it's a little bit fast and loose, with some complexities to clean up.

Yes, yes. This is a different restructuring. As you said, the job of this industry as a whole, and of architects, is to decompose, recompose, decompose, recompose in a new way.
And now we're decomposing the centralized team and recomposing them as domains and...

So is data mesh the killer app for SuperCloud?

You had to do this to me!

Sorry, I couldn't resist.

I know. Why should I? Especially when we're the baby. Yes. Yes, of course. I mean, SuperCloud, I think it's really the terminology, SuperCloud, OpenCloud. I think the inspiration of it, this embracing of diversity and giving autonomy for people to make decisions for what's right for them, and not locking them in, I think just embracing that is baked into how data mesh assumed the world would work.

Well, thank you so much for coming on SuperCloud, we really appreciate it. Data has driven this conversation. Your success with data mesh has really opened up the conversation and exposed the slow-moving data industry. It's been a great catalyst that's now going, well, we can move faster. So thanks for coming on.

Thank you for hosting me. It was wonderful.

Okay, SuperCloud 2, live here in Palo Alto, our stage performance. I'm John Furrier with Dave Vellante. We'll be back with more after this short break. Stay with us all day for SuperCloud 2.