In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases, and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, evolving, and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE and I'd like to welcome you to a special CUBE presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner analyst and principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is a well-known research vice president with IDC. David Menninger is senior vice president and research director at Ventana Research. Brad Shimmin is chief analyst for AI platforms, analytics, and data management at Omdia. And Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today.
Great to be here. Thank you. All right, here's the format we're going to use. I, as moderator, am going to call on each analyst separately, who will then deliver their prediction or mega-trend, and then, in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance? Go ahead, sir. Thank you, Dave. I believe that data governance, which we've been talking about for many years, is not only going to be mainstream, it's going to be table stakes. And in all the things that you mentioned, the data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and we aren't governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we will see some more companies go public. My bet is on Collibra, most likely, and maybe Alation, we'll see, go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more streaming data, so the Kafka schema registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive: very detailed lineage, impact analysis, and then even expanding into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance.
So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the number of users of a data catalog is going to exceed that of a BI tool. It'll take time, but we've already seen that trajectory. Right now, if you look at BI tools, I would say there are 100 users of a BI tool for every one user of a data catalog. And I see that evening out over a period of time. At some point, data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if you want to do more in-depth analysis, it'll be the jumping-off point into the BI tool or the data science tool. And that is the journey I see for the data governance products. Excellent, thank you. Some comments? Maybe Doug, a lot of things to weigh in on there. Maybe you could comment. Yeah, Sanjeev, I think you're spot on about a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years. It's like God, motherhood, apple pie. Everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines. These are environmental, social, and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and they're gonna demand more solid data. They're gonna demand more detailed and solid reporting, tighter governance, but we're still far from mainstream adoption.
We have a lot of best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. Yeah, I would love to be a believer in data catalogs, but to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the 90s, and that didn't happen quite the way we anticipated. And to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that we won't, how do we put this, fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or for data governance and cybersecurity. And instead we have some tooling that can actually be adaptive, gathering metadata to create something I know is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. So I just want to add that data governance, like many other initiatives, did not succeed. Even AI went into an AI winter, but that's a different topic. A lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley came onto the scene: if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was like pocket change for them, instead of doing the right thing. But I think the stakes are much higher now.
With GDPR, the floodgates opened. Now California has CCPA, but even CCPA is being superseded by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements. Data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative. Regulatory compliance is bearing down hard, so I think the time might be right. Yeah, so guys, we have to move on here, but there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take; even though there was some skepticism there, that's something that we can watch. And I wonder someday if we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and coming off of governance, I mean, wow, the whole concept of data mesh is decentralized data, and then governance becomes a nightmare there, but take it away, Tony. We'll put it this way: data mesh, the idea at least as proposed by ThoughtWorks, was basically unleashed a couple of years ago, and the press has been almost uniformly uncritical.
A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. It was a problem when we had enterprise data warehouses, it was a problem when we had our Hadoop clusters, and it's even more of a problem now that the data is out in the cloud, where your data lake is not only S3, it's all over the place. And it also includes streaming, which I know I'll be talking about later. So the data mesh was a response to that, the idea being: who are the folks that really know best about governance? It's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is gonna hit cold hard reality, because if you do a Google search, the published work, the articles on data mesh, have been largely uncritical so far, basically lauding it as a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad and I met years ago, when we were talking about SOA and decentralizing all of this, but that was at the application level. Now we're talking about it at the data level, and now we have microservices. So there's this thought: if we're deconstructing apps in cloud native through microservices, why don't we think of data in the same way? My sense this year, and this has been a very active search term if you look at Google search trends, is that enterprises are gonna look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny. It's gonna attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously.
The reason why I think that you'll start to see the cold hard light of day shine on data mesh is that it's still a work in progress. This idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue, but we tend to forget: how can we strike the balance between consistent enterprise policy and consistent enterprise governance on the one hand, and the groups that understand the data on the other? How do we balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is in the self-service technologies that will help teams essentially govern data through the full lifecycle: from selecting the data, to building the data pipelines, to determining access control, to looking at quality, looking at whether the data is fresh or whether it's trending off course. So my prediction is that it will receive its first harsh scrutiny this year. You are going to see some organizations, some enterprises, declare premature victory when they've built some federated query implementations. You'll see vendors start to data mesh-wash their products: anybody in the data management space, whether it's a pipelining tool, whether it's ELT, whether it's a catalog or a federated query tool, they're all going to be promoting the fact of how they support this. Hopefully nobody is going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata that Sanjeev was talking about and the catalogs that he was talking about, which is that there's going to be a renewed focus on metadata.
And I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because at the end of the day we all need to read from the same sheet of music. So thank you, Tony. David Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? Well, I think we have to start by defining data mesh, right? The term is already getting corrupted. Tony said it's going to see the cold, hard light of day, and there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, data federation, right? And I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified, but that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen. They're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access.
Again, whether you define it as mesh or fabric or virtualization isn't really the point here, but this notion that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is, when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double: 79% of organizations that were using virtualized access expressed satisfaction with their access to the data lake, while only 39% expressed satisfaction if they weren't using virtualized access. So thank you, Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking, or maybe you've spoken already, on a panel with Zhamak Dehghani, who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. So my message to Zhamak and to the community is, as opposed to what Dave said, let's not define it. We spent the whole year defining it. There are four principles: domain ownership, data as a product, self-serve data infrastructure, and federated governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh, and I'm like, I can't compare the two, because data mesh is a business concept and data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like? How do we handle shared data across domains and govern it? And I think we are going to see more of that in 2022. It's the operationalization of data mesh. I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move to Carl Olofson. Carl, you're a database guy, you've been around that block for a while now. You want to talk about graph databases? Bring it on. Oh yeah, okay, thanks.
So I regard graph databases as basically the next truly revolutionary database management technology. I'm looking for the graph database market, which of course we haven't defined yet, so obviously I have a little leeway in what I'm about to say, to grow by about 600% over the next 10 years. Now, 10 years is a long time, but over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful; it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. A data element drops into a node, and the nodes are connected by edges; the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them, which add additional informative material that makes it richer. That's called a property graph. Okay, there are two principal use cases for graph databases. There are semantic graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it, and answer complicated questions. A lot of AI is aimed at semantic graphs. The other kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering: well, we have relational databases, isn't that good enough? Okay, so a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure, and the data drops into that structure.
There's a value, a foreign key value, that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable; it's not clear what you're looking at. In a graph database, the system is designed to handle change, so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, customer 360, fraud prevention. There's cybersecurity. Supply chain is a big one, actually. There's explainable AI, and this is going to become important too, because a lot of people are adopting AI, but they want a system after the fact to say: how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general; social network analysis, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management, recommendation, personalization, anti-money laundering, that's another big one. Identity and access management; network and IT operations is already becoming a key one, where you actually have mapped out your operation, whatever it is, your data center, and you can track what's going on as things happen there. Root cause analysis; fraud detection is a huge one. A number of major credit card companies use graph databases for fraud detection. Risk analysis, tracking and tracing, churn analysis, next-best action, what-if analysis, impact analysis, entity resolution. And I would add one other thing, or just a few other things, to this list: metadata management. So, Sanjeev, here you go. This is your engine, okay? Because I was in metadata management for quite a while in my past life.
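To make the nodes-edges-properties structure Carl describes concrete, here is a minimal property-graph sketch in Python. The class and the fraud-detection example are purely illustrative inventions, not any vendor's API; real graph databases add indexing, query languages, and traversal engines on top of this basic shape.

```python
# Minimal property-graph sketch: nodes and edges both carry
# free-form properties, as in the property graphs Carl describes.

class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> properties dict
        self.edges = []   # (src, dst, label, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, label, **props):
        self.edges.append((src, dst, label, props))

    def neighbors(self, node_id, label=None):
        """Follow outgoing edges, optionally filtered by edge label."""
        return [dst for (src, dst, lbl, _) in self.edges
                if src == node_id and (label is None or lbl == label)]

# Toy fraud-detection shape: two "different" customers share one device,
# a relationship that falls out of a simple traversal.
g = PropertyGraph()
g.add_node("cust:1", name="A. Smith")
g.add_node("cust:2", name="B. Jones")
g.add_node("device:9", fingerprint="abc123")
g.add_edge("cust:1", "device:9", "USED", when="2022-01-03")
g.add_edge("cust:2", "device:9", "USED", when="2022-01-04")

shared = g.neighbors("cust:1", "USED") == g.neighbors("cust:2", "USED")
print(shared)  # True -> the shared device is a fraud signal
```

The point of the sketch is that the relationship itself is data: no fixed foreign-key schema had to anticipate that two customers might be linked through a device.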
And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it. But graphs can, okay? Graphs can do things like say: this term in this context means this, but in that context it means that. Things like that. And, in fact, logistics management, supply chain. Also, because it handles recursive relationships, and by recursive relationships I mean objects that own other objects that are of the same type, you can do things like bill of materials, like parts explosion. You can do an HR analysis: who reports to whom, how many levels up the chain, and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it, you can't publish it in terms of its functionality, and it's really, really hard to maintain over time. So, Carl, thank you. I wonder if we could bring Brad in. I mean, Brad, I'm sitting there wondering, okay, is this incremental to the market? Is it disruptive and a replacement? What are your thoughts on this space? It's already disrupted the market. I mean, like Carl said, go to any bank and ask them: are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways, because it's like finding the best way to cross the seven bridges of Königsberg. It's always gonna kind of be tied to those use cases, because it's really special and it's really unique.
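Carl's point, that relational databases can handle a parts explosion but only if you program the recursion yourself, can be seen in a recursive SQL query. This toy bill of materials (sketched in Python with the standard-library SQLite driver; the table and part names are invented for illustration) shows the hand-written traversal a graph database would express as a simple edge walk.

```python
import sqlite3

# Parts explosion ("bill of materials") in a relational database:
# possible, but the recursion must be spelled out by hand in a
# recursive common table expression.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parts (part TEXT, parent TEXT)")
con.executemany("INSERT INTO parts VALUES (?, ?)", [
    ("wheel", "bicycle"), ("frame", "bicycle"),
    ("spoke", "wheel"), ("rim", "wheel"),
])

rows = con.execute("""
    WITH RECURSIVE explode(part) AS (
        SELECT part FROM parts WHERE parent = 'bicycle'
        UNION ALL
        SELECT p.part FROM parts p JOIN explode e ON p.parent = e.part
    )
    SELECT part FROM explode
""").fetchall()

print(sorted(r[0] for r in rows))  # ['frame', 'rim', 'spoke', 'wheel']
```

The traversal logic lives in the query, not in the database's model of the data, which is exactly the maintainability problem Carl describes.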
And because it's special and it's unique, it still, unfortunately, kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as the great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter, but technologically they don't know how to talk to one another. They're completely different. And you can't just stand up SQL and query them. You've got to learn, what is that, Carl, SPARQL? Yeah, thank you, to actually get to the data in there. And if you're gonna scale that graph database, especially a property graph, if you're gonna do something really complex, like try to understand all of the metadata in your organization, you might just end up with a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So I think it's already disrupted, but we need to treat it like a first-class citizen in the data, analytics, and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases but for everything, because I'm with Carl, I think it's absolutely revolutionary. So I'd also identify the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency, and that slows the system down. So that's still a problem to be solved. Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is gonna be the largest font, but what are your thoughts here? I want to step away; people don't associate me with only metadata, so I want to talk about something slightly different. DB-Engines.com has done an amazing job.
I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there were 381 databases on its ranked list. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So the Achilles' heel, and this is the problem, is that there are so many graph databases to choose from. They come in different shapes and forms, and to your point, there are so many query languages. In RDBMS, it's SQL, end of story. Here we've got Cypher, we've got Gremlin, we've got GQL, and then we've got proprietary languages. So I think there's a lot of disparity in this space. Well, excellent points, Sanjeev, I must say. And that is a problem. The languages need to be sorted and standardized, and people need to have a roadmap as to what they can do with it. Because, as you say, you can do so many things, and so many of those things are unrelated, that you sort of say, well, what do we use this for? I'm reminded of a saying I learned a bunch of years ago, when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. All right, guys, we gotta move on to David Menninger. We've heard about streaming; your prediction is in that realm, so please take it away. Sure. So I like to say that historical databases are gonna become a thing of the past, but I don't mean that they're gonna go away. That's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, will incorporate these streaming capabilities. We're gonna process data as it streams into an organization.
And then it's gonna roll off into historical databases. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're gonna be processing it, analyzing it, acting on it. We only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches, but we processed it in batches because that was the best we could do. And it wasn't bad, and we've continued to improve and improve and improve, but streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data, but it's not the default way in which we deal with data yet. And so that's my prediction, that this is going to change. We're going to have streaming data be the default way in which we deal with data, however you label it and whatever you call it. Maybe these databases and data platforms just evolve to be able to handle it, but we're gonna deal with data in a different way. And our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data. Another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. That doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today, and it has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. We want to know if an item is on the shelf at our local retail store, and whether we can go in and pick it up right now. That's the world we live in, and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction.
Historical databases become a thing of the past. Streaming data becomes the default way in which we operate with data. All right, thank you, David. Well, so what say you, Carl, a guy who's followed historical databases for a long time? Well, one thing, actually, every database is historical, because as soon as you put data in it, it's now history. It no longer reflects the present state of things. Even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this, Dave, because you know as well as I do that people still need to do their taxes. They still need to do accounting. They still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away, unless you want to go to jail. So you're going to have to deal with that. But as far as the leading-edge functionality goes, I'm totally with you on that. And I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested, rethinking the way applications work, saying that an application should respond instantly as soon as the state of things changes. What do you say about that? I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We are seeing more and more systems designed that way, but again, it's not the default. And I agree 100% with you that we do need historical databases; that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? Absolutely. I mean, let's take the data warehouse example, where you're using the data warehouse as context and the streaming data as the present, and you're saying, here's a sequence of things that's happening right now.
Have we seen that sequence before, and what does that pattern look like in past situations? And can we learn from that? So, Tony Baer, I wonder if you could comment. I mean, when you think about real-time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment looks like it's got great potential. What are your thoughts? Yeah, well, I think you hit it right on the head there. What I'm seeing, and basically I'm going to split this one down the middle, is that I don't see streaming as the default. What I see is streaming and transaction databases and analytic data, you know, data warehouses, data lakes, whatever, converging. And what allows us technically to converge is cloud-native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, and maybe doing some of that real-time predictive analytics, to take a look at: well, look, we're looking at this customer journey. What's happening with what the customer is doing right now, and is this correlated with what other customers are doing? So the thing is that in the cloud, you can basically partition this, and because of the speed of the infrastructure, you can bring these together and orchestrate them in a sort of loosely coupled manner.
The other part is that the use cases are demanding, and this is the part that goes back to what Dave is saying, is that when you look at customer 360, when you look at, let's say, smart utility grids, when you look at any type of operational problem, it has a real-time component and it has an historical component, and it has predictive components. So my sense here is that technically we can bring this together through the cloud, and I think the use case is that we can apply some real-time predictive analytics on these streams and feed this into the transactions, so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. Sanjeev, did you have a comment? Yeah, I was just going to say that, to Dave's point, we have to think of streaming very differently, because with the historical databases, we used to bring the data in, store the data, and then we used to run rules on top, aggregations and all. But in the case of streaming, the mindset changes, because the rules, the anomaly detection, the inference, all of that is fixed, but the data is constantly changing. So it's a completely reverse way of thinking about and building applications on top of that. So, Dave Menegar, there seemed to be some disagreement about the default or not. What kind of timeframe are you thinking about? Is this end of decade, it becomes the default? What would you pin it at? I think around, you know, between five to 10 years, I think this becomes the reality. I think it's quicker. It'll be more and more common between now and then, but then it becomes the default. And I also want, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data, because that's a whole other set of challenges.
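Sanjeev's reversal, fixed rules standing still while the data flows past them, can be sketched in a few lines. The sensor baselines, tolerance, and anomaly rule below are invented purely for illustration:

```python
# Historical context (the "warehouse" side): per-sensor baseline averages.
baseline = {"pump-1": 50.0, "pump-2": 75.0}

def standing_rule(event, history=baseline, tolerance=0.2):
    """The rule is fixed; the events flow past it. Flag any reading
    more than `tolerance` (20%) away from its historical baseline."""
    expected = history[event["sensor"]]
    return abs(event["value"] - expected) > tolerance * expected

# The "streaming" side: events arrive continuously and are evaluated
# against the standing rule as they pass, rather than stored and queried.
stream = [
    {"sensor": "pump-1", "value": 52.0},  # within 20% of 50 -> ok
    {"sensor": "pump-2", "value": 95.0},  # >20% above 75 -> anomaly
    {"sensor": "pump-1", "value": 30.0},  # >20% below 50 -> anomaly
]
alerts = [e for e in stream if standing_rule(e)]
print(len(alerts))  # 2
```

Note how this also captures the warehouse-as-context pattern from earlier in the discussion: the historical store supplies the baseline, and the stream supplies the present.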
We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low-latency, micro-batch, sub-second processing that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications. Right. [Crosstalk] So, near real time, maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction? Yeah, I'm an AI aficionado, so apologies in advance for that, but I think that we've been seeing automation at play within AI for some time now, and it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development, and it's helped them to actually make AI better, because in some ways it provides some swim lanes. For example, technologies like AutoML can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation, and that is that we've had the automation that started happening for practitioners. It's trying to move outside of the traditional bounds of things like, I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model, and it's expanding across that full lifecycle of building an AI outcome, starting at the very beginning with data.
And to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted, and stuff like that. And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery and it will automatically create a nice predictive algorithm for me, given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP and others, is that they're starting to actually use these same ideals, and a lot of deep learning, to basically stand up these out-of-the-box, flip-a-switch solutions, and you've got an AI outcome at the ready for business users. And I very much think that that's the way that it's gonna go. And what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're gonna see in 2022, and maybe into 2023, is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, for example, SAP is gonna roll out this quarter this thing called Adaptive Recommendation Services, which basically is a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need it to do in the line of business. So basically you're an SAP user, you wake up, turn on your software one day, you're a sales professional, let's say, and suddenly you have a recommendation for customer churn. Boom, it's going, that's great. Well, I don't know, I think that's terrifying.
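To give a feel for what a cold-start recommendation can mean under the hood, here is the simplest possible version: with no history for a new user, fall back to global popularity. This is a generic illustration, not a description of SAP's or Salesforce's actual methods, and the item names are made up:

```python
from collections import Counter

def cold_start_recommend(interactions, top_n=2):
    """Popularity-based 'cold start' recommender: with no history for a
    new user, recommend the globally most common items."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(top_n)]

# Hypothetical (user, item) interaction log.
interactions = [
    ("alice", "churn-report"), ("bob", "churn-report"),
    ("bob", "upsell-alert"), ("carol", "churn-report"),
    ("carol", "renewal-task"),
]
print(cold_start_recommend(interactions))  # ['churn-report', 'upsell-alert']
```

Production systems layer far more on top of this (deep learning, context, drift monitoring), which is exactly where the governance concerns that follow come in.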
In some ways, I think it is the future, that AI is gonna disappear like that, but I'm absolutely terrified of it, because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we at Omnia like to call responsible AI: how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera. That takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're gonna get is correct. And that's gonna take some work. And so I think we're gonna see this, ooh, let's roll this out, and suddenly there's gonna be a lot of problems, a lot of pushback that we're gonna see, and some of that's gonna come from GDPR and others that Sanjeev was mentioning earlier, and a lot of it's gonna come from internal CSR requirements within companies that are saying, hey, whoa, hold up, we can't do this all at once. Let's take the slow route, let's make AI automated in a smart way, and that's gonna take time. Yeah, so a couple of predictions there that I heard. I mean, AI essentially disappears, it becomes invisible, maybe, if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. People can say, oh, slow down, let's automate what we can; those attributes that you talked about are non-trivial to achieve. Is that why you're a bit of a skeptic? Yeah, I think that we don't have any sort of standards that companies can look to and understand.
And certainly within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand that when they flip that switch for an automated AI outcome, it's gonna do what they think it's gonna do. And so we need some sort of standard methodology and best practices that every company that's going to consume this invisible AI can make use of. And one of the things that has sort of started, that Google kicked off a few years back, that's picking up some momentum, and the companies I just mentioned are starting to use it, is this idea of model cards, where at least you have some transparency about what these things are doing. So for the SAP example, we know, for example, that it's a convolutional neural network with a long short-term memory model that it's using. We know that it only works on Roman English, and therefore me as a consumer can say, oh, well, I know that I need to do this internationally, so I should not just turn this on today. All right, thank you. Carl, can you add anything, any context here? Yeah, we've talked about some of the things Brad mentioned here at IDC in our Future of Intelligence group, regarding in particular the moral and legal implications of having a fully automated, AI-driven system, because we already know, and we've seen, that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for white people over black people, because in the past white people had been promoted more often than black people, but it had no context as to why, which is that black people were being historically discriminated against. The system doesn't know that. So you have to be aware of that.
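Brad's model-card idea can be made concrete with a small sketch. The fields below are illustrative, loosely in the spirit of Google's model-card proposal, and the gating function is a hypothetical consumer-side check, not any vendor's actual API:

```python
# A minimal, illustrative model card: a structured disclosure that a
# consumer can inspect (and gate on) before enabling an AI feature.
model_card = {
    "name": "churn-recommender",
    "architecture": "CNN + LSTM",           # as disclosed by the vendor
    "supported_languages": ["en"],          # Roman-alphabet English only
    "intended_use": "B2B sales churn recommendations",
    "known_limitations": ["not evaluated on non-English text"],
}

def safe_to_enable(card, deployment_languages):
    """Consumer-side gate: only flip the switch if every language we
    deploy in is one the card says the model supports."""
    return set(deployment_languages) <= set(card["supported_languages"])

print(safe_to_enable(model_card, ["en"]))        # True
print(safe_to_enable(model_card, ["en", "ja"]))  # False
```

The point of the card is exactly the scenario Brad describes: an international deployment can mechanically discover, before the switch is flipped, that the model is not fit for it.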
And I think that at the very least there should be controls when a decision has either a moral or legal implication, when you really need a human judgment. It could lay out the options for you, but a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems, to make sure that it doesn't introduce unintended biases. And to some extent it always will, so we'll always be chasing after that. That's a good one. Absolutely, Carl. Yeah, I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all the really fantastic, magical-looking super models we see, like GPT-3 and the 4 that's coming out, they're xenophobic and hateful, because the data that they're built upon, and the algorithms and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind. Yeah, the AI is biased because humans are biased. All right, great. Okay, let's move on. Doug Henschen, a lot of people have said that data lake, that term's not going to live on, but it appears to have some legs here. You want to talk about lake house? Bring it on. Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering, and that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents, and in 2021, SAP, Oracle and several of these fabric, virtualization, slash mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information, and it addresses both the BI and analytics needs and the data science needs.
The real promise there is simplicity and lower cost, but I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity, or is the data highly distributed? Multiple data warehouses, multiple data lakes, on-premises, cloud. If it's very distributed and you have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value for you. You know, also the fabric and virtualization vendors, the mesh idea, that's where, if you have this highly distributed situation, that might be a better path forward. The second question: if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform, and you have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse, relatively new to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, meet those tight SLAs. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really have to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. Thank you, Doug. Well, Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, doesn't there need to be some kind of semantic layer in between?
I don't know, weigh in on this topic, if you would. Oh, didn't we talk about data fabrics before? Common metadata layer. Actually, I'm almost tempted to say, let's declare victory and go home, in that this has actually been going on for a while. I actually agree with much of what Doug is saying there. I remember, as far back as, I think it was, like 2014, I was doing a study, you know, I was still at Ovum, the predecessor of Omnia, looking at all these specialized databases that were coming up and seeing that there was overlap at the edges. But yet there was still going to be a reason, at the time, that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we now consider the data lake. Fast forward. And the thing is, what I was saying at the time is that you were seeing basically a sort of blending at the edges. I was saying that like about five or six years ago, and the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, the old argument: do we centralize this all in a single place, or do we virtualize? And I think it's always going to be a union. There's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. What do you need for your performance characteristics? Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute your processing as far as possible, to essentially do a kind of brute-force approach? All these approaches are valid based on the use case.
I just see that essentially the lake house is the culmination of, well, it's just a relatively new term introduced by Databricks a couple of years ago. This is the culmination of what's been a long-time trend. And what we see in the cloud is that for data warehouses it's becoming a checkbox item to say, hey, we can basically source data in cloud storage, in S3, Azure Blob storage or whatever, as long as it's in certain formats, like Parquet or CSV or something like that. I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already a reality, and in some cases maybe new terminology, but not a whole heck of a lot new under the sun. Yeah, and Dave Menegar, I mean, a lot of this, thank you, Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term. We talked about data mesh washing. What are your thoughts on this? Yeah, so I used the term data platform earlier, and part of the reason I use that term is that it's more vendor-neutral. We've tried to sort of stay out of the vendor terminology wars, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick, and we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it, so they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge.
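Dave's point about lakes feeding warehouses (and vice versa) is easy to picture in miniature. In the sketch below, an in-memory CSV stands in for a file in lake object storage and SQLite stands in for the warehouse; the file contents and table names are invented for illustration:

```python
import csv
import io
import sqlite3

# A CSV in object storage stands in for the "lake" side...
lake_file = io.StringIO("order_id,amount\n1,19.99\n2,5.00\n3,12.50\n")
rows = [(int(r["order_id"]), float(r["amount"]))
        for r in csv.DictReader(lake_file)]

# ...and a SQL table stands in for the "warehouse" it feeds.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
wh.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Warehouse-style aggregate over lake-sourced data.
total = wh.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))  # 37.49
```

A converged lake house platform essentially collapses the copy step: the SQL engine queries the open-format files in place rather than requiring the load into a separate store.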
I like to talk about it this way: you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff into a server and ignore it, right? That's not what a data lake is. So you've got data lake people over here, and you've got database people over here, data warehouse people over here. Database vendors are adding data lake capabilities, and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, like Tony says, I think we should declare victory and go home. And so, as a follow-up on that: are you saying the specialized lake and the specialized warehouse go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live as just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? Well, number one, as I was saying before, there's always going to be this sort of centrifugal force, or tug of war, between do we centralize the data or do we virtualize. And the fact is, I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. The difference is that we might use the same physical data store, but everybody's logically, basically, governing it differently, you know. Data mesh is basically, it's not a technology, it's a process, it's a governance process.
So essentially, as I was saying before, I basically see that this is the culmination of a long-time trend where we're seeing a lot of blurring. But there are going to be cases where, for instance, if I need, let's say, high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake, where I'm basically doing a system where I'm just really brute-forcing very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements, that we need to essentially have the abilities of a data lake and a data warehouse. These need to come together. So I see, I think, what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric vendors, the fabric virtualization vendors, they're all on board with the idea of this converged platform. And they're saying, hey, we'll handle all the edge cases, the stuff that isn't in that center of data gravity but is off distributed in a cloud or at a remote location. So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. Bingo. Or as Dave basically said, people are happy when they virtualize data. I think yes, at this point, but to Dave Menegar's point, they have converged, they are converging. Snowflake has introduced support for unstructured data, so now we're literally splitting hairs here. Now what Databricks is saying is that, uh-huh, but it's easier to go from data lake to data warehouse than it is from data warehouse to data lake.
So I think we're getting into semantics, but we've already seen these two converge. So take somebody like AWS, who's got, what, 15 data stores. Are they gonna have 15 converged data stores? That's interesting to watch. All right guys, I'm gonna go down the list. I'm gonna do like one word each, and you guys, each of the analysts, if you wouldn't mind, just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is gonna be, maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, which we really didn't talk much about, security, but what's the one word in your prediction that you would leave us with on governance? It's gonna be mainstream. Mainstream, okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check. You want to add to that? Reality check, because I hope that no vendor jumps the shark and calls their offering a data mesh product. Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high-growth metrics. I know it's early days, but magic is what I took away from that. It's the magic database. Yeah, actually, I've said this to people too: I kind of look at it as a Swiss army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when a document database is a better choice. It can handle those things, but it may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. Thank you. And Dave Menegar, thank you for, by the way, bringing the data in.
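Carl's Swiss-army-knife point is really about relationship traversal: variable-depth queries that take recursive joins in a fixed relational schema fall out naturally from a graph structure. A toy in-memory version (the graph, names, and helper below are made up for illustration, not any graph database's API) might look like:

```python
from collections import deque

# Tiny graph stand-in: node -> set of connected nodes ("knows" edges).
knows = {
    "ann": {"bob", "carl"},
    "bob": {"dana"},
    "carl": {"dana"},
    "dana": {"erin"},
    "erin": set(),
}

def within_hops(graph, start, max_hops):
    """Everyone reachable from `start` in at most `max_hops` edges:
    the kind of variable-depth query graphs handle naturally."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:  # breadth-first traversal, bounded by hop count
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(start)
    return seen

print(sorted(within_hops(knows, "ann", 2)))  # ['bob', 'carl', 'dana']
```

Expressing "friends of friends, to depth N" this way is one line of traversal logic; in a relational schema it typically means a self-join per hop or a recursive CTE, which is why the graph model wins for these use cases.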
I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? Yeah, I would say think fast, right? That's the world we live in. You've got to think fast. Fast, love it. And Brad Shemin, love it. I mean, on the one hand, I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. Yeah, I would say, going with Dave, think fast and also think slow, to reference the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and of transparent, invisible AI across the enterprise. But verify. Verify before you do anything. And then Doug Henschen, I mean, look, I think the trend is your friend here on this prediction, with lake houses really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts, we'll give you the last word. Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is gonna be there. We've already seen that with, you know, Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's for a lake or a warehouse. And the second point, I think ESG mandates are gonna come in alongside GDPR and things like that to up the ante for good governance. Yeah, thank you for calling that out. Okay, folks, hey, that's all the time that we have here.
Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I wanna thank you for your contributions. Really appreciate your time. Enjoyed it. Thank you. Now, in addition to this video, we're gonna be making available transcripts of the discussion. We're gonna do clips of this as well, and we're gonna put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I wanna thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante. Be well, and we'll see you next time.