From Cambridge, Massachusetts, it's The Cube, covering MIT Chief Data Officer and Information Quality Symposium 2019, brought to you by SiliconANGLE Media.
Welcome back to MIT CDOIQ, everybody. You're watching The Cube. We go out to the events. We extract the signal from the noise. This is day one of the conference, the Chief Data Officer event. I'm Dave Vellante with my co-host Paul Gillin. Stewart Bond is here. He's a research director at International Data Corporation, IDC. Stewart, welcome to The Cube. Thanks for coming on.
Thank you for having me.
You're very welcome. So your space, data intelligence, tell us about your swim lane.
Sure. So my role at IDC is I follow the data integration and data intelligence software market. I follow all the different vendors in this market. I look at what kinds of solutions they're bringing to market and what kinds of problems they're solving, both business and technical, for their clients. And so I can then report on the trends and market sizes and forecasts and such. Part of what I cover is data integration, which is more the traditional ETL, change data capture, data movement, and data virtualization types of technologies. The other part is what we just call data integrity, or what I'm now calling data intelligence, which is all of the metadata about the data. It's the data catalogs, it's the metadata management, it's the data lineage, it's the data quality, data profiling, master data intelligence. It's all of the data about the data, and really answering what I call the five W's and H of data. It's the who, what, where, when, why, and how of data. So that's the market that I'm covering and following, and it's why I'm here.
Were you here this morning for Mark Ramsey's talk?
Yes, I was.
So he kind of went through your space, and he started with EDW, kind of threw ETL under the bus a little bit.
MDM, and then the enterprise data model. He said all that failed, but that stuff's not going away. And I'm sure Glaxo's still using all that tooling today. So what was your reaction to that? Were you nodding your head, saying, yeah, it's true? Or saying, well, maybe there's a little...
Well, haven't we been saying the mainframe's going to go away for years, and it's still around? So I think that obviously those technologies are still out there and they're still being used. You can look at any of the major ETL vendors, and there are new ones coming into the market, so it's still alive and well. There's no doubt that it's out there, and it's the biggest segment of the market that I follow. So there's no...
And there's open source tooling, right?
Yeah, there's no doubt that it's still there. But Mark's vision of where things are heading has data intelligence really being at the core. We talked about those spiders, we talked about that central repository of knowledge about the data. That's where things are heading, whether you call it a data hub or a data platform. It's not really one big huge data platform or one big huge data repository, but one place where you can go to get the information about the data. You can find out where the data is, and you can find out what it means, both the business context and the technical information. You can find out who's using that data, when it's being used, why it's being used, why we even have it, and how it should be used so it's being used appropriately.
So you would say that his vision, actually what he implemented, was visionary. They skated to the puck, so to speak.
Yeah, yeah. And we're going to see more of that. We are seeing more of that. That's why we've seen such a jump in the number of vendors that are providing data cataloging solutions.
IDC has this work product we call a market glance. I did that at the beginning of 2018. I just did it again in the middle of this year. And the number of vendors that offer data catalog solutions has significantly increased, like a 240% increase. Now, it's off of a small base. These are not exhaustive studies. It may be that I didn't know about all those data catalog vendors a year and a half ago, but it may also be that people are now saying, well, we've got a data catalog. You've really got to peel back the layers a little bit and understand what these different data catalogs are and what they're doing, because not all of them are created equal.
Well, they hit your radar. If you don't know about it, 99% of the world doesn't know about it.
Mark talked this morning about some interesting new technologies they were using: the spidering to find the data, bots to classify the data, tools to wrangle the data. I mean, there's a lot of new technology being applied to this area. Which of those technologies do you think has the greatest promise right now? And how automated can this process become?
It's the spidering and it's the cataloging of the data. It's understanding what you've got out there. That is growing like crazy. We just started to track that and it's growing a lot. That has the most promise. And as I said, I think that's going to be the data platform of the future: the intelligence, knowing where your data is so you can then go and get it. It's not a matter of all the data being in one place anymore. Data's everywhere. Data's in hybrid cloud. It's on-premises. It's in private cloud. It's hosted. It's everywhere. I just did a survey, got the results back in June 2019, just a month ago, and the data is all over the place. And so really having that knowledge, having that intelligence about where your data is, that has the most promise.
As far as the automation is concerned, the next step is not just about collecting the information about where your data is. It's actually applying the analytics, the machine learning, and the artificial intelligence to that metadata collection that you've got, so that you can then start to create those bots, to create those pipelines, to start to automate those tasks. We're starting to see some vendors move in that direction. There's a lot of promise there.
You guys, at least from what I remember of IDC, the software group had this pretty robust taxonomy. I'm sure it's evolved over the years. So how do you define your space? I'm interested in how big that space is, in terms of market size. Is it growing, and where do you see it going?
Right. So my coverage of data integration and data intelligence is fairly small. It's a small little market at IDC. I'm part of a larger team that looks at data management and analytics and information management. So we've got people on our team like Dan Vesset, who covers the analytics, the advanced analytics. Chandana Gopal. Carl Olofson, he's been on theCUBE, he covers database management technologies. Those, I apologize, I don't have that number off the top of my head.
Okay, I know, but your space, really.
My space.
Yeah, because that software market is so fragmented, and what IDC has always done well is you put people on those fragments and you go deep in there. And somehow you've been able to not make your eyes bleed when you do that, and it's a challenging job.
So the data integration...
But you put it all together and it's important.
Yeah, the data integration market's about six and a half billion dollars.
Substantial size.
Yeah.
But again, a lot of vendors.
A lot of vendors. A growing number of vendors.
And the market's growing?
The market continues to grow. As the data's becoming more distributed, more dispersed, there's a need to continue to integrate that data.
There's also that growing need for that data intelligence. We've had a lot of inquiries lately about data being fed into machine learning and artificial intelligence, and people realizing, our data isn't clean. We have to clean up our data because it's garbage in, garbage out. It's probably more important now than ever before, because you don't have someone saying, I don't think that data's right. You've got machines that are looking at that data instead. The problem with data quality is not a new problem. It's the same problem we've had for years. A lot of the technology is there to clean that data up. And that's a part of what I saw. I looked at the data quality vendors. Experian is here. Syncsort, you know, and all of the other data quality capabilities that you get from Informatica or from Pentaho or from Qlik through Podium, that all is there. And so that part is growing, and there's a lot more interest in that data quality and that data intelligence side again. So the right data can be used. Good data can be used. The trust in that data can be increased. And it's used for the right reasons as well. And that's adding that context and understanding. That's semantic. Having all that metadata that goes around that data so that it can be used most importantly.
It's one of those markets that's maybe relatively small, it's not 100 billion, but it enables a lot of larger markets. So, okay, it's six, six and a half billion and it's growing. Is it growing single digits, double digits?
It's hovering around the double digits.
Okay, so it's pushing 10%. And then who are the big players? Who's driving the share? Is there a dominant player, or are there a bunch of...?
So Informatica's number one in the market, followed by IBM. And SAP's right up there, SAS is there. Talend is making nice headway in size, yeah.
But there's a number of different players. There's a lot of different players in that market.
And the leading market share player has what? 10%, 15%, 50%? Is it dominant?
Share, I don't know what to put in the spot there. That's tough to say. Informatica's big.
It's over a billion.
It's over a billion, yeah.
Right, so they've got maybe a sixth of the market. Okay, but it's not like Cisco having two thirds of the networking market or anything like that. And what about the cloud guys? Are they participating in this, and how do you guys deal with that?
The cloud guys, yeah. So there are some pure cloud solutions. There's Reltio, for example, pure cloud MDM, master data management. I'd say there's less pure cloud than there used to be. But someone like an Informatica is really pushing that cloud presence and that cloud capability.
So running this tooling in the cloud, but the cloud guys directly are not competing at this point yet? Is Amazon?
No, so Amazon, Google, yes. Those cloud guys, yes, they are. Google announced Dataflow, or sorry, Data Fusion, back at Google Cloud.
Yeah, that's right, yep.
And so they've got an ETL tool in the cloud now. Amazon has Glue, which is both a catalog and an ETL tool. Microsoft, of course, has Data Factory in Azure.
So those guys are coming on.
They're coming on.
I'm guessing if you talk to Informatica, they'd say, well, they're not as robust as we are, and we've got a big install base, and we go multi-cloud. Is that kind of the posturing of the incumbents?
Yeah, that's the posturing.
And I don't mean it as a pejorative. If I were those guys, I'd be doing the same thing.
But we were talking earlier about how the cloud guys essentially killed Hadoop, right? Do you see the same thing happening here, or will the tool vendors be able to stay ahead in your view?
Depends on how they execute.
And if they're there and they're available in the cloud along with those cloud providers, and they're able to provide solutions in the same way, with the same elasticity, the same type of consumption-based pricing models that the cloud vendors are offering, they can compete with that. They still have better solutions than what...
And multi-cloud and hybrid is a big part of their value prop, which the cloud guys aren't really going hard after. I mean, they're sort of dangling their toe in the water. Some of them are, anyway.
And some of the cloud guys have the hybrid capabilities, because some of what they've built comes from on-premises worlds as well. So they've got that ability to...
Microsoft in particular, right?
Microsoft and Google. Google, the Data Fusion came out of the...
You're saying it's part of the Anthos initiative or?
I apologize if the Google folks are watching, but...
I know. It's a soup of acronyms in our business.
It is.
What tools or technology have you seen for governance of unstructured data that look promising?
So I don't really cover the unstructured data space that much. What I can say is, just as in the structured data world, it's about the metadata. It's about having the proper tags about that unstructured data. It's about getting the information out of that unstructured data so that it can then be governed appropriately. It's making structure out of that. I can't really say more because I don't cover that market explicitly. But I think, again, it comes back to the same type of data intelligence. Having that intelligence about that data by understanding what's in there.
What advice are you giving to the buyers in your community and the sellers in your community?
So the buyers within the market talk a lot about the need for that data intelligence. Data governance to me is not a technology. You can't go out and buy data governance. Data governance is an organizational discipline. Technology is a part of that.
To me, the data intelligence technology is a part of that. So really, if organizations want to get a good handle on what data they have and how to be enabled by that data, they need to have that data intelligence. They need to go out and look for solutions that can help them pull that data intelligence out. But the other part of that is measurement. It's critical to measure, because you can't improve what you're not measuring. So that type of approach is critical. And you've got to have people in the organization. You've got to have cooperation and collaboration across the business, IT, and the Chief Data Officer's office. You've got to have accountability in order for that to really be successful. For the vendors in this space, hybrid is the new reality. My survey data shows clearly that hybrid is where things are. It's not just cloud. It's not just on-premises. It's hybrid. That's where the future is. They've got to have solutions that can work in that environment, that can work in that hybrid cloud ability. They've got to have solutions that can be purchased and used, again, in the same sort of elastic method that consumers are able to get services from other vendors in that same type of...
Great. All right, so we've got to run. Thank you so much for sharing your insights and your data. I know I was firing a lot of questions at you. You did pretty well, not having the report in front of you. I know what that's like. So thank you for sharing. And good luck with your challenges in the future. You've got a lot of data to collect and a lot of fast-moving markets. So come back anytime and share your update with us.
I've got space for me right now.
Yeah, right. Okay, and thank you for watching. Paul and I will be back with our next guest right after this short break from MIT CDOIQ. Right back.