 from the campus of MIT in Cambridge, Massachusetts. It's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now, here's your host, Stu Miniman.

Welcome back to theCUBE, SiliconANGLE Media's flagship program. We go out to all the shows and help extract the signal from the noise. This is our fourth year at MIT CDOIQ, which is the Chief Data Officer and Information Quality Symposium. In the time that we've been covering this event, it's really been the mash-up of a lot of trends we've been seeing: data itself, analytics, big data. Happy to have on the program, first, George Gilbert, who's part of the analyst team at Wikibon, and, a first-time guest on the program, Steve Todd, who's an EMC Fellow, of course with EMC, and somebody I've known for years. Steve, welcome to the program.

Thanks, Stu, reunited and it feels so good.

Yeah, absolutely. So, Steve, for our audience that doesn't know you, just give us a little bit about your background and what your role is at EMC.

Sure. Well, actually, last month, Stu, was my 30-year anniversary with EMC. That includes 14 years with Data General, where I was one of the founders of the CLARiiON storage subsystem. So, 25 years as a software engineer working on building EMC's products, and the last five years in the CTO office, working on innovation, strategy, and research objectives.

All right, so help us connect the dots, going from storage to, now we're talking about data and data governance, and all of these pieces. How does that connect to what you're doing today?

So, I would say when you look at storage, the trend is to just continue to move up the stack, right?
We moved from storage systems towards switches, and then we moved up into the server, and we moved up into consumption models, and now we're seeing that these trends around the volume of data are driving these new digital business models. It's hard to get a handle on what exactly the value of data is, which is why EMC, as one of the largest vendors storing data, would love to figure this problem out.

All right, so George, let's pull you into the conversation here. You know, you've covered analytics for many years, and that was before we called it big data on this side. What about data itself? Where is the value in data? How does that fit into your research?

Well, it's actually interesting, because it's very hard to draw a line between data and analytics. What we've learned over the last few years, what's seeped into the mainstream, is that data by itself has value, and analytics by itself, in terms of the algorithm, has value, but the magic comes when you fuse the two. When you have a natural sort of boundary between two categories of value, no one really makes money at the intersection. It's only when it's really hard to put the two together, like with the Weather Channel, not the channel, but The Weather Company that IBM bought. That's a huge amount of data and really sophisticated models, and putting them together, no one else could do it.

All right, so Steve, you inside EMC have been looking at this data value piece. Talk a little bit about what spurred that initiative and what the initiative is.

Sure. Two years ago, EMC joined together with Capgemini and we did a survey on big data. One of the surprising results that came out of that is that two thirds of the respondents said they believe that within a few years, the revenues they'll get from data will equal or exceed the revenues that they get from selling their own products. So we looked at that and said, well, is that true?
And if it is, how can we help our customers and help the industry cross over to that type of model, where they're actually getting revenue off of data in addition to the revenue they're currently getting from their products? So that's why we reached out to one of the experts in academia, Dr. Jim Short, from the San Diego Supercomputer Center. We joined with him about a year and a half ago to validate that statement and then to plan how we would reach that result.

Yeah, we had a conversation earlier today with SAP, who talked about moving data from just being part of the application to really getting value out of data, and how the business owners do it. I mean, George, when you look at the application stack, how's that migration going? How's that change happening around data ownership and data value?

Well, it's a really good question, because for essentially 40 years, maybe more, ever since we started building apps, whether by hand or in packages, you did this thing called schema design. Think of it as a bucket brigade: you designed exactly where the data was going to go for an application, and then you put these forms across it, which drove the workflow or the business processes. Things couldn't have changed more in an era of big data, where essentially you put the data in this repository without any upfront design, or with very minimal design. And, as we were talking about before, when you fuse the data with the analytics, out of that fusion sort of bubbles up the application. It's the data that informs how the application works, and since that data is always changing (you're adding new stuff, not just variations on what you have, but new sources), the application is always evolving.

Yeah, so Steve, I know you've seen, both at Wikibon and through the video coverage we've been doing with theCUBE, that this is a topic that's kind of interesting and crossing a number of domains.
So we cover both the infrastructure, kind of cloud, and the analytics, and as you wrote in your first book about some of those Venn diagram innovations, where you cross some of the boundaries and reuse some of the existing pieces, you get some interesting ideas. So we've had some conversations. What have you seen in what we're doing, what we're covering, what we're kind of unearthing in the industry?

So one of the things I've noticed from the Wikibon coverage is that you interviewed Stephen Manley from our data protection division and asked him the question about data value. As the industry comes to understand the true value of its data, and as that value goes up, the need for protection goes up. So that's one of the first insights that we've seen: helping our customers or helping the industry categorize and understand the value, in terms of, let's keep a metadata repository that tracks value, watch that value move up and down, and make placement decisions or protection decisions based on that value changing.

Yeah, so what does that mean as data becomes more dispersed? It used to be, I built a data temple. Now data's everywhere, and things like IoT and public cloud are going to spread things even further out there. What does that mean? How does EMC look at that holistically in that space?

So in order to track value holistically, again, you have to keep a metadata repository that tracks that value, and the only way to scale that globally is to use the same technology that you use to globally scale data, which is object-based approaches. So as we think about attaching valuation metadata to content, we think about associating that directly with the content and using the same techniques to scale that valuation metadata that you do with the content itself.
Okay, so are you saying there needs to be some tie with the infrastructure? Because that almost goes a little bit counter to many of the things that I've seen, that we should be able to kind of abstract away from the infrastructure.

So I think if you abstract away from the infrastructure, then you're keeping your metadata about valuation separate from your content, and the orchestration of that becomes more complicated. So it's possible, but by understanding a little bit about the underlying technology, you can actually leverage some of the benefits of it.

Yeah, George, I'm curious about your thoughts. Metadata is something that, when I joined Wikibon six years ago, David Floyer had been talking about for years. It's one of those gnarly challenges that we've had in the industry. How do you see that discussion evolving?

Well, so far, and I've spent more of my time up on the analytics and apps layer, but from what I've seen, there are two broad choices. You have a data value chain that starts all the way back in the operational apps, then the data moves over to your sort of analytic repository, increasingly the data lake, but also the data warehouse, and then you operationalize those insights. And there are not a lot of metadata solutions that we've come across that take you through that data life cycle from birth to death. At the same time, if you have metadata solutions that do a good job of sort of slicing that up, then stitching them together is very difficult. And then one other wrinkle, which is that the buying power for how to consume data in business intelligence and analytics is shifting into the line of business. IT and the line of business have yet to come to terms with how IT provides guardrails so that the line of business doesn't take the data and mangle it in a way that no one knows where it came from.

Right. Can you comment on that, Steve?
So one of the things I'm learning at this conference is about chief data officers, who I thought before today don't care too much about the infrastructure. I just went to a session with a CDO and a CIO sitting side by side, collaborating very closely on new types of data platforms that can support things like valuation. So I think the abstraction away from the infrastructure is happening in the industry. Maybe we over-rotated a little bit, and there's reason to bring it back and have the CDO and the CIO work more closely together.

Yeah, I know we've seen over the last few years on this that they need to work together. I think many said the CDO probably shouldn't report to the CIO, but they also said, as George said, that for the guardrails and some of the operational pieces, the CIO is going to be the one involved with the technology and the tools for that. So does that seem a decent kind of separation of power?

Well, I think that the separation of power that you specified is accurate. There's also a separation of power with the CISO, right? So it's the three working together. The CDO doesn't want to get too much into the infrastructure, doesn't want to get too much into the security and privacy, but needs to make sure that all three are working together.

Okay, so Steve, you're going to be coming back to speak with us tomorrow. For people that want to find out more about the data value research that you have and that aren't here at the show, of course they can watch the other segments we're doing, but where else can they go to find some more information on this?

Well, certainly as we learn more from the research with Dr. Short in San Diego, we're publishing everything we find on my own personal blog and other EMC-related blogs. But in a nutshell, Dr. Short is focusing on what some of the new processes and new roles are that organizations are going to need to roll out in order to value data appropriately, while I'm looking at what the impact is on the IT architectures.
And again, all of this is rolled out on my blog or on my Twitter handle.

Okay, great. And we've also got a website where we're going to help pull this information together. If you go to thecube365.com slash MIT CDOIQ, we have a collection of the videos we're doing here, and a link to some of the stuff that Steve's talking about. And we'll be back here with lots more coverage from MIT CDOIQ 2016. You've been watching theCUBE.