Hi everybody, we're back. This is Dave Vellante. I'm with Wikibon.org. I'm with my co-host, Paul Gillin. And we're here at MIT today and yesterday. We're focused on the topic of information quality. theCUBE is very excited to be here. This is theCUBE, by the way. We do a lot of big data events at SiliconANGLE Wikibon, and not enough attention really is paid to the notion of information quality, data quality, data governance. It's a topic that's been kind of pushed down within the organization over the years and really focused, say for example, in the financial services area, driven by the CFO. Certainly it touches all the lines of business, but a lot of lines of business just haven't paid attention. Now with this big data theme, everybody's paying attention to data. You hear about data-driven organizations and data-centric organizations, and part of that emergence brings this new light to the role of the chief data officer. Mark Johnson is here. He's the CEO of Gavros, and Gavros is a company that helps organizations get their data governance act together, really consulting around that whole CDO initiative. Mark, welcome to theCUBE. Thank you, it's good to be here. Yeah, good to see you. So hopefully I described your organization correctly, but I'm sure you could do a better job than I. So tell us a little bit about it. Well, Gavros is a 31-year-old company. We started in Johannesburg, South Africa, and we've been around for 31 years exclusively focused on data. And so data quality, accessibility of information, the accuracy of data within operational systems, the accuracy of data within analytical reporting systems for executive decision making has really been where we've focused.
And of course data warehousing sort of became a trend, Dave, back in about 1985 or so, 1986. It began to appear, the aggregation of all this information from across the enterprise into a single database so that you could use it for analytics and reporting. That was a big expansion in the credibility and the importance of information and this whole idea of information technology. But not until today, probably the last three years, with the advent of big data, all the social, mobile, unstructured digital content. And cheap infrastructure in the cloud. Cheap infrastructure, right. Not until now have we had this kind of confluence of trends coming together in the organization around the value of data and information. And so now this is the best time that I've ever seen to be focused on data. And we're excited about helping our clients get their data governance and data management act together. Well, and you're talking about the DW and BI space. Yes, yep. And I remember actually, I remember when Teradata came out. Yes, yeah, back in the early 90s, yeah. Actually, even before that, when they came out as a startup. Yep. And people were like, okay, this is interesting. And the promise of decision support and data warehousing was alluring. But in many ways it didn't live up to that expectation, right? And then sort of the Enron disaster and Sarbanes-Oxley gave a boost to that whole space. And now of course we have this big data meme and everybody's excited again. Right, right. And we're going to solve a lot of the world's problems. And by the way, I think that's another reason why the data governance issue ended up kind of in the boring but important category. What gives you confidence that this new wave is actually going to live up to the promises and then obviously elevate that whole CDO role? Well, I think we've had a number of big events happen.
The first big event that really threw a spotlight on data was the implosion of the investment banking industry and the crash on Wall Street in 2008 and 2009. As we look back at what happened there, folks who have done a lot of analysis around this issue have identified the fact that data, the free flow of information, the accuracy and integrity of that data played a big role, from a regulatory perspective, in not being able to see and understand. Yes, right. In not being able to see and understand that the system was under stress. And so I think that has cast a huge spotlight on some of the challenges and issues that organizations face in the absence of managing high quality information and making that information shareable. The second thing is the Baby Boom population. The Baby Boom population has, you know, like a pig in a python, moved through the python to the point where now we're about to embark on our journey into our retirement years and into the Medicare and Medicaid system. So from a healthcare perspective, we're recognizing that patient information, and the free flow and the accuracy of patient information at a volume and scale that we've never before seen, is going to create a very, very important need to get our arms around information, information quality and its free flow. The third sort of megatrend that's come together is this whole digitization of everything. What we call it today is the internet of things, and I think you probably have talked with some folks about that. But the internet of things is creating just scads and scads and scads, large volumes of streaming data content from all kinds of devices. For example, Progressive Insurance has got their Snapshot device now, which rides along with you in your car, and its accelerometer tells Progressive how fast you're going, where you're stopping, and the geo sensor on it tells them where you're driving within a neighborhood.
And so this is helpful for insurance companies like Nationwide and Progressive Insurance in getting a sense of how you're driving and what the implications of that are going to be for your claims behavior. And that will allow you to get cheaper insurance. Yeah, and Paul was actually saying he didn't want that device in his car. Is that what Paul said? There's gonna be a hell of an incentive. No, the way that Paul drives, I suspect that that would be a problem. To your point about the pig and the python, the data now isn't in the pig, it's everywhere. It's distributed by its very nature. So how does that change the role of data governance? Well, I think that we're seeing the advent of this role of the chief data officer. So I'm sure that you're very familiar with the CIO role. It's been around for a long time. The I in that role means information. But really, if you look at what the CIO has been focused on, the last thing has been information. The focus has been on technology, on software and applications, on infrastructure and data centers, on software as a service versus software that we write from a custom perspective, on data centers in the cloud, but not on information. And so for the role of the CIO, a better moniker would actually be chief technology officer, but that's already used, and the title as it stands doesn't reflect the focus. But the role of the chief data officer now, reporting to the executive leadership team within organizations outside of IT, is really beginning to give rise to the notion that information is an asset. We need to manage that asset. We need to leverage that asset and make it available within our organizations.
And so I think that is an indication. We're seeing leading organizations, at a conference like this, bring data leaders from around the world whose job it is, and whose compensation is based upon, how well they manage the organization's data, how well they ensure the accuracy of that data, and how effective that data is in making decisions in the business to drive impact to the bottom line. That is a very big foreteller of things to come. Well, the parlance is sometimes confusing. Think of information management, information asset and liability management. Yes. Many CIOs do see that as their potential role, but I think you're right. A lot of the focus has been on software, but there's some tension, right? A lot of people think the data czar should work within the IT organization, and virtually everybody we've talked to here said, no, that's not the right model. Yeah, that absolutely isn't the right model. Look, over the years I've worked with a number of CIOs, from Silicon Valley where I reside now, across to the east coast, and over in Europe and in other parts of the world. And I can tell you that the general theme, the general focus and the general incentive of the CIO is to keep the systems of the company up and running so transactions can be booked, and to keep the data centers secure with fault tolerance and reliability, so if we take a hit here, our business can continue. It's an availability KPI. That's right. And they've been focused, and rightly so, on the ERP systems and on the CRM systems and all these capabilities that are necessary to run the business, but they haven't been focused on the information content. IT is the successor to what we earlier called data processing. That's right. Which is actually a more appropriate term. It's a more appropriate term and yet we've abandoned it.
But data processing recognized that data was the center and the technology was there to automate the processing of information, which, before data processing came along, was a manual effort within organizations. You could argue the whole CIO concept was driven more by ego than it was by actual responsibility. That's right. Anyway, I do want to change the tone a little bit though. We've seen in the last 10 years that we've gone through this complete inversion, where 10 years ago our problem was we didn't have enough data, and now our problem is we have way too much data. And you see that with surveys and research reports, which are flooding the market every day. It seems like you can always find a way to make the data say what you want it to say. So I guess the question that leads to is, is there a risk in processing all this data? Is the job of the CDO as much to decide what data to ignore as it is to make sense of the data that we have? No, I don't think that's the job of the CDO. I think the job of the CDO is to ensure that the organization's decision makers and knowledge workers have access to high quality information, where needed, when needed, and in the form needed. Now interestingly enough, David, you mentioned Teradata back in the early days. I was working for a telecommunications company back in Denver when Teradata was first a company, when they first came out, and we were an early adopter of their technology. And at that time we were able to aggregate all of the call detail records and all of the subscriber records from that particular company's customer base and make that data available to the business to make decisions. It opened up a whole brand new world of capability that led to the analytics required to have call waiting, voice messaging, three-way calling, all those products that came along. The innovation, the understanding of the customer needs and behaviors came out of analytics of that data.
Now when we first put it together, it was too much data. All we could do was look at bill patterns and what have you. But once we began to explore that data, as it was available, then use cases began to emerge for different kinds of analyses to support all kinds of new product innovation and efficiencies driven down and throughout the business. So I think in this advent of big data and this volumetric explosion, as we call it, of unstructured content, everything from digital images to emails and blogs and web logs and all kinds of Facebook and Twitter feeds and posts, that's not too much data. It's really creating much richer terra firma, if I can call it that, for data mining to take place. And we were talking earlier in the conference at one of the sessions about whether we should have policies to archive and compress the data content because of this concern that maybe the volume and the cost are going to become unsustainable. But if you look at history over time, if you look at the trend line, we've seen that as technology has driven down the cost per gigabyte of data stored, the volume of data that we've stored has gone up. And I would challenge you to tell me an organization that actually has purged, deleted, done away with any content historically generated that was once captured in the business. Google says it's cheaper to keep it. Google says it's too expensive. That's exactly right. It's too expensive. When you delete a Gmail, it doesn't get deleted at Google. That's right. It's too much. It's the labor cost. I wanted to build on that a little bit. Two of the great companies that came out of the internet, Google and Amazon, I think both excel at data mining. And both of them are very good at delivering a delightful experience. Google seems to know what you want. Amazon seems to know what you want to buy next, right? And this is a result of excellent analytics. So the question is, is this the next competitive playground?
I mean, is this going to be the basis for competition going forward, excellence of analytics? And if I could add to that, because you were talking about the telecommunications example, and Walmart putting beer next to diapers, but it hasn't really been transformational for organizations. Oh, I would argue. Actually, I would disagree with you about that. Well, we'll come back and talk about that. So let's do that. Because I would like to have that discussion. We're gonna get a little challenging here. I'd like to get you to do that. Okay. But so, okay, is this the next? Are these the competitive stakes now? Is this the actual game? No, it is the game. Yeah, it's not going to be the game. It is the game. You will either lead, follow, or get out of the way. And I think that your comment about Walmart is really key here. When Walmart began to use Teradata technology, back when I was working for the telecommunications company, they were one of the first customers of Teradata. Kmart, Sears, those companies weren't even paying attention to what Walmart was doing with Teradata. They had one store in Bentonville, Arkansas. Yep. Half a trillion dollars later in sales, where's Kmart? Where's Sears? They're partnered together, on the ropes, trying to remain alive. And Walmart has become the largest employer around the world and a half-a-trillion-dollar organization, differentiated by their use of information and technology. And that was the lever. I agree. And that did transform the retail business, no doubt. But I guess I should say it was transformational only in specific industries. Whereas the promise of big data is that it's going to be transformational across all of them. I think the promise is definitely there. And here's why I'm saying that.
It's because the processes to actually get the information that you need as a business person out of the data warehouse were so cumbersome, and frankly still are. It might take weeks or sometimes even months, or sometimes you just can't get it. You want to make a change to the cube, for example. And that seems to have changed. You didn't have shadow IT with business analytics and data warehousing. You do now with big data. That to me is the difference. But challenge me. I mean, I want to hear your point of view. No, I think you're correct. I think you're correct. There's sort of a democratization of access to information that's happening these days. And what that really represents is that today I can swipe my credit card and store a half a terabyte or a terabyte of data in the cloud for under $100. I mean, it's just amazing what you can get without ever using one IT person. In the past, when we were talking about enterprise-based IT and there were no alternatives, such as software as a service and the cloud as a provider in place of my data center, those capabilities weren't available to me. So it was much more difficult for me to go outside of my enterprise data warehouse environment. When we bought our first Teradata platform, back in the late 80s or early 90s, I think we paid something like $3 or $4 million for that platform. So contrast that to a credit card swipe. You didn't have any money left to do anything else. That's exactly right. And that took an executive committee approval and authorization. Today, I can store a terabyte of data, where all I could store then was 300 gigabytes, for a credit card swipe that I put on my expense report. So this gets good. This is a good discussion, because essentially what makes this new wave transformational is that infrastructure is essentially no longer a constraint. That's right. It's not a constraint. It's not what you're throwing all the money into.
You're putting money into people and processes and algorithms to be able to leverage the data. Right. And even small companies, this is an interesting phenomenon. Today, even small companies, because the price performance has gotten so favorable for storage as a service in the cloud, can be on par competing with Procter and Gamble or some of the largest organizations out there, because it's so inexpensive to store massive amounts of information and it's really cheaper to keep it. I actually think you're going to see a lot of small and mid-sized companies hop on this bandwagon sooner. And I wonder if you're going to see new businesses springing up around essentially aggregating data from multiple sources and reselling it to small companies to do that kind of thing. Which brings us to a really interesting point, and that point is that the constraining factor used to be technology, its ability to scale and expand and its cost factor. That's not the case any longer. So now the constraining factor is the human capital. It's the human factor that has the expertise and the knowledge and capability and the skills to create the integration, to build the analytics, to do the work to analyze and to bring the insights back to the business. And here the U.S. is not even among the top 25 countries in the world in math and science education. That's the challenge, isn't it? So where is that expertise going to come from? Well, I suppose. We just have to keep importing people from Asia. I, you know, I have. They're sending them on back. I'll tell you what I want to call this. My saddest moment in life is that I have an 18-year-old who's just going into college in the fall and a 19-year-old who's in school and a 16-year-old daughter right behind the two boys. And I can't get them to go into the STEM field with, you know, software engineering and data. They've just had too much conversation over the years of their life from dad. I think I've soured them on technology.
I have to be careful. Because I'm constantly telling my kids you've got to get into something related to data. Right. I think it's probably better to become a plumber. Right. Well, you know, I mean, watch for my daughter, you know, in the New York Ballet, or the San Francisco Ballet, because she loves it. That's her passion, you know. Well, you know, chase your passion. Yeah. But I do think that when we think about this world, this emerging world of big data, big data analytics and what we're going to do with information, you have to be excited about it. I mean, at Gavros, you know, like I said, we've been practitioners in this space for 31 years. And we've never seen the confluence of forces so impactful in the organization, from sea to shining sea. Every organization under that, you know, under that sky, of every size, can be a beneficiary. You have to get into this game, play big data analytics and understand the trends and patterns. You could be an internet merchant, you know, running your business out of your back room. And to the degree that you leverage the insights from data, you're going to be able to target your customer base and fulfill those orders and grow, you know, phenomenally large over a very short period of time. And that's what Amazon has done, starting out as a small bookseller, looking at information, looking at technology to drive their growth and expansion. So I wonder if you could double click on that whole skills gap for a minute, because you said that is the scarce resource right now. So talk to the young people out there. What kind of skills should they be developing? Well, you know, I think you get that undergraduate degree in some interesting things. And certainly, software engineering is an important place, from a technology perspective, to be focused. But once you get that, and I can't remember, was it North Carolina State that has a program? Yeah, they have a program for data scientists.
And they quoted a statistic: they have 80 people in that program. It's a 10-month, five-day-a-week, full-time curriculum that they're running now to get a master's degree in the data sciences. And they're graduating their enrolled classes with a greater than 80% multiple job offer ratio before graduation and 100% employment in jobs that pay starting graduates six figures, $100,000. Michael Rappa, who was on yesterday, was saying that he has to tell some of the companies that he works with to back off from hiring his entire team. They do team projects for companies in the area and they want to hire everyone on the team. Right, so this is again an indication that we've gotten to the point where we've got rich, rich, rich stores of information that have all kinds of insights buried in them. The skill that's required is the mining skill. It's the analytics skill. The other thing that I'll point out, and we should watch the election the next time around to see how this all bears out. But there was an article that ran, actually it was a segment on the news that I saw, shortly after this recent election where President Obama was re-elected, and it was a comparison between the accuracy of predicting the outcome of the election with the Gallup poll versus the Twindex, which was Twitter's index based on all the tweets posted and the sentiment analysis they did on their data set. And they didn't have to do any exit polling. They just listened to what people were saying on Twitter. And every day at 8 p.m. Twitter rolled out the next update of the Twindex, which was an indication, across the country in all the geographies, of who was going to win the election if it was held today. It was much more accurate and much more timely than a Gallup poll. I would argue that in the next election we may see the Twindex as the standard. Well, Nate Silver nailed it. He nailed every single state in the election.
It's kind of hard to argue with the accuracy of that. It's 100% accurate. It's kind of hard to argue with that. Giovanni Rodriguez, who worked on the Obama campaign, was telling me that they actually knew, they were targeting voters in swing states, they were targeting voters down to the single house. And which houses were going to swing the election. No, no, no, no, no. Here's a good, here's a, oh, I'm just excited. You've got me excited here in this interview. Do you think the DNC has discarded any of the data? Of course not. Oh, absolutely not. They're going to use all of that data, and they had micro-focused campaigns and advertising as a result of big data analytics. In the same segment where I saw this Twindex versus Gallup poll comparison, Yoplait yogurt ran an ad, and their ad was about a new flavor of yogurt or a new class of yogurt they had launched. And in the ad there was a beautiful pastel background and a package of yogurt sort of floated down, and the nice lady in the background said, at Yoplait we've listened to your tweets. We've heard your voice through Facebook and we've come up with a new protein-rich yogurt, because that's what you're asking for. Now, while we could help you with your love life, we're not going to go there, because we're in the yogurt business. Something like that. Now, but they made an accurate disclosure. You know why? Because they subscribed to a set of data from Twitter and did the analytics to find out what the sentiment was about yogurt every time Yoplait was mentioned or Dannon was mentioned, whether the sentiment was favorable or the sentiment was negative. In that same set of data, though they weren't looking for this, were all kinds of pointers to love life problems. They could easily have analyzed for issues in the love life of the demographic in that data set.
It's fascinating because you're getting into unstructured data here and analysis of unstructured data, sentiment analysis, which is a very complex problem to solve. Are you saying that the next great frontier is really mining unstructured data and figuring out correlations that aren't obvious? It's not the next frontier. It is the frontier here today. I mean, why do you think that Facebook can suggest to you who might be your friends? Why do you think that LinkedIn knows before you know? Every time I log into LinkedIn, there's a new one. You're going to be a professional connection before I get home, David, on LinkedIn. I'm convinced of it. Because this data is all being analyzed in real time, and it's all of the unstructured data. People are posting tweets now about this event. And those tweets are adding to the catalog of knowledge, and the analytics that are running out there are creating insights to connect people together. So it is the frontier that we are facing. We're standing on it right now. This notion of real time is really interesting too. We've looked at this a lot, where you're essentially taking transactional data and analytics data and bringing them together and actually allowing machines to make decisions in near real time, much faster than humans. And this is what I get back to though. See, that doesn't happen magically without humans who understand how to connect those dots. Absolutely. So that's what we're all about at Gavros. Our focus is on how do you define an architecture, not wires and pliers and boxes and networks, but how do you build a framework that takes the traditional structured data content in my data warehouse that's a part of my enterprise, and takes my big data sets and my big data analytics program, and marries the two of those together regardless of where those solutions live. So if they live in my data center, fine. But if they live in the cloud at Amazon, fine, right?
But what I need is for that information to come back together so that I can get maximum value out of the decisions that I make around those sets of insights. We were speaking earlier to, oh gee, I'm not going to remember who it was now, Peter Aiken perhaps, who was saying that only 10% of big companies even have a data governance strategy in place. I think that's a generous number, but certainly what we see is. That's a problem you're attacking, right? Right, that's exactly a problem we attack. We've got playbooks and charters and jumpstart templates and process guides for data governance, for data ownership, for data stewardship, for data architecture, for data quality improvement, because for 31 years we've worked on these subjects, right? And so we've built a set of IP, and we've got a number of very large financial services clients that we're working with currently, and some state entities that are dealing with healthcare transformation, who are moving the needle forward rapidly by taking advantage of the work that we've done in these areas, with the expert consultants we have and with the methodologies we've been working with. Doesn't that number though indicate a lack of urgency? I mean, what's the problem there? Why only 10%? Well, I think that's an interesting question, and if I knew the answer, I'd give it to you. But I think that data still, within the organization, as a place where money gets spent, right, is a rope being pushed uphill, not a leadership pull from the top, if you will. And so that's why we're focused on helping. We're talking to one client now, a client prospect that has a real challenge, and what we're going to do is bring one of our data governance executive briefings to the executive team in a two-hour meeting, so that the executive team can be informed and understand how these dots fit together.
And I think examples and analogies that hit home will really work very effectively to turn the lights on so that some funding can be allocated to this. I'll tell you this, and this is something I've known since the 80s: of all of the components of an information technology solution, the data center, the hardware, the software at the operating system level, the ERP, the custom code you write, and the data that lives in the system, the only piece of that equation that stands the test of time once architected is the information. Because think about it. I mean, we've had Digital Equipment, IBM mainframes, we've had SNA networks and IP networks. We've had ERPs that were from McCormack & Dodge and from Oracle and from, I'm bringing back some names you may remember. But I'm just a kid. I read about those in the Smithsonian Institution. But the point is, the information model, the customer-to-product relationship, the geographic relationship of the customer's residence to the retail outlet for Safeway, hasn't changed. But organizations have been woefully inadequate, have made woefully, that's not what I want to say, they've just not been able to focus on investing in their information management, because the technology's more sexy. We're very good at investing in the technology. The balance in those scales is changing right now, and we're really at a confluence right now. We're on the precipice of that big change. Doesn't the cloud kind of change that equation? I mean, if you don't have to worry about the hardware and software anymore, then you should be able to focus on the information. Yeah, and that's really a good, it's a net good thing. All of us data guys say yay, because it's finally time for data to get the credence that it's due. And I think that a lot of organizations are recognizing that. I mean, we see that in the people that are interested in our services, there's an uptick, you know?
But what we find is it's an uptick in interest that's influenced by, you know, a person in corporation A who knows a person in corporation B. Corporation A has used our services and gotten great value out of them, and that person says to their friend at corporation B, call Gavros, they can help you with this, right? And it comes with credibility behind it. Rarely does the phone ring with some person who just, you know, said, hey, you know, we're looking for some data management help. Right, right. Yeah, so that's. Well, and the data's getting board-level visibility. You know, one in 10 have a board-approved data strategy. Everybody's talking big data. They're reading about it on airplanes, they're hearing about it. And it's bringing competitive advantage. I think, you know, I always say, Nick Carr was dead wrong. See, IT does matter. Yeah. Look at Google, look at Amazon. And I think he'd admit, if you got a couple beers in him, he'd admit he was wrong, too. But he sold a lot of books. He's smarter than I am. He did very well with a lot of speaking. And it's not expensive to do big data. I can do big data on my credit card, right? Let's not get it twisted. A little phrase I use. I can do big data on my credit card, but it's costly to do it wrong. Yeah, yeah. Right? Because if you've got three organizations in the same business, and everybody can do it, but two do it wrong and one does it right, game over. Well, if you do it right, it's very lucrative. How did you start, you said 30 years ago in South Africa? The company started in South Africa. The founders. It's Mandela's birthday today, 95. Well, that's right. Yeah, I mean, the co-founders of the company were living there in South Africa and were just passionate about data. Now I'm going to disclose a little bit about my age again, because you remember James Martin and Information Engineering? Sure.
Yeah, that was sort of back in the Codd and Date days of the relational model and all of that. Well, these guys just got really passionate about relational databases and data. And so they began to be focused on the disciplines of designing databases and separating them from business process and application software logic. And that has just been sort of the foundation of the company. Through information resource management and information engineering into data warehousing, into big data, it's just a natural progression. Yeah, yeah. All right, Mark Johnson of Gavros. Thanks very much for coming on theCUBE and sharing your insights. It was a pleasure having you. Thank you, Dave. And thank you, Paul. All right, keep it right there, buddy. This is theCUBE, right back with our next guest. I'm Dave Vellante with Paul Gillin. We'll be right back.