Live from Cambridge, Massachusetts, it's the Cube, at the MIT Chief Data Officer and Information Quality Symposium, with hosts Dave Vellante and Paul Gillin.

Hi everybody, welcome back to Cambridge, Massachusetts. This is the Cube, and we're here at the MIT Information Quality Symposium, the CDO Forum. This is the second year that we've been invited to participate in this forum. It's been fascinating to see the evolution of the Chief Data Officer and how that role has been adopted, particularly within regulated industries; but increasingly, less regulated industries are looking at the role as well, as organizations try to figure out how to handle data governance with the big data meme really coming to the fore. Professor Stuart Madnick is a Cube alum; he was here last year. Stuart, great to see you again. Thanks for coming on the Cube.

It's a pleasure. You guys do a fantastic job.

Thank you for joining us. We appreciate it. So many surprises come out of these events, and great guests on the Cube, and we really love the collaboration. You kicked off yesterday with one of the keynotes, and I want to talk about that a little bit. You picked up on the theme of data quality, and you used an example: I think you went to the Smithsonian and got the overhead projector. I remember them well, the foils. You had a data point that I'd like you to share with the audience. Can you talk about that a little bit?

It's a bit of an amusing one that I've used for many years in the past, and I was asked to resurrect it, if you will, for another engagement at MIT. It was a study done probably 20 or 30 years ago, and it makes two points: first the funny part, and then the interesting part. In the study, they measured the average IQ of the entering class of freshmen at MIT. And then, four years later, they measured the average IQ of the graduating seniors.
What was interesting is that, although not by a large number, there was a statistically significant drop in IQ between those two measurements, which of course led to all kinds of investigations of why that might be the case. But the reason I wanted to use it, besides being an interesting, funny story about MIT, was to motivate the issue of understanding data and data quality. In the way I described that experiment, the assumption the average person makes right away is that we're talking about the same people, possibly getting dumber. But there's no guarantee that the graduating seniors measured four years later were the same people who were measured four years earlier. That ability to understand exactly the data we're using is so important, and we see time and time again cases where we misunderstand our data, and it leads either to amusing stories like that one or, sometimes, to serious problems.

Now, you're at the MIT Sloan School, and of course we've got that on the backdrop this year. We're very pleased about that; it's just a great backdrop, a lot of collaboration. I wonder if we could talk about that a little bit. It takes a lot of folks to make a symposium like this happen, and you come from an environment that thrives on collaboration. So give us some background there.

Well, as I mentioned in my introduction, our dean was generous enough to spare the time to come over.

Well, I have to interrupt you there. That's very humble of you to say.

But one of the things that he has mentioned, and has made part of the mission statement for the Sloan School, is basically to have an impact on the world. So although we do a lot of very scholarly activities at MIT and such, the ability to interact with professional people, to interact with the public at large, is an important part of our mission.
And hosting things like the Chief Data Officer and Information Quality Symposium is just one of those things that we have done. We're very pleased with the kind of reaction and reception we get from a broad array of people, both in other academic institutions and in the profession at large.

I want to go back to your MIT-students-getting-stupider example. Not just because... By the way, if your wife says you don't seem as smart as you were before you left for MIT, does that give you an excuse to blame it on MIT?

Oh no, I would blame it on reaching the drinking age during my ten years as a student.

But it really does point to an important issue, which is truth and the believability of data. Arguably, one of the consequences of universal data access, where everybody's a publisher now and anybody can say anything they want, is that truth has become elusive. We don't really know whom to trust anymore. In fact, people can create their own version of the truth; there's all kinds of bogus research out there using statistically unsound methods and such. How does that affect how you teach your students, many of whom will go on to occupy very important positions in technology, about how they should interpret truth or decide what to believe?

Well, there are actually two things I want to point out. One interesting issue, getting back to the comment regarding MIT and IQ tests, is: what is it that IQ tests actually measure? They basically measure your ability to answer a certain specific set of questions, which may or may not be correlated with your success in life or your ability to be effective and so on. So one of the dangers you run into is exactly what it is you're measuring. One of the other examples I used to illustrate that point: a while back, maybe ten years ago, there was yet another housing crisis.
Housing crises seem to occur about every decade or so. And there was a headline in the Boston Globe, which was the most authoritative newspaper in the world, that said: good news, housing sales have really picked up, way up from last month, way up from last year. Isn't that good news? It turns out, though, when you investigate further, that what they had done is go around to the registry of deeds in the individual counties in Massachusetts and add up the number of deeds that had changed hands, that had been filed that month. And sure enough, the number was way up. The thing they hadn't considered was that when a bank forecloses on your house, the deed changes hands. So what had actually happened that month was a record number of foreclosures, not that housing sales had gone up. The reason I mention it is that it wasn't that anybody typed a number wrong. The number of actual deeds that had changed hands that month really was way up. It was misinterpreted: they jumped to the conclusion that the number of deeds changing hands meant that sales had gone up. So, getting back to your point, that's the key thing you need to dig down to understand. That's one of the core principles of a place like MIT, the scientific method: really understand what the data is actually telling you.

Isn't this the role that educational institutions have to play, teaching critical thinking so that we don't accept at face value the numbers that we see as truth? I love the question, because when you come into work life early in a large corporation and somebody puts out a data point, you learn very quickly: if the data point supports the political agenda, the person who markets it best gets something done. And if it doesn't support their agenda, of course they attack it. And then there's gut feel, which, if you're Steve Jobs, is fantastic. But for the rest of us, gut feel sometimes works and sometimes it doesn't.
So learning how to actually use data, which is not a trivial exercise or discipline, is a very valuable skill, but one that not a lot of people really know how to exercise.

Well, you're raising a couple of interesting issues. First, there is the whole political dimension to data, if you will: data can be used, and in fact frequently is used, for particular political agendas, whether in real politics in government or in politics within an organization. That's always a bit of a challenge, but the reality is even more complicated than that. Data has a context to it, and when it comes to understanding exactly what the data means, there are so many cases where there is more than one right answer, more than one mathematically right answer, depending on exactly what you're trying to ask and the circumstances and conditions you're trying to get at.

Once again, one of the terms, and I can't remember who mentioned this in their talk yesterday, but it's a fascinating concept: people talk about management by objectives, or management by this or that, but "evidence-based management" was an interesting term. I hadn't really heard it before someone used it there. In many ways it's a little like evidence-based medicine: rather than basing decisions on conjectures or guesses or intuitions, can you back them up with facts? The reason I bring this up, and I'll be brief here and let you go on, is that there's a lot of excitement over big data. What I mentioned both last year and this year is that big data allows you to see things that have always existed, but that we never had a way to measure or analyze in the past. We attended a talk just a few weeks ago by someone who was studying political campaigns, since you mentioned politics, if you will.
And it turns out there were a lot of generally understood, generally agreed-upon principles that people had; but when they actually got data that was a hundred times more fine-grained, those principles turned out to be totally wrong, assumptions people had been making for decades. Now there's the ability to actually measure: how much did this actually change people's opinions? How many people changed their vote from this to that? It gives us the ability, whether in politics or in business, to really know what's going on as opposed to guessing, because most of what we do, whether you call it insight or guessing, is the way much of management is run. I was just reading the other day that the belief that your tongue has different sensitivity areas for sweet, sour, bitter, and such is totally bogus. This is what we were all raised on, and apparently it's totally untrue. But often it's just because we don't question.

You brought this article: "Hazards tied to medical records rush. Subsidies given for computerizing, but no reporting required when errors cause harm." Why did this article strike you?

Well, I think it's fascinating because, first, I'm a big fan of data, as you might imagine. The article goes on to talk about how the percentage of hospitals and doctors' offices that have gone to electronic medical records has skyrocketed because the government is providing strong incentives, as well as some sticks to go with the carrots, if you will. So that's the good news. The problem with this kind of change is that the related issue of worrying about quality wasn't part of the requirements. The requirement was: are you or are you not digitized? Not: are you or are you not correctly digitized? So the incentive system was to rush to do it, but without any monitoring facility there, and the story goes on to describe some rather tragic cases.
Now, some of these may have happened just as much under human or paper-based systems, but a lot of them were directly because of the way in which the information systems were being misused. These kinds of transitions just open up opportunities for all kinds of new problems.

Is this a risk you see with big data, where there's an investment frenzy right now, big data companies raising hundreds of millions of dollars, and pressure on every enterprise to go into big data? Are you saying the risk is that they'll go into it without knowing what they're doing?

Yeah, exactly. I mean, once again, everything has to be balanced, if you will, in any transition. I don't have the records, but when we went from horses to horseless carriages, I don't know how many people had their two horses colliding with each other compared to the number of people colliding their cars together. Sometimes innovations go through a gestation period that takes a while to sort out. But the mere fact that there are these bumps in the road doesn't mean you shouldn't go in; it means you should go in open to the possibility that there will be bumps in the road, and do everything possible to minimize them. And I think that's where things have fallen down: the rush to getting it automated without the rush to getting it done with quality. That's the missing link.

Well, we talked yesterday about the transition from carriage-based transportation to automobiles. One of the big concerns early on was whether there would be enough chauffeurs, and people use that analogy in big data: will there be enough data scientists, or do we have to put the data in the hands of the business people and make it simpler to do this analysis? But there's a big gap. What this conversation is underscoring to me is the big gap between that nirvana and the data quality that we need, and where we are today.

Well, it's interesting you use that example, because I have a similar one.
There was a study done, I don't know the actual year, but probably somewhere around 1918, by the telephone company, on the rate at which telephones were being installed and the number of switchboard operators needed. I can't remember the exact conclusion, but it was something like: the entire adult population of the country would have to be telephone operators in order to handle all the calls being made. The interesting thing is that, in one sense, the number of people with telephones more than exceeded their expectations, but we're not all telephone operators. Or conversely, we are, because we've made the process, whether touch-tone dialing or now even speaking to your phone, so easy. And I guess that's the issue with this transition: we need to make the tools for managing data easier.

I see it over and over again. The other example I use: if you go back a couple of decades or so, a lot of things we now do with spreadsheets, we did with programmers writing programs, basically in COBOL. If you asked the average business manager who uses a spreadsheet, "Are you a programmer?", they would laugh at you. But in fact they're doing the same kinds of things a programmer would have done a couple of decades earlier; because it's been made so easy and so transparent, they don't have to think of it that way. That's our challenge: to make data that easy to use. And that's one of the big things we're seeing. We have a student who did a project called Big Data as a Service, looking at new companies and services that really are trying to make it possible for the lay user to manage big data.

I do want to be sure we get to a topic that you mentioned before we went on the air, which is the role of big data in security. I don't see the correlation. Where is it?

Exactly. So let me do it in three parts; visualize this if you will. I've often mentioned you've got the CIO's role, and the CIO is very much concerned about the IT infrastructure.
They worry about things such as firewalls and what have you. In many organizations, particularly those that are regulated or involved in utilities and so on, you often have a chief security officer, who worries about physical security but often also about things like cryptographic codes and passwords. And now, of course, we have the emerging chief data officer. Why is the chief data officer so important here? Well, go back to two of the most widely discussed cyber incidents: WikiLeaks on one hand, and the NSA and Snowden on the other. Would a better firewall have stopped them? Would a better cryptographic code have stopped them? The answer is no. They had a lot to do with organizational structure, organizational training, and what went on inside. That's not normally part of the CIO's role, and not normally part of the CSO's role. But if you think about what the role of the CDO should be, then how the data is used, who uses it, and for what purpose should be dead center in the CDO's role. So if you really want to address cyber security, you have to have all three of those players working together in unison, with the CDO playing an increasingly important role. I don't have the hard data for this, I confess, ironic as that is, but my intuition is that half or more of what we think of as cyber break-ins have to do with people, management, and data control, if you will.

We know that internal security breaches are the most common...

Exactly. And that, I think, is clearly a role the CDO needs to be dead center on. That's the connection I see there.

I did want to say one other thing, if you don't mind. Jeanne Ross, I believe, was also interviewed, and she's a fantastic person. She had what I think is a very controversial title on her article, something like "You don't need big data."
And she explained, I think, both in her talk and here, that what she really meant was that most organizations have so much small data that they don't know how to use, so don't worry about the big data that you don't know how to use. And I agree completely with that. The reason I mention it: as I did last year, you've got the Cube here, and one of the things we talked about and developed last year, and just published in January, is what we call the CDO cube. It describes the role of the CDO across three dimensions. One dimension is whether they are inward-looking or outward-looking: whether they're worried about improving the operations within their company, or about improving collaboration with their suppliers and partners and so on. Another dimension is the value dimension: whether they're trying to be an operational person or a strategic person. And the third dimension is the type of data they're looking at. There we refer to traditional data, what she would call the small data: the data you already have and are probably not using very well. And then the other kind, which we thought of calling "nouveau data," but we left it as big data. Once again, there can be CDOs whose focus is on that little data. Because I suspect that, much as we like to say about our brain power that we're only using a small fraction of it, the same is true of data: the amount of value in the data that we have, and have had for decades, and don't know how to use very well, is enormous.

Interesting. I wonder, and I've never asked this question before: we had Moore's law during the microprocessor revolution, and we had Metcalfe's law. Is there an analog to Metcalfe's law in the big data world? Your three-dimensional cube just got me thinking. Traditionally, under Metcalfe's law, the network nodes increase linearly but the value increases exponentially. It's always seemed to be the opposite for data.
The more data we have, the less value we seem to be able to get out of it. Although that seems to be changing.

Well, I think there are two issues. One is that, in some cases, if you're looking for some kind of trend, you can often determine the trend with modest amounts of data, and getting two more digits of accuracy doesn't change what you're looking at. But where I think the opposite is true is that there are certain types of phenomena that can only be discovered when you get to very detailed data. The example I used last year, so I won't go through it again, was the invention of the microscope, which made it possible to recognize the insects, if you will, that is, the bacteria and so on, in water. In the same way, there's a level of detail about our society, a level of detail about our own human behavior, that we've never had the ability to measure before. And by being able to measure it, we can uncover all kinds of things, including work done by a colleague of mine here at MIT on uncovering certain types of diseases at an early stage, just by detecting subtle changes in your body motions that can be picked up by your accelerometer and such. So there are some things for which having that extra fine-grained data is very valuable. Not everything, but many things.

We've talked about the data scientist here, how that role is evolving, and how many academic institutions are now adding disciplines around data science. Understanding that you can't speak for MIT as a whole, how would you say MIT looks at that, in terms of what role it should play in training data scientists for the future?

Well, I think there are two parts to that again. MIT is an engineering-based school, and even in the business school we have a significant amount of activity going on in operations research, statistics, and so on.
So the idea of having managers who are not afraid of numbers is, we think, an important role, and that's one of the things we've done since long before big data ever came on the scene. MIT's Operations Research Center, which came about after World War II, has been one of the leading institutions of its kind. So in terms of trying to make people comfortable with information and data, I think that's always been our role. The thing we believe will change over time is that it will become easier and easier, and more ingrained. Think of it now: I go into a restaurant and see a three- or four-year-old kid at another table playing with his iPad while waiting for the meal to be served. Not rocket science, but think of it: that's a computer thousands of times more powerful than the biggest computer MIT had 30 or 40 years ago, and here's a three- or four-year-old just using it for games. So there's the ability to take a lot of this power and put it into people's hands, and you can be older than three years old, by the way, if you want to be. A lot of work going on at MIT and elsewhere is trying to move past that early stage. You made a comment earlier about the reason we had chauffeurs: besides the fact that cars were only affordable for rich people, you had to be practically a mechanic to keep them working. You had to know how to crank them up, you had to worry about adjusting the spark gap, and all that stuff. In the same vein, we have to have chauffeurs for our data at this stage, until we get to the point where we've made it easy enough for the average person to use. And I think it's just a matter of time; as we talk about internet time, things happen faster and faster. So that's our vision: you need to be not afraid of numbers, but you don't have to be able to do it all yourself.

Stuart, we have to leave it there. You're a great guest. Thank you so much for coming on.

Always a pleasure.
I hope we can be back next year and have you on again.

Thank you very much.

All right, keep it right there; we'll be back with our next guest. We're live at the MIT Information Quality Symposium, right back after this. Thank you.