Live from Cambridge, Massachusetts, extracting the signal from the noise, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now your hosts, Dave Vellante and Paul Gillin.

Welcome back to Cambridge, Massachusetts, everybody. This is Dave Vellante with Paul Gillin, and we're here at MIT CDOIQ, the MIT Chief Data Officer and Information Quality Symposium, which is really focused on the Chief Data Officer. We're here at the Tang Center. Dr. James Meng is here, a Cube alum. It's great to see you again, Dr. Meng. James retired just this year from the Office of the Assistant Secretary of the Navy — this month, in fact. Congratulations, fantastic. So all the great stuff is very fresh in your mind, and we're going to talk about it. The focus of your career has been data quality and transparency, and carrying out the edicts of Congress for the federal government. Thank you very much for joining us again.

Oh, my pleasure.

So, wow — retired after a long and excellent career. How does it feel?

Oh, wonderful. Now I can devote more time to this kind of academic exchange, bringing about greater knowledge and advancement and drawing on real-life experience, so that one day we can find a more optimal way of improving the government's data quality.

So what do you make of it all? This conference has historically been focused on data quality, and then, of course, the Chief Data Officer role emerged and the big data theme emerged. As a practitioner in this industry, how do you respond to the changes that have occurred, and where do you see it going now that you can step back after your career and look to the future? Where did we come from, and where are we headed?

Right. Okay. Actually, if we step back to about 1990 vintage, Congress established the Government Performance and Results Act — it's called GPRA.
Enacted in 1993, it demands that the government be more transparent, primarily in the financial-management aspect. Then came 2010 — it actually came out in 2011 — the GPRA Modernization Act. That was one of President Obama's signature legacies: improving data transparency using modern means. Instead of hiding data in a stovepipe data warehouse somewhere, the data becomes accessible to the general population. And the most recent is the Digital Accountability and Transparency Act — the DATA Act of 2014 — which really gets into the nitty-gritty. It focuses on the standard format in which you present the data. So the government is also learning, in the process, that you can't just talk about data transparency without understanding data standards, which are the fundamental root-cause driver for what we call data divergence. You cannot integrate without focusing on the standard. My role in this symposium — this is the fourth year I've been here, and I've been giving a presentation every year — for the past two years has been to bring out the federal agencies' advancements and initiatives in fulfilling GPRA and the DATA Act. What are we doing, why are we doing what we are doing, and what are the key lessons learned in this process?

So you go from accountability and transparency to standardization. Some people might think that's putting the cart before the horse, but maybe the thinking was: expose it so we can understand what the problems are, and then create standards. Is that right, or was this really putting the cart before the horse?

Well, actually it is. The data standard is something we really should have done from the very beginning. But of course, the computer era and data processing started long ago, and as a result all federal agencies accumulated many, many data systems and many, many applications. Number one, they are not interoperable.
And number two, there are no standards or architectures to bring about that interoperability. As a result, past data initiatives suffered a lot of setbacks, difficulties, and resistance, largely because we did not have a commonly accepted data standard and data architecture that we could leverage. Realizing that now, and going back and really doing it, is actually a very major advancement, because without it you will never get there.

It occurs to me that you have been tackling, for the past 25 years, a problem that many CDOs are facing right now: interoperability, standards, consistency of data formats. What lessons can you share from your experience as they embark on this task? What can they learn from what the government is doing?

That's wonderful. I will actually present that at one o'clock, but I might as well share it right now. There are two key lessons we learned. First of all, with a data standard there is no shortcut. If you want a data standard, you really have to initiate it from the outset, and you have to be very determined to get it. And a data standard is not a technology, it's not a strategy — it's much more about management and the change of people. People resist it, because you are digging in my backyard. You demand to know about what is really my business.

I don't want you micromanaging me.

Right, yes, exactly. For those fears, you really have to have a way of overcoming that. And to overcome that — number one, you cannot come down with a mandate: "I am descending from the highest echelon, and all must conform." A data standard — you're not going to get it in one year. You're not even going to get it in five years. It's a very long, hard journey. Actually, we learned that from IBM. IBM started doing this 25 years ago, and they were kind enough to share their lessons learned. It's a very long journey. So it's a journey, not a step.
And that's the first lesson learned: the data standard is not a given. You really have to work at it, and nobody can do it for you. Your own organization is unique in whatever business products and services it delivers; you have to do it yourself, because nobody knows your business better than you do. And number two — and this really gets at the interoperability — it's a system-of-systems engineering issue. We're dealing with a lot of interoperating systems. They are all independent. They reside in different places, they are managed differently, they are governed differently — yet the data they create has to be interoperable. So all of a sudden, the different stovepiped systems that have been created over the past 60 years or longer have a new requirement: those systems have to be interoperable — not in their day-to-day operation, but the data they produce needs to be interoperable. And that is a new requirement. So most large organizations suddenly found out that their systems have to be interoperable, and that was not in their original requirements documents.

Can you describe what you mean by interoperable? Standardization is one thing, but interoperable?

Interoperability fundamentally demands two things. One is that your data has to be accessible by systems you never thought would need to access it. Number two, when they access the data, it needs to go through interfaces that follow an interface standard. So all of a sudden, you have to have a data standard, but you also have to have interface standards. Those were things we learned — I worked for the Navy, and we learned them from warfare systems, which have been governed by interoperability requirements for many, many years.
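The two demands Dr. Meng names — a shared data standard plus a standard interface that every access goes through — can be sketched in a few lines. This is an illustrative sketch, not anything from the Navy's actual systems; the field names, record contents, and function names are all hypothetical.

```python
# Illustrative sketch (hypothetical names): two independently managed systems
# can interoperate if every record passes through a shared data standard
# (agreed field names and types) and a single standard interface.

DATA_STANDARD = {              # the agreed data dictionary: field -> type
    "vessel_id": str,
    "fuel_level_pct": float,
}

def conforms(record: dict) -> bool:
    """Check a record against the shared data standard."""
    return (set(record) == set(DATA_STANDARD)
            and all(isinstance(record[f], t) for f, t in DATA_STANDARD.items()))

def exchange(record: dict) -> dict:
    """The interface standard: reject anything that doesn't conform."""
    if not conforms(record):
        raise ValueError("record does not meet the data standard")
    return record   # now safe to hand to any consuming system

# A producing system maps its native record onto the standard before exchange,
# so consumers it never anticipated can still read the data:
native = {"id": "DDG-51", "fuel": 72.5}
standardized = {"vessel_id": native["id"], "fuel_level_pct": native["fuel"]}
print(exchange(standardized))
```

The point of the sketch is that neither system changes its internal operation; only the data crossing the boundary is forced through the agreed standard, which mirrors the "interoperable in the data they produce, not in day-to-day operation" distinction above.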
So we realized that the systems-engineering discipline has to be brought in, to allow the data to become interoperable and accessible while still retaining its pedigree — so that during the data transfer, the standardized, polished, clean data is not lost in translation.

So this is obviously a very difficult problem, in large part because of the pace of data growth. You're creating data at an exponential rate on these systems that are non-standard, so you're essentially refueling the plane while you're in the air. And you talk about accessibility, but it's not just giving people access — it's who has the right to access that data. You've got the process around which you standardize that data, and then you've got the technology that enables all this. So how do you deal with the fact that so much more data is being created in, let's say, a five-year period than was created in the entire history of the organization?

Okay, I'm more optimistic than you are, Dave. Number one, the key ingredient for succeeding in this process is a common understanding of the urgency of the need to do the standardization. And number two, so many enablers have come into being, because the past decade's emphasis has been on bringing all this together: standardization, interoperability, and integrating data from sources you never thought about before. So there are very, very good enablers that help the leadership standardize the taxonomy and bring together the interoperability of a variety of systems — to name a few, open architecture.

So this is a technology enabler?

Yes, there are technology enablers that let us do it. IBM has done it, and we have done it in other applications. So this is not mission impossible. True, the data is growing at an enormous rate.
But that rate does not dictate our ability to make the systems interoperable. In fact, internet commerce is growing very fast, yet you are able to get online and access any place you want. What does that mean? The internet has protocols that were dictated from the beginning; unless you follow those protocols, you can't get in. So it is possible to get this done.

It sounds like the approach — I'm going to simplify it — is to say: okay, all the data we create from this day forward will follow this standard, and that's step one. Then do you go back and standardize the existing corpus of data, or do you just let it die through attrition?

In the approaches we are taking, as far as I know, we standardize from this point on.

You're not trying to go back and solve the problem of history?

Because you would never see the light of day — it's mission impossible. And there is also the question of value; we don't just do things without business value. There are times when we do have to go back, and when we do, there are legacy data processes we can use — but not on a global scale. What we are really driving is: from here on, we leverage all the modern technology so we can get the benefit out of it, rather than looking in the rearview mirror.

How do you future-proof this process? We don't know what new kinds of data we'll be using in the future; a lot of things are coming online, introducing whole new sets of protocols and data types.

Right now, the majority of government agencies are dealing largely with structured data — accounting, that's digits, numbers — for the federal government, at least. We have many, many years of work to do on structured data. Of course, for unstructured data, as IBM briefed yesterday on data analytics, they have IBM Watson. If you have seen an IBM Watson demonstration, it's impressive.
It did very well on Jeopardy — that's how it beat the human team. For unstructured data, IBM Watson represents one of the future approaches. But of course, there is always a probability involved; it will also tell you the probability that a given projection is correct. The extrapolation plausibility of the data has its limits, too. So our focus right now is primarily on structured data, and there is enough work there to keep us busy for decades to come. As for making sure our effort has sustainability: my view is that until we meet GPRA, the GPRA Modernization Act, and the DATA Act, we really have many years of work to do. That's our sustainability, and that's the mandate we have to fulfill first.

James, in the commercial world there's a lot of discussion around centralized IT versus the lines of business. With all this data explosion and the opportunity to drive revenue, the lines-of-business heads — the P&L managers — go out and initiate and spend money on what we call shadow IT. They generate projects; some are very successful, some fail. The failed ones go back to IT to clean up, and the successful ones probably do too, at some point. Is there an analog in the Navy, where you have this notion of shadow IT? Or does everybody adhere to the edict of "we will standardize"? Do you have that problem of seepage, where initiatives are started that don't necessarily comply with the edicts?

Well, actually, we encourage innovation and flexibility. When we say data standards and interoperability, we are primarily dealing with the major operations, which represent 90% of our Navy operations. But the remaining 10% is dispersed across all our major commands, and especially in research, development, and acquisition, they are actually given that flexibility, because they are dealing with innovation and change on a daily basis.
It would be really overbearing to come down and say, "you must follow this standard" — there is no business value in that. Actually, in standardization there are two key things we have to be very, very careful about. One is to allow flexibility, so that each command and business area can create their own without being dictated to by the overarching architecture and standard. The other is to encourage innovation, so that your systems can be accessed by many, many research institutes and academia without going through this very rigorous data-quality regime.

But aren't those two objectives completely counterposed — flexibility and standardization? They're at opposite ends of the spectrum.

They are. But on the other hand, that's why we have executives.

Okay, that's your high-paying job.

That's why we pay you the big bucks.

That's right. So that's a challenge. Of course, there is a lot of subjectivity in terms of where you draw the line, or don't. But in general, we understand where we have to enforce and where we have to encourage innovation. Most of us, especially at the executive level, go through a lot of executive training, and those issues are brought up: be careful — don't drive the organization into the mud by following GPRA to the letter. There are issues you will have to deal with 20 or 30 years from now, and all that innovation has to start today. Don't kill it.

What are you doing to share your findings and your methodologies with business, with the commercial sector? Obviously, you've learned a great deal here.

Yes. At least in the Navy — and I know the same is true for other agencies — we share, and in fact not only share but largely draw lessons learned from industry. For example, within the Navy we have a data standardization working group. We've been meeting every week — more than 150 sessions so far.
In those weekly working-group teleconferences we routinely invite industry — Ab Initio, Informatica, IBM, and many companies large and small — on a focused issue, and they selflessly and very openly share their most recent lessons learned with us. We learn from those and leapfrog, rather than trying to catch up from behind. And with data standardization, for example, we don't mandate; we work in a collaborative mode. We inform the commands and say: we really want to pay this back. You help us — you tell us what data elements you use, what your data dictionary is — and we will do our utmost to incorporate it into the overall data standard. We may have to tell you to drop some elements, because you have to adopt somebody else's in order to eliminate the overlap. But in return, we will give you those business-intelligence dashboards; we build them for you. They get something back — you have to give them the benefit back. Otherwise they say: you know, get lost, I'm busy, I've got my mission to deliver, and I don't understand what this business intelligence is all about. So it has to be mutual. You have to make it mutual.

So we're out of time, but the last question is: how are you doing? How do you measure your progress, and how is it going?

Well, later this afternoon you're going to see that, in addition to the defense agencies, we now also bring in the FCC, the Office of Management and Budget, and the Department of Energy. I hope next year we will bring in more federal agencies. Their initiatives are just mind-boggling — incredible. The approaches they are taking are very much on the frontier, on the cutting edge of adopting those industry approaches, solving the problem, and meeting all the government mandates. So my view is that the collaboration at this forum — all the team members who came are really very, very happy that this is the kind of forum where we exchange lessons learned and accelerate our initiatives.
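The give-and-take Dr. Meng describes — each command submits its local data dictionary, and overlapping elements are collapsed onto a single canonical name so the overall standard has no duplicates — can be sketched roughly as follows. All command names, data elements, and the alias mapping are hypothetical, invented only for illustration.

```python
# Hypothetical sketch of the consolidation step described above: commands
# contribute their local data dictionaries, and a working-group-chosen alias
# mapping collapses overlapping elements into one overall data standard.

local_dictionaries = {
    "command_a": {"ship_id": "hull identifier", "fuel_pct": "fuel remaining"},
    "command_b": {"vessel_id": "hull identifier", "crew_count": "crew size"},
}

# Canonical names chosen by the working group (assumed, for illustration):
# command_a must drop "ship_id" and adopt command_b's "vessel_id".
aliases = {"ship_id": "vessel_id"}

def consolidate(dicts, aliases):
    """Merge local dictionaries, eliminating overlapping elements."""
    standard = {}
    for local in dicts.values():
        for element, definition in local.items():
            canonical = aliases.get(element, element)
            standard.setdefault(canonical, definition)  # keep first, drop dups
    return standard

overall = consolidate(local_dictionaries, aliases)
print(sorted(overall))  # canonical elements only, duplicates eliminated
```

In exchange for dropping a local name, a command would get back tooling built on the shared standard (the dashboards mentioned above), which is what makes the arrangement mutual rather than a mandate.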
To me, it's extremely mutually beneficial, so I'm very optimistic. And I hope to see you again next year.

Absolutely. Dr. Meng, you are optimistic, and you're very relaxed attacking such challenging problems. So congratulations on your retirement and an outstanding career, and thank you very much for coming back.

And next year I'm going to be co-chair of this — so, you know, my new endeavor.

Oh, you signed up to be co-chair. All right — you're going to be very busy, maybe busier than you've ever been in your life, working hand in hand with Rich. Thank you very much. Thanks again. All right, keep it right there — Paul Gillin and I will be back with our next guest. This is SiliconANGLE's theCUBE. We're live from MIT; we'll be right back.