Okay, we're back. This is Dave Vellante. I'm with Wikibon.org and my co-host Jeff Kelly is here with me. We're here all day at the MIT Information Quality Symposium. This is theCUBE. We go to the events. We extract the signal from the noise. We bring you the smartest people at these events and share with you their knowledge. Mario Faria is here. He is a member of the MIT Data Science Initiative. He is a big data technology advisor to the Bill and Melinda Gates Foundation and an individual who knows a lot about CDOs, chief data officers. He was the first chief data officer in Latin America and has recently moved to the United States. Welcome to theCUBE and welcome to America. Thank you. So glad to be here, Dave. Yeah, it's great to see you. So you were here yesterday as well, Mario, talking to the chief data officers. I think you did a talk on unstructured data and I want to talk about that. But I want to go back to your role as the chief data officer at Boa Vista, which was part of Equifax. You were the first one in Latin America. We heard from Stuart Madnick today that the guys at MIT Sloan did some research. They said the first CDO they could ever find was from around 2003. I'm sure they existed before that under a different name. But how did it come about that you became the first chief data officer in Latin America? Richard Wong was part of the Stuart Madnick team. He went to Brazil three years ago, had a meeting with the members of Boa Vista, and he asked them, why don't you have a CDO in place here? You guys need a CDO because your business is data related. So the CEO of the company reached out to me. I have experience with digital marketing, supply chain, and CRM. I'm not a data guy per se; I'm more on the downstream side of the data process. I understand how to make use of data and monetize it. You're a data practitioner, in a sense. That's correct. Yeah, that's correct. 
So that's interesting because in the early days of the CIO role, even before the CIO term came about, the leader of technology tended to be somebody who was very technical. Increasingly, they're more business people. So it sounds like the CDO is taking a similar path. Is that right? That's right. Even though I have a background in computer science, through my whole life I've been working on very specific technology projects tied to business benefits, such as streamlining operations for retailers or increasing customer loyalty for global brands around the world, and that's how I got involved with data. At Boa Vista, your CEO essentially made the decision after talking to Richard. He got the idea, and I'm sure he had a good gut feel about it as well. Was there any other justification process? I mean, we heard earlier this morning from our keynote speaker, Dat, that this is not a project. This is a journey, it's an ongoing thing. How do organizations justify the role of a CDO? How do they make a business case around it? Do they have to, or is it just so blatantly obvious that they need one? We had this question yesterday, and my point is, when you bring in a CFO, do you make a business case for the CFO? Do you make a business case to put your CIO in place? So it's a matter of common sense. For every company nowadays, data is a critical asset for creating value, in the sense that you have to manage your data, you have to have a good strategy on how and where to buy data from your partners, and how you combine this data to create information that's available for the customers you are trying to serve. So what did you do on your first day on the job? Did you say, do I have to start by figuring out how to build a data architecture, do I figure out what I have? What did you do? 
First day on the job, I spent the majority of the time talking to my peers, understanding their expectations, trying to understand how they felt about data, and understanding what data quality problems the company had that were hurting sales and hurting product development. I would say that in my first hundred days on the job, I did more listening than acting. So I tried to understand all the issues, and together with my team, we created a vision of what we needed to implement in the years to come. So you were like a CEO taking over a new job. I asked Meg Whitman, Meg, what did you do on the first day of the job? She said, the first day and the first hundred days, I just see customers. You saw your customers, you sought them out. That's correct, because there are two approaches to being a successful CDO. You can be a tactical one, in the sense that you look at the data, data management and everything, or you can take a more strategic approach. And when the CEO came to me and invited me for the position, he told me, Mario, I want data to be looked at as a strategic asset for our business. And I want you to be as strategic as you can to help our business move forward. So in a sense, it was like running a business of my own. So really, it sounds like the key to getting started is to focus on the business problems, the business issues, and maybe the business opportunities rather than the data, which maybe is a little counterintuitive for the role of a chief data officer. Would that be accurate? It's very accurate, because going back to the return on investment question that Dave just asked, you're there to try to make a difference. You're there to make money or to save costs in an area where you have some inefficiencies. So in that sense, what my team and I did, the team that I led, was look at the issues that we had to solve. 
From day one, I told my organization, we're going to be working for the outside world, in a sense. We're going to be working for the sales organization, for the products organization, for the people who are there in front of the customers. We're going to be helping them. We are going to be a service provider organization for them to succeed. Talk a little bit about the organizational structure. So where does the CDO really sit in the organizational structure? And how do you help your team engage with the business? Do you have team members sitting in business units? Where does the CDO sit and how do you structure your team throughout the enterprise? Okay, in our case, data was so important that the chief data officer position reported to the CEO directly. So the CIO was my peer. The leader of sales, the leader of marketing, the leader of products, they were all my peers. I was reporting directly to the CEO. Depending on the company, you see some CDOs reporting to the COO. Depending on the company, you see them reporting to the people in marketing. What I don't think is the best move is for the CDO to report to the CIO. I don't buy that, because data should be seen as a business issue, not as a technical issue. As for my organization, I created several areas. I brought onto my team people responsible for data acquisition and people responsible for data operations, who brought the incoming data in and made sure the whole process was streamlined until the data was combined into information. I had people responsible for data quality. I had people responsible for performance improvement, people with Six Sigma and Lean backgrounds who were looking through the whole process, not just from the data perspective, but across all the functions involved. 
And I also had the data architecture organization, which was responsible for helping IT work together with the product team to define new solutions. At one time, our organization had 120 people, which, from what I have seen in the research here at MIT, made it one of the largest data organizations in the world. So we're here at the Information Quality Symposium. So talk a little bit about the concept of data quality, information quality, in this big data world. You've been in this business a while. How has the concept of data quality evolved as we're starting to talk about quote unquote big data, new data sources coming often from outside the enterprise, whether it's social media data or machine generated data, whatever the case may be? How has the concept of data quality evolved? And what does it look like today in a big data world? We have a saying with data: garbage in, garbage out. So if you don't have quality at every step where your data is processed in your organization, you will have a problem and you'll be hurting yourself down the line. So the term data quality, I think, should evolve to be part of what we call the framework of data science. Data science per se is how you can look at raw data and make that data bring value to you, to your organization, to your business. So in a sense, quality for me is one of the pillars of the data science framework. So that's really part of the job of a data scientist, to apply those data quality measures, it sounds like. That's correct. That's correct. And I was lucky to have a very competent team in place in my organization, people with a Six Sigma background, which helped a lot in creating those metrics. So any organization that wants to succeed should try to look at the Toyota Production System, at the Lean methodologies, and at Six Sigma, which became very popular because of General Electric, GE. 
So look at those things outside of the data area, because they really apply to data concepts. Right. So data science is more than just knowing statistics, it's knowing some of these business concepts and Six Sigma and the Lean model, if you will. So Mario, you talked yesterday to the CDO gathering, and unstructured data was your topic. We had Stuart Madnick on before. He gave us a three dimensional model, and one of the dimensions, he might have talked to you about this, was the kind of data: the traditional data, sales data, inventory data, versus what he called nouveau data, Twitter, which is essentially unstructured. And some people don't like that term. I'm comfortable with the term unstructured or semi-structured, whatever you want to call it. What was the discussion like? What were you talking to the CDOs about and what was the feedback? We had people there from financial services organizations, people from health care, and basically we were trying to look at where unstructured data makes sense, how you go about doing projects, and how you can succeed at that. And specifically, when you see a world like today's, where the US government is building near Utah a 2.5 billion dollar facility to monitor what's happening in terms of communication, social media and everything, they have a very large unstructured data problem. And the point that I brought up yesterday was that we should bring those guys here next year so they can teach us what they are doing, how they are creating the metadata, how they are using a dictionary, what data quality issues they are dealing with every day. We should not bring them here to discuss whether what they are doing is right or wrong. Forget about that, but let's talk about the technicalities of what they are doing, the methodologies they are applying. And yesterday we were able to figure out that not everybody is using the internal data that they have inside the organization. 
So if you want to go out there and get information from social media, from what the media is saying about your company, your brand, or get the brand sentiment that will affect your marketing campaigns, you should also look at the internal data that you are generating, from the emails that have been sent, from your contracts, data that is probably not in use at this point. So you have a lot of internal data that probably is not being used at this point. So think about the traditional issues around data quality in the world of what some people, again, call structured data, and think about your role at Boa Vista, a part of a company that does credit reporting. So obviously it's very important to have data quality there. And then compare that to Twitter data, where a false positive is no big deal. On the one hand you've got things like financial services and healthcare, where it is critical, and then you've got this influx of new data. How do you see the change in the structure of the data affecting the data quality imperative? Yeah, depending on the business that you are in, granularity plays a large role. And in our case, it was very granular. We had to take quality down to every detail for a customer or for a company, like an address, phone, or credit history, or chase history that a company or individual might have in the past. So we had to be really, really careful. And when we applied data quality methodologies, we had to be even more paranoid in dealing with that, because any error could hurt our business tremendously. So quality played a really, really strong role every day in our business. Well, so can you take that credit score? Okay, great, I got 780, I'm good; well below that, I'm bad. Could you apply that idea from a traditional business, like credit scoring, to social media? I mean, come up with some kind of social media score, a social graph, a propensity to swear in public based on social data. Do you see that or are those two separate worlds? 
It sounds like the latter. No, you're absolutely right. Take a company like Klout and what they do: you have your social media score. What they're doing is looking at how many of your tweets people are retweeting, or how many of your Facebook posts people are liking. In a sense, Klout is a social media credit score company. So I bet that the founder of Klout looked at the credit bureau industry for that. And in the credit bureau industry, we have to look outside to do our jobs better. And what I did as a CDO there was implement concepts that I took from manufacturing companies. What I told you about the Toyota Production System, Six Sigma from GE, and also the Lean methodologies, those are concepts that were applied first in manufacturing companies. And I used them to manage digital assets. Now, you're also chairing a session on new trends and directions in data science. What are some of the things that you hope to learn at that session? Well, I hope to learn a lot, because I'm bringing two individuals who are top-notch guys, from the conversations I have had over the last few months. I was very impressed with Matt, with Andrew. And I brought them here to this conference for them to share what they see as the future when we go beyond Hadoop and MapReduce. Hadoop and MapReduce are great technologies, but they do not solve some of the problems that we are seeing. They create a lot of problems. That's right. And we're going to be talking about how artificial intelligence makes sense for implementing a high end data quality initiative and where we're going beyond Hadoop and MapReduce. And those are the discussions that we're going to be having in that session. So you're saying using analytics algorithms in a machine to improve data quality, and then the obvious follow on question there is, the bromide in big data is you can't take humans out; humans are the last mile. 
So do you buy that, or can machines do most of it from a data quality standpoint? Machines can automate, but we're not going to be able to take out the human side of that, okay? And a lot of the tasks that I do today, that my team does, and that all data professionals have to do are still not very automated. So there are a lot of things that you can automate to make their jobs more brain oriented and less labor oriented. Because currently it doesn't scale well. That's correct. It's way too expensive to scale. So in your vision, to the extent that you can automate, let's say, whatever, 80%, 90% of those tasks, are you then at the point where you can scale much more seamlessly? We can scale and we'll be able to look at new data sources. We'll be able to create new solutions that are not in this market yet. For example, when I joined Boa Vista and I understood what a credit bureau company does, I saw that we're not just about the credit bureau. We're about human behavior. We have information about human beings and how they interact with each other. So when you go to a store and buy a cell phone, for example, and the cell operator checks your credit history, he's looking at a lot of things that happened in your past life that are part of your human behavior. So credit score companies have to look at that, have to look at new sources of information that can improve or lower your credit score throughout your life. Yes. You're talking about new sources of business value being created. How do you think that'll change the spending profile? I mean, because let's face it, this entire industry has been under huge budget pressures for at least a decade. Oh, I've got to spend more on my data warehouse and it takes me forever to get another channel into my system. Do you think what you're describing will change the spending profile and the investment profile? I think it will change a lot. Let me tell you from my personal experience. 
I moved here to the United States in December. I couldn't buy a car here because I didn't have any credit history in the United States, even though I carried 15 years of good credit history. You're a great customer. You're a perfect prospect, but you can't buy a car. I can't buy a car here. Can you believe that? So it took me a while until I could get a car, because the credit score companies here in the United States were not using information from other countries. So in some sense, there are a lot of inefficiencies in our industry, which is very big data-oriented. And if you look at other industries, you see those kinds of inefficiencies too, and that's where the CDO plays a large part. So we're getting close on time, but I wonder if you could give one piece of advice to other CDOs out there who are just getting started in their job. What piece of advice would you give them to be successful in their careers? Not necessarily technology related, but career advice for a CDO. What does it take? What's a good piece of advice in terms of surviving out there in this relatively new role that may be looked on with a little bit of suspicion from other parts of the organization? Yeah, not every data person will be a CDO down the road, in the sense that not every soldier will be a general down the road. So a lot of skills are needed if you want to be a CDO. You have to be good at communication, you have to be good at data, you have to be good at statistics and mathematics, you have to be good at dealing with people, and you have to have a great understanding of technology as well. So you have to have a more complete profile to succeed. So it's really a blend of skills, it sounds like. That's right. Fantastic. Well Mario, thank you so much for coming on theCUBE. We appreciate having you. Great conversation. We hope you'll join us again next time our paths cross. Excellent. So yeah, good luck with the panel, and it was a pleasure meeting you. 
So we'll see you soon. All right, keep right there everybody. This is Dave Vellante with Jeff Kelly. This is theCUBE. We'll be right back with our next guest right after this.