Live from the campus of MIT in Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now, here are your hosts, Stu Miniman and Paul Gillan.
We're back at the MIT Chief Data Officer and Information Quality Conference in Cambridge, Massachusetts. This is theCUBE. And joining us right now are Justin Litz, the Vice President of the Enterprise Data Management Team, and Enrique Pinto, the Director of the Global HANA Center of Excellence, both at SAP. Gentlemen, thanks for being with us today.
Thanks.
Thank you.
Justin, tell us, what does the Enterprise Data Management Team do at a company that makes primarily application software?
Sure. Primarily, we support the sales teams and their initiatives from a data perspective. So, specifically at SAP, we support the global customer operations, making sure that their processes run efficiently and effectively, that there's no breakage. So when we have opportunities or deals, we really make sure that the data is what it needs to be so that we have no process breakdowns or failures, making sure that the contracts can go through and the customers get the software they need to run their business.
And why are you here at this conference? Have you been at this conference before? I don't remember SAP sponsoring in the past, but why is this conference important to you?
Yeah, great question. The reason why it's important to us is we like to go out and explain the journey that we're on to the customers. We like to hear from our customers as well where they're at in their journey. So, info sharing is one reason for that. We're internal to SAP, where we actually have to implement the solutions that we have. So it's one way for us to share our journey as well as get feedback from our customers on where they're at in their journeys. Because a lot of this is really sharing best practices and how we evolve our own capabilities.
So we have been here before, not necessarily as a speaking engagement, but we've been here before to learn what is going on out there in the world of the chief data officers and what's really happening out there.
Enrique, one of the journeys we've been hearing about is that before, the application was where the data was really owned. And today it's really more the organization that owns the data. What are you hearing from customers? Where are they on that kind of CDO journey and how does that fit into your interactions?
Yeah, great question as well. I mean, it's true. And it's also something that, before we jump into the question, actually reflects what SAP has done internally in terms of portfolio strategy. And I'm not sure if you guys know, but SAP actually makes less money today from our classical ERP portfolio than from our data-driven portfolio. So SAP actually made the decision a while ago. I mean, it started with the BO acquisition in 2008, but we also made some internal investment in that regard, especially with the HANA development, exactly to attend to an existing demand from our customers where, exactly as you're saying, we saw this shift from a process-oriented kind of decision-making process more into a data-driven decision-making process, right? So companies and organizations, they are demanding more analytics, more data to sustain those kinds of decision-making processes. And they are organizing their teams around this as well. I mean, in the past, you saw BI managers or BI directors under the CIO, and today you're seeing more of the data offices being split from the technology team and being put in the middle between technology and business, exactly because the organizations are perceiving the importance of having this self-sufficient and independent organization to drive that kind of decision-making.
Yeah, we've talked some about the technology, but those organizational changes are pretty big.
How's that impacting your go-to-market? The CDO, is that a CEO thing? How's that impact the CIO? I mean, a lot of dynamics there. Maybe you can help us unpack that.
And I think it's completely true, and I think SAP is actually in a great position to compete in that market because we're different from the other technology companies. We did interact with the CIOs, but we also had a very strong business relationship due to our application background, right? And I think that actually puts SAP in a great position to be able to leverage our business background in this kind of conversation with the CDOs.
And maybe if I play off of that a little bit, when we started off in our journey, we were in the sales organization. So that was one of the interesting pieces, that we reported many years ago into the sales organization because they recognized the importance of really making sure that they could have a data-driven organization. So there were data managers within the business that really helped build out that competency. So our initial focus, even though we were a software company, wasn't to implement master data management. It was really making sure that we understood what the processes were to be efficient and effective. What were the critical data elements that we needed to manage and govern? Really partnering with our business units to make sure we understood what their business goals and objectives were, and then really being able to speak that business speak and drive our initiatives that way. So when we started off in the journey, it was with the chief operating officer. Over time we've now moved into the CIO organization, but that's one of the things we see within our own company as well as others. A lot of times the executive sponsorship could change.
So one day you might be in the operating area, another time you might be in the CIO area, but again, it's really kind of that evolution that we saw. For us, it definitely started very business-focused, business-process-focused, and then we really tried to build that credibility out and then get involved in the tools and the technologies to help out.
So you mentioned governance, that's certainly a strong theme at this conference. How do you implement a data governance model? What does the data governance model at SAP look like?
Yeah, great question. For us, it started with some outside-in learning, looking at what our other customers were doing when we started off in the journey, and the aha moment for us was when they said there's probably multiple fields that you need to govern or manage, but your systems could have 200, 300 fields. They're not all the same level of importance or significance. It's really identifying what are the most business-critical or most business-relevant. Those are the ones we needed to focus on, and that was that aha moment for us. It was like, oh, I get it, it's not all these fields in CRM that we have to manage and make sure are accurate. We really need to make sure we understand what we are going after, so if it's opportunity or sales, what are the fields that drive that process? So we were able to come away with, at that time, the top 20 fields to start off in our journey. From there, we've evolved to 75 fields that we really govern and manage, with a high degree of account standards: what are the rules, the practices? So for us, it was really focused on defining those fields, but it comes from understanding the business process which you support. So for us, it was sales. For other groups, in finance, it might be contracting, so those fields become important or relevant. So that's how we manage the data itself.
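Editor's note: the approach described above, governing a short list of business-critical fields instead of every field in the system, can be illustrated with a small sketch. This is not SAP's implementation. The field names (account_name, country, industry), the sample records, and both helper functions are hypothetical, and completeness is only one of several possible quality measures.

```python
from typing import Iterable, Mapping

# Hypothetical critical fields for a sales process.
# The interview does not name the actual governed fields.
CRITICAL_FIELDS = ("account_name", "country", "industry")


def is_filled(value: object) -> bool:
    """Treat None and blank strings as missing values."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def completeness_by_field(
    records: Iterable[Mapping[str, object]],
    fields: tuple = CRITICAL_FIELDS,
) -> dict:
    """Return the share of records (0.0 to 1.0) with each critical field filled."""
    rows = list(records)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_filled(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }


def records_needing_attention(
    records: Iterable[Mapping[str, object]],
    fields: tuple = CRITICAL_FIELDS,
) -> list:
    """Return the records a data steward would drill into: any critical field missing."""
    return [
        row for row in records
        if not all(is_filled(row.get(field)) for field in fields)
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"id": 1, "account_name": "Acme", "country": "US", "industry": "Retail"},
        {"id": 2, "account_name": "Globex", "country": "", "industry": "Energy"},
        {"id": 3, "account_name": "Initech", "country": "DE", "industry": None},
    ]
    for field, score in completeness_by_field(sample).items():
        print(f"{field}: {score:.0%}")
    print([row["id"] for row in records_needing_attention(sample)])
```

The per-field scores correspond loosely to the trend view on an executive dashboard, and the filtered list to the drill-down a data steward would use. Growing the governed set from 20 to 75 fields would only mean extending the tuple.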
From an organization perspective, we also have each line of business, and for us, our lines of business are like sales, marketing, finance. Each of them is responsible for a data domain. So when I was in sales, I was responsible for account or customer. Marketing, for instance, was responsible for the contacts. So that's how we do it, and we have a data council that meets on a monthly basis to interact where we have commonality or pain points or challenges ahead. So it's kind of twofold. Each line of business has a responsibility, and then we come together as a data council to share our experiences and our plans and roadmaps for what we're doing.
I want to ask about, Enrique, you have responsibility for the HANA Center of Excellence. HANA is really a groundbreaking product in many ways, combining the OLTP and the analytic processing in one engine. As we move toward more of a streaming world, and we're seeing with the evolution of Spark huge interest in streaming and real-time data right now, what impact does that have on information quality? How do you maintain quality in an environment where data is sort of streaming in from all over?
Right, that's a great question as well. So yeah, I mean, the Hadoop standard has actually broken some paradigms that we have been building for the last five, 10 years in data governance, right? So usually we curated the data model very well before data was sent to the databases or data warehouses. And so we had a very well-predefined model that the data had to fit into. Now it's different because, I mean, data is coming in, it's being ingested in whatever model it's generated in at the data sources, and the curation actually happens afterwards. So quality is, I mean, still a prerequisite for the analysis to happen, but it does not happen in the ETL process during data ingestion as it used to in the past, right?
So quality today is actually being built into the applications themselves. That's why you see a lot of it in the Hadoop kind of applications. I mean, because the applications built on top of Hadoop need to be aware of what kind of data models, data semantics they need to adhere to, to be able to really derive insights out of those data pools within the data lake, right? So we don't have very well-defined data models. So you need to be sure that the application understands that. But it's a great thing, what we have been able to do with HANA. And just a parenthesis here: SAP is not really in the data storing business. We are more into data processing. I mean, from our application development background, we don't really want to own data from a data storing perspective, but we do own, or we want to own, where data is being processed and transformed, because we believe we really understand what the business applications need the data for, based on our background and our business history. So we actually have built a lot of those quality controls and quality checks and also quality semantics within the HANA platform. And you can actually put HANA together with Hadoop in a virtualization environment. So we can enforce those data quality kind of control checks on top of a Hadoop-based environment without having to disrupt your Hadoop-based architecture.
And internally for us, when I first started, a lot of our data quality reporting was moved, right? We would do ETL, we would move the data from one place to another to do our data quality. We would actually have executive dashboards where we could monitor and trend the quality of those key fields that I mentioned before, but then drill down, as the data stewards, to really see where the problems were, and then we would enrich and get those records accurate. Recently, with the evolution of HANA, we're able to do it more real time, near real time. That's what we do today.
So we've actually evolved from having to move our data on a daily or weekly basis to actually reporting the data right within the system itself. So that's kind of how we take advantage of some of the technology, but we first started with making sure we understood which fields are important, getting the reports and the structures and the processes in place. But now we don't have to wait a week necessarily to get the data, we can see it in real time. So as we're fixing and improving the records, we can actually see it right within the dashboards themselves, leveraging HANA.
I'm wondering if you've got any customer examples of how you're helping customers kind of meet the mandates. Is it real time, is that a mandate you're hearing, or are there any other specific initiatives that you're involved in?
Yeah, maybe I'll start. For me specifically, yes. I mean, I think it all depends on the customers. We get pulled in a lot to talk about the journey that we're on and really try to help them understand how they begin. So it all depends. Many customers are at different levels in their journey. Some are just starting off, some are more evolved. So part of the dialogue we get involved with is really helping them understand how we got started, where we're at, and then put that wrapper around it to help them understand and process it. We then work collaboratively with our sales teams and our product teams to really help do that. And that's maybe where you could add some.
And it's a great reference. They have been to our event at Sapphire. They are also on YouTube in our channel. So I can mention their name. So Walmart actually used HANA on top of their existing data warehouse in the same way I was explaining on Hadoop. Without disrupting their existing data management processes, without replacing their existing architecture, we were able to add value on top of it by, again, owning the data processing engine, not the data storing engine.
So with HANA on top of their data warehouse, Walmart is actually able to better utilize their existing data warehouse for reporting loads, and for more advanced analytics kind of use cases, they're using HANA to support those advanced calculations and mathematical algorithms.
What are your customers asking you to do on top of Hadoop, on top of these unstructured big data platforms? Are they asking for new applications? Do you have different support in your existing applications?
Yeah, applications definitely. Let me start. Applications definitely, especially because SAP has this background, right? So especially in new fields like Internet of Things, for example. So IoT is a field where you actually see a lot of those kinds of demands. But especially today, what I hear more is data quality and data governance. Because, I mean, again, they have been able to use Hadoop as a store-everything kind of platform. They have not really been able to curate what they have put there. I mean, they're not really sure what the semantics of the data are. They say they're going to use this platform to be able to derive a single view of the customer. But they're not really sure what kind of information about the customer they have inside their Hadoop data lake. So those are the usual demands I hear from our customers: data governance and data quality for the data they're storing in Hadoop.
And for us, to add to that, we really are pushing kind of that data as a service, really more of that data virtualization. So a simple example we have is white spacing, right? So we can meet with someone in the field where they have done an effective campaign selling certain products, let's say HANA, to a certain industry. So they want to get more names or more information into the system. Today, it may take a few days, a few weeks to get that in. We want to make that a click of a button, and make it more virtualized.
So we're looking at partnering with the HANA team and others, where it's really more of a click of a button, where it's more of a hub, and being able to get that information in. So that data virtualization, for us, is what we see as making it more seamless to get other data sources, to really be able to aggregate and enhance what we're doing as a business internally.
We've got to wrap it up. I want to thank you both for joining us, Justin Litz and Henrique Pinto of SAP. Thanks for talking big data with us.
Thank you.
This is theCUBE, at MIT's Chief Data Officer and Information Quality Symposium. We'll be back in a moment.