Live from New York, extracting the signal from the noise. It's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Jeff Frick and George Gilbert.

Hey, welcome back, everybody. We are live in Midtown Manhattan at Spark Summit East 2016. Spark is the latest, newest, greatest thing in big data, so we wanted to get out. It's our first Spark Summit; actually, we did a little fly-by last year. So we're excited to really be where the epicenter of big data shows is on the East Coast, the Hilton Midtown. George Gilbert from Wikibon is joining me for this next segment. Good to see you, George.

Good to be here. And we have one of the new and giant entrants to the big data stage, SAP, in the form of Ken Tsai, who is VP of Data Management and Cloud Platform. So Ken, tell us a little bit about how SAP was among the first, if not the first major vendor, in the in-memory transaction processing database space, and then how the product strategy evolved to handle big data, especially what people now collect in Hadoop.

Right, and that's a great question and a great setting. I think the commentary will illustrate the need for companies that focus specifically on big data, and for companies like SAP, which has been doing real-time enterprise for quite some time, to work much more cohesively together. As most people know, SAP has been in the real-time business solution business for the last 40 years; in R/3, the R stands for real-time. Over the years we shifted through different data computing platforms, all the way from mainframe to client-server, and in the last 15 years into in-memory computing. I would say that with the entrance of SAP HANA into the marketplace five years ago, we really defined and shepherded in-memory computing into enterprise applications and enterprise computing workloads.
And when HANA was created — to dispel some of the prevailing notions — it was about more than just accelerating analytics by putting data in memory. You can obviously do that; frankly, SAP was already doing that when we were innovating in this space 15 years ago, when we figured out a way to accelerate reporting by caching data directly in a data cache. What HANA aimed to do was completely simplify the modern application architecture, so that some of the redundancies we have to go through as application developers — such as the need to build materialized aggregates — can be completely eliminated. Now the system becomes interactive, and we can deliver new capabilities to the end user, built into ERP, that weren't possible before.

Let me just interrupt you for a second. For our viewers who may not know the SAP schema, give us a refresher on the database question: materialized aggregates and how much of the underpinning of SAP they form.

This is not just true within ERP; it's typical of any information processing system an enterprise needs to run. I used this example in the keynote this morning: just posting an invoice. Whether you do it in ERP or not, there is an interaction: you have to post a journal entry, typically in an ERP system, and there is an aggregation and summarization of that invoice amount into the different factors and dimensions the company needs to report on. There's a summation and a total, which probably needs to be pre-calculated because, as a financial accountant, you want to know what it is. There are indexes and different insertions — a lot of backend processing that needs to happen to make the system interactive. SAP ERP was built that way; it was architected that way because the latency involved in pulling the data and computing it on the fly was too long. And then, enter HANA.
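The contrast Ken describes — maintaining pre-computed materialized aggregates on every posting versus keeping only raw entries and summarizing on demand — can be sketched in a toy ledger. This is purely illustrative (all names are invented, not SAP code):

```python
from collections import defaultdict

# Classic design: every posted invoice also updates a pre-computed
# (materialized) total per dimension, because querying raw rows was too slow.
class LedgerWithAggregates:
    def __init__(self):
        self.invoices = []                 # raw journal entries
        self.totals = defaultdict(float)   # materialized aggregate

    def post_invoice(self, cost_center, amount):
        self.invoices.append((cost_center, amount))
        self.totals[cost_center] += amount  # extra write on every posting

# In-memory design: keep only the rawest form and summarize at query time.
class LedgerRawOnly:
    def __init__(self):
        self.invoices = []

    def post_invoice(self, cost_center, amount):
        self.invoices.append((cost_center, amount))

    def total(self, cost_center):
        # Computed dynamically on demand — feasible when the data is in memory.
        return sum(a for c, a in self.invoices if c == cost_center)
```

Both designs answer the same query; the second removes the redundant write-side bookkeeping (and the extra tables it implies), which is the simplification HANA is credited with here.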
With HANA, in the example I showed, we were able to reduce the number of tables an invoice touches from 15 all the way down to four, because those four just keep the rawest form of the information. You then leverage the power of in-memory computing like HANA to dynamically create the summarization views you want to see. That's how we shepherded in what I call the age of real-time computing — the true real-time.

So as you said, R/2 and R/3 were meant to be real-time system two and real-time system three. Beyond just summarizing an invoice, tell us what's possible now that you don't have to pre-compute things — when you make a transaction and can ask questions interactively, what's possible?

One of the holy grails a business owner has always wanted is simulation: the ability to ask multiple what-ifs and then settle on the most optimal scenario. Take a simulation of the forecast. What is my revenue forecast — what is the projected revenue when I close my books right now, based on real-time data? Can I have multiple versions of that forecast based on the way I want to organize my company or my data structures? You can take that example and multiply it across different lines of business. That's really the possibility of what a real-time business should be able to deliver, and we had been hampered by an inability to tap into what the computing infrastructure and the software platform could deliver. HANA solved that; we simplified it and changed the game, and that's why HANA has been so successful in the marketplace for five years. But to your original question: why didn't we stop there? Why didn't we just use HANA for big data? We do.
HANA is a multi-model data processing engine; it can support structured data, natural language processing, all these different things. But the reality customers face today is that big data is everywhere. It's not one large cluster; gone are the days of putting everything in one centralized place where I understand all of it and produce the results, or the big data signal I can take action on. Now, with the advent of IoT and of digital business — where everything a company offers can be a service; we call it everything as a service — everything involved in that process is digitized in some data form. You have to track it, and you need to understand it. That's why big data is literally everywhere, with multiple ways of managing these large data clusters. So from a data processing point of view, what needs to happen is that we help the business establish a very easy-to-use, end-to-end data processing framework across these multiple computing landscapes — whether you're running an enterprise computing workload like S/4HANA ERP or a massively distributed computing workload, like a data lake where you want to do some data exploration. You should have a framework that connects them together, and that's really why SAP HANA Vora was born and why we launched the product. We saw the need in this area and wanted to actively work with the community to bridge these two worlds: the distributed computing world and the enterprise computing world.

Ken, maybe you can address this. As a big incumbent, you guys have ERPs all over the place. You've got a lot of the core data that everybody wants access to. On the other hand, as an incumbent, you have that legacy.
Talk about the change in strategy inside SAP to deliver value to your customers as the sources of data and input have gone beyond what was really a closed ERP system, now that customers want to aggregate all kinds of data that falls outside what was traditionally held in a controlled system.

One of the statistics we cite internally at SAP is that 74% of the world's transaction revenue runs through an SAP system in some way. That means a lot of transactions that touch money — revenue, cost, anything. There's a lot of value in exposing that into other workflows, whether a data science exploratory workflow or other ways to consume it. And in this day and age, where businesses are being digitized and every element of a business process should be available as a service, SAP is in a position to offer that more broadly to our entire customer base and their business networks. That reflects your original question about the change of strategy: it is absolutely critical, for SAP to help customers be successful, that we expose the ERP system — now running completely on HANA — so that all the information around core business processes can be exposed as service elements someone can consume and interact with.

So let's talk about all the data that might surround the core business system, beyond SAP, and how Vora helps access that data and tie it back to Spark, where we have this new unified compute engine for processing data in any and all frameworks. How does that all fit together?

Let me take one of our customer examples; I think it illustrates this need very clearly. One of the use cases we're currently working on for SAP HANA Vora is in the airline industry, where they've invested in all this telemetry instrumentation directly in the plane itself.
The whole idea was reading that telemetry, understanding the insights in it and the thresholds, so you can determine a preventive maintenance schedule. But what the customer actually wanted was to move way beyond that, because that only gives you a signal about what needs to be done. What has a direct financial cost is whether you can take that early signal and put it into what we call the MRO process in the ERP — maintenance, repair and overhaul — because you have to line up the experts who can maintain the airplane, line up the spare parts, line up the equipment; all of that takes a scheduling process. Now you're taking a big data insight and making it very actionable in the process itself. And to the airlines there's a very direct financial impact, because to them every hour of delay represents $10,000 of cost. So now you see the value of these big data insights move: rather than staying in one silo, they're embedded into the core business process the company runs, and you get a very direct benefit.

I was going to say, with the whole Internet of Things, you guys must be so excited to have this completely different set of data inputs, tangential to the core business applications, that can now drive business behavior and business decisions and tie directly back to financial impact.

Yes, and I think we're definitely in the golden age of data, where data is the currency of everything. Data doesn't come only from transaction systems anymore; it can come from products, from users interacting with the product, from sensory information in what we sell directly to customers, from customer and partner feedback, even potentially from social media. Understanding and tapping into the richness of that, and embedding it into the business processes that we eventually want to deliver.
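The telemetry-to-MRO flow in the airline example above can be sketched as a small pipeline. Everything here is a hedged, invented stand-in (thresholds, tail numbers, task names are assumptions); only the $10,000-per-hour delay figure comes from the interview:

```python
ENGINE_TEMP_LIMIT = 900.0      # assumed engineering threshold (illustrative)
DELAY_COST_PER_HOUR = 10_000   # cost of delay quoted in the interview

def detect_alerts(readings):
    """Return tail numbers whose peak engine temperature breached the limit."""
    worst = {}
    for r in readings:
        worst[r["tail"]] = max(worst.get(r["tail"], 0.0), r["engine_temp"])
    return [tail for tail, t in worst.items() if t > ENGINE_TEMP_LIMIT]

def create_mro_order(tail, est_hours_down):
    """Stand-in for posting a work order into the ERP's MRO process."""
    return {
        "tail": tail,
        "tasks": ["schedule technician", "reserve spare parts", "book hangar"],
        "delay_cost_estimate": est_hours_down * DELAY_COST_PER_HOUR,
    }

readings = [
    {"tail": "D-ABCD", "engine_temp": 912.5},
    {"tail": "D-EFGH", "engine_temp": 740.0},
]
orders = [create_mro_order(t, est_hours_down=3) for t in detect_alerts(readings)]
```

The point of the pattern is the second step: the big data signal doesn't stay in an analytics silo but directly triggers scheduling actions in the business process.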
That really is the vision we're trying to deliver.

How many integration points are there — done, and on the roadmap — for things like taking predictive maintenance and embedding it in the MRO workflow? That's an amazing example.

You're asking about the specific use cases where we can apply this type of technology: bringing ERP data together with this amazing surrounding contextual information to enrich the entire decision-making process. Our hope is to help business owners make better decisions and also, potentially, to automate the process underneath, so decisions can be made automatically. We are definitely in the process of defining a lot of these use cases, and as you know, SAP has very deep industry knowledge across 25 industries and lines of business. So in literally every industry we have a concerted effort to define how we bring the value of this completely distributed big data — collected from IoT or any other source, married with the corporate and business data — and then you glean the right insight and automate the business process.

Do you have a roadmap — like, we have a dozen now, we're going to have two dozen more over the next year?

That's a good question. Right now we don't, but maybe it's something we have to think through internally. What we have done on the roadmap side is more about delivering a horizontal capability roadmap, because we want to aggressively build out all the capabilities in SAP HANA Vora to fulfill the vision of the complete digital enterprise for our customers; that's very, very critical. Right now the scenarios where we see Vora applied are fairly IT-specific, and we're riding on Apache Spark and working in conjunction with HANA.
Tell us about the Spark tie-in. Is Spark the preferred execution engine on top of Vora?

Due to the adoption and maturity of Apache Spark — as you can see at today's event — it is definitely the execution engine that Vora leverages to do distributed computing. Now, who knows what the future holds; this is a very rapidly evolving area.

So Vora talks to Spark, which talks to Hadoop? Is that how it works?

As a logical model, that is the way to think of it. But Vora itself is an in-memory computing engine; it took the best of what SAP HANA has to offer. What we wanted to do was very simple. The promise of Vora was to deliver this whole HANA-like experience, but on big data sets, and also to give it really strong, deep business semantic understanding, so that when you harvest data that may be unstructured or sensory information, there is a rationale for a business audience to take action on it. That's really why Vora exists. Now, there will always be foundational component technologies like Apache Spark, which has been really great at solving a lot of distributed query processing workloads. We'll continue to ride that wave and continue to make it more robust. And just as Vora did in version one, we have been aggressively adding capabilities, as extensions, that the core business audience needs but that are missing in the Spark environment. That's why we plug into the Spark environment and try to make it more robust for enterprise workloads.

So you're in platform hardening mode before you get into the vertical applications?

Yeah. Okay. Well, that's great.
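The pattern Ken describes — a HANA-like experience over distributed data, enriched with business semantics — boils down to joining raw "Hadoop-side" records with curated "HANA-side" master data. This is a conceptual sketch in plain Python, not the actual Vora or Spark API (all names are invented):

```python
# Raw, distributed sensor/log records (stand-in for data in Hadoop/HDFS).
hadoop_events = [
    {"material_id": "M-100", "defects": 3},
    {"material_id": "M-200", "defects": 0},
]

# Curated ERP master data (stand-in for business semantics held in HANA).
hana_materials = {
    "M-100": {"name": "Turbine blade", "plant": "Hamburg"},
    "M-200": {"name": "Fan casing", "plant": "Toulouse"},
}

def enrich(events, master):
    """Join raw events to master data, yielding business-readable rows."""
    return [
        {**master[e["material_id"]], "defects": e["defects"]}
        for e in events
        if e["material_id"] in master
    ]

report = enrich(hadoop_events, hana_materials)
```

In the real product this federation ran as a Spark extension pushing queries down to Vora's in-memory engine; the sketch only shows why the join matters — raw IDs become rows a business audience can act on.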
Well, Ken, unfortunately we're out of time, but there's a lot of excitement happening at SAP: leveraging these new technologies, leveraging open source, kind of opening up — you guys have so much of the core information — with really a different way of looking at the world, as well as being able to bring in a whole new set of information to drive your engines.

Yeah, this is definitely an exciting time, and not just for SAP. We are certainly a very major participant, just because of the workloads and the solution sets we deliver to all our customers. But I'd like to say to the community and the people here: collectively, these are our joint customers. What we are solving jointly for our customers is a next-generation problem: how to build a foundational data processing framework that is massively distributed across multiple different computing clusters, and how to deliver insight to the customer so they can take business action. That will not change; it's up to us to figure out how to do it.

It's all about solutions, right? It's all about solving business problems. We talk about tech — George likes to talk about tech — but at the end of the day, are we solving business problems? So Ken, again, thanks for stopping by theCUBE. I'm Jeff Frick with George Gilbert. We are live in Midtown Manhattan at Spark Summit East. We'll be back with our next segment after this short break. Thanks for watching.