From Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and George Gilbert.

Well, hi everybody, John Walls here on theCUBE, along with George Gilbert. We continue our coverage of Spark Summit 2016. We are live in San Francisco, 3,500 attendees strong this year at the Spark Summit, by far the largest attendance they've ever had, and that really, I think, reflects what's happening in this community. With us now is Amit Sattor, Senior Director of Solutions and Marketing at SAP. Amit, thank you for being with us here on theCUBE. Nice to have you back again.

Thanks for having me.

A repeat performance, right? You were with George just a couple of months ago.

Right.

Let's kick off with the 30,000-foot view here: Spark, what you're doing with it, and how you're integrating it on the HANA platform.

Yeah, absolutely. So Spark is a great framework for processing data, and we see the value of it. What we hear from a lot of our business users is that they're collecting a lot of data, and they want to make sure they process that data efficiently before it's ready for consumption within the enterprise systems and enterprise processes. So with our new product, HANA Vora, we first structure the unstructured data. We want to make sure that data is ready at the same service level as in-memory computing, so we can bring it into HANA at that service level and it can be consumed by business applications and business processes, efficiently providing value to the business audience.

So what are your customers telling you, then, in terms of what they ultimately want to extract out of all this? I mean, I assume they realize the treasure trove of data they have at their disposal, and you're helping them see that.
But ultimately, there has to be a value prop here for them, right? I mean, what are you hearing from them?

So we hear constantly from our customers that all of them have Hadoop projects; it's in every department. But they're not really sure how they're going to use that data for business value. And there are two challenges there: the size of the data is pretty big, and then what kind of processing is needed so that it can be brought together onto the enterprise side. That's the main challenge they're facing: on the business side, to show how they're going to rationalize the investment and see the ROI, but also how they're going to be able to do things they have not been able to do before. They do understand that Hadoop is a big part of the environment, and they have a lot of data they've collected into these systems. But their challenge right now is how to effectively process it, get value out of it, and flow it into the day-to-day enterprise business processes so it's not disruptive to what they're doing today.

And can you get them there? I mean, you talked about Hadoop. Can you get them there gradually, or is it sometimes kicking and screaming into the new systems, the new services? Or do people want to hang on to some familiarity, some comfort? I think we're all kind of that way, right? We're not agents of change; we go slowly. But you're trying to introduce some new concepts and new capabilities at the same time.

Absolutely. So our main goal here is to bridge these two worlds, to bring the enterprise and big data together so that it's seamless. At SAP, we have always prided ourselves on being non-disruptive and providing a simple path for customers to go step by step toward whatever their end goal is. And right now in the Hadoop market, they are the ones who want to get there as fast as they can. The challenge they have is how? And what do they need?
So our goal here is to simplify that experience as much as possible by providing the right solutions as well as the right processes to make it happen. So tomorrow, whatever they're doing, whatever screen they're using, we want to make sure that this big data is seamlessly part of that experience. Maybe you're a product manager trying to forecast demand, and historically you have been forecasting using past orders. Tomorrow you want to forecast using social data in addition to the historical data, together in the same screen you have been using. So from a day-to-day perspective, it is not causing you more work, but you're getting more benefit out of it.

I want to key on that last thing you were talking about. I understand that keeping the same screen means upgrading the app without retraining the user. But there has to be a lot of work under the covers to take a consumer packaged goods app and give it a whole new forecasting and replenishment algorithm based on big data. In which parts of the Business Suite, the old R/3 derivative, and then S/4HANA, has that been thought through and built in?

Absolutely. So S/4HANA, as you know, is architected on in-memory computing. And right now, with Spark, Hadoop is coming to in-memory computing. With Vora, we are trying to bring that same service level to enterprise use. So we are building an architecture where people can keep the data in Hadoop where it is, be able to process it, refine it, explore it, and then eventually, when that data is consumable by the application, we already have the in-memory platform under S/4HANA. So right now we have two stacks, the HANA in-memory stack for our solutions, as well as the HANA Vora in-memory solution using Spark, and both can provide the same service levels and the same experience.
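The architecture just described, keep raw data in Hadoop, refine it with Spark, and move only consumable results into the in-memory platform, can be sketched with a plain-Python stand-in for the Spark job. The event shape, names, and aggregation here are invented for illustration; a real implementation would use Spark DataFrames and a HANA connector.

```python
from collections import defaultdict

# Hypothetical raw event log as it might sit in Hadoop: one row per sensor reading.
raw_events = [
    {"meter": "M1", "kwh": 1.2}, {"meter": "M1", "kwh": 0.9},
    {"meter": "M2", "kwh": 2.1}, {"meter": "M2", "kwh": 1.8},
]

def precompute(events):
    """Spark-style pre-aggregation: reduce many raw readings to one
    compact, consumable row per meter before loading into HANA."""
    totals = defaultdict(float)
    for event in events:
        totals[event["meter"]] += event["kwh"]
    return [{"meter": m, "total_kwh": round(t, 2)} for m, t in sorted(totals.items())]

# Only this small pre-computed result crosses into the in-memory platform.
hana_rows = precompute(raw_events)
print(hana_rows)
```

The point of the pattern is that the expensive scan over raw data happens once, on the Hadoop/Spark side, and the enterprise application only ever touches the refined rows.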
But for the longest time, when SAP talked about big data and in-memory, they'd say, oh, we can read meters every 15 seconds all across Europe, to take an exaggerated example. And then they'd also say, we can calculate how many receivables are overdue in 20 minutes instead of eight hours or two days or whatever it is. But that means the app has to be rewritten to know it can take advantage of more data. Which parts of the old app, the old Business Suite, and which parts of the new one are designed to leverage not just in-memory speed, but the big volumes of data that give you more precision?

Right, so that's a big conversation, because a lot of these applications have so many different components. And I understand where people are coming from, because so far, with the data that was in Hadoop, a lot of the conversation from SAP around S/4HANA and HANA was about integrating that data and bringing it into the HANA platform. Over the years, working with a lot of our customers, we realized that the data is growing so big that we first need to pre-process it before bringing it into HANA. So our approach now is to pre-process it, pre-compute it, and then bring it into HANA; it's a learn-and-deploy type of model. S/4HANA applications are being rewritten, but there is a distinction between changing the existing process, which we don't want to disrupt people with, versus changing the experience. So we are rewriting those applications to take advantage of in-memory, massively parallel computation, but not necessarily changing the way the business likes to run its best-practice processes.

But just to be clear, there's a distinction between the experience and the business process. So doing a better job forecasting is the process. You want to keep the experience the same, so that the user's screen doesn't really change, or change much.
Actually, it's the...

The other way?

Yes. The processes that have been designed and built across partners and customers need to stay the same, because there are regulations and other customer or partner engagements involved. But the experience provided to them, the type of data they can consume, the type of interaction they can have with the data, those all change. And that changes process by process. Take MRO purchasing: the bill of materials is a perfect example of data that explodes. With our experience, we have been able to optimize that in memory, because we understand that when you explode a bill of materials, which parts explode into subparts, it's almost like a graph that gets expanded. And how that graph gets accessed determines how you load the data in memory. So we know those computing patterns, and we are now building them into the applications to make sure the user experience is a lot better, in terms of interaction as well as engagement.

But then, to drill down on the process: is the business process, like the forecasting process, kept stable for regulatory reasons, or because so many more constituents are involved than just one end user? So that if you were to change it, you would break a lot of other things, including other processes that depend on it?

Oh no, we have very experienced teams who have developed these applications, and we have a lot of large customers who work with us, so we make sure that nothing breaks and it's a non-disruptive process. But what we want to do to help with that is take the data into Hadoop and Spark and do the processing beforehand, so that it can be easily accessed and joined with the rest of the enterprise data. So it is about making the data that is currently being accessed better. That's the focus of HANA Vora.
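The bill-of-materials explosion mentioned above is essentially a graph traversal: each assembly expands into subparts, each with its own quantity, recursively down to leaf parts. A minimal sketch in plain Python, with the `bom` structure and part names invented for illustration (SAP's in-memory implementation is of course far more involved):

```python
from collections import defaultdict

# Hypothetical BOM: each assembly maps to (subpart, quantity-per-parent) pairs.
bom = {
    "bike":  [("frame", 1), ("wheel", 2)],
    "wheel": [("rim", 1), ("spoke", 32), ("tire", 1)],
    "frame": [("tube", 4)],
}

def explode(part, qty=1, totals=None):
    """Recursively expand a part into total leaf-part quantities."""
    if totals is None:
        totals = defaultdict(int)
    children = bom.get(part)
    if not children:                 # leaf part: accumulate its total
        totals[part] += qty
        return totals
    for child, per_parent in children:
        explode(child, qty * per_parent, totals)
    return totals

print(dict(explode("bike")))
# one bike explodes into 4 tubes, 2 rims, 64 spokes, 2 tires
```

The access pattern of exactly this kind of traversal is what determines how the data should be laid out in memory, which is the point being made in the interview.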
That's the product we are releasing to the market.

So let's talk about some of the customers. You mentioned customers. Take social: people are obviously communicating via Facebook and Twitter, and you've got people looking to market from that information. So what are you doing with those social platforms, using Spark, to extract the data that can be useful for some of your clients?

So we see about three usage patterns. The first one is IoT-related, Internet of Things, where, for example, you have an aircraft with a lot of parts, and most of those parts have sensors. We can do predictive maintenance by collecting data from those sensors and understanding the probability of a part failure. But the parts list and the maintenance schedule are in the ERP system. So knowing that a sensor predicts a failure on a part is not good enough unless you can figure out the order lead time and the maintenance schedule you can fit it into. We want to make sure that type of use case is fulfilled for that industry.

The second type of use case: when most companies launch products, they have a launch on social media. But once the launch is done, it's very difficult to understand how to take all that traffic and convert it into transactions and business. So we want to take the data, process it, structure it, combine it with your customer data, and figure out what kind of offers you can make to capture the interest customers are showing.

The third use case, which we now see more commonly, is in the research and development environment. We had a lot of research data in the past that wasn't processed efficiently enough to figure out which research techniques are most useful for building good products.
And right now, with big data and exploration abilities, we can analyze that research data better, and make the development and the use of the research much better in building new products. We see that happening in manufacturing, in oil and gas, in a lot of industries: research data that we used to keep as just log data is now getting used more and more to optimize the research itself and to build new innovations.

There just seems to be kind of a conflict there, in a way, with R&D. Because I'm thinking, if there's any sector of a business, or a vertical, that is going to be really data-centered, it's R&D. And yet you're saying there were gaps, or maybe just wasted energy, wasted material, that wasn't being harvested or evaluated effectively. And Spark has allowed you to rework that and provide a much more robust series of data points for companies. Is that right?

Yeah, absolutely. I think a lot of companies are always concerned about investing in research, because the ROI on that research is not very significant, or many times not easily demonstrable, right? So here, what we are doing is providing better tools, not only with Spark, to take that data and provide different styles of processing so they can get different insights. Because the value of the Spark framework is that you can plug in different modules on top of it to enhance it. And that's what we are doing for enterprise users: taking the existing framework and adding more value to it, because there is a variety of data and a variety of analysis scenarios, and everybody, from their experience with customers, needs to enrich the environment Spark provides with additional processing and use cases.
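The first usage pattern described earlier, predictive maintenance on aircraft parts, reduces to a join: sensor-derived failure risk on one side, the ERP parts list and maintenance schedule on the other. A toy sketch in plain Python, with all part numbers, thresholds, and lead times invented for illustration:

```python
# Hypothetical sensor-side output: predicted failure probability per part.
failure_risk = {"P-100": 0.92, "P-200": 0.15, "P-300": 0.71}

# Hypothetical ERP side: order lead time and days until the next scheduled check.
erp = {
    "P-100": {"lead_time_days": 30, "next_check_in_days": 45},
    "P-200": {"lead_time_days": 10, "next_check_in_days": 5},
    "P-300": {"lead_time_days": 90, "next_check_in_days": 60},
}

def maintenance_actions(risk, erp, threshold=0.5):
    """Join risk scores with ERP data. For high-risk parts, check whether a
    replacement can arrive before the next scheduled check ('order now') or
    the order has to be expedited."""
    actions = []
    for part, probability in risk.items():
        if probability < threshold:
            continue                       # low risk: leave on normal schedule
        schedule = erp[part]
        in_time = schedule["lead_time_days"] < schedule["next_check_in_days"]
        actions.append((part, probability, "order now" if in_time else "expedite"))
    return sorted(actions, key=lambda a: -a[1])   # riskiest parts first

for part, probability, action in maintenance_actions(failure_risk, erp):
    print(part, probability, action)
```

This is the point made in the interview: the risk score alone is not actionable; it only becomes useful once it is joined with the lead times and schedules that live in the ERP system.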
HANA is this unusual database in that it handles transactions and analytics, and it has very rich analytics: predictive modeling and a whole bunch of other things. Might we someday see Spark as an analytic engine within HANA, the way we're starting to see with some of the new SQL DBMS vendors?

You never know what happens in the future; it's a technology that keeps evolving. Right now, as you see with HANA Vora, we have contributed code to open source. So we understand the value of the open source community and what it provides, and all the innovations we have with Vora are available in the open source community for them to take advantage of. But yes, I think predictive analytics is a big area, machine learning is a big area, and depending on where the data is and what type of exploration we want to do, we will provide engines on both platforms, whether it's through Vora or through the HANA platform. What we see right now is that both being in-memory computing platforms gives us a good framework, across all the data sets, for building new applications and new analytics.

It certainly is, I think, pretty good proof that the open source model is alive and flourishing. In fact, we have 3,500 people here already testifying to that, and obviously what you're doing at SAP is also a good testimony to that. Thank you for being with us. We appreciate the time on theCUBE and look forward to the next time down the road.

Sure, thank you. Thanks for having me here.

You bet. All right, Spark Summit 2016 coverage continues here on theCUBE in San Francisco.