Live from New York, extracting the signal from the noise. It's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Jeff Frick and George Gilbert.

Hey, welcome back everybody. You are watching theCUBE live from Midtown Manhattan at Spark Summit East. We're excited to be here. Spark is the latest, greatest, coolest thing in data science and big data, so we had to come out, get the smartest people we could find, extract the signal from the noise, get their insight and share it with you, our audience. So we're really excited to be joined by Peter Lee, who we just saw. It seems like only yesterday, man. Yeah, only yesterday. CEO of RapidMiner; we were at your event, RapidMiner Wisdom. Welcome back.

Great, great to see you again, Jeff.

Absolutely. So before we jump in, give us a quick update. I don't know if you have some good updates from the last... what has it been, weeks, I think, since we last saw you?

We always have great updates. First of all, not injured.

That's right. That's right, the kickboxing. Gotta watch out for that. A little bloody the last time, so getting better, slightly better, or hurting in other places we can't see.

We don't want to go there.

Right, that's right.

No, so I'd say we're tremendously excited. There's been rapid adoption of the RapidMiner 7 release. It's a release that has a lot of meaning in the Spark community: we now have the capability to execute, or push down, something like 1,000 data prep methods and 250 machine learning models into Spark. So there's some great stuff on the product side. And Gartner has just released its Magic Quadrant for advanced analytics and predictive analytics platforms. We're one of only three key commercial offerings in the Leaders quadrant, and we've moved sharply ahead of our legacy incumbent competition. So some great stuff since then. Thanks for asking.

Yeah, congratulations. Let's jump in a little bit, because we're here at Spark now, we're not at Wisdom. We talked a little bit about it there, but where is really the key tie between RapidMiner and Spark, and how do you see Spark changing the game for RapidMiner?

Yeah, a couple of key messages. I'd say thematically, we couldn't be more excited about Spark. Spark delivers something like a 100x performance improvement over alternative approaches. That's tremendously exciting if you're a data scientist, a citizen data scientist, or a business owner trying to perform really compute-intensive machine learning and predictive analytics modeling. So that's just tremendous. I'd say a second key area for us, well beyond the performance, and of course we take advantage of that, and I'm sure we'll get into some discussion of what we take advantage of in terms of support, is that RapidMiner really expands the universe of people who can take advantage of this transformational paradigm shift in technology. Spark today is still, as I'm sure we can see here at this summit, a very developer-centric effort. It's a bit lower down in the weeds, certainly relative to where a lot of the customer conversations I'm having take place.

Lines of code in the keynotes, that's really all you need to know.

Right, right, a bit deeper than where I am for sure.

So if you think about those lines of code in the keynote, RapidMiner is a code-optional platform. For deep coders, we have native support of R and Python scripts, which we can execute in Spark.
So SparkR, PySpark, that's fantastic. If you are not really developer-centric, you can still use RapidMiner to exploit all of the capabilities of Spark, but through a visual interface. So we're expanding from data scientists to the citizen data scientists who really do want to take advantage of all that Spark has to offer.

So let me try and unpack a couple of those things, because they're very significant. R traditionally has been a very popular open source statistical programming language, but for the most part, as far as I understand, only really Teradata and maybe Oracle, sort of on their own, made it distributed. So SparkR, I assume, is an implementation that allows you to program in R as if you were on a single machine, and it works across the cluster, scales out.

That's right, that's right. So we have direct integration. We announced in fall last year that we would support developers who want to take advantage of that distributed computing framework in Spark with their own code. That type of effort is pioneering; we're the first predictive analytics platform to do so. And if I go beyond that and think about it as a journey, we've been supporting native execution in Hadoop since 2011. So Spark to us is really a very natural evolution of a much broader migration toward Hadoop and big-data-centric, modern approaches to predictive analytics problems.

Now, just to clarify, if you can parallelize R, if that's a verb, do all the libraries inherit that capability? Is that what makes them all elastic?

We have support today for two of the libraries that are affiliated with the Spark platform: the machine learning library, MLlib, as well as H2O. So with RapidMiner, you can take advantage of all of the cool offerings developed there, and we expect to be able to announce shortly some more special capabilities, in particular with the H2O library.

And so you could use R within Spark to talk to MLlib or H2O, and those libraries are distributed.

Yeah, and I don't want to get, and I think George, yeah, I kind of don't want to get too deep and stray. I'd say the higher-level message, in terms of less how we make the pasta and more the delicious-tasting meal afterwards, is that you could think of RapidMiner as a platform that easily enables coders who want to get really under the covers, in the weeds. And I'm not the right person there; my partner and technical founder, Ingo, would really be able to do it justice, which is why I don't want to stray into areas where he's better served than me. But I'd say at the coder's level, there's tremendous optionality to take advantage of everything Spark has at multiple levels. The other part, which is really exciting for the Spark platform itself, is that there's a whole universe of people who really understand that Spark represents a paradigm shift in the big data era, a hundred-times performance improvement, let's say, relative to alternative approaches. And then how do you open that up and make it much more accessible to a broader universe of analysts? When you look at the Qlik and Tableau wave of data visualization and self-service analytics, those users are really moving beyond explanation of what's happened in the past and wanting to embrace new capabilities to plot complex associations, make predictive insights, and take action on what will happen.
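To make the PySpark point above concrete, here is a minimal sketch of the kind of distributed model training being described: code that reads like single-machine Python but executes across the cluster through Spark's MLlib. This is an illustrative assumption, not RapidMiner's actual integration code, and the file path and column names are hypothetical.

```python
# Minimal PySpark sketch of distributed training with MLlib.
# Illustrative only -- not RapidMiner's integration; the input path
# and column names (recency, frequency, monetary, churned) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Load a hypothetical customer dataset; Spark partitions it across the cluster.
df = spark.read.csv("hdfs:///data/customers.csv", header=True, inferSchema=True)

# Assemble feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["recency", "frequency", "monetary"], outputCol="features"
)
train = assembler.transform(df)

# The same single-machine-looking API trains in parallel across the cluster.
model = LogisticRegression(labelCol="churned", featuresCol="features").fit(train)
print(model.coefficients)
```

An equivalent SparkR script would fit a model such as `spark.glm` against the same DataFrame; in both cases the cluster, not the client, does the distributed work, which is the point being made about SparkR and PySpark here.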
So I'd say we're a platform that can govern, even for the coders, the seamless integration and orchestration across the full life cycle of predictive analytics: data prep, modeling and validation, and ultimately operationalization, where you deploy.

And as you said, the citizen data scientist, right? Which is really where everybody wants to go to open this up. It's growing five times faster, Jeff. We talked about it a few weeks ago. Growing five times faster than the data scientist.

It should just be citizen, right? I mean, ultimately, you shouldn't have to be a data scientist to have the opportunity to execute better decisions based on a data-driven process.

That's right, I think that's right. Even with today's data visualization analytics, I think we do have that aspiration to push more deeply into the citizen, as you say. Let's call it the Microsoft Excel user, the 300 to 400 million people who can use Excel. But a word of caution: we still want our analytics processes, whether that's visual analytics today or, I'm going to say, predictive analytics, which is a little bit more complicated, to happen within a governed framework. I don't think we want a situation where someone happens to come at a particular insight on their own in one corner of the organization and then takes action when someone else may have a very different view. So analytics, predictive analytics, all of that still has to take place within some kind of governed framework so that decisions aren't just precariously arrived at. And with predictive analytics you have another challenge, which is that you still need a degree of, let's call it, safety rails, so that we make sure we're validating the right type of predictive insights.

What are those safety rails? Because we keep hearing from different vendors that IT has to give guardrails if they're handing over that sort of power of self-service in different domains. What are those?

Yeah, so I don't think IT has to give the guardrails in our use case. I can't speak for other vendors, but the type of guardrails we believe we support in this massive movement, Jeff talked about the citizen, I call it a massive movement toward data democratization, data visualization, engagement, really the analytics revolution in this big data era, is, for the end user in our use case, really guided self-service and guided workflows. So when you think about that predictive analytics life cycle, data prep, model and validate, and then ultimately operationalize, we help you within the platform with automated recommendations for data prep, model selection, and parameter options. If you think about the world of big data and Spark, let's take it back to Spark: 250 machine learning algorithms. We have 95% of the world's known machine learning algorithms built in, but any particular algorithm may or may not be useful against any particular data set. It's bewildering; how do you choose? So helping our data scientists and citizen data scientists get more quickly to which one of these is most likely to be of best benefit is a real help. Even within a particular algorithm, you can have a dozen or so parameters that require tuning, so some form of guided workflow in that regard can really jump-start things.

And I think, yes, that's right. Or we call it wisdom of the crowds.
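To illustrate why that parameter tuning is bewildering without guidance, here is a rough Spark ML sketch of automated hyperparameter search. This is a generic Spark technique standing in for the idea, not RapidMiner's recommendation engine, which works differently; the `train` DataFrame and the `churned` label column are carried over from the earlier hypothetical sketch.

```python
# Sketch of automated hyperparameter search in Spark ML -- a rough analogue
# of the "guided workflow" idea, not RapidMiner's actual mechanism.
# Assumes the `train` DataFrame and column names from the previous sketch.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(labelCol="churned", featuresCol="features")

# Even one algorithm exposes several knobs; every combination is a candidate model.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())

cv = CrossValidator(
    estimator=lr,
    estimatorParamMaps=grid,
    evaluator=BinaryClassificationEvaluator(labelCol="churned"),
    numFolds=3,
)
best = cv.fit(train).bestModel  # Spark evaluates all 9 candidates across the cluster
```

With a dozen tunable parameters per algorithm and hundreds of algorithms, the search space explodes, which is why guided recommendations, or community experience like the "wisdom of the crowds" mentioned next, matter.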
And being the number one open source predictive analytics platform, there are something like a quarter million active users of RapidMiner who have tremendous experience applying these machine learning algorithms against different data sets and can really aid and accelerate that life cycle.

So let's talk about some of the problems they're solving, some of the applications they're building. Is there a customer journey in terms of how much of the application surface area they get comfortable with at first? Is there commonality in the first applications they start building?

What a great question. We actually had a chance, following our Wisdom conference, to look back on the year, and we did a bit of a deep dive into the use cases that propelled our growth. We doubled year over year. So we thank Jeff and his team for coming out and talking to us about that. As I think about the actual experience we've seen, and of course our plans for the future, a couple of key use cases do emerge. First of all, George, I'd say they fall into three simple, high-value ROI kinds of distillations. First and foremost, by a long shot: revenue-based ROI. So customer and marketing analytics, cross-sell, up-sell, customer segmentation, really the know-your-customer journey. That's by far and away the most important.

Is this being built on a customer-360, data lake kind of implementation?

Yeah, we see tremendous, of course, we're in the big data era. I wouldn't say it's restricted to that, but absolutely, with modern data science approaches in the know-your-customer use case, people are really trying to move the needle to get a much more granular view of the persona. And then, what can I predict about this persona? And then, much more importantly, just because you can make a prediction doesn't mean you can take advantage of it. Insight without action has no value, and that's something we talked about earlier. So to answer your question on use cases and how we tie RapidMiner and Spark to them: if you think about the vast trove of customer data in multiple industries, financial services, telco, manufacturing, go down the list, we've now arrived at a point in time where, using say Hadoop or other big data technologies, we're storing much more information to really round out the true persona. Customer profiling, propensity to acquire, customer intent, the customer journey, what they bought and didn't buy, who they hang out with in social networks, and so much more. So at a higher level, the number one RapidMiner use case is really taking advantage of technologies like Spark to get much more granular on customer insight, and then taking that customer insight, you asked about applications, and actually embedding predictive insights into either human decisions or automated actions. And those are really cool trends. Human decisions: with something like Tableau or Qlik, you might take a predictive insight, deploy it into a visual interface, and then have a human say, wow, that's pretty cool, let's do something about this. Automated actions might be something where you have enough confidence in the predictive insight that you want to quickly just make a mobile offer.
A shopper's in the store, she has a certain persona, you have a certain inventory; hey, let's send something out immediately and let her know that this is available at such-and-such a discount, and so forth. So there's a variety of use cases, but in predictive analytics last year, customer and revenue ROI was number one. Numbers two and three, kind of tied, would be risk-based ROI and operational efficiency, in a dead heat, let's call it. Operational efficiency is things like predictive maintenance, operations scheduling, optimization scenarios around assets, people, activities.

And are these being embedded in existing enterprise apps, or are they being stitched together in new workflows?

Great question, both. First and foremost, the number one use case is deploying to human action, human decisions, so most likely a visualization capability. Secondly, embedding into enterprise applications, and those could be third-party or homegrown applications. The next piece would be some form of embedding in workflows, as you've pointed out.

Can you elaborate on when it's embedded in a visualization? Is it sort of a GUI, I don't know, a dashboard, like from a BusinessObjects?

Yeah, so we announced that at Wisdom; I'll use the Tableau announcement as a great example of that. It was in fact the number one most-talked-about development just a few weeks ago when we were with theCUBE. But we are a predictive analytics platform, not a data visualization platform. So we take our predictive insights and, in this case with Tableau, seamlessly integrate our capabilities in their visual platform, and then allow their end users to get all kinds of filtering and interaction, visual engagement, with those predictive insights. So in that case of embedding, you're really looking at a two-way interaction; it's not just the static display of results. Tableau, of course, as we know, is a leader in data visualization, interaction, and exploratory data discovery. So as users probe that predictive insight, there's a reciprocal way in which their visual interface can affect the predictive insight and vice versa. That's something we're very excited about; we had a customer speak about it a couple of weeks ago, and there's much more lined up.

Moving to other types of use cases: let's take risk and compliance in financial services. If you think about the vast trove of e-surveillance that has to happen at our financial services customers, you're talking about a volume of communication where it's very important, for insider trading or other types of compliance rules, to sort out potentially harmful conversations. And when you think about the rate of false positives in that regard, that's really not a use case where it's going to help you very much to deploy all of that into a visual, let's call it human-decision, interface. You're more likely to begin triaging, really using our predictive analytics platform to isolate the most likely troubling cases, and to automate, let's call it, the investigative case management work that would need to happen.
In that case, we would see our platform being used to take advantage of a predictive insight and then immediately embed or route that predictive insight into other third-party or homegrown tools. There are leaders in this space, like NICE Systems, that do a lot of anti-money-laundering or financial crimes work. So third-party systems or homegrown systems, we see both of them, absolutely. But those are different levels of volume and throughput.

Well, Peter, unfortunately we are out of time, but we're glad that you were able to stop by on our trips to New York City. I assume you come to New York City a lot, George. So again, thanks for taking a few minutes out of your busy day. Peter Lee from RapidMiner. I'm Jeff Frick with George Gilbert. You're watching theCUBE. We are live in Midtown Manhattan at Spark Summit East. Thanks for watching. We'll be back with our next guest after this short break.