from New York, extracting the signal from the noise. It's theCUBE, covering RapidMiner Wisdom 2016, brought to you by RapidMiner. Now, your hosts, Dave Vellante and Jeff Frick. Welcome to New York City, everybody. This is Dave Vellante with Jeff Frick, and we're here in New York City at the RapidMiner Wisdom Conference 2016. This is our first event of 2016. We had the deep dive with Mark Hurd last week. Actually, we just released it this week, so you saw that with John Furrier. This is a conference for data scientists, like serious alpha geeks at this show. RapidMiner is a company that does end-to-end predictive analytics. A lot of people in the space, of course, are trying to simplify analytics, and RapidMiner is a leader there. They've got an open source platform that they monetize at the back end. They just recently raised, Jeff, you were telling me, $16 million, I guess from Nokia. So we heard this morning from their CEO, Peter Lee, who's coming on later this afternoon, and Ingo Mierswa, who's their founder, a very entertaining speaker. We also heard from Professor David Weisman from UMass Boston, talking about his perspectives on data. We're hearing a lot about data and misinterpreting data. You always hear that at these data science conferences. Ingo put up a slide that showed a spike in alien sightings in 1993, and he asked the audience, why is that? And of course the answer was, because that's the year that The X-Files came out, so people started seeing aliens. He showed some other data, too. And so of course aliens love The X-Files. That was the interpretation, tongue in cheek. The point being that the data can tell a story, but sometimes it's misinterpreted and doesn't really tell the right story. The other thing he showed was a spike in alien sightings on July 4th. Of course the conclusion there is aliens love Americans. So that was good fun.
And so I think RapidMiner really is in the business of helping people make sense out of data and improve the accuracy of those conclusions. And so Jeff, first event this year, great to see you in New York. You know, thanks for flying out. Yeah, it's good to see you, Dave, first time. And I'm just kind of struck by this: last year we kicked off 2015 in New York City. You were here at the IBM System z launch, talking about mainframes and bringing old school mainframes into the modern era of a lot of the new applications. Now we're at a really small conference, really heavily focused on data science, but still continuing on with this beat as this digital transformation continues within our industry. So it's exciting to be here. It kind of shows the breadth and depth of the things that we cover. And we're looking forward to another great year. The thing in data science that always strikes me is causality versus correlation. The example that was referenced earlier is that height is a great predictor of reading skills, of course, because it ties to age. But what goes into that is, even though maybe it's not direct causality, if it is a predictor, is correlation still okay for whatever you're trying to achieve? I think that's an interesting thing I want to follow up on, because we know correlation is such a big deal. The other thing that I find really interesting is the progression from data science. This is a hardcore data scientist show, but we talked a lot last year, and we'll talk today, about the citizen data scientist. How do you realize the dream of getting the information to the business analysts or the people that are making decisions, and out of what traditionally was kind of the hallowed halls of the PhDs and the guys doing the heavy lifting? At the other end of the spectrum, now we've got more and more machine learning.
I'm fascinated every day by what Google throws up on my phone and its predictive things about where I am. Suddenly it'll tell me how much time it's going to take me to go to work or where I parked my car. I never asked it to tell me where I parked my car. It'll tell me, you just parked your car here, and there's a map so you can find it when you finish. Are you on your way home? Yes. Yes, so we've always had the data scientists in the middle, there's a real vision to get that power downstream to the business analyst, and now we've got the computer and the machine learning on the other side. It's an interesting kind of mesh of where this is all headed. Well, it's interesting you brought up the IBM z event that we did almost exactly a year ago at Jazz at Lincoln Center, awesome event. It's like the other end of the spectrum, right? The big giant mainframes. But the interesting theme of that event was bringing analytics and transaction systems together. We talk about building so-called systems of intelligence, a concept that George Gilbert has been promoting on Wikibon. Certainly we borrowed that from Geoffrey Moore, but we've advanced that notion of systems of intelligence to include bringing transaction and analytics systems together to directly affect the business outcome. And one of the big questions that we talk to practitioners about is: are these predictive solutions that involve machine learning greenfield apps, new and clean, or are you sort of blending them with existing applications? And what's the right model for that? I mean, to the extent that you can bring your transaction systems and your systems of record together with that predictive learning, you are going to get, ostensibly anyway, better data and be able to act on that.
Or in fact, if you have greenfield apps that are somewhat disconnected from those systems of record, you've got some challenges there. The other thing that we're looking at is how much time is actually spent, and we heard this today, cleaning the data, and essentially cleaning the data to fit the algorithm versus bending the algorithm to fit the data. And I'm interested in what RapidMiner customers are doing in that regard. We're going to talk to several about that. We heard today about different organizational models. Professor Weisman was talking about that, centralized versus decentralized, and how to make those decisions about where to invest. Because it seems, based on what I heard this morning, Jeff, there really are two vectors. One is the business case, the ROI. And the other is the technical feasibility and the risks associated with that. And so I'm curious as to who's driving the decision. Everybody's going to say it's the business, but technical feasibility is fundamental to this, and the lack of data science skills is obviously a challenge for many organizations. So how are those decisions being made? Are people picking off the easy ones first, which may or may not deliver the greatest ROI, to try to develop that machine learning and data science skillset? Or are they going after the hard problems and the big ROI? Yeah, he hit on a bunch of things. Another one of the things he talked about is that the magic happens when you start with the business objectives, which seems patently obvious, but it's just not the way it usually goes. We hear it at a lot of these shows: people are talking about the technology and the tools and the fun, and people like to get into Hadoop and Spark and real time and in-memory and all this stuff, as opposed to really talking about, what are those measurable business objectives? And when we can perform against that objective, now everybody kind of buys in and you can see the ROI.
But the other thing I still find fascinating is kind of the purity. And he talked about the difference between data science and MIT, very different points of view, very different fields. And at the end of the day, there are a lot of ways you can present the data. He went through a bunch of examples of presenting it to support whatever it is that you're trying to get across. And behind the scenes, somebody who really knows what they're talking about would ask: where is the data? What's the correlation? Where are you getting these percentages? But unfortunately, I don't know that that's necessarily always brought to the forefront if you're really trying to make your case. So it's one thing when internally you're driving to an ROI decision. But so often we see numbers thrown around just randomly where you don't really know what the basis is. So I think there are some real ethical issues and presentation issues that are going to impact the way that this is developed and embraced by the marketplace. Well, Weisman showed a four quadrant chart. Speaking of dissonance in the data, on that chart the vertical axis was technical prowess and the horizontal axis was management vision. Lower right was conservatives, lower left was beginners, upper left was fashionistas, upper right was digerati. Okay, you want to be in the upper right. And he drew a correlation between those in the digerati quadrant and profitability. And I tweeted out, whoa, whoa, wait a minute. He's coming on later, I understand, but I'd like to understand: what was that causality? It's like plus 26% over the mean, right? It's a big number. But you think about how many factors, you know, go into profitability. Right. You know, I've talked to the sales guys, the majority of the products. I mean, so many factors. You know, can we make that correlation? So that was sort of interesting.
I would like to understand that data a little bit better. Is there really a correlation between, you know, excellent data science and profitability? I feel like it's early days, and a lot of the companies that are doing this stuff, like Uber, aren't profitable. But certainly Google and Facebook are. So that'll be an interesting discussion. Yeah, it should be a great day. We're looking forward to it. We're going to get some really big minds. And as we like to say, we like to go out to events, get the smartest people we can find, get them on and really share the insight. So it's a great way to kick off 2016. Well, it definitely fits in our wheelhouse. You know, it's one of our pillars: big data, cloud infrastructure, software-led infrastructure. And big data is a really key theme for us in 2016. We're obviously kicking off here, and we'll be at Spark Summit East. I believe we'll be doing Spark Summit West. We've got BigDataSV and BigDataNYC. We've got the Hadoop Summit. We'll be, I believe, at Hadoop Summit in Dublin this year. So that's going to be really exciting. So, all right, big day today. A lot of practitioners coming on, some technologists. We're going to geek out with the data scientists. So keep it right there, everybody. Jeff and I will be right back after these words. We're live here at the Aventi Hotel in New York City at RapidMiner Wisdom. This is theCUBE. We'll be right back.