Live from Orlando, Florida. Extracting a signal from the noise. It's the Cube, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.

Hi everybody, we're back. This is the Cube. We're live at Pentaho World 2015 from Orlando. We're at the Renaissance at SeaWorld. George and I are going to wrap up the day, day one. So George, big keynotes this morning. Got the high-level view from Pentaho's Quentin Gallivan, who gave us the big picture. Then we drilled into products very deeply. Had some practitioners, particularly FINRA, talking about how they're using this platform. And then we also got a dose of Mike Olson, who came on the Cube talking about their partnership and the trends in Hadoop. Forrester analysts gave a great talk about some of their data. And then of course we had the punctuation with Hitachi, putting into context how Pentaho is a lever for Hitachi's Internet of Things and big data strategy, what they call social innovation. So a lot of stuff, a meaty content day. What do you make of what's happening here at Pentaho World?

Well, like you're saying, there's a whole bunch of really interesting themes. If there was some lingering perception that Pentaho was a business intelligence tool, plus plus, turbocharged, it is now clearly a platform that hosts a set of tightly integrated tools, stretching all the way from ingesting, blending, and integrating data to transforming it, analyzing it, and presenting it. And for all those capabilities, it's able to orchestrate the activities of specialized tools that might do that as well. And that is a rare balancing act. It's extremely difficult in the software industry to go either from a platform to a tool or from a tool to a platform, mostly because you risk alienating your partners, and the business model is different. When you're a tool or an application, you sell to someone who's buying sort of economic value.
When you're a platform, you're really selling first to partners who are going to turn it into something that realizes that value. And to straddle that, as they appear to be doing successfully, is no small feat.

Well, and it's very clear, you would agree, they've become a platform company. Who else has this capability? I've used Tamr as an example of solving a narrow problem. But what I see from Pentaho is far more comprehensive. How about IBM? IBM seems like a series of what were bespoke tools that they could put together through services. And there are, I think, efforts to bring those tools together. But they seem to be a company who has that capability. Who else fits into that as a platform? Lay out the horses on the track for us.

Okay, there's a lot out there. And this market landscape sort of needs some categorization. Just to take one example, we have Tableau, which has done a great job in visualization. It grew up really in the web era more than the mobile era, in the sense that it's got a user interface that is suited to a mouse and a trackpad, for instance. Really rich for visualization. But then you would have it fed by another tool, say, maybe Alteryx, which does the data integration and preparation for it to be presented. So you've got vendors focusing on two different halves. Then there's another forward-looking tool like Zoomdata, which is very, very rich on the presentation and analysis end. And unlike most traditional BI tools, it doesn't go through this narrow ODBC or JDBC interface to get to a database. It has this Spark interface. And it can ask for huge amounts of data and render them progressively as they return. So it's not like you're waiting.

So what about platform plays? Is IBM a competitive platform play? What about guys like Syncsort and Informatica who are doing forms of data integration?

Syncsort, in my understanding, having actually worked with them fairly closely, does a very, very good job of not only integrating, blending, and sorting the data and preparing it for analysis, but also working very hard on sort of future-proofing that, so you're not drawn into all the complexities of Hadoop. And other vendors are going to have to do that, too. But as far as the end-to-end pitch, "we do most of these activities very well, the value is in the integration, and we also allow you to plug in," I don't know yet who else does that.

Well, so as I was saying, it feels like IBM's got one of each.

Yes, but they're not integrated in a certain way. Right. IBM does not have that integrated platform yet. From my understanding, what they do really, really well is they have a set of sort of business capabilities that are analytic frameworks, and a services organization that can put those together into solutions. If you look at it on the Geoffrey Moore adoption curve, what they do is in the services-heavy, product-light part of the curve, which is early. And what we have to see is whether they can turn that into a more repeatable, product-rich, services-light mix where it becomes more like a packaged application.

So a couple of weeks ago, we were sort of joking. It was like Survey Week. Merv from Gartner did a survey. Databricks had a survey. The guys at AtScale did a survey. You had a survey. What did your survey tell you? What are some of the tentpole conclusions and findings and takeaways from that survey?

Okay, so it's easy to get drawn down into the weeds when you have tons of answers and cross tabs. And for a day or two, we looked at the 500 different cross tabs that we could perform on it and we were like, okay, where do we go from here? But the reality is, as you say, the tentpole takeaways are these: we hear from all the vendors about triple-digit or near-triple-digit revenue growth in their big data analytics solutions.
When we talk to folks on the ground, we find that, for instance, among customers who have pilots and proofs of concept that have converted into production apps, this year it's 41% reporting at least one production app for Hadoop. And that's up 10 points from 18 months ago.

So 40% of the practitioners that we talked to in a random survey have Hadoop in production. Up from 30% roughly last year.

And the point about that is that's good steady progress, but it's not triple digit. So as with anything in a business where you're selling licenses at this rate and you're implementing them at this rate, the delta is inventory that's piling up in the enterprise. And now the numbers are not huge. I mean, we went through this same movie with ERP and Internet infrastructure selling to enterprises in the mid to late 90s, and the numbers got bigger there before we hit the wall. But, you know, it's not a sustainable trajectory. At some point, the attitude of fear of missing out becomes, well, let's be more cautious because maybe there's an easier-to-consume solution on the public cloud. Or maybe we need to wait till we grow the skill sets.

What else? What else did you learn from that survey?

Another one was that respondents viewed Spark more favorably relative to Hadoop: about 90% for speed, first, and somewhere in the 60s or 70s, I believe it was, for simplicity.

So 90% of the people we talked to that were using Spark felt that it was faster. 60% said it was easier.

Easier.

Okay, so it's faster. We kind of all know that in memory versus serious. But easier, still not easy enough.

Right. Because it's not fully baked. I mean, because we were limited in the number of questions we could ask, what didn't get asked was, so what doesn't really work yet with Spark? And as the Hadoop guys rightly point out, it doesn't have the same security model, or the rich security model. It doesn't have the data lineage.
It doesn't have all the things that an enterprise needs, you know, to make the thing work in production. It has all the things that a Goldman Sachs could work with, or a Zynga, perhaps, or an e-commerce company. Those companies are much more self-reliant.

Well, it's interesting to see FINRA and NASDAQ, both fairly sophisticated, but both using the cloud heavily, leveraging the cloud, they say for agility, no question about it, but it also simplifies their world, doesn't it?

Yes, so as you pointed out actually earlier on the broadcast, it was about 30% to 40% of users saying the public cloud versions of Hadoop were simpler to consume, and an equal amount saying Spark was easier to consume than on-premise Hadoop. So as much as Hadoop is sort of the center of conversation today, there are two competitors. One is Spark and one is cloud-native services. They may not be direct competitors, but they are substitutes and we have to keep an eye on them.

So what else do you want to share with us from that survey?

Let's see. Just in terms of the results of big data analytic projects, again, the theme is slow, steady progress. If you're following along on our Figure 3 in the report, 18 months ago, just under 41% said we've realized full value from financial investment; now we're up to just under 45%. Again, it's good. Not a huge uptick. But it's slow, steady. You know, with ERP, SAP first introduced R/3 in about 1992 or 93 and it sort of crested just shy of 2000, really late 1998, but then it took another decade of consolidation to make sure that the helter-skelter implementations were consolidated. In other words, this stuff doesn't happen overnight. There's only so much software that an enterprise can consume.

So other than SAP, who obviously won the battle in ERP, who else won on the buyer side, on the practitioner side?

Frankly, the people who bought PeopleSoft and Oracle and JD Edwards, they were able to consolidate and build out sort of an integrated infrastructure. I'd say where the buyers did not win as much as they thought they would was that it took a lot longer to implement than they assumed when they signed up. Let me give you just a couple of figures that may be relevant here. It came out at the crest of the ERP wave, right when Y2K was sort of slamming the brakes on it, that the per-user cost of an ERP license was about $4,000. But when you added up the cost of administration, the cost of supporting each of those users, it was $1,300 per user per month, not per year. And we experienced the same thing with PCs when we discovered they were $6,000 a year to support.

So do you expect with big data that the practitioners are going to be the big winners, the guys applying big data? It seems like maybe 40% are realizing the full value, but I want to see that data split by the IT guys versus the business guys. I can almost guarantee the IT guys, this is what we saw last year, are going to have a much higher success rate. Because they measure success as, hey, it works.

Right. We did see that preliminarily. We have some cleaning to do on the data. But yes, there was a 10 or 20 point difference between the IT guys and the business guys.

How about cloud? Where does cloud fit in this?

Let me...

So as I recall, we had, I don't know, there was a question there about cloud adoption. A huge percentage, I want to say 70%, were doing public cloud.

Yes. Almost 40% are using Hadoop in the public cloud and 36% are using the native services. By native, we mean like Amazon Kinesis and Redshift and DynamoDB.

So big data tools.

Yes, big data tools. So 70% are using some form of big data analytics in the public cloud.
In the cloud, which means that there is demand for hybrid deployments, because we know there's a fair amount on private cloud. And what that should tell us is, there is infrastructure being put in place to make it easy to move, maybe not entire workloads discretely, but to move data when you need to. And that means you cannot assume that just because you've sold a bunch of large clusters to Goldman Sachs that they're going to stay there.

Great. All right, George. Well, we've got to wrap. Last thoughts on the event today, some of the guests we had? Any initial takeaways, conclusions?

From the point of view of Pentaho interacting with Hadoop, I would say that all the new projects that are coming along within the Hadoop ecosystem have now made it so that Hadoop might be open source, but it's no longer a single ecosystem from which we see three semi-equivalent distributions. Every vendor now is essentially forking the project. And it's up to vendors like Pentaho to try and put in some sort of abstraction layer that simplifies it. And I guess the risk to the Hadoop vendors is that they become like the proprietary Unix flavors became in the mid to late 90s. They thought that Unix would take over the world, but in trying to differentiate themselves, they created an opening for Linux, and for Windows as Windows matured.

All right, George, great day. Thanks so much for hosting with me. We'll be back tomorrow, full day. Tomorrow's going to be a deep dive on Pentaho. We're going to get the executives on. We've got a couple more customers. Really great event. About 600 people here, a little under half are technologists. The other side is business guys. It's always great to have that mix. So thanks for watching, everybody. Check out siliconangle.com for all the news from this event. Check out wikibon.com. That's where you'll find George's recent research note. There's also a ton of stuff up there on Dell EMC.
Check out crowdchat.net to see what's happening. We'll be here all day tomorrow. We've got an event as well at Grace Hopper, Women in Tech; John Furrier and Jeff Frick are down there. So thanks for watching, everybody. We'll see you tomorrow.