Live from Orlando, Florida, extracting a signal from the noise. It's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.
The 600 customers and partners here and obviously a few Pentaho folks. Pentaho is a company that solves the problem of data integration and blending data to operationalize analytics. And it's a company that's been around for 11 years. So they started about the year before Doug Cutting and his colleagues put forth the first idea of Hadoop into the ecosystem. And so the timing was actually very good for Pentaho because what's happened since then in the last decade, if you think about it, you and I are going to talk about this, a huge amount of data has surfaced. And in that decade, data went from being a problem that had to be managed, a liability that could get you sued for a smoking gun email or some other type of evidentiary issue that could take you to court and cost you hundreds of millions of dollars, to one that is now an advantage, a business advantage: people driving analytics, driving machine learning, driving real time, trying to affect business outcomes, what we at Wikibon call systems of intelligence, some of the work, George, that you've done in that area. And the interesting thing about Pentaho is it's not an out-of-the-box BI tool. It's a robust platform that includes governance and a lot of capabilities and tooling around blending data, a lot of customization capabilities, some relatively lightweight visualization tools, a lot of integrations with other partners, and they've built out that platform and hardened that platform over the last decade. And in my experience, George, it takes about a decade to really harden a platform. When you look at Pentaho and you compare that with some of the other tooling that's out there, it's much deeper, much more robust, and has a lot more stickiness within the customer base. The other thing is very strong affinity with cloud.
We heard customers like FINRA today talking about how they're extensively leveraging AWS as a platform, how they could not have done this with a bunch of heterogeneous stovepipes. And the other thing I'll mention is, Pentaho is a company that was just recently acquired by Hitachi. Now, Hitachi is this giant $80 billion conglomerate. Ironically, the combination of Dell and EMC will be about an $80 billion conglomerate. The difference being that Hitachi is much more diversified. Hitachi obviously has historically competed, through Hitachi Data Systems, with EMC in hardware. It's interesting to note how Hitachi is diversifying away from hardware, something that EMC has done a little bit, obviously, with VMware, but it has largely focused on infrastructure hardware with its acquisitions. Hitachi is now diversifying, leveraging a company like Pentaho to drive its Internet of Things and its big data strategy, George. So that's kind of my overall summary. A lot of energy here, a lot of hardcore Pentaho practitioners. What's your take on what you saw this morning with the keynotes and what's happening in this world of big data?
Well, what's interesting is that we're evolving to a model that's very different from what we saw in terms of tools for traditional data warehouses. Business intelligence tools then, or the tools we had, they were either, for the most part, ETL tools, which helped you get data into the data warehouse, or they were the business intelligence tools, which helped you visualize and report on that data. And those were kind of the two big categories. And we had some niches with statistical modeling and things like that. What's different here is that we've got a new data platform in Hadoop that is really an ecosystem. And the ecosystem is about mixing and matching data processing capabilities, something that we didn't have in the data warehouse world. So the mix and match could be something as simple as the extract, transform, load.
It could also include the sort of summary and aggregation that you would expect from a SQL database. It could also include the predictive and prescriptive analytics. It could include drawing in data from many different sources. The key point here is, and there are other analytic capabilities as well, including machine learning and graph processing, the main point is it's a mix and match world. That gives great flexibility, but at some cost. The cost is complexity. And the role that Pentaho is filling is to put a layer above that mix and match flexibility and complexity and to start to hide that from customers. And they're not trying to fill in every piece of functionality. They put in functionality that is very strong in each of the areas, but then they're beginning to open it up for others to plug in.
So let's unpack the keynotes a little bit. We heard from Quentin Gallivan, who's the CEO of Pentaho. Chris Dziekan, who's the Chief Product Officer. FINRA gave a great talk on how they are ingesting different types of data sources, blending them with Pentaho, running them through Amazon with S3 and EMR, and then delivering self-service analytics to the business users, what Pentaho refers to as embedded analytics. We heard a talk from a Forrester analyst talking about the different types of analytics. So he did a very good job breaking those down. Mike Olson was picking up on his themes from a couple of weeks ago at Strata and Hadoop World about making Hadoop disappear. I say it's more invisible. It's not really disappearing, but it's becoming invisible. And Kevin Eggleston from Hitachi was talking about where the fit is within Hitachi. So I thought that was very interesting. Some of the highlights from Quentin Gallivan: there'll be five billion people, 50 billion things, so he really struck an Internet of Things chord. He also said big data's getting cloudy. Now, I tweeted out, I totally agree.
So in our recent big data survey, 70% of the organizations we talked to that are doing big data analytics are using the public cloud. I thought that was an astounding figure. We had some excellent examples today of AWS. Google actually, with Microsoft, was number one and two, basically tied, with Amazon sort of a close third. But it's clear that the public cloud guys, and you've talked about this, George, are building out end-to-end data pipeline capabilities incorporating systems like Pentaho, giving you access to them in order to simplify big data analytics and deliver it as a service. That is a key trend that we're seeing.
I couldn't stress that more, which is the cloud vendors are trying to marry the mix and match flexibility with the simplicity that came from the traditional data warehouse, where the entire software platform was created by a single vendor and all the data was sort of curated by a single set of staff. Those native services, it actually turned out in our data, I think that it was half were using just on-premises Hadoop. Of all the people who were using Hadoop, half were using it just on-premises and half were also using either cloud-native or Hadoop on the cloud. But that was a really shockingly large number because it says that that hybrid scenario, which has only just been possible in the last year or so, is really taking off.
We also heard this morning, Mike Olson talked a lot about Spark. We're going to have Mike Olson on later today. We're going to ask him about Spark, we're going to ask him about Kudu. There's a lot of discussion in the community about whether Spark replaces Hadoop. Our survey showed that a large percentage, almost 100% of the people that we talked to in our survey, said they're going to take workloads that they normally would have placed on Hadoop using MapReduce and HDFS and move them into Spark. And then of course the Databricks survey that showed that a huge percentage, of course they were...
Databricks, yeah.
The Databricks survey showed a huge percentage of people doing Spark with no Hadoop. About 40%, I think, was the number. Now of course that was very much weighted to their customers. And then of course Cloudera is bringing out Kudu. A lot of people think it's a replacement for HDFS and HBase. That it's complementary, of course, is what Mike Olson discussed today. It's really not ready, it's not hardened. And so of course they're going to have that positioning. But essentially what's your take on Spark and the move toward data in motion in real time?
Well, it's fashionable, especially among us analysts, to set up a battle royale. But it's not entirely. Basically Spark, think of it as the next-gen MapReduce. That was for many years the core... For Hadoop 1.0, that was the core execution engine. Anything you wanted to do on data, you did on MapReduce. If you wanted to do SQL, you did SQL on MapReduce and we called it Hive. So now, because hardware is different, we can rely more on memory, and because the folks at Berkeley did sort of a better job in creating an execution framework, we have something much, much faster and more expressive. So all those workloads that used to depend on just MapReduce are sort of migrating to Spark. And when I say workloads, I don't mean necessarily all the things that are in production. I mean the new workloads, or the Hives and the Pigs and things like that, are being ported so that they run on Spark instead of MapReduce. But just a last point, they all still have to fit into the... For customers who've deployed Hadoop, there's management and security and data lineage, sort of the part of the ecosystem that keeps it enterprise ready, and that's not going anywhere. So Spark would run in that context for people who have Hadoop. For people who don't have Hadoop, that's a little more of a jump ball. They could use Hadoop as a... Spark as a service from Databricks, for instance.
So we're going to be unpacking this today, really trying to understand how organizations are operationalizing analytics. We heard today that, I thought it was great, Hadoop has gone from the sort of awkward teenage stage into the hipster with the beard stage, so that's where we are today.
I'm still waiting for that transformation.
So you're pretty hip, George. You're going to get a beard. So we're going to be talking to customers today. A number of customers are coming on today, FINRA being one of them. Mike Olson is coming on. We've got the analyst perspective from Forrester. We've got the Hitachi perspective and what that means for IoT. So stay with us. This is theCUBE, SiliconANGLE's flagship program. We're here at Pentaho World 2015. The hashtag is #PWorld15. So definitely tweet us, join the conversation. This is theCUBE. We'll be right back after this.