Live from Orlando, Florida, extracting the signal from the noise. It's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert. Welcome back to Pentaho World, everybody. This is theCUBE. This is day two of Pentaho World. First year theCUBE has been here, second year of Pentaho World's user conference. Donna Prlich is here, three-time guest on theCUBE now. It's great to see you again, Donna, Vice President of Product Marketing at Pentaho. Awesome, we just came off of Strata two weeks ago. It's like big, big data month. Been quite a month for us, right? Yeah, so you must be really excited. I mean, anytime you're able to get all your customers together, and you've got a company with such momentum, a new owner like Hitachi with a whole new future, it's got to be really exciting for you. Yeah, very exciting. I mean, I think after the keynotes yesterday, just the impact of FINRA, right, this fascinating organization that's doing amazing things with volumes of data, and we're at the core of it, and just continuing to see that. And then we've got customers, yesterday in my session, I talked through a few of our customers who were really early with us five years ago, who started in very small environments and now have scaled out and taken advantage of all these new technologies. And now we've got Hitachi behind us, right? So that just pushes everything forward, which is amazing. So Pentaho is a company that some people find hard to understand sometimes. When I first heard about it, somebody said it's the BI for big data. But in speaking to customers and hearing the keynotes, drilling down into Pentaho, it really is a platform. It's not an out-of-the-box BI tool. So from a product marketing standpoint, how do you describe Pentaho? How do you deal with that challenge when you're talking to customers? Yeah, oh, absolutely. 
And as head of product marketing, when you're saying it's really hard to understand, and when I hear, oh, you're just a BI tool, I just kind of get a little uncomfortable. But yeah, so it's fine because it's really what we came out of. And I think the interesting thing is we were really successful there at the time because it was open source, right? So we were available to organizations that couldn't afford the big giant platforms from Oracle or whoever. And so Pentaho was an amazing platform. We had the data integration, we had all the visualizations, we had reporting, all of that. And they could embed it and they could get their business going. And then when big data hit, it was sort of like, wow, this is an amazing opportunity and you can embed it. And so we kind of shifted the company and went after that and were very successful. So now how we talk to customers about it is we really deliver the data for any analytics that you're going to do. So if you've got to push it out to a dashboard, we can do that. And then anything you want to embed on the front end, we have it, right? So you want to give your customers the analytics and you want to embed it in your own application. It's a fabulous fit for us. But if you want to use another tool, another discovery tool or a predictive tool, we're just delivering that data. And the really important thing, and I've seen it over and over and over, we saw it at Strata, is just to be able to do that in a governed fashion. It's hard, it's really hard because it's a big challenge from that data engineering side to get the data to the consumers and still have that governance around it. And that's really where our value is. Yeah, so let's talk about some of the changes. So at Strata, two weeks ago, Hadoop World, we heard a lot of big themes. We heard data in motion slash real time, but more real. You know, we've heard real time for a long time, but now we're starting to see actual use cases, you know, Spark coming to the fore. 
We heard a lot about, you know, storage and new database types, but we also heard a lot about complexity. And people struggling with complexity. So talk about some of the changes that you see in the big data space and how Pentaho is addressing them and how you're turning them to an advantage. Yeah, yeah, we actually had a session yesterday on what we call future proofing. And somebody said, you know, Pentaho's the heat shield. I don't remember who said it, it might have been Chris, but, you know, great comment because we are kind of the heat shield for big data. And we really see that the biggest problems are, number one, you know, just kind of the data sources and the formats, right? And how do you manage that? We used to just have data that kind of fit nicely, even if we had to create a star schema or whatever, it would fit nicely into the storage mechanisms we had. And now we have all those existing ones, plus all the new storage, you know, Hadoop, we've got NoSQL, got analytic databases, got applications you want to get data out of. So that's kind of a center of complexity. And then you've got all these emerging technologies, right? You've got Storm, Spark and YARN and a new one every day, it seems like. So, you know, there's opportunity there, right? And then lastly, you know, you've got where do you put the data to work? It's not just about the data lake, right? It's not just about the data warehouse. Most customers have one or several or all of those or eventually will. So we talk to our customers a lot about, you know, the best thing to do is to understand kind of where your business is and what you're trying to accomplish, determine what the data is that you're trying to get at and start from that approach. And when you feel a business pain, look for the right technology. So maybe Spark is the solution, maybe YARN is the solution, but maybe not. 
And so you've got to really be tied to the business and we heard this over and over and over yesterday in our session, that it's very important to keep that in mind to manage that whole shifting sand world, right? Of all those technologies and data formats. That's a theme we keep hearing about, which is, you know, Hadoop was always an ecosystem, not a product, but it featured incredible innovation. But the trade-off was some amount of complexity. And both angles are sort of getting better slash worse. Do you find that customers are turning to you more and more to help insulate them from that? And is that more and more of the positioning? Yeah, absolutely. And I think that's why we've been successful. So our advice to them is, you know, have something like Pentaho that's really flexible, open. We can deal with any of those data sources and formats and we have the flexibility as these technologies emerge to look at them in our labs environment in a very open way. A lot of them are, you know, we heard Mike Wilson talk about two new open source projects. We can look at that pretty easily with the way Pentaho's, you know, architected. And if there's a fit, there's a fit. And if there's a customer that wants to do something interesting, we're really behind that. I think I've talked to you guys before about RichRelevance, but they needed YARN. Their business was suffering. We worked with them and then, you know, we ended up supporting it. So along those lines, we've done some survey work and we found that, you know, the sort of pilot proof-of-concept Hadoop cluster might be somewhere around six to 12 nodes, I think, and take almost four administrators or four FTEs. And so there's a lot of overhead in getting going, and a skills shortage. But with you as a simplifying layer, do you find your customers are farther along in their journey than the typical Hadoop customer? 
Well, I think what we're finding is, because we've seen the adoption of Hadoop move pretty quickly, now we've got customers who are kind of at the beginning there, but we've learned a lot about how to get them started. So they are turning to us for that advice. We've been able to get a lot of that complexity removed with the graphical tools, right? So you don't have to script everything. And people say, yeah, but we want to write something in Pig because we know how to do it, and they're absolutely right and you can do that. But the best part, on the resource side and the complexity side, is, if you have a team like that, you want those people working on the most important, hardest problems, right? You don't want them doing data prep, and that's not a good use of their time. So with the tools, you can expand your pool of resources, right? You have a bigger team now because you can give some Java developers, you know, Pentaho tools, and they can be doing that part of, you know, writing MapReduce via the tools, and those other folks can be working on the harder problems. Go ahead, George, go for it. I was just, because I'm thinking about, you know, we have this sort of capability maturity curve. I think that you might have seen it when we did the Big Data New York City event. And it's sort of, it's a function of technical maturity and skills. And on both counts, you make it less demanding on the skills side and more cohesive on the technology side. So I'm wondering, sort of, what are the leading edge use cases for customers using Pentaho relative to just mainstream Hadoop, you know, without that sort of tooling? Yeah, so I think we've allowed people to extend their use cases, right? So, you know, you might see a very simple data warehouse offload, right? We want to take this data, put it into Hadoop, and it's a pretty straightforward process with Pentaho now. 
And we've worked, you know, as Chris pointed out in our keynotes yesterday, we really work to try to simplify a lot of that, you know, on the front end for our customers. Make that easier. But what ends up happening is then with the flexibility of Pentaho, if you've got another set of consumers that, you know, are able to access that data, you can start thinking about things like monetizing your data, you know, offering your data to existing customers, offering new capabilities. If it's a customer 360, maybe you started bringing in one data source, you know, in addition to your standard customer information, but with Pentaho and that flexibility, you can have an evolutionary approach where it's one data source, but we know there's this, you know, we like to talk about it sometimes as 45 degrees at a time, you know, 360, 40. You don't have to take the whole thing and boil the ocean and take everything at once. And, you know, with Pentaho, you can start there and then, you know, kind of, and it's easy, right? It's easier. Can we take a high-level view of the product portfolio and help people understand what's in there? I mean, we've talked about the end-to-end data pipeline. What comprises that end-to-end data pipeline, again, at a high level? Yeah, so if you think about, you know, the core of, I've got all these data sources here, and you kind of think about an IT world, right? And you've got to ingest that data and process it and cleanse it and all of that, and then you kind of keep moving past this sort of data engineering, and you get to the area that's really hot right now, this data prep area that's in the middle that's really about bringing IT closer together with the business, so the business is more self-sufficient, but you still want that governance. 
We have a lot of sort of tools in the middle, and Chris talked about those yesterday, things like auto model, inline modeling, so that a business analyst doesn't have to ask somebody in IT if they want to edit a data model, right? They're able to do that, they're able to iterate, they're able to publish it, and others can use it in the organization. And then you think about pushing through there, you get through that sort of center refining part that we do really well, and if you think about Pentaho Data Integration sitting across that whole pipeline, right? You're going to get to the other side, and Pentaho has reports and dashboards and analytics visualization, all of that. Predictive models, if you have predictive models, you can drop those right into Pentaho Data Integration in a transformation, and that is now part of that whole data flow, right? So there's a lot of capabilities there, but the real point is that we're bringing those two sides closer together. That's really where I think we're making a lot of headway with that auto model and auto publish, we're kind of closing that gap. And I think sometimes it's a harder problem for us to discuss, because we're looking at it in a governed way. We're saying when that data goes from here to here, we're going to know what happened to it, and that's a difficult problem. I don't know if anybody's completely solved it in our industry, but we've done a great job of figuring out how to do that and close it up. There are a lot of tools out there, there's a ton of tools out there, and many seem to focus on, let's get the data in, find the signal in the noise, and then let's pass it over to the guys who do the visualization or who operationalize it in an application. And the core value of end-to-end integration, is it fair to say, is that it makes it easier to do that hand-off, and that there's huge value in that seamless hand-off? 
Yeah, I would say absolutely, but the interesting thing is the vision, the goal, is there is no hand-off. Yes, with the other products, there's a hand-off, and here it's a collaboration. Yeah, it's a collaboration, absolutely, yeah. And if you think about those tools, they inherently start with technology and trying to solve a piece of that technology problem, and then it's just a piece of that, right? You still have this huge other picture out there, you know, it's sort of like that game where there's a piece of the elephant, you see part of it, and then you realize, oh no, it's a huge elephant, no Hadoop pun intended, but, you know, it's the same idea, right? You gotta have that bigger picture. So can we talk about governance a little bit? So you described this workflow, this data flow, and you've talked about governance, so there's a data quality piece that you address. What are the components of governance? How do you look at governance? How do you define that? How do customers view it? Yeah, so I think data quality absolutely is a huge piece of it. Lineage, so we just added lineage into 6.0, because if you think about that pipeline, lineage is all about, you know, it ended up here, but where did it come from? You know, I gotta be able to know where the data came from and kind of what its journey has been over time. And that's another part of the integration story. Absolutely, and that's part of governance, right? You gotta have lineage, you gotta have data quality. The other piece that we think is really important, because we're kind of managing it all the way across to the analytics, is, you know, all of the monitoring that organizations have in place. So we added the ability that, whether it's Nagios or whatever your SNMP monitoring system of choice is, you can now plug that into Pentaho, because we know that's important. 
So there's things like that. If you think about a pipeline in an enterprise, it is definitely about the data quality and the lineage, but there's also other things that go on in there, users, who's using what, who can see what, that you just have to manage, and you gotta have a big view of that administration. And that's another area where we've just really kind of been ahead. And how about security? Let's talk about that a little bit. You've got some partners here, I see Webroot here, a security company, and some others that you partner with. How does security fit into the whole end-to-end data pipeline? Yeah, I mean, I think that for Pentaho, we've done specific things, you know, work with Kerberos relative to Hadoop where there's security concerns. We've added other security capabilities into that pipeline, but in some ways, you know, managing that pipeline, you have to think about the security overall. But if you have a data warehouse, right, you're going to have security measures on that. And then you think about the cloud, right? You know, I was talking with, I think it was the FINRA guys, and we were talking about, you know, it's interesting that financial services folks are putting all this data in the cloud. And I said, you know, how do you guys do that? And they said, oh, well, we, you know, think about the organization they are. Well, we've got a lot of things that we've built. So some of it will come from us as the vendor to manage the pipeline. And then the organizations are obviously going to have their own very specific things that they put in place, and that's great. I think for us, if it's open, we'll live within whatever that world is. So we're not trying to solve every security problem, but we know in terms of that pipeline and what we see and what we manage, you know, we're going to work to make that more secure. Well, it is interesting. FINRA, NASDAQ, I can't remember who we asked. 
We had a discussion about that on theCUBE yesterday and they said, listen, their security is better than ours. So I mean, you know, we're happy with it. We don't see a problem there. It's kind of my words, but I think most companies, you know, Amazon and Google have got pretty good security, Microsoft, you know. Right, right. So, you know, you had mentioned, like, you guys came along before big data and then you sort of pivoted when you saw what a huge opportunity it is. Would it be fair to say that the traditional BI products grew up in an age where the data warehouse was the single trusted repository? And so visualization and reporting were natural complements, but when we have a sort of a data lake and it's not trusted and it's kind of messy and there's all sorts of technology sort of bubbling up, you know, that we did need now an end-to-end product and that that was the opportunity? I think the end-to-end was always an advantage for us, right? Because people liked that, having that data integration platform, especially when we were in a market where they were probably more medium-sized businesses, that they needed all of that in one big, you know, not a huge IT organization. But I think for us, it was really more about, it was hard to apply the same kind of methodologies and thought process from a data warehouse world when big data emerged, right? Because all of a sudden, you know, I was mentioning before, you have all these data types that we haven't dealt with before and some of them are coming fast, they're coming in different formats, the volumes are huge. And then what I think is interesting, you know, I was talking about Edo and Lucky Group and RichRelevance, those were three companies that have been with us since like 2012. 
They were the ones who really at first leveraged the ability to take the data from all the new data sources, and because they had the capability to bring it from their data warehouse, were able to blend and do really interesting things relative to their business because they were able to bring these two data sources together. So I think for our customers, it was great to have that end-to-end platform to manage the pipeline, but also that blending capability. It was like suddenly, we can capture that data and do it, but oh wow, we can actually blend it with other data. Oh wow, now look at what we can see about our customers, right? We can look at what they were looking at on a website and we have everything they purchased the day before, right? That's an amazing competitive advantage from a business standpoint. But some of that integration work and the blending stuff, just having a sort of seamless repository, it was more there in a data warehouse only world, no, as opposed to- Yes, absolutely. And so that's why we talk about places of work, right? And you need something that can kind of manage that data across those different places of work because we absolutely know it's not going to be one place, right? We have some customers where it's all in Hadoop and that's great and we can work with them, but in most cases there's going to be a data warehouse. We added the virtualized data sets, right? There's going to be organizations that want to virtualize those data sets and then they want them to go away. They don't want to stage them somewhere, you know? They just want that to be part of the picture. Donna, we're out of time, but last question. October 2016, where do you want to be? What should we be looking for? Pardon me? So October 2016, where do you want to be? What should we be looking for? That's the end of the- Where do we want to be? I think mostly it'll be, probably the HDS and the Hitachi investments. 
I think you're going to see us, just in terms of customers, scale in an amazing way, and we're going to be able to leverage a lot of the new technologies with all the resources we're going to have. So I just think you're going to see where we are now, only bigger and better next year. Great. All right, Donna, thanks for coming on theCUBE. Really appreciate your time. All right. Good to see you again. Thank you. All right, keep right there. We'll be back with our next guest right after this. This is theCUBE. We're live from Pentaho World 2015.