Live from Orlando, Florida, extracting the signal from the noise, it's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.

Welcome back to Pentaho World, everybody. This is theCUBE. We're here live, day two at Pentaho World. theCUBE goes out to the events; we extract the signal from the noise. Chuck Yarbrough is here. He's an evangelist in marketing, a guy who understands technology but also communicates it to customers. Chuck, it's great to have you on theCUBE. Thanks for coming on.

Well, thanks for inviting me. I appreciate it.

You're welcome. So tell us about the conversations you're having with customers. You spend a lot of time speaking, getting in front of people. What's the conversation like?

It's interesting. I travel a fair amount, and we do a thing we call the city tours, where we go and talk about our blueprints, which is an interesting topic. The blueprints we talk about are the big data blueprints, and they're basically ways that people can get real value out of big data efforts. So to answer your question, what I talk to them about is: what are they doing in big data? What are they thinking about? And it's interesting. We kind of live in the big data world, but there are still a lot of people that are just starting to engage, right? Just getting started.

So let's break that down. You've got the leading edge; they're probably dragging you in: here's what we're doing, how can you help? You've kind of got the fat middle, trying to figure out, all right, what do we do? Maybe they're doing some stuff, but they're probably not getting the return they'd like just yet. They're getting the stuff to work, but they're not having a huge business impact. And then you've got the guys that are just coming on board. Simple buckets, obviously, but how would you describe each of those in terms of the way they use your product?

So the first group, those early adopters, they knew early on that they needed to do something. And I'm not talking about early in general; I'm talking about our early adopters. They were people that were using Pentaho, but they knew there was something about this big data technology. They were watching the Twitters and the Facebooks, watching what those companies were doing, and saying, hey, there's something there, we need to use some of this technology. They were like pioneers: they weren't the ones that invented it, but they saw it, they recognized there was an opportunity, and they made it work. So we worked with them early on, coming out of open source. It made sense; they were driving down those areas. In fact, if you get a chance, on our website we have some videos where we went and interviewed some of these people. We call them the mavericks of big data. And really, they're not talking about Pentaho; they're talking about what they did, what they were thinking, and why they went through a lot of pain, because it wasn't easy. So there are pretty interesting insights in what some of them have said. When you get to that middle, those are the guys that have kind of been watching, and now they recognize that they have to participate. And there are lots of companies that are well into doing big data efforts.
And again, it goes back to our blueprints. So just to give you a little explanation of the blueprints: think of them as design patterns. The technology is different, there are new things popping up all the time, so how do we get the best benefit in the quickest and easiest way? Really clear, repeatable design patterns. Things like data warehouse optimization. Something we refer to as the data refinery, which is an interesting and very pragmatic way of doing big data analytics, something customers are really adopting. And then another one I tend to speak about a lot is the customer 360. Really, it doesn't have to be a customer 360; it's the idea of understanding everything about something. It could be anything, it could be a product, but usually we talk about it in terms of a customer 360: all the touch points. So these blueprints are really the learnings we got from our internal teams and from our customers, those early adopters we talked about: what they learned, what they created, and sometimes the pain they went through. We make that available to our customers so they can follow those recipes. Not necessarily a reference architecture, but a blueprint that helps them understand how to build the foundation they're trying to build.

So let's break those down a little bit: data warehouse optimization, the data refinery, which I love, and the 360 degree view. On data warehouse optimization: when we talk to practitioners in the Wikibon community, the vast majority say the two most critical pieces of their big data analytics initiatives are the existing legacy data warehouse and data integration. But by the way, they also say, we're sucking a bunch of data, and a bunch of dollars, out of the traditional data warehouse into Hadoop and big data. So there's that interesting dynamic too; I call it the big sucking sound. What is data warehouse optimization, and how does it fit into that context I just laid out?

Yeah, so the way we look at it, and basically what we've heard from our customers, probably the same things you've been hearing: you have a data warehouse. What happens to that data warehouse over time? It gets big. It gets bigger, and then it gets even bigger. That becomes a challenge. It becomes unwieldy, hard to manage, and it can also become extremely expensive. So what are your options? The design pattern we have for data warehouse optimization is pretty simple. You have your existing data warehouse infrastructure; we're not suggesting you get rid of that, not today. What we're suggesting is: implement a big data platform, something like Hadoop. Most of our customers are implementing Hadoop for that purpose. Take the data that you're not necessarily using as often. If you think about the temperature of data, you've got your hot data that customers are constantly accessing; maybe that stays in the data warehouse. Push some of that cooler data, the cold data, out, whether that's based on time or whatever; you have to look at the use case. Push that data into Hadoop, and then make it available through Hadoop. What you're doing is making those size constraints easier to manage. You can bring that data warehouse down in size. But what you need is a way to not only push down, but also to bring some of that back. So the sucking sound can go both ways; you need to be able to communicate both ways. That's where Pentaho plays really well, with our data integration: being able to push that out of the warehouse into the big data store and then leverage that data in multiple ways.
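To make the temperature-based tiering Chuck describes concrete, here is a minimal sketch in Python. The `warehouse` and `hadoop` objects are hypothetical stand-ins for the JDBC and HDFS connections a real data integration job would use, and the hot/cold window is an assumed parameter; this is the shape of the logic, not Pentaho's implementation.

```python
"""Minimal sketch of warehouse 'temperature' tiering: keep hot rows in
the warehouse, push cold rows down to Hadoop, and keep a return path so
data can come back. The `warehouse` and `hadoop` objects are
hypothetical stand-ins for real JDBC/HDFS connections."""
from datetime import datetime, timedelta

HOT_WINDOW_DAYS = 90  # assumption: rows newer than this stay "hot"

def split_by_temperature(rows, ts_field="event_time"):
    """Split rows into hot (recent) and cold (archival) sets."""
    cutoff = datetime.utcnow() - timedelta(days=HOT_WINDOW_DAYS)
    hot = [r for r in rows if r[ts_field] >= cutoff]
    cold = [r for r in rows if r[ts_field] < cutoff]
    return hot, cold

def push_down(warehouse, hadoop, table):
    """Move cold data out of the warehouse into the big data store."""
    hot, cold = split_by_temperature(warehouse.read(table))
    hadoop.append(table, cold)       # cold data lands in Hadoop
    warehouse.overwrite(table, hot)  # the warehouse table shrinks

def pull_back(warehouse, hadoop, table, predicate):
    """The return trip: bring a needed slice of cold data back up."""
    wanted = [r for r in hadoop.read(table) if predicate(r)]
    warehouse.append(table, wanted)
```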
Okay, so it's not a one-way trip to the bit bucket, because you have to get value out of that data; you've got to be able to bring it back. Let's talk about the data refinery. Mike Gualtieri yesterday in his keynote said he hates the bromide, the idiom, "data is the new oil." He said he wanted to use the sun instead. I happen to like it, except I always point out that the data has to be refined. That's why I like "data is the new oil," because it's just like that: okay, great, you've got a bunch of muck coming out of the ground, but what do you do with it? Talk about the data refinery. What's that all about?

Yeah, so the refinery concept, frankly, has been around for a long time. It's not necessarily new, but there aren't a lot of people talking about it. Pentaho talks about it a lot; I think IBM talks about it. From my perspective, it's all about leveraging the technology that's there. Think about that data warehouse optimization. By the way, data warehouse optimization is a great place to start; if you're just getting into big data, it's an awesome place to start, because it's well self-contained, and it delivers that platform and a good place to grow from. So now the data refinery: you've got Hadoop in place, you start bringing your data in, and as you get what you need in there, and it can be sizable, the challenge is running visualizations or reports or analytics directly on top of that. To your point, you still need to blend multiple data sets. There's data coming from all over the place, whether it's internal applications, external applications, devices, all this data funneling together. So we need a way to refine that data, enrich it with other data, to make a specific data set that's going to be of value. We have our blueprint for the data refinery, and it's really a design pattern. Think of it this way: when relational databases were brand new, it was all about getting data in. Order entry systems: get the data in as quickly as possible. Then somebody said, hey, now let's run some reports, and it was like, whoa, how do we do that? It's a lot harder. In the same way, we've got Hadoop, we're putting data in, and now it's: how do we get the value back out? How do we analyze all that information? The refinery enables you to process at scale and make that refined data set, which is highly valuable and ready for interactive analytics, available to a user immediately. In what we refer to as the streamlined data refinery, it's really about automation: processing that data at scale, doing the refining, the enrichment of that data, all the appropriate blending, with governance, making sure that data is absolutely right. And then automatically delivering that data to a user, which means, in the design pattern, we recommend you push it into an analytic engine for high performance analytics, something like HP Vertica, or it could be Redshift in the cloud: an analytic database. And at the same time that data is made available, we auto-generate an analytic model to support interactive analysis by an end user. So it's a completely automated process. The business user is the one who actually kicks off the process: they say, hey, I need this information. Blend multiple data sets together, right now, complex integrations, and refine it, make it available.
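As a rough illustration of the streamlined flow Chuck walks through, the sketch below blends two small data sets, applies an enrichment and a governance check, lands the refined set in a stand-in "analytic store," and returns a simple auto-generated model. The names and the dict-based store are invented for illustration; a real deployment would load an engine like Vertica or Redshift, as he notes.

```python
"""Sketch of an on-demand 'streamlined data refinery' run: blend,
enrich with governance checks, deliver, and auto-model. A plain dict
stands in for a Vertica/Redshift-style analytic store."""

def blend(datasets, key):
    """Join a list of row-dict lists on a shared key field."""
    merged = {}
    for rows in datasets:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

def refine(datasets, key, enrich, is_valid, store, table):
    """Blend -> govern -> enrich -> deliver, as one automated step."""
    refined = [enrich(r) for r in blend(datasets, key) if is_valid(r)]
    store[table] = refined  # "load" into the analytic store
    # Auto-model: the columns a front end can offer for analysis.
    return sorted({col for row in refined for col in row})

# Example kick-off, as if a business user requested a blended view:
analytic_store = {}
crm = [{"cust": 1, "name": "Acme"}, {"cust": 2, "name": "Zenith"}]
web = [{"cust": 1, "visits": 42}, {"cust": 2, "visits": 7}]
model = refine([crm, web], key="cust",
               enrich=lambda r: {**r, "engaged": r.get("visits", 0) > 10},
               is_valid=lambda r: "name" in r,
               store=analytic_store, table="refined_customers")
print(model)  # ['cust', 'engaged', 'name', 'visits']
```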
So this leads me to the third, the 360 degree view, and you kind of need the first two. I want to make a comment: I'm always skeptical when I hear "360 degree view," because I've heard it for decades, from decision support to the EDW to BI. We're now hearing it again with big data, and I want to ask what gives you confidence that this big data, Hadoop world will actually live up to the promise that was not lived up to in previous decades. It seems like you've got to optimize the existing data warehouse, you've got to refine that data, and you've got to get the data quality and the governance in as prerequisites to this 360 degree view. So Chuck, talk about that. What gives you confidence that you can actually see that vision through?

Great question. So on the 360 view, we've got a number of customers doing really interesting things around their customers. You mentioned Mike Gualtieri, who spoke yesterday; Forrester talks about this being the age of the customer. So much of what our customers need to do, and what we've heard from them, is that it's not just about generating revenue; it's about improving the customer experience. Anything you can do to understand all the touch points with your customers makes a huge difference. What's different is that in the past we really couldn't manage the speed at which data comes in and the number of places it comes from. Things are happening fast and furious, in wildly different formats, and today's technology enables you to bring that together, simplify the process, make it available, do some interesting things, and then perform advanced analytics.

So my question is, again: Hadoop gives you a lot of choices in terms of how to put the pieces together to do this. You have this heat shield layer that insulates, to some extent, the complexity. So how much further along are your customers in this journey than the mainstream Hadoop customers? When I say mainstream, I don't mean the fat middle; I mean the preponderance of Hadoop customers out there. Where are your guys?

Great question. I would say it's because of the way we insulate between the Pentaho layer and that big data layer. And you mentioned, George, that there are multiple ways to make things happen, to process data in Hadoop. One of those ways is through Pentaho Data Integration with our Pentaho MapReduce: a visual way of building a MapReduce job. It's easier. It's not a code generator; it's not a code-driven process. The advantage is that people with the kind of skills they've had in the past around data warehousing, ETL-type skills, can better understand how to make that shift into things like Hadoop, as opposed to writing Pig scripts and JavaScript and Java code.
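For contrast with the visual approach, here is roughly what one hand-coded job looks like at the level Chuck is describing: a Hadoop Streaming map and reduce written by hand (in Python here, standing in for the Pig and Java he mentions), totaling sales per region from a made-up CSV layout. A visual transformation expresses this same logic without the code, and without the maintenance burden he raises next.

```python
"""Hand-coded Hadoop Streaming job: total sales per region, from a
made-up "region,amount" CSV. Both halves live in one file for
illustration; Streaming would invoke them as separate -mapper and
-reducer commands."""
import sys

def mapper(lines):
    # Emit "region<TAB>amount" for every input record.
    for line in lines:
        region, amount = line.strip().split(",")
        print(f"{region}\t{amount}")

def reducer(lines):
    # Hadoop delivers mapper output sorted by key; sum per region.
    current, total = None, 0.0
    for line in lines:
        region, amount = line.strip().split("\t")
        if region != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = region, 0.0
        total += float(amount)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Local simulation: python job.py mapper < sales.csv | sort | python job.py reducer
    {"mapper": mapper, "reducer": reducer}[sys.argv[1]](sys.stdin)
```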
When they make that transition, are they finding it easier to move to Pentaho or easier to move to Pig scripts? How much faster can they go with you than with just raw Hadoop?

Yeah, you mentioned the heat shield; that was brought up yesterday, and I kind of like it. It's this idea of future-proofing. We literally have what we call an adaptive big data layer that insulates the developer from the underlying complexities of these different, say, Hadoop distributions. So depending on where your skills come from, Pentaho MapReduce enables you to simplify the process. But as well, there are multiple ways within the Pentaho environment to deploy on Hadoop; under YARN, you have different options. There are different ways you can do things; you have to look at the process. But I think your question was: does somebody who writes Pig scripts get more benefit out of PDI?

Well, maybe, because they can use a GUI tool. That's my question.

Yeah, exactly. In our world, you get a lot of developers who'll say, hey, you know what, I can just do this in my own code. And that's probably true; they can. But then it's a little more challenging to maintain. Who's going to follow that up, right? So there are all those complexities. What I've found is that really competent Java programmers and Pig script developers have adopted Pentaho because, frankly, it is a little faster and easier to use. And the ability to make it more maintainable is a huge benefit, particularly to an enterprise, where that's becoming really important.

Chuck, one of the things you focus on in your efforts is obviously differentiating from the competition. How do you differentiate? Maybe lay that out for us.

Yeah, well, it would be our ability to manage and automate the entire analytic pipeline. Think of how data moves. At least the way I used to think about data was as something we would take from a source and put somewhere, and it would just stop. That would be it; we'd run reports off of it. The data warehouse concept. Today, that data never stops. Maybe a copy of the data goes somewhere: it could go to a data lake, a data warehouse, a refinery, a multitude of different places, based on the use case. So you have to be able to manage that and then deliver the result, because that data continues to move for a purpose: for analytics. Maybe it's a visualization. Maybe it's an analytic that feeds an advanced algorithm for some other process. Maybe it goes to a machine. So you have to look at the way data moves and how it moves differently. I think the differentiator for Pentaho is that we manage that entire process. We don't always do every little thing; we can cleanse data as part of the flow, but we have partners who do things like address verification, data quality management kinds of things that maybe we don't do. So it's that orchestration platform, able to support that all the way through and deliver the analytics in an embedded fashion. Many of our enterprise customers are moving toward that embedded experience for the end user. So it's about preparing data, delivering it with governance, and being able to experience it in the right way.
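One way to picture the adaptive big data layer and the end-to-end orchestration Chuck describes is a pipeline defined once, with pluggable engine adapters deciding how it runs on a given back end. This toy sketch is purely illustrative: the adapter interface and engine are invented, and Pentaho's actual layer is far richer.

```python
"""Sketch of the 'define once, run anywhere' idea behind an adaptive
layer: a pipeline of steps plus pluggable engine adapters, so the same
definition can target different back ends. Illustrative only."""
from typing import Callable, Iterable

Step = Callable[[Iterable[dict]], Iterable[dict]]

class Pipeline:
    """An ordered set of transformations, independent of any engine."""
    def __init__(self, *steps: Step):
        self.steps = steps

class LocalEngine:
    """Runs steps in-process; a MapReduce or Spark adapter would
    translate each step for that engine behind the same interface."""
    def run(self, pipeline: Pipeline, rows: Iterable[dict]) -> list:
        for step in pipeline.steps:
            rows = step(rows)
        return list(rows)

# The same pipeline definition could be handed to any adapter:
clean = lambda rows: (r for r in rows if r.get("amount") is not None)
enrich = lambda rows: ({**r, "tier": "hot" if r["amount"] > 100 else "cold"}
                       for r in rows)
pipe = Pipeline(clean, enrich)
print(LocalEngine().run(pipe, [{"amount": 150}, {"amount": None}, {"amount": 20}]))
# -> [{'amount': 150, 'tier': 'hot'}, {'amount': 20, 'tier': 'cold'}]
```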
All right, Chuck, we'll have to leave it there. Excellent. Thanks very much for coming on theCUBE and sharing your insights and what customers are telling you. You've got a great angle on these things. I really appreciate your time.

Well, thanks for letting me be here. Pleasure.

Keep it right there, everybody. George and I will be back with our next guest right after this. We're live; this is theCUBE, from Pentaho World 2015 in Orlando. We'll be right back.