Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015. Hey, welcome back everyone. You're watching theCUBE live in Silicon Valley. This is our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier with my co-host, Jeff Kelly, Big Data Analyst at Wikibon. Our next guest is Donna Prlich, VP of Product and Solutions Marketing at Pentaho. Welcome to theCUBE. Thank you. Thanks for coming on. Big news, obviously, the big announcement with HDS, Hitachi Data Systems, and kind of that partnership, merging, kind of acquisition and whatnot. Hitachi is known for storage and they've got big storage for the big data, more of the big data week. So what's going on? I mean, has your world changed significantly, or is it business as usual? Yeah. How has that announcement shaped what you guys are doing now, if at all? Yes. What's going on? There's been a lot of excitement for sure. So, you know, for us it's great because we really have built up this big data brand over the last few years and Hitachi's direction is social innovation, internet of things. We're really good at, you know, dealing with big data, machine-generated data. So we're excited for that. It's going to help us scale, and we also get to keep our brand, which is nice, and we're a fully owned subsidiary. So it's pretty much business as usual, and I can't really say much more, obviously. Because it hasn't closed yet. It hasn't closed yet. It's coming around the corner, and these usually take some closing time. Exactly. But it hasn't changed what you guys are doing, being independent? No, in fact, if anything, it's going to be more business as usual, just because we're dealing with so many big data customers, and early ones, that I think we're going to add a lot of expertise in that area and help the Hitachi folks in that direction. What are you expecting this week?
Obviously, on the solution side, we're seeing much more expansive discussion around solutions, all right? Big data, analytics, IBM's talking about cognitive in their vision, and other people are talking about in-memory with Spark, a lot of variety. But it all comes back down to customer solutions, and you're seeing the open data platform as an accelerant by their narrative. What does all this mean for the marketplace, for the customer? What are the key solutions that are coming forth from all this growth? Yeah, well, you know, the interesting thing is it's definitely a sign of how much disruption has gone on and really how confusing it's been for the last couple of years. We had a lot of early customers who were looking at big data and trying to determine how to get value from it. Some early innovators like EDO Interactive, for instance, they do loyalty programs for the credit card companies. And so we were really looking at what are some of the patterns that emerge for these types of companies? And one of the things we realized early, relative to what's been happening with the disruption now and the different technologies, is they really want to figure out, first of all, how to get value from the big data and then how to sort of protect themselves from this environment, right? And make good decisions and be able to be flexible, because this is a very changing world and there's a new technology every day. So our approach has definitely been one of let's figure out what our customers are doing, keep those reference architectures in mind, and help them really with what we've learned. So talk about Pentaho's approach. So it's kind of blending the data integration part of the equation with the business intelligence and the visualization. Talk about at a high level why that's important, why Pentaho sees that as not separate kinds of workloads, but two things that need to be integrated.
Yeah, well it's pretty interesting because we've always sort of had that approach of BI and DI better together, but I think for big data, and especially in the blending, it's really important. So in our view, in order to figure out how to have successful analytics on this side, you've got to be able to actually capture that data and blend it. And one of the things that's emerged for us is this concept of, I get my data into Hadoop, for instance, and with the product we've announced today, Pentaho 5.3, how do I refine that and blend it and then deliver those data sets to the people who need them. And so in our world it's really about that data flow and almost powering those analytics. You've got to be thinking about where does the data come from and how does it arrive at the analytics. The other thing is, if you figure out how that data flows, you usually can identify where somebody's business pain is, right? You know, if the data's stuck somewhere, there's usually a business pain, and that's been really helpful. You have this concept of the streamlined data refinery. Talk about that a little bit, and that kind of builds on what you just mentioned, having to understand where the data's coming from in order to bring it through to the other side of the analytics. Exactly, so about two or three years ago we really started to look at what are the patterns that are emerging with our own customers, and one that we saw over and over and over again was this concept of customers who had data in Hadoop but then really needed a way to streamline it and provide slices of data to individual consumers. And the streamlined data refinery that we announced back in October with 5.2, this is really building on that, and what it allows the customer to do is to follow that process of data ingestion into Hadoop and then, once you've got that data there, really bring that into a high-performance analytic database. So you think about a refinery, right?
Just like any kind of refinery, an oil refinery, and then once it's staged there, to be able to deliver it to end users. And there are some really nice capabilities in our front-end tools around auto modeling and auto publishing, so that that process of bringing the data and the analytics together, and the IT and the business, we automate a lot of that, so it really simplifies that process. Data factory is a great term that we remember from the first Hadoop World, which we did five years ago now. It's huge. So will there be more factories per customer? Is the factory and refinery concept the same thing? So is it one monolithic factory? Is it slicing and dicing? Can I create multiples? Yeah, well that's the nice thing about how this refinery's built out. Because we've connected our analytic capabilities on the front end with the ingestion and the blending, it becomes a single delivery to whoever needs it, so we don't have to have a giant monolithic factory in the middle. It's almost like refining to smaller factories, if you will, for different constituents. So talk a little bit about the market. I mean, we saw yesterday the big announcement was around the open data platform, we saw Hortonworks and people that were kind of getting a little tight there, and of course you've got Cloudera and you've got Amazon. An interesting market is playing out right now, and a company like Pentaho, you kind of play above that layer, right? So you're helping, doing some of that ingestion, some of that refining, and then the analytics. How do you look at this market? How do you approach things like partnerships, things like tightly integrating with the Hadoop vendors, whether it's Cloudera, Hortonworks? I know you also recently have a partnership with AWS for Redshift. How do you kind of look at that layer just beneath what Pentaho does, and how are you navigating that with all these quickly developing alliances?
Yeah, well that's a really good question, and if you think about a lot of the announcements, we're really looking at open, right? So we've come out of open source, that's where we lived for a long time, and it served us well at the beginning with big data because we have something called the adaptive big data layer that's been able to allow us to natively support many different Hadoop distributions and NoSQL data stores like MongoDB very easily. If you're a customer and today it's Hortonworks, tomorrow it's Cloudera, that's gonna be a simple process for you because we've got this interface there that makes that simple. So that was really the approach, and I would say in general that with all the disruption in the market, our goal really is to give customers choice, and this open, pluggable platform that we have really supports that. I mean, we don't see very many environments that don't include a little bit of cloud, a little bit of Hadoop, maybe there's some NoSQL, maybe there's an analytic database. So our approach is to leave that open so we can support that environment. So the idea is you build an application using Pentaho, you can move that to other environments fairly easily and avoid the dreaded vendor lock-in that we've heard about in the old traditional data management world. Exactly, and we found that being able to walk in and not disrupt the infrastructure that's there has been really important, and having an open platform allows us to do that. So if you've got an enterprise data warehouse running and that's running your business, we certainly are never gonna say, take all that data and put it somewhere else. That's working really well for you. The legacy stack, they're okay. They can deal with that, but coming in with open source, that's a big deal. And then let's say you want to blend data, right?
Well, you need to be able to deal with legacy systems that have very valuable data in them, and if you've got data coming into Hadoop that maybe is unstructured, maybe it's machine-generated data, that's really the beauty of it, to be able to blend those together. As the integration has gotten better, sorry Jeff, the integration has come up, it brings up the integration piece. That is a big concern. Is it getting better and better? What are the key drivers for the integration piece? Because that's what you're addressing here: okay, I've got my legacy, I've got a big investment. I may, after that, slowly over time move to something more modern, but I have new apps and new requirements. I'm going to deploy something fresh. What's the integration challenge? Can you comment on that and kind of talk a little bit? I think number one, the tools have gotten better. Our tools are definitely in a place where for big data we've simplified a lot of that. I mean, it's much more graphical, GUI-based. If somebody doesn't have the resources to learn MapReduce, not a problem. They can use Pentaho, we really simplify that, and take a lot of time out for many of our customers. So that part of it's gotten a lot easier, and then if you think about what we've done with the streamlined data refinery, we've basically put an interface in front of the user where they're able to request and kind of choose specific pieces of an analytic data set that they're interested in, but they're shielded from that really complex transformation behind it. So I think two things have gotten better. One is on the developer side, the tools are better. There we've actually optimized ours for big data sources, which is really a whole nother world of data sources to deal with.
And then on the front end for us, closing that loop between the IT and the business with something like a streamlined data refinery, where the user actually has the ability to run, I mean, they're kicking off an ETL transformation, but they don't know it because they're behind a nice user interface where they're just kind of choosing what they need. If it's a financial services analyst, right? They're maybe saying, I need to look at this stock symbol, and NASDAQ actually was one of our customers we mentioned today. If I'm looking at a certain stock symbol and I need to look at a certain date, a period of time, I'm changing that up, kicking that off through an interface. The transformation behind it could be really complex. It could have Hadoop in it. It could have an enterprise data warehouse. Simple, right? It just simplifies all that. And I think that's really where we're seeing things going, closing that gap between the two. Abstract away that complexity, John, so you're giving the business user more power. Exactly. Get done what they need to get done without having to go to IT or a data warehouse administrator and say, hey, build me this report, model this data, deliver this answer to this question kind of thing. Yeah, and in many cases, that just doesn't really work anymore, right? Well, by the time you get the answer, with how fast business moves today, it's just not plausible, really. So talk a little bit about open source in general. So, I mean, a few years ago, the idea of open source software in the enterprise was a little scary for a lot of enterprises. Now it's just becoming, this is the way business is done. That's where the innovation is happening. That's where we're seeing ecosystems being built. How have you viewed that transition? Have you seen a dramatic change in the way enterprises view open source software? Yeah, well, it's interesting because coming out of being an open source BI company, we for a long time had to overcome that.
And when we got into the big data world, it was really having to prove ourselves out. And we have. Now it's often a requirement, right? There's a requirement for open source when we get called in, which is great. I mean, I think the biggest thing for us in terms of open in this kind of a market is just the ability to innovate. And so, you know, you mentioned Spark, right? And a year and a half ago it was Storm, and we've done a lot of things with YARN. When those types of technologies emerge, it really allows us to have our community out there banging on things, trying things out, creating plugins. We see what's viable, but we don't have to have a, you know, closed proprietary engineering organization that's massive in order to test out those technologies. I wanted to ask you about the comment we heard earlier on theCUBE, that Hadoop's now a boardroom conversation. So that's what they're saying, and, you know, I'm like, come on. Show me one boardroom, Hadoop. And they said, yeah, in private equity-backed companies, that modern stuff that's all born in the cloud, certainly Hadoop's a known word. But big data certainly is a boardroom conversation. What have you seen out there in terms of those kinds of conversations? And what are they like? I mean, are they specific to the geekiness of it? Is it more higher level, macro, or what have you found? Yeah, that's a really good question. So, you know, I mentioned the blueprints that we identified a couple of years ago. And what we really found was, it was a different conversation with customers, because we would go in and talk about what we had seen and really ask them, you know, where is this pain? Because there's some pain that's going on, right? I mean, you think about NASDAQ, it's like 10 billion rows of data a day. That's a problem, right? We've got to figure out how to manage that. And that affects part of your business, right?
And also when you ask them about real time, I mean, one little problem, if they miss a minute. Exactly, and you know, compliance and regulation and all of these things, right? And so that's really the conversation. Because, you know, we do end up having pretty technical conversations. Oftentimes we'll be with a CTO. When we've put out these blueprints that really say, well, this is what we're seeing: an enterprise data warehouse, you might have Hadoop, you might have NoSQL. There's like this light bulb that goes off and they're like, that's exactly it. That's exactly what we have. That's our problem. We've got this in an enterprise data warehouse, but we know we need these other data sources that are coming in that are new. We've got to make this all work together. And that's the conversation. So if I can connect the dots then, hmm, if I had like a storage, converged infrastructure platform that could move very fast towards trading day, getting faster time, you know, these kinds of things are what people are talking about. Is that kind of the boardroom frame? Exactly. There's another customer we have where they'd have to have analysts spend, you know, 30 days trying to get to data sets. They're looking for fraud, right? With the streamlined data refinery, if we can say, you know what, we can take that time down to 30 minutes because now you've got an interface in front of your analyst. They don't need to know about ETL transformations, nor do you want them to. But if they can do that, then guess what? When somebody comes in and says, now you've got to recreate that, you know, because of some regulation, they can go back and generate that data set. That's huge, right? That's transformative. I mean, I think that's the cool thing about where we are, it's transformative. The other thing I want to ask you is kind of more of a personal question around the industry. You know, we were talking earlier about the UNIX generation.
And, you know, Michael also wrote a blog post about the whole open data platform. And, you know, we've lived through it, and knowing your background, you know, being in the business, going back to the 80s, that was the systems business, you know, in Berkeley, you know, where it all started really, pretty much, and we all coded in those system days. But what's different now? I mean, back then it was pretty obvious. Sun, IBM, HP had the big UNIX systems, right? But is it different now? I mean, I'm just having a hard time personally making that leap of faith that the UNIX comparison is the same. I just don't see it the same way. It was a different landscape back then. Now it's more open. You've got the cloud. You have things like Amazon. It's not apples to apples anymore. So is it possible for these new things to happen? And can you make that UNIX comparison today? Yeah, I mean, that was interesting. I was just reading up a little bit about that before we started. And I think there is a comparison, right? Because it was a challenge, right? Organizations are like, we can't deal with this, right? Vendors can't sell anything because customers are like, well, which one do I pick? Which flavor? Tomorrow it's another one. So that part of it to me, I think, is similar. There do have to be some sort of standards that start to emerge. However, I do think that if you really look at it, it's different because we're connecting people and things. And data doesn't sit in one place anymore, right? So you start thinking about everything that's coming out of your phone and cars that have sensors in them. It's a bigger challenge because this data is so different and large. And we're working off a paradigm of storing it and then analyzing it. So the question is, is that really where we're going to be in five or 10 years?
So I think it's different in that sense, that we're not really solving a problem where there's one technology that's going to fix it. Or vendors, there's no real winner in the data world. Data's everywhere, right? Yeah, I mean, one thing that we do see is the embedding on our side. Having that embedded ability is huge because you start thinking about the apps and where things are processed. Embedded data or embedded? Embedded analytics. Embedded analytics is... Like in memory or in the processor, in the silicon? Yeah, well, for instance, you can embed Pentaho, right? You can put analytics... Well, bring it right to the application. Yeah, in the application. Where people do their work every day. Whereas back when I was covering the BI space very specifically, back in my old days as a journalist, what I kept hearing was, as a user, I don't want to have to go to a new application outside of where I do most of my work. Bring the analytics into the application now. So the BI space has been trying to do that for quite a long time. Where Pentaho has an advantage there is with the open nature of your platform to integrate that into applications. Yeah, and almost every... I mean, the streamlined data refinery, we talked about NASDAQ, those are all embedding on the front end. So how they're pushing those analytics out is Pentaho, but generally they're embedded in some other application because, to your point, we don't want to have to go to another tool, right? We were talking yesterday about Halliburton, who's one of our customers, and they have Pentaho embedded in like 12 different applications. One of them goes out to guys sitting on an oil rig. Well, they're not going to bring up a BI tool. They need to have their dashboard and see their data every day. But I think that's becoming more and more important because of that sort of point of impact.
Where do I have to make that decision, that business decision, right? It's different. And it's about moving beyond just, here's the analytics embedded in the application, to the recommendation embedded in the application. Or what should I do next? I don't even want to take the time as a frontline worker. I don't want to necessarily look at a pretty visual. What do I need to do next to do my job? Exactly. While still giving them some level of autonomy to question that and dig into that if they want to. So some interesting stuff. So my last question would be around the market and how you see this evolving. So we talked a little bit earlier, briefly, about the pending acquisition. We've seen the acquisition pace pick up in this space. We've seen Jaspersoft, another company that was acquired, which often gets mentioned with Pentaho; both have open source BI roots. We saw Revolution Analytics get acquired by Microsoft. So we're starting to see some acquisitions happening. And so you'll probably be down on the floor at Hadoop World later today or tomorrow, and there should be a lot of startups out there, a lot of different companies. You know, generally how markets work, they're not all gonna make it. Some are gonna have great exits, some are not. How do you see this evolving? Do you think we're on the verge of kind of a significant wave of consolidation? Do you think that's good for the market? I think that with BI specifically, and you can extend that to what's happening with big data, it's just in disruption, right? So there's always gonna be vendors who make it, vendors who don't, and there's gonna be some consolidation, et cetera. I think that's essentially, and probably why you're seeing some of the open companies, you know, being acquired, because that's really where things are going, right? I mean, I think that's the value, that ability to be open.
And if you're gonna be part of any kind of established infrastructure, new infrastructure, platform, whatever, having open technology is just gonna make it a lot easier. So I gotta ask you, what do you think about some of the other approaches? I see Oracle's been very successful lately. I see that's where Sun went to. Cloudera has a relationship with Oracle as well as Intel. So they're straddling the lines between big companies and also open source communities. And I'm talking about Michael's blog post. So this balance between the incumbents, who aren't necessarily in a strong position other than their market position with customers, and the upstarts. So how do you view that, having that perspective of coming through the multiple cycles of innovation we've been through? I mean, it's kind of a new era. How do you look at that? And what would you share with someone coming into the industry? Say, hey, back in the day, we used to walk with your feet through the snow. You know, I mean, it's almost- You had informed expense marks, you know? You had Ingres data. I was like, you know, command line. So again, this is a new generation coming in. There are some amazing things happening. Open source has certainly changed it. But is there going to be that kind of, I mean, Oracle's Red Stack? But like, they're doing well. I mean, the earnings are awesome. I mean, I just saw John Fowler kicking butt over there inside Oracle with their engineered systems. So, interesting approach. Do customers care if the performance is there? So- Yeah, I mean, I think customers have to care, right? They're running a lot of their business on those technologies. Let's be honest. When I was talking about, you know, the established vendors, and when we looked at what was happening with these blueprints, and go back to that, you can't assume that any of that's going to go away, right? It's what's running the world's business today.
Those established infrastructures of data warehouses. And now we're in this era of something needs to change, because the data has changed, the volumes have changed, the speed of business has changed. And there'll be new technologies that emerge. But I think that our approach really, as Pentaho, and again, that's the beauty of being open, is that those will just be complementary for a long time, right? Because those systems that companies have invested millions of dollars in have to keep running their business for some period of time. And still be available for this new category of software models where open source needs to fit in. Exactly, exactly. Like some sort of Lego block or- Yeah, I mean, I think the plug and play idea, extensibility, is what's really gonna be important as we move forward in this space for sure. Donna, well, thanks for coming on. I really appreciate it. What do you expect for the week this week? Just to share with us quickly, what do you expect to unfold this week at Hadoop World? What do you expect to see? Any fireworks, mellow? Yeah, I think it's going to be exciting. I think there's going to be a lot of conversations like the ones we just had. I think we're going to talk a lot about open source this week, probably more than we have in many years. And we're excited. We have a new product coming out. We just had an acquisition. So lots of good, exciting conversations. You've got a spring in your step. You guys are looking good. Congratulations on all the success. Donna, thanks for joining us. This is theCUBE. We'll be right back with our next guest after the short break.