Live from Orlando, Florida, extracting the signal from the noise. It's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.

This is George Gilbert, co-hosting with Dave Vellante. We're at Pentaho World 2015, here live, extracting the signal from the noise. And we're very honored to have as our guest Will Gorman, who's VP of Labs at Pentaho. Will, welcome.

Well, thank you, George. Great to be here.

So tell us a little, I mean, all of us techies can imagine what it means to be VP of Labs, but fill in the holes for us, give us some of the color.

Absolutely. So Pentaho Labs is the innovation arm of Pentaho. We focus on emerging technologies and use cases, and build prototypes and examples to demonstrate to customers, and to ourselves, what may take hold in the market. So we work really closely with engineering and product management on helping to define that roadmap for innovation.

So this would be things where, if product management is worried about, okay, what do we need to get out in the next release, you're focused a little further out on, you know, what's bubbling up that will be relevant.

Absolutely, and we'll be the first to evaluate a new technology, whether it's our community and customers that bring it to our attention, or the market at large, where we recognize, hey, there's this new technology, we need to see, is it something we need to integrate with? Or a new algorithm or a new approach to data analytics.

Would it also extend to customers bumping into problems, or anticipating customers bumping into use cases where they don't yet see the limits, with product management being focused a little nearer term? You know, where you have to see, okay, what's the next leap that's going to get us to a broader set?
Absolutely. So as an example, we work very closely with our big data customers who are pushing the envelope around Spark and SQL on Hadoop, to figure out what are the right technology investments that they can make and that we can make, and how does Pentaho fit into that architecture?

Okay, great example, because, you know, Spark is very current right now. Actually, our latest just-released survey shows that of customers who have both Hadoop and Spark, the reasons for choosing Spark are, 91%, higher performance, and 60%, greater simplicity.

Ease of use for the developers, absolutely.

So when you see customers who are pushing the envelope with big data and they're interested in Spark, how would you meld those two with the current, you know, product line?

Absolutely. So in Pentaho 5.4, as part of Labs' transition into the roadmap, we released orchestration for Spark. So if you already have a team of developers building Spark jobs, you can incorporate that into your data pipeline with Pentaho. Where we're headed, though, is a much deeper integration between Pentaho and Spark: taking the ability to visually describe a transformation and executing it within Spark. That's where we're really headed, utilizing the Spark engine, RDDs, and the great operations that you have in Spark, and allowing you to visually design and build those experiences.

So in other words, it would be the Pentaho GUI environment as a development tool, and Spark is the invisible engine underneath.

Absolutely, and we've done this before with MapReduce. We have what's known as Visual MapReduce; it's been out for over five years at Pentaho, and it's a great technology. But as you know, Spark is starting to replace a lot of the MapReduce use cases.

Well, who are your peeps?

Who are my peeps?

Yeah, developers, are they?

Oh, okay, great. Well, James was just up here on stage, James is a peep, right? So I work very closely with the developers within the big data space.
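The kind of transformation he describes designing visually maps onto a chain of RDD-style operations: filter rows, derive key/value pairs, aggregate by key. A minimal sketch of that dataflow shape, using plain Python as a stand-in for Spark's API (the `records` data and field names are invented for illustration):

```python
from collections import defaultdict

# Hypothetical stand-in for a visually designed transformation.
# In Spark this would be rdd.filter(...).map(...).reduceByKey(...);
# here plain Python illustrates the same filter/map/aggregate shape.

records = [
    {"region": "east", "amount": 120.0, "valid": True},
    {"region": "west", "amount": 80.0, "valid": True},
    {"region": "east", "amount": 45.5, "valid": False},
    {"region": "east", "amount": 30.0, "valid": True},
]

def run_pipeline(rows):
    filtered = (r for r in rows if r["valid"])               # "filter rows" step
    keyed = ((r["region"], r["amount"]) for r in filtered)   # "select values" step
    totals = defaultdict(float)
    for key, value in keyed:                                 # "group by / sum" step
        totals[key] += value
    return dict(totals)

print(run_pipeline(records))  # {'east': 150.0, 'west': 80.0}
```

The point of the visual tooling is that a designer draws these three steps as boxes, and the engine underneath, MapReduce yesterday, Spark tomorrow, executes the equivalent operations.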
So partners, whether it's Cloudera engineering, Vertica, Hortonworks, I work closely with the engineering teams in those groups to make sure that the technologies can align and work together. Also, within Pentaho, my team consists of Dr. Mark Hall, one of the creators and maintainers of Weka, as well as Matt Casters, the founder and creator of Kettle. So that's my team, that's who we are.

So the developer world is pretty jazzed about containers. What's going on with containers?

Well, we're jazzed about containers too. Our engineering team loves Docker. They love the agility of being able to quickly deploy any version of Pentaho and test and validate their changes as they evolve our architecture. And we're also seeing early customers adopt Docker, but there are still limitations, right? There are security concerns, and those are things that are being worked out as a community, in the open. And Pentaho definitely sees Docker. Earlier this year, we met with our strategic advisory board, and Docker was the number one item on their agenda. They wanted to make sure that it was in our future. And a number of our developers have already blogged about, how would you integrate Pentaho with Docker? How do you deploy Pentaho with Docker? So there's some great stuff out there early, but...

So what do you tell them?

What do I tell them? I think Docker is great, but it's emerging, right? So similar to Spark and a lot of these new technologies, depending on your relative conservatism with technology, you have to decide, is this ready for primetime yet? And you have to test, right? That's the reality in the open, because there are so many great technologies. You have to explore and test before you put it into production. But we do have customers that have put Pentaho and Docker into production. We had some great talks yesterday around Docker on the edge, so running Pentaho's lightweight Kettle engine on a Raspberry Pi with Docker.
And the deployment is just so much easier, right? So it's a great technology.

We were just talking about going beyond what the state of the art is right now in terms of the analytic data pipeline. Something our customers tell us about, and that we've sort of distilled, is a richer set of analytics and lower latency, in other words, faster time to decision.

Absolutely.

Are those the two directions customers are pushing you in?

The customers, and the industry too, the new use cases. IoT, as an example. As we heard in the keynotes this week, IoT is moving you closer to real time, so streaming analytics, as well as, I mean, just the fact that the data sizes are so large, the need for machine learning algorithms to analyze that data versus the traditional OLAP and manual experiences that we had in the past.

And so how does that translate into what you guys are looking at in Labs, and then how will that translate into the product line?

Absolutely. So one example is Pentaho's data science pack, which was born in the Labs, initially as our Weka integration. And right now we're brewing up Python integration as well. So if you have Python developers building algorithms, and you want to incorporate those into your data pipelines, you can do that through Pentaho.

Okay, and just at the risk of dropping into the weeds a little bit, with the internet of things, the volume of data, and having to do machine learning on it, should we take it to mean that you don't have time to store it, run OLAP, and then sort of learn what's in there? Is it that you have to learn in real time?

Well, I think it's a mix of both. So there's always going to be historical analysis, and that's going to be critical. And as the Lambda architecture demonstrates, right, you need that historical and real-time perspective.

Remind us, for our audience at home, you know, the don't-try-this-at-home crowd.
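One way to picture the Python integration he mentions: a pipeline step hands each row to a user-supplied Python function and carries the result forward as a new field. The step interface and the `score` function below are purely hypothetical, a sketch of the idea, not Pentaho's actual API:

```python
# Hypothetical shape of a Python scoring step in a data pipeline.
# The pipeline calls apply_python_step() on a batch of rows; the
# user supplies the algorithm (here, a trivial churn-risk rule).

def score(row):
    """User-written Python algorithm: flag customers with low usage."""
    return 1.0 if row["monthly_logins"] < 3 else 0.0

def apply_python_step(rows, fn, output_field):
    """Pipeline-side wrapper: run fn on each row, append its result."""
    for row in rows:
        row[output_field] = fn(row)
    return rows

batch = [
    {"customer": "a", "monthly_logins": 1},
    {"customer": "b", "monthly_logins": 9},
]
scored = apply_python_step(batch, score, "churn_risk")
print([r["churn_risk"] for r in scored])  # [1.0, 0.0]
```

The design point is separation of concerns: the data scientist writes `score`, the pipeline owns ingestion, batching, and where the output lands.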
The Lambda architecture is taking algorithms that you build for real-time and for historical analysis and combining those together to solve big data problems. And we see you can build real-time use cases with Spark Streaming, and then you can do historical analysis with Spark as a developer, for instance. But the BI tools, the higher-level tools for business users, they're not there yet, right? So they're evolving into that, but we're still working towards that ease of use, right? That ability to make it easy for folks.

So, if I hear you right, it sounds like in the plumbing, we know what we need to do: we need to be able to take the real-time or near-real-time data, and we need to take the stuff in the lake that's history, and we need to sort of take lessons from both in the form of models. But in terms of translating that up into the user interface of what someone sees, whether it's Weka or whatever...

Or visualization.

Whatever visualization. I assume that's part of your charter, because if we're working in the plumbing, at some point the user has to see it.

Absolutely. So in addition to the plumbing work where we live, we also focus on what's the user experience: A, how do they build it?

Yeah.

And then B, how do they get results out of it? So what's the final visualization or experience that will meet the business needs?

So in that sense, as real time enters the picture, it's going to be a combination of real-time and historical data, basically, viewed together for the entire context of whatever the business problem happens to be.

Let's take predictive maintenance as an example. Let's say you're streaming in real time the behavior of vehicles in your fleet, but something goes wrong. You want to not only see that behavior right at that moment, streaming in real time, but you also want to look at the historical experience to see, oh, is this a common occurrence? Or is this a new anomaly?

Or did you see something show up that we didn't look for on this truck?
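The Lambda pattern he outlines, a batch view over history combined with a speed view over the stream, can be sketched in a few lines using his predictive-maintenance example. All the data and field names here are invented for illustration: the batch view summarizes historical engine temperatures, and each streaming reading is checked against that baseline:

```python
import statistics

# Batch view (illustrative data): summarize historical sensor readings.
historical_temps = [88.0, 90.5, 89.2, 91.0, 90.1, 89.7, 90.3, 88.9]
baseline_mean = statistics.mean(historical_temps)
baseline_std = statistics.stdev(historical_temps)

def is_anomaly(reading, mean, std, threshold=3.0):
    """Speed view: flag a streaming reading that sits far outside
    the historical baseline (a simple z-score test)."""
    return abs(reading - mean) > threshold * std

# Serving layer: combine the real-time reading with the batch view.
print(is_anomaly(90.4, baseline_mean, baseline_std))   # False: common occurrence
print(is_anomaly(104.2, baseline_mean, baseline_std))  # True: new anomaly
```

This is exactly the "common occurrence or new anomaly" question in the interview: the streaming value alone can't answer it; only the streaming value judged against the historical view can.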
Yeah, exactly. And maybe when you go back into history, you'll say, oh, this was here, the pattern was there, we just missed it.

So that, yeah. Oh, go ahead, sorry.

Speaking of visualization, what kinds of innovations are going on in the lab with regard to visualization?

Well, absolutely. So right now, the focus for Pentaho is really around APIs, making sure that whatever visualization library, whatever visualization experience your company needs, we can support that through our rich APIs. We also have our own open source visualization environment called Community Chart Components, which allows very rich access to the APIs and the ability to visualize data in about any way you'd like. So where we innovate there, especially James, who was up on stage, he's sort of our best visualizer, he'll take, let's say, a map or a building diagram, translate that to SVG, and turn it into a dynamic experience that can really bring home that data to the business.

So what types of problems would we expect to see Pentaho solving, let's say in three-plus years, that today people assume you really need a Hadoop-type platform for, in that you're making a more coherent, integrated platform rather than a couple dozen sort of loosely connected projects?

Well, Pentaho definitely wouldn't be where we're at without great partners like the Hadoop and NoSQL environments. We're not a database, we're not a processing engine. We try to make those a much easier, more cohesive experience. But we also truly believe that in the enterprise, there is no single instance of Hadoop, right? There are heterogeneous environments. You have your Hadoop environment now, and a lot of enterprises have multiple Hadoop environments already. So you need a way to blend that data and to have cohesive orchestration throughout your data pipeline. And I think that's where Pentaho can really add the glue to that entire heterogeneous environment.
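The blending role he describes, acting as glue across heterogeneous environments, ultimately comes down to joining records on a shared key regardless of which cluster or store they came from. A toy sketch, where the two sources and their fields are made up purely for illustration:

```python
# Toy illustration of data blending: rows from two hypothetical
# sources (say, one per Hadoop cluster) joined on a shared key.

orders_cluster_a = [
    {"customer_id": 1, "order_total": 250.0},
    {"customer_id": 2, "order_total": 75.0},
]
profiles_cluster_b = [
    {"customer_id": 1, "segment": "enterprise"},
    {"customer_id": 2, "segment": "smb"},
]

def blend(left, right, key):
    """Inner-join two row lists on a shared key field."""
    lookup = {row[key]: row for row in right}
    return [
        {**row, **lookup[row[key]]}
        for row in left
        if row[key] in lookup
    ]

blended = blend(orders_cluster_a, profiles_cluster_b, "customer_id")
print(blended[0]["segment"])  # enterprise
```

In practice the hard part is everything around this join, connectivity, scheduling, pushing work down to each engine, which is where the orchestration layer earns its keep.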
So it will be adding functionality so that you can continue to be the end-to-end analytic data pipeline.

Absolutely, so continuing to invest in big data orchestration. You're seeing emerging investment around IoT, and then also embedded analytics, being able to bring the analytics to the business problem, right? Right now, most analytics live in, you know, desktop tools or out-of-the-box tools that aren't necessarily associated with a business problem in an operation. But we've really embraced operational analytics, bringing the analytics to that use case.

Will, I want to give you the last word, but I have to ask one last question, which is, as we see the need to bring the analytics closer and closer to the operational applications, and maybe not all operational applications are going to be in the Hadoop operational databases like HBase or whatever, how do you get closer to the traditional operational applications?

So, you know, a lot of the new operational technologies today are Mongo, right? They're based on NewSQL, or NoSQL, whatever you want to call it, like Mongo. And we have just as much investment in those technologies as we do in Hadoop. So we need to make sure to continue to work with the operational stores as well as the analytical stores.

All right, if we were to ask you to leave, you know, Pentaho customers who don't have the privilege of getting your autograph live here after the panel, what do you want to leave them with?

Well, one thing we didn't get to talk about too much is, Pentaho Labs is a small team, as I mentioned, but we're now part of Hitachi, and I have access to over 400 data scientists in the Hitachi labs.

Big lab.

It's a big lab, and I've been working with them. There are a number of those data scientists here at this show, and it's just wonderful. There are opportunities now that are endless, in that sense, for them to embrace Pentaho and also to contribute, because we're a community.

Right.
And that's the fundamental belief of Pentaho: that we're an open community.

Okay, with that, Will Gorman, thank you.

Great, thank you.

We are going to sign out. We'll be back in a few minutes. This is George Gilbert, co-hosting with Dave Vellante at Pentaho World 2015. Thanks.