Live from Orlando, Florida, extracting the signal from the noise, it's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert. Welcome back to Pentaho World everybody. This is theCUBE; theCUBE goes out to the events, we extract the signal from the noise. This is day two for us, wall-to-wall coverage of Pentaho World 2015. Gretchen Moran is here, she's a software architect and alpha geek. I say that with all due respect. Welcome to theCUBE. Thank you, Dave. It's great to see you. So, what do you think about Pentaho World? Lots has changed since last year. You guys are now owned by Hitachi. But what do you think? What's the vibe? Well, it's interesting. As long as I've been with the company, I didn't get to come to the first Pentaho World. I heard all about the energy and all of the amazing ideas that some of our customers and prospects brought to Pentaho World and told us about, and just how much adoption had happened. Now, knowing that we're 40% above last year attendance-wise, I feel that same energy. I think it's just an amazing event, and people are really enjoying themselves. I actually had one customer approach me, and it must have been about four years ago. He said, you saved my life. And I said, excuse me? And he said, yeah, you actually found the bug in this file that my entire team couldn't find, and when you sent it back to us three minutes after we emailed it to you, they were all on the floor. I thought, if I came for no other reason, that was just the nicest compliment. There have been a lot of stories like that, where people are just really, really in tune with the product and happy to tell us their stories. So you've been at Pentaho since the early days. What's your focus? From the very beginning, I've been one of the core developers, and I've definitely moved around a bit. Our original roots were open source.
So I spent some time as our community manager, and that's kind of a funny title. Back in the days when open source wasn't necessarily mainstream, we really wanted to do open source properly and organically. So we spent a lot of time just interfacing with the developers out there who were contributing to the product and had an itch to scratch, and we really wanted to interface with them well. I spent about two years as our community manager, shepherding those developers to come on board and support the development of our product. Then I went back into engineering for a number of years, and a lot of what we really tried to stay true to throughout the years was our original goal of making Pentaho very, very pluggable and very embeddable. This was important to the original founders, and it was part of the original vision, because we had built two software companies together previously and we already knew what it was like to have to start over and not be able to reuse what we had already invested so much in. That led to the principle that we really need to make a piece of software that people can extend, that they can embed in other applications, and really take to the next level, so it can continue to grow and develop beyond the lifetime of what most software ends up being. So I wonder if we could talk a little bit about that open source ethos. You've been a software developer for a couple of companies now, which I'm inferring from what you just said, and when you started it wasn't open source, it was proprietary, closed source. So when did you get the epiphany and start drinking the open source Kool-Aid? Was it the LAMP stack movement? It was really early. Pentaho is 10 years old now, and it was open source from inception. As I said, we did come from a couple of proprietary software companies before that, and we had done a lot of really good work, and we had to give that work up every time.
Not only that, we were coming into a space that already had some pretty big incumbents, some really big guys that would be tough to take on from a proprietary perspective. So again, we've got some really visionary founders, and a couple of them got together and said, you know what, this open source thing has legs. So we committed to starting the first open source business intelligence stack. We found best-of-breed open source projects, brought these wonderful, intelligent minds together, and the ability to bring that team of talented people together formed the basis of what Pentaho is today. We still have an open core, still a lot of code that's out there, more code in the wild than behind closed doors. So what were some of those open source projects? I mentioned Linux, but you predated Hadoop when you started Pentaho. Oh, sure. So what were they back then? The original core stack was built upon Kettle, which is now Pentaho Data Integration, which was an amazing project. And again, these projects really were best of breed when we brought them in. They chose Pentaho to integrate with and really commit to because they loved the team. So we had Kettle; we had Mondrian, which is our OLAP engine, a really popular, best-of-breed, mature OLAP engine that has done well for our analytics. We had what I just want to call Pentaho Reporting now; JFreeReport was its open source name, and Thomas Morgner is a wonderful man who fathered that project. It was very mature when it came in and gave us a great reporting engine. Trying to think, what am I missing? We've got Quartz, which is our scheduler. So we brought a lot of different projects together, but the core ones are the ones that I described. So these were open source projects that had been incubated, that you guys said, those look like winners.
Let's dive in, start contributing, make them robust, and then let's build this platform from those open source components and then sell it. Exactly. That's awesome. Now what's the story? Is it a hybrid, like Cloudera, where they say, okay, some element on the side we're going to keep as our secret sauce? Or is the whole thing really the Linux model, where you provide value as release manager and maybe an update pipe, the way Red Hat does it? So we've been pretty true to what we wanted that model to be from the very beginning. We are open core, which means the platform itself, the actual container that brings all of these together and the orchestration engine that makes them all seamlessly operate together, is completely open. We have enterprise services, management, and administration that are closed source, but that typically represents about 10% or less of the code. So the vast majority of what you need to do with our product is available to you in our community edition. Okay, and you're charging for the piece on top. Charging for the enterprise-level services, and the maintenance of that as well. Yeah, and then of course, a lot of times, especially when you get into the embeddability of the product, it is so imperative to bring in our professional services, because we're the people who can really help you get the best development velocity and make the most of the product from that standpoint. And your wheelhouse is embedded analytics. So tell people, what do you mean by embedded analytics, and how did that all develop? So embedded analytics is really a definition that can be described at a multitude of levels in the stack. You have analytics that happens in the pipeline, right? You have the ability to take components of Pentaho and either run them in the cluster, or take the data that's in the cluster, pull it out, and operate on those analytics outside of the cluster.
You could have a consumer of the data be, say, a Kafka queue, or you could have the consumer of the data be our Analyzer tool. All of that is embedded analytics, and each individual component gets described based on who you are that's interfacing with that piece of the platform. If you're at the level where you're writing MapReduce programs, we've got embedded analytics, because we've got Pentaho MapReduce. If you're just looking for embedded analytics from the user-consumer standpoint, then we've got Analyzer, and we have all of the power of PDI behind that, to be able to develop the pipeline from your Hadoop cluster into your analytic database and then use Analyzer on top of that. What were, and what are, the key technical challenges that you and the team faced? Is it just getting the open source code to be stable, robust, hardened? Was it the integration? Both? From a technical challenge perspective, in the old days, from our roots, that probably would have been the technical challenge. That integration was the thing we really thought was the value we brought to the table. That's historical for us now, right? We've moved into the big data space. We've moved into a space that is growing so rapidly from a technology perspective that it's not actually a challenge for us. It's something that we're just really good at, because of the way we're architected: when these new technologies come to the table, we can make it easier for the companies that are ready to look into that functionality, that need that functionality.
We help them interface with that functionality and help them bring it into their solutions, in an integrated fashion, in a much easier way, because the orchestration engine that we have spent 10 years building, that platform, that architecture, is so well vetted, so mature, and so embeddable and pluggable at its roots, that when these new technologies come along, it's a small bit of code or a small bit of innovation on our side to take advantage of a technology like MongoDB, or a technology like Cassandra. And once you're plugged into our platform, that data source now becomes just a simple blend with something else that might be coming from a CSV file or from an HDFS file system, because we already had those other adapters in place. So the pace of innovation of the ecosystem creates a flywheel effect for you, because of the platform and the interfaces, the entries and the exits to and from that platform. Exactly. Sounds straightforward, but it's pretty impressive. I've always liked to tell the story of our embeddability from the perspective that this isn't something we thought of after the fact. From the ground up, we have really had the vision that a platform has to be able to serve beyond the technology that's available today. And what we see now, especially with solutions like the Streamlined Data Refinery. So I'm also the architect who has been helping FINRA with their Streamlined Data Refinery solution; you heard Simone talk about it at our keynote. And this application, Diver, we are really building against technologies that are so relevant and so new that we continue to build with the understanding that the technology we're using today may be just an incremental solution, and there's a technology we see in the future that can really solve the problem the way we want it solved, and we're ready for that.
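The adapter-and-blend idea described above can be sketched in a few lines. This is a hypothetical illustration, not actual Pentaho or PDI code: the names `RowSource`, `CsvSource`, `DocumentSource`, and `blend` are invented for the example. The point is that once every backend is adapted to a single row interface, blending a document store with a CSV file reduces to a plain join.

```python
# Hypothetical sketch: once every source speaks one row interface,
# blending heterogeneous data becomes a simple join.
from typing import Dict, Iterator, List


class RowSource:
    """Common adapter interface: every backend yields plain dict rows."""
    def rows(self) -> Iterator[Dict]:
        raise NotImplementedError


class CsvSource(RowSource):
    """Adapter over CSV-style data (pre-parsed lines, for brevity)."""
    def __init__(self, header: List[str], lines: List[str]):
        self.header, self.lines = header, lines

    def rows(self) -> Iterator[Dict]:
        for line in self.lines:
            yield dict(zip(self.header, line.split(",")))


class DocumentSource(RowSource):
    """Adapter over a document store (e.g. MongoDB) that returns dicts."""
    def __init__(self, documents: List[Dict]):
        self.documents = documents

    def rows(self) -> Iterator[Dict]:
        yield from self.documents


def blend(left: RowSource, right: RowSource, key: str) -> List[Dict]:
    """Join two sources on a shared key; neither side knows the other's backend."""
    index = {row[key]: row for row in right.rows()}
    return [{**row, **index[row[key]]} for row in left.rows() if row[key] in index]


orders = CsvSource(["order_id", "amount"], ["1,9.99", "2,24.50"])
customers = DocumentSource([{"order_id": "1", "customer": "Acme"},
                            {"order_id": "2", "customer": "Globex"}])
merged = blend(orders, customers, key="order_id")
```

Adding a new backend means writing one more `RowSource` subclass; everything downstream of `blend` is untouched, which is the flywheel effect the conversation describes.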
So that was another reason why FINRA really thought we were a great partner when it came to building Diver: this ability to continue to evolve that application. As big data gets bigger and as we develop the architectures to handle big data, those technologies are going to continue to change, and the Pentaho platform is going to continue to be able to plug them in relatively easily and take advantage of them as quickly as possible. Can you talk about how the types of analytics you're capable of delivering have evolved for embeddability? In other words, say five years ago, was it just embedded reports? What is it now, and what might it look like? And not just the functionality, but how does it change the app in terms of being able to act on the analytics sooner? Yeah, absolutely. This was actually a talk I gave yesterday, about what we called building high-end, next-generation applications: why do we have to do things differently, and what do we have to do differently, to accommodate these applications? There are implications at every level of the stack. We made major investments at the data storage level. We've invested in things like Hadoop, and we've invested in the actual execution engines. We've got Tez and we've got Spark and all these wonderful technologies that are now basically helping us lay out that data in your Hadoop store. That's one level where we've had to evolve. We've built what are called Hadoop configurations, but to explain what we like to refer to them as internally, on the engineering side of Pentaho: if you've ever been a carpenter, if you've done any carpentry in your house, you know that whenever you hit a door jamb and you try to hang a door, is it ever straight? It's not, right? Usually what do you have to do? You've got to have a little shim in there, right?
Well, this is what we call the Hadoop configuration layer internally in engineering, our shim layer, because we've been able to leverage and help people with multiple Hadoop distributions, and they don't all act the same. We just shim them such that, through the Pentaho ecosystem, you don't have to know whether you're using open source Apache Hadoop or another wonderful vendor. That's a great analogy. Although we know that YARN has a shim layer itself. Well, that's why we call ours a Hadoop configuration. Only you guys know now, and the rest of our viewers. So tell us more, that's a great example. Sure, so that's the storage layer, right? Then at the application layer, there's the ability to build that data pipeline, the ability to get that pipeline moving at the speed that the use case requires. This is where there's a lot of flux and a lot of debate and a lot of passion in the industry right now around whether you should be streaming versus doing batch, and the streaming guys want to tell you that batch is dead. I'm going to talk about that later today. But the truth of it is, the use case is going to determine the velocity of your pipeline, right? You do need to have streaming, and I feel for the people who have to do the implementation, because it really takes time to ferret out what your use cases are, and these technologies are very use-case specific. Whether you micro-batch in a streaming world, or whether you do true event-by-event streaming, it matters whether you need it or not, and your use case matters. So at the application level, or the data pipeline level, you've got to pick these technologies very carefully based on your use case and how fast you need that data to move. And I've got, again, multiple clients that have started doing user-scenario-based application development, because what serves a single purpose doesn't necessarily serve the entire population.
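The batch versus micro-batch versus event-by-event distinction above can be made concrete with a toy sketch. This is an invented illustration, not Pentaho code: `transform`, `run_batch`, `run_micro_batch`, and `run_streaming` are hypothetical names, and the trade-off comments reflect the general reasoning in the conversation, with the use case deciding which mode you run.

```python
from typing import Callable, Iterable, List


def transform(record: int) -> int:
    """Stand-in for the real pipeline logic."""
    return record * 2


def run_batch(records: Iterable[int]) -> List[int]:
    """Process everything at once: simplest, highest latency."""
    return [transform(r) for r in records]


def run_micro_batch(records: Iterable[int], size: int) -> List[List[int]]:
    """Process small windows: bounded latency, amortized overhead."""
    out: List[List[int]] = []
    batch: List[int] = []
    for r in records:
        batch.append(transform(r))
        if len(batch) == size:
            out.append(batch)
            batch = []
    if batch:  # flush the final, possibly short, window
        out.append(batch)
    return out


def run_streaming(records: Iterable[int], sink: Callable[[int], None]) -> None:
    """True event-by-event: lowest latency, highest per-record cost."""
    for r in records:
        sink(transform(r))
```

The transform is identical in all three; only the delivery cadence changes, which is why the choice belongs to the use case rather than to the technology fashion of the moment.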
And so trying to fit all of your use cases through that single hole right now is not necessarily the right way to go with big data. Are you talking about an ISV having to create application frameworks where some are slower pipelines and some are faster? Or are you saying that one app might have to serve some end customers really fast and some not so fast? Exactly; both, actually. We've seen use cases where people have both a batch layer and a streaming layer, and they can actually marry those if they so choose. We've also seen a particular customer who, based on the number of partitions they know a query is going to have to hit, will put that particular query in a different queue than something that might execute a little faster, and then report back to the user: hey, you might have to wait a little bit for this one, based on what you asked for. So, you talk to a lot of customers, obviously, and I wonder if you could comment: it's a complicated situation, this big data, for a lot of companies. They struggle with the skill sets; that's something we always hear. We lack the skill sets, we're having trouble, and then they get it right on one use case, and then all these other new technologies come out that are going to solve the problem. What do you make of that? What do you advise people? And how does Pentaho help solve that problem? Well, we hear a lot of what you hear. This isn't uncommon. The wonderful thing about what I do, and my title is actually Director of Engineering Services, is that I'm in our customer care organization. You're, I'm sorry? I'm Director of Engineering Services. I'm the one who can do customizations for customers as a paid engagement. We also have in our organization many, many well-versed big data specialists.
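The partition-based routing that customer example describes can be sketched as a small dispatcher. Everything here is hypothetical: the threshold, the `estimate_partitions` heuristic (one partition per day of the requested range), and the queue names are invented for illustration, not taken from any real system.

```python
# Hypothetical sketch: estimate how many partitions a query will touch,
# route it to a slow or fast queue, and warn the user when it is heavy.
from collections import deque
from typing import Dict

FAST_QUEUE: deque = deque()
SLOW_QUEUE: deque = deque()
PARTITION_THRESHOLD = 8  # assumed cutoff; would be tuned per workload


def estimate_partitions(query: Dict) -> int:
    """Stand-in estimator: one partition per day in the requested range."""
    return query["days_back"]


def submit(query: Dict) -> str:
    """Route by estimated partition count; return a message for the user."""
    if estimate_partitions(query) > PARTITION_THRESHOLD:
        SLOW_QUEUE.append(query)
        return "queued (heavy): you might have to wait a bit for this one"
    FAST_QUEUE.append(query)
    return "queued (fast)"
```

The essential point from the conversation survives even in this toy form: the routing decision is made per query, and the user is told up front when what they asked for implies a wait.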
So it's imperative, when you look at Pentaho as a solution, to understand that the services we bring along with it can really help alleviate a lot of that frustration and angst. And this is something we evolved into, because we did originally say, well, the technologies we orchestrate with, we orchestrate with them, but you probably want to go to that vendor for the expertise. That was back when we interfaced with maybe just Oracle and RDBMSs, but as we've grown into Cassandra and Cloudera and all of these other wonderful technologies, we need to be able to give customers architectural advice. We need to be able to help them avoid some of the potholes we've seen in other customer engagements. That's become a wonderful service we can provide for our customers. I tweeted out yesterday that you guys are like the easy button for Hadoop, you know? That's interesting. It's kind of what you're doing, integrating all this stuff. Now, you said earlier that you invest in these technologies, and you gave Tez as an example. What does it mean to invest in these technologies? Does that mean contributing to the open source code, or does it mean you integrate them? Describe that. So we do have engineers that have contributed to multiple open source organizations, and from that perspective, yes, that's part of our core operation, if you will; we do give back to lots of different communities. But really, the investment from our standpoint is that we have Pentaho Labs, which will take into account lots and lots of futuristic-looking applications and say, we've got to make some early bets here and we've got to look into this technology.
So based on what we hear from our customers, based on what we see in the internet companies, which are typically at least a couple of years ahead of everybody else, and based on what our really knowledgeable Labs people think, we take on technologies early, like Spark, and we invest in them from a research standpoint, from a gap-analysis standpoint, from an orchestration standpoint: is this thing ready to stand up? Is the JDBC driver mature enough? We're then able to give that advice to our customers. We're able to say, look, we've taken a look at this technology, it may be there, it may not be there. We try to keep our finger on the pulse of a number of different technologies, which is just as challenging for us as it is for our customers, but that's our bread and butter. Has the Hitachi acquisition changed that? How has the Hitachi acquisition enhanced that resource, or has it? It absolutely has. They have a great big lab and they've got lots of services people, and we're really anxious to take advantage of those particular departments. It's a new acquisition, so we're still in the stages of mapping out how that works, but that's everyone's goal. There's no doubt that we're excited about having more resources. So, quick question. When you're talking about streaming versus batch, that's sort of a topical debate today, but have you seen the sophistication of the analytics change, not just the speed? It might have been a dashboard several years ago, and now it's real-time recommendation or some form of machine learning. What types of apps are you getting embedded in now that you couldn't do several years ago? So right now, I have to be completely honest: we see a lot of inquiry around real time. We don't see a lot of people actually able to implement it yet.
The technologies are still pretty immature, and there's still an investment to be made there in the enterprise, so I think again you have to separate who we talk to from who we actually have as customers. We do have internet companies as customers, and they are the ones who actually already have these real-time apps in place. The enterprise companies that have real-time apps in place really have the use case for it, and from that perspective we see more real-time dashboarding and that sort of thing. But from the embedded perspective, the big change for us is that the analytics aren't just delivered through Analyzer anymore. A lot of what's delivered comes in as key performance indicators, things that are mashed up and embedded side by side, or in a single view that makes sense for that use case and that user. So it's not just delivery through the typical slice and dice and pivot anymore; it's delivery through all aspects of the dashboard. And so the dashboard is the point of integration, not the very back end of the Pentaho pipeline. Well, it can be both, right? The analytics can be processed, and if they're processed and the next consumer in line is a queue, then yeah, the analytics stay at the machine level, and I could possibly take that result and pass it on to another machine. But from the visualization perspective, we see so much more going on, and so many more requests in the dashboard layer for more interactivity and more data, faster. And that's a whole other layer, going back to your previous question, beyond the data pipeline: that visualization layer, and how we accommodate not only the tech to deliver the analytics at the end of this pipeline. But there's also a real caution we give our customers about revolutionizing software for their users.
At the point that you've talked to your user about changing their software, you don't want to give them a revolution, you want to give them an evolution, because there's adoption to consider, and users don't generally adopt revolutions. So we move a little slower at the visualization layer. We tend to still give them traditional dashboards and the charts that they like and that sort of thing, but we absolutely can start to accommodate these technologies and get them into the end user's hands. So Gretchen, we're out of time, but I have to ask you about the women in tech angle. We love the women in tech; we love to amplify and showcase the stories of the women in tech. In fact, this week theCUBE is down at the Grace Hopper conference in Houston. I don't know if you follow Grace Hopper, but you know who Grace Hopper is. Grace Hopper was one of the most famous programmers ever, and I think she helped invent COBOL; she was also an admiral in the U.S. Navy. Amazing woman. If you don't know Grace Hopper, Google Grace Hopper. But we love to amplify the stories. So you started with a computer science degree, and you've been a hardcore engineer at several companies. How did you get into this whole thing? How did it come about, as a young person, when you got into the programming world? How did it all start? So it's interesting, you're not the first person to ask me about this, and I generally respond that I didn't know it was unique at the time that I went after my computer science degree. I had a great mind for math, I had a great mind for analytics, so it seemed like a challenge, and I just really wanted to push myself in college. That really was it. I didn't know what I was getting myself into, and I think a lot of women my age probably felt the same way. We didn't really know that this was something that maybe wasn't typically in the realm that women go after.
I think there's a lot to be said for open source basically opening the doors to tech for more than just women, for people who typically may not have looked at it, because it's a meritocracy. When I contribute to an open source project, they don't know who I am; I'm just G. Moran. They have no idea whether I'm a girl or a boy, and I do good work and I'm included. So to me it's not a barrier to be female in this industry, and I think we should just remove from the conversation anything that sounds like women wouldn't have a fair shot or women have to do something different. When you see what technology has done in terms of goodwill projects, and Stitch Fix, and "help me figure out how I find the best pair of shoes from previous shoes that I bought," I mean, these are things that we're all interested in as females, and tech gets us there. So you find out what you love, and then you apply tech to it, and tech is just another wonderful dimension to add to your career. I love that answer, but I'm still going to nominate you, the second nomination from Pentaho World, for our women in tech guest of the week. Well, thank you. Gretchen Moran, thanks so much for coming on with us. Thank you very much, it's been nice talking to y'all. All right, keep it right there, buddy. We'll be back right after this word. This is theCUBE.