Live from Boston, Massachusetts, extracting the signal from the noise. It's theCUBE, covering HP Big Data Conference 2015, brought to you by HP Software. Now your hosts, John Furrier and Dave Vellante.

Okay, welcome back everyone. We are back here live in Boston, Massachusetts for the HP Big Data Conference, HP Big Data 2015. This is SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. We're proud to have our next guest, Shilpa Lawande, VP and General Manager of the HP Big Data Platform, formerly the VP of Engineering. Now General Manager, great to see you again. Great to have you back on theCUBE.

Great to be here.

Shilpa, thanks for coming. So first question, got to jump right in. A lot of good stuff happening with Haven. Break it down for us: what's the update? What's the latest release? What's the big feature now that you guys are talking about? And how does that relate to the Haven momentum that you've had?

Yeah, so since about November of last year, we've had a continuous flow of product releases under the Haven brand. And our general idea is to merge the worlds of structured and unstructured data together. So bringing together the assets we have, such as Vertica, which is a lot of structured data and analytics technology, and then IDOL, which has been our unstructured, human data processing engine. Bringing those worlds together. And so one of the things that we've announced today is something called the Haven app framework. This is an idea, an initiative to make it simple for anyone to build a big data app. So we are trying to formalize and standardize some of the aspects of what it takes to build these applications, make that simple to use, make it more readily available in the context of specific use cases, and really encapsulate the best practices we have, and so on.
And so it just lowers the barrier for what it takes to build an app on the Haven platform.

And you call that Excavator, is that right?

So Excavator is a set of Vertica capabilities that we've announced. It's a combination of our technology and open source technology, with some key integrations that we've got. The Haven framework things that we are doing are generally to bring together structured, unstructured, machine data, all these disparate worlds, and really make it easy to do.

So Excavator sits underneath Hadoop?

So Excavator is the code name for the next Vertica release.

Oh, okay. All right, she was showing a little leg on the stage today. Okay, so that's, all right. So the framework encompasses, that's an umbrella.

Yep, the framework is an umbrella. There are multiple announcements that we've made today, Vertica Excavator being one of them, and it all fits under that general idea of making it easy to blend structured and unstructured data together, blend the worlds of our technology and open source.

I have a personal question. Yes. You're the former VP of Engineering, which you ran at Vertica. Yeah. We've talked many times on theCUBE. Now you're general manager, so what's the transition like? And again, this is the trend we're seeing. There's a lot of changing architectures; it's evolving every year. The progression is architecture gen one, and it seems to be elusive. It's a moving train, if you will. But yet the progress continues. So share your personal role: you're GM now, not just VP of Engineering; general manager is the business side, which is good, because engineering is driving everything. How is that shaping out? And what's going on with the customers? Tie that together.

Yeah. So, it's been about 10 years since Vertica has existed at this point.
And what we've been finding is that while there's a lot of hype around big data, a lot of promise, in reality it's still very hard to get business value out of big data. Some of it, I believe, is that there are good technology elements that exist now, with Hadoop and Vertica and things like that, but there's still a shortage of talent, which drives some of this; there's a drought of information, so to speak, still. You can look, but you're not finding enough insights that really drive the business.

You get a lot of dashboards.

A lot of dashboards, not a lot of action. And so one of the things that we are doing as part of our Haven framework is trying to make it so that analytic insights, whether they are from structured data or unstructured data, can be blended together and embedded in business workflows and things like that. So things that we have done with our partner Logi Analytics are really around making that process easier for the business user to customize what they're looking for, but at the same time letting people building these applications customize them for a specific use case or business.

Okay, so this question came from earlier in the day, from a reporter doing a story on Kafka. I know you're obviously for it, not against it, because the question was, are you bearish or bullish, if we're writing the point-counterpoint, I guess? I know you guys are for it. Why is this so important? Why are you guys so behind this? Why? Is it just better technology? Is it timing?

The way we look at it is that there's a lot of innovation that happens in open source, and there are waves of things that come and go and so on. What we look for is a pattern where a lot of our customers are doing things with a particular piece of open source technology together with our technology.
So we find that a lot of our customers have built their own Kafka plugins and Kafka-based data pipelines to put data into Vertica, but each one of them is doing it from scratch each time. And so it makes a lot of sense for us to just do that piece of work for them.

Saves them time.

Saves them time. And again, it's all in the idea of simplifying things, reducing the time it takes to bring one of these applications to life.

Hey, what a novel idea. Save time, reduce the steps it takes to do things, and get insight in an elegant way.

That's right, that's the idea. And it helps us; it's not entirely altruistic, right? It also helps us because it reduces the moving parts that we have on our own platform. Loading data into Vertica from streaming sources becomes extremely simple, and that means that people are going to make fewer mistakes.

So your job is to abstract away complexity.

Yeah, take away complexity, because that's what I think is the barrier right now between the big data projects that fail and the ones that succeed. It's really this complexity of putting the stuff together from all these pieces that are constantly changing and evolving.

So I want to get your perspective on a trend in the business. Historically, the data warehouse business has been a historical look back at what's been going on, and operational analytics have largely been embedded in the transactional database. As the pipeline evolves and matures, will we start to see data collection and analysis occur before, as opposed to just after, the end-user interaction? And where do Haven and Excavator fit into that? Does that make sense?

Yeah, so I think this is one of those areas of real-time analytics, right? It's really a question of how quickly after you get a particular piece of data you can do analytics and act on it.
And so one of the things that we're trying to do, and the reason we're doing this with Kafka, is to reduce the latency of when the data becomes available for actionable insight, right? Architecturally, we've always had the ability to trickle-load data into Vertica, but really reducing that latency down to single-digit seconds, being able to act on the data very quickly, that's the idea here. And the new types of data sources, like IoT and things like that, are going to make that more and more prevalent.

So, I've got to come back to, I think I said to Colin last year, or was it the year before? Yes. I said, hey, Vertica is the Ferrari, and I meant it as a compliment, but you can look at it now and say a Ferrari is kind of a non-standard, high-performance vehicle. I want to come back to that again this year. It appears that the work you guys have done this year is actually the opposite: you're making things easier, so everyone can drive Vertica. But the performance is still there. At some level that's what I mean by performance; Ferrari just popped into my head. So talk about the performance gains, because SQL on Hadoop has shown to be a good call, since people are used to SQL and that's accelerating adoption in the customer base. But performance is also there. So talk about the balance between acceptance, some sort of standardization, and performance. What are you guys doing in particular to drive that, so it's not a one-off, like a specialized car?

Yeah, so in terms of a car analogy, I think the best example I have is a Mini Cooper. Have you seen that? You can actually customize your Mini Cooper. Yeah. It still has a BMW engine inside.

The Italian Job. Did you see that movie?
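To step back to the Kafka discussion: the transcript doesn't show any code, but the pattern Shilpa describes (customers hand-building pipelines that buffer streamed messages and bulk-load them into Vertica in small batches to cut per-row overhead while keeping latency low) can be sketched in plain Python. The class and parameter names here are illustrative, not Vertica's or Kafka's actual APIs; `sink` stands in for a bulk COPY/INSERT into the database.

```python
import time

class MicroBatchLoader:
    """Buffer streamed messages, flush them to a sink in small batches.

    Flushes when the batch fills up or a time window elapses, which is the
    trade-off between load efficiency and the single-digit-second latency
    discussed in the interview. All names are illustrative only.
    """

    def __init__(self, sink, batch_size=100, max_wait_s=1.0):
        self.sink = sink                # e.g. a bulk-load call into the database
        self.batch_size = batch_size
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def on_message(self, msg):
        self.buffer.append(msg)
        # Flush on size OR elapsed time, so quiet streams still land quickly.
        if (len(self.buffer) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)      # one bulk load instead of N single inserts
            self.buffer = []
        self.last_flush = time.monotonic()

# Simulated stream: 250 messages arrive, flushed in batches of up to 100.
batches = []
loader = MicroBatchLoader(batches.append, batch_size=100)
for i in range(250):
    loader.on_message({"id": i})
loader.flush()                           # drain the tail
print([len(b) for b in batches])         # → [100, 100, 50]
```

In a real deployment the consumer loop would read from a Kafka topic and the sink would be a database load; the point of the built-in integration Shilpa mentions is that customers no longer write this plumbing themselves.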
But you can actually, so the idea is being able to take the things that really make it high-performance and keep those, but make them available in different form factors, right? So with SQL on Hadoop, what we've done is we've looked at what people try to do with Hadoop. It's really data exploration. But there's also the idea that you want to have data stored in one format but available to many applications. So what we've done is made it possible to run the Vertica engine on Hadoop, but using an open format, so that the data is accessible to Vertica but also to other things. And at the same time, we've applied whatever applicable performance improvements we can on that data.

So one of the topics we want to talk about is SQL on Hadoop. Yes. It's come up a lot. Some people have said, tongue in cheek, that SQL was the killer app in Hadoop, and Cloudera's Impala sort of was a tailwind. You guys have always had this vision. We saw Hadapt attempt to get out in front, and it failed because of the architecture, or so we surmised and were told today. What's your take on SQL on Hadoop? What's the uptake? What has it meant to Vertica?

So we see this as a continuum. On one end is data exploration, when you have lots of data and you don't know what to do with it; you want to try and figure out what's valuable, and we are giving SQL on Hadoop as a tool for that end of the data spectrum, right? And then once you figure out the data's value and you have some sort of SLAs around performance, et cetera, you can move that same workload. So you can design a report in that exploration phase, and then you can just migrate it to Vertica Enterprise and get the full power of Vertica there, but you don't have to change tool sets to go from exploration to operationalizing data. You have the same SQL tool set all the way through.
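The exploration-to-operationalization continuum Shilpa describes can't be reproduced here without a Vertica cluster, but the core point, that one SQL statement written during exploration runs unchanged against the operational store, can be sketched with the stdlib `sqlite3` module as a stand-in engine. The table and query below are invented for the example.

```python
import sqlite3

# sqlite3 stands in for any SQL engine; the point is that the report query
# designed while exploring raw data runs unchanged later, so there is no
# tool switch between exploration and production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [(1, "home"), (1, "search"), (2, "home"), (3, "home")])

# The "report" designed during the exploration phase...
report_sql = ("SELECT page, COUNT(*) AS hits FROM clicks "
              "GROUP BY page ORDER BY hits DESC")

# ...executes as-is wherever the data lands next.
print(conn.execute(report_sql).fetchall())  # → [('home', 3), ('search', 1)]
```

In the Vertica case the same portability is what the open on-Hadoop format buys: the data stays readable by other engines while the SQL carries over to the enterprise deployment.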
And that, we believe, is a very powerful thing: not having to change tools each time.

So I've got, go ahead. Go ahead. So I've got to ask you two questions. One is, what is the most important problem that you're working on with your customers, the one they're most concerned about, whether it's ingestion or something else? And two, what are you most excited about as a technology disruption lever, if you will? Something that's new and cool and relevant. Yeah. So: the number one customer problem you're working on, and then your favorite technology trend shift.

Yeah. So our customer problems come in two flavors. Clearly, our customers who have been with Vertica for a while now, their databases are bigger; they have grown as companies. Some of them are companies that have grown up on Vertica. I think you've had some of our customers like Etsy.com on this show. Along with that, they're pushing the envelope of the product, so there are things we need to do to support these growing databases. That's one area we keep working on. But at the same time, we are also realizing that there are new ways of doing things, like on-demand, which I mentioned earlier, and that takes our product in a slightly different direction. So we are doing things like Vertica OnDemand and IDOL OnDemand, Haven OnDemand, to make our platform more easily available and usable in smaller pieces. So at the same time as growing bigger databases, we're also trying to make it so that, okay, if you want to just use this particular image recognition function of IDOL, we want that to be available as an API. And so we're disrupting ourselves in some ways.

And a technology you're excited about?

A technology I'm excited about. Good question. I have to think about that. I mean, there's a lot of technology we work with.

It's like asking who your favorite child is.

I know. You can't say one or the other. It's the collection.
There are a lot of things, yeah. But Tamr is an interesting example of a new class of companies that are very interesting to us. It's basically this idea of curating data, but also figuring out where the pockets of expertise are within enterprises. Another company that we recently partnered with is Alation. It's the same type of idea, which is...

So they're taking the Google-like search engine approach and finding nuggets of value in the data exhaust.

That's right. There's a lot of data in the enterprise that's strewn all over the place, and they're trying to figure out how to catalog it in a meaningful way so that you can make sense of it.

So Shilpa, today you talked about Excavator. Can we unpack that a little bit? There are technologies in there that you must be excited about, right? I mean, you mentioned...

Yeah, technologies like streaming analytics.

Streaming analytics, Kafka, things like that. Definitely... So is the streaming analytics piece of Excavator your integration with Kafka?

That's the starting point of it. We are also working on integrating our platform with Spark. Spark is fairly new right now, so the jury is still out on where it's going to go, but we do believe there is a following. And so we are going to help our customers adopt the machine learning side of Spark and offer them our SQL capabilities as a complement to Spark. And so we keep looking out for things like that.

You've got an open source FlexTable library. What is that all about?

So FlexTables was something that we introduced a couple of years ago as our schema-on-read capability. Essentially, for semi-structured data like JSON, you don't need to really define a database schema, and for a lot of data like logs and exhaust coming out of web applications, it makes it extremely easy to put that data into FlexTables. And then you can just start querying it. So, no schema on write.
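The schema-on-read idea behind FlexTables, ingest semi-structured records as-is and resolve fields only at query time, can be sketched in stdlib Python. This is not Vertica's FlexTables API, just the concept; the log records below are invented for the example.

```python
import json

# Semi-structured "log exhaust" arrives with no schema declared up front.
# Records need not share the same fields; nothing is rejected at load time.
raw_logs = [
    '{"ts": 1, "user": "ann", "action": "login"}',
    '{"ts": 2, "user": "bob", "action": "search", "query": "vertica"}',
    '{"ts": 3, "action": "logout"}',            # missing "user" is fine
]

rows = [json.loads(line) for line in raw_logs]  # ingest without a schema

# Fields are resolved at read time ("schema on read"); absent fields
# simply come back as None instead of causing a load failure.
users = [r.get("user") for r in rows]
print(users)  # → ['ann', 'bob', None]
```

The contrast with schema-on-write is exactly the point made in the interview: nothing about the data had to be declared before loading, yet it is immediately queryable.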
You don't need to define your schema before you query these systems.

And then Distributed R, which you've contributed back to the open source community, I understand. Is that right?

So R has typically not performed well in a distributed environment, right? Distributed R is our extension of R, and R is something that is, again, very interesting to a lot of our customers and something we are excited about. It's predictive analytics in general, right? And this is why we are looking at R and Spark and contributing to the R community and so on. Distributed R extends the scale of R. So it's taking some of the things that we've always done well at scale and applying them to some of these existing technologies that have been out there for a while.

And then there are some improvements to Vertica OnDemand you talked about as well. And what's the application development layer there? I mean, is there a PaaS equivalent here for developers? Talk about that.

Yeah, so we have an umbrella called Haven OnDemand, which is all of our on-demand capabilities. Right now we have Vertica OnDemand, which is your data warehouse as a service. But we also have IDOL OnDemand, which is APIs for unstructured data, and some predictive APIs as well that we are offering as REST APIs for developers. Right now it is free to use; it's in a preview, early-adopter mode. We're trying to get feedback, get people to build interesting applications on it. And then our intent is to offer higher-level services using some of these APIs to solve specific problems.

Okay, so these capabilities come as a solution, essentially, more integrated than you would expect historically. Is that fair?

That's right. So think of it as layers of different classes of APIs. At the lowest level, we are offering REST APIs and extensibility APIs on our platform. Higher level than that are services like Vertica OnDemand, which solve a particular problem in context.
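Distributed R's actual API isn't shown in the interview; what "extending the scale of R" means in practice is the scatter/compute/gather pattern, where data is partitioned, partial results are computed per partition (on separate nodes in a real cluster), and then combined into a global answer. A minimal single-process sketch of that pattern, with invented function names:

```python
def partition(data, n):
    """Scatter step: split data into n roughly equal chunks.
    In a cluster, each chunk would live on a different node."""
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

def partial_stats(chunk):
    """Per-partition work: a (sum, count) pair, enough to later
    reconstruct a global mean without moving the raw data."""
    return (sum(chunk), len(chunk))

def combine(partials):
    """Gather step: merge the partial sums into the global mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

data = list(range(1, 101))                       # values 1..100
partials = [partial_stats(c) for c in partition(data, 4)]
print(combine(partials))  # → 50.5
```

The design choice worth noting is that only tiny partial results cross node boundaries, not the raw data, which is what makes statistics like this (and, with more bookkeeping, model fitting) scale out.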
And then the highest level is providing rapid-deploy solutions for applications. So whether it's voice of the customer, or, as you would have seen, smarter cities, financial product analytics, things like that are use cases that we are providing out-of-the-box templates for solving.

Well, I've got to ask the product question, because I love talking about platforms, and since you're the general manager of the platform, I thought I would chat with you about that. There seem to be platform wars going on. You have companies that started out and did very well, in some cases have gone public, some are still private, that have good tools or one-trick-pony-type products. Now they need to make more money, so the first answer everyone thinks of is: let's build a platform. So how hard is it? Share with the audience out there what it takes to build a platform, because there's a lot of noise in the market right now across many different sets of products. What's a platform, what's a tool, what are the table stakes for a platform, what minimum things do you need, and how should customers look at platforms and decide which one is best, or better, or a minimum viable product?

Yeah, so I think it's hard to say a platform can be viable without a community. Of course there's the product set, which continuously expands and improves, but really, what makes a platform real is having a developer community around it and a partner ecosystem around it. Those are the things that I believe differentiate platforms that are real and viable, that can have a long-term future, from platforms that are just simple tools. You want platforms that smart people can use, but you also want platforms where you can get a lot of help if you don't have the right type of resources to use them yourself.

A high tide floats all boats. Open source plays well there; startups, you guys have a lot of startups.
And so really, the reason behind some of these partner-oriented activities we are doing is that we believe we can't build a platform alone. We've got to build a skill set in the marketplace for big data in general, and our products in particular, but also get people started on our technology right from the beginning, because it's much harder to move people afterwards.

It's impressive. It's certainly a great compliment when you have a venture-backed startup, funded by a tier-one venture capitalist, OEMing Vertica's engine. Because they don't do that unless it's nicely composable, modular, and valuable.

That's correct. OEMs and ISVs embedding our technology is something we've done for a long time, but it's been much later in the life cycle of those companies, so we would like to get them onto the right technology platform much earlier, right from the beginning.

Is it fair to say there are opportunities to build analytics into the apps? That's correct. As a packaged app, with industry specificity. Yes. And scale.

That's correct. So there's definitely that opportunity. I think there will be problems that are solved by people with specific domain expertise in a specific way. On the other side, there's also the ability to stand up an app for a specific purpose, but doing that fast. So there's that gamut of constituencies that we cater to. And that's what makes a real platform: having the flexibility to handle a diverse collection of use cases.

So how does it feel to be part of a newly focused, split company, with all the energy behind the enterprise now? Yeah. A smaller boat, if you will.

Yeah. So I think it's great to have a smaller company with a clear focus, as well as a much more agile company.

Big data is a big part of that. And big data is one of the four big priorities for Hewlett Packard Enterprise, when you look at the enterprise side. Well, congratulations.
Thanks for coming on theCUBE again. Thank you very much for having me. Thank you. Thanks for answering all of the tough questions. The Kafka ones, I'm sure those weren't tough questions for you. But there's big interest in Kafka. Yeah, definitely. So we're here live in Boston, Massachusetts with the general manager of the HP Big Data Platform. We'll be right back with more after this short break.