Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by WANdisco. Now your host, John Furrier. Hey, welcome back everyone. We are here live in Silicon Valley in San Jose for Hadoop Summit 2015. This is SiliconANGLE's program, theCUBE, where we go out to the events and extract the signal from the noise. I'm John Furrier, founder of SiliconANGLE, joined by my co-host George Gilbert of wikibon.com, big data analyst heading up that space for all of our research. Our next guest is Sundeep Madra, VP and General Manager of the Data Product Group at Pivotal. Welcome to theCUBE. Thanks for having me, guys. So when I hear Data Product Group, I love any product discussion. When you hear the keynote up on the stage from Rob Bearden of Hortonworks, the data operating system, I mean, that's your wheelhouse. Yeah, it is. So tell us, one, what's going on with the data operating system, and what's going on with customers, too, because now we're in a market transition toward relevance and enterprise value. Yeah. So, you know, the big thing for us at Pivotal: last year we launched the Big Data Suite, which is a packaging of our big data products, including Greenplum Database; GemFire, our in-memory data grid; Pivotal HD, our Hadoop distribution; and HAWQ, our SQL-on-Hadoop technology. What we're doing now is continuing to push with that Big Data Suite. We've enhanced the offering, and we've also added Spark, Redis, RabbitMQ, and a Pivotal Cloud Foundry entitlement to it. So what we're really focusing on now is exactly what you said: helping customers along that journey, not just capturing the data, but working to get insight from it with their data science groups.
You know, we have the MADlib capabilities on both HAWQ and GPDB, and then we help operationalize those findings in applications, which is where our data science group and Pivotal Labs really come in and build applications. You've had a lot of success with Cloud Foundry. Yeah. That's going well. Yeah. That's galvanizing the ODP on top of it. Yeah. The orbit of value is kind of circulating, kind of the foundation set. But before we get into the conversation, set the table on what's going on. You guys did some open sourcing of the Big Data Suite. Yeah. You have an acquisition you guys just did. Yeah, we did. What is this all about? Why? What are you guys buying? What's organic? What's being bought? Give us the update. Yeah, so I'll break it out. In February, we announced our intention to open source our products, and in April, GemFire's open-source version, Geode, was accepted into the Apache Incubator. That was the first of our three major products. We'll next go with HAWQ and then Greenplum by the end of the year, so we're heavily working down that path. When it comes to the acquisitions, what we're really focusing on is expanding our lead in technology. We've always been strong in query optimization; we've had a really strong team there. And we wanted to augment that by bringing Jignesh and his team on board from Wisconsin, who had been working on the Quickstep technology, and we'll be working to integrate that into our products. So the acquisition piece, is this speaking to the speed of evolution? I mean, I look back to the EMC analyst meeting a couple of years ago when you guys were rolling out Cloud Foundry, and I was pretty skeptical. I was one of the skeptics. I thought it might be vaporware, but no, things have really accelerated. Why, and what's going on now? What's the big focus? So there are two huge pushes that we're seeing.
One is this push to cloud across the board, right? And then, once you push into the cloud, it's really about being cloud native and leveraging the underlying next-generation architectures. That's what the Quickstep technology really allows us to do. One, it allows us to take our technologies and make them more cloud native. And two, it leverages the underlying hardware in a more efficient way. That's where Jignesh has been focusing for the past four to six years: really taking a look at next-generation CPU architecture. We've gone from single-threaded CPUs to multi-core CPUs, and his technology really leverages that, which is highly leverageable in cloud environments. So we're excited to start integrating his technology. One of our themes, our principal theme at the big data practice, is systems of intelligence, which Paul Maritz, CEO of Pivotal, has been talking about under a different label for a long, long time: bringing analytics to the point of interaction to anticipate and influence outcomes. Can you tell us how the Big Data Suite works together, and give some sample apps that deliver that capability? Yeah. What we're really starting to see, specifically in enterprises that are pushing hard to leverage these systems of intelligence, is that they have a need to take standard structured relational data and correlate it with unstructured data. That's where the Big Data Suite really comes in for our customers and helps them solve those problems. So I'll give you an example. Let's say you're a traditional brick-and-mortar retailer, and you have a transactional relational system storing your transaction information. Now you want to leverage that information for some new intelligence.
Let's say you want to look at the security camera footage of what's going on in your parking lots to better understand how you should shift your staffing schedules, right, the times people start and leave, your staffing hours. That's when you have to take a traditional data set and combine it with something unstructured like video. You run that through some Hadoop jobs to extract some relational information from it, and then correlate those two things together. So the Big Data Suite really comes into play where customers can't fully move to Hadoop and need to leverage their existing systems, and we allow them to do that with a single point, a single SKU, that they can purchase from us. And just to make that concrete, where the rubber meets the road: it sounds like you would be processing, say, that video of the parking lot and people coming in through the front door, and then once you've got a repeatable process, you would move that over to the relational world. Yeah, of course. I mean, once you've completed your image analysis, at the end of the day, to do any analysis you have to extract something from that unstructured data, right? And once you do that, you move it over. Then you'd create a model to say, hey, let's look at what time people come to certain stores and what time we should shift the hours that people work there. So we see Hadoop being, of course, this ecosystem of unbelievable innovation. But at the same time, it's an ecosystem, not a product, so there's a ton of complexity for admins and developers. Yeah. Your Big Data Suite, with Pivotal Cloud Foundry underneath it, starts to create a data platform along the lines of what's on Azure or Amazon.
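The brick-and-mortar example above, joining foot-traffic counts extracted from video with relational transaction data, can be sketched in a few lines of Python. This is a generic illustration; the function names and data are hypothetical, not part of the Big Data Suite:

```python
# Hypothetical sketch of the retail example from the interview: an upstream
# Hadoop job has already reduced parking-lot video to (hour, event) records,
# and we correlate those with hourly transaction counts from the relational
# system to suggest staffing hours. All names and data are illustrative.

from collections import Counter

def hourly_counts(events):
    """Collapse extracted video events (hour, kind) into per-hour counts."""
    return Counter(hour for hour, _ in events)

def staffing_suggestions(arrivals, transactions, threshold=10):
    """Flag hours where arrivals far outpace transactions: likely understaffed."""
    return sorted(
        hour for hour, n in arrivals.items()
        if n - transactions.get(hour, 0) >= threshold
    )

# "Unstructured" side: events extracted from video by the Hadoop job.
video_events = [(9, "car"), (9, "car"), (10, "car")] + [(17, "car")] * 15
arrivals = hourly_counts(video_events)

# "Structured" side: hourly transaction counts from the relational system.
transactions = {9: 5, 10: 4, 17: 2}

print(staffing_suggestions(arrivals, transactions))
```

The real pipeline would of course run the extraction at Hadoop scale and the join inside the database; this only shows the correlation step in miniature.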
How do those pieces fit together better than, you know, best-of-breed? Yeah. So what we're really trying to focus on is time to delivery of some output, right, some result. Think about it like this. Today, there's probably an enterprise data warehouse in most large businesses. But if you need to get a data scientist onto some data set from there, the time to do that is quite long, right? It can stretch anywhere from weeks to months, and it's typically six months from what we see. With Cloud Foundry, what we're really looking to enable is the ability for anybody in the organization, whether a data scientist or an engineer, to say, hey, look, I need my own slice of data from the larger data warehouse, but I want to use the same tools that you have access to, because I want to query that data. And so, push a button, have that system deployed in some ephemeral state, create your model, create an application that interacts with that model. Once you verify it, push it back into your actual production data warehouse. We're really looking to give customers the agile nature they have in software development, where you can have continuous delivery, but to bring that all the way into the data world as well: being able to create a model quickly, iterate over that model, and do that without having to have someone ship a bunch of hardware and software to you. And then deploy it into the application in that same agile way. That's it, right? Because the idea is that once you create that model, you're not done. You want to continually iterate on that model, and that's where Cloud Foundry comes in, because it provides a platform for you to do that in a continuous way.
Does Microsoft or Amazon have something equivalent, where they can facilitate that process from iterating on the model to deploying it? Yeah, I think what's interesting is that both Microsoft and Amazon have pieces of this. What makes Cloud Foundry, I think, the better solution for going down this path is that it's agnostic to the underlying infrastructure. With our customers we see this notion of data gravity, which we talked about. A lot of times some of this information may live on-premise, and you may want to run the application in the cloud. That's where Cloud Foundry becomes a lot more interesting, whereby, yeah, these technologies may be available in Amazon, but your data is living on-premise and you may not be able to get it over there. So in other words, hybrid deployment is something you can provide independent of the cloud? Correct, all out of Cloud Foundry. And bringing those two things together is really powerful for us. Got it. So I've got to ask about the customer requests we hear through theCUBE and also through the Wikibon practitioner community: a couple of big-picture challenges I want you to break down for us, how to figure this out with the Big Data Suite and, in general, how to roll the customer out to the next generation. First, I've got a ton of sources of data. I've got boatloads, pun intended, data lake, sources. I've got infrastructure; you mentioned that earlier. That's one challenge. The other challenge: I've got software guys, I've got Pivotal, telling me I should be agile, DevOps, infrastructure as code, and I'm running this on a network I built in the 90s that's just been consolidated and tweaked. It's redlining; it's barely holding together. I've got management software that's living in the Stone Age. I just can't, I mean, what do I do? Yeah. How do I deal with that?
So those are the two big-picture items: a lot of sources, and I'm getting pressure for performance and new app architectures. Every company's telling me I should be building X, Y, and Z. There's a lot of noise. Yeah. So what is the plan for the customer? And what do you guys do specifically in that use case? Yeah. In those use cases, where we're really trying to help customers, and there's a lot of push from the Cloud Foundry side, is to take a look at that legacy and try to break it up into this notion of microservices. On the Cloud Foundry side, we really embrace the Spring Cloud model, right? We take what exists as the legacy systems and break those into microservices, and those are incremental tasks that companies can take on. As they move to make those things microservices, they can start separating the data and the applications, such that the individual next-generation apps you want to build can be built on top of those microservices. Is that like micro-batch? No, it's not like micro-batching. It's basically taking the monolithic applications that exist and breaking them down. Look at it simply as a bunch of RESTful services, like you'd see in a consumer internet application, right? It's the same thing you see out of a Twitter or a Facebook: they have an API that's available for people to interact with their back-end application. You need to start doing that within the enterprise, and you need to scale those services independently. So it's services-led right now, and services meaning not just microservices. So if your customer says, I have a problem, I want to unify all these sources, can you guys do that? Yeah. But you said something really significant, which was take the legacy and make it microservices. And we've heard from some other customers that that's a big job.
Like, why don't we make the new stuff microservices and leave the legacy stuff as the more coarse-grained stuff, knowing that reuse is going to be a little harder? Well, when I say take the legacy and make it microservices, it usually means make new stuff that looks like microservices. Okay. Yeah. Or wrap something around your legacy stuff. So in other words, the example we use is, 12 years ago, Apple built the iTunes music store on top of SAP R/3. Yeah. And it wasn't microservices, but you're saying you can do that now. Yeah. With GemFire, right, we offer a solution that allows you to interact with a mainframe without adding load to your mainframe. The reality is you may not be able to move off the mainframe, because that's your system of record. So what we do is stick GemFire in front of it. GemFire can act as a cache for the applications, absorbing the load that comes from mobile and other things you don't want to add to your existing mainframe. Okay. So looking at it now: you guys have the batch stuff, the processing engine. That's pretty solid. Yeah. I mean, that's solid as a rock. Yeah. When you start getting into real time, then you have interactivity, you have visualization, insights; that's an app kind of thing. Yeah. And then on the data sources, a real-time source could be structured or unstructured. Yeah. So how do you handle that? Yeah, and I would add a third one, which we were talking about earlier: even when you think about real time, there's real time in the human sense and real time in the machine sense, and those two things are quite different as well, right? Humans act in hundreds of milliseconds; machines can act in single-digit milliseconds, right?
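The GemFire-in-front-of-the-mainframe pattern described above is essentially cache-aside: reads are served from memory, and only misses touch the system of record. A minimal Python sketch, with illustrative class and method names rather than GemFire's actual API:

```python
# Minimal cache-aside sketch: an in-memory cache (standing in for GemFire)
# absorbs read load so lookups from mobile apps don't hit the system of
# record (the mainframe) on every request. Names are illustrative only.

class RecordCache:
    def __init__(self, backend_lookup):
        self._backend_lookup = backend_lookup  # the expensive mainframe call
        self._cache = {}
        self.backend_hits = 0

    def get(self, key):
        if key not in self._cache:           # miss: go to the system of record
            self.backend_hits += 1
            self._cache[key] = self._backend_lookup(key)
        return self._cache[key]              # hit: mainframe untouched

def mainframe_lookup(account_id):
    # Stand-in for a slow mainframe transaction.
    return {"account": account_id, "balance": 100}

cache = RecordCache(mainframe_lookup)
for _ in range(1000):                        # a thousand mobile reads...
    cache.get("acct-42")
print(cache.backend_hits)                    # ...only one backend hit
```

A real deployment also has to handle write-through and invalidation, which is where a product-grade data grid earns its keep; the sketch only shows the read path.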
So across the board, the way we like to think about it is: for the problems that are sub-10-milliseconds, we have GemFire, and GemFire is used in a lot of machine-to-machine use cases, right? It's widely deployed on Wall Street. Then we look at hybrid architectures between GemFire and Greenplum and HAWQ for the sort of interactive, human real-time use cases. And then for the batch use cases, things like GPDB and Pivotal HD. You know, I asked Maritz about this at the Open Data Platform event: we make this trade-off today between sort of big data and fast data, where one's low latency and one's high throughput. But hardware is changing, and we might not have to make that trade-off. Where that could come into play is, when you want to do predictions, you can learn in real time, but you also get the rich history, and then you can operationalize those learnings right away. Is that something we should be looking for in the near term or medium term? Yeah, an area of a lot of interest for us is things like probabilistic query, right? Where, if you're looking at sensor data coming off a jet engine, you may not have time to stick it all the way down into a database and query it back out, because the action you need to take needs to happen faster than you can do all of that. So there's a lot of interest there in that fast-data sense. Sort of pushing it to the edge. Exactly, right, yeah. But what about when you really do want to have lots of real-time streams, but you've also accumulated a ton of history? Yeah. Because you can't discard that, for all its richness.
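The probabilistic-query idea can be illustrated with a count-min sketch: a fixed-memory structure that answers frequency questions over a stream approximately (it may overcount, never undercount), fast enough that nothing has to be landed in a database first. A generic sketch, not a Pivotal implementation:

```python
# Count-min sketch over a sensor stream: constant memory, approximate
# frequency answers. Hash collisions can only inflate an estimate, so
# taking the minimum across rows bounds the error from above.

import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._buckets(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # True count <= estimate; collisions only push it upward.
        return min(self.table[row][col] for row, col in self._buckets(item))

sketch = CountMinSketch()
for reading in ["engine-1"] * 50 + ["engine-2"] * 3:
    sketch.add(reading)
print(sketch.estimate("engine-1"))
```

The appeal for the jet-engine scenario is that the structure updates in microseconds per reading, so the action can be taken before any store-and-query round trip would complete.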
Yeah, so that's where the work that we did comes in. I don't know if you saw our announcement a few weeks ago about releasing our next-generation query optimizer, which is highly extensible for us. That query optimizer is built to be extensible in a way that it can look at different sources of data for a database that have different characteristics, right? Whether it's in memory, on disk, anything in between, or any new things that may come up. This would fit, well, I know we're diving into the details, but let me try and put context around it. You talked about this yesterday, or today, the acquisition of the query execution platform. We know that processors are going to have dozens, hundreds, maybe eventually thousands of cores. Yeah. Does that make it possible to actually distribute the query execution across a huge cluster eventually? Well, yeah, and that's it. To the question earlier, in looking at what Jignesh was doing, his focus has been on what's happened in the last eight to ten years as we've gone from single-threaded, single-core processors to multi-core processors. All his work and research has been around leveraging multi-core processors in cloud environments, where you can be highly distributed and highly parallel. So you're taking that, rather than doing the sort of Oracle RAC cluster, which is like trying to stretch a couple of single instances of the database, you're now talking about a swarm. That's it, yeah. Okay, yeah. Wow. Yeah, and even potentially with heterogeneous hardware, right, where the hardware is not all the same, sort of in the spirit of Hadoop. Yeah, yeah. Sundeep, let's talk about the event here. We're running a little low on time, but I want to get your thoughts on the industry and this event for the folks who aren't here.
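The multi-core direction described above, partitioning a scan across cores and merging partial results, can be shown in miniature. This is a toy sketch only; engines like Quickstep do this with far more sophistication (vectorization, NUMA awareness, work stealing):

```python
# Toy parallel scan: instead of one worker scanning a whole table, split the
# rows into partitions, run the pieces concurrently, and merge the partial
# aggregates. This shows the shape of the idea, not a real query engine.

from concurrent.futures import ThreadPoolExecutor

def scan_partition(rows, predicate):
    """Each worker filters and aggregates its own slice of the data."""
    return sum(value for value in rows if predicate(value))

def parallel_query(rows, predicate, workers=4):
    size = max(1, len(rows) // workers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(scan_partition, chunks, [predicate] * len(chunks))
    return sum(partials)  # merge step

rows = list(range(1_000_000))
total = parallel_query(rows, lambda v: v % 2 == 0)  # sum of even values
print(total)
```

For CPU-bound work, real engines use native threads or processes per core rather than Python threads; the partition-scan-merge structure is what carries over.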
I mean, it's booming. Yeah. But it feels like it's constricting at the same time, because of the apps and the analytics, all the tooling; there's a lot of shifting going on with the tech, you mentioned some of it, but there's growth happening. So is that coming from the cloud, or where's the pressure? Because there's growth here. Yeah. So where's the barometer of the industry? Share your vision. Well, I think it's an interesting intersection of a few big pushes happening at the same time, and what we'll see is probably a huge spike. It reminds me of the early days of the internet: you'd go to conferences about the same size, and then they blew up in the late 90s. What I really feel is driving it is, one, obviously the cloud push, which you mentioned, and two, this notion of digital transformation, right? When you think of those two things really coming together, there are a lot of businesses, and John Chambers had an interesting quote yesterday, I don't know if you guys saw it. He basically said at their conference that 40% of the companies that were there, his own customers, won't be in business in the next eight to ten years. I did watch that, his last keynote as CEO, very historic. Yeah. He was good. He was on fire. And that was his big departing remark: you need to change your business, right? You've got to look at next-generation companies that look at the problem completely differently. Uber, a transportation company, doesn't own any cars, right? Airbnb, you know, like a hotel or hospitality company, doesn't own any properties, right? And that notion is making its way very rapidly through the enterprise, and we see it across the board with all the folks we've been talking to.
And certainly, we said on theCUBE years ago, you're either the disruptor or you're being disrupted, which is his theme, which is so on the money, because the enterprises have kind of had their heads in the sand, maybe a little five years too late. I mean, this was happening five years ago. Yeah, and the proliferation of mobile is just adding to it, right? We saw yesterday at the Apple keynote all the kind of new technologies they're making available, and that's just pushing the edge even further, right? That's a great point. We could do a whole segment on that, because I think you hit the nail on the head. The digital transformation is huge, but it's not the consumerization of IT. No. The consumerization of IT was, oh, it's VDI on the desktop, right? Yeah. Yeah. The geeks all know what that is. This is the 100% connected consumer. Yes. That is radical transformation. Yeah. You know, zero to 100 million users took, I think they said, something like 50 years in the radio days, and with the iPhone it was less than three, right? And with Instagram it was even faster. You guys know these numbers, right? So, the VP and GM of the Data Product Group, great to have you on theCUBE. Thanks for your insight. If we could automate your insight, as we were saying earlier, we would need no guests on theCUBE, but thanks for sharing the data. This is theCUBE, extracting the signal from the noise. We are here live in Silicon Valley for Hadoop Summit 2015. We'll be right back after this short break.