Live from Austin, Texas. Extracting the signal from the noise. It's theCUBE, covering Dell World 2015. Brought to you by Dell. Now your hosts, John Furrier and Dave Vellante. Hey, welcome back everyone. We are here live at Dell World 2015. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, founder of SiliconANGLE; my co-host Dave Vellante, founder of Wikibon.com and chief analyst at Wikibon. Our next guest is Armando Acosta, product manager with the Hadoop group at Dell. Welcome back to theCUBE, good to see you. Thank you, thank you for having me. Obviously big data is a big part of it. Not a lot of messaging in the overall keynote; some holistic, big-picture messaging around changing the world, obviously data fabric, but no specifics on where the data layer is. It kind of depends on the workload and application. But one thing is certain. We learned at Big Data NYC, the event in New York, that Hadoop is changing. Hadoop is not going away. It will be a nice batch store, huge data lake, data reservoir, whatever marketing term you're going to use this week, right? I like the word data ocean better personally, but I don't think Hadoop fits that. I think streaming and machine learning fit in at a different level, but Hadoop has been a big part of it, and it's finally made it, in my mind, where when people say Hadoop's invisible and they're talking about Spark, in-memory, and a lot of machine learning, then Hadoop's done its job. It's enabling a lot of opportunities. So share with us your view on that. I mean, obviously customers are storing all this stuff, all the data they've got, on drives, but Hadoop's been a big part of it.
Yeah, I mean, when you look at Hadoop, we really talk about it as a big data journey, because you've got to take that first use case, you enable that first use case, you start to develop a second use case, you develop a skill set around that, and you continue to grow. You mentioned Spark; I really think there's going to be both. You know, you hear a lot of people say Spark's going to take out Hadoop. But it's based on the use case. If you're going to do batch processing and you need to do data transformation jobs, well, I don't know a lot of customers that need to do real-time data transformation jobs, right? They're not going to put that type of job on Spark. They're going to continue to do that batch processing. Well, just from your standpoint, explain that, because we've been kind of promoting the same thing. You've got MapReduce and HDFS. Right. And Spark is more in-memory. HDFS has been a big part of that, and that's not going away. Maybe MapReduce is optimized for certain use cases. Right. But that's mutually exclusive from what Spark's doing. Now, there might be Spark-only stuff, pipelining it differently or queuing up through other software, but that's not taking away from Hadoop. Can you explain all this stuff? Yeah, I mean. Did I get that right? Yeah, no, no, you're right. I mean, you kind of look at what Cloudera is doing, right? Cloudera is essentially the first one that came to the market with a viable first distribution of Hadoop. But when you look at what they're doing, not only are they doing the MapReduce piece, but when Spark came around, they said, okay, we will enable Spark within our distribution, because they understand that it's based on different use cases and different needs. So going back to your question: okay, where would you use Spark versus where would you use batch? When we look at Spark, Spark's enabling you to get to that real-time aspect, right? So I want to be able to process data.
I want to be able to analyze that data, and I need to do it in a short window of time because this data's continually coming at me. But when we look at customers today, they're not setting up specific Spark clusters. What they'll do is essentially spin up a JVM to do a Spark job. They'll do that analysis in Spark, and then once they're done with that, they'll power that JVM down, they'll put the resources back into their Hadoop cluster, and now that Hadoop node can be used for MapReduce. And the thing with Spark is, you know, you have the specific use cases, but it's not going to be on all the time. At least we're not seeing that right now. Talk about the use cases or reference architectures. ETL offload has been a big part of it. You guys have done some deals with Cloudera, Syncsort, a handful of others, but that really brings up the question of, you know, what is the reference architecture that you guys see deploying more into production now? I mean, the POC market has been booming for two, three years. We're seeing a lot of proofs of concept. Got that. Now production is the new trend in big data. What reference architectures or patterns have you seen? I think really where we see the pattern is that customers want to get to the optimized configuration right off the bat. If you know about Hadoop, you take it out of the box, it's not optimized. You're going to have to, you know, make some tweaks. You're going to have to move some levers, and essentially you move this lever, it's going to affect that lever. So you're going to- By the way, the people that do those tweaks are very expensive and hard-to-find developers. Exactly. And so what we're really trying to do, when you look at our reference architectures, is simplify the time to value. A lot of times what we hear is: I started this big data project. I realized I didn't have the Hadoop skill set. I didn't realize I had to learn Java.
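The pattern Acosta describes, processing continuously arriving data in short windows on a transient Spark job rather than an always-on cluster, can be pictured as micro-batching. Here is a toy pure-Python sketch of the windowing idea (illustrative only; no Spark or Hadoop involved, and the sensor-reading data is made up):

```python
from itertools import islice

def micro_batches(stream, window_size):
    # Slice a continuous stream into fixed-size windows (micro-batches),
    # the way a short-window processing job consumes incoming data.
    it = iter(stream)
    while True:
        window = list(islice(it, window_size))
        if not window:
            break
        yield window

# Pretend each value is a reading arriving continuously.
readings = [3, 5, 2, 8, 7, 1, 4]
averages = [sum(w) / len(w) for w in micro_batches(readings, 3)]
print(averages)  # one average per window: [3, 5, 2], [8, 7, 1], [4]
```

In a real deployment the analysis per window would be a Spark job submitted to the cluster, then torn down so the nodes go back to serving MapReduce, as described above.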
Oh, then I had to learn this other tool called Hive if I wanted to query it. Oh, if I wanted to pull data over to somewhere else, I have to use this tool called Sqoop. And the next thing you know, okay, I've got all this stuff in front of me that I don't know what to do with. And so our goal there is really just to solve that skills gap, right? We want to be able to say: we don't want you to spend your time designing it, architecting it, building it, configuring it, testing it. Let Dell do that hard work for you. What we want you to focus on is your data and your use case, because that's where the value really comes from. And I think that's where we're going to help customers: getting up faster and getting up quickly, and not spending all that upfront time just trying to get to, okay, now I can actually start analyzing data. Let's get you there faster, because that's where the value's at. The value's not in setting it up and designing and architecting it. The value's in getting to that business result and getting to that analysis. Yeah, I wonder if we could talk about that a little bit. I mean, our surveys show that when you ask IT people to what degree their Hadoop big data project was a success, the overwhelming percentage say success. When you ask the business people the same question, it's like, nah, not so much. We haven't really realized the ROI. So we identified four basic trends that we saw at Strata + Hadoop World and our Big Data NYC event last month. I don't know if you were down in New York City. Oh, there you go. So then you saw this. You had data in motion, you had streaming, and that's where the Spark conversation comes in. You had Cloudera announce Kudu, so you had a storage discussion: does that replace HDFS and HBase? You had this big thing that you were just talking about, complexity, as a friction in adoption. And the last piece is value: where's the value coming from?
So you touched on it a little bit there with the complexity piece, but did you see those similar trends? What's Dell's perspective on those big-picture megatrends that are going on in big data? Well, I mean, for us, you know, I'm the Hadoop guy, so I know Hadoop. But look at what Dell has been doing since 2011, when we released our first reference architecture with Hadoop, to where we're at today. Last year we purchased a company called Statistica, if you're familiar with the stats software. So now we have models and algorithms. Before that we acquired a company called Boomi to help you out with the data integration piece, and then we acquired a company called Quest a couple of years ago for the data management piece. So you look at what Dell's trying to do: we're really trying to build that end-to-end solution, because what we've heard from our customers is, okay, five years ago I would have taken on the effort to build this myself, because I thought I knew it better than you did, and I knew my infrastructure, I knew my environment, I knew my tools, and I'm the glue, right? Well, fast-forward to where we're at today. Customers don't have that luxury anymore, right? They don't have the luxury to figure it out on their own, because they're trying to keep the lights on, but at the same time they have to be innovative, all while keeping the lights on, right? So our goal here is to build those end-to-end solutions. I was on stage earlier, and really what I try to do is democratize it, right?
Business analytics and Hadoop shouldn't be just for the Web 2.0 companies on the West Coast; it should be for everybody. And so in order for us to do that for customers, we're trying to simplify your data management tools, we're trying to simplify your data integration tools, we're trying to simplify the way you bring models and algorithms to market, where you don't have a data scientist starting from scratch, trying an algorithm out 10 different ways. No, we already have some algorithms, we have some libraries for you. If you want to do fraud analysis, we already have some models built around that, whether you're doing insurance fraud or credit fraud, and what we can do is bring that to the table, bring that expertise, and get you that last mile much faster than doing it by yourself and trying to figure it out on your own. So I'm glad you brought up that end-to-end concept, because that addresses the simplification piece, and it leads me to the question on cloud. In our survey, 75% of the Hadoop users that we talked to were doing some form of public cloud, and interestingly, Google and Azure were the two most popular; Amazon was third. All three were doing very well, but those two were leading. What all three companies are trying to do is build out that end-to-end big data analytics capability and then offer it up as a service. So you're basically talking about delivering that same end-to-end capability. Where does the cloud fit into Dell's strategy? For us, I mean, if you've heard what we've been saying, we really believe in the hybrid cloud. For us, if you're doing business intelligence and that's the core piece of your data and you value that data, you're not going to put that on the public cloud.
Now, you want the ability to do cloud because it gives you a way to deliver this differently: self-service, optimization, a pool of resources rather than resources dedicated to one application. But when you look at hybrid cloud, I think that's where you're going to see the focus: you're going to have that data you want to keep close to the chest, and you're going to keep that on the private side of the hybrid cloud. But in some parts, you want to leverage some of Amazon's scalability. You want to leverage some of Microsoft's scalability. So when you need that scalability, that's when you go to the hybrid approach, and then you go to that public cloud for the scaling and for the bursting that you might need at that given time. But still, you're going to want to keep a lot of that business intelligence data close. When we talk to customers, they're like, yeah, I'm never going to put that in a public cloud. A private cloud, a hybrid cloud, yes, because I control that, but once you get into the public arena, there's some data that you just can't afford to have out there. So what do they tell you? Why don't they want to put it in? You said they don't want to put that business intelligence into the public cloud. I mean, there are some examples of people doing that, but from your perspective, why are they not wanting to put their data in the public cloud? Well, in the world we live in today, anytime anybody loses a piece of data, it's all out there in the media. You have the thing that happened with a big retailer in the past, and when you lose somebody's data, it's bad news; it's really hard to recover from that, and it's really hard to regain trust from your customers once you do. And so I think that scares a lot of customers into asking, okay, where do I put data, and is it really secure? And I'm not saying that stuff's not secure with Google and Microsoft, because they've done their work; you've seen their clouds, you know what they do.
But in the end, you're still responsible for that piece of data. That piece of data is a customer's piece of data, and it has some important aspects of their life, whether it's a Social Security number or an address, and somebody can take that and essentially rip off that user. You're going to be careful about where you put that. So Amazon will trot out companies like NASDAQ (we interviewed FINRA last week on theCUBE), but what you're saying is, yeah, that's true, you'll see some corner cases, but the vast majority of people are going to do that stuff on-prem. Is that what I'm hearing? Yeah, exactly, yeah. Okay, all right, so where do you see that going over time? How should we think about the journey that customers run? You said earlier at the top of the segment that this is a journey, so where are we in that journey? And then where is Dell taking customers? I mean, I really think it depends on the customer you talk to and where they're at. But I've been doing this Hadoop thing for about four years now. This year I thought we would be talking more about, okay, let's get into the analytics piece of it. We're already done with the Hadoop piece. Hadoop's been simplified. You've got the easy GUI, you've got the easy deployment. You get up and running quickly. And unfortunately, that skill set just hasn't developed, and when you look at the evolution of that customer and that expertise and the tools around it, they're getting there. They're just not quite ready yet. So when you talk about the journey, it's really about: okay, Mr. Customer, what's your data? What value do you put on that data? And really, what type of use case do you want to do? So when we talk about use cases, we put them in two big buckets. We talk about operational efficiency, and then we talk about transforming the business.
And when we talk about operational efficiency, that's the first case we start out with, because what we've noticed with customers is that big data has become this ubiquitous marketing term, and if you do big data, you do Hadoop; it's like fairy dust, you sprinkle it in and it solves all the world's problems. Well, that's not really the case. You've got to have some expertise around it. You've got to have some tools to make it work. So we look at the operational efficiency use case, and you mentioned earlier what we're doing with ETL offload. We're looking at that first use case where customers can get value without causing huge churn in their environment, without new skill sets having to be learned, without, you know, essentially my environment's going to have to change, my skill set's going to have to change, and I'm going to have to find different people to run this. That's not good for customers, and that's not what we want. So for that first use case, operational efficiency, we went and partnered with Cloudera and Syncsort to give you a first use case that's easy to integrate into your environment without causing all that churn. So what we looked at is this: when we talk to customers today, enterprise data warehouses are at capacity or underperforming. You know, we don't find a lot of customers who say, I love my enterprise data warehouse and everything's going smoothly. So we worked and talked with them to really understand: okay, what's the workload within the enterprise data warehouse that's consuming the most resources and all your performance? And what we realized is, it's data transformation jobs.
And in a data transformation job, what you're doing is just bringing different data sources together, different tables, whether you're joining those tables, aggregating those tables, or grouping by a different key and creating a new table, right? Well, that's all well and good, but all that hard work is just getting that piece of data to a good, ready state so that you can actually do something with it, right? So why do all that hard work in a very expensive tool that you built for business reporting and querying? Offload that work into Hadoop; let Hadoop do that work, because it's massively parallel, you have a cluster of servers doing that work, and you get it done faster. But the other thing we brought to the table, and the reason we chose Syncsort as the ETL tool: Syncsort has done all that integration. So Syncsort can actually take a SQL script, 10,000 lines of SQL code, and put it into a tool called SILQ, and SILQ will actually translate that code for you into a native MapReduce job. So the benefit here, like I said earlier, is when somebody says, I want to do Hadoop, but I don't want to do this Java thing, I don't want to do this MapReduce thing, I don't want to do this Pig or Hive thing, I don't want to mess with those things, I just want to be able to get more efficient and reduce cost. Well, here's a way we can introduce Hadoop into the conversation: it augments your traditional tools, but at the same time you don't have to worry about the skills gap, because we've solved that. You don't have to learn code, all that good stuff; Dell is simplifying all that for you so that you can get to that value faster. That's the key term: value, time to value. That is a key point. George just wrote that piece on Wikibon about Hadoop complexity, and time to value was the main theme there. Totally, and the thing is, the IT guys are still optimistic, because they know the value that they're going to extract.
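The warehouse-offload transformation described here, joining tables and grouping totals by a key, maps naturally onto the MapReduce phases. A minimal pure-Python sketch (table names, columns, and values are invented for illustration; a real job would run across a Hadoop cluster, with a translator like Syncsort's producing the MapReduce code from SQL):

```python
from collections import defaultdict

# Toy stand-ins for two warehouse tables (all names invented).
orders = [
    {"order_id": 1, "cust_id": "A", "amount": 120.0},
    {"order_id": 2, "cust_id": "B", "amount": 75.0},
    {"order_id": 3, "cust_id": "A", "amount": 30.0},
]
customers = [
    {"cust_id": "A", "region": "West"},
    {"cust_id": "B", "region": "East"},
]

# Equivalent SQL for the transformation being offloaded:
#   SELECT c.region, SUM(o.amount)
#   FROM orders o JOIN customers c ON o.cust_id = c.cust_id
#   GROUP BY c.region;

# Map phase: join each order to its customer's region, emit (region, amount).
region_of = {c["cust_id"]: c["region"] for c in customers}
pairs = [(region_of[o["cust_id"]], o["amount"]) for o in orders]

# Shuffle phase: group the emitted amounts by their region key.
groups = defaultdict(list)
for region, amount in pairs:
    groups[region].append(amount)

# Reduce phase: aggregate each group into the new "table".
totals = {region: sum(amounts) for region, amounts in groups.items()}
print(totals)  # {'West': 150.0, 'East': 75.0}
```

Because the map and reduce steps operate record-by-record and group-by-group, a cluster can run them in parallel across many servers, which is why this workload offloads well from the warehouse.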
So it's almost like the light at the end of the tunnel, and they can just about touch it. But it really is a skills gap thing. This comes back more and more to where I think you guys have a lot of value. So any other anecdotal learnings that you can share with the audience from Hadoop Summit or Hadoop World? We were just down at Strata + Hadoop World, and we had our big event, Big Data NYC. What would you say is the big takeaway from then to now, besides the fact that Dell bought EMC? I mean, the fact for me is that you do have more users using it. We do have more customers coming to talk to us about Hadoop. Three years ago, I would tell you I was talking to a lot of customers on the West Coast and on the East Coast, but I wasn't really talking to anybody in the middle. That has changed. Now I go and talk to customers about Hadoop; in Austin, we have an executive briefing center. We bring customers in, and essentially we say, okay, here's what we've got; we can show you what we can do. And how many production versus POCs are out there? Percentage-wise, if you could break it down. I mean, for us, we're actually seeing an uptick in production systems. Like you said, POCs: two to three years ago, I'd tell you I had a lot more POCs than full production customers. But I can tell you today, at Dell, as far as Hadoop specifically, I have 250-plus customers that are doing Hadoop with Dell in production. So that makes me feel a lot better than where it was three years ago, when I can tell you it was less than 50, right? So you do see that jump. You do see more customers using it. And I think what's more unique, too, is that you actually see big data and analytics coming together, and everybody's getting it. Okay, it can't just be Hadoop.
You have Hadoop as a foundational layer where you're storing that data, but eventually you're still going to need the data management tools, you're still going to need the data integration tools, you're still going to need the models and algorithms to make sense of all this and actually get to the value. So our goal here is to bring that all together for customers. And I think they're starting to grasp that. I think they're starting to understand that more. I think now what we just need to do is help them get there faster, help them get there quicker. That's what we're trying to do by streamlining those skill sets, so they don't have to go full bore into Hadoop. Let's do some of that work for you. And Armando, just share with the folks real quickly, who's here in your ecosystem? Obviously a lot of noise, I should say signal and noise, but mainly the EMC thing's taking a lot of the oxygen out of the room. But from a blocking and tackling standpoint, is the CEO of Cloudera roaming the hallways? Other partners, who's here? I mean, for us, when you look at our major partner with Hadoop, it's Cloudera. But within that, you still have the NoSQL vendors out there, like DataStax and MongoDB; we feel like those are a lot of important tools. And then not only that, what I'm liking, and what I saw at Hadoop World last week, is you're starting to see the focus on trying to classify that data, understand that data, and put that data into the hands of the right people at the right time so they can get to that insight. So for me, you're starting to see the connection where, oh, now I'm putting the pieces together, and oh, now I know how to build this into the solution, and now I know who I need to go talk to in order to build that. Armando, thanks so much for coming on theCUBE. Really appreciate you taking the time and sharing your insights. You're watching theCUBE here at Dell World. We'll be right back with more.
Day one coming to a close with a wrap-up and a few more segments. We'll be here all day tomorrow. We'll be right back with more, live here in Austin, Texas. It's theCUBE.