Live from New York, it's theCUBE, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone, hello, and welcome to day two coverage of Big Data NYC, held in conjunction with Strata Hadoop. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. Again, day two of three days of wall-to-wall coverage, covering everything going on in New York City around big data, whether it's Ad Week, the conference on the ad tech side happening across town, but mainly Strata Hadoop here. We are one block from the Javits Center. We are here with our studio, kind of like the Today show, we're on the ground. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host, George Gilbert, kicking off day two, and our guest here is Jeff Veis, VP of marketing of the Big Data Group at HP. Welcome back to theCUBE, great to see you. Great to be back. So HP Software, the Big Data Group, I just say Big Data Group, but it's really HP Software now, encompasses all of the software involved across a portfolio of big data, which is not just Hadoop, there's a lot of other things, given that your customer base is pretty large and comprehensive. So Hadoop has evolved. We're seeing it clearly, even Mike Olson validating on stage this morning some of the things we were saying yesterday. He predicted Hadoop would be invisible, I think that's our word, but he basically was saying the same thing. The abstraction on top of Hadoop is really defining the ecosystem, and the ecosystem is not just Hadoop, it's Spark, there are Lego-block elements that wrap around Hadoop, and there's a lot of value being created.
And then also yesterday, Gartner's Merv Adrian pointed out that the data is clear: there's a ton of POCs, and we're just now seeing some bump up in production, meaning, hey, the adoption in production hasn't been there in the past few years as it should have been, but certainly the POCs are on the rise. That yields the data saying, hey, it's real, it's certainly relevant, but there's a lot of other things going on. So the Hadoop ecosystem is certainly relevant, growing, but now it's prime time, the rubber has to hit the road. So I got to ask you, do we smell the rubber? Do we smell the rubber? No, what are you seeing? I mean, HP, you guys are real, you have a huge customer base, you have other enterprise products, you're talking to customers. Do you see the rubber hitting the road? Is it prime time now for Hadoop, and what do the customers look for as they think about making these decisions to roll out Hadoop and its ecosystem elements in production? Yeah, no, I think it's a fair question, and I think it's a little bit of back to the future, is my view. I was just over in the Javits hall, you feel the energy, you see the size of the audience, there's something going on. It certainly ties back to when Linux was rising and getting a lot of interest. The two things I'm seeing are that now people are talking a lot more about the business application of it, not just the technology, and second, they're also talking about what it is going to take to get this truly in production. They're being very careful to say, when we're putting new betas out, that these are betas, we'd love you to participate, but don't deploy this yet, don't put this into production. And I think we're going to have kind of two balls in the air for the foreseeable future, where you have this innovation machine that I would argue is actually accelerating, Kafka, Drill, pick your favorite Apache project, and I don't see that necessarily slowing down.
At the same time, there's a new kind of critical mass of users and conversations going on, and that is, is it ready for prime time? You almost should have two conferences here: kind of the Wild West, what's new, what's coming, let the innovation go, and then the other people that are here that I think are saying, is this ready to run a system-of-record, production-grade capability? And we see the same from our customers, and the bottom line is I think both are emerging, but now people are trying to get a crystal ball on both, and that's where a lot of the conversations are today. That's a great point about this idea of having two conferences, that's why we run Big Data NYC, shameless plug for SiliconANGLE and Wikibon, but seriously, we're seeing the same thing, and I want to drill down on that. You got the developers, you got the data scientists doing some hardcore tire kicking, pushing the envelope, doing the futuristic thing, kind of pushing that bleeding edge, right, the lunatic fringe: I got to invent and innovate. Now roll back to Main Street IT. Big checks are being written right now around architectural decisions, around hybrid cloud. I got some Amazon, shadow IT, I have big data analysts I'm potentially overpaying for in older data warehouses. At the end of the day I got mobile apps, and so the main guys implementing, the IT guys and/or enterprises, they have real decisions to make. This isn't like spinning in the weeds, this is real money. So what is- Well, real careers and real companies' performance, and so you start to see interesting questions come up when people are looking at taking a mission-critical system that they're running. It may be in their financial area, it may be in their marketing area, it may be in supply chain, and the interesting thing is, it's very much a back to the future when you talk about what their number one app is, and it's SQL, relational databases.
That is where a large amount of the focus that I see is, both in the building and with our customer base, and it's for the reasons that people are looking at, which is they want to be able to interrogate their data, do it at high performance, do it at scale with high levels of concurrency, all of course in that Hadoop ecosystem, and they want to do it without taking one little step back. And those are the interesting discussions we have today, the interesting benchmarks that we're doing, which actually is the reason we entered into swimming in this data lake, because we think here is where the companies that have not a year, two, or three, but maybe four or five, and that understand how to do high-performance SQL columnar databases, can bring that in and meld that into the Hadoop ecosystem. You know, I got to say, one of the benefits of doing theCUBE over the past five years, it's our sixth Hadoop World, or Strata Hadoop since they took it over, but really the reality is, we get to see people's messages over the years. I got to give HP props, and I also do the same for IBM in some of the areas where they've been on message and they've executed. You guys actually had columnar store first, kick-ass columnar store, and you had huge customer bases, ones that you can't talk about, I know, on the air here, but there are ones that are demanding. You've also had the SQL access method, both validated yesterday here at Strata Hadoop, which is, okay, one, the storage method is columnar store, and two, this notion of SQL being a standard access method, the lingua franca of databases. That is now validated, 100%. You guys were there for years, so I got to ask you, that being said, there are now a greater variety of use cases for the enterprises, for that Vertica, for that data management stuff that you guys have. How are you building around that performance of SQL and the columnar store, Vertica? What's wrapping around it? What are the use cases that you're seeing that you're building on?
I mean, obviously, Amazon's next week, re:Invent, we'll be there. People are still using Amazon, they've got the public cloud. What are the areas, the use cases, that you guys are building on? Well, in one sense, it's very simple for us. HP, really, the whole company, and now the Hewlett Packard Enterprise company, post our split, which is happening in 30 days. So we'll be embracing that green logo and out with the blue, I guess, on our side. But seriously, what we'll be looking to do is provide choice. If I had to give you one word, it's about choice. It is not about a singular way to go and do your data analytics. For us, choice means building on our bedrock product, our flagship product, which is Vertica, the highest-performance columnar store out there. Twitter's running on it, Facebook is running on it, AT&T is running on it, Cerner is running on it. We're taking that. That's the classic Vertica Enterprise product. That's where our roots are. Yeah, okay, got it. That's where you've been winning for years. So we take that no-single-leader-node technology, which means no single point of failure, which very few people have. We take that and now we offer it in, if you will, kind of a Baskin-Robbins approach: we'll give you the flavor you want. And let me lay out those flavors for you. So we have the Enterprise Edition, which has an optimized file system, the most scalable, highest performance. That is what we bulletproof test on and that's our flagship. But in addition to that, we now have taken that technology and run it directly and natively on Hadoop. All distros: MapR, Cloudera, Hortonworks. And in fact, and it's what we talked about in August, we work directly with Hortonworks as a proof point to optimize around ORC, to give the fastest performance on any ORC file system for SQL. And we still make that claim today. But we don't stop there. So it's the Ginsu knives, there's more.
And there's more because now we can run Vertica on demand, both on our Helion platform and on the Amazon platform. When I say on demand, it's not marketing on demand. It's real on demand. You take out a credit card, you pay by the terabyte or by the transaction, full elasticity, no single leader node. So you have that same reliability that you'd get in the Enterprise product. And if that's not enough, next week at re:Invent, which is the Amazon Web Services show, in a place that's a little warmer than this, right? It's down in Vegas. We will release and demonstrate the Vertica AMI, an Amazon Machine Image. So you can take that Enterprise license and deploy it on a traditional server. We'd love you to do that. HP makes a bunch of great servers. If you'd like to run in the Amazon cloud, you can do that as well. Now, why is that important? Because people want that hybrid mode. They want that ability to ramp up quickly, to have an elastic way to be able to trial and pilot things. Sometimes they want to be able to sustain that on an enterprise cloud. Sometimes they want to take that in-house. Sometimes they really want to push the performance and have that mission-critical capability, not just in terms of the size of data, but look at concurrency and the ability to run dozens, if not hundreds, of concurrent queries at the same time. That's where you see that many of the other Hadoop offerings don't support the full ASCII feature set. You know, ASCII, SQL has an ASCII standardized feature set. ANSI. ANSI, I misspoke, and on the ANSI feature set, we support all of it. We always have, we always will. You see subsets as small as 20 to 30%. That means that somebody taking their current queries that they've been running today on, let's say, Oracle, when they want to come over into the Hadoop world, now has to regress. So basically, no-compromise computing is what we want to offer.
We're 100% in on Hadoop, 100% in on the Hadoop ecosystem, including Kafka and Spark, et cetera. But we want to do it in a way that you don't have to take that step back. And you also don't have to make that choice in deployment modes, because we think there's no going back there. And that's where you're going to see an interesting dance between the established companies that have been around and kind of earned their mark and the new entrants that are coming in, embracing the new technologies, but trying to service a population of customers that are saying, look, I can't abdicate mission critical. I want to take the new innovation, but I want enterprise class. You touched on a couple of points, mentioning Kafka and Spark. More broadly speaking, are the use cases in the cloud different beyond just, I can put some test and dev workloads up there? Are people sort of maybe streaming data in from external feeds, machine data? Do they want to use specialized analytic engines in conjunction with Vertica? What does that look like, perhaps, in the cloud that's different from on-prem? Well, it's a great question. And I think people are underestimating the fact that there has been so much focus on volume of data and a traditional kind of batch and load and find the needle in the haystack. What I'm starting to see, and I think this is not niche, is data in motion. And I'm seeing customers that I would not have seen asking about it even a year ago saying, I want to take small batches of data in motion, do analytics on the fly, and take action. This could be in commerce, it could be in financial services, it could be supply chain. You're seeing people looking for these early indicators of trends on one side. The other thing is to be able to do high-performance servicing of individual transactions.
So I want to be able to analyze a single customer, make a decision, and before they leave my website or the engagement I have with them, I want to be able to make a data-driven decision in terms of a real-time promotional offer, maybe take an action to prevent a customer satisfaction issue, maybe respond to one. So that's the ability to take mainstream what we would have seen before as very niche, unique: maybe gamers do it, maybe financial services guys do it. Now you're seeing- A cross-sell in e-commerce, yeah. Yeah, you're seeing people talking milliseconds that were talking hours and days before. And that's one I think the industry isn't waking up to fast enough, that that is a need that's coming. Does mobile have something to do with it? Absolutely. Does the fact that people want to be able to respond during the business transaction, and not in passive analytics, drive that as well? I think so. So I think a year from now, we're not just going to talk about esoteric things like Kafka, we're going to talk about the business use cases people use them in. Jeff, I got to get your take on this because you're also in marketing, so you look at the big picture, look at the landscape. Let's talk about the ecosystem of Hadoop. And I want you to look at it through the lens of the customer, right? From the CIO, CEO, CFO, Chief Compliance Officer, Chief Data, whatever that C-level CXO is. And look down from the balcony here, on this stage here at Big Data NYC at Strata Hadoop, and say to yourself, okay, bottom line, what is going on beyond Hadoop? I get Hadoop, I can store a bunch of stuff in a big pile, and now I've got technology like Vertica that goes super fast, like it's Facebook and everybody else. Great, I need more, right? They want the "more" question answered. What are they looking for? What do you see the needs being right now in terms of the rubber hitting the road? Specifically, what is that conversation? Well, I would say there's two parts to it.
One, I don't see any letup in self-service and getting this data accessible at the edge quickly and frictionlessly. And there's a lot of different things that are pushing it, but I think there's no going back. Nobody is waiting for the next Teradata-ish data warehouse to be built over six to seven months. Those days are gone. So we're seeing a flip. There are still data warehouse modernization projects, and there'll be lots of those going. But what I'm starting to see is what I'll call new-generation analytical applications, looking to do fundamentally new things that previous traditional data warehouses weren't. New functionality or new price points, or both? Oh, absolutely both. And then they feed on each other. The fact that we can do it for pennies on the dollar now allows you to consider these. But what's ultimately driving it is that my business can be affected differently. So I'm not going to run the same report just cheaper. That helps IT. There'll always be a push to be able to economize that. But where I'm seeing the excitement, where our new big deals are coming from, is where people are saying, I want to take that level of analytics into places that normally wouldn't have gotten it before. And that is a switch away from system-of-record kind of optimization to new forms of analytics that are occurring. So you have the systems of intelligence there. So the roadmap we see, that validates what you're saying. I read it every night. I know you have a hard time going to sleep. That's great data. Systems of record, I get that. That's the database. Systems of engagement and/or interaction. That's social. That's the real time, mobile, whatever you want to call it. Now the intelligence comes into more algorithmic things. So the conversation we were having on theCUBE yesterday, and I want to get your thoughts on this, is that mobile and/or these new data warehouse models, with the speed of, say, SQL and Hadoop, allow new use cases. They see things faster.
So we call that the speed of business. So when I see a site I can click on stuff, do things, self-service, spin up composite data models. And that's running at the speed of the human mind. I can click and do new things. That's kind of the engagement piece. Now machines are much faster. They're like sub-nanosecond responses, right? The humans run in milliseconds, maybe, right? Now machine learning is a big deal, right? So this opens up the notion of what is under the hood for systems of intelligence. You guys have a lot of stuff going on around machine learning and around intelligence. So I've got access to the data, but when I'm in real time, I need the machines to do the work. I need to have the machines be more intelligent. Could you comment on how you see that piece of the market? That's evolving very fast, that conversation. Well, it's being pulled in a lot of ways. And with our recent Excavator announcement, we really beefed up our machine learning and our search capabilities in real-time analytics, because people were demanding that. People basically wanted to be able to have a big data database that could do as well as or better than a niche log file analysis offering that you might go out and get for something else. It's being driven by, I'd say, three areas. One is governance. I wouldn't underestimate how people are rediscovering how important data governance is as they're opening up their data. And the machine learning, the log file analysis, is absolutely critical to that. It's a bedrock of it. Number two, I would say what they're looking for is that early indication of breaking trends. People want, I don't want to call it predictive, but people want to be able to ingest that data and early on be able to see, how can I analyze within my application, within my algorithm, what is occurring? And a lot of that feedstock is going to be through your log analytics. So you see that as viable, this machine getting smarter is obvious, right?
Pretty obvious trend. How deployed is that in the customer's mind? Operationally speaking, separating R&D, kind of futuristic stuff, from operational. And if you had to peg an inning or status of customer adoption, true machine learning, true machine scale, true collective intelligence, is it like the game hasn't even started yet? Is it the first inning? I mean, it seems early to us, what's your take? Operationally, are customers operationalizing some of these things? Well, I think you're going to see it take two paths. Some companies are going to be able to grab the levers directly and build that out. I think that's going to be for about a quarter of the fast-moving companies out there. I think for the majority, and this is just my viewpoint, you're actually going to see that learning capability be pulled into more traditional life cycle management tools, analytic tools, business applications. And we're seeing that internally inside of HP. Our technologies are being pulled into the development tools, the business operation tools that we're offering, even the digital marketing tools. So machine learning is going to be happening under the covers of a lot of applications and services that people buy without them necessarily having to worry about it. I think that is going to be the biggest path of consumption that's going to be occurring. And that is one reason we're focusing on on-demand so strategically, because it's going to shift to a consumption of APIs and services. We've been bullish on it since day one. We have over 70 big data APIs. They're all in public beta. All the users can go to havenondemand.com and use them. And that ability to consume APIs today and services tomorrow, we have Vertica OnDemand right now, and we're about to bring out some more later this year. That is how it's going to become much more palatable and easier, so you don't have to have a PhD to take advantage of this machine learning.
So you guys are doing a lot. So it's safe to say you're building on top of the Vertica core engine. So that's the on-demand, the Amazon piece. This is new stuff on top of it. And we've already, as we've announced previously, got Distributed R. Across the street, a lot of people are talking up R. We're one of the first to take an open source project, in fact, Distributed R, that allows you to use all your R tooling. And as an option, you can hook it right up to use the Vertica engine for pre-coding and in-database scoring and get about a 5X performance improvement. So it's about taking that Vertica core, which is absolutely the center of our big data efforts, and now building out and around that. So let's put a plug in. You've got a session on Thursday, that's tomorrow at 1:15 Eastern, here at the Javits at Strata Hadoop at 1:15, with one of your customers, Yellow Pages. And now, before you get into discussing what you're going to talk about, I want you to add some color to this notion that was kicked around a few years ago, called the data hub. And Impala was really the poster child for that. But we've seen other approaches. I want to hoard the data. I want to provide value. At that time, it looked like a competitive strategy. Hey, let's become the hub. Let's get the data in there. We'll provide some great tooling and provide real time and all the benefits there. That kind of was squashed by SQL. So SQL, again, validated this week, 100% by the community, that is the access method, the lingua franca. There are other methods, but that's the primary method. That's what you guys have pioneered. So you're going to do a benchmark with SQL on Hadoop, with Vertica, against the data hub. So can you talk about what's going to be in that session and what you're talking about? Well, the most vibrant markets are co-opetition, right? Where you get a network effect of companies working together. We've seen this again and again.
You don't want to be in a market of one if you're in high tech. That's not a good place to be, in my observation. Talk to BlackBerry, for example. So this is clearly a community effort, and we endorse it left and right, and we partner with all the distros. That being said, our aim is just to add value where we can bring unique value that is wanted by the mainstream computing-consuming companies. And one thing that was a no-brainer was to take our SQL capability and bring that to Hadoop. And what you're going to see is not us, because obviously every vendor's bullish on their own stuff. Okay, if you're not, then there's a real problem. You're going to see one of our marquee customers that has one of the larger deployments, Yellow Pages, one of the largest online commerce companies in the United States, if not the world. And Bill Thyssen ran a benchmark, a very public one, in which, in this case, Cloudera and HP were invited to participate. It was not for the weak at heart, because he reported out how everybody was doing and you got the real deal. What he's going to report out is the drastic difference he saw between running our SQL for Hadoop on top of a Cloudera deployment versus, in this case, Impala. And what were the results? Can you share? Well, I'd rather he speak for himself, but I probably wouldn't be espousing about it otherwise. So it shows that, since you have a session, you probably won. It's safe to say you won. We were very content with the outcome at this point. Hey, it's a leapfrog game, you know, and it's real hard to stay on top, but what he did was run real-world tests, not just for throughput performance. He looked at concurrency, he looked at uptime. There were multiple iterations where the vendors were able to come in, support, adjust. So those anomalies were weeded out, and this ran over several months. And he told everybody, if you participate, I'm going to make this public. And we said, we're good with that.
And if you check out the 1:15 session, I'll let him speak for himself, but we're really, really proud, because I think this gives us a proof point to say, well, it's October. In October 2015, we have the best-performing SQL for Hadoop on the planet. Jeff, final comment, we're on break here on time. So what's the bumper sticker for this event for the practitioners out there, the CXOs? What is the bumper sticker for this show, this week, Big Data Week? I would say now you're seeing Hadoop enter the mainstream. And I think, as great as the innovation is here, what I'm sensing over there is example after example of companies that are now starting to bring Hadoop into real-world production. If you want to say we're moving through the hype into the hardcore challenges and upside of deployments, that's what I feel is in the air. I think it is crystal clear this is here to stay. It's also crystal clear that this is a melding. This is not a war. This is not one set of companies going away as a new set comes. What you're starting to see is some of the most established names over there, my friends at Oracle, my friends at Teradata, my friends at IBM, my friends at Dell, and then companies that you've never heard of before, coming together. And that tells me that you're bringing a maturation and a critical mass to it. And there's demand too, because the conversation has changed. It's about profit, which means you have good solutions and outcomes, not speeds and feeds. To me, I think that's the big takeaway. That point is great stuff. So HP here inside theCUBE talking big data, great success with Vertica, again the foundation. They've been right for many years, validated here this week, 100%, on the columnar store and SQL on Hadoop. Congratulations, congratulations to the Vertica team and HP Software. We'll be right back with more after this short break. theCUBE, live in New York City for Big Data NYC, Strata Hadoop coverage. We'll be right back.