It's theCUBE, covering HPE Big Data Conference 2016. Now, here are your hosts, Dave Vellante and Paul Gillin. Welcome back to Boston, everybody. This is day two of the Hewlett Packard Enterprise Big Data Conference. This is theCUBE, the worldwide leader in live tech coverage. We're here with Chris Selland, who's the Vice President of Business Development at Hewlett Packard Enterprise Big Data Platform. Chris, year four, great to see you again. Yes, it's always a pleasure. It's great to see you guys again here too. You've been with us from the beginning and we really appreciate that. You're a core part of this event. Well, thanks for having us. I mean, you were instrumental in launching this event and I see you've dramatically upped your outside speakers. You know, we started, I was one of them in the first year and you got to it. Well, yeah, but you know, Billy Beane was pretty good. Billy Beane was great, actually. That was the first year. The Moneyball guy. Yeah, but really, you've had a great lineup throughout the years. Some terrific content today. We heard from folks from Uber. You had a great customer panel. I mean, yesterday Phil Black was fantastic. Yes. And so, yeah, it's all good. I mean, but you've preserved the roots of the conference, the customer and the practitioner roots. Well, honestly, and as I just said on stage, four years ago, a little over four years ago, Colin and I were walking through the hallway and we were talking about all the cool things our customers were doing. You know, big data was becoming a hot topic, but it was really about the things that our customers were actually doing with our products and doing with the data. And we said, you know, we really need to get the customers talking to each other. So let's have a user group meeting. And honestly, the genesis of this event, over four years ago, was that this was going to be basically the Vertica user group.
And then it started kind of gathering moss, and part of it was, you know, big data was a hot topic, and I talked to some of the folks at HP Corporate Marketing. And they said, oh, you're running a conference. I'm like, okay, it's a conference. And then, you know, it kind of went from there. But the core of it is and remains and always has been: what are the customers doing? And getting them communicating and working with each other. And of course we've got our partner ecosystem here as well. We've got, you know, folks like yourself. We've extended; we didn't do press and analysts, except for you guys, the first year. We have all of that also. So it's really grown, but it's been great to see what it's become. Well, if you think about the really good conferences in the business, they retain those roots. I mean, VMware, certainly EMC World, conferences like, you know, the ServiceNow user conference, the heart of it is user-driven content. And that's much of what you see here. I was joking, not joking, commenting yesterday in the food line: somebody was talking about how the problem you had with that algorithm is you needed to, you know, change the way in which you applied it. And I turned around and said, well, can you explain that in layperson's terms? And those are the kinds of conversations that you're hearing in the hallways, right? Absolutely, absolutely. And it's been that way all along. You know, year one, we actually had a reception with some customers and Billy Beane. I remember Billy Beane and Chris Wegrzyn, who was the chief data guy from the Obama organization, were geeking out about data, talking, and I mean talking for hours. And it's funny too, they were actually talking about Premier League soccer, which is something that I'm really interested in.
They weren't even talking baseball, they weren't talking politics, but just, you know, the cool conversations customers and others are having with each other and with us. Obviously we participate, but we try to stay out of the way too, because this is not about marketing fluff. This is really about, you know, what's really going on out there and what people are doing. You know, I've been saying for a long time, I'm sure I've said it on theCUBE, I've been on so many times, big data has been kind of crossing the chasm. It's gone from a technology topic. You actually don't hear the term big data as much anymore. Now it's more about solutions and applications, what companies are doing with it. And it doesn't mean IT is not in the room, they're still very much in the room, but it's so much become a business topic now and, you know, the terms are changing, the drivers are changing and we're seeing that. We've been seeing that over the years. And this morning's customer panel was really illuminating. I mean, you had some great examples of how big data can actually be misused, and what you have to get at is the why. Steven John had the best quote of the morning. He said it's easy to get the what, it's hard to get the why. And you had the example of a Mercedes dealer having a big local advertising campaign, not realizing that a lot of its customers were actually flying in to that dealer because they did something that no other dealer did, and they had no marketing that was aimed at those people. And they weren't marketing that, but the customers were also finding it on their own. I think it was a dealer, that was what Paul Harrison from Simpli.fi was talking about, the dealer in Washington DC had a certain type of car that other dealerships didn't have. They weren't talking about it, they weren't advertising it, but customers were responding to the campaign based on information that they had found on their own.
So, that's what we have to remember too. None of this happens in a vacuum, right? It's that we're doing things, we're looking at data, but it's the, what is it, the Heisenberg principle, that as you observe, it changes. And customers are aware and participating as well. So, that example of somebody flying from Fort Worth to DC to look at the car, you ask why, because that's what you want to know, to the point of the panel. Yeah, and there was also a great conversation about Brexit today, and in another case, here we have a current event where we thought we knew everything, right? All the polls had it nailed, and they got it wrong, and they got it wrong because they made assumptions. And I think it was Eric Dixon from BlueLabs who was saying you have to ask the questions about where that data came from. How was it collected? What questions did you ask? We ignore that kind of stuff. Well, do we know? I mean, he was saying, you've got to look at biases in the data, it's garbage in, garbage out. Is that different than when we were doing SPSS in college? Right, well, Brexit was a good example, and there are some suggesting that about even this election season too, right? I know we're not talking politics, but what are the polls saying? We've got lakes and lakes and lakes full of polling data, but is more of it telling us the right answer? And we're still, you know, not sure. I mean, clearly in Brexit, the polls were wrong. The polls were out-and-out wrong. There were plenty of polls, they talked to plenty of people. They weren't right. But yet some got it right. I think there was some talk that a hedge fund nailed it. But they weren't necessarily sharing that data publicly. But it's true, you woke up the next morning and you were like, whoa, that's not what I expected. It's not what they were reporting on CNBC. It's not what any of the major news outlets reported.
As you mentioned, the hedge fund, as Eric said, some of the hedge funds, the private equity funds, did know what was going to happen. Of course, they weren't telling anybody because there was a lot of money to be made in having that answer. You know, why were they thinking of things differently? And I think of all the research that is out there online right now that we just take at face value. We don't ask questions about how these questions were asked. What was the survey base? You know, who are these people? Right, right, exactly. Well, I think it goes to the theme of, you know, hedge funds aren't necessarily known for sharing their techniques in the public domain or putting them in the open source community, right? But, you know, it also goes more seriously to learning, you know, learning from the data. And, you know, Robert was talking this morning about why, when we say machine learning, we like to say applied machine learning. Because it's, you know, what are we learning from it? But what are we also applying as people? There's so much in the news right now, for instance, about, you know, robots, and you can apply that to big data: you're going to replace all these jobs and there's going to be no jobs because it's all going to be taken over by machines. But that's not necessarily true. It's that we can use what we learn from the data, from the machine. Believe me, we're not advocates of throwing away data. We do think more data is good. At the same time, just because you have a data lake or many data lakes full of more data, that doesn't necessarily imply you've got the answer. The answer is something that comes along with human participation, learning, you know, refining, modeling. And again, so much of what we're doing is also, you know, building modeling capabilities into our products right now. Again, it's customer driven, right? It's what customers told us they wanted to do.
It's not just about more data, faster data, but it's about running models on that data and seeing what affects what, you know, what impacts what. So that's really where the learning comes in. And people are still very much involved in that. Trial and error. As you were talking, one of your panelists was talking today about A/B testing and how much A/B testing is wrong. He said a third of it is just bad. It goes wrong. A third of it validates an assumption. And a third of it finds a completely different variable that they hadn't even expected. And that's what strikes gold. Exactly, like penicillin. Yeah. So, precisely, it's what you discovered that you weren't necessarily looking for. But you did need to at least know to look in that place. Or how did you start to realize there was correlation between things, you know, factors, and it might've been a time-based correlation. It could be based on another thing. I thought the example in the other panel, it's like, oh gee, you know, when we sell more ice cream, more people drown. It's like, you could draw that correlation if you just look at data, sure. But you realize that, no, we sell more ice cream because it's summertime, and more people swim in the summertime. But those kinds of things, right? There's the causation effect and, you know, what impacts what. And when this variable changes, what does it do to that? And not just variables, behavior. Because we're talking about people here. You know, let's get out of machine terms and let's talk human terms. And when we hear the stuff that our customers are doing, I mean, they're so far ahead of us in terms of, you know, how are you actually using it? Because that's the thing, you know, Vertica was always kind of an early adopter product. We had a lot of customers who would buy the product, love the product, but when we actually said, what are you doing with it? They didn't necessarily want to share, you know?
Because they're doing some pretty cool stuff and they're taking the technology in directions we never thought of. Well, we had the logic bot guys on yesterday. Oh, that was crazy. I was like, what's your secret sauce? He's like, well, I'm not going to tell you. Right. You know, it's some cool stuff. Everything is a trade secret. So I want to talk about the market a little bit and then get into the ecosystem. So, SiliconANGLE picked up on a comment that I made yesterday, which was that Vertica is sort of the cure for the big sucking sound of Hadoop. And what we were talking about yesterday, Paul and I, was that ROI in the early days of Hadoop was sort of reduction of investment in my existing enterprise data warehouse. And Vertica, as Colin said yesterday, sort of always sat between the sort of open source Wild West and the, you know, traditional snake-swallowing-a-basketball enterprise data warehouse. And you added value to that whole scenario. So take us through sort of what you see in the marketplace, what's happening. A lot of tumultuous times for some of the Hadoop vendors. Nobody seems to be making a lot of money there. Some of the high flyers have been knocked down a bit. What do you see and what's happening with Vertica? Yeah, well, part of it goes back to what I said earlier. I do think the market is now moving to solutions and what organizations are doing with the data. You know, it's become more of a solutions market, and the customer stories you hear now are around what we're doing, not just how much data we can load, how fast we can query it, how much we can store, how much it costs to store it. I mean, look, we think that Hadoop is a great technology for storing very large amounts of data very inexpensively. But there's also, you know, a price-performance trade-off and curve in this industry, with different technologies for different places. I mean, as I said earlier, we are advocates of keeping all of your data.
But, you know, we've had a couple of customers, for instance, in the predictive maintenance industry, you know, GE, who's moving in, I was actually reading something based on that jet engine that blew out the other day. GE, who's moving in, they're moving next door, right? You hear these stories about jet engines giving off a terabyte of data an hour. And I always say, I want that data to be really boring data on any flight of mine, right? Yeah, really. But at the same time, you want to look at it longitudinally over time. First of all, yes, you want to see in the near term: are there anomalies or things I need to look at? That's the data you need, like, right there, real-time access to what's going on right now. So if there's an issue, you know, it's an airplane engine or, you know, my website, my e-commerce site or what have you, or, you know, just the traffic into my store, volumes, all these things you want to know right now, you know, cost isn't the main driving factor there. It's a performance thing. At the same time, do you want to keep all of it? Yes, you actually want to keep all of it. I heard a story the other day that one of the other, actually, and I won't name them because I heard it from a third party, but one of the other airline manufacturers, every time a flight lands safely, they have to throw away all the data because they can't afford to keep it. Or, you know, at least this was in the past. That's kind of crazy because then you're losing some insight. Now, are you going to query that in real time? Do you need real-time performance? No. And that's a lot of data. You know, a terabyte an hour per engine, I mean, you're talking about a lot of stuff to save. You don't want to throw it away. But, you know, as you kind of go up and down the curve, performance changes. So, we've kind of really architected what we do to really occupy, you know, a big, big chunk of that curve.
But, you know, we're not necessarily trying to be the cheapest. We certainly, for the types of things we do, Chuck Bear was talking about use cases, try to be the highest performance, but you realize there's an ecosystem, there are different technologies for different problems as well. So. Well, it's interesting, and you're talking about, you know, keeping it forever. Colin talked about it yesterday. Your CTO talked about not moving the data. Combinations, which I want to talk about, is, you know, this new capability that allows you to combine different API functions without moving the data. The flip side of that is, data used to be, you remember, when you were an analyst, data was a liability, right? The general counsel was like, get rid of the data. After seven years, get rid of the data. Anything that's work in process, get rid of it, because there might be a smoking gun. Still is. And still is, right? So, when you guys are talking about keeping it forever, it's going to give the general counsel agita. So, what are those conversations like? Well, it's not going to give the general counsel agita if you do a good job securing it. Robert also talked about security being one of the biggest use cases, actually, for big data these days, right? Security, with compliance being another one. So, if you're protecting it effectively, it gives less agita. But, yeah, it's still not an argument for throwing it away. Because, by the way, as they say, dumpster diving is probably one of the, throwing it away doesn't mean it's gone forever. It doesn't mean the bad guys can't still get at it either. So, keep it, learn from it, use it, but extract the nuggets. And sort of understand, we like to talk a lot about hot, warm, cold data, kind of where price is more important, where performance is more important. And the problems you're trying to solve, coming back to the customer stories, because that's what people really want to talk about right now.
They don't want to talk about query speeds and data loading and so on and so forth. And, you know, it's like parts of the Hadoop stack. I mean, we see broad, widespread use of HDFS. But at the same time, we see MapReduce being ripped out just about everywhere we're going. And currently being replaced by Spark, or at least organizations are attempting to replace it with Spark in many cases. But, you know, there's a progression. And I mean, a year from now, there'll be new stuff. You know, I've talked to, you know, Merv Adrian at Gartner a lot about this. He likes to talk about, you know, flavor of the month, flavor of the year. There's a lot of new stuff, and it's good. It's good that we can use all of this technology. But, you know, it both replaces and complements. And some of it sees, you know, long-term utilization. Other stuff goes away. But I do think, you know, you mentioned earlier that the market's adjusting; this is becoming much more of a business and also financially driven thing right now. It's not just technology for technology's sake. I want to ask you about that. I think it was Gartner that said last year that big data was mostly being used for operational cost reduction purposes. That was really where the sweet spot was. Not so much strategic, you know, new ideas, developing new markets. Are you seeing that change at all? Well, yeah, but I think that's where the insight stuff comes in too, right? That was like that Mercedes example we talked about earlier: you know, what am I going to do to make it more likely that a customer's going to come into my store, or what do I do to make it more likely that when they're in the store they're actually going to buy something? That's less about having a massive volume of data and more about the insights that we gain from it. And that's the turning point. Now, do we still need to keep everything for compliance and, you know, other types of purposes? Yes, we do, but it's really the learning.
And again, that's where this whole concept of applied machine learning and a lot of what we're talking about here at the conference comes into play as well. And also, you know, moving the analytics to the data, that's the other thing, because I think it was Yop who talked about how hard predictive is, and he's right. But one of the other things that's so hard about predictive is that most of the predictive products and capabilities out in the market today, they can't work, and this is sort of the dirty little secret of the industry, they can't work on large volumes of data. You have to sample, and you have to reduce and sample and pull into these platforms. And, you know, so we've had predictive analytics products out in the market for a long time, but they can't necessarily run across these massive volumes of data, and that's where they're most valuable. So we're doing a lot of work to try to embed predictive machine learning capabilities in our core products, so you can actually run the models across large volumes of data. Sure, do you want to run a model on the engine while it's running to see if there are any anomalies, it's running hot, something's wrong, there's high friction? Yeah, of course, but at the same time, you also want to be able to take those models and run them on these massive volumes of data. And so that's something that I think you're going to see emerging as well: doing predictive modeling on very, very large volumes of data to find those insights that are the key things that sort of jump out. So. I remember the first year of this conference, I got into a little debate with Curt Monash, which is never a good idea. But my contention was that sampling is going to be dead. And I don't know if that's occurred yet, but it's happening, in my view anyway. But I want to talk about the stack and the ecosystem. Steve yesterday from AmeriPride. Yes. Was basically talking about the stack that they're building.
We're taking a little stuff from the cloud stack. We're going to take some Vertica. We're going to take some Tableau, and they're basically building their own big data pipeline. Yes. And the ecosystem is doing that as well. And that's something that, you know, you're watching things growing and it's building momentum. Talk about what customers are doing, but also the ecosystem; they're sort of mirroring each other. That's absolutely what it takes. I mean, with big data, there's absolutely an ecosystem. That's one of the things that we've, you know, really made sort of a key part of what we're doing as well here too. Because I actually remember being at Strata three years ago, maybe it was three years ago, three, four years ago, you probably know the timing. And we had our booth and we were, you know, we were on the floor, and right next to us on one side, EMC announced they were going into the Hadoop distribution business. I think they announced Pivotal. On the other side, Intel announced they were going into the Hadoop business. They were kind of doing their own flavors, right? Well, yeah, but the thing was, everybody was going to do, we're going to have ours and ours and ours. And we said, you know what? Let's not have ours. We remember asking you, are you guys going to do it? We want to do what we're doing well. And let's work with all of them. And that's really what we've tried to do. Now, "all" can be somewhat aspirational because we're really driven by, you know, what our customers tell us are the most important ones to them now, but take cloud, for instance. I mean, we've been on the AWS cloud for a long time now. We've had Vertica on AWS. But when we launched Haven OnDemand, you know, we launched it on Azure. And at the same time, we were definitely seeing demand from our Vertica base to be on Azure. So we're going to be on Azure. And, you know, we had a press conference yesterday and questions about other clouds.
And yeah, we're certainly looking at and working with other clouds, but we see, you know, we need to work with the entire cloud ecosystem. We need to work with the entire BI and visualization ecosystem, ETL, data preparation, data loading. We need to work with that whole ecosystem, not just the platform ecosystem. And of course, you know, things like Hadoop and open source. And you heard the work that's going on with Spark, with Kafka, with Hadoop. So, you know, working with that entire ecosystem is a key part. And yes, that's my day job, to work with that ecosystem. But it also ties into, you know, what our customers are telling us and what's important to them. So, I mean, it's got to be simple and transparent to developers. So when young Johns was speaking this morning, talking about the combinations and 70 APIs, and because we, for our own purposes, we're playing around with machine learning and AI, Watson. Yeah. And it's somewhat- We've got CrowdChat. But so I'm listening to him. So I hit the Haven OnDemand site, saw these things such as video, such as text analytics. I'm sending it to our development team. I said, check it out. You know, see how easy it is. Because we've struggled, you know, as a developer integrating some of that stuff. It's not so easy. I'm hoping that Haven OnDemand is easier. And it shouldn't matter what cloud it's on. You showed a video this morning; it certainly made it look easy. Is there going to be some sort of proof of concept or something that is easily accessible by developers where they can take it for a test drive? I mean, we build, you know, like in our marketplace, we build things called quick starts that are sort of demo applications. And we've got Haven OnDemand, we've also got the Vertica marketplace where you can go see these examples. We've got some of the kind of, and a lot of them tend to be smaller companies, developers building on our products.
But we're really also giving it out to customers to say, you know, you build it and bring it back if you choose to share it, and learn from other customers. And by the way, come to our event and talk to other customers as well. So, you know, Eric from BlueLabs said something, and I don't think he got to it on stage today, but I was talking to him privately yesterday. We were talking about the Brexit polling thing. He said, it's not looking at the data about the customers, in his case voters, that's important. It's not looking at the data, but it's actually observing what they do. And so that's where some of the, you know, like if they're on your website, that's clickstream analytics. But if they're out walking into a retail store, that's video analytics, audio analytics, you know, mobile analytics, and being able to sort of combine this stuff and put it together, as some people like to say, mash it up. Is it simple? There's a lot of science behind it, but there's also a lot of art behind it too. And that's where the people come in. And we're trying to give the best-of-breed tools to the community, the developer community, and that also means large corporate developers, to let them do it in the way that they want to do it. And it's getting a lot less expensive. Yeah. Pretty much anybody can try it. Absolutely. And just the whole notion of bringing analytics to the data. I mean, that is the epiphany of Hadoop, right? Just leave the data where it is. Right. Just don't throw it away. Right. Because it's so cheap to store it that there's no reason to get rid of it. So, from that perspective, Hadoop's been a great thing. That's the real game changer in that sort of bit flip. But, all right, we're out of time, but thanks very much for coming on. We'll give you the last word. As always. Put the bumper sticker on 2016 BDC. The bumper sticker on 2016 BDC. I liked Chuck Bear's t-shirt this morning. What was his? Seizing is believing. Seize the data.
So, I don't know if that's the last word, but I liked that one. Hashtag, seize the data. This is theCUBE, we're live from Boston, Massachusetts. Chris, thanks very much. Thanks, Dave, thanks, Paul. All right, keep it right there, everybody. We'll be back right after this short break.