Live from Midtown Manhattan, it's theCUBE, covering Big Data NYC. A SiliconANGLE Wikibon production, made possible by Hortonworks: we do Hadoop. And WANdisco: Hadoop made invincible. And now, your co-hosts, John Furrier and Dave Vellante.

OK, welcome back, everyone, to Big Data NYC. We're covering all the action at Strata Conference + Hadoop World. There's a lot of action. This is where, in New York City, business and tech come together. I'm John Furrier with Dave Vellante. This is theCUBE. Our next guest is Edd Dumbill, a CUBE alumnus, an amazing person, and a big part of the Strata Conference as co-chair with Alistair Croll. Welcome back. He's also part of Silicon Valley Data Science. Congratulations on the huge success of Strata. It's always great to have you on theCUBE. How does it feel?

Well, thanks very much. I feel absolutely delighted. We have a packed house; it's kind of difficult to walk around in there. It's been a great show with a lot of energy. And I think we've also told a good story, right? It's full of vendors, and you can always hear what the vendors are trying to say, but above and beyond that, I think everyone's reaching for a story: reaching to make big data practical and to tell people how to do things in a meaningful, problem-first kind of way.

We've had a lot of conversations here on theCUBE with folks talking about the show being packed. But the bumper sticker seems to be "coming of age," right? The maturation. If you look back at how far it's come since you've been involved, it's been such a short period. We're talking about production-scale deployments now. As Gartner says, it's no longer what or why; it's who you're going to do it with and what the platforms are. What core things have you learned from the show? Looking at it now, what's the nugget that came out of Strata Conference this year?
Well, I think "coming of age" is certainly a great threshold. There are several periods in an industry's life, and we're at the thousand-flowers-blooming stage, with everybody pouring money into this and trying to solve a bunch of problems. One thing that's coming across very clearly is that the whole is more than the sum of the parts. We're really reaching for big data platforms inside organizations. And looking forward, it's not just Hadoop: Hadoop itself is morphing into something more like Linux, an ecosystem on which you can run a bunch of stuff with Hadoop 2. What organizations really need is a solid platform to build their enterprise data apps against. So my takeaway is that it's great to see things consolidate, where businesses aren't just saying, "Look at our tech, isn't it great?" but "Look how we can solve your problems." And the second thing, looking forward, is that we're going to think a lot more about enterprise data applications than we were when we were mostly thinking about making BI cheaper.

That's ultimately the Holy Grail. And the "enterprise ready" theme has been a big one. But it's interesting with ClearStory. I was talking to Sharmila this morning. She brings in a whole other dimension that Platfora was teasing out, which is making real-time BI, business intelligence, work like a search engine. Sharmila takes it to a whole other level: "Hey, user. iPhone, iPad. I'm not a data scientist. I'm the Billy Beane of the organization. I could be anybody."

Yeah, I think it's great that people are actually paying attention to that end of it. For a long time, Tableau has been the only real player in that field.
One thing I find very interesting about ClearStory is that, because it's a cloud service, it can take some of the market where, say, QlikView used to be: that departmental area where the spender may be the CIO, or it comes out of a product manager's discretionary spend. So it's opening up a class of big data product, cloud-based self-service analytics, that has been quite young so far, and I'm excited to see the competition.

It's AC, "after Cutting," as I'll now be able to say. I just put that together.

That's good, because all this stuff was BC. Despite some of its incredible success, a lot of it predates Hadoop.

Yeah, and one of the interesting things about ClearStory is that, as far as I recall, a bunch of the stuff under the covers is actually Spark and Shark, the Berkeley analytics stack. That's pretty interesting, because you've got them going into production based on it, and you've now got the funding of Databricks, which is also partnering with Cloudera. So I think we'll see Hadoop there as a substrate, with another bunch of tech on top, and it's wonderful to see attention being paid to the user.

So that's a low-latency play, right? That's the value of Spark, is that right?

Right, that's the in-memory processing and the SQL querying. Basically, what in-memory gets you is responsive analytics, and that's what the Berkeley stack is bringing on top of HDFS.

Talk about real time. We've had conversations, but not debates; we didn't really get into debating Storm versus, say, Spark. Real time is huge in machine learning and graph processing. These are the cutting-edge computer science conversations happening right now, certainly on the graph side and in machine learning. But now, with real-time streaming, do you go with Storm? Is it Storm or Spark? What's your take on that?
Is it still simply two horses on the track? Is it too early?

Well, I think they're actually two different things. There are two aspects to real time. One is getting data into the system in real time. The other is getting answers to queries in interactive time. In this case, Storm addresses the first and Spark the second. But this is a really important point. A couple of years ago you'd hear, "Oh, batch is good enough for anything," right? But the really important thing in solving data problems is the person. If you have to wait ten minutes for each query to come back, the person goes to sleep, loses their flow, and isn't empowered. So the really important thing about real-time and in-memory database technology is that it lets the analyst explore interactively, rather than going off to sleep and losing their thread while the query runs.

Yeah, real time means real time, like now. We talked with Amr Awadallah earlier about the personalization aspect of it and how the discovery side might not be search-query-like, but will all be rolled into the app. We were talking about recommendation engines: the first instance of big data was deciding what to put in front of the user, but if you take that same premise, that same concept, discovery could be whatever that edge point looks like.

I think one of the really important concepts in computing going forward is going to be personal context, and you see that coming out on mobile devices. Things like Google Now are really early foreshadowings of this: given that I know a certain amount about what you're doing, I can do even more intelligent things. If I'm your friend, John, I know what you drink, and I'm only going to suggest we go to bars that specialize in it. Why on earth do we have to do so much work to make up for what is ultimately a lack of effort on the part of the programmer?
This is one of my big bugbears. So, context and intelligent search: I think it's wonderful that this is becoming a default expectation and interface.

Great. So talk about what's next for you and what's going on with the data science operation. Your day job is about data science, so give us the update. What's happening now? What are you working on? What are some of the cool projects you can talk about?

Obviously we can't give you exact customer names at this point, but we're rolling forward really happily. We're already maxed out with clients, which is a great place to be: a bunch of companies whose names you'd have heard of. The theme, I suppose, is that they're probably already up and running with some degree of data-driven business, but they're looking to make a business transformation toward something a lot richer, like adding context, like turning information into retail, a lot of things like this. So we're attacking the problem at several levels. We believe that to do good data work you need both data scientists and data engineers on the same team, and that's how we deploy. It's also often a matter of data architecture as much as smart algorithms, because you can't do data science on the data if you can't get the data in the first place.

Yeah, we love having you on. Hilary Mason was just on. She's cut from the same cloth, and we love that. We're data geeks, and we're always early on this stuff, so people think we're crazy when we start talking about it. You're one of those guys who has a good nose for it, and you're very practical. So I want to get your opinion on the next topic: gamification. Gamification is basically algorithmic manipulation of data in real time, whether it's pure gaming, or things like voting, or scoring, where rankings come in.
This is stuff that's now on the real front end of really cool user experience. You mentioned that if I'm your friend, you know what I like to drink; that's essentially gamification going on in software, in context. And Hilary mentioned the app where you tap and it gives you the precise forecast for where you're standing. So where is gamification going, in your mind? What have you found? Can you share your opinion on that?

We've certainly heard a lot about gamification in recent years. As far back as when we were in New York for Web 2.0 Expo, which was, what, four years ago, it was a big thing, and it was certainly a big thing in the user experience world: how to get people engaged. And like a lot of topics, I think it maybe went too far and became a buzzword. One became slightly cynical about turning everything into a game. What I think it is, in its essence, is what I was saying before: respecting the humanity of your user. It's about great software design in the way that, say, Apple is about great product design. There's something that feels great about it. To me, gamification, and trends that share that aspect, recognize that there's a human there. We interact in many ways, and rewards, the enjoyability of the experience, comparing ourselves to our peers, all things like that, make it nicer for us. It also encourages us to volunteer more information, which is more context that the big data machine can then use.

Two years ago, you talked specifically about user experience. I remember we were talking about big data; I think it was last year or the year before, which feels like ten years ago. But gamification for making money seems to have been like, "Oh, I'm in a loyalty program," or "You get a star or a badge." You're talking about the user experience.
Using gamification, game mechanics, to add value to the user experience, not so much to sell them something, right? That's what you're getting at?

Yeah. I tend to believe that with all these things, people find out they work, then they go overboard, and then they pull back. I'm sympathetic to it: fundamentally, what has driven gamification is that software engineers discovered human nature all over again, right? They discovered the things that we enjoy and went, "Ooh, that way." But now there's fatigue; you can't play 9,000 games, right? Still, you can understand that things should be enjoyable to use. You should feel a sense of accomplishment as you work through things. These are things great teachers have understood for years.

Yeah, Dave and I were just talking yesterday about gamification fatigue. One of the things we talked about is that there's a line between creepy and awesome, right? At some point it's like, whoa, that's creepy. There's almost a cognitive rejection reaction: whoa. But it's all public data, whether you're on Twitter or whatnot. So how do you manage it? Maybe it's just a common-sense approach, but is it just transparency? How do you advise startups out there, people doing really cool work with data? It's still early, so you're going to have to break some glass and possibly step on some landmines, hopefully without it killing you. What's your advice to companies on how to manage it? Because once you cross over that creepy line, you could really go into a death spiral.

You know, there was a great talk at Ignite a couple of nights ago. Go check it out if you haven't seen it; it was called "Algorithms of Pain," and hopefully the talks will be posted on YouTube pretty soon.
The speaker was very brave and talked about her breakup and the problem that, first of all, Facebook kept reminding her of it. But then she would be connected in various ways to her ex, and that sort of fed the adverts she got served on Google Plus and in her regular search results. She couldn't get rid of it. So there's definitely this situation where just because you can implement something, it doesn't mean you understand all of its consequences. And I don't think you're going to get it right any more than anybody else. To go back to the team thing, I think product design requires cross-functional teams of people with different abilities and different insights working together. That gives your team the best chance of getting that balance right. It's amazing: as humans, we're really good at intuiting this, right? Your assistant knows exactly what you want and knows when they're about to overstep the bounds in a way that's going to piss you off. But the computer doesn't know that. It's up to us as programmers to put those limits in.

Okay, Edd, we've got our next guest coming in. Thanks for coming in; I really appreciate it. I know you're super busy with the conference: a lot of activities, hosting, schmoozing, wheeling and dealing, having some fun. I want you to end the segment with a quick bumper sticker for the conference, something I'd put on the back of my car. What is the bumper sticker for Strata Conference this year?

Great progress. A long way to go.

There it is. Edd Dumbill, co-chair of Strata Conference, whose day job at Silicon Valley Data Science is changing the way companies do business. Again, this is a great theme, and we always get good content: user experience, design, thinking about the human impact. This is theCUBE. We'll be right back with our next guest after this short break. Big Data NYC, right after this short break.