Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic and Teradata. With hosts Dave Vellante and Jeff Kelly.
Welcome back to Big Data NYC, everybody. Sri Ambati is here. He's the CEO and founder of 0xdata, doing some really interesting things with machine learning, right at the heart of what's going on with Hadoop. Sri, welcome to theCUBE. Thanks for coming on.
Thank you, thanks for having me.
So you're just coming over from the Javits Center. We're of course here live at the New York Hilton in Times Square. What's it like at the Javits Center? What's the buzz like this year at Hadoop World?
It's like Times Square in the Javits Center: busy, a lot of customers, a lot of adoption. You're seeing maturity in the big data space. You're seeing adoption around Hadoop and Spark, and a lot of "show me the money": I need ROI from my investments in Hadoop and big data. People are coming to different vendors and asking what they can do with the data they've stored in HDFS. They're asking lots of questions about what kind of analysis they can do, what kind of insights they can pull out of this data, and how they can transform their business using those insights. I think the stage has been set very well for the next phase.
So the big themes there are really about making money, the ROI, and the Internet of Things must be a hot topic. Everybody's kind of crazy about that; it sort of puts the carrot out there. But let's talk about that ROI equation. Jeff Kelly has done some research showing that, on average, people deploying big data projects get back only 55 cents for every dollar they spend. That's the average, the mean. If you look at the guys at the leading edge, they're getting three or four dollars back. So what are the guys who are getting ROI doing differently, and how do you guys fit?
One of the things that was very important for us, even coming from a technology center like Silicon Valley, which builds lots of cool stuff, is that when you come to the real world, you want to embed yourself in the real, day-to-day life of a business. Businesses are now seeing new data sources, new sensors producing data everywhere. They want to monetize the impressions that are happening. They want to know when people are coming to their website, and they want to influence customer decisions while those customers are on the website. So real time is taking center stage. People want interactive analysis of their data. They want to connect to dashboards like Tableau, or to their traditional data sources. So the question becomes, post-installation, what do I do? Now that I'm beyond Hello World, I want to take my data quickly and analyze how my customers are behaving. How do I understand the end-to-end journeys of my customers? How do I do better things for them? And one of the interesting things about that is that you have very high-dimensional data. Big data is not just in the rows. You're seeing very high-dimensional data come from different data sources, unstructured and structured, and you want to be able to combine and fuse these data sources and still come up with the nuggets that reveal a pattern in your data. So machine learning is at the heart of this. It's all machine-generated data that we're dealing with. If you think about the great doubling effect of big data, it's mostly machine-generated data in a networked world, and networks follow power laws. So you're seeing this fundamental inundation of businesses with real data. And data is the new gold.
So you have H2O, which is, of course, your product, right? It's an open source, in-memory prediction engine. So there's a lot in there.
So tell us more about 0xdata and what you guys do.
One thing is that science always had big data, even before the term big data existed: physics, biology, neuroscience. Some of the things we were doing in prior lives led us to try to build really good tools for understanding patterns in data. And at the heart of big data is this core, I would say real, profession called data science. Data science is about understanding data and its patterns and pulling interesting information out of your data so you can draw real conclusions from it. What are the top five features that matter for me? Which top five customers are we going to convert next quarter? What does the segmentation of my users look like? None of these have exact answers. If you think about the whole NoSQL movement that came first, and then Hadoop and unstructured data, in the new world ML, machine learning, is the new SQL. SQL gives you the exact underlying answers, but people are looking for patterns in the data. I know that X likes the latest Bond movie; what movies can I recommend to this person? So you're looking at patterns, not just exactly what he asked for. So ML, machine learning, is the new SQL.
Am I inferring that you're taking humans out of the equation, or where do humans fit in? Let's unpack that a little bit for us.
So if you double-click on that a little, right? What is the role of humans in a machine intelligence world? Think of Google and the way they packaged machine learning: the Google search engine is one of the most popular killer apps of machine learning. PageRank, distributed PageRank, PageRank at scale, is packaged as a search engine. And suddenly humans are tasked with asking the right questions and choosing the right keywords.
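The contrast Sri draws between SQL's exact answers and machine learning's search for patterns (the Bond-movie recommendation) can be made concrete with a minimal Python sketch. The viewing data and both function names are hypothetical, and simple item co-occurrence stands in for a real recommendation algorithm; this is an illustration of the idea, not how H2O works.

```python
from collections import Counter

# Hypothetical toy data: each viewer's watch history (illustrative only).
watch_history = {
    "x": {"Skyfall"},
    "a": {"Skyfall", "Casino Royale", "Mission: Impossible"},
    "b": {"Skyfall", "Casino Royale"},
    "c": {"Notting Hill"},
}

def exact_lookup(user):
    """SQL-style exact answer: exactly what the user has already watched."""
    return sorted(watch_history[user])

def recommend(user, k=2):
    """Pattern-style answer: score unseen titles by co-occurrence among similar viewers."""
    seen = watch_history[user]
    scores = Counter()
    for other, movies in watch_history.items():
        if other == user or not (movies & seen):
            continue  # only learn from viewers who share at least one title
        for movie in movies - seen:
            scores[movie] += 1
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:k]]

print(exact_lookup("x"))  # ['Skyfall']
print(recommend("x"))     # ['Casino Royale', 'Mission: Impossible']
```

The exact lookup can only report what X has already watched; the co-occurrence score infers that viewers who share X's taste also watched Casino Royale, which is the kind of pattern-based answer the Bond example describes.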
So if you go one step further, you have this enormous number of data sources, nice interfaces, and algorithms, and H2O provides some of the most cutting-edge, Google-scale machine learning algorithms for the average user, for the average business, to take, package, and run in their business. These machine learning algorithms are still only going to be as good as the questions the domain experts ask. For example, say you're trying to set up an event today: you know that you need to get from point A to point B, you have all this traffic, and you have all these event pieces that you want to capture. All of that is specific to your business. Knowing it in your own terms is what you add on top of the traditional prediction you get from the algorithm. So it's all about the context. Algorithms are very powerful, but they're not going to replace the emotional response that you need from doing data science. Eventually, all data science produces a pretty chart. Now, is this pretty chart going to cause action from your end user? So it's almost as powerful as art. Every art form needs an action response. In data science, you're working with this clay called data and coming up with a phenomenal result from your analysis. You want your business to make decisions from it. And what we've found time and time again with our customers is that data-driven decision-making is actually very hard. You almost always know the truth and want to follow it, but it takes courage to follow the truth. Take Moneyball, for example, one of the biggest examples of applying real analytics to a sport. That sport was really transformed by the whole Moneyball effect. They had to make decisions that were unpopular at the time, but they went with the data and transformed the entire game around data.
And that's what every business is going to face. They're going to have to make courageous decisions after doing the analysis, and that analysis is what we power using H2O.
So you brought up Moneyball, I have to ask you, Sri. When I read Michael Lewis's book Moneyball, I said, I can't believe Billy Beane has given away all his secrets. This guy's crazy. Now, it has transformed the game, but at the same time, the Oakland A's have been able to maintain their lead. Somehow, some way, they seem to have stuck to their knitting. Maybe it's because they have to, because they don't have the money. Yet they've still never won the big one. How much can we apply that to the big data world? People are obviously transforming businesses, but as I said earlier, some people are getting ROI and some people aren't. You've got the best of the best, and then you've got the mean. Am I taking this metaphor too far, or is it similar? Why is that? Help us understand.
So before we got started, before we had H2O as a machine learning platform, there was prior art here, right? Think about Goldman Sachs: they took over almost all of Wall Street with better intelligence and more sophisticated algorithms, and by taking those algorithms into production and running them against real-time data. They had the full sophistication, all the way to applying what they had learned.
They were like the Yankees playing Moneyball.
Exactly, exactly. And then you have Google and Amazon. If you look at Amazon's earliest shareholder letter, it says, I want to understand which book you will buy before you come to my website. And they've achieved a lot of that. Recommendation engines have become mainstream today, and Netflix is, again, one of the powerhouses in data science and machine learning. They've all applied these techniques to the point where they're dominating their spaces.
So in some sense, if you think about Netflix, it is transforming the entire movie and TV industry, taking on a 100-year-old system in a matter of a few years. That's the transformation you can achieve by applying machine learning, and we've seen it happen in the insurance space. You're seeing tremendous transformation in that space. You're seeing tremendous transformation in healthcare. The Internet of Things, as you brought up, is going to turn many of today's ordinary businesses into the powerhouses of tomorrow. So in some sense, the Moneyball effect has to be applied completely, with real transformation all the way through. That's what you're seeing, and that's why you're seeing a big shift toward applying big data, applying data to take your business to the next stage.
I want to come back to that notion of dominating spaces, but I want to let Jeff Kelly jump in.
Well, Sri, I think you're certainly hitting on something very important. It's all about finding insights and taking action based on all the data people have been storing in this new big data infrastructure, whether it's Hadoop or other platforms. So my question is this: you talked about the courage that's required to make sometimes unpopular decisions. What role does the big data industry, players like yourselves, have in helping companies find that courage and deal with the cultural, people, and process issues related to big data and machine learning? It's one thing to find these insights, but it's another thing to change the culture inside an organization so that people can actually act on them. Does the industry have a role in helping customers do that? Is it going to take another outside force? How are we going to make that shift? Or is it only going to be an elite few, the ones who really recognize this, who end up as the winners?
I think the key thing is to make it fun.
People like to learn new things, but learning happens more naturally in a playful environment. You want to be able to create cross-disciplinary, almost cross-border, teams. Because when you look at IT and business, they are really two different teams today, and the promise of big data was to bridge the business and IT worlds into one world. And I see that cross-disciplinary teamwork in every decision our customers make. There's a business guy, there's an IT guy, there's a big data guy, and the big data guy usually spans the two groups. And then there are the data scientists, who are kind of the classic Billy Beane of every company. The key piece is to empower this Billy Beane with enough tools that he can come to a simple conclusion out of all the noise in the day-to-day business. And then there's the outlier effect. Google was an outlier when it got started: it was the 13th search engine, one that nobody even cared about, right? It was pure word of mouth that brought it into the mainstream. So you want to be able to spot the outlier idea within your own enterprise. Some of these big companies have a lot of pockets of really new thinking, coming out of fresh hires, new acquisitions, new trends, and people going to conferences, meeting other people, and picking up these memes, the tech memes. Foster the idea of a play space where you can try different ideas, where it's okay to fail, and where you just make new mistakes. Pick up the newer ideas about building companies that are much more tolerant and much flatter, and build teams in which the farthest-out idea is on equal footing with the most powerful person in the room, to equalize some of these effects. You can have a real democratic effect inside companies.
And that's not happening just at the startup level; with the help of data-driven thinking, it's now happening at big companies as well. And that's where we're seeing the outlier. Oftentimes there's only one person in an entire big company who is interested in an open source technology like H2O, and he just brings it to the table, and before you know it... This summer was quite a revelation. There were seven interns, one at Thomson Reuters, one at Trulia, one at ShareThis, one at PayPal, just regular interns, who were given this not-so-popular project: go work on H2O and come back. And before we knew it, all of these companies became customers by the end of the summer. Basically, the intern goes in and quickly shows results; ease of use has never been more important. It's the consumerization of the enterprise: you carry your own Apple iPhone to the office; you don't have an office-issued cell phone anymore. The consumerization of the enterprise is happening dramatically, and to some extent the enterprise as we know it is dead. People are transforming the enterprise through their personal lives. And so that whole effect of one person transforming your business is happening.
Now, you mentioned interns that led to customer engagements. Does that one person have to be a PhD-trained data scientist, or can it be more of a business person, someone who doesn't necessarily have the training in statistics and mathematics and the other tools and skills we hear about when we hear "data science"? Is that possible?
I think that's one of the transformations that's happening. Our core constituents for H2O, at first, were data scientists; we focused on data scientists.
Historically, we started with R, making R really powerful: in other words, taking R beyond small data to real big data. That's something we really focused on, because R has been around a long time, with roots going back to Bell Labs, and it never had a real, cohesive software team building products for it. So we took the R metaphor and made data scientists' lives easier, right? That's step one. But data scientists are really producing work for business analysts. And business analysts, with the transformation Tableau brought, are using the new Qlik tech. They're using newer tools that are not necessarily traditional BI: more drag-and-drop analytics, and Excel spreadsheets. They want to move from there to a little more sophistication, because BI historically showed you trends of what happened. Prediction, which is kind of the new search, tells you what's likely to happen, and how likely it is. It's likely to rain; we're likely to have a big quarter next quarter, right? The likelihood of success is what you're trying to predict. And I think that's what business analysts have been doing all along. They wanted to forecast. The reason they were keeping the books was to be able to forecast. So historically there was a desire to do it, but the barrier to entry was very high. What's happening over the next few epochs of the big data cycle is that applying data science gets simplified into something even a grandma can use. That's taking it all the way to search: PageRank is difficult to use, but search is very easy to use.
So, Sri, we're out of time, but I want to give you the last word. Awesome conversation. Things are moving fast. I mean, you yourself, data stacks, platforms, 0xdata. We talked about going from tire kicking to real, hard ROI. Where are we going to be a year from now?
Where do you want to see this go?
Machine learning will become much more embedded in a lot of things. You won't even know you're using it. Your Nest device knows what's happening. The edge becomes more and more intelligent. We're powering the consumerization of data science in ways that make it really become an application. So our core vision, if you start with the data stack, which is a layer on the data, was that we didn't want analytics to be offline processing, as data mining was. Instead, make it online, all the way to online modeling and real-time online learning. So you plug the algorithm straight into your application. Building smart applications is going to be the future of this space, where big data becomes really widgetified into your application. So as a CXO, a CMO, a CFO, a business analyst, or even an everyday person, you can humanize big data and simply look at a widget that shows it.
And that begins to collapse the skills gap that everybody talks about today.
And one of the things we're seeing is that open source is also mainstream. The big trend, if you think about the data stack and all its various evolutions, is that open source has taken center stage, and big data has become very much an open source revolution. We had 100 users in January. We have about 6,000 installations today. So it's growing exponentially. We'll see what the next year brings. The enterprise, as we know it, is basically dead.
Sri, thanks very much for coming on theCUBE. Some great sound bites there, and I appreciate your time.
Yeah, good luck.
Keep right there, everybody. We'll be right back. This is theCUBE. We're live from Big Data NYC at the Times Square Hilton. Be right back.