Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE, covering Hadoop Summit 2016, brought to you by Hortonworks. Now, here are your hosts, John Furrier and George Gilbert.

Okay, welcome back, and we are live here in Silicon Valley in San Jose for Hadoop Summit 2016. This is SiliconANGLE Media's theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. Our next guests are Scott Gnau, the CTO of Hortonworks, and Dave Mariani, CEO of AtScale, a hot startup. Dave is a former Yahoo platform engineering leader, now CEO of AtScale, and was previously at Klout, where a few years ago you were actually interviewed on theCUBE. Both are CUBE alumni. Welcome back.

All right, so what's the story, Scott? As CTO, you're highlighting AtScale, and there was some news out there, so for the folks who covered this yesterday, what's the relationship between you guys? You're a hot, fast-growing startup funded by Ryan Floyd, one of the best venture capitalists in the Valley and a friend of theCUBE. So you have a little geek holy water from the high priests thrown on your venture. What's the connection with Hortonworks? Give us the update.

You know, one of the things that really drove the relationship is customer demand. As customers onboard more and more data into their clusters, they're looking for more and more access to that data. And certainly one of the most common ways for business analysts to look at data is through dimensional models. It's the way people think: spreadsheets, pivot tables, that kind of thing. And so by combining our technologies, we can deliver an end-to-end solution for our customers, where they don't have to move the data off their cluster into some other appliance to go do that work. They can simply play the data where it lies.
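The dimensional-model idea Scott mentions — analysts thinking in pivot tables over a fact table — can be sketched in a few lines. This is a toy illustration, not AtScale's engine; the table, dimensions, and measure names are invented for the example.

```python
from collections import defaultdict

# Toy fact table: each row is (region, product, revenue).
# Dimensions here are region and product; the measure is revenue.
facts = [
    ("west", "widget", 100.0),
    ("west", "gadget", 50.0),
    ("east", "widget", 75.0),
    ("east", "widget", 25.0),
]

def rollup(rows, dims):
    """Aggregate the revenue measure along the chosen dimensions,
    the way a pivot table or OLAP cube rolls up a fact table."""
    cube = defaultdict(float)
    for region, product, revenue in rows:
        key = tuple({"region": region, "product": product}[d] for d in dims)
        cube[key] += revenue
    return dict(cube)

by_region = rollup(facts, ["region"])
print(by_region)  # {('west',): 150.0, ('east',): 100.0}
```

The point of "play the data where it lies" is that this rollup happens against the data in the cluster, rather than after an export to a separate OLAP appliance.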
And that's really important, especially for our customers who are using their clusters not only for structured transactional data, but as an online archive where they've got 20, 30 years of history. They can now have access to all of this and really understand how their businesses have moved.

You know, Dave Vellante, who's back East — he and I do theCUBE analysis together — always says you want to squint through the market activity to try to understand what the landscape looks like. So I'd love to get your perspective, Dave and Scott, on the following, because one of the things we're looking at as theCUBE goes to all the events is what's going on right now. People are scratching their heads with all kinds of conspiracy theories: oh, the Hadoop ecosystem is falling apart, or the companies are pivoting and there are no apps. When in reality, companies are still being funded. There's some softening, and there's certainly change up and down the stack, but you have a solution that's resonating with customers and growing fast, and Hadoop is evolving and changing. Arun talked about it; we're going to talk about modern apps. In reality, I see a lot of transformation going on. Certainly there's pivoting, for the right reason: people pivot to the value. So is it a platform of platforms? Is Hadoop not necessarily that silver bullet where, oh my God, Hadoop's going to spring up all of these apps? It actually seems to be more of a platform enabler — a platform of platforms, where your partners are serving the app developers doing the app-ification and all that stuff. Thoughts on that vision?

You know, for me, Hadoop is the data operating system, and business intelligence is really the killer app. Think about how you create ROI from data: you create ROI from data by putting business users on it, because they're going to create the value, not me. I can prep the data, but they're actually going to generate the value.
So without that — when I was at Yahoo, Hadoop started out as really sort of this archive, and to me we were underselling the capability. If you bring interactive query into a Hadoop environment, which is what you're seeing now — and it's really fantastic with Hive on Tez, and with Spark SQL and Presto and Drill — you're talking about opening up that platform not just for the batch workloads, but for the interactive workloads that make it an ROI story, not just a cost-saving story.

So Scott, we would agree, totally; we've seen that. The killer apps today are the analytics, just like email was the killer app for the web and the internet. But the question now is this: there's a huge enterprise app developer market boom. We saw Docker containers and DockerCon exploding with activity. Those aren't necessarily Hadoop developers; they're leveraging a platform, whether it's some BI stuff integrated into an app. I point that out because some people are looking at that market as a proxy for this one, and it seems to be the wrong picture. If you're looking at that market, you're not really looking at this market. Do you agree? Your thoughts?

Yeah, and I agree there's a lot of confusion, right? Being data-centric as I am, I think about it more from the perspective of Hadoop and the Hadoop ecosystem really becoming the center of gravity for data — because of the flexibility, because of the cost, because of the extensibility, because of the robust ecosystem, and because of the sheer growth in connected devices. All this data has to go somewhere, right? And so this is becoming the center of the universe as it relates to data: data in motion, data at rest, and data analytics. And into that — yeah, platform is a word that gets used — into that obviously plug applications and content delivery, right? Because just storing data is fun, but so what, right?
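The interactive-query shift Dave describes can be illustrated at toy scale. Here, the standard-library sqlite3 module stands in for an interactive SQL engine over Hadoop data such as Hive on Tez or Spark SQL; the table and data are invented, and the point is the ad hoc question answered in place rather than an overnight batch job.

```python
import sqlite3

# sqlite3 as a stand-in for an interactive SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "click"), ("a", "buy"), ("b", "click"), ("b", "click")],
)

# An analyst's interactive question, answered against the data as it sits:
rows = conn.execute(
    "SELECT action, COUNT(*) AS n FROM events GROUP BY action ORDER BY action"
).fetchall()
print(rows)  # [('buy', 1), ('click', 3)]
```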
How do you get something back out? And as part of that platform story — we've talked, and some of your other guests here today have probably talked, about connected data platforms — we also think not only is it the center of gravity, but it'll also be physically diverse. Some of the data will be in the cloud. Logically you'll have your data lake, but some will be in one cloud, some will be in another cloud, some will be on-prem. So being able to connect those things without having to physically move the data, and being able to spawn applications that are containerized and can move around that grid of data, is a really important technology difference.

So you're going to have a connected platform first, before containers can move around that platform or network. Otherwise, you don't need containers to move anywhere. So Dave, I love the OS framing, because that was my computer science degree — in the '80s I was an OS systems stack guy — but let's take that to the next level, and I want to ask you specifically: the Docker madness points, to me at least, to the fact that DevOps won. Yeah, the DevOps ethos of infrastructure as code won. Okay. So if you believe that — maybe you do, maybe you don't — developers aren't really provisioning hardware like they were. How do you take that to data? Is data as code a new concept? Is the data operating system enabling a new abstraction layer?

So here's the way I think of it. When I was at Yahoo, Jerry Yang and David Filo pounded into our heads that we don't throw data away. And if you think about that statement, there are a lot of requirements there, because if you don't throw data away, it means you've got to store it someplace, and you can't pre-process it or pre-aggregate it or pre-structure it, because if you do that, it's too costly.
You would be forced to throw data away. So we invented Hadoop to be able to basically do that: capture the data. But what that actually resulted in is that you capture the data now and add structure later, when you have the questions. The whole paradigm of data warehousing, where you pre-structured data for the questions you had today, was flipped on its head, because now you structure data for when you have the questions — that whole schema-on-read concept. People are always focused on the scalability and the cost savings you get with Hadoop. For me, it's that flexibility of schema on read, where I can now adapt my data to answer the questions I have right now.

Take a minute to talk about the product that you have, the problem it's solving, and the impact for your customers of this relationship with Hortonworks.

Yeah, so Scott alluded to it before: we're selling to the same customers, and we have common customers. And here we are, selling to the same customers separately, when the customer just wants the solution. They don't want the pieces; they want their solution. So I'm really excited about how we can remove friction for the customer. And we already see this, where, hey, guess what? Hortonworks has the relationships. They have the paper. So why not simplify life for the customer and say, you want some AtScale with that? To me, that is making life friction-free, or at least removing friction, for that customer.

And you're targeting the BI data user that doesn't want to deal with the complexity, and that wants their own visualization.

They want to use the tools to access the data that are right for their company. So we don't enforce our own visualization layer — we have none — because there's Excel and Tableau and Qlik and Business Objects and MicroStrategy and Cognos and custom-written code.
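The schema-on-read pattern Dave describes — capture raw data first, apply structure only when a question arrives — can be sketched in a few lines. The event format and field names here are invented for illustration.

```python
import json

# Raw events are captured as-is, with no schema imposed at write time.
raw_log = [
    '{"user": "a", "event": "purchase", "amount": 30}',
    '{"user": "b", "event": "pageview", "url": "/home"}',
    '{"user": "a", "event": "purchase", "amount": 12}',
]

def read_with_schema(lines, fields):
    """Schema on read: project the fields today's question needs,
    tolerating records that never had them."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)

# Today's question: purchase amounts per user. Tomorrow's question would
# simply project different fields from the same raw log.
purchases = [r for r in read_with_schema(raw_log, ["user", "amount"])
             if r[1] is not None]
print(purchases)  # [('a', 30), ('a', 12)]
```

The contrast with a data warehouse is that nothing was aggregated or discarded at ingest time; the pageview record survives untouched for whatever question comes next.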
There are a bunch of different data consumers. And so what we want is to expose all that data that we're landing in Hadoop — that beautiful data — virtually, to all those tools, and give one place to secure it, one place to manage performance, and one place to define the semantics of that data, whatever tool you need it in.

So you solve the headache of managing the data, but you also provide the vitamins for growth. Yes, yes.

Quick question. Scott, you talked about data in motion and data at rest. It's been a big theme, you know, with Hortonworks DataFlow and the Hortonworks Data Platform, and we want analytics to join them. And we have multiple types of analytics. Without naming names, what other types of analytics should we expect to see as a bridge between the two?

Well, I think the one thing that we're seeing — and Apache Metron is one example of one of these applications — is that you're able to combine traditional data at rest and events that are happening in real time. So you've got machine learning code that, in that case, looks for the bad guys, right? It looks for cyber threat potential, continues to model what those patterns look like, and then applies those continuously updating patterns to data that is streaming in real time. And the value is that instead of looking at your historical data to understand, was I hacked — and if the answer is yes, it's too late, you were hacked — you move to, am I being hacked, and can I do something about it now? That's a great example that's horizontal across industries. And of course, in a connected-consumer kind of world, connected consumers want it now and they want it to be relevant. It can't just be, hey, you're a platinum customer, so here's the list of stuff I do for platinum.
It's got to be: you're Scott, I know everything we've done over the course of our interaction, here's how you want to be treated, and therefore these are the treatments I'm going to deliver to you in real time — while you're on your phone, while you're in my store, while you're on my website — as a seamless experience.

So just really quickly, Metron then is a very horizontal, widely applicable app, and it's a design pattern for other ISVs?

It's a design pattern: how do you take analytics and execute analytic content in real time? So in a consumer business, how do you move offers from post-transaction — hey, you bought this, why don't you buy that? — to, gee, you landed on my website, I think this is probably what you're looking for, let me serve that up to you now?

Okay, so there's another angle, too. One of our common customers is predicting churn, right? Churn is the key metric of profitability for them; eliminating or lowering it is the goal. So they have data scientists writing machine learning algorithms to predict and alert. But the data is gigantic — we're talking about half a trillion rows that need to be analyzed. And they're using Hadoop and AtScale to explore the data, to inform the algorithms they're writing for their machine learning that uses streams. So it's the combination: I need to use this virtual cube, this dimensional model, to even understand how to author my algorithms to predict that churn. It's a perfect blending of looking at the granular data, then informing the model and putting it into production as a constantly running process. Pretty cool.

Dave, Scott, thanks for sharing the insight on theCUBE; really appreciate it. Good to see you again. A lot of technical action going on, good partnering benefits to customers, and great to see the success.
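Stepping back to the churn use case Dave described — exploring granular rows through a dimensional rollup to inform the features a model uses — here is a toy-scale sketch. The column names and the churn rule are hypothetical; the real case operates over a half-trillion-row fact table.

```python
from collections import defaultdict

# Toy granular usage rows: (customer, month, minutes_used). The dimensional
# rollup is what makes the granular fact table explorable.
usage = [
    ("c1", 1, 300), ("c1", 2, 120), ("c1", 3, 40),
    ("c2", 1, 200), ("c2", 2, 210), ("c2", 3, 220),
]

# Roll up granular rows into per-customer monthly features.
features = defaultdict(list)
for customer, month, minutes in sorted(usage):
    features[customer].append(minutes)

def at_risk(monthly_minutes):
    """Hypothetical churn signal: usage fell by more than half from the
    first observed month to the last."""
    return monthly_minutes[-1] < monthly_minutes[0] / 2

flags = {c: at_risk(m) for c, m in features.items()}
print(flags)  # {'c1': True, 'c2': False}
```

In production, the rule would be a learned model applied continuously to streaming events rather than a fixed threshold, but the exploration step shown here is what informs which features the model uses.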
Congratulations, Dave, on your startup and continued success, and Hortonworks on the good news there. It's theCUBE, here live in Silicon Valley. We've got three days of wall-to-wall coverage; we're in day two. I'm John Furrier with George Gilbert. You're watching theCUBE. We'll be right back with more after this short break.