Valley, heart of big data, which is the O'Reilly Strata Conference. I'm John Furrier, the founder of SiliconANGLE.com, joined by my co-host Dave Vellante. We're with Sean Connolly, who's the Senior Vice President of Corporate Strategy at Hortonworks, on the executive management team at Hortonworks, which is pioneering big data, a fast follower to Cloudera, really putting out the open source and fueling the open source movement around Hadoop. Sean, welcome back to theCUBE. Thanks for having me, John. So Hortonworks, obviously, has been growing like crazy since the Yahoo kind of step out or spin out, however it was classified. You guys are funded by Benchmark. Great success with the data platform that you guys have been initiating. 100% open source business model. And really, the credibility level is really high with you guys in the community for a variety of reasons. And so, you know, congratulations. Obviously, we cover the Hadoop Summit for you guys as well as Strata and Hadoop World here at theCUBE. So we're very intimate with what you guys are doing. So a lot of updates to share. So one, just take us through a quick update on what's going on with Hortonworks: recent announcements, traction, et cetera. And then I really want to ask you, Dave and I want to have a conversation around the big splash by EMC Greenplum and Intel in particular. The two big money players coming into the industry, EMC and Intel, you know, throwing their weight around a little bit and, you know, taking a little bit of a different approach. Some are saying proprietary, we're saying open source plus; however you want to put it, they are attacking the marketplace in an aggressive way to get a position, and they certainly called out Hive and Impala specifically in the past announcement. So an update on Hortonworks, and then we'll jump right in and talk about what SQL means in this new world.
Sure, yeah, so the past nine months or so, in particular the last quarter or two, have really been accelerating just from an industry perspective. You know, at the last Hadoop Summit in June, we had Geoffrey Moore of Crossing the Chasm there. Market indicators are we're on the right side of the chasm. Just mainstream brands that are actively seeking out deployments. They're more educated on what they want to accomplish. So it's more concrete. So it isn't a kick-the-tires, you know, science project. It's beginning to get clarity and sort of repeated use case patterns. In that space, I tend to sort of categorize things as refine, explore, and enrich use cases. And so refine is to sort of get it all in one spot, transform it, right, ship it to downstream EDWs, so our partnership with Teradata fits into that, and we're seeing that rinse and repeat pattern. The explore, frankly, is where a lot of the SQL stuff is, the excitement on how do you interactively get your hands around the data. And then the enrich is really how do you build intelligent applications, deliver, you know, analytic models to online apps to influence behavior. Are those sequential? So, you know, six, nine months ago, we saw them as sort of three different patterns. Right now what we're actually seeing is you graduate from one to the next and the next. So it's actually a crawl, walk, run strategy. And so there's a connection. Well, why is that important? It gets to the, you know, at Hortonworks, one of the things we've been driving is the YARN aspects of the Hadoop 2 platform that actually give you, you know, a richer cluster operating system to support multiple workloads. So if you think of refine, explore, and enrich, that's batch, interactive, and online. You want the platform, Hadoop, which most folks just think of as batch MapReduce, right? It isn't, it's a broad platform that spans batch, interactive, and online application needs increasingly, particularly as you plug things in.
So continuing to innovate that platform forward so we can serve those, you know, higher speed workloads is important, but do it in a good citizen way. So if you need data in a fast way, you're not going to monopolize the memory and CPU if you're also doing batch processing at the same time. So you need to coexist in a manageable way. And that's really an area of focus. So Dave and I yesterday were talking about, you know, top-down and bottom-up, organic and, you know, top-down. When they meet in the middle, all the magic happens. Real accelerated market acceptance, growth, et cetera, on the product side. And really what's happening, and we've seen it with Oracle: on one hand, Oracle has the purpose-built infrastructure, and then you've got everyone else, kind of open-source multivendor. Here what's interesting is you have EMC, and they have a lot of customers. It's not like they're new to the business. They have a lot of enterprise customers, big customers with storage. And then you have the community innovating underneath. You have innovation coming from the bottom up, which is an open-source business model. And then you have EMC, which is kind of top-down. They're on a collision course, right? So we want to document that. So what I found interesting about the EMC announcement was the software-based approach, no appliance. That got good marks, you know, aggressiveness aside. You know, that was very positive. At the same time, they have a lot of customers. So proprietary's been kicked around. We're calling it open-source plus. So I want to ask you, okay, where is this collision? Do you agree with that statement?
And two, is that a short-term snapshot of a market they're just going to have to serve with business intelligence, which makes a lot of sense from a business strategy standpoint, and is it obviously different than what's going on in the community, where apps need to be data aware, data as code, as we introduced this morning, a concept called data as code, where the data itself is not just one data mart? So you have an interesting kind of positioning here. How do you look at that? What's your point of view? And obviously, if you can comment, specifically on the top-down, bottom-up. So at Hortonworks, like I said, we're focused on that Hadoop platform for mainstream enterprise use cases. But a key part of our strategy was enabling it to interoperate with the ecosystem. That was set very early on. So you mentioned EMC, I'll also mention Teradata, right, with their unified data architecture, and Microsoft, right, with their approach on integrating Hadoop in Azure as well as with their SQL Server and BI tools, right? So, you know, yes, our focus is on enabling any and all of those types of use cases because that's what enterprises want, right? So there's not really much religion around that. I think we all see that we want to make sure we accelerate the enterprise adoption of that, right? So that's sort of first and foremost. With that said, having been at JBoss and Red Hat and Spring and other places, as I like to say, the open source tide rises inexorably, right? So it'll continue to get better and better and better. And so in the case of interactive SQL, as an example, we've chosen to double down on the Apache Hive community in order to take that solution, which operates at large scale for operational batch processing, and move it into human interactive use cases where you can do visualization, ad hoc reporting, and those types of things.
And we've gotten very strong reception from customers who have already invested in it, as well as the Facebooks, Microsofts, and others who are working in that community, on wanting to roll up the sleeves and move it into that space. Because at the end of the day, it's been the de facto SQL interface, and just one sort of final point. It's been the de facto SQL interface, but as I like to say, it's also fortunately or unfortunately been the piñata that people beat up on, right? The kid in the schoolyard being bullied. Yeah, and if you look at a lot of performance comparisons. So hold on, just one second. So the kid in the schoolyard being bullied or the piñata, as you say, the candy will open up, and so was it warranted? I mean, what's your take on that? I see Greenplum saying the performance is significant, and we have Donald Miner on Twitter. He's going to give us the data on the benchmarks. I mean, they have numbers. How do you address that? And Cloudera kind of agrees with you, sort of. Right, yep, yep. So we're trying to squint through that. Exactly. Customers are too. So there's a couple of ways you can look at it. Right now, Hive is used in petabyte-scale use cases and you don't want to lose that, right? But what you want to do is, for either the smaller data sets or for the more human interactive scenarios, make it faster, make it more responsive, get it down into the seconds response time. Some people use the term real-time. I don't like that, because real-time to me means sub-second, machine to machine. But this is human interactive use cases. Number one. Number two, a lot of the performance comparisons tend to configure Hive against just the plain text mode, which it totally supports, but it has optimized file formats that many of these new ones also have. So it's an apples to oranges comparison, fortunately or unfortunately, and we'll be investing in that area. What's your message to the enterprises out there that have a relationship with EMC?
Obviously, EMC's business is pretty clear. They're targeting a market that's addressable in the data warehousing and business intelligence market. It's SQL-based, but there's a world beyond SQL. But EMC's putting the bridge across their chasm, which is saying, hey, EMC customers, cross with us. Don't worry about anybody else. We've got you covered. I mean, that's a credible message, assuming that the products can be delivered, but what's your take on that? It's viable. So when you look at a next-gen data platform, SQL is an important use case. It's not the only use case, all right? So there is a large population of users who want to interact in that way. But if you look at the original incarnation, the SQL, the structured model didn't map well, and that's why MapReduce and Hadoop were created in the first place. So the point there is, and that was one of the things I saw on Twitter, it was like, people abandoning MapReduce? Heck no, right? Because it does a lot of the data refining at scale that's required, but you do need to appeal to these new folks. So you'll see the Greenplum solutions, the Aster solutions, the Vertica solutions, all rallying around it. It brings more people to the party, right? Exactly. So you mentioned Microsoft SQL Server before. How does it fit into this whole model, this Hive piece of it? Can you sort of explain to customers how that all works? Sure, and a little bit of this is maybe a little inside baseball talk about some new technology. So definitely SQL Server, it's out there, and our recent announcement of Hortonworks Data Platform on Windows sort of helps you deploy a consistent footprint. If it's SQL Server and Hadoop on the same servers, you can have that as an option now, right? It wasn't previously an option. But if you look at their PDW solution, they have this new PolyBase layer, right, that sits on top of that. That's sort of like a universal translator for SQL that can sit on top of PDW and Hadoop, right?
A la Hive, and federating queries so you can get data out of either system easily, right? So the end user is oblivious to where the data comes from, right? But so why is it important to speed up Hive in that scenario? Last data in is a rotten egg, so to speak, right? So it's as slow as the slowest query, right? So that's why we've been working with Microsoft on optimized file formats and really speeding up the core engine of Hive in the community, to make sure that those use cases are blazing fast. So that's why our Stinger initiative, 100x performance on Hive, is what we're after here. We're going to bring you back on theCUBE later for another segment, but we're under a lot of time pressure because the keynotes are getting over. But I want to ask you one final question. This is kind of more of a 20 mile stare, because there are a lot of CIOs in our audience and also developers, right? So what EMC is basically saying is, hey, cross this bridge with us, we'll take care of you. Don't look behind the curtain, but we've got you covered on SQL. And I like their announcement, but my only criticism of it is, it felt like they're making the horse and buggy faster. Okay, and that's great today in data warehousing. But one thing that we've been talking about on theCUBE and with Wikibon and SiliconANGLE is that the new use cases, those new questions, those new data resources that are emerging aren't SQL. They're new applications, it's data as code, a term we coined today. And one thing that's interesting that I haven't heard from EMC is what their plan is outside of SQL. So I want to ask you specifically to talk about your point of view. If you're a CIO and you're an enterprise and you're investing in the future, business value is on the table right now. It's the number one conversation, not making data warehousing faster or a few purpose-built queries. I need to be positioned for the future. So does EMC bring that to them? Can you comment on that?
Intel's out there as well. They don't want to foreclose the future. So what's your point of view on this new future? And then we'll wrap it up after that. Exactly, so I think taking a point of view where you're allowing all the different flowers of access to bloom will win out in the long term, right? So, to your point, the SQL sort of use case is fine, but what about the Platforas who are inventing a new way of just visualizing on top of that? Or the SASes who are finding ways of running their analytics in the Hadoop stream directly? They're not traditional SQL use cases, right? But they are going to deliver very targeted enterprise value to the customer. And that's a very interesting value proposition to unlock. And so, like I said, SQL's the hot thing right now. Real-time and other alternative patterns are going to be emerging in the next quarter. It's not big data, it's big answers, which is one of the things that they talked about, which I like. But also, one thing that they didn't talk about, which we said on theCUBE yesterday, is that it's about new answers and the right answers. And that's, I think, what's interesting in the business value, and that's going to be, I think, the future of the data platform. So, competitive space. Final comment before we break: what's your take on the whole competitiveness, and what's Hortonworks going to do to compete? So, it's clearly an important market. That's clear from a lot of the competitiveness. What we're going to do to compete is we're going to stick to who we are. When we look at ourselves in the mirror, we represent a community-driven Hadoop platform and integrating it with existing enterprise systems and solutions, and we feel strongly that if we stick to that knitting, we'll win in the long run. This is not a sprint, it's long distance running. Okay, Hortonworks talking about their response to all the competition and different approaches. We'll be right back with our next guest.
Thanks for coming on. Sean Connolly, Senior VP of Corporate Strategy at Hortonworks. We'll be right back. Yep.