You're settling in. You're seeing the landscape really start to mature. Yesterday, the theme was that it's coming of age. So what's your take on that? I mean, I see people starting to settle in. Cloudera made their move. You guys are staying on your course. What's the update?
So I think there's been a first generation of workloads, particularly in the Hadoop space. I think we're now really at the beginning of the next wave of innovation that can happen on top of the Hadoop platform. You know, when I look at the market, people have traditionally looked at the classic Hadoop market as well as the NoSQL market. And you're going to begin to see those collide a little bit. Why? Because Hadoop can now run interactive, online, and real-time streaming workloads, right? So how does the market play out? I think there are still more dynamics to play out, more changes to come. But from our strategy perspective, the original bet we made a couple of years ago when we founded the company was to go 100% open source, create a platform for this next-generation data computing environment, and really partner deeply with partners. I think that's starting to show some evidence, particularly with some of the announcements from Microsoft, Rackspace, and others.
So can we go back to that original business model discussion? Because I think it's coming back full circle now, and that business model discussion is getting a lot of attention in the press. When you guys spun out of Yahoo and launched Hortonworks, you said, okay, we're gonna be 100% open source, we're not gonna develop proprietary software on top of anything, and we're gonna sell services. People obviously asked the question, and I'm sure you asked it amongst yourselves: how are we gonna make money? The answer was services, right?
Through software subscriptions, primarily. So we are an enterprise software vendor, flat out.
We just happen to deliver our support and maintenance services as an annual subscription, right? My background goes back to the early days of JBoss, Red Hat, et cetera, so I've been in the enterprise open source space for a while. The model of holding back manageability and upselling it as a commercial product, I did that back in 2004, 2005, right? That was sort of the 1.0 rendition of a business model around open source. This market opportunity around Hadoop and big data is so large that our focus needs to be on making the market, and making it function with the big players who can extend their value prop by getting all data under management as part of their solution architectures. That's why we very directly said, don't hold back the manageability; it will slow adoption and tend to stall the market.
So your bet is that this will let adoption go faster, a rising tide lifts all ships, and you'll benefit from that by selling your subscription services. But look at the various business models at the highest level: hardware, software, and services. Software's marginal economics get down to the cost of the delivery mechanism: 99% gross margin, whatever it is. Hardware, maybe you've got a 60% gross margin; you've got product costs, whatever. The services business model is different, right? As you grow, you've got to add more people to serve customers.
Let me clarify that.
Help us understand that.
So our business: we're at about 200 customers now. We went GA with our product a year and a half ago, and fast forward 18 months, we're at about 200 customers. About 70% of our business is software subscriptions, where people use the Hortonworks Data Platform, need support, patches, and updates, and want to work with the experts who can drive new feature function from a roadmap perspective.
So that's my question: how labor-intensive are your services?
That is our classic software model; we just don't charge for a license up front.
It's kind of like paying for your license over time. It's a maintenance model. If I think about Oracle's business, the fastest-growing part of Oracle's business is the maintenance business. That's the piece you're in.
The other 30% is training and consulting. That's what you'd categorize as services; it's lower margin, and it's a smaller percentage of the business. By the way, it's good business, it's worth doing, and it's good cash flow, but it just doesn't have the same economies of scale.
So essentially you have the same marginal economics as a software company.
Exactly.
For that 70%.
Exactly, and our goal is 80%.
That's how you run an enterprise software business.
Yes, exactly. So like I said, we are software.
Let's talk about what's all the rage: the data platform. Obviously that's what everyone is talking about across all companies. It's about the data platform, which has a lot of apps, a lot of decoupling, highly cohesive elements to it: MapReduce, Hive, you name it, it's in there. So you're separating all that stuff out, with YARN in particular for you guys, right? Talk about that licensing discussion, the Spotify conversation, because yesterday they had a good presentation, and there's a lot of conversation around Spotify. What are they doing with you guys? There was an article written about them on GigaOM; they were with Cloudera. Can you clarify that? We want to get the scoop.
Sure. My takeaway from the GigaOM article was less about who's competing with whom; I think Cloudera will manage their own strategy. It was really more about understanding how the open source model works: people download the technology and deploy it for free. Spotify runs a Hadoop cluster of right around 700 nodes. They're doing a lot of music matching and other things. There are great blogs, and there were sessions here at Strata, that talked about their use case.
When it came time for them to opt into a support relationship, a relationship with a vendor that can help them drive more features into the platform they're relying on, they chose Hortonworks. So they are a paying customer. On Monday, the kickoff of Strata, we actually had our customer advisory board just down the street, and they're one of the customers on it. They're looking at advanced streaming analytics, so what we're doing around Apache Storm and our plans to bring it into the platform are of utmost interest to customers like Spotify. But how the open source process works is that people get the technology with no roadblocks to use, including manageability, and then, at the point where we earn the right to conduct business with them, they can opt in. And every year is a voting year. We're a subscription model, it's incumbent upon us to serve them well, and next year the relationship likely expands, because that's how the model works.
Talk about the real-time piece of it. You mentioned Storm; there's Spark out there. What else is out there for real time?
So the breadth of functionality that we see directly is clearly MapReduce for batch, interactive SQL, and HBase and Accumulo for online databases, which arguably are fueling web applications, mobile applications, and so on. For real-time streaming, we looked at what was in the Apache Software Foundation, and we chose to lean in on Apache Storm and begin to integrate it into the platform. Why? Because it's been out for a while, it has a good, vibrant community, and it can provide the engine for sensor and machine data in industries like telco, healthcare, and insurance that we want to go chase, right? There are other technologies out there, commercial or open source, that can plug into YARN, and we encourage that, right? But we happened to make a bet on Apache Storm, to actually bring it in as a first-class data service.
We've seen the rise of Accumulo.
Obviously, Dave and I have been following Sqrrl, the folks there, since they were a startup, and Accumulo is getting a lot of attention. Why is Accumulo all the rage right now? Give us some background on that.
Sure. I think Accumulo's value prop is very similar to other NoSQL databases like Apache HBase. I would argue HBase probably has more mainstream appeal. What we see with Accumulo in particular is a lot of interest in the federal and intelligence space. So we have customers we serve with Accumulo, and we have a couple of the committers for that project as Hortonworks employees, so we can drive features, fix bugs, et cetera. Security is one of its value propositions, but it'll be interesting to see the dynamics in the open source community, because the HBase community already has JIRAs and features slated for future releases that begin to address some of the security notions as well. So we tend not to be religious in the NoSQL space. I like to describe the Hadoop market as, let's say, a billion dollars, right? And the NoSQL market is about a billion and a half. With my Italian background, I describe it as thin slivers of prosciutto, right? One for each of these NoSQL databases. So it's a fragmented market, and a harder one to play out. I think MongoDB, those guys are doing a great job and have a clear lead in that space, but the rest of the market is highly fragmented and very application-specific. And some of those will come into Hadoop as first-class citizens in an enterprise Hadoop platform.
Yeah, and Accumulo is obviously doing well in government and a lot of places they really can't talk about much.
And insurance, I think, is another one that's emerging, as well as some hospitality use cases.
Let's come back to the business model.
It's been argued that you guys bought the business at this company or that company, but from what I understand of what you just explained, you don't charge an up-front license, you charge for maintenance, so the overall cost is going to be significantly lower. So your bet now, thinking about the TAM and the market opportunity: I can see a lot of people saying, oh, that means there's less of a revenue opportunity. But I'm presuming you would say, no, wait, the market's so much bigger. If we can triple the number of transactions we do, even though we're selling them at, let's say, half the rate, or whatever percentage maintenance is of the total over the life of a contract, maintenance is actually going to be most of that.
And data under management is very valuable. It's not a knock on Red Hat and the open source operating system model, which is where they started, but this market opportunity is much bigger than that. The same thing that drove their inflection applies here, though: if you get strategic partners like Teradata, SAP, HP, and others reselling it and pulling it into the market, the market will move faster, and inherently we'll get more subscription volume.
So you're saying you're following the playbook of the early days of Red Hat, but longer term, you're not gonna veer off onto the path they've taken of developing proprietary layers on top. Am I understanding that right? Is that your commitment, or not necessarily?
I think if you look at Red Hat, their software is 100% open source.
Yeah, absolutely, right.
It's available under subscription, and they do distinguish between upstream community innovation and the downstream packaged enterprise product. We do the same thing: the Apache projects are upstream innovation, and then we package stable releases. That's why this Hadoop 2 release was so critical: it needed to land GA in the community first.
So you would argue, then, that your model is very much like Red Hat's?
I would argue yes.
What do you make of Canonical and their efforts with Ubuntu, and the disruption that's going on with Red Hat right now? What are your thoughts on that?
I think that's a much different stage of the market, because Hadoop is making this market and there's still a lot of runway ahead of us. Red Hat and the Linux market are very mature, right? So let's have that conversation in another five years.
Yeah, kill me with that problem.
But right now I think this market has the opportunity to be very big. The other thing I would say is that Hadoop's influence on the larger big data market is, let's say, about 50%, give or take, based on estimates. That's not just the software; that's hardware, services, et cetera. So it's a very influential technology, and it could be higher over time. It's very disruptive, right?
So talk about the two other things we hit pretty hard yesterday: enterprise readiness and apps. Analytics seems to be the killer app; obviously everyone's talking about analytics and BI. But big data apps are kind of native everywhere. It's not like there's an app market for big data; it's pretty much everything. Talk about how you guys discuss it internally at Hortonworks, since having a platform means you're enabling innovation. So talk about, one, the new stuff like apps that you see, and two, the enterprise-ready message that customers are sending to everyone: hey, get enterprise ready. What does that mean to you guys? How do you talk about that internally?
Right, so on the apps story, we're clearly focused on being an enterprise data platform. So in many respects, we're horizontal, right?
But what we want to do is publicize and work with partners as well as customers who are delivering vertical solutions on top of it, and get people to tell those stories, like Charles Boicey at the University of California, Irvine Medical Center, talking about how they're actively collecting patient data while patients are at home so they can do better predictive analytics on healthcare delivery and things like that. Again, the platform is very horizontally focused, but these applications are very relevant in financial services, insurance, or wherever.
Well, you guys are horizontal; the apps can be vertical.
Yes, the apps are definitely very vertical.
So you don't impose any kind of requirements other than just using the platform. And what in the platform is key?
Let me relate it to our Apache Storm investment, right? We don't want to chase the complex event processing market horizontally. What we want to do is specifically target sensor and machine data processing in a handful of vertical markets, to round out the enterprise capabilities around Storm for those particular use cases, right? So whether it's telco, with large volumes of mobile interactions and things like that, or telematics as it relates to insurance, so you can optimize your car insurance, those very concrete use cases are where we're interested in adding feature function into the project to round it out for enterprise use. So that's number one: listen to the market, listen to the early vertical movement, and begin to round out the open source projects' capabilities in those areas. Number two, we spend a good chunk of time on APIs for integrating with the operational tooling in the data center, the development tooling in the data center, and data access and data movement integration with partners.
So REST APIs around Apache Ambari, and how that integrates with OpenStack or System Center or Teradata Viewpoint or other management systems, are an enterprise feature. It's an important feature for enterprises that want to use their existing tools and skills, right? HCatalog's APIs let people access the data very easily through RESTful calls, and that facilitates integration with Teradata for high-speed data movement, things like that. So there are features and APIs that you add into these projects to address that enterprise readiness point. There's more work to be done. Actually, probably in a couple of weeks, you'll see us on the hortonworks.com/labs section of our site, where we'll display multi-phase roadmap investments around interactive SQL with Stinger and around real-time stream processing with Storm. You'll see security as another area where there will be a holistic, expressed viewpoint on areas of investment, so we can recruit others in the community to work with us and accelerate the movement there.
Yeah, the Hortonworks blog is actually a great resource for people. You've got a lot of practitioner knowledge there. And you guys publish the roadmaps. I mean, it's...
That's the other aspect: the source code is out in the open, but the investment roadmap is out in the open as well. Why? Because that helps us recruit others, like Microsoft and Facebook, to invest in making these things a reality.
Talk about the maturity of the deployments. What size clusters have you seen? Obviously, it depends on who you talk to: oh, I've got a 10- or 20-node cluster here, a 200-node cluster. And then there are the tens of thousands. What do your customer deployments look like?
So we currently range from the tens of nodes up into the tens of thousands of nodes that you'd get at Yahoo, for instance, which has been running the Hadoop 2 stack in production for about a year now, right? The cool thing about this next-gen platform is that, in their case, they're actually able to decrease their nodes from 45,000 down to 32,000, because the new architecture gives you twice the performance and twice the number of jobs. So it gives them more headroom and enables them to manage, and actually decrease, their footprint. So the scale ranges from tens of thousands of nodes down to people just getting started. I spend most of my time with mainstream enterprises, and the takeaway for them from those tens of thousands is that they can take comfort that the technology is proven at scale. But let's talk about the mainstream enterprise, which is clearly not 10,000 nodes out of the gate. It's 10, 20, 40, right? And then they graduate up. What we're seeing in our renewals, just as a data point, is three- to four-x growth a year later over the clusters they started with, right? So it might start at 10 or 20, then get to just below 100 or thereabouts, and then grow from there.
So from a dollar-value standpoint, your renewals are over 100%, presumably?
Yes, right now, yes, exactly.
Yep, nice.
So it's 100% and then upward growth on that. And that's how you drive the open source model: you earn their business. If we scare a potential user of the free technology away from engaging with us in a relationship, we have lost something. So that was my reaction to the GigaOM article, where the statement was made that they didn't lose anything by Spotify going to Hortonworks. Yes, you did. You lost the right to serve that user and enable them as a customer.
I mean, in traditional software terms, you lost the maintenance stream.
Exactly.
Okay, we've got to break here, but I want to ask a final question. What do you think the outlook is going forward? Obviously you guys are staying the course, but in the industry, as the landscape starts to harden a little in some areas, people are building and going to the next level. How do you see things evolving over the next year?
So I think we'll see, particularly given what we've seen over the past year, vendors like SAP and other traditional vendors getting very concrete around their solution architectures and expanding their ability to address data under management, including Hadoop use cases. For those who have shied away from it or been tentative, I think a leaning-in process has started, and that'll continue. What we'll also see, and we've been clear about this, is that Hadoop is one important, but just one, data system in an overall modern data architecture. I think we'll see that manifest across a variety of classic vendors, as well as with customers who are telling their stories about how they're integrating real-time mobile applications with classic Hadoop data processing in modern architectures, and I think that trend will continue. The third thing, which selfishly I'd like to see, and you'll hear this if you talk to Arun Murthy, one of our founders, is more and more data processing engines that run natively in Hadoop, so you get more out of the data that you put there. You get it all in one spot, you need to interact with it in multiple ways, and it'll be interesting to see the innovation that happens around that this coming year.
So it sounds like we're getting to a data OS model.
Exactly, exactly.
Okay, Shaun, always a pleasure to have you on theCUBE. Great insight as always. The strategy piece is always interesting, but people are also making money, so it's a good year for the business, and things are growing.
This is theCUBE. We'll be right back, live in New York City, with Big Data NYC coverage, and we want to thank you guys for supporting us this year. Hortonworks and WANdisco, you stepped up to support theCUBE by underwriting our independent coverage. We're actually outside, in front of the Hilton at the Warwick, bringing you all the conversations, and we have a CrowdChat: go to crowdchat.net/strataconf and you'll find a spam-free environment where you can interact with the conversation you hear on theCUBE and throughout the show. So go to CrowdChat to communicate with your friends and thought leaders. We'll be right back with our next guest after this short break.
theCUBE is...