Okay, we're back live here at Hadoop Summit 2012. We're getting down to the end here on day two in Silicon Valley. It's the heart of big data, and this is where all the action is: infrastructure activity, big data analytics, the business side's exploding, and all the engineers are here, business cards being flipped around. Recruiting, you name it. Developers being hired. I'm joined by Jeff Kelly with Wikibon.org; on our research team, he leads the big data analysis. And our guest here is Ari Zilka, the Chief Product Officer of Hortonworks. Welcome to theCUBE. Thank you, great to be here. Well, we had Eric on earlier, and he said you've been close to the company from the beginning. You guys know each other. People know you, the timing was right. So tell us when you joined Hortonworks and how that all went down. Right, I mean, Eric's not exaggerating. I was telling people here at the show that six years ago in the middle of China, we were invited from different companies to speak, and that's where he and I first met. He was teaching people Hadoop, I was teaching people Ehcache, and then the founder of MySQL went on after us, and then the founder of Cray after him. So literally thousands of people listening to us talk about these big projects. I joined Hortonworks officially in March, but I've been following it since day, like, week negative one of the company, with Rob Bearden as business leader and Eric as technical leader of the company. So a lot of things have obviously happened since they kind of spun out of Yahoo. Rob's come on board, the team's expanding. But one of the big issues right now is the market's exploding, right? So last year, the silly conversation was Hortonworks versus Cloudera. Okay, it's Apache, everyone knows what's going on, right? So it's a big deal. No real war there, just kind of friendly competition, but the expansion around that marketplace has been pretty incredible.
So that puts a lot of challenges on strategy and product. Take us through your mindset as you look at the landscape now that you're on board full-time. Since March, you've got to have the 20-mile stare; you've got to know the mountain you're going to climb, and the next one. So what's on your mind? Share with folks what you're thinking about. Sure. My mind is basically focused on, I mean conceptually, divide and conquer. How do I split up the problem into one mile ahead, five miles ahead, and 20 miles ahead? And how do I let the community do the thing it's doing here today at Hadoop Summit, right, and just build great concepts? And how do I simultaneously come behind them and seamlessly turn it into enterprise-viable stuff? Because when I say enterprise-viable, of course Hadoop itself is enterprise-viable. It's big data, it's reliable data, and it's fast access to all your data. But what it isn't is easy to consume for the average Java guy, the average Oracle guy. And to me, it's: how do we add value by getting out in front of the enterprise and basically spending the next 12 months on what I call "make it safe in production," so the ops guys do not screw it up? Because if you're Yahoo or Twitter, you can be taught and you can talk to each other: what are your best practices, these are my best practices. But when you're Joe, whoever, Joe Enterprise, you can't talk to those guys. You buy their books, and their books don't give the whole recipe, all the details; it's the cumin that you throw in on the side that makes the dish, kind of thing. So we're codifying that for ops. We call that safe for production. And then: unlock the power of your data, meaning enterprise data. Join my SQL data, my EDW, with Hadoop without having to give up the old and replace it, or forklift everything into Hadoop. How do I kind of join the worlds together and leave everything in the silos I've got today? That's sort of the one mile ahead.
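That "join the worlds without forklifting" idea can be sketched in a few lines. This is a toy model, not Hortonworks code: an in-memory SQLite table stands in for the EDW, a plain Python list stands in for event data already sitting in Hadoop, and only the small dimension table crosses over for the join.

```python
import sqlite3

# Toy stand-ins: 'customers' lives in the existing EDW (here, SQLite);
# 'clicks' stands in for raw events already sitting in Hadoop.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
edw.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "acme"), (2, "globex")])

clicks = [(1, "/pricing"), (1, "/docs"), (2, "/pricing")]  # (customer_id, url)

# Join without forklifting: pull only the small dimension table over,
# and leave the big event data where it is.
names = dict(edw.execute("SELECT id, name FROM customers"))
joined = [(names[cid], url) for cid, url in clicks]
print(joined)  # [('acme', '/pricing'), ('acme', '/docs'), ('globex', '/pricing')]
```

In the real stack, this is the shape of what Sqoop-style connectors and Hive joins do at scale; the point is that each system keeps its own silo.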
The 20-mile stare is: let the community do its thing a bit, but I'm going to have to double back to that swim lane, right? We've got to make sure that community innovation like Hadoop 2 doesn't get so far off course that we have to do a Herculean effort to bring it back into enterprise consumability every single time. And that's something we'll start to tackle this year with Hadoop 2 itself. Hadoop 2 is an alpha. We've got to basically say, let's bring it towards center, towards mainstream, towards the proper side of the chasm, and let's simultaneously let it run out a bit at the same time. Yeah, a lot of product challenges. So on the priority list: at the beginning of the show yesterday, Rob was saying it's a long list of things to do. The business is growing. What is the biggest challenge right now in terms of product features? I mean, obviously high availability, you knocked that down right away. That's top. You know, NameNode too, as well. That was good. MapReduce 2 and NameNode problems, issues. What are the big issues? Are they still the issue? What's next? HA is purely an educational issue now. Cloudera has a solution, we have a solution; theirs is for Hadoop 2, ours is for Hadoop 1. We'll meet together in the street and basically give everybody a go-forward. Both of us have good ideas; we'll bring them together, to your earlier point. But HA is education at this point. We know how to solve it now. Monitoring and management, we've got to make that completely open, completely pluggable. There are vendors who hold it back and say this is paid or differentiated. It cannot be differentiated. If Oracle had no tools, it wouldn't be where it's at today. If Sybase had no tools, they wouldn't be where they're at today. The DBA is an established function. The Hadoop, I don't want to say Hortonworks,
data administrator is a function, but the Hadoop cluster and data administrator is a function that's going to emerge out of DevOps. So tooling is very important, but very tactical. So I don't want to say it's not interesting, but it's a consumable, digestible problem. The real challenge is actually data integration services, data movement: turning Hadoop into a fully partnered and complementary add-on to all the data management tools that people already have, and turning it into what Shaun Connolly, our head of strategy, called a data refinery in yesterday's keynote. That refinery has to emerge. Because it's something no one has, it's value-add, it gets rid of the debate of Hadoop versus X, Y, Z, and it makes it Hadoop plus everything else. And that's the big challenge: making connectors, making friends of every other complementary and related data management technology around us. Yeah, and that's another thing. I mean, you mentioned DevOps, right? So we follow DevOps. In fact, we launched a section called DevOps Angle, dedicated to DevOps. But DevOps is a moving train, right? For many reasons. One is the one you mentioned: in the internet world, at web scale, yeah, dev and ops, they kind of know each other. But outside that, they hate each other and don't talk to each other, or they love each other but don't talk to each other. So pick your scenario. But the point is, dev guys should not be playing with operational stuff, because ops can't go down. That's kind of agreed, too. So you've got to abstract away the complexities. So that's clear. So will DevOps be more of a data function? Talk more about your vision around this. A developer who needs to manage data and data integration, because maybe that's where DevOps will sit. And this Hadoop administrator, talk about your vision there. Yeah, Hadoop cluster administration needs to just be straight ops. You know, Hadoop cannot introduce, and does not today introduce, new paradigms into the data center.
So it uses standard networking, standard commodity off-the-shelf machines, standard operating systems, and standard services, in the case of Hortonworks, to make it all highly available. So I'm really focused, like you're saying, on Hadoop data administration. And that's where DevOps is sitting and will sit for the next two years, in my assessment. So cluster is ops, data is dev. Yeah, yeah, but the challenge you have is that in order to interact with your data in this clustered, scale-out Hadoop world, you need workflow, you need scheduling. Like, I want to basically compose a job that's made up of joblets, if you will. These guys wrote jobs in this department and those guys wrote jobs in that department, and I can use all their work, join it all together, and create something on top of them that's even more value-add to the business. So a simple example is a dashboard. I take every department, they build all the telemetry they need for intelligence through Hadoop, and then I wrap it all together as an enterprise-level dashboard. How do I do that in a way that's sustainable and maintainable, where I know my artifacts are there at 6 a.m. when I try to gen up my dashboard view? And so scheduling is important, workflow is important, ETL is important, ingest and output and streaming are important, and CEP becomes relevant all over again. Right, CEP was the fad about five, six years ago. CEP fills a gap in Hadoop just like it did in the relational world. All of that together, that is the domain of devs today that has to become tool-enabled over the next two years. So it's not this DevOps thing; it's: ops has clear artifacts, the schedules are maintainable without a developer coming in, the sub-jobs are maintainable, compartmentalizable, securable without dev coming back in.
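That "joblets" idea can be sketched as a tiny dependency scheduler. All the job names here are hypothetical, and in the real Hadoop stack this is the role a workflow engine like Oozie plays: each department registers its job and its inputs, and a topological sort guarantees the dashboard only runs after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical joblets: each department's telemetry job, plus the
# enterprise dashboard that depends on all of them (names illustrative).
jobs = {
    "sales_telemetry": set(),
    "marketing_telemetry": set(),
    "ops_telemetry": set(),
    "enterprise_dashboard": {
        "sales_telemetry", "marketing_telemetry", "ops_telemetry",
    },
}

def run_order(dag):
    """Return an execution order where every job runs after its inputs."""
    return list(TopologicalSorter(dag).static_order())

order = run_order(jobs)
# The dashboard must come last, after all the department joblets,
# so its artifacts are guaranteed to exist by the 6 a.m. run.
assert order[-1] == "enterprise_dashboard"
print(order)
```

The point of the sketch is maintainability: ops can reorder, reschedule, or add a department's joblet by editing the graph, without a developer touching any job's internals.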
So you basically have to say, I need to start up-leveling the ops tooling out of cluster management into schedules and work artifacts, and then all the way up to, at some point, devs writing widgets that analysts and scientists then use to assemble work with ops people. Let's talk about something we're seeing with this Hortonworks Data Platform release; you guys put out interesting features. Obviously, HCatalog's got our interest, that's a nice feature, you need to have the metadata. Monitoring's interesting, you have the new project out there with the monitoring. The next evolution is analytics, right? So you get the monitoring right, you see more of that. We have a lot of people pushing analytics and visualization at a per-seat cost. Are there areas like that where you look at the community and you go, guys, we're going to have to come in and just lay this all down for free in the platform? Because there's demand for analytics and visualization where you're going to have to come in and do the right thing and put it into the platform, because there are people who want to do visualization but don't want to pay the tax. And this is no offense to the guys out there doing the visualization; this is just the reality, that the developer community wants that. So how do you deal with that? Tell us something, give us some good news. Oh, yeah, yeah. We've thought about it, and I think there is good news there. You could do a good-better-best kind of approach. So we're going to have to pull a lot of stuff that's causing swirl out there. Visualization's a great example, where it's a free-for-all. Visualization guys are coming into the platform and saying, I'm going to integrate Hadoop into my visualization stack and hand you a turnkey product. And lo and behold, the Hadoop that I designed around is not the Hadoop you guys all deployed out in the community.
Something's different every time, be it versions on a simple level, or, on a complex level, just the usage models, and the best practices aren't there when they embed a set of assumptions around how Hadoop is consumed. So net-net, basically, we know we need things like HCatalog, we know we need things like ETL tools and data integration services on top of Hadoop, and we will do the work there, JDBC, ODBC connectors. But you're right, it will not be sufficient, because the visualization guys need to come to us and say, let's design a set of standards so that the visualization guys can build everything they want to build. And the only way I can see forward here is that we basically build a free version where everything is pluggable and we define a set of standards. Let the market innovate, let the market innovate. You're not really going to kill anyone; just do the best, move that good-enough bar, and let the innovation happen. You want Talend and Informatica to continue to play their differentiated ETL roles. You want Pentaho, and Actuate with BIRT Exchange, and Jaspersoft and BO to play their differentiated analytics roles, and visualization tools, TIBCO Spotfire, you want it all to work on Hadoop clusters. The way you do that is you say, we need to define, I don't want to say there needs to exist a standards body like the Java Community Process, the JCP, maybe there's a Hadoop community process, but there needs to be a vendor process. Let's all talk to customers, let's all talk about the APIs you guys need, let's rationalize down to a set of standard interfaces, and we will build a reference implementation out in the open, for free, that is good enough for the free user, but everyone else will bolt on to it. I know Jeff wants to ask a question; I could take the whole interview geeking out here. My last question, Jeff, and then you can take over for the last two minutes. All the talk about analytics is great, because that's a big part of the business side of it, as well as the infrastructure.
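The "standard interfaces plus a free reference implementation" model described above might look something like this (every name here is hypothetical, not a real Hadoop API): the platform publishes one interface, ships a good-enough default for free, and vendors bolt their differentiated products onto the same contract.

```python
from abc import ABC, abstractmethod

class Visualizer(ABC):
    """Hypothetical standard interface a platform could publish for viz vendors."""
    @abstractmethod
    def render(self, rows):
        ...

class ReferenceVisualizer(Visualizer):
    """The good-enough free reference implementation: plain text bars."""
    def render(self, rows):
        return "\n".join(f"{name}: {'#' * count}" for name, count in rows)

class VendorVisualizer(Visualizer):
    """A vendor's differentiated plugin bolts onto the same interface."""
    def render(self, rows):
        return "<fancy-chart>" + ",".join(name for name, _ in rows) + "</fancy-chart>"

def dashboard(viz: Visualizer, rows):
    # The platform codes against the standard, not against any one vendor.
    return viz.render(rows)

rows = [("hive", 3), ("pig", 2)]
print(dashboard(ReferenceVisualizer(), rows))
print(dashboard(VendorVisualizer(), rows))
```

The design choice is the one Ari describes: the free version keeps the market honest, while the shared interface means no vendor has to design around a Hadoop that differs from the one actually deployed.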
We're not going to have time to talk about the VMware side and all that infrastructure action, but everyone wants to put the cores, the processor cores, in at where the data is. That's a generally accepted philosophy we've been hearing over and over again: bring the processing power to where the data is, not the other way around. You mentioned people who've built around Hadoop are bringing it in here, changing the versions. How do you scale that product so that the analytics can sit where the data is? Is that going to be something you guys are going to have to tweak a bit? Or is it good where it is? There's work to be done. Part of why Hadoop 2 is alpha is because it needs to be tested more and finished. Part of why Hadoop 2 is alpha is we haven't run the use cases we envision. So one example is YARN: MapReduce 2 is supposed to be capable of generic resource management, generic application management, not just Hive, Pig, MapReduce, and HBase. So when you say you're capable of generic app management, that's one thing, but let's load some apps up into it and see. How is someone like SAS Institute, or rmr, or things like the stats packages from those guys, how are all of those, even Cassandra, going to run in a Hadoop cluster? Let's create some users, some vendor customers, off the technology, and then we'll actually harden the APIs and make them consumable. But one thing that's clear is people want the data out of HBase, for example, or out of Hadoop, fast. And they want a lot of it. They don't want to just trickle it in and out; they want all of it. And it's also clear that they're confused, like: Hadoop can do this, compute-to-the-data inside the cluster, and it can push bits around very intelligently inside the cluster. But what happens when I leave the Hadoop cluster? Am I sending through a straw what is a firehose of data, and back to my original problem? Yeah. Okay, Jeff, go ahead. Thank you, John. I've waited the whole interview and I can go now.
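That YARN point, generic resource management rather than MapReduce-only, can be sketched as a toy model (this is not the YARN API; the names are illustrative): one resource manager hands out containers from a shared pool to any application type that asks, whether it's a MapReduce job, HBase, or something like Cassandra.

```python
class ResourceManager:
    """Toy model of generic resource management: any app type can request
    containers from one shared cluster pool (not the real YARN API)."""
    def __init__(self, total_containers):
        self.free = total_containers
        self.allocations = {}

    def request(self, app, n):
        granted = min(n, self.free)  # grant only what the pool can spare
        self.free -= granted
        self.allocations[app] = self.allocations.get(app, 0) + granted
        return granted

    def release(self, app):
        # App finishes; its containers return to the shared pool.
        self.free += self.allocations.pop(app, 0)

rm = ResourceManager(total_containers=10)
assert rm.request("mapreduce-wordcount", 6) == 6
assert rm.request("hbase-region-scan", 6) == 4  # only 4 containers left
rm.release("mapreduce-wordcount")
assert rm.free == 6
```

The hardening work Ari describes is exactly what this glosses over: real schedulers have to handle queues, preemption, locality, and failure, and that only gets proven by loading real apps onto it.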
I wanted to follow up on a question we asked before about getting the vendors, the analytics and visualization guys, to come to you and come to the community and start really innovating. So you mentioned Pentaho and Jaspersoft, and those guys have that kind of open source tradition or legacy. But what about the more traditional, proprietary-type players? Is that conversation starting at all? And what would you say to the business objects guys, SAP BusinessObjects, and even Oracle and IBM and others? Is it critical for them to come to the table if they want to stay relevant in a big data world? I don't want to say whether people will or won't be relevant. I think that it's critical to come to the Hadoop community. True open source communities do not come to enterprise proprietary vendors. It's just, I can't think of a time when open source has come to the door of enterprise and said, let me in. Enterprise always has to be the lead. It's a relationship, it's a dance, there are norms. And if you tell us you're willing to spend money in our community, money is really a proxy for time, or time is a proxy for money, but spend time with us, talk about your systems on our platforms and not compete with us, we're there, we're ready. We are expecting to have those conversations. Most of the big names you rattled off are in the game already. They are talking to the community through one open source vendor or another, and they're working with us, and we have their needs in-house, and it's just early days in trying to piece it all together. But conceptually, it's the same problem whether you're Pentaho or you're SAP BusinessObjects. You've got to come to the community and say, what can I do for you guys? I have all this technology, plug it into me. And you do that through the shepherds, through the Hortonworks and the Clouderas of the world. And we're certainly spending our time in the community here, covering it like a blanket for two days.
We're going to be winding down and getting a couple more interviews in, Jeff. The final question I have for you is on the roadmap, because you have an exciting job. You get to work on the product side, but because your strategy is 100% technology innovation through open source, you also have a lot of big biz dev deals, as mentioned, that folks are working on, and you have to go out to the customers. You've got to manage the product management side of it, which is delivering value to your customers, which is the community and partners, right? So you get to see a little bit of both sides. So what's your vision for the big guys, as we like to call them? They're backing their M&A trucks up to the marketplace. There's a slew of entrepreneurs here, people handing cards around, exchanging information, because, like your experience back with Eric giving that talk, this is the beginning of a major ecosystem, right? So there are going to be actors and successes and some failures. What's your vision for the next three years in this space? What do you see happening in the landscape out there? I see a lot of noise in the startup world that's going to kind of calm down, and consolidation will occur. So the well-funded startups with more of a platform play will swallow up some of the smaller niche guys and get to market, as we discussed earlier. You know, the reference implementation of visualization, for example: I could buy that, I could build it, I could ask the community to build it for me. But some of those value-added services are going to consolidate up into the bigger startups, and then you're going to see the bigger startups get prematurely taken out unless we are careful. So the big guys, you know, you always talk, when you fund a company, about what's your escape plan. How do you not get on the radar too early and get sucked up before you deliver all the value to the investors? And the answer in the Hadoop market is everyone's got a spotlight shining here. Sauron's eye is focused here.
Whether you consider the big guys evil, or you consider it like something benevolent from above, you know, touching down on us, everyone's watching us. So the big startups have to move fast. We have to innovate, we have to bring so many players together that we create a defensible space, and my vision is we do that together. Really, the other guys and us work together to build a defensible territory for ourselves, so that the biggest guys in the room in data management can't come and pick us up, because the community won't want them to, because we've got too many hands in too many pots and we're delivering too much value to the community, and we don't need to be taken out to build a database company. And at the same time, you know, we are still helpful and supportive of everyone, from the biggest guys in their benevolent role to the smallest guys adding a feature. All right, that's a great, candid vision. I really appreciate that. I agree with it, I like it. I think: don't sell out just because of the money, because there's a bigger picture here. You know, I've been saying on theCUBE now for months, and just recently on this past summer tour, that to me the big data ecosystem that's building right now is like the PC revolution. I mean, imagine if the PC industry, just getting started, had been taken out by the minicomputer guys. We wouldn't have the software industry we had. We wouldn't have client-server. So I think this industry, you guys in particular and Cloudera, have an opportunity to build an industry, not be an extension of cloud or an extension of data warehouses and databases. Because it's a different market. I mean, the revolution is about productivity across the board, and Geoffrey Moore was great. I don't know if you saw his speech; he's talking about how, from the military to retail, every industry is affected by big data, and this is not just some tech thing, just like the PC affected every industry and put productivity in the hands of the consumer.
Computing power did it once; big data is going to change things again. So I think that vision, if you guys can hold the line and reinvest in that mission and get the ecosystem to draft with you on that, I think it'll be a home run for everyone. And what that means is wealth creation, not just for the sake of getting rich, but for the sake of creating societal value around the world. So that's our mission. That's our guest, Chief Product Officer Ari Zilka here at Hortonworks. Congratulations, great vision, very inspiring. Thanks for that talk. We'll be right back with Danny Ryan from Riot Games to wrap up the show, and then Jeff and I will do a wrap-up. So we'll be right back after this short break.