Exclusive, extensive, continuous coverage of Hadoop Summit 2012. I'm John Furrier with SiliconANGLE.com. This is theCUBE, where we extract the signal from the noise and share that with you. My co-host for this event is Jeff Kelly, lead analyst at Wikibon.org, the best big data analyst on the planet, who put out the first market size report. An influencer. Jeff, welcome back. Thanks, I'm not going to turn down the flattery, I appreciate it. Of course, yeah, of course. It's my editorial. But more importantly, we have John Kreisa, the VP of Marketing for Hortonworks. John, welcome to theCUBE again. Thanks very much. Thank you. This is your event, and this is the first event that you guys really had your hands on. Last year it was Yahoo; Hortonworks was spun out of Yahoo. That's right. There was very specific orchestration on your part with this event: not grandstanding, not Hortonworks-specific, yep, but being very trustworthy and a good citizen. Talk about the event. You guys put a lot of money into it and organized it for the community, and kudos for that. But explain the event to the folks; it's not just Hortonworks, Yahoo's still involved. Sure, absolutely. Thanks, John, and thanks, Jeff. Very good to be back on theCUBE here again with you guys. Always a well-run activity. So, absolutely, we tried very hard to make sure that Hadoop Summit was a community-driven event, and we did a couple of different things to ensure that. In terms of building the content, we wanted to make sure we embraced the community to drive the content that the community wanted to see. So, selected by the community, for the community, really, is the way we were going. There were a couple of different things we did. We set up panels to select the sessions for each of the tracks. We have seven different tracks here at Summit, so seven different independent panels were selected, and those panels chose the content.
So as the submissions came in, from an open call for papers, we had 267 submissions for the 84 available slots. Each of the committees went through and selected the content that they most wanted to see, so they were able to pick 12 sessions per track. In addition to that, we also ran a community voting process where the community could see all of the sessions and vote for any of the sessions that were available. And we guaranteed that each of the top seven community vote-getters got a slot in the conference. So the community really did choose the content, both providing the top content directly and, through the panels, selecting all of the content for the sessions. So I want to know how you stop people from gaming it, because you've got the alpha geeks out there. It's funny, yeah. Was there any chance of the voting being gamed? We did watch for that, and we considered whether we needed to do something to prevent it, because alpha geeks will definitely try. We definitely saw people get very clever about promoting their sessions, which in the end was the kind of thing we wanted, quite frankly. Promotion is one thing, but manipulation is another, right? I don't think there was really a way to overly game it, because you had to register in order to vote, and each registered voter got a certain number of votes. So you couldn't automate it; it really did need to be a human. No matter what, it's human in the loop. You couldn't really game it, but we had a great process, and all of the top community vote-getters were very clever. And there were a lot of people you had to send away. So talk about some of the sessions that were approved, and the focus categories; there was a queue of people still wanting to submit. Yeah, so I mean, we were lucky to have way more sessions.
You know, like most technology conferences nowadays, we had way more sessions than we could possibly select, and that's always a tough process, because the line between the things that didn't quite make it and the things that did is really, really fine. We took from the chairs of each of the committees their stack-ranked selection of what to put in. And we really did divide the sessions evenly across each of the tracks, whether it was the technology track, the Future of Hadoop track, or the Hadoop in Use track. So we've got all the different tracks that people were selecting for, and we took the track chairs' recommendations; we did not deviate from what the community and the community leaders said they wanted to see. That's really what we went with. So, in terms of the sessions that ended up winning out, what does that say about the attendees here and the breakdown of developers versus the more business side versus, say, DBAs looking at how they're going to get involved with big data? That's a great question. One thing we did do to help the selection process was in how we nominated the track chairs. We went to these folks and asked them to be track chairs, and we tried to select somebody who had knowledge in the particular domain area. So if the track was Hadoop and operations, we wanted to find somebody who came from that background, such that they were knowledgeable about the kind of content they were trying to select, so that the content in the track that was built would be relevant for those kinds of people.
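The selection mechanics described above, committee stack ranking plus guaranteed slots for the top seven community vote-getters, could be sketched roughly like this. The function name and data shapes are hypothetical, purely for illustration; this is not Hortonworks' actual tooling:

```python
def select_sessions(submissions, committee_rank, community_votes,
                    slots=84, guaranteed_top_voted=7):
    """Pick conference sessions: the top community vote-getters are
    guaranteed a slot; the rest are filled from the committee's
    stack-ranked list. `submissions` is a list of session ids,
    `committee_rank` maps id -> rank (lower is better),
    `community_votes` maps id -> vote count."""
    # Guarantee slots for the top community vote-getters.
    by_votes = sorted(submissions, key=lambda s: community_votes[s],
                      reverse=True)
    selected = list(by_votes[:guaranteed_top_voted])

    # Fill the remaining slots from the committee's stack ranking.
    by_rank = sorted(submissions, key=lambda s: committee_rank[s])
    for s in by_rank:
        if len(selected) >= slots:
            break
        if s not in selected:
            selected.append(s)
    return selected
```

With 267 submissions and 84 slots, the committee ranking fills whatever the guaranteed community picks leave over, which matches the 12-sessions-per-track figure across seven tracks.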
Consequently, if you look at the different pieces of content, whether it's the Future of Hadoop or Hadoop in the Enterprise, each one of those tracks lines up very well to the audience it's going to attract. As a result, we've got a very broad audience here, because we go all the way from the future of Hadoop and deep technical details about operating Hadoop, to Hadoop in the enterprise and reference architectures for the ecosystem. It's probably the broadest swath of the enterprise and the ecosystem that's been addressed at any conference. Yeah, so I love the analytics and BI side, so talk about that track a little bit, because when we think about big data really delivering business value, it really happens at that application and analytics layer. So talk a little bit about that track and what the focus is there. So the analytics and BI track, which is a really great track with a lot of really good content in it, is an important area, just like you said, Jeff, because that is, in some respects, where the rubber meets the road in terms of big data: people getting the value out from doing the analysis. There's a nice blend of vendors in there, everything from vendors who are working directly within the ecosystem and have purpose-built applications for doing analytics, to more traditional enterprise software vendors who are connecting to Hadoop and providing value by attaching to Hadoop as a data source for their systems. So it's a nicely mixed set of vendors and ecosystem representation presenting in that track. So talk about the business model. We had your CEO on earlier; he was great. Rob's a great guy. We had some good questions, easy questions, and we had some uncomfortable questions around what everyone wants to know about, which is the community. And there's no real negative vibe right now within Apache.
I mean, everyone pretty much agrees that it's growing like crazy, and there's plenty of beachhead for everybody; the pool's big enough for everybody. And that whole silly debate between Cloudera and Hortonworks is over, right? It's pretty clear, and everyone you talk to says it's fine. But you've always had that comparison, because Cloudera was the first one, venture-backed. You guys are venture-backed too, both with Tier 1 VCs. But what's going on around you guys is a bigger ecosystem. So how do you look at the market now? Because you have to look at new dynamics. You've got entrants coming in like EMC, and the big guys like IBM, and they're traditionally slower than the startups. At the same time, you've got to build your rocket ship, and it's all going on at once, right? And Arun was talking about some of the high availability features, HCatalog, things you guys announced. So how do you as the VP of Marketing balance that? Because the solutions are really the conversation, not the inner workings, which has been good. How do you balance all this? Because you're constructing the venture and the tech at the same time, with the community, 100% open source, and then you've got business relationships to build in the marketplace. So how do you do that? Well, I have a long history in traditional enterprise marketing, and then I've had four-plus years of open source marketing background. So I think I have a nice blend of understanding what the enterprise wants in the B2B market, and then the open source market and how that works. So I tend to run a blended model: making sure that we are getting the awareness out and evangelizing the technology, as it still needs in the market, and bringing visibility to Hortonworks.
And then also, from a company strategy standpoint, we're really targeting, if you will, the enterprises that are on the other side of the chasm, right? We want the early majority of enterprises to feel comfortable with the technology, and we realize that in order to do that, the company has to be completely aligned around making Apache Hadoop easy to use and easy to consume. You can see that in the release we just had of the Hortonworks Data Platform: the kinds of components we put in there are really designed to make it so that enterprises can consume it. So from my standpoint, in terms of the messaging, I want to make sure that we are talking to those folks out there who are thinking about Hadoop, not quite sure what they want to do with it, but looking for a way to enter into it. And that also involves working with the ecosystem, because we believe the ecosystem is a key part of what's going to make Apache Hadoop a successful open source project. Yeah, we've got some good feedback. We've heard from some folks here in theCUBE, and also through back channels, that the enterprise-ready messaging you guys have (and it's not just messaging, it's the actual product) is not just good for you guys, it's good for the app developers, right? So there's plenty of white space to work on. And as Doug Cutting says, there's a long list of things to do, and everyone goes, yeah, we've got to do them, and it's a good list. It's not little things; there are big things on it. That's substantial, as real time becomes a key thing. So that's good. But enterprise-ready is hard, particularly as real time comes in. So how are you guys onboarding those developers? The developers are key. What's your strategy for onboarding developers? Well, the first thing is making it easy to use and consume, right?
Because we just want to get the tech into their hands so they can start to use it. We want to make sure there's a broad set of open, industry-standard APIs inside the platform: for example, RESTful APIs into the file system, RESTful APIs into the metadata-sharing functionality in HCatalog, and really at every layer of the stack, so that developers are comfortable with common, standard APIs and can quickly build applications. And if I'm working on an existing application, I can easily see how to tie it into the infrastructure and embrace it at a much deeper level, which means I'm going to be able to provide a much better end-user experience, right? Because at some point I can abstract away the complexity of this Hadoop thing running underneath and still continue to deliver a good user experience, whether an analytic experience or just a general application experience. Well, you mentioned that obviously you've got to make the technology easy to use and consume, and that's the tools and technology perspective. But there's also the skills gap and training the users. I know Hortonworks is doing some good work there in terms of offering training, but tell us a little bit about the interplay of the two. It seems to me you have to meet in the middle: make the tools easier to use, but also build up the skill set. Yeah, you're right, Jeff. So we do offer training. We have Hortonworks University, which offers developer and administrator training and is working on a whole bunch of new classes. The state of the market is that there are so many people just at that basic blocking-and-tackling level, I'll call it, right? They're just trying to figure out the basics: how do I do a MapReduce job? How do I write some simple Pig and get things going?
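The RESTful file-system API mentioned above corresponds to WebHDFS, where a directory listing is a plain HTTP GET. A minimal sketch of building the request URL and parsing the JSON response follows; the host name is an assumption for illustration, and the sample response is trimmed to the fields shown:

```python
import json

def webhdfs_list_url(host, path, port=50070):
    """Build a WebHDFS LISTSTATUS URL (WebHDFS is the REST API to HDFS)."""
    return f"http://{host}:{port}/webhdfs/v1{path}?op=LISTSTATUS"

def parse_list_status(body):
    """Extract (pathSuffix, type) pairs from a LISTSTATUS JSON response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]

# Example: a LISTSTATUS response has this shape (trimmed for illustration).
sample = '{"FileStatuses": {"FileStatus": [{"pathSuffix": "data.csv", "type": "FILE"}]}}'
```

In a live cluster you would fetch `webhdfs_list_url("namenode.example.com", "/user/demo")` with any HTTP client; no Hadoop-specific library is required, which is the point being made about open, standard APIs.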
And then it's really crawl, then walk, then run. Certainly the classes and the education strategy we're trying to employ are meant to move people along that path, so that they can get to be fully operationalized, fully capable of developing applications, and can take the infrastructure and build new solutions, to your point about solutions meeting needs. I mean, we need to see more purpose-built apps like the Tresatas, the Karmaspheres, and the Datameers of the world. We need to see more and more of those kinds of applications and tools addressing specific targets in the market. So, in terms of meeting halfway between making the tools easier to use and the training, maybe we're not there yet. How much is that holding back your business? Do you find that there's a whole area out there where there could be potential business, but you just can't tackle it right now because the two haven't met in the middle? Yeah, so I think there is definitely a need for skills. I don't know that it's holding back the business per se; I would say the two are growing at the same rate. And there is certainly a lot of need for education. If you just look at the indeed.com job graph, everybody's seen how much demand there is for Hadoop skills in terms of jobs. And that absolutely has to be leading, because we need people out there with the skills in order to keep the tech moving forward.
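For readers at the blocking-and-tackling stage described above, the classic first MapReduce job is a word count. Here is a minimal sketch in Python that simulates the map, shuffle/sort, and reduce phases in memory (in a real Hadoop Streaming job the mapper and reducer would read lines on stdin; this simulation is illustrative, not Hortonworks training material):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for each word in a line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts for one word."""
    return (word, sum(counts))

def word_count(lines):
    """Simulate the MapReduce flow: map, shuffle/sort by key, reduce."""
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=itemgetter(0))  # stands in for the framework's shuffle/sort
    return [reducer(k, (c for _, c in group))
            for k, group in groupby(pairs, key=itemgetter(0))]
```

The same computation is a few lines of Pig Latin (GROUP and COUNT over tokenized input), which is why it is the standard "get things going" exercise.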
Now, for us, it's a dual-pronged strategy, because the more you can integrate with traditional vendors, and you see the Teradatas of the world and others integrating deeply with the technology and able to abstract away that technology gap, the more you get a pull market. If I'm already comfortable with running a Teradata warehouse and managing that, and the Hadoop infrastructure runs as a plug-in or subservient to it and is abstracted away, then I don't need as much new skills training, right? We can't train enough data scientists to make the Hadoop market function. We have to enable the people with existing skills to leverage the technology with the skills they have, or maybe only slightly augmented skills. So, in terms of your partners, how do you balance the relationship with partners like Teradata and others, where in a lot of cases they're complementary approaches, but potentially down the road they could be not complementary? How do you balance those, or is that just too far into the future at this point? That horizon is way, way off for us. As you said with Doug, and as we know very well, in order to make Hadoop easy to use and easy to consume, there's a long, long list. And if we can make that platform work and function, if we can get the entire ecosystem to rally around Apache Hadoop, then the market will function, that tide will lift all boats, the addressable market is tremendous, and there's enough room for us and others. We know that if we make the market function, it'll be good for everybody, and that's really what we're staying focused on. So, one topic we haven't talked a lot about today is data privacy and security and things like that. I'd love to talk with you a little bit about that.
We've seen some high-profile data breaches out there in the press, and some situations involving big data; I believe it was the Target example, which a lot of people are familiar with, that was highlighted in The New York Times. So, do you come up against that a lot, in terms of potential customers saying, all right, this sounds great, we've got a lot of cool things we can do here with big data, but we've got to secure this? I can't risk a data breach in this day and age; the PR alone is going to kill us. Good question. I would say we don't come up against it a lot; I think it depends on the industry and the use case. If I'm running back-end infrastructure that is refining data and analyzing models for a recommendation engine, where the customer-facing front end is a separate piece, then I'm not going to see it as much. A lot of these Hadoop use cases are background infrastructure, and the infrastructure is driving them. Now, as it moves closer to the fore, you start to see more requests for data security and more requirements driving that. And again, it's industry-specific: if they're dealing with customer data and have to load customer data in, then you end up seeing it more in those use cases, and possibly in government, and in healthcare, where security around health records and the like is obviously very well known. So it depends on the use case and the kind of data they're loading. But I think the addressable market for Hadoop, in the state that it's in, is still tremendous. And security, like other things being addressed such as high availability, will get there, and it'll get there in a sort of just-in-time, as-developed way. John, my final question for you: share with the folks out there who aren't here, because you guys did sell out, there's huge demand.
What's the vibe? Because every time we do an event, whether it's Cloudera's original Hadoop World that you were involved in, which started small and has just massively grown since, it's sellout upon sellout. The HBase conference was a sellout, Strata was a sellout, Strata takes over Hadoop World, so you know it's a commercial business, it's for real. And when you have a real business, the suits come in, right? One of the complaints with Strata was too many suits and not enough tech. Explain to the folks out there the vibe here, and specifically the focus, and how you kept it business-y but not too business-y, you know what I'm saying? So part of that strategy was very deliberate: to broaden the audience that could come and appreciate Hadoop, appreciate the Hadoop Summit, and benefit from attending the conference. We wanted to keep the core roots. Hadoop Summit has always had the core roots of a developer conference, with lots of techie sessions going on. We wanted to maintain that but add the flavor of a business track, the Hadoop in Use and Hadoop in the Enterprise pieces. The way we helped guide that process was really just in defining those seven tracks and saying, look, here's the kind of content we want to have: stay heavy with the tech, make sure that is well represented, true to the core of what Hadoop Summit started as five years ago. And that was specifically what you heard from the community, saying let's keep it tech? Yeah, we know thy audience: a lot of the folks here want the tech tracks, they need to hear about the future of what's going on in the technology, they want to see code up on the slides, they really want to stay geeky. And then, hey, they also want to hear more about how it's being used, right? It's not so much business as use cases, right? That's more the path to market. Yeah, that's right. It's just validation that there's a market there.
Not so much business; it's being used for real workloads, to solve real problems and provide real value in the enterprise, with a technical context. That's what we've heard here. So, great event. Thanks for having us. Thank you. I know you arranged to have us bring theCUBE here and gave us the space to have this conversation with you guys. We do have some people walking behind us, and there's a party going on behind us, so hopefully we can continue to broadcast. John Kreisa, the VP of Marketing at Hortonworks. Congratulations. Hortonworks is on a tear, like Cloudera in their space. The startups are growing up in front of our eyes and maturing, as is the whole marketplace. So, congratulations. We'll be right back with our next guest here inside theCUBE, a SiliconANGLE.tv production, right after this short break. Thank you.