Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016, brought to you by Hortonworks. Here's your host, George Gilbert.
Welcome back, this is George Gilbert. We're at Hadoop Summit 2016. We're in the San Jose Convention Center and we have with us Mark Herring, VP Developer Evangelism at Hortonworks. Welcome, Mark.
Thank you so much, George.
So, we were talking a few minutes before we got going. Hadoop as an ecosystem has been showing incredible innovation and expansion. But the downside is that can make it hard to target for a developer. What can I choose as a developer that I can assume everyone would have? How has that answer evolved over the last year?
So, I think what we've seen, just like we saw in the open source realm on the platform side, the whole migration to open source was to allow innovation to happen everywhere, not just in one place. And we've tried to do the same thing now in Hortonworks on the community side, or on the developer side, going, so, Mr. Developer, where are you? So, we've created this thing called community.hortonworks.com. It's a set of great technical articles and Q&A type things to allow people on this evolved platform to go and ask questions like, I'm new and I'm trying to do an analytic workload, what should I use? And you might then try and solve that at the Spark level and down at the [unclear], and there's sort of articles and Q&A down there. Or you might be starting more at the Zeppelin notebook level. It depends where you're coming in from. And so our view of the world is, start wherever you want to and ask the community for help, right? So, it's more of a collaborative way than Hortonworks saying this is the way to do it, or that is the way to do it. So, we're trying to embrace that whole community style with our developers, and with developers at large, to go and say, this is how you embrace the whole ecosystem.
So, it sounds almost like you want to help the community be self-organizing around different solution platforms.
Yeah, platforms is maybe too strong a word; it would maybe be solution areas. So, data science and analytics would be a good area. It's not necessarily that there's a data science and analytics platform, but that's sort of a particular audience type, or set of developers or data scientists, that will go and live there. Just like we have sort of another area. We call them tracks inside the community. And you might have something there around data ingestion and streaming. Now, inside of there, you might be using different technologies, or, don't worry, you might not even know which technology to start with, but you start there going, hey, I've got this streaming problem. How do I go and solve the streaming problem? And the answer might be, hey, use NiFi. It might be to use a different technology. But, you know, we're involved in the community connection. Obviously, our developers are on there, our SEs are on there, but it's the broader community that's really then answering those questions. And because we think very much like open source did to the platform, almost democratizing the platform, we want to democratize some of that knowledge so you as developers choose.
So, okay, we're taking a developer perspective, but the developer is also the one who translates the infrastructure value into business value. So what are the early applications? We asked [unclear]. But I'd like to get your opinion, since you're working with the developers every day. ETL offload from the data warehouse was sort of the first universal application. What are some of the most popular ones you're seeing now?
There's still a lot of ETL offload, right? There's still a lot. You know, I'd sort of say we're still in the phase of, let me capture the data. Let me get the data in there and I'll analyze it.
But we're seeing a lot more of, you know, how do I bring data in motion into data at rest? It's not necessarily just ETL from my relational database system, but how do I start capturing all this signal out there and putting it into place, so I can then look through the noise and find the signal. But as I said, the most common thing now that we're seeing a lot of questions on is, how do I find those signals, right? So whether it's using a SQL way of looking it up, or looking at how it translates to get that information, that's where I see a lot of my developers, or developers in the community, asking a lot of questions: how am I going to find that stuff? So we're at the phase of, I've captured all this stuff, but now how do I go and look for that stuff inside of there?
So would it be fair to say that this community helps developers, corporate developers, to pick the right set of building blocks to solve problems? But commercial developers need assumptions about a volume, sort of, maybe platform is the wrong word, but a volume collection of services that are in common. So does this Hortonworks community help the corporate developer, the commercial developer, or both?
Both, right? I think the whole thing is we have all different types sitting out there, right? So we have the deep data scientist who's publishing different Zeppelin notebooks up on the community, going, here's my notebook as I've done it. You've got the hardcore Hadoop core developer telling you about how they configured the different clusters and the different nodes. And we have a great set of articles that are also defining what these look like.
I think you can engage with this at any area you like, and if someone hasn't answered your question there, or you don't see an answer, ask your own question: how do I do this? Having said that, we do see people starting a lot in the Hortonworks sandbox, right? So that's a collection of stuff, always starting with basically the NiFi experience, which has got a set of preset services. We've got, let me go and start with that. But what I find with a lot of developers is they have a particular problem in mind and they're trying to look through the different solutions, with help, to try and go back to the business and say, okay, I've found the technology that's going to go and solve this problem for the business. And then once they've gone through that, then it's maybe when operations come into play, and we see a lot of DevOps people going, how do I worry about scale? How do I understand replication? How do I understand this whole hybrid cloud thing? What should run here and there? And again, they're asking the questions of the community. We're part of the community. We have ideas on that. But I think the whole theme from us, and it's very much like why we embrace open source, is that this knowledge base is infinite if we appeal to the community. And so yes, although we're part of it, we're not trying to be prescriptively part of it.
Okay, interesting. So someone who's taking the first steps towards IoT applications might say, okay, I have an ingest need, data in motion. I might start with Kafka, but that might really just be useful for data coming in from a single site. Or I might have NiFi, which is sort of Kafka grown up for the whole world. And then I might feed Spark.
Exactly.
But in that case, I was going to say, these aren't projects that are mentored in a sense by Hortonworks, but they are in the case of NiFi. So it doesn't really matter. You're not trying to be prescriptive with Hortonworks projects.
Absolutely not.
Do you want to support whatever combination of services?
We want to allow for that conversation to happen, right? So if someone's using Kafka and then going, what's the best way of doing it? We might have one of our engineers, or an engineer in the community, going, hey, this is the way that I've done my Kafka implementation. For instance, on the community, we have users from Cloudera, from MapR, asking questions out there as well. Why? Because it's the wealth of information. I think we've got to get rid of these whole silos going, this is the Hortonworks domain, right? And so we spend a lot of time within this community to go and say, how do we expand it to allow for the Kafka discussion to happen with the NiFi discussion, because there's good in all of these things, right? And so, yes, when we get down to, hey, how do we want to implement it, and that type of thing, [unclear], maybe they're looking at it and then you're paying for our expertise. That's not the goal of the community. The goal is, yes, expertise is there, but, you know, obviously if you're bringing in professional services, we obviously will use more of our technologies, but that's not the community goal.
Well, let me ask one last question before we wrap up. As we move more to the cloud, customers are not junking, obviously, what they've started on-prem, but maybe there are new workloads, and there's got to be some movement of data or some inter-operation of the applications. How does that developer evangelism discussion change in the cloud?
In other words, are there services, whether storage or other elements, that are going to be more appropriate for the cloud?
Well, I think there's obviously the big challenge in the cloud, and you saw some of the keynote today, right, which is that you have different cloud providers, and their goal as a cloud provider is to lock you into their particular cloud. I mean, it's a pretty simple concept, right? So what we've seen is a lot of discussions around, if I'm deploying for the cloud, what do I need to worry about if I'm deploying into AWS versus Azure versus Google Cloud? What are the gotchas, and how would I move these things together? So yes, there's different services that you might be thinking of. A lot of developers I see out in our community are more looking just at the base services: what am I going to use for HDFS in terms of the storage, or the storage side of the world, and what am I going to use for data ingest? And there's a lot of concern, or debate, I would say, is maybe better than concern, on, hey, if I'm going to be targeting this particular cloud instance, what do I need to worry about in case I ever want to move it? There are still those thoughts out there, and there's healthy communication happening. I wouldn't say there's a definite answer.
So it sounds like that age-old cross-platform tension between how do I take advantage of one platform uniquely versus how do I balance the need for portability?
Maybe part of that age-old question is, just be aware of what you're doing. So you've made these choices, therefore you know you've done it, right? The last thing you want to do is go, I didn't know I made the choice.
And so it's trying to give people the choice, and if they haven't thought about it, hopefully by posting a question on community.hortonworks.com, they have other people with a lot more knowledge surface issues: have you thought about this? What about this? So there's again a lot of good checklists: checklists for thinking about the cloud, checklists for doing data ingest, right? And so again, I'd love to say, well, the content comes from us. It absolutely doesn't, right? I mean, it's a community effort.
All right, with that we're going to have to leave it there until part two at the next Hadoop Summit. This is George Gilbert with Mark Herring, VP Developer Evangelism and Community at Hortonworks, and we'll be back after this short break. Thanks.