So Aaron, you're a tech geek at Cloudera. You're a whiz kid from Brown.

Yeah.

How many Brown engineers are here at Cloudera?

As of Monday, we have five full-timers and two interns at the moment. We're trying to hire more all the time.

So how many interns?

Two interns from Brown this summer, and a few more from other schools.

I'm John Furrier with SiliconANGLE.com, SiliconANGLE TV. We're here in the Cloudera office. My little mini studio hasn't been built out yet. It was a studio; we had to break it down for Dr. Ralph Kimball (not Richard Kimble, as I called him on Twitter, but I kid him). A data warehouse guru was in here, and you guys are attracting a lot of talent, Aaron. So tell us a little bit about how Cloudera is making it happen and what the big deal is here. I mean, people are smart here. It's mature. It's not the first time around for this company; this company has some senior execs. And there have been a lot of people in the market talking about first-time entrepreneurs doing their startups. I've been hearing from some folks in the trenches that there's frustration in startups out there, that there are a lot of first-time entrepreneurs, everyone wants to be the next Twitter, and there are some companies that are straddling failure. I was having that conversation with someone just today.
They said it wasn't like Cloudera, and I said this is not a first-time crew here at Cloudera. So share with the folks out there what you're seeing at Cloudera with the management team.

Sure. Well, one of the most attractive parts about working at Cloudera for me, and one of the reasons I really came here, was the incredibly experienced management team: Mike, Amr, Charles. They're at the top of this org, and they have all done this before. They've founded startups, grown startups, sold startups. Especially in contrast with the place where I worked previously, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would.

I mean, Mike Olson is a veteran. He's an advisor to startups, and I know he's been an investor in some. Amr was a PhD candidate who bolted out, did a startup, sold it to Yahoo, worked at Yahoo, then came back and finished his PhD at Stanford, under Mendel over there in the PhD program. He banged out his PhD, came back as an entrepreneur in residence at Accel Partners, and now does Cloudera. When did you join the company? Take us through who you are and when you joined Cloudera.
What's your background?

Sure. So I joined a little over a year ago; it was about 30 people at the time. I came from a small startup, an online music store in New York City, which doesn't really exist all that much anymore. I sort of followed my other colleagues from Brown who worked here, and was really sold by the management team and also by the tremendous market opportunity that Hadoop has right now. Cloudera was very much the first commercial player there, which is really a unique experience. And I think you've covered this pretty well before: we all around here believe that the market's only growing, and that we're going to see the Hadoop market and the big data market in general get bigger and bigger in the next few years.

So obviously computer science is all the rage, and I'm particularly proud of hanging out here. We've had conversations in the hallway: why are you tweeting about this and that? But SiliconANGLE's home is here, so I've had a chance to watch you and the other guys here grow from your other office. Was it San Mateo or San Bruno, somewhere in there?

We were originally in Burlingame. Then we relocated the headquarters to Palo Alto, and now we have a satellite up in San Francisco.

So you guys bolted out, and now you have a full-blown San Francisco office. You were busting out at the seams here in Palo Alto, with people commuting down, even building their Burning Man...

Yep. Oh, yeah, sure.

...huts here. They're constructing their homes here for Burning Man. So are you doing that in San Francisco? What's the vibe like in San Francisco? Tell us what's going on in San Francisco.

San Francisco is great. I live in San Francisco, as do a lot of us. About half the engineering team works up there now. We're running out of space there, certainly.

Already?

Oh, yeah.
Oh, yeah, we're hiring as fast as we absolutely can. So there's definitely not space to build the Burning Man huts there like there is down in Palo Alto. It's great up there.

What are you working on right now for a project? Computer science is one of the hot topics we've been covering on SiliconANGLE, taking more of a social angle. Social media has moved from this PR kind of check-in, Facebook fan page hype to a real-deal social marketplace where data (social data, gestural data, mobile data, geo data) is the center of the value proposition. You live that every day, so talk about your view on the computer science landscape around data and why it's a big deal.

Oh, sure. I think data is one of those fundamental things that can be mined for value across every industry. There's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing, et cetera. And that's the unique value proposition of stuff like Hadoop. Truly, we see interest from every sector that exists, which is great.

As for the products that I'm specifically working on right now, I primarily work on HDFS, which is the Hadoop Distributed File System that underlies pretty much all the other projects in the Hadoop ecosystem. I'm particularly working, with colleagues at Cloudera and at other companies, Yahoo and Facebook, on high availability for HDFS, which in some deployments is a serious concern. Hadoop is primarily a batch processing system, so there it's less of a concern than in others. But when you start talking about running HBase, which needs to be up all the time serving live traffic, having a highly available HDFS is a necessity, and we're looking forward to delivering that.

Talk about the criticism that HDFS has been having. Well, I shouldn't say criticism.
I mean, it's been a great product. HDFS is a core part of Hadoop.

Oh, yeah.

And you guys have been contributing to the standard at Apache. It's no secret to the folks out there that Cloudera leads that effort. But there are new companies out there trying a new approach, and they're saying they're doing it better. What are they saying, and what's really happening? There's some argument like, oh, we can do it better. Why are they doing that? Is it just to make money, to do a new venture? What's your opinion on this?

Yeah, sure. I think it's natural to want to go after parts of the core Hadoop system and say, Hadoop is a great ecosystem, but what if we just swapped out this part or swapped out that part? Couldn't we get some really easy gains? And sometimes that will be true. I have confidence that it simply will not be true in the very near future. One of the great benefits of Apache Hadoop being open source is that we have a huge worldwide network of developers, working at some of the best engineering organizations in the world, who are all collaborating on this stuff. I firmly believe that the collaborative open-source process produces the best software, and that's what Hadoop is at its very core.

What about the arguments saying, oh, I need to commercialize it differently for my install base, bolt on a little proprietary extension? Is that a legitimate argument? EMC might take that approach, or MapR is obviously trying to rewrite HDFS. Is it legitimate? Is there fighting going on in the standards?
Maybe that's a political question you might want to answer.

I mean, with Hadoop, there's no open standard for Hadoop. You can't say this is Hadoop compatible or anything like that. But what you can say is that this is Apache Hadoop, and so in that sense there's no fighting to be had there.

So Yahoo is struggling as a company, but there's strong Hadoop DNA at Yahoo.

Oh, certainly.

I talked with the founder of the startup Hortonworks. They just announced today that they have a new board member; the guy who's the CEO of Hortonworks is now on Gluster. I'm sorry, Gluster announced they have Rob from Benchmark on the board. So he's the CEO of Hortonworks, and one of my, not criticisms, but points about Hortonworks: this guy's an engineer who has never run a company before. He's no Mike Olson. Mike Olson has a lot of experience. So this guy comes in to run it, and he's out there in open source. Is that good for Yahoo and open source? They say they're going to continue to invest in Hadoop. They clearly are still using a lot of Hadoop.

Oh, certainly.

How is that changing Apache? Is that causing more consolidation? Is that causing more energy? What's your view on the whole Hortonworks thing?

Yahoo has been and will continue to be a huge contributor to Hadoop. I can't say it for sure, but I feel pretty confident that they have more data under management in Hadoop than anyone else in the world. There's no question in my mind that they'll continue to invest huge amounts of both QA effort and engineering effort, all the things that Hadoop needs to advance. I'm sure that Hortonworks will continue to work very closely with Yahoo, and we're excited to see more and more contributors to Hadoop, both from Hortonworks and from Yahoo proper.

Cool.
Well, I just want to clarify for the folks out there who don't understand what this whole Yahoo thing is. It was not a spin-out. These were key Hadoop core guys who left the company to form a startup, which Yahoo financed with Benchmark Capital. Yahoo clearly told me, and reaffirmed with me, that they are investing more in Hadoop internally as well. There are more people inside Yahoo who work on Hadoop than there are in the entire Hortonworks company. So that's very clear, just to clear that up out there.

Aaron, you're a young gun, right? You're a young whiz like Todd Lipcon here. Explain to the folks out there who are a little bit older, maybe guys in their 30s, or CIOs. A lot of people are kicking the tires on big data. They're hearing about real-time analytics. They're hearing about benefits they've never heard before. Dave Vellante and I on the Cube talk about the transformations that are going on. You see EMC getting into big data. Everyone's transforming at the enterprise level and the service provider level. Explain to the folks why Hadoop is so important. Why is Hadoop, if not the fastest, one of the fastest growing projects in Apache ever?

Sure.

Maybe even faster than the web server project, which is one of the bigger ones. Why is Hadoop so important? Explain to them what it is.
Well, it's been pretty well covered that there's been an explosion of data, that more data is produced every year, over and over. We talk about exabytes, which is a quantity of data so large that pretty much no one can really comprehend it, and more and more organizations want to store and process and get insights from that data. In addition to the explosion of data, that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is truly that it's so very inexpensive per terabyte to store data that you don't have to think up front about what you want to store and what you want to discard. Store it all, and figure out later what the most useful bits are. We call that schema on read, as opposed to figuring out the schema a priori. That is a very powerful shift in the dynamics of data storage in general, and I think it's very attractive to all sorts of organizations.

You're a Brown graduate, and you have some interns from Brown, TAs at Brown. A premier computer science program, almost as good as where I went to school, Northeastern University, the unsung heroes of computer science. Only kidding. Brown's a great program. But cutting-edge computer science: Cloudera is known as obviously leading in a lot of the computer science areas. Hadoop in general is known as something where you've got to be pretty savvy, masters or PhD level, to play in this area. There's not a lot of adoption among what I call the grassroots developers.

Mm-hmm.

What's your vision, and how do you see the younger computer science generation, even younger than you, growing up into this? Because those tools aren't yet developed. You've still got to be pretty strong from a computer science perspective. And also explain to the folks who aren't necessarily at the Browns of the world or getting into computer science.
What is this revolution about, and where is it going? What are some of the things you see happening around the corner that might not be obvious?

Sure, sure. There are a few questions there. Part of it is: how do people coming out of college get into this thing? It's not taught all that much in school. How do you make the leap from the standard computer science curriculum into this sort of thing? Part of it is that we're seeing more and more schools offering distributed computing classes, or they have grids available to do this stuff. There is some research coming out of Brown, actually, and lots of other schools, about Hadoop proper and the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting. Google actually has classes that they teach, I believe in conjunction with the University of Washington, where they teach undergraduates and master's-level graduate students about MapReduce and distributed computing. They actually use Hadoop to do it, because the architecture of Hadoop is modeled after Google's internal infrastructure. So that's one way we're seeing more and more people just coming out of college who have distributed systems knowledge like this.

The other part of the question you asked is: how does the ordinary developer get into this stuff? The answer is that we and others in the Hadoop community are working hard on making Hadoop much easier to consume.
We released, and you covered this a fair bit, the SCM Express project, which lets you install Hadoop with minimal effort, as close to one click as possible. And there are lots of layers built on top of Hadoop to make it more easily consumed by developers. Hive is a sort of SQL-like interface on top of MapReduce, and Pig has its own DSL for programming against MapReduce, so you don't have to write straight MapReduce code or anything like that. And it's getting easier for operators every day.

Well, evolution-wise, you guys are actually working on that; I see it at Cloudera. What about some of the abstractions you're seeing? The rage is, you know, I look back a year ago. VMworld is coming up, and no plugs, but SiliconANGLE TV will be broadcasting live at VMworld. Eli's been on the Cube, ex-VMware. SpringSource was a big announcement that they made. Heroku got bought by Salesforce. Cloud software frameworks are big.

Sure.

Now what does that look like, and how does that relate to Hadoop and the ecosystem around Hadoop? The rage is that these software frameworks and networks kind of collide, and you get the intersection of software frameworks and networks. Obviously, with the big players, we talk about EMC and these guys, and it's clear that they realize that software is going to be their key differentiator.

Yep.

So it's got to get to a framework standpoint. Are Hadoop and Apache talking about this kind of evolution for Hadoop?

Sure. Well, I think we're seeing very much the commoditization of hardware. You just can't buy bigger and bigger computers anymore.
They just don't exist. So you're going to need something that can take a lot of little computers and make them look like one big computer, and that's what Hadoop is especially good at. We talk about scaling out instead of scaling up: you can just buy more relatively inexpensive computers, and that's great. The beauty of Hadoop is that it will grow linearly as your data set, your scale, your traffic, whatever, grows. You don't have to have this exponential price increase of buying bigger and bigger computers; you can just buy more. It is a software framework that, if you write against it, means you don't have to think about the scaling anymore. It will do that for you.

Okay, a question for you. It's kind of a weird question, but try to tackle it. You're at a party having a few cocktails...

Yeah.

...and a few beers with your buddies, and your buddy who works at a big enterprise says, man, we've got all this legacy structured data, all these systems, and I need to implement some big data strategy...

Yeah.

...all this stuff. What do I do?

Sure, sure. Not the question I thought you were going to ask me.

No, we're a G-rated program here.

Okay. I thought you were going to ask me how I explain what I do to people.

We'll get to that next.

Okay.
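The scale-out versus scale-up point made earlier in this exchange can be put into rough arithmetic. The Python sketch below is only an illustration under stated assumptions: the function names, node sizes, prices, and the rule that doubling a single machine's capacity triples its price are all hypothetical, chosen to show the shape of the two cost curves, and are not figures from the interview or from any vendor.

```python
import math


def scale_out_cost(data_tb, tb_per_node, price_per_node):
    """Cost of holding data_tb on identical commodity nodes.

    The node count, and so the cost, grows linearly with the data.
    """
    nodes = math.ceil(data_tb / tb_per_node)
    return nodes * price_per_node


def scale_up_cost(data_tb, base_tb, base_price, price_multiplier):
    """Cost of one ever-bigger machine.

    Assumption (hypothetical): each doubling of capacity multiplies the
    price by price_multiplier. Any multiplier above 2 makes the price
    grow faster than the capacity.
    """
    capacity, price = base_tb, base_price
    while capacity < data_tb:
        capacity *= 2
        price *= price_multiplier
    return price


if __name__ == "__main__":
    # Made-up numbers: 10 TB per node or base machine, 5000 per unit.
    for tb in (10, 20, 40, 80):
        out = scale_out_cost(tb, tb_per_node=10, price_per_node=5000)
        up = scale_up_cost(tb, base_tb=10, base_price=5000, price_multiplier=3)
        print(f"{tb:>3} TB  scale-out: {out:>7}  scale-up: {up:>7}")
```

With these made-up inputs the two options cost the same at the smallest size and diverge as the data grows, which is the pattern described above. Real hardware pricing will differ.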
All right. Yeah, I would say that the first thing to do is start small and implement a proof of concept. Get a subset of the data that you would like to analyze, put Hadoop on a few machines, four or five, something like that, and start writing some Hive queries, start writing some Pig scripts. I think you'll pretty quickly and easily see the value that you can get out of it. And you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to scale to that size, sort of trivially.

Okay, so now the question that I want to ask: you're at a party, and it's, what do you do?

Mm-hmm. I usually tell people I'm a hedge fund manager. No, but seriously, I tell people I work on software for distributed supercomputers. People have some idea what distributed means, and supercomputers, they figure that out.

So, final question, because I know you've got to get back to programming some code here. What's the future of Hadoop from a developer standpoint? I was having a conversation with a developer who's a big data jockey, [unclear] anything he can get his hands on: geo data, text data. He's a data junkie, and he says, I just don't know what to build. What are some of the enabling apps that you see out there, or that haven't been conceived yet? Just brainstorming, what's possible with data? Can you share your vision for the next five years? What are you going to see evolve, and what are some of the coolest things you've seen that are happening right now?

Sure. Sure.
I think you're going to see the front ends to these things getting easier and easier to interact with, and at some point you won't even know that you're interacting with a Hadoop cluster. That will be the engine under the hood, but from your perspective you'll be driving a Ferrari. By that I mean standard BI tools and the standard SQL query language will all be implemented on top of this stuff, and from that perspective you could implement really anything you want. We're seeing a lot of great work coming out of just identifying trends amongst masses of data that, if you tried to analyze them with any other tool, you'd either have to distill down so far that you would question your results, or you could only run the very simplest sort of queries over and not really get those powerful, deep, correlative insights that we're seeing people get. So I think you'll continue to see great recommendation systems coming out of this stuff. You'll see root cause analysis. You'll see great work coming out of the advertising industry to really say which ad was responsible for this purchase. Was it really the last ad they clicked on, or was it the ad they saw five weeks ago that put the thought in mind? That sort of correlative analysis is being empowered by big data systems like Hadoop.

Well, I'm bullish on big data. I think it's going to be even bigger than people think. You'll have some kids come out of college and say, I could use big data to create a differentiation and build an airline based on one differentiation. These are cool new ways, and data we've never seen before. So Aaron, thanks for coming on the Cube.

Thanks for having me.

We're inside the Palo Alto studio, and we're going to go.
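
The ad attribution question raised in the final answer (was it the last ad clicked, or the ad seen five weeks earlier?) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Touch` record, the sample history, and the two rules `last_click` and `first_touch` are hypothetical and do not come from the interview or from any Hadoop API. On a real cluster, logic like this would more likely be expressed as a Hive query or a MapReduce job over a much larger event log.

```python
from collections import namedtuple

# Hypothetical ad-exposure record: which ad, how many days before the
# purchase it was seen, and whether the user clicked it.
Touch = namedtuple("Touch", ["ad", "days_before_purchase", "clicked"])


def last_click(touches):
    """Credit the most recent ad the user actually clicked."""
    clicks = [t for t in touches if t.clicked]
    if not clicks:
        return None
    return min(clicks, key=lambda t: t.days_before_purchase).ad


def first_touch(touches):
    """Credit the earliest ad the user saw, clicked or not."""
    if not touches:
        return None
    return max(touches, key=lambda t: t.days_before_purchase).ad


# Made-up exposure history for a single purchase.
history = [
    Touch("brand_video", 35, False),   # seen five weeks before the purchase
    Touch("search_banner", 3, True),
    Touch("retargeting", 0, True),     # clicked on the day of the purchase
]

if __name__ == "__main__":
    print("last click :", last_click(history))
    print("first touch:", first_touch(history))
```

The two rules give different answers for the same history, which is why the choice of rule matters. Deciding between them (or weighting several ads) takes exposure data across many purchases, the kind of large-scale correlative analysis described above.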