Facebook is a big-name draw.

I like to think it's me, but yeah, I think it might have something to do with that.

You're very handsome and, you know, very attractive. I'd go to your session. Seriously, though, HBase: I was tweeting earlier and yesterday that HBase has got some significant traction right now in the marketplace. You guys are big, big proponents of Hadoop. Facebook's got massive data. We had Jeff Hammerbacher on earlier, the data guy from early on at Facebook and now co-founder of Cloudera. You guys are a great example of how you use data in the Facebook platform. It's been an amazing success. Facebook Connect, you've got external data, all kinds of data, internal data. I can only imagine the tsunami that Shep and those guys put together over there, and we've interviewed them before. It's just massive. You do a lot of homegrown stuff. You know, Project Haystack was well renowned. Photos are a big app. You've got user data. I mean, it's complex. There's a lot of data for sure, and then a lot of analysis of all that data, which is even more data. So Facebook, you guys are cranking it out: 800 million active now, last I heard, and growing like crazy. So what are the data challenges that you have? And then we'll jump into the HBase thing.

Big data challenges. I mean, there's kind of two sides of the house. There's the OLTP stuff and there's the OLAP stuff. In the past I was traditionally working on all of the OLAP stuff, so the Hadoop data warehouse, the Hive data warehouse. And there it's, you know, keeping up with capacity. We make it really, really easy at Facebook to put data into the warehouse, so that means there's constantly data going in there. I think the most recent stat is something like 250 terabytes of data into the warehouse every day. So just data growth is astronomical. I mean, it's really just keeping up, literally putting in more disk drives to keep up with it. And then on the other side of the house, the transactional side, which is where I'm working
now, the challenge there is pretty much just scale. I mean, it's the same challenge, I guess.

Which features of the Facebook application are you working on? Is it all of them, or specifically the ad side?

Well, if we kind of switch to HBase a little bit, the first project that we did with HBase that was transactional was user data. It was an OLTP application, and that was Facebook Messages. The reason that we looked at HBase there is really because we were looking at our sharded MySQL setup, however many, you know, petabytes of data we're storing there, and we were looking at launching an email service and persisting every single instant message that gets sent on Facebook, which is to the tune of 50,000 a second, and persisting all that forever.

And video, which I know you're not storing.

We are not storing.

I asked Tony Bates that in the press conference. No, that's a ton of messages.

So every single chat, every single email message has to be stored persistently. It all has to be indexed so it's searchable, and we need it to be randomly accessible by all of the users. One of the big challenges also is that we had an existing messages service on MySQL, so we had to migrate the whole thing onto HBase while it was live in production. So it was definitely a challenging project. But the reason that we went with HBase was, looking at the data requirements, it was going to double the size of our user database tier. I mean, all the data we have in there, it was just going to double. So the prospect of just doubling data was a really, really scary one, and that's why engineers kind of decided: what else could we do? Is there something that's going to be more optimized for this kind of storage?

So with HBase you get the massive rows, you don't have the structure like MySQL, and the low latency is there. Are those all the key requirements as well?

Yeah. I mean, it's the write stuff, obviously. If you're persisting every instant message, the
writes are insane. I'd say another big difference has to do with the fact that relational databases get slower and slower as their tables grow. So even if you have a bunch of cold data, the fact that you're not accessing it doesn't matter. It's still going to slow down your accesses to the hot data, just because of the nature of a relational database. HBase is much better at kind of archiving old data so that you really don't need to touch it unless you really need it. And so with something like email, I mean, how often do you read last year's emails?

Never. Never.

So, right, you're always reading...

I should be deleting them all, but nobody does.

No one deletes anything and no one reads it. So you have to store it, because maybe they want to read it, but you don't want it to be slowing things down.

So you don't want it to cost a lot, and you don't want it to impact performance of the core stuff, unread messages or other things. Right. So on the hardware side, are you playing with flash? SSD is hot, and, you know, it's a smoking hot area.

For the hot data we do. We buy a lot of Fusion-io at Facebook, as you may have heard.

Yeah, we've been covering Fusion from day one. Yeah, they're smoking. I think there's some cool stuff coming around the corner too. I mean, we're under NDA with Fusion, so we really can't talk about the future, but from what I can see it's amazing, amazing stuff.

Right. Yeah, it's made my job harder, because we're using flash now on the MySQL tiers, and so now, you know, HBase needs to basically be up against MySQL. The team I'm on now is called database infrastructure. That's actually where the MySQL team is as well, so HBase and MySQL are together under the same roof, which is cool. But now we have the MySQL bar.

Yeah, I mean, it's a high quality bar. So which is the cooler team? Because Facebook talks about being cool. Is the HBase team cooler than the MySQL team?

Very political answer. I think, of course, the HBase team.

That's what you're on. It's the new stuff.

Yeah, yeah, not this old
relational, 25 years old.

So obviously Hadoop has been your core thing for dumping data in the past, right?

Yeah, it was the key component there, yeah.

What new thing about HBase surprised you? Anything like, wow, this is really cool? Or is it like, oh, this is cool, but it still needs more work to be done?

I mean, I think the really cool thing is the cross-section between online and offline. Every system before that was an online system or an offline system, and for the first time we have a system like Puma, which is the second thing I talked about in my talk today, that is kind of real-time streaming MapReduce. So we're actually using HBase as kind of an analytics, data warehousing, BI tool, but the web queries it directly, so we can do all the aggregations inside the same HBase cluster.

How do you do that? What's the project called?

Puma. So our HBase cluster just uses Thrift. It has a Thrift server, and the web connects directly to the HBase cluster through Thrift. So it just sits there, fully addressable. It's there.

Nice, nice. We do that, as I explained, with our Twitter data at SiliconANGLE. So what's it like at Facebook right now, the engineering? You know, our office is in Cloudera. We're going to be moving out soon, but yeah, I'm always hanging out at the Nut House and I see Facebook people there.

Yeah, that's the spot.

Yeah. You guys are hiring like crazy. Is it massive, hiring like crazy? I mean, what's the growth? Have you moved to the new campus yet?

I've been there for two years and the company's doubled twice, you know.

So you're still in the California Avenue area?

So I'm floating between them. All of our buildings are kind of right next to each other on California Avenue in the middle Park area. Some people have, but I think soon, I'm not sure exactly when, but soon, everybody's moving to Menlo.

Do you guys really ring the bell when a new feature ships? Ring the bell? Zuckerberg rang the bell on the new feature?

Yeah, there's like a launch
switch. Yeah, that's what it is. I haven't been there in a while, but it's done. I think we launch too many things now. We'd always be flipping the switch.

Jonathan Gray at Facebook, what lessons have you learned at scale with HBase? For folks out there, you guys are a great example of, to me, the future application that enterprises are struggling with. I mean, Facebook had a clean sheet of paper and built a great platform. Your product model has been about introducing products and iterating fast, with no real clunky R&D that can slow things down. It's really applied research in real time. You launch some stuff, smaller teams, a really good code of ethics there among the engineers, which is cool. What have you learned, and what could you share for the folks out there who are moving to cloud, building their apps, trying to be more app-ified in the sense of being like a Facebook, in the sense of being an IT ops, DevOps environment? What are the biggest challenges?

I'd say what I love about Facebook is the way that our ops and DevOps work. We have what we call app ops, people that are on our engineering teams. In a lot of organizations there's a distinct wall: there is engineering and there's operations, and there's a lot of finger pointing. Another main problem is the fact that Hadoop and HBase and these other technologies are so early that there's no such thing as ops. I mean, it's engineering. It's engineering ops, right? So one thing that I've really, really liked is that we have ops guys that have an engineering bent, and then all of our engineers are very kind of aware of ops, and so it's created a really good environment and allowed us to iterate quickly there.

That's the culture, basically.

That's the culture.

So if you have that siloed, you know, ops skills, you're screwed, basically, is what you're saying.

I'd say that it's an uphill battle, and it depends on the ops guys. And there's nothing like sitting with people,
you know. I mean, at the end of the day people are people, and sitting with people you become tolerant and you understand what their position is and all that kind of stuff.

So Google has this hiring philosophy where you've got to have a certain GPA and scores. Facebook has a similar kind of culture, and we heard from Hammerbacher that, you know, what he's seeing in data science is that data is not just about, you know, the killer comp-sci guy, but looking at guys like sociologists, you know, a researcher who has a psychology degree but is really good at math. So, okay, here's a manual, learn how to code. Is that the kind of culture at Facebook? How would you define the hiring criteria at Facebook, or the cultural criteria?

I mean, I'd say it depends a little bit, because there are really, really product-y front-end people, there's more infrastructure people, and then we have a huge data science team. Data science is a really awesome area, and exactly that, I mean, we're hiring kind of really unique people. But I'd say overall it's an entrepreneurial culture, an entrepreneurial hiring thing. And it's kind of the gift and the curse of Facebook that you hire people who are self-starters. I mean, there's very little process, small teams, and so the kind of people you need to hire are the kind of people who don't need direction, who are going to, with really, really loosely coupled requirements, be able to just figure things out. I mean, people who want to be kind of given, like, "this is what you're going to be doing," and then go sit down and do it, those people are not happy at Facebook doing that kind of stuff.

Yeah, certainly Facebook's inspired zillions of people out there. I have a 16-year-old kid, and Zuckerberg just moved into the neighborhood around the corner from me, so they know where he lives and they want to go knock on his door and say hello. I'm like, oh my god. But what can you share? Let's talk to those young kids out there, the 16-year-olds or, you know, the high school kids or
the college kids who really get that the future is different. They don't have that legacy view, you know, and then there's the Occupy Wall Street view right now. People want a new future. What would you share with the new generation of coders and developers around, you know, what it's like, what kind of skills they should be acquiring if they want to play in this new development environment? Databases are changing and a whole new generation shift is happening. We've been talking about how the database market, you know, relational databases with Oracle, that was a startup, and that spawned an industry, and Hadoop's doing the same thing. So we now have a whole new generation of an industry. Those 16-year-olds will be entry-level employees soon, and the college kids all over the world. What do you say to them? What advice would you give them?

Well, open source would be the first thing I would say. I made a career out of open source, and it's been the gift that keeps on giving. So the first thing I would say is do open source. One hiring criterion Facebook uses a lot is open source. There is no better way to tell the quality of a candidate than seeing their interactions in an open source community, seeing all their code, things like that. The other thing is you have transferable skills. I mean, I can go and work at Microsoft on Microsoft SQL Server, I can work at Facebook on HBase, and then I can go somewhere else and work on HBase, right? So open source gives you that transferability. A piece of advice I recently gave to a class of Berkeley undergrads was a lesson I learned interning during college, which was: if you work in tech, work for a tech company.

Yeah, good.

It's a lesson, I think, for technologists. It could be a really frustrating experience, one that a lot of people I know have gone through, being at a company where technology itself is really not the focus. That could be a frustrating experience for engineers.

We're here with
Jonathan Gray from Facebook. He's the guy who leads the HBase charge. My final question, and then we'll call it a wrap: what do you want to have happen with HBase going forward? What are your goals for HBase? What's next, and how is it going to morph for you at Facebook?

I think, as always, it's got to be stability, I mean, operability, all of that kind of thing. I think Hadoop and HBase both have really a long way to go there. I'd love to see an HBase that you don't have to be Facebook to run at that scale. You know, I mean, we're a really special kind of set of guys with a huge number of engineers that are supporting the effort. And I think ease of use would be something that would be awesome.

Okay, Jonathan Gray from Facebook, thanks for coming inside theCUBE. Great, great conversation. We're going to take a quick break. We'll be right back.