

HBase and Hive at StumbleUpon





Uploaded on Sep 5, 2011

Jean-Daniel Cryans of StumbleUpon presents "HBase and Hive at StumbleUpon" at OSCON Data 2011.

PPT slides:

From the official conference description at

We deployed Hive at StumbleUpon early this year as a tool for mining our HBase production datasets. It has been quite a success with both engineering and our analysts; engineers no longer have to write the analysts' reports and the analysts don't have to deal with cranky engineers.

In this presentation, we will first cover the reasons why someone would use Hive with HBase instead of directly using HDFS files, and which goals can be accomplished. We will then review how the Hive-HBase integration works to better understand the state and drawbacks of the current implementation.

The second part will cover how we deployed Hive internally at StumbleUpon and how the data is fed into the system. This will include how we replicate data live from our MySQL and real-time HBase clusters into an analytical Hadoop/HBase cluster in an ETL fashion. We will also present some of our use cases and how they translate into the Hive query language.

The presentation will end with our lessons learned and how we expect to grow our Hive usage as the company does. At the time of writing we are signing up more than 600,000 new users per month and we just passed 15M total users.

