FOSDEM 2012 - Apache Giraph: Distributed Graph Processing in the Cloud (1/2)
Uploaded on Feb 7, 2012
Web and online social graphs have grown rapidly in size and scale during the past decade. In 2008, Google estimated that the number of web pages had passed one trillion. Online social networking and email sites, including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have hundreds of millions of users and are expected to grow much more in the future. Processing these graphs plays a big role in delivering relevant and personalized information to users, such as results from a search engine or news in an online social networking site.
The Apache Giraph project is a fault-tolerant, in-memory distributed graph processing system that runs on top of a standard Hadoop cluster and is capable of running any standard Bulk Synchronous Parallel (BSP) operation over any large generic data set that can be represented as a graph. Apache Giraph is a loose implementation of Google Pregel, but it can be added to any Hadoop job pipeline as a normal MapReduce job. Giraph entered the ASF Incubator in July 2011, where it has enlisted the aid of committers from Yahoo!, Facebook, LinkedIn, and Twitter.
The talk will describe why chaining iterative MapReduce jobs is not well suited to graph processing, which is the reason Google designed Pregel in the first place. Next, the BSP model and how it is applied to graph processing will be explained. The last part of the talk will be dedicated to Apache Giraph, with a description of the programming model (i.e. the API and some typical examples such as PageRank and Single Source Shortest Path) along with a technical overview of how the architecture of Giraph works and how it leverages the Hadoop infrastructure.
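As a rough illustration of the vertex-centric BSP model mentioned above, the following single-process Python sketch computes Single Source Shortest Path in supersteps: each vertex that receives messages keeps the smallest distance it has seen and, if that improved its value, sends updated distances along its outgoing edges. This is not the Giraph API. The function name `bsp_shortest_paths`, the adjacency-dict input format, and the in-memory `inbox`/`outbox` message passing are all assumptions made for illustration, and edge weights are assumed to be non-negative.

```python
import math


def bsp_shortest_paths(edges, source):
    """Simulate BSP supersteps for single source shortest path.

    edges: dict mapping a vertex to a list of (neighbor, weight) pairs.
    Hypothetical helper for illustration; weights are assumed non-negative.
    Returns (distance dict, number of supersteps executed).
    """
    # Collect every vertex, including ones that only appear as edge targets.
    vertices = set(edges)
    for outgoing in edges.values():
        for target, _ in outgoing:
            vertices.add(target)

    # Every vertex starts with an "infinite" distance.
    distance = {v: math.inf for v in vertices}

    # The source is activated by a message carrying distance 0.
    inbox = {source: [0.0]}
    supersteps = 0

    # Each loop iteration is one superstep: all active vertices compute,
    # then their messages are delivered together at the barrier.
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                distance[vertex] = best
                # Propagate the improved distance to the neighbors.
                for target, weight in edges.get(vertex, []):
                    outbox.setdefault(target, []).append(best + weight)
        # Vertices without incoming messages stay inactive ("vote to halt").
        inbox = outbox
        supersteps += 1

    return distance, supersteps


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2)],
        "c": [("d", 1)],
        "d": [],
    }
    print(bsp_shortest_paths(graph, "a"))
```

The computation stops when a superstep produces no messages, which mirrors the idea that a BSP graph job ends once every vertex is inactive and nothing is in flight.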
Claudio Martella is an Apache Giraph PPMC Member and Committer.
He's a PhD candidate at the Large-scale Distributed Systems group of the Vrije University of Amsterdam, where he works on distributed processing of social interactions/networks (read: complex networks).
Twitter: @claudiomartella, Blog: http://blog.acaro.org