Ken Krugler, the founder of Bixo Labs, describes the Public Terabyte Dataset project, a large-scale web crawl that uses SimpleDB, Hadoop, Cascading, and Bixo on Amazon's Elastic MapReduce (EMR) cloud.
See the slides from this presentation on the Yahoo! Developer Network's Hadoop Blog: http://bit.ly/dpq67T