Published on Aug 1, 2014
This talk was given at Midwest.io 2014.
Cloudera's Data Science Team has a simple mission: build an analytics infrastructure so awesome that it makes Google's Ads Quality Team seethe with jealousy. To that end, I'll give an overview of Cloudera's current data science tools, including Oryx and Spark for building and serving machine learning models, Gertrude for multivariate testing, and Impala for ludicrously high-performance SQL queries against HDFS.
About the Speaker
Josh Wills is Cloudera's Senior Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. He is the founder and VP of the Apache Crunch project for creating optimized MapReduce pipelines in Java and the lead developer of Cloudera ML, a set of open-source libraries and command-line tools for building machine learning models on Hadoop. Prior to joining Cloudera, Josh was at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+.