Open-Source Machine Learning on Big Data with H2O: Free open-source download at http://h2o.ai/download
We parse a dataset with 116M rows and 31 columns and build three different models (Elastic Net Logistic Regression, Deep Learning, and Gradient Boosting), all in less than 10 minutes. Everything runs distributed across a cluster of servers. The only dependency is Java.
Software: H2O 3.0 (open-source) by H2O.ai http://h2o.ai/
Hardware: 8-node cluster on Amazon EC2, c3.2xlarge, 8 cores (Xeon E5-2680 v2), 15 GB per node (12 GB for H2O)
Dataset: Airlines 1987-2007, 12 GB CSV, 116M rows, 31 columns, 725 predictors
Models: Elastic Net Logistic Regression, Deep Learning and Gradient Boosting
Presenter: Arno Candel, PhD, Chief Architect, H2O.ai
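For a sense of scale, the figures above can be combined with a little arithmetic. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, "GB" is assumed to mean 10^9 bytes, and "8 cores" is read as per node.

```python
# Figures copied from the Hardware and Dataset lines above.
NODES = 8
CORES_PER_NODE = 8           # assumption: "8 cores" is per node
H2O_MEM_GB_PER_NODE = 12
CSV_SIZE_GB = 12             # assumption: 1 GB = 10**9 bytes
ROWS = 116_000_000
COLUMNS = 31

total_cores = NODES * CORES_PER_NODE                # cores across the cluster
total_h2o_mem_gb = NODES * H2O_MEM_GB_PER_NODE      # memory given to H2O overall
mem_to_csv_ratio = total_h2o_mem_gb / CSV_SIZE_GB   # H2O memory relative to raw CSV size
cells = ROWS * COLUMNS                              # individual values to parse
avg_bytes_per_row = CSV_SIZE_GB * 1e9 / ROWS        # average CSV row length in bytes

print(f"cores: {total_cores}, H2O memory: {total_h2o_mem_gb} GB")
print(f"memory / CSV size: {mem_to_csv_ratio:.1f}x")
print(f"cells: {cells:,}, avg bytes per CSV row: {avg_bytes_per_row:.0f}")
```

Under these assumptions the cluster offers 64 cores and 96 GB of H2O memory, about eight times the size of the raw CSV.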
Note: Models are built using the Flow GUI and are not tuned for accuracy. H2O comes with R and Python client packages, as well as native integration with Java and Scala. H2O runs stand-alone or on top of Hadoop and Spark, HDFS, YARN, Mesos, etc.
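The talk itself uses the Flow GUI, but the same parse-then-train workflow could be scripted with the Python client. The following is a rough sketch, not the presenter's setup: it assumes a recent h2o Python package, the function name build_three_models and its parameters are hypothetical, the CSV path and response column must be supplied by the caller, and all models use default settings apart from the elastic-net mixing value.

```python
def build_three_models(csv_path, response, predictors=None):
    """Parse a CSV into H2O and train GLM, Deep Learning and GBM models.

    Hypothetical helper for illustration; csv_path and response are
    supplied by the caller and are not taken from the talk.
    """
    # Imported lazily so this file loads even where h2o (and Java) is absent.
    import h2o
    from h2o.estimators import (
        H2ODeepLearningEstimator,
        H2OGeneralizedLinearEstimator,
        H2OGradientBoostingEstimator,
    )

    h2o.init()  # starts or connects to a local H2O instance
    frame = h2o.import_file(csv_path)  # distributed parse of the CSV

    # Treat the response as categorical so the models do classification.
    frame[response] = frame[response].asfactor()

    if predictors is None:
        # Placeholder choice: every other column. Real use should drop
        # columns that leak the outcome.
        predictors = [c for c in frame.columns if c != response]

    models = {
        # 0 < alpha < 1 mixes L1 and L2 penalties (elastic net); 0.5 is arbitrary.
        "glm": H2OGeneralizedLinearEstimator(family="binomial", alpha=0.5),
        "deeplearning": H2ODeepLearningEstimator(),
        "gbm": H2OGradientBoostingEstimator(),
    }
    for model in models.values():
        model.train(x=predictors, y=response, training_frame=frame)
    return models
```

Calling build_three_models with a path to the airlines CSV and the name of a binary delay column would return the three trained models keyed by name. Connecting to an existing multi-node cluster instead of a local instance is left out of the sketch.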
More info at http://h2o.ai. Join the movement: the open-source machine learning software from H2O.ai is on GitHub at https://github.com/h2oai
If you like this, check out more talks on open-source machine learning software at: http://www.slideshare.net/0xdata