Published on May 31, 2016
PyData Berlin 2016
Python machine learning libraries like scikit-learn are a fantastic resource, but they are not always well suited to large datasets. How can we use Python for machine learning in such cases? This talk introduces PySpark and MLlib as tools for distributed machine learning. We will discuss what these tools are and how they work, and walk through some basic code examples of machine learning on a cluster.
1) Intro
   a. Why is scikit-learn not enough?
   b. What is Spark?
   c. What is MLlib?
2) Spark
   a. Overview of Spark
   b. Overview of PySpark
   c. PySpark code sample
3) MLlib
   a. Overview of MLlib
   b. MLlib code samples