Nathan Epstein - Machine Learning at Scale





Published on May 31, 2016

PyData Berlin 2016

Python machine learning libraries like scikit-learn are a fantastic resource, but they are not always well suited to large datasets. How can we use Python for machine learning in such cases? This talk introduces PySpark and MLlib as tools for distributed machine learning. We will discuss what these tools are and how they work, and walk through some basic code examples of machine learning on a cluster.

1) Intro
a. Why is scikit-learn not enough?
b. What is Spark?
c. What is MLlib?

2) Spark
a. Overview of Spark
b. Overview of PySpark
c. PySpark code sample

3) MLlib
a. Overview of MLlib
b. MLlib code samples
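
As a rough illustration of the idea behind distributed machine learning, here is a minimal plain-Python sketch. It is not code from the talk and it does not use the PySpark or MLlib API. It assumes the data can be split into partitions, that each partition can be summarized independently (the "map" step), and that the summaries can be merged (the "reduce" step) to fit a simple least-squares line. The names partition, partial_stats, combine and fit_line are hypothetical, chosen only for this example.

```python
from functools import reduce


def partition(rows, num_partitions):
    """Split rows into chunks, standing in for partitions spread over a cluster."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_stats(chunk):
    """Map step: summarize one partition as (n, sum_x, sum_y, sum_xy, sum_xx)."""
    n = len(chunk)
    sum_x = sum(x for x, _ in chunk)
    sum_y = sum(y for _, y in chunk)
    sum_xy = sum(x * y for x, y in chunk)
    sum_xx = sum(x * x for x, _ in chunk)
    return (n, sum_x, sum_y, sum_xy, sum_xx)


def combine(a, b):
    """Reduce step: merge two partition summaries by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(rows, num_partitions=4):
    """Fit y = slope * x + intercept using only the merged partition summaries."""
    summaries = map(partial_stats, partition(rows, num_partitions))
    n, sum_x, sum_y, sum_xy, sum_xx = reduce(combine, summaries)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Synthetic data lying exactly on the line y = 2x + 1 (made up for this sketch).
rows = [(float(x), 2.0 * x + 1.0) for x in range(100)]
slope, intercept = fit_line(rows)
print(slope, intercept)
```

Because each partition is reduced to a small tuple before anything is combined, no single step needs the whole dataset in memory at once. On a real cluster the map and reduce steps would be run by a framework such as Spark rather than by Python's built-in map and reduce.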
