Matthew Rocklin | Dask for ad hoc distributed computing





The interactive transcript could not be loaded.


Rating is available when the video has been rented.
This feature is not available right now. Please try again later.
Published on Oct 24, 2016

PyData DC 2016

This talk discusses parallel and distributed computing in Python, particularly for ad-hoc and custom algorithms. It focuses on Dask, a Python solution for flexible distributed computing.

The Python data science stack contains efficient algorithms with intuitive interfaces for sophisticated and friendly analysis. As the data science community tackles larger problems with larger hardware we naturally ask how best to parallelize this software stack both across many cores in a single computer and across computers in a cluster. This turns out to be harder than it looks, even with traditional Big Data tools like MapReduce, Storm, and Spark. Both the complexity of the algorithms and the high expectations for interactivity raise challenges for these systems. This talk lays out the benefits and challenges of parallelizing a numeric analytic stack, and then describes Dask, a parallel framework gaining traction within the Python community for interactive performant parallel computing, and finally goes through a few domains where this work is enabling novel science today.

Comments are disabled for this video.
When autoplay is enabled, a suggested video will automatically play next.

Up next

to add this to Watch Later

Add to

Loading playlists...