Vincent Warmerdam - PySpark and Warcraft Data





Published on Aug 2, 2015

Vincent Warmerdam - PySpark and Warcraft Data
[EuroPython 2015]
[21 July 2015]
[Bilbao, Euskadi, Spain]

In this talk I will describe how to use Apache Spark (PySpark) with
some data from the World of Warcraft API from an IPython notebook.
Spark is interesting because it speeds up iterative processes on your
Hadoop cluster as well as on your local machine.

I will give basic benchmarks (comparing it to numpy/pandas/scikit),
explain the architecture and performance behind the technology, and
give a live demo of how I used Spark to analyse an interesting
dataset. I'll explain why you might want to use Spark and also when
you don't want to use it.

The dataset I will be using is a 22 GB JSON blob containing auction
house data from all World of Warcraft servers over a period of time.
The goal of the analysis will be to determine whether, and when, basic
economics still applies in a massively multiplayer online game.

I will assume that everyone knows what the IPython notebook is and
I will assume a basic knowledge of numpy/pandas, but nothing fancy. The
dataset has been chosen so that people who are less interested in
Spark can still enjoy the analysis part of the talk. If you know very
little about data science but love video games, you should still
like this talk.

