Katharine Jarmul - Holy D@t*! How to Deal with Imperfect, Unclean Datasets

Published on May 31, 2016

PyData Berlin 2016

Ever wondered what sort of sick person created the datasets you work with? Sadly, we can't answer that question directly, but we can aim to handle messy data problems. From non-significant or null datasets, to unclean and unclear string data, to difficult formats like PDFs, we'll take a closer look at how best to work with imperfect data and what questions you can answer given your datasets.

This talk will cover how to work with unclean and imperfect datasets. We'll look at several common issues, suggestions for addressing them, and some code examples for managing that messy data.

- The Noble Quest against Messy Data
- Working with null data
- Insignificant data
- Messy strings: Regex / Fuzzy Match
- XML / HTML Data
- PDF Data
- Where to go from here
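As a rough illustration of two of the outline topics (null data, and messy strings handled with regex and fuzzy matching), here is a minimal sketch using only Python's standard library (`re` and `difflib.get_close_matches`). It is not code from the talk or its slides: the placeholder set `NULL_TOKENS`, the helper names `normalize_null`, `clean_string` and `fuzzy_match`, and the 0.8 similarity cutoff are all hypothetical choices made for this example.

```python
import re
from difflib import get_close_matches

# Hypothetical set of placeholders that often stand in for "missing" in messy
# exports; adjust it to whatever your own dataset actually uses.
NULL_TOKENS = {"", "na", "n/a", "null", "none", "-", "?"}


def normalize_null(value):
    """Return None for None or a known missing-value placeholder, else the stripped string."""
    if value is None:
        return None
    cleaned = value.strip()
    return None if cleaned.lower() in NULL_TOKENS else cleaned


def clean_string(value):
    """Lowercase, drop punctuation, and collapse runs of whitespace using regex."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", value).strip()


def fuzzy_match(value, choices, cutoff=0.8):
    """Map a messy string onto the closest canonical choice, or None if nothing is close enough."""
    cleaned = clean_string(value)
    # Compare cleaned forms, but return the original canonical spelling.
    lookup = {clean_string(choice): choice for choice in choices}
    matches = get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    canonical = ["New York", "Berlin", "San Francisco"]
    raw = ["  new york!! ", "Berln", "N/A", "san  francisco", "Tokyo"]
    for item in raw:
        value = normalize_null(item)
        match = fuzzy_match(value, canonical) if value is not None else None
        print(repr(item), "->", match)
```

`difflib` is used here only as a dependency-free stand-in; dedicated fuzzy-matching libraries offer other scoring methods, and any cutoff usually needs tuning against your own data.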

Slides: https://docs.google.com/presentation/...
