Dedupe, Merge, and Purge: the Art of Normalization

263 views

Uploaded on Nov 9, 2011

Big Noise always accompanies Big Data, especially when extracting entities from the tangle of duplicate, partial, fragmented, and heterogeneous information we call the Internet. The roughly 17 million physical businesses in the US, for example, appear on over 1 billion webpages and endpoints across 5 million domains and applications. Organizing such a disparate collection of pages into a canonical set of things requires a combination of distributed data processing and human domain knowledge. This presentation stresses the importance of entity resolution in a business context and offers real-world examples and pragmatic insight into the process of canonicalization.

Info on Strata Conference website: http://strataconf.com/stratany2011/public/schedule/detail/21389

Slides on Slideshare: http://www.slideshare.net/TylerBell/dedupe-merge-and-purge-the-art-of-normali...

Category: People & Blogs

License: Standard YouTube License
