
Sampling Techniques for Massive Data

Uploaded on Oct 8, 2007

Google Tech Talks
March 27, 2007

ABSTRACT

Consider a giant data matrix A with N rows and D columns. At Web scale, both N and D can be on the order of billions. In applications such as duplicate document detection, word associations, databases, nearest-neighbor search, and kernel methods (e.g., for SVMs), it is often desirable to store a very small fraction (a sample) of the data that fits in physical memory, so that summary statistics (e.g., L1 or L2 distances) can be computed quickly. Because the data are often highly sparse, conventional sampling methods (i.e., randomly selecting a few columns from the data matrix) would not work well. Two sampling methods, conditional random sampling (CRS) and stable random projections (SRP),...
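
The abstract is cut off before it describes either method, so what follows is only a generic sketch and not the speaker's implementation. It illustrates one common form of random projection for the L2 case: each sparse row is multiplied by a matrix of i.i.d. standard normal entries (the normal distribution is 2-stable), and the squared L2 distance between two rows is estimated from their k projected values. The function names, the dict-based sparse row format, and the sizes used here are all assumptions made for illustration.

```python
import random


def make_projection(d, k, seed=0):
    """Build a d x k matrix of i.i.d. N(0, 1) entries (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]


def project(sparse_row, R, k):
    """Project a sparse row, given as {column_index: value}, down to k numbers.

    Only the nonzero entries are touched, so the cost is (nonzeros * k).
    """
    sketch = [0.0] * k
    for j, value in sparse_row.items():
        r_j = R[j]
        for t in range(k):
            sketch[t] += value * r_j[t]
    return sketch


def estimate_l2_squared(sketch_u, sketch_v):
    """Estimate ||u - v||^2 from two sketches built with the same matrix R.

    Each coordinate difference is N(0, ||u - v||^2), so the mean of the
    squared differences is an unbiased estimate of the squared distance.
    """
    k = len(sketch_u)
    return sum((a - b) ** 2 for a, b in zip(sketch_u, sketch_v)) / k


def exact_l2_squared(u, v):
    """Exact squared L2 distance between two sparse rows, for comparison."""
    columns = set(u) | set(v)
    return sum((u.get(j, 0.0) - v.get(j, 0.0)) ** 2 for j in columns)


if __name__ == "__main__":
    d, k = 1000, 200  # assumed sizes: 1000 columns reduced to 200 numbers
    R = make_projection(d, k, seed=42)
    rng = random.Random(7)
    # Two sparse rows with about 20 nonzero entries each.
    u = {rng.randrange(d): rng.uniform(1.0, 5.0) for _ in range(20)}
    v = {rng.randrange(d): rng.uniform(1.0, 5.0) for _ in range(20)}
    print("exact    :", exact_l2_squared(u, v))
    print("estimated:", estimate_l2_squared(project(u, R, k), project(v, R, k)))
```

In this sketch only the k projected values per row need to be kept in memory, and the relative standard deviation of the estimate shrinks roughly as sqrt(2/k). Estimating L1 distances would call for a different stable distribution (Cauchy) and a different estimator, which this example does not cover.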

All Comments (2)

  • dis video is maaaaaaaaaaaaaaad long. lol-LB3

  • Could you please add more information on random samples or random sampling?
