Google Tech Talks
March 27, 2007
ABSTRACT
Consider a giant data matrix A with N rows and D columns. At Web scale, both N and D can be on the order of billions. In applications including duplicate (document) detection, word associations, databases, nearest neighbors, and kernels (e.g., for SVM), it is often desirable to store a very small fraction (sample) of the data, small enough to fit in physical memory, for quickly computing summary statistics (e.g., L1 or L2 distances). Because the data are often highly sparse, conventional sampling methods (i.e., randomly selecting a few columns from the data matrix) would not work well. Two sampling methods, conditional random sampling (CRS) and stable random projections (SRP),...
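To make the idea of a small in-memory sample concrete, here is a minimal sketch of a random projection for estimating squared L2 distances between sparse rows. It assumes a Gaussian (2-stable) projection matrix, which is the standard choice for L2; the abstract is cut off before it describes the methods, so this is an illustration and not the speaker's implementation. All names (`make_projection`, `project`, `estimate_l2_squared`) and the sizes D = 1000 and k = 400 are hypothetical.

```python
import random


def make_projection(D, k, seed=0):
    """Return k rows of D i.i.d. N(0, 1) entries (assumed Gaussian projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(k)]


def project(vec, proj):
    """Project a sparse row {column_index: value} down to len(proj) numbers."""
    return [sum(val * row[j] for j, val in vec.items()) for row in proj]


def estimate_l2_squared(sketch_a, sketch_b):
    """Estimate the squared L2 distance from two sketches of equal length."""
    k = len(sketch_a)
    return sum((a - b) ** 2 for a, b in zip(sketch_a, sketch_b)) / k


def exact_l2_squared(u, v):
    """Exact squared L2 distance between two sparse rows, for comparison."""
    cols = set(u) | set(v)
    return sum((u.get(j, 0.0) - v.get(j, 0.0)) ** 2 for j in cols)


# Hypothetical sizes: D original columns, k stored numbers per row.
D, k = 1000, 400
proj = make_projection(D, k)

# Two sparse rows with only three nonzero entries each.
u = {3: 1.0, 17: 2.0, 256: -1.5}
v = {3: 1.0, 42: 0.5, 900: 3.0}

sketch_u = project(u, proj)
sketch_v = project(v, proj)

exact = exact_l2_squared(u, v)
estimate = estimate_l2_squared(sketch_u, sketch_v)
print("exact:", exact, "estimate:", estimate)
```

Each row is stored as k numbers instead of D, and the estimate is unbiased with a relative standard deviation of about sqrt(2/k). By contrast, picking a few columns at random from rows this sparse would usually hit only zeros, which is the failure the abstract points to.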
dis video is maaaaaaaaaaaaaaad long. lol-LB3
ihatehumans11213 3 years ago
Could you please add more information on random samples or random sampling?
Bonzts 4 years ago