In some applications, such as customer information forms filled out on the web, missing values may not be explicitly represented as missing; instead, they appear as potentially valid data values. For example, a customer who does not wish to give a birth date may simply leave the form's default value in place. Such values are known as disguised missing data, and they can severely impair the quality of data analysis.
To address this problem, we developed DiMaC, a tool that finds frequently used disguise values in data sets without requiring any domain background knowledge. This video demonstrates DiMaC.
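To make the idea of a "frequently used disguise value" concrete, here is a minimal sketch of a naive baseline. It is NOT DiMaC's method; DiMaC's heuristic is described in the KDD'07 paper listed under Related Publications. The sketch simply flags attribute values that occur far more often than the average distinct value. It assumes records are Python dicts, and the function name `candidate_disguises` and the `ratio` threshold are hypothetical choices for illustration only.

```python
from collections import Counter


def candidate_disguises(records, attribute, ratio=3.0):
    """Naive baseline (not DiMaC's heuristic): return values of `attribute`
    whose count is at least `ratio` times the mean count per distinct value,
    ordered from most to least frequent."""
    # Count how often each value of the attribute appears.
    counts = Counter(r[attribute] for r in records if attribute in r)
    if not counts:
        return []
    # Average number of occurrences per distinct value.
    mean = sum(counts.values()) / len(counts)
    # most_common() yields values in descending order of frequency.
    return [v for v, c in counts.most_common() if c >= ratio * mean]


# Hypothetical form data: a default birthday "01-01" left in place by many users.
records = [{"birthday": "01-01"}] * 6 + [
    {"birthday": d} for d in ["03-14", "07-22", "11-05", "02-28", "09-30"]
]
print(candidate_disguises(records, "birthday"))  # prints ['01-01']
```

A pure frequency threshold like this will also flag values that are genuinely popular, which is one reason a dedicated tool is useful; the sketch only illustrates the problem, not its solution.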
Please feel free to contact us if you are interested in applying DiMaC to clean your data sets. For more information, visit http://www.cs.sfu.ca/~mhua/personal/
Enjoy!
Related Publications:
1. M. Hua and J. Pei. "Cleaning Disguised Missing Data: A Heuristic Approach". In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07), San Jose, California, USA, August 12-15, 2007.
2. M. Hua and J. Pei. "DiMaC: A Disguised Missing Data Cleaning Tool" (system demonstration). In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'08), Las Vegas, Nevada, USA, August 24-27, 2008.
3. M. Hua and J. Pei. "DiMaC: A System for Cleaning Disguised Missing Data" (system demonstration). In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD'08), Vancouver, Canada, June 11-14, 2008.