Uploaded by GoogleTechTalks on Oct 30, 2008
Google Tech Talks
October 29, 2008
ABSTRACT
For six years, I have worked on learning quality predictors from NASA data. Based on that experience, this talk offers the following lessons from the trenches:
1) Real world data collection is more like ambulance chasing than bus driving. The old DoD model of rigorous process control just breaks down in the modern era of distributed software development. Rather than lament the lack of formal process, we should adapt our learning methods to handle the idiosyncrasies of our data.
2) Accuracy, correlation and precision are not accurate or precise and may not correlate with any decision making process. This is especially true for data sets where only a small percentage of the data contains the target concept.
3) Static code attributes are a wide and shallow well: easy to get to the bottom, very hard to get much further. Our learners may have learned all they can learn from these attributes.
4) The only way up is sideways. My data miners have struck a performance ceiling and the only way up is to change the performance target.
5) The performance ceiling is very close, and we can exploit that. Rather than large-scale automatic methods, it may be more productive to explore human-in-the-loop interactive learning strategies.
6) We can talk, but will they listen? Many times, I have found a clear signal in a software engineering data set. Clearly, our learners are good enough to assist managers in the difficult task of managing software projects. Sadly, all too often, some management edict is applied that effectively ends that project (e.g. collection of that data source is terminated). I offer some speculations on this peculiar effect.
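Lesson 2 can be made concrete with a little arithmetic. The sketch below is not taken from the talk: the module counts and the detector's recall (pd) and false alarm rate (pf) are hypothetical values chosen for illustration. It compares a detector that labels every module "clean" with one that actually finds defects, on a data set where only 2% of modules are defective, and then reruns the second detector on a balanced data set.

```python
def confusion_counts(n_pos, n_neg, pd, pf):
    """Return (tp, fn, fp, tn) for a detector with recall pd and
    false alarm rate pf on n_pos defective and n_neg clean modules."""
    tp = pd * n_pos
    fn = n_pos - tp
    fp = pf * n_neg
    tn = n_neg - fp
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Fraction of all modules that were labeled correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fn, fp, tn):
    """Fraction of modules flagged as defective that really are defective."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


# Hypothetical data set: 1000 modules, only 20 of them defective.
N_POS, N_NEG = 20, 980

# Detector A never flags anything; detector B finds 75% of the defects
# at the cost of a 25% false alarm rate (both rates are assumptions).
always_clean = confusion_counts(N_POS, N_NEG, pd=0.0, pf=0.0)
useful = confusion_counts(N_POS, N_NEG, pd=0.75, pf=0.25)

# The same detector B applied to a balanced data set of 500 + 500 modules.
useful_balanced = confusion_counts(500, 500, pd=0.75, pf=0.25)

if __name__ == "__main__":
    print("always clean: accuracy =", accuracy(*always_clean))
    print("useful:       accuracy =", accuracy(*useful))
    print("useful:       precision (2% defective)  =", precision(*useful))
    print("useful:       precision (50% defective) =", precision(*useful_balanced))
```

Under these assumed numbers, the detector that finds nothing scores 98% accuracy, while the detector that finds three quarters of the defects scores 75% accuracy and roughly 6% precision. The same detector reaches 75% precision once the classes are balanced, so precision here says more about the class distribution than about the detector.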
References:
* "Implications of Ceiling Effects in Defect Predictors" by T. Menzies and B. Turhan and A. Bener and G. Gay and B. Cukic and Y. Jiang. Proceedings of PROMISE 2008 Workshop (ICSE) 2008. Available from http://menzies.us/pdf/08ceiling.pdf
* "Learning Better IVV Practices" by T. Menzies and M. Benson and K. Costello and C. Moats and M. Northey and J. Richarson. Innovations in Systems and Software Engineering, March 2008. Available from http://menzies.us/pdf/07ivv.pdf
* "Data Mining Static Code Attributes to Learn Defect Predictors" by Tim Menzies and Jeremy Greenwald and Art Frank. IEEE Transactions on Software Engineering, January 2007. Available from http://menzies.us/pdf/06learnPredict.pdf
* "Problems with Precision" by Tim Menzies and Alex Dekhtyar and Justin Distefano and Jeremy Greenwald. IEEE Transactions on Software Engineering, September 2007. Available from http://menzies.us/pdf/07precision.pdf
* "Finding the Right Data for Software Cost Modeling" by Zhihao Chen and Tim Menzies and Dan Port and Barry Boehm. IEEE Software, Nov 2005. Available from http://menzies.us/pdf/05chen.pdf
Speaker: Tim Menzies
Dr. Tim Menzies (tim@menzies.us) has been working on advanced modeling and AI since 1986. He received his PhD from the University of New South Wales, Sydney, Australia and is the author of over 170 refereed papers.
A former research chair for NASA, Dr. Menzies is now an associate professor at West Virginia University's Lane Department of Computer Science and Electrical Engineering.
At 49:30, Tim receives a question asking whether smaller modules have a higher probability of defects. The answer should be that smaller modules have fewer defects in absolute terms but proportionally more defects for their size. This is called the theory of relative defect proneness. See Koru et al., Journal of Empirical Software Engineering, 13(5): p. 473-498.
A. Gunes Koru
gkoru 3 years ago