Lecture 10 | Machine Learning (Stanford)
Top Comments
All Comments (16)
-
@darfunkelidas I was thinking of logistic regression and decision trees. Both are blazingly fast compared to SVMs, both to train and to predict, and great for quick evaluation when you are frequently changing features :) Libsvm recommends cross-validation to find good parameters (C and gamma for the Gaussian kernel), which for me was painfully slow to watch. My dataset had tens of thousands of examples with around 12 dimensions; I don't have it on my computer, so I can't give definitive numbers.
-
@nghiaho12 I know, VC theory is very complex! SVMs are usually fast (depending on your speed standards!), but I guess you spent time optimizing the kernel parameters, the error penalty, etc.? What was the size of your dataset? If it's a large dataset, then I guess it's "normal" that it's picking up lots of support vectors.
-
@darfunkelidas I actually got a bit lost in the theoretical stuff, kinda dozed off, so I can't help you there :) As a side comment, I have used SVMs briefly (via libsvm) and found them to be very slow in both training and classification. They seem to pick way too many support vectors (in the 100s and 1000s) in general (at least for my problems), which slows down classification considerably. This issue wasn't discussed in the 2011 online Machine Learning course, and it should have been.
-
@nghiaho12 I don't get one thing. He says VC dimension bounds are loose, so he doesn't use them for model selection. But then he says that SVMs are the best machine learning algorithm, when the very thing that is supposed to make them better is that they are based on the idea of minimizing those theoretical bounds. Isn't that kind of contradictory?
-
These lectures are excellent. I appreciate the clarity of his concepts.



The first 40 min or so is about theoretical bounds on the amount of training data required. The more interesting and practical material comes after that, where he covers practical issues in applying classifiers to real-world problems. Here is my quick summary:
40:10 - Model selection, avoiding under/over fitting, Hold out cross validation
44:40 - K-fold cross validation
53:25 - Rule of thumb for number of training examples vs number of features, for logistic regression
54:30 - Feature selection
nghiaho12 4 months ago 4
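For anyone who wants to try the K-fold cross validation mentioned at 44:40, here is a rough sketch in plain Python. It is not the lecturer's code; the function names (`k_fold_indices`, `cross_val_error`) and the toy majority-class model are made up for illustration. The idea: split the data into k folds, train on k-1 of them, measure the error on the held-out fold, and average over all k choices.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_error(train_fn, predict_fn, X, y, k=10, seed=0):
    """Average misclassification rate over k held-out folds.

    train_fn(X_train, y_train) returns a model; predict_fn(model, x) returns
    a label. Both are supplied by the caller (hypothetical interface).
    """
    folds = k_fold_indices(len(X), k, seed)
    fold_errors = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the i-th one.
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        model = train_fn([X[j] for j in train], [y[j] for j in train])
        # Evaluate on the fold that was held out.
        wrong = sum(predict_fn(model, X[j]) != y[j] for j in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k


# Toy model for illustration only: always predict the most common training label.
def train_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model
```

To compare models (or parameter settings such as C and gamma), you would call `cross_val_error` once per candidate and keep the one with the lowest average error. Note that this trains k models per candidate, which is why it gets slow with SVMs on larger datasets.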
I love how everyone (in the comments) disappeared after the support vector lecture :)
VancouverData 8 months ago 2