Lecture 4 | Machine Learning (Stanford)
-
This video went viral in Vietnam
-
topics: Newton's method, more logistic regression, GLM
-
@iheartalgebra: I completely mistook the Bernoulli distribution for the binomial distribution, sorry.
May I ask you a question about this class, assuming you have continued studying it and maybe done some of its homework? Does it get more practical? I'm concerned that by the end of it I might be able to derive GLMs in my sleep but be unable to apply machine learning well in practice.
-
@MegaCrazyTaxiDriver Thanks for your reply. To your first point, I think it is common to refer to a random variable whose distribution function is Bernoulli as a Bernoulli random variable. I agree with your second point, but it does not speak to my question. Allow me to clarify: by "outcomes in {0,1}" I mean that P(X in {0,1}) = 1, i.e. the random variable must take on zero or one with probability one. Such a random variable has a Bernoulli distribution.
-
@iheartalgebra: No. First of all, Bernoulli is a distribution function, not a type of random variable. Second, the range of a random variable doesn't mean much. It's the *probability* of each of those outcomes that completely defines a random variable.
If a random variable gives us the number of successes of a 0-1 trial, then it does indeed have a Bernoulli distribution. Note that the variable itself has a range of all natural numbers.
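
To make the distinction in this exchange concrete (and as acknowledged elsewhere in the thread, a count of successes over several 0-1 trials is binomial, not Bernoulli), here is a minimal sketch using only the Python standard library. The helper names `bernoulli` and `binomial`, the seed, and the parameters p = 0.3 and n = 10 are illustrative assumptions, not anything taken from the lecture.

```python
import random

def bernoulli(p, rng):
    """One 0-1 trial: returns 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

def binomial(n, p, rng):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n))

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

# A Bernoulli variable only ever takes the values 0 and 1.
bern_samples = [bernoulli(0.3, rng) for _ in range(1000)]

# A binomial variable counts successes, so it ranges over 0..n.
binom_samples = [binomial(10, 0.3, rng) for _ in range(1000)]

print(sorted(set(bern_samples)))
print(min(binom_samples), max(binom_samples))
```

With n = 1 the binomial count reduces to a single Bernoulli trial, which is probably where the two get mixed up.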



Typically in machine learning, the likelihood function is expressed in terms of its logarithm. Since the logarithm is monotonic, the maxima and minima are in the same locations. When the negative log-likelihood is used instead, the signs are switched and maxima become minima, so whether a maximum or a minimum is meant is usually understood from context.
As for the problem of local minima, simple logistic sigmoid problems don't have them too often. For more sophisticated problems, though, they occur a lot, and approximate stochastic methods are used then.
netheron 2 years ago 8
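
A small numerical check of the point in the comment above about logarithms and signs. This is only a sketch under stated assumptions: the coin-flip data and the grid search are made up for illustration and are not from the lecture. It compares where the likelihood, the log-likelihood, and the negative log-likelihood of a Bernoulli parameter p attain their optimum.

```python
import math

# Hypothetical data: 5 ones and 3 zeros, so the maximum-likelihood
# estimate of p is 5/8 = 0.625.
data = [1, 0, 1, 1, 0, 1, 1, 0]

def likelihood(p):
    return math.prod(p if x == 1 else 1 - p for x in data)

def log_likelihood(p):
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

# Candidate values of p; the endpoints 0 and 1 are skipped to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]

best_l = max(grid, key=likelihood)                       # maximise L
best_ll = max(grid, key=log_likelihood)                  # maximise log L
best_nll = min(grid, key=lambda p: -log_likelihood(p))   # minimise -log L

print(best_l, best_ll, best_nll)
```

All three searches pick the same p: taking the log leaves the location of the optimum unchanged, and negating the log-likelihood turns the maximisation into a minimisation.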
The derivation of Newton's method shown there is for minimising f(theta).
Then suddenly f is replaced by L'(theta), in the hope that when L'(theta) = 0, L(theta) attains its maximum.
First off, the way it was shown, L(theta) should be minimised. But even if it can be maximised, how can a first derivative of zero be enough to tell that we have reached a local maximum? Couldn't it be a minimum, a saddle point, or a maximum?
saeedanwar77 3 years ago 4
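
On the question above: in general, L'(theta) = 0 only identifies a stationary point, and the sign of the second derivative is what classifies it. For the logistic regression log-likelihood, the second derivative is negative everywhere, so the function is concave and a stationary point is a maximum. The sketch below applies the Newton update theta := theta - L'(theta) / L''(theta) to a one-parameter logistic model. The data set, the absence of an intercept term, and the function names are assumptions made for illustration, not the lecturer's code.

```python
import math

# Hypothetical 1-D data, deliberately not separable so the maximiser is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(theta):
    """L'(theta) = sum over i of (y_i - h_i) * x_i, with h_i = sigmoid(theta * x_i)."""
    return sum((y - sigmoid(theta * x)) * x for x, y in zip(xs, ys))

def hess(theta):
    """L''(theta) = -sum over i of h_i * (1 - h_i) * x_i**2, negative for every theta."""
    return -sum(sigmoid(theta * x) * (1 - sigmoid(theta * x)) * x * x for x in xs)

theta = 0.0
for _ in range(50):
    step = grad(theta) / hess(theta)
    theta -= step  # Newton update applied to L', i.e. root finding on the gradient
    if abs(step) < 1e-12:
        break

print(theta, grad(theta), hess(theta))
```

At the returned theta the gradient is numerically zero and the second derivative is negative, which is the check that separates a maximum from a minimum. For a non-concave L the same update can converge to a minimum or a saddle point, so the second-derivative check would be needed.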