This is one very "robust" video. =p
eddyallen615 1 week ago
This video went viral in Vietnam
tysonstuart13 1 month ago
Very good topic: machine learning and statistical pattern recognition.
lovelplants 1 month ago
He looks like Spiderman.
grunder20 1 month ago
Expecting more videos from Ng. Keep it up!
grunder20 2 months ago
The answer is: it does not change! Oh, Professor...
szproxy 8 months ago
topics: Newton's method, more logistic regression, GLM
VancouverData 8 months ago
Professor Ng is indeed a great teacher. Thanks to Stanford University!!
railibra 8 months ago
The comment at about 47:14 is a little unclear. If there are only two classes, then you haven't made a "choice" to model the class variable as Bernoulli: any random variable with outcomes in {0,1} is Bernoulli, right?
iheartalgebra 9 months ago
@iheartalgebra: No. First of all, Bernoulli is a distribution function, not a random variable type. Second of all, the range of a random variable doesn't mean much. It's the *probability* of each of those outcomes that completely defines a random variable.
If a random variable gives us the number of successes of a 0-1 trial, then it does indeed have a Bernoulli distribution. Note that the variable itself has a range of all natural numbers.
MegaCrazyTaxiDriver 8 months ago
@MegaCrazyTaxiDriver Thanks for your reply. To your first point, I think it is common to refer to a random variable whose distribution function is Bernoulli as a Bernoulli random variable. I agree with your second point, but it does not speak to my question. Allow me to clarify: by "outcomes in {0,1}" I mean that P(X in {0,1}) = 1, i.e. the random variable must take on zero or one with probability one. Such a random variable has a Bernoulli distribution.
iheartalgebra 8 months ago
@iheartalgebra: I totally mistook the Bernoulli distribution for the binomial distribution, sorry.
May I ask you a question about this class, assuming you have continued studying it and maybe done some of its homework? Does it get more practical? I'm concerned that by the end of it I might be able to derive GLMs in my sleep but be unable to apply machine learning well in practice.
MegaCrazyTaxiDriver 8 months ago
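To make the distinction in this exchange concrete, here is a small sketch (the function names are ours, purely illustrative). A Bernoulli variable takes values in {0, 1} and is fully described by phi = P(X = 1); a binomial variable counts successes in n independent Bernoulli trials and ranges over {0, ..., n}, not over all natural numbers. With n = 1 the two coincide.

```python
from math import comb

def bernoulli_pmf(x: int, phi: float) -> float:
    """P(X = x) for X ~ Bernoulli(phi); zero outside {0, 1}."""
    if x not in (0, 1):
        return 0.0
    return phi ** x * (1.0 - phi) ** (1 - x)

def binomial_pmf(k: int, n: int, phi: float) -> float:
    """P(K = k) where K counts successes in n independent Bernoulli(phi) trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * phi ** k * (1.0 - phi) ** (n - k)

phi = 0.3
# A {0,1}-valued variable is pinned down by the single number phi = P(X = 1).
print([bernoulli_pmf(x, phi) for x in (0, 1)])
# With a single trial the binomial reduces to the Bernoulli.
print(all(abs(binomial_pmf(k, 1, phi) - bernoulli_pmf(k, phi)) < 1e-12 for k in (0, 1)))
# The binomial support is {0, ..., n}, and its probabilities sum to one.
print(sum(binomial_pmf(k, 5, phi) for k in range(6)))
```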
Useless lecturer. He can't even explain without his papers. How can these jerks get into Stanford? Very stupid lecture: he reads from paper, says "ahem", then proceeds. Cramming theory.
ictunsw2113 1 year ago
Chemistry + Machine Learning = QSPR + Fun :)
darfunkelidas 1 year ago
Slowly, it's making sense!
kapildalwani 1 year ago
It may look like he's lost in the woods, and the handwriting gets really bad, but then read the lecture notes (on his webpage): that's very clear and well-presented material, as far as I've seen. I think his preparation of those notes is excellent; I just can't read his scribbles, so going over the lecture notes becomes crucial.
rewtnode 1 year ago
@rewtnode Where can I find his lecture notes?
bullbunnies 1 year ago
He kind of lost the forest for the trees in this lecture. He went into elaborate detail about the exponential family of distributions without really making clear WHY we were doing this.
No disrespect to Prof Ng intended, though. As an educator myself, I appreciate how tricky it is to bring this much material together and see it through the eyes of the intended audience.
NorwalkPost 1 year ago
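One way to see why the exponential family is worth the detour: once a distribution is written as p(y; eta) = b(y) exp(eta T(y) - a(eta)) with T(y) = y, its mean is a'(eta), so the GLM recipe (assume eta = theta^T x, predict E[y | x]) yields a different familiar model for each distribution. The sketch below assumes the standard forms a(eta) = log(1 + e^eta) for the Bernoulli and a(eta) = eta^2 / 2 for a Gaussian with variance 1, and checks numerically that a'(eta) is the sigmoid and the identity respectively. The helper names are hypothetical.

```python
import math

def a_bernoulli(eta: float) -> float:
    # Log-partition function of the Bernoulli in natural-parameter form.
    return math.log1p(math.exp(eta))

def a_gaussian(eta: float) -> float:
    # Log-partition function of N(mu, 1) written with eta = mu.
    return 0.5 * eta * eta

def derivative(fn, x: float, h: float = 1e-6) -> float:
    # Central finite-difference approximation of fn'(x).
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for eta in (-2.0, 0.0, 1.5):
    # Bernoulli: E[y] = a'(eta) is the sigmoid, giving logistic regression when eta = theta^T x.
    print(eta, derivative(a_bernoulli, eta), sigmoid(eta))
    # Gaussian: E[y] = a'(eta) = eta, giving linear regression when eta = theta^T x.
    print(eta, derivative(a_gaussian, eta))
```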
lol, they ask pretty good questions, but he can't answer them.
This is a very superficial lecture...
pinochet222 1 year ago
Are the discussion sessions online?
gekorio 1 year ago
excellent work!
1888junkteam 2 years ago
It's awesome... one of the best lectures.
naughtyamit007 2 years ago
wtf is this guy on about, what a load of shite
crazybeautifulworld 2 years ago
The derivation of Newton's method shown there is for minimising f(theta).
Then suddenly, f is replaced by L'(theta) in the hope that when L'(theta) = 0, L(theta) attains its maximum.
First off, the way it was shown, L(theta) should be minimised. But even if it can be maximised, how can first derivative = 0 be enough to tell whether we have reached a local maximum? It could be a minimum, a saddle point or a maximum.
saeedanwar77 3 years ago 4
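A sketch of the point being raised, assuming the usual setup where Newton's method looks for a zero of f and f is set to L'. The update becomes theta := theta - L'(theta) / L''(theta), which finds stationary points of any kind; checking the sign of L'' afterwards tells you which kind you landed on. The objective L(theta) = theta - theta^3 / 3 is a hypothetical example chosen because it has both a maximum (theta = 1) and a minimum (theta = -1).

```python
def newton_stationary(dl, d2l, theta, iters=50, tol=1e-12):
    """Newton's method applied to f = dl: look for theta with dl(theta) = 0."""
    for _ in range(iters):
        step = dl(theta) / d2l(theta)  # assumes d2l(theta) != 0 along the way
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Hypothetical objective L(theta) = theta - theta**3 / 3.
dl = lambda t: 1.0 - t * t   # first derivative L'
d2l = lambda t: -2.0 * t     # second derivative L''

for start in (2.0, -2.0):
    t = newton_stationary(dl, d2l, start)
    kind = "local max" if d2l(t) < 0 else "local min" if d2l(t) > 0 else "inconclusive"
    print(f"start={start:+.1f} -> theta={t:+.6f} ({kind})")
```

Starting from 2.0 the iteration reaches the maximum at theta = 1; starting from -2.0 it reaches the minimum at theta = -1, so L'(theta) = 0 alone does not settle the question.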
Typically in machine learning, the likelihood function is expressed in terms of its logarithm. Since the logarithm is monotonically increasing, the log-likelihood has its maxima in the same locations as the likelihood; the sign is often flipped as well (minimising the negative log-likelihood), so whether we want a maximum or a minimum is usually understood from context.
As for the problem of local optima, the logistic regression log-likelihood is concave, so it has no spurious local maxima. For more sophisticated problems, though, local optima happen a lot, and approximate stochastic methods are used then.
netheron 2 years ago 8
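To illustrate the concavity point, here is a minimal sketch of Newton's method for logistic regression, assuming NumPy and a tiny hypothetical dataset with an intercept column. The log-likelihood's Hessian is -X^T diag(h(1 - h)) X, which is negative semidefinite, so a point where the gradient vanishes is a global maximum (when one exists, i.e. the classes are not perfectly separable).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    h = sigmoid(X @ theta)
    return float(np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)))

def newton_logistic(X, y, iters=20, tol=1e-10):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                   # gradient of the log-likelihood
        hess = -(X.T * (h * (1 - h))) @ X      # Hessian: -X^T diag(h(1-h)) X
        step = np.linalg.solve(hess, grad)
        theta = theta - step                   # theta := theta - H^{-1} grad
        if np.linalg.norm(step) < tol:
            break
    return theta

# Hypothetical toy data; the first column is the intercept term x_0 = 1.
# The classes overlap, so a finite maximiser exists.
X = np.array([[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0], [1, 2.5], [1, 3.0]])
y = np.array([0, 0, 1, 0, 1, 1], dtype=float)

theta = newton_logistic(X, y)
h = sigmoid(X @ theta)
print("theta:", theta)
print("gradient norm:", np.linalg.norm(X.T @ (y - h)))
print("Hessian eigenvalues:", np.linalg.eigvalsh(-(X.T * (h * (1 - h))) @ X))
```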
I always wonder why Newton's method converges once it is close enough... If there is a point where f'(theta) is zero or near zero, the next point can land far away.
Scutchris 3 years ago
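The worry is justified: Newton's method converges quickly only once the iterate is near a root where f' is nonzero, and a nearly flat tangent far from the root produces a huge step. A classic illustration, sketched below with a helper name of our own, is f(theta) = arctan(theta), whose only root is 0: starting at 0.5 converges, while starting at 2.0 each step overshoots further. Damped or line-search variants are a common remedy, not shown here.

```python
import math

def newton_root(f, fprime, x0, max_iter=50, tol=1e-12, blowup=1e6):
    """Return the Newton iterates x0, x1, ... for solving f(x) = 0."""
    xs = [x0]
    x = x0
    for _ in range(max_iter):
        slope = fprime(x)
        if slope == 0.0:
            break  # flat tangent: the Newton step is undefined
        x_new = x - f(x) / slope
        xs.append(x_new)
        if abs(x_new - x) < tol or abs(x_new) > blowup:
            break  # converged, or clearly running away
        x = x_new
    return xs

f = math.atan                             # root at x = 0
fprime = lambda x: 1.0 / (1.0 + x * x)    # derivative of atan; tends to 0 as |x| grows

near = newton_root(f, fprime, 0.5)
far = newton_root(f, fprime, 2.0)
print("start 0.5:", near[-1], "after", len(near) - 1, "steps")
print("start 2.0:", [round(v, 1) for v in far[:5]], "...")
```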
0:46:50
It is not a proof that the param = theta^T x;
rather, the param was defined as theta^T x previously.
lqk1985 3 years ago
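A sketch of which part is derived and which part is assumed, using helper names of our own. Writing Bernoulli(phi) in exponential-family form gives the natural parameter eta = log(phi / (1 - phi)), hence phi = 1 / (1 + e^(-eta)); that step is derived. Setting eta = theta^T x is the GLM design assumption, and combining the two gives the logistic hypothesis. The code checks that exp(eta y - log(1 + e^eta)) reproduces phi^y (1 - phi)^(1 - y).

```python
import math

def bernoulli_direct(y: int, phi: float) -> float:
    return phi ** y * (1.0 - phi) ** (1 - y)

def bernoulli_expfam(y: int, phi: float) -> float:
    eta = math.log(phi / (1.0 - phi))   # natural parameter (derived from phi)
    a = math.log1p(math.exp(eta))       # log-partition a(eta); here b(y) = 1, T(y) = y
    return math.exp(eta * y - a)

def hypothesis(theta, x):
    # GLM design choice (assumed, not derived): eta = theta^T x.
    eta = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-eta))  # phi = E[y | x] = sigmoid(eta)

for phi in (0.1, 0.5, 0.8):
    for y in (0, 1):
        assert abs(bernoulli_direct(y, phi) - bernoulli_expfam(y, phi)) < 1e-12

print(hypothesis([-1.0, 2.0], [1.0, 0.75]))
```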
first post lol
11111lololol11111 3 years ago