Added: 3 years ago
From: StanfordUniversity
Views: 48,171

All Comments (30)

  • This video went viral in Vietnam

  • topics: Newton's method, more logistic regression, GLM

  • Professor Ng is indeed a great teacher. Thanks to Stanford University!

  • The comment at about 47:14 is a little unclear. If there are only two classes, then you haven't made a "choice" to model the class variable as Bernoulli: any random variable with outcomes in {0,1} is Bernoulli, right?

  • @iheartalgebra: No. First of all, Bernoulli is a distribution function, not a random variable type. Second of all, the range of a random variable doesn't mean much. It's the *probability* of each of those outcomes that completely defines a random variable.

    If a random variable gives us the number of successes of a 0-1 trial, then it does indeed have a Bernoulli distribution. Note, the variable itself has a range of all Natural Numbers.

  • @MegaCrazyTaxiDriver Thanks for your reply. To your first point, I think it is common to refer to a random variable whose distribution function is Bernoulli as a Bernoulli random variable. I agree with your second point, but it does not speak to my question. Allow me to clarify: by "outcomes in {0,1}" I mean that P(X in {0,1}) = 1, i.e. the random variable must take on zero or one with probability one. Such a random variable has a Bernoulli distribution.
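
    The claim above can be checked directly. This is a minimal sketch, assuming an illustrative distribution (the numbers 7/10 and 3/10 and the names `bernoulli_pmf` and `dist` are made up for the example): once P(X in {0,1}) = 1, the single number p = P(X = 1) pins down the whole distribution, and it coincides with the Bernoulli(p) mass function.

```python
from fractions import Fraction

def bernoulli_pmf(p, x):
    """PMF of Bernoulli(p): P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        return Fraction(0)
    return p**x * (1 - p)**(1 - x)

# Hypothetical distribution of a random variable with outcomes in {0, 1}.
dist = {0: Fraction(7, 10), 1: Fraction(3, 10)}
assert sum(dist.values()) == 1  # P(X in {0, 1}) = 1

# The only free parameter is p = P(X = 1).
p = dist[1]

# The given distribution agrees with Bernoulli(p) on every outcome.
matches = all(bernoulli_pmf(p, x) == dist[x] for x in (0, 1))
```

    Exact fractions are used so the comparison does not depend on floating-point rounding.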

  • @iheartalgebra: I have totally mistaken Bernoulli Distribution for a Binomial Distribution, sorry.

    May I ask you a question about this class, assuming you have continued studying it and maybe done some of its homework? Does it get more practical? I'm concerned that by the end of it, I might be able to derive GLMs in my sleep, but be unable to apply machine learning in practice very well.

  • Chemistry + Machine Learning = QSPR + Fun :)

  • Slowly it's making sense!

  • It may look like he's lost in the woods, and the handwriting gets really bad, but then read the lecture notes (on his webpage): that is very clear and well-presented material as far as I've seen. I think his preparation is excellent when writing those notes, I just can't read his scribbles, so going over the lecture notes becomes crucial.

  • @rewtnode Where can I find his lecture notes?

  • He kind of lost the forest for the trees in this lecture. He went into elaborate detail about the exponential family of functions without really making clear WHY we were doing this.

    No disrespect to Prof Ng intended, though. As an educator myself, I appreciate how tricky it is to bring this much material together and see it through the eyes of the intended audience.

  • lol they ask pretty good questions, but he can't answer.

    this is a very superficial lecture .....

  • Are the discussion sessions online?

  • excellent work!

  • It's awesome... one of the best lectures.

  • The derivation of Newton's method shown there is for minimising f(theta).

    Then suddenly, f is replaced by L'(theta) in the hope that when L'(theta)=0, L(theta) attains its maximum.

    First off, the way it was shown, L(theta) should be minimised. But even if it can be maximised, how can first derivative=0 be enough to tell whether we have reached a local maximum? Couldn't it be a minimum, a saddle point, or a maximum?
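
    One way to look at this question: the update theta := theta - L'(theta)/L''(theta) only searches for a stationary point, and the sign of L'' at that point is what separates a maximum from a minimum. The sketch below is an illustration under assumptions, not the lecture's code: the objective L(theta) = theta - exp(theta) and the helper names are hypothetical. It also shows that negating the objective leaves the iterates unchanged, since the sign cancels in L'/L''.

```python
import math

def newton_stationary(d1, d2, theta, steps=20, tol=1e-12):
    """Find a zero of the first derivative d1 via theta := theta - d1/d2."""
    for _ in range(steps):
        step = d1(theta) / d2(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

def classify(d2, theta):
    """Second-derivative test at a stationary point."""
    curvature = d2(theta)
    if curvature < 0:
        return "maximum"
    if curvature > 0:
        return "minimum"
    return "inconclusive"

# Hypothetical concave objective L(theta) = theta - exp(theta),
# whose only stationary point is the maximum at theta = 0.
dL = lambda t: 1.0 - math.exp(t)   # L'(theta)
d2L = lambda t: -math.exp(t)       # L''(theta), negative everywhere

theta_hat = newton_stationary(dL, d2L, theta=1.0)
kind = classify(d2L, theta_hat)

# Negating the objective (minimising -L) gives the same stationary point,
# now classified as a minimum.
theta_neg = newton_stationary(lambda t: -dL(t), lambda t: -d2L(t), theta=1.0)
kind_neg = classify(lambda t: -d2L(t), theta_neg)
```

    When L'' has the same sign everywhere, as in this example, every stationary point is of one kind, so the first-derivative condition is enough; otherwise the curvature has to be checked separately.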

  • In machine learning the likelihood function is typically expressed in terms of its logarithm. Since the logarithm is monotonic, the likelihood and the log-likelihood have their maxima in the same location. Many texts minimise the negative log-likelihood instead, which switches the sign, so whether a maximum or a minimum is meant is usually understood from context.

    As for the problem of local optima, simple logistic sigmoid problems rarely have them. For more sophisticated problems, though, they come up a lot, and approximate stochastic methods are used then.

  • I always wonder why Newton's method can converge once it is close enough... If there is a point where f'(theta) is zero or near zero, then the next point can fall far away.
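
    Both behaviours can be seen on one function. The sketch below is an illustration only: the choice f(theta) = atan(theta) and the starting points 0.5 and 2.0 are assumptions picked for the example. Near the root at 0 the derivative is close to 1, so the steps are well behaved; far from the root the derivative 1/(1 + theta^2) is small, and each step overshoots further than the last.

```python
import math

def newton_iterates(f, df, theta, steps):
    """Return the list of Newton iterates for solving f(theta) = 0."""
    history = [theta]
    for _ in range(steps):
        theta = theta - f(theta) / df(theta)
        history.append(theta)
    return history

f = math.atan                          # root at theta = 0
df = lambda t: 1.0 / (1.0 + t * t)     # derivative, tends to 0 as |t| grows

near = newton_iterates(f, df, theta=0.5, steps=5)  # starts close to the root
far = newton_iterates(f, df, theta=2.0, steps=5)   # starts where f' is small

for label, history in (("near", near), ("far", far)):
    print(label, [f"{t:.3g}" for t in history])
```

    The usual local convergence guarantee assumes the derivative is nonzero at the root itself, which keeps it away from zero in a small neighbourhood; that is the sense in which "close enough" rules out the runaway steps.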

  • 0:46:50

    It is not proved that the parameter = theta^T x;

    rather, the parameter was defined as theta^T x earlier.
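
    Assuming the parameter in question is the natural parameter eta of the exponential family (the comment only says "pram"), the step can be written out for the Bernoulli case. In this sketch the relation eta = theta^T x is marked as a design choice rather than something derived, and the sigmoid hypothesis follows from it:

```latex
% Bernoulli(\phi) written in exponential-family form
p(y;\phi) = \phi^{y}(1-\phi)^{1-y}
          = \exp\!\Big( y \log\frac{\phi}{1-\phi} + \log(1-\phi) \Big),
\qquad
\eta = \log\frac{\phi}{1-\phi}
\;\Longleftrightarrow\;
\phi = \frac{1}{1+e^{-\eta}} .

% Design choice (assumed, not derived): \eta = \theta^{T} x
h_\theta(x) = \mathbb{E}[\,y \mid x;\theta\,] = \phi
            = \frac{1}{1+e^{-\theta^{T}x}} .
```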
