Lecture 3 | Machine Learning (Stanford)

Lecture by Professor Andrew Ng for Machine Learning (CS 229) in the Stanford Computer Science department. Professor Ng delves into locally weighted regression, the probabilistic interpretation of linear regression, and logistic regression, and how they relate to machine learning.

This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing, are also discussed.

30 comments

  1. jlastre says:

    Typically when normalizing you subtract the mean of y, not the mean of the
    assumed errors (which is 0). If you are having trouble visualizing why we
    can assume the errors are i.i.d. N(0, sigma^2), I suggest looking up the
    Central Limit Theorem or the quincunx (bean machine).
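
    A minimal sketch of that Central Limit Theorem intuition, assuming a NumPy
    environment (the names and constants here are illustrative, not from the
    lecture): summing many small, independent, non-Gaussian effects produces an
    approximately Gaussian error term.

        import numpy as np

        rng = np.random.default_rng(0)
        n_effects = 50        # unobserved additive effects per example
        n_samples = 100000    # simulated error terms

        # Each effect is uniform (clearly non-Gaussian); their sum is
        # approximately normal by the Central Limit Theorem.
        errors = rng.uniform(-0.5, 0.5, size=(n_samples, n_effects)).sum(axis=1)

        z = (errors - errors.mean()) / errors.std()
        print(errors.mean())    # close to 0
        print(errors.std())     # close to sqrt(n_effects / 12)
        print((z ** 3).mean())  # skewness, close to 0
        print((z ** 4).mean())  # kurtosis, close to 3 (the Gaussian value)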

  2. groovANDsooth says:

    Andrew's voice is my ASMR trigger; I love to listen to him! Other than
    that, I don't know what the hell he is talking about…

  3. Jinchuan Tang says:

    It's not a Gaussian function, but a bell-shaped function (even though a
    Gaussian is also bell-shaped). Check the Wikipedia definition of the normal
    distribution (just view it as a function, not a distribution) and note that
    the 1/sqrt(2*pi) factor is missing here. To tactics40's comment: x is
    fixed, x_i is changing. Think of it another way, at 22:29: if it were a
    Gaussian, then stretching it would scale the peak down or up in order to
    keep the whole integral equal to 1. But the intention here is to make the
    peak equal to 1 when x_i is very close to x, so a Gaussian could not be
    used unless you want x_i to behave differently close to x. (Also, at
    22:11 he erased a Gaussian-shaped curve and redrew another bell-shaped
    curve with the same peak height.)
    When doing research, you may find that math is the most beautiful and
    convincing way to persuade people to understand a question or agree with a
    paper (you may not agree, but that's fine).
    Undergraduate calculus (e.g., MIT's online lectures), linear algebra, and
    probability are all the knowledge you need; just find some good online
    videos to learn from.
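
    For reference, a side-by-side of the two functions being contrasted,
    written in LaTeX with tau the bandwidth parameter from the lecture: the
    locally weighted regression weight and a true Gaussian density differ only
    by the normalizing constant.

        w^{(i)} = \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)
        \qquad \text{(peaks at 1 when } x^{(i)} = x \text{; not normalized)}

        p(x^{(i)}) = \frac{1}{\sqrt{2\pi}\,\tau}
                     \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)
        \qquad \text{(integrates to 1; peak height depends on } \tau \text{)}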

  4. E B says:

    27:00 Expanding on that guy's question: given a large dataset of
    relatively close points, if you're asked to predict for a given x, why not
    simply take the two closest observations on either side and do linear
    interpolation between them? As the distance between the observations
    shrinks, you end up with something very closely approximating the observed
    curve. You can do this with a lookup table, which would be computationally
    very fast to return an answer.
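
    A minimal sketch of the two approaches for 1-D data, assuming NumPy (the
    example data, function names, and bandwidth are illustrative, not from the
    lecture): linear interpolation reproduces the noisy observations exactly,
    while locally weighted regression averages over nearby points.

        import numpy as np

        def interp_predict(x_query, xs, ys):
            # Linear interpolation between the nearest observations on either side.
            return np.interp(x_query, xs, ys)  # xs must be sorted ascending

        def lwr_predict(x_query, xs, ys, tau=0.5):
            # Locally weighted linear regression at a single query point.
            X = np.column_stack([np.ones_like(xs), xs])         # intercept + feature
            w = np.exp(-(xs - x_query) ** 2 / (2 * tau ** 2))   # bell-shaped weights
            # Weighted normal equations: theta = (X^T W X)^{-1} X^T W y
            theta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ ys)
            return theta[0] + theta[1] * x_query

        rng = np.random.default_rng(0)
        xs = np.linspace(0, 10, 50)
        ys = np.sin(xs) + 0.1 * rng.normal(size=xs.size)
        print(interp_predict(3.3, xs, ys), lwr_predict(3.3, xs, ys))

    One practical difference: interpolation passes through every noisy
    observation, while the weighted fit smooths the noise; both still need the
    full training set at prediction time.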

  5. Benoit Descamps says:

    The people in that lecture look really old (around 25-30), although the
    level of the lecture seems like bachelor level…

  6. Benoit Descamps says:

    In which lecture does the machine learning part begin? The lectures so far
    were basic statistics and numerical methods…

  7. Level Icarus says:

    46:21 Where did the plus come from in (y(i) + theta transpose * x(i))^2?
    Shouldn't that be a minus?
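
    For reference, the least-squares cost function used in the course notes
    has a difference inside the square:

        J(\theta) = \frac{1}{2} \sum_{i=1}^{m}
                    \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
        \qquad h_\theta(x) = \theta^T x

    so the inner term is y^(i) - theta^T x^(i) (or its negation, which squares
    to the same thing).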

  8. Vyas Sathya says:

    He promised in lecture one that no one in the class would be shown on
    camera. 4:20, gotcha Andrew Ng…

  9. James Pollard says:

    13:50 Using capital X as the label for the X axis, lower case x as a query
    point, and crosses as points on the graph… Nice one, Doc.

  10. James Pollard says:

    34:40 I gather that the central limit theorem wouldn't apply if one had a
    bunch of effects for which its assumptions don't hold (for example, for
    housing prices, ZIP code). Could the Gaussian curve be used as a measuring
    stick to detect measurement problems?

  11. Rohit says:

    Lecture1: Machine Learning is so cool
    Lecture2: Okay! Let me understand this.
    Lecture3: FML.

  12. welcomehelloJ says:

    I love these videos… the students ask many questions, which clarifies a
    good deal of things… so far the best set of lectures I have seen… thank
    you Stanford!

  13. Akshat says:

    If I have to predict each value individually in locally weighted
    regression, then how do I plot the overall graph of the fit?
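
    One common answer, sketched below under the assumption that a per-point
    predictor such as the hypothetical lwr_predict (and its example xs, ys
    arrays) from the sketch under comment 4 above is available: evaluate the
    prediction on a dense grid of query points and connect the results.

        import numpy as np
        # Assumes lwr_predict, xs, ys from the earlier sketch are in scope.
        grid = np.linspace(xs.min(), xs.max(), 200)                 # query points
        curve = np.array([lwr_predict(xq, xs, ys) for xq in grid])
        # Each grid point gets its own weighted fit; plotting (grid, curve)
        # as a line, e.g. plt.plot(grid, curve), gives the overall graph.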

  14. Computer Science Through Real Problems & Solutions says:

    Only formulas? More time should be spent on explaining the concepts: why
    are we squaring, why taking roots, why epsilon? Every lecture should start
    with problems and solve them using concepts first and then mathematics; it
    looks like it is the reverse here.

  15. Akshat says:

    I don't understand the part where he says [theta] is not a random
    variable. I mean, aren't we changing the values of [theta] and obtaining
    the probability density of the error? And doesn't that make [theta] a
    random variable?
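
    For reference, the notation from the lecture: the density is written
    p(y^(i) | x^(i); theta), where the semicolon reads "parameterized by
    theta". The likelihood below is a function of theta, but theta itself is
    treated as a fixed unknown constant rather than a random variable (the
    frequentist view taken in the course; a Bayesian treatment would instead
    put a distribution over theta).

        L(\theta) = \prod_{i=1}^{m} p\left( y^{(i)} \mid x^{(i)}; \theta \right)
                  = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}
                    \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)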
