Scikit Learn Machine Learning Tutorial with Python p. 7

In this part of the scikit-learn machine learning tutorial with Python, we cover how to grab S&P 500 index data so that we can use the market's performance as a benchmark for each stock's price performance. This lets us label a stock as having been a good buy or not, based on whether it outperformed the market.
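
To make the labelling idea concrete, here is a minimal sketch that compares a stock's percent change with the index's percent change over the same period. The function names and the "outperform"/"underperform" label strings are illustrative assumptions, not code from the video.

```python
def percent_change(start, end):
    """Percent change from start to end."""
    return (end - start) / start * 100.0


def label_stock(stock_start, stock_end, sp500_start, sp500_end):
    """Label a stock by whether its percent change beat the index's."""
    stock_change = percent_change(stock_start, stock_end)
    sp500_change = percent_change(sp500_start, sp500_end)
    # Strictly greater: merely matching the market does not count as beating it.
    return "outperform" if stock_change > sp500_change else "underperform"


# Made-up prices: the stock gains 10% while the index gains 5%.
print(label_stock(100.0, 110.0, 2000.0, 2100.0))
```

Comparing percent changes rather than raw prices is what makes a stock trading near $100 comparable to an index near 2,000.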

We’re going to use to get S&P 500 data.
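
As a rough sketch of the lookup step, the snippet below reads index closes from a CSV and finds the value for a given Unix timestamp, stepping back one day at a time (up to three days) when the date falls on a weekend or holiday. It uses only the standard library, whereas the tutorial's own script works with a pandas DataFrame (sp500_df in the comments below). The sample rows are made up, the function names are hypothetical, and the two column names are the ones readers report in the comments. Timestamps are interpreted as UTC here so the result does not depend on the local time zone.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Tiny made-up sample standing in for the downloaded S&P 500 CSV.
SAMPLE_CSV = """Date,Adj Close
2014-01-03,1800.0
2014-01-06,1810.5
"""


def load_index_closes(handle):
    """Map 'YYYY-MM-DD' to the adjusted close, accepting either column name."""
    closes = {}
    for row in csv.DictReader(handle):
        # Readers report the column as both "Adj Close" and "Adjusted Close".
        raw = row.get("Adj Close") or row.get("Adjusted Close")
        if raw:
            closes[row["Date"]] = float(raw)
    return closes


def index_value_near(closes, unix_time, max_days_back=3):
    """Return (date, close) for that day or the nearest earlier day in range."""
    day = datetime.fromtimestamp(unix_time, tz=timezone.utc).date()
    for offset in range(max_days_back + 1):
        key = (day - timedelta(days=offset)).strftime("%Y-%m-%d")
        if key in closes:
            return key, closes[key]
    return None  # no trading day found within the window


closes = load_index_closes(io.StringIO(SAMPLE_CSV))
sunday = datetime(2014, 1, 5, tzinfo=timezone.utc).timestamp()
print(index_value_near(closes, sunday))
```

Returning None when no nearby trading day exists lets the caller skip that data point explicitly instead of relying on an exception.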


  1. Callum Ryan says:

    Hi Sentdex, thanks for the tutorial. When I run the Python script, nothing
    happens other than printing "Total Debt/Equity mrq". When I then look at
    the CSV file, it contains only the column headers, and the first column
    header is blank, followed by Date (in Excel A2), Unix, Tick, DE Ratio. Do
    you know what could be wrong? I've read over the script a number of times
    now and see no differences. I'm on Python 2.7, by the way. Thanks.

  2. Jordan says:

    Hey Sentdex, when I saved the CSV file in the previous video, I'm not sure
    where it was saved to. I'm also not sure where to put the YAHOO INDEX GSPC
    file: it's saying the file doesn't exist, so I think I have it in the
    wrong folder.

  3. Jordan says:

    But I think I just got it. The initial CSV was saved to the Python34
    folder, and for it to work the YAHOO INDEX GSPC file had to be in the same
    Python34 folder.

  4. Jakub Nowak says:

    Hi Sentdex. Thanks very much for this introduction to the machine learning.
    I am enjoying it very much and can’t wait to get the next part.

    I have a question/comment about one bit of the code you wrote in this
    part. When you define stock_price, I am not sure you are getting all
    possible values from the dataset.

    stock_price =
    Basically after compiling the above I noticed that not all tickers are
    showing up with stock_price and when I looked in their webpage source code
    I noticed that in some cases the stock price is not directly preceded
    by ‘‘ phrase.

    Check for instance AON page source code.
    It looks like that:
    so whatever you are passing to float contains the string '46.12'.
    I am not sure if this causes a problem with returning the stock_price
    value for AON, but it is completely missing, at least in my case.

    Maybe I have it wrong so thanks for clarification. Otherwise is there an
    easy fix for that problem?



  5. Taylor Dye says:

    When I print the stock_price and ticker, the entire file path is printed
    with the stock ticker at the end, instead of just the ticker. How do I
    print just the ticker (e.g. a, aapl, etc.)? Do I need to modify the
    source? My code is identical to yours.

    Running: Python 2.7 via PyCharm on Mac OS Yosemite

  6. Praful k says:

    Hey! Is it normal for meshing the data to take so long? I'm trying to get
    the S&P values and other feature data together in the same file. The
    meshing is taking a long time (like a day). Is that a lot?


    Man you are genius!! I like your way of teaching by getting our hands
    dirty. Thanks for posting such awesome videos!!!!!!!

  8. Π. ΚΑΡΑΠΑΣ says:

    1) I really love your tutorials. 2) I can't really understand what this
    line does: sp500_value = float(row["Adjusted Close"])

  9. SantaUPSB says:

    The line:
    sp500_value = float(row["Adjusted Close"])

    should be changed to:
    sp500_value = float(row["Adj Close"])

    This was preventing the dataframe from being appended onto, leaving me with
    a blank csv file. Otherwise, great tutorial!

  10. Brian Lambert says:

    So my stock_price list is printing correctly; however, it only returns
    values up to "stock_price: 1.67 ticker: aig", and I am not receiving an
    error message.

  11. 0xtech says:

    As of now, the "Adjusted Close" column has been renamed to "Adj Close", so
    this line becomes:
    sp500_value = float(row["Adj Close"])

  12. Marco Amardeep Singh says:


    It seems that a lot of the prices in the HTML do not occur between the
    ” and “ code”.
    This causes errors. The quick fix would be to move the "stock_price =
    float(…)" line to just after the "value = float(…)" part.
    This way, an error raised while loading the price makes the code skip the
    "try: sp500_date …. except: …" part.

    A better way would be to search for two different types of prices, i.e.
    one between the existing code and one that looks for the price in
    id="yfs_l84_act">119.86. I haven't figured out how to do this yet.

  13. mmuuuuhh says:

    I totally have no background on shares and trading. So could someone
    repeat/explain for me, what we are going to analyze here?

    Ticker := abbreviation for the company
    DE Ratio := ???
    Stock_Price := How much one share of that company costs
    SP500 := Average value of 500 shares

  14. Jayaprakash Subramaniam says:

    value = float(source.split(gather+’:



    sp500_date = datetime.fromtimestamp(unix_time).strftime('%Y-%m-%d')
    row = sp500_df[(sp500_df.index == sp500_date)]
    sp500_value = float(row["Adjusted Close"])
    sp500_date = datetime.fromtimestamp(unix_time-259200).strftime('%Y-%m-%d')
    row = sp500_df[(sp500_df.index == sp500_date)]
    sp500_value = float(row["Adjusted Close"])

    stock_price =
    print("stock_price:", stock_price, "ticker:", ticker)

    I don't know why the values of "stock_price, date_stamp, unix_time, ticker,
    value, stock_price, sp500_value" are not passing after this try. I tried
    the stock_price before the 2nd try and it works fine, but not after it.
    I couldn't figure out why.
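
Several comments above come down to the stock price sitting behind different markup on different pages. One possible approach, sketched here under assumptions, is to try a list of candidate patterns in order and return None when none match. Both patterns are hypothetical: the first stands in for whatever marker the script already splits on (it is not shown in this thread), and the second is modelled on the id="yfs_l84_act" fragment quoted in comment 12. The function name extract_price is also an assumption.

```python
import re

# Hypothetical patterns; the real page markup is not shown in this thread.
PRICE_PATTERNS = [
    r'class="price">\s*([0-9][0-9,]*\.?[0-9]*)',
    r'id="yfs_l84_[^"]*">\s*([0-9][0-9,]*\.?[0-9]*)',
]


def extract_price(source, patterns=PRICE_PATTERNS):
    """Return the first price matched by any pattern, or None if none match."""
    for pattern in patterns:
        match = re.search(pattern, source)
        if match:
            # Drop thousands separators before converting to float.
            return float(match.group(1).replace(",", ""))
    return None


print(extract_price('<span id="yfs_l84_act">119.86</span>'))
print(extract_price("no price on this page"))
```

Checking for None and skipping that ticker explicitly makes a missing price visible, whereas a bare except around the whole block hides which step failed.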

Comments are closed.