Scikit Learn Machine Learning Tutorial for investing with Python p. 5

In this video, we build on the previous machine learning with scikit-learn tutorial and pull out the specific data point that we’re interested in using as a feature.

18 comments

  1. Ancient Entity New Channel says:

    A while ago I requested a video, and I was wondering whether it will come
    up on your list soon. It was about taking screenshots and compiling the
    screenshots into a video. I also have a new request: a screen recorder.

  2. Trog Lobyte says:

    Love your vids and appreciate them very much. Is it possible that you can
    give a 3 second pause between code runs? I regularly try and pause at just
    the right time and run mine against yours. I run Fedora and the line
    ticker = each_dir.split("\\")[1] fails, clearly the difference of running
    Windoze vs NixNux. Are you able to point out OS differences in the future?

  3. Rocco Cocolombo says:

    To access the ticker, should it not be:

    ticker = each_dir.split("/")[-1]  # -1 is the last field
    or
    ticker = each_dir.split("\\")[-1]
    ?
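
The separator problem raised in the comments above can be sidestepped by normalising the path before splitting it. This is a minimal sketch, not the tutorial's own code: `ticker_from_dir` is a hypothetical helper name, and it assumes the ticker is always the last folder in `each_dir`.

```python
def ticker_from_dir(each_dir):
    """Return the last path component, whichever separator the path uses."""
    # Turn Windows-style backslashes into forward slashes and drop any
    # trailing separator, so one split works on Windows, Linux and macOS.
    normalised = each_dir.replace("\\", "/").rstrip("/")
    # Index -1 is the last field, i.e. the ticker folder name.
    return normalised.split("/")[-1]


print(ticker_from_dir("intraQuarter/_KeyStats/aapl"))
print(ticker_from_dir("C:\\data\\intraQuarter\\_KeyStats\\aapl"))
```

Taking the last field rather than index 1 also avoids depending on how many folders precede `_KeyStats` in the path.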

  4. Tony Fly says:

    Hello, Harrison.
    Thanks for the tutorial. I found that my code cannot get all of the ‘Total
    Debt/Equity’ values.
    After getting some of the values, it starts to throw an error [IndexError:
    list index out of range].

    I checked the HTML source code. The standard form we are searching for
    should be:

    Total Debt/Equity
    (mrq):0

    But there are some exceptions in the source code:

    intraQuarter/_KeyStats/aapl/20060207091730.html:

    Total Debt/Equity (mrq):0

    intraQuarter/_KeyStats/aee/20090221005651.html:

    Total Debt/Equity (mrq):N/A

    Would you show us some Beautiful Soup skills to get around it? Thank you.

  5. moeabdol says:

    There is a much better way of parsing the data using the ‘requests’ and
    ‘BeautifulSoup’ Python libraries. They are super easy to learn and use.
    Cheers

  6. hanlonko1 says:

    I am not sure your code works on Macs. I watched 24 episodes and then
    started over again, this time executing your code lesson by lesson. I am
    stopped cold in this lesson, as I keep getting an error with the date_stamp
    when it gets to aapl. I have cut and pasted your code and get the same
    error. The error reads: ValueError: time data ‘.DS_Store’ does not match
    format ‘%Y%m%d%H%M%S.html’
    Any ideas?

  7. Nick Duddy says:

    Hi +sentdex. I’m a total beginner to Python and the most I’ve ever coded is
    HTML. So, I’m a super newbie. I’m running the script but not seeing any
    output when I print, nor do I see any errors. I’m a bit confused as to
    what’s going wrong :S

  8. Mon Baroi says:

    Tip for people who are using Mac operating systems…
    For the “ticker = each_dir.split” snippet of code, what worked for me was
    going through my entire file directory until I got to the ticker name.
    So it kind of looked like this:

    ticker =
    each_dir.split('/Users/UserName/Desktop/intraQuarter/_KeyStats/')[1]

    Hopefully this will help some folks who might be stuck on Mac computers 🙂
    @sentdex I’m liking the tutorials so far! Great job, FYI.

  9. 何念泰 says:

    Still shows out of range
    ticker =
    each_dir.split("/Users/xxx/Desktop/Coding/Python/MachineLearningStockData/intraQuarter/_KeyStats/")[1]
    IndexError: list index out of range

  10. Dan Fisher says:

    For Mac users, ValueError: time data ‘.DS_Store’ does not match format
    ‘%Y%m%d%H%M%S.html’ is due to Mac OS automatically creating .DS_Store files
    for each folder. They are hidden but the Python script includes them. If
    you run into this error, all you need to do is delete the .DS_Store file.
    Search “recursively remove .DS_Store files” for instructions.
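
An alternative to deleting the hidden files is to have the script skip any file name that does not match the timestamp format. The sketch below assumes the `'%Y%m%d%H%M%S.html'` format quoted in the error message; `parse_date_stamp` and the sample file list are hypothetical names used only for illustration.

```python
from datetime import datetime


def parse_date_stamp(filename):
    """Return the datetime encoded in the file name, or None if it does not fit."""
    try:
        return datetime.strptime(filename, "%Y%m%d%H%M%S.html")
    except ValueError:
        # Raised for anything that is not a timestamped snapshot,
        # such as the hidden .DS_Store files that macOS creates.
        return None


# Hypothetical directory listing: one hidden file and one real snapshot.
for name in [".DS_Store", "20060207091730.html"]:
    date_stamp = parse_date_stamp(name)
    if date_stamp is None:
        continue  # skip files that are not snapshots
    print(name, date_stamp)
```

Because the check is on the file name itself, it keeps working even if macOS recreates `.DS_Store` later.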

  11. Kenneth Nielsen says:

    Hi (and thanks for all these really nice tutorials), it seems that there is
    something iffy with the aapl ticker for the file named 20060203134959.html
    (and others as well). Using
    source.split(gather+':[…]')[1].split('[…]')[0]
    results in a “list index out of range” error. I did ctrl+U on it, and it
    seems that the line is cut off after […]. I did a hack to circumvent it,
    which is:
    try:
        value = source.split(gather+':[…]')[1].split('[…]')[0]
    except Exception as e:
        print(str(e))
        value = float('nan')
    but it is not a very good hack, since the value should be 0.
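
The missing and malformed values described in the comments above can also be handled with a tolerant regular expression instead of chained `split` calls. This is a rough sketch rather than the tutorial's method, and it uses the standard `re` module, not Beautiful Soup. The exact markup around the label is not preserved in this thread, so the sample strings below are invented stand-ins, and `extract_stat` is a hypothetical helper name.

```python
import re


def extract_stat(source, label="Total Debt/Equity"):
    """Return the (mrq) value after `label` as a float, or NaN if unavailable."""
    # Label, optional whitespace or a line break, "(mrq):", then any run of
    # tags and whitespace, then the first token that is not markup.
    pattern = re.escape(label) + r"\s*\(mrq\):(?:\s|<[^>]*>)*([^<\s]+)"
    match = re.search(pattern, source)
    if match is None:
        return float("nan")  # label missing or the line is cut off
    try:
        return float(match.group(1).replace(",", ""))
    except ValueError:
        return float("nan")  # for example "N/A"


# Invented snippets that mimic the cases reported in the comments.
print(extract_stat('<td>Total Debt/Equity (mrq):</td><td class="x">0</td>'))
print(extract_stat("<td>Total Debt/Equity\n(mrq):</td><td>12.5</td>"))
print(extract_stat("<td>Total Debt/Equity (mrq):</td><td>N/A</td>"))
print(extract_stat("<td>Total Debt/Equity (mrq):</td><td"))
```

Returning NaN marks the value as unknown instead of guessing a number, so such rows can be filtered out before they are used as features.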

  12. Arnav Arora says:

    Great set of lectures!
    I had an issue, hopefully you can assist me with it.

    While parsing the local files, the code picks up the files from srcl (in
    KeyStats) and proceeds further instead of starting from a (the first file)
    for no apparent reason. Can’t seem to figure out the reason why. I’ve tried
    using the same code as the one published on your website, same thing
    happens.
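
One likely cause of the ordering issue in the last comment, assuming the list of ticker folders comes from `os.walk` or `os.listdir`: both return entries in an arbitrary, filesystem-dependent order, so the first folder visited need not be `a`. Sorting the list makes the order predictable. In this sketch `sorted_ticker_dirs` is a hypothetical helper, and the temporary folders stand in for the real `_KeyStats` directory.

```python
import os
import tempfile


def sorted_ticker_dirs(stats_path):
    """Return the sub-directories of stats_path in alphabetical order."""
    return sorted(
        entry.path for entry in os.scandir(stats_path) if entry.is_dir()
    )


# Build a throwaway directory tree to demonstrate the ordering.
with tempfile.TemporaryDirectory() as root:
    for name in ("srcl", "a", "aapl"):
        os.mkdir(os.path.join(root, name))
    ordered = [os.path.basename(path) for path in sorted_ticker_dirs(root)]

print(ordered)
```

The processing order does not change which values are extracted; sorting only makes runs easier to compare and debug.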
