Scikit Learn Machine Learning Tutorial for investing with Python p. 3

In this part of our machine learning tutorial with scikit-learn and Python, we’re covering how to acquire, label and organize our data, as well as figure out which machine learning algorithm to use.

Playlist link:

Flowchart for figuring out which machine learning algorithm to use:

To get company data, you can use sec.gov, finance.yahoo.com, or many other locations.

To save people from using up tons of bandwidth, I have compiled and zipped a sample dataset: the raw HTML you would have collected if you had scraped Yahoo Finance for over a decade.

The location:

Sample code:


30 comments

  1. sentdex says:

    Searched Google for an EDGAR API, found some great information here. I’ll
    keep using this method for a bit, but eventually I am going to see if we
    can use EDGAR for more data.

  2. Oliver Insulander says:

    Thanks for the videos. Great work. There is a company that has done a lot
    of work on fundamental investing and compiling fundamental data; thought
    I’d leave you their link in case you have not already seen it.

  3. dave597 says:

    Usually I will check the site for a text file called robots.txt, here it is:
    It contains information on which browsers (user agents) it allows and what
    scraping frequency to use. Also, sec.gov has an XML sitemap containing
    links through the entire site, and you can filter them down to the ones
    you want to download.
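A minimal sketch of the sitemap filtering described above, assuming the sitemap follows the standard sitemaps.org XML format. The sample XML, the `example.com` URLs, and the helper name `filter_sitemap_urls` are hypothetical and only illustrate the idea; a real sitemap would be downloaded from the site first.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data standing in for a downloaded sitemap.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/Archives/edgar/data/1/a.txt</loc></url>
  <url><loc>https://example.com/about/index.html</loc></url>
  <url><loc>https://example.com/Archives/edgar/data/2/b.txt</loc></url>
</urlset>
""".strip()


def filter_sitemap_urls(xml_text, must_contain):
    """Return every <loc> URL in the sitemap that contains `must_contain`."""
    root = ET.fromstring(xml_text)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if must_contain in url]


# Keep only the links under the path we are interested in.
wanted = filter_sitemap_urls(SAMPLE_SITEMAP, "/Archives/edgar/data/")
for url in wanted:
    print(url)
```

Filtering on a plain substring keeps the sketch simple; a prefix check on the parsed URL path would be stricter.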

  4. Arpit Agrawal says:

    Which compiler are you using? I am using Anaconda. The problem I am
    facing is that I can’t write the code in a new script. I have to write it
    in the console itself.

  5. Jon doe says:

    Another great tutorial. Do you explain how to isolate/select links from a
    page and how to open those links? I scraped the yahoo/nasdaq/composite
    page, and now I would like to isolate and open all the letters of the
    alphabet on the page, then see and open all those links… just for the
    general practice of scraping a page, opening the links on the page(s),
    extracting and saving the info, then manipulating the info. Sounds pretty
    big, and it seems like that is where you are heading. Thanks man.

  6. Edward Shin says:

    Great tuts!! Any chance of sharing the code for parsing Yahoo Finance? Or
    perhaps pointing to one of your awesome tutorials to learn how to do it?

  7. Miles Lilly says:

    +sentdex In the video you refer to P/E as a price-to-equity ratio. I
    believe P/E refers to price to earnings. Source:
    

  8. Max Dignan says:

    P/E Ratio is Price (of a share currently) / Earnings (per a single share
    over the past 12 months).
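As a quick worked example of the definition above, here is a minimal sketch. The function name `pe_ratio` and the numbers are made up for illustration, and returning `None` for non-positive earnings is an assumption about how to handle that case, not something stated in the thread.

```python
def pe_ratio(share_price, eps_ttm):
    """Price-to-earnings: current share price / earnings per share (past 12 months)."""
    # Assumption: treat P/E as undefined when earnings are zero or negative.
    if eps_ttm <= 0:
        return None
    return share_price / eps_ttm


# Hypothetical numbers: a $50.00 share with $2.50 of earnings per share.
example = pe_ratio(50.0, 2.5)
print(example)
```

With these numbers the ratio is 50.0 / 2.5, so investors are paying 20 dollars for each dollar of trailing annual earnings.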

  9. Lucas Pelegrino says:

    Hey thanks for taking the time to make these videos, it has been really
    helpful for beginners like me.

  10. test test says:

    Hey sentdex, excellent videos dude. I finished all 26 straight. Can’t wait
    for your new videos. Please keep up the great work.

    Can you share how you got the intraQuarter.zip? That seems to be the
    missing piece of the puzzle.

  11. Fernando Lovera says:

    Question here!
    How did you parse the Yahoo webpage into actual data? That would be
    interesting to know (your way of doing it 😀).
    Again, great video tutorial series (Y)

  12. JP G says:

    I’m trying to learn Machine Learning & I’m not a stock market geek. The
    first 20 minutes of this was a waste of my time. You could have just told
    me to download the “intraQuarter.zip” and moved on.

  13. Pablo Torre says:

    The Arelle project allows you to get all of the data from the SEC’s EDGAR.
    🙂
    It is a Python project, and it even has an SQL dump that you can download
    in one shot for PostgreSQL and Titan DBs.

    It also allows you to process files downloaded through the SEC’s FTP
    service
    (the proper way to get SEC files to avoid the whole DDOS…)
    

  14. sofo boachie says:

    Hello +sentdex, I am coming here from your website, and I just noticed the
    “for investing” part when I came here. I am not quite familiar with
    financial terms, which makes it a little hard to understand, so I was
    wondering what my path should be if I want to go through the tuts without
    dealing with all the financial parts? Thanks for all your good work, you
    really helped me.

  15. Jesús Gómez says:

    At 9:06 you ask “what are the terms for parsing”. This is not a direct
    answer, but related info: one of the footer links says “Open Government”.
    That page has a “dataset” section you may find interesting.

  16. Ajay Jaiswal says:

    Thanks man!!! It is always great watching your videos. Good work. Can you
    tell us how to play with the Yahoo Finance URLs to get historical data for
    the companies? Thanks again.

  17. Zach Hespelt says:

    User-agent: *
    Allow: /Archives/edgar/data
    Disallow: /Archives/bin
    Disallow: /Archives/etc
    Disallow: /Archives/usr
    Disallow: /cgi-bin
    Disallow: /bin
    Disallow: /Archives/edgar/vprr/XXXX
    Disallow: /Archives/edgar/vprr/vprr_removal
    Disallow: /Archives/edgar/vprr/bin
    Disallow: /nb
    Disallow: /include
    Disallow: /0
    Disallow: /video/samples
    Disallow: /video/live
    Disallow: /Archives/edgar/data/1473971/000109181814000042/ex101002.gif
    sitemap:

    No rules listed on how much/often you can scrape.
    Looks like you can:
    1. use any user-agent,
    2. scrape from /Archives/edgar/data only
    3. get the sitemap from 
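Rules like the ones quoted above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rules are a subset copied from this comment rather than fetched live, and the agent name `my-scraper`, the helper `allowed`, and the example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Subset of the rules quoted in the comment above (not fetched from the site).
ROBOTS_TXT = """\
User-agent: *
Allow: /Archives/edgar/data
Disallow: /Archives/bin
Disallow: /cgi-bin
Disallow: /bin
Disallow: /video/live
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="my-scraper"):
    """Ask the parser whether `agent` may fetch `path` under these rules."""
    return parser.can_fetch(agent, path)


edgar_ok = allowed("/Archives/edgar/data/12345/")  # matches the Allow rule
cgi_ok = allowed("/cgi-bin/some-script")           # matches a Disallow rule
unlisted_ok = allowed("/some/unlisted/page")       # matches no rule at all
delay = parser.crawl_delay("my-scraper")           # no Crawl-delay line present

print(edgar_ok, cgi_ok, unlisted_ok, delay)
```

One thing this sketch suggests: Python's parser treats a path that matches no rule as allowed, so point 2 above ("only") is stricter than what the rules themselves say. The missing crawl delay agrees with the observation that no scraping frequency is listed.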

  18. Mohanish Nehete says:

    It will be really helpful if you show us the code for parsing or anything
    that might be helpful.

  19. Liam MacConn says:

    Robots.txt usually covers the rules for spiders and automated downloads.
    For example: http://www.sec.gov/robots.txt

    User-agent: *
    Allow: /Archives/edgar/data
    Disallow: /Archives/bin
    Disallow: /Archives/etc
    Disallow: /Archives/usr
    Disallow: /cgi-bin
    Disallow: /bin
    Disallow: /Archives/edgar/vprr/XXXX
    Disallow: /Archives/edgar/vprr/vprr_removal
    Disallow: /Archives/edgar/vprr/bin
    Disallow: /nb
    Disallow: /include
    Disallow: /0
    Disallow: /video/samples
    Disallow: /video/live
    Disallow: /video2/samples
    Disallow: /video2/live
    Disallow: /Archives/edgar/data/1473971/000109181814000042/ex101002.gif
    sitemap:

  20. Rami Alshafi says:

    How did you get the historical key statistics and earnings? I searched
    Yahoo’s site and I got the historical stock price, but only the current
    key stats and earnings, nothing older than that… I was not able to find
    any old key stats like you did (which I understand you zipped up and
    offered for download), but I would like to learn how you gathered it all
    together, or at least how to find any old key stats on Yahoo’s site.
    Thanks!
