Introduction to Machine Learning on Apache Spark MLlib

Speaker: Juliet Hougland, Senior Data Scientist, Cloudera

Spark MLlib is a library for machine learning and related tasks on massive datasets. With MLlib, fitting a machine-learning model to a billion observations can take only a few lines of code while distributing the work across hundreds of machines. This talk will demonstrate how to use Spark MLlib to fit a model that predicts which customers of a telecommunications company are likely to stop using their service. It will cover Spark’s DataFrames API for fast data manipulation, as well as ML Pipelines, which make the process of developing and refining a model easier.


  1. Henrik B Sørensen says:

    Brilliant speaker. Love the content and the topic. However, she’s much too
    “cute/cuddly” with the poll and the “reaction to the dog”… Great
    presentation, and certainly not the last time I’m watching her videos.

  2. Arindom Bhattacharjee says:

    Thanks for the informative video.
    I could not find the tutorial link the speaker was referring to. Can
    anybody please share it?
