Speaker: Juliet Hougland, Senior Data Scientist, Cloudera
Spark MLlib is a library for performing machine learning and associated tasks on massive datasets. With MLlib, fitting a machine-learning model to a billion observations can take only a few lines of code and can run across hundreds of machines. This talk will demonstrate how to use Spark MLlib to fit an ML model that predicts which customers of a telecommunications company are likely to stop using its service. It will cover the use of Spark’s DataFrames API for fast data manipulation, as well as ML Pipelines for making the model development and refinement process easier.