Rich Caruana, Microsoft Research
In machine learning, a tradeoff must often be made between accuracy and intelligibility: the most accurate models usually are not very intelligible (e.g., deep nets, boosted trees, and random forests), and the most intelligible models usually are less accurate (e.g., linear or logistic regression). This tradeoff often limits the accuracy of models that can be used in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important. We have developed a learning method based on generalized additive models, called GA2Ms, that is often as accurate as full-complexity models but as intelligible as linear/logistic regression models. GA2Ms not only make it easy to understand what a model learned and how it makes predictions, but they also make it easier to edit the model when it learns “bad” things. These bad things typically arise not because the learning algorithm is wrong, but because the data has unexpected “landmines” hidden in it. Making it possible for experts to understand a model and interactively repair it is critical for safe deployment, because most data has such landmines. In the talk I’ll present case studies where these transparent, high-performance GAMs are applied to problems in healthcare and recidivism prediction, and explain what we’re doing to make the models easier for experts to understand and edit.
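To make the intelligibility claim concrete, here is a minimal toy sketch of a generalized additive model fit by backfitting. It is not Microsoft's GA2M algorithm (which learns shape functions with boosted trees and adds selected pairwise interaction terms); it only illustrates the core idea the abstract relies on: the model's prediction is a sum of one-dimensional per-feature "shape functions," so each feature's learned contribution can be inspected, plotted, and edited in isolation. The binning scheme, synthetic data, and variable names below are illustrative assumptions, not part of the talk.

```python
# Toy additive model: prediction = intercept + sum_j shape_j(x_j),
# where each shape_j is a binned 1-D function fit by backfitting.
# NOT the GA2M algorithm from the talk -- just a sketch of the additive form.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 2))
# Synthetic target: one sinusoidal and one quadratic effect, plus noise.
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, 2000)

n_features = X.shape[1]
n_bins = 32
edges = np.linspace(-3, 3, n_bins + 1)
# Bin index of each sample for each feature.
bins = [np.clip(np.digitize(X[:, j], edges) - 1, 0, n_bins - 1)
        for j in range(n_features)]
shapes = [np.zeros(n_bins) for _ in range(n_features)]
intercept = y.mean()

for _ in range(20):  # backfitting sweeps
    for j in range(n_features):
        # Residual after removing every feature's contribution except j's.
        pred_others = intercept + sum(shapes[k][bins[k]]
                                      for k in range(n_features) if k != j)
        r = y - pred_others
        # Refit shape_j as the mean residual per bin, centered at zero so
        # the intercept absorbs the overall level.
        s = np.array([r[bins[j] == b].mean() if np.any(bins[j] == b) else 0.0
                      for b in range(n_bins)])
        shapes[j] = s - s.mean()

pred = intercept + sum(shapes[j][bins[j]] for j in range(n_features))
rmse = np.sqrt(np.mean((y - pred) ** 2))
```

Intelligibility here is structural: `shapes[0]` is the model's entire belief about feature 0, so a domain expert can plot it, spot a learned "landmine" (e.g., an artifactual dip), and repair it by directly overwriting the offending bins, which is exactly the kind of model editing the abstract describes.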
Abstract from the KDD 2017 conference.