Implicit Mixtures of Experts for Rigorous Interpretable Machine Learning

Date

2025

Authors

Elazar, Nathan

Abstract

This work aims to provide a rigorous framework for practical Interpretable Machine Learning (IML). While IML research has exploded in popularity in recent years, the vast majority of works in the field propose no definition of what interpretability actually means, and fewer still propose practical, objective metrics that can quantify the interpretability of models. In this work I define interpretability to be inversely proportional to complexity. While it is difficult to compare the complexities of two arbitrary functions, certain model classes do permit easy measurement of complexity; most notably, the complexity of a decision tree can be measured by the number of nodes it contains. Therefore, as long as we restrict ourselves to considering only models from a complexity-measurable class, interpretability is well defined and objectively quantifiable. Unfortunately, for many tasks decision trees perform sub-optimally even when they are allowed to be arbitrarily large. For these tasks, we would like to be able to use more powerful models, such as neural networks.

The major technical contribution of this work is the development of the Implicit Mixtures of Experts via Neural Networks (IMoENN) methodology. IMoENN uses neural networks to implicitly generate Mixture of Experts (MoE) models. While explicitly training a MoE directly is feasible only for small mixtures, IMoENN can produce implicit mixtures with arbitrarily many experts. In addition, IMoENN's prediction performance generalizes as well as the neural network architectures it employs, whereas explicit MoE are prone to overfitting the training dataset. Even when using extremely simple experts, such as linear functions, IMoENN can match deep neural network classification accuracy on the image benchmark datasets MNIST10, Fashion-MNIST10, and CIFAR10, provided that there are sufficiently many experts in the mixture.

If the experts that make up a mixture are simple models, such as linear functions or decision trees of fixed size, then the complexity of a MoE is readily given by the number of experts in the mixture. Moreover, when using IMoENN, the accuracy of these mixtures increases up to that of a black-box classifier as the number of experts is increased. With these facts in mind, I argue that MoE are the ideal model class on which to base complexity-grounded IML.

In this work I demonstrate how IMoENN can be used to create mixtures of linear experts which match black-box accuracy on MNIST10, Fashion-MNIST10, and CIFAR10. These mixtures provide local interpretability by showing the expert responsible for classifying a particular data point, and the quality of that local interpretation can be measured by the number of experts in the mixture. IMoENN can also be used for global interpretability, so long as the mixture is small enough that every expert can be inspected. When using such small mixtures IMoENN is unable to match black-box accuracy, so I propose two variants of IMoENN which exploit properties of natural images to improve performance. These variants match black-box accuracy on MNIST10, but still fall short on Fashion-MNIST10 and CIFAR10. Finally, I show how IMoENN can be used to simultaneously learn semantically meaningful features from data and assign importances to those features. This method provides a degree of global interpretability in the form of global feature importances, and matches black-box accuracy on MNIST10 and Fashion-MNIST10.
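For reference, the mixture-of-experts predictors discussed above can be written in the standard soft-MoE form below. This is a sketch of the textbook formulation with an assumed softmax gate, not necessarily the exact parameterization used by IMoENN; the notation (expert E_k, gate score h_k, number of experts K) is introduced here for illustration:

    f(x) = \sum_{k=1}^{K} g_k(x) \, E_k(x),
    \qquad
    g_k(x) = \frac{\exp(h_k(x))}{\sum_{j=1}^{K} \exp(h_j(x))}

Here each E_k is an expert drawn from a simple class (e.g. a linear function or fixed-size decision tree) and the h_k are scores produced by a gating network. Under the complexity measure proposed above, the complexity of f is the expert count K, so interpretability trades off directly against K.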
While I have focused on image classification benchmarks in this work, IMoENN is a very general methodology and can potentially be applied to any task. IMoENN is therefore a promising approach to making complexity-grounded IML practically viable on complex datasets, thereby providing a much-needed objective metric of interpretability.
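To make the model class concrete, below is a minimal sketch of a mixture of linear experts in PyTorch. This is an illustrative explicit MoE, not the thesis's IMoENN training procedure (which generates the mixture implicitly via a neural network); the class name, sizes, and the `explain` helper are assumptions made for the example. The local interpretation of a prediction is the linear expert that the gate assigns to the input, whose weight matrix can be inspected directly.

import torch
import torch.nn as nn

class MixtureOfLinearExperts(nn.Module):
    """Illustrative explicit MoE: K linear experts combined by a softmax gate.

    The complexity of the mixture is K (the number of experts), so
    shrinking K trades accuracy for interpretability, as described above.
    """

    def __init__(self, in_dim: int, n_classes: int, n_experts: int):
        super().__init__()
        # Each expert is a single linear map from inputs to class logits.
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, n_classes) for _ in range(n_experts)]
        )
        # A simple gating network scores each expert per input.
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)          # (B, K)
        logits = torch.stack([e(x) for e in self.experts], 1)  # (B, K, C)
        return (weights.unsqueeze(-1) * logits).sum(dim=1)     # (B, C)

    def explain(self, x: torch.Tensor) -> torch.Tensor:
        # Local interpretation: the index of the expert most responsible
        # for each input. For MNIST-sized inputs, that expert's weights
        # can be reshaped to 28x28 and visualised as an image.
        return torch.softmax(self.gate(x), dim=-1).argmax(dim=-1)

# Hypothetical usage on flattened 28x28 images with 10 classes.
model = MixtureOfLinearExperts(in_dim=784, n_classes=10, n_experts=32)
x = torch.randn(4, 784)
print(model(x).shape)    # torch.Size([4, 10])
print(model.explain(x))  # expert index per input

Training such a mixture explicitly is, as the abstract notes, only feasible for small K; the point of IMoENN is to realise the same model class implicitly so that K can grow arbitrarily large without the mixture being materialised during training.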

Type

Thesis (PhD)

Restricted until

2025-04-23
