

Title: Machine truth serum: a surprisingly popular approach to improving ensemble methods
The wisdom of the crowd (Surowiecki, 2005a) revealed a striking fact: the majority-vote answer of a crowd is usually more accurate than that of a few individual experts. The same story holds in machine learning. Ensemble methods (Dietterich, 2000) leverage this idea and exploit multiple machine learning algorithms in various settings, e.g., supervised and semi-supervised learning, achieving better performance by aggregating the predictions of the constituent algorithms than any of them obtains alone. Nonetheless, existing aggregation rules fail when the majority answer of the constituent algorithms is more likely to be wrong. In this paper, we extend the idea proposed in Bayesian Truth Serum (Prelec, 2004), that "a surprisingly more popular answer is more likely to be the true answer than the majority one," to supervised classification, improved through an ensemble of final predictions, and to semi-supervised classification (e.g., MixMatch (Berthelot et al., 2019)), enhanced through an ensemble of data augmentations. The challenge is to define, and to detect, when an answer should be considered "surprising." We present two machine-learning-aided methods that can reveal the truth when the minority rather than the majority holds the true answer, in both the supervised and the semi-supervised classification settings. We name our proposed method the Machine Truth Serum. Our experiments on a set of classification tasks (image, text, etc.) show that classification performance can be further improved by applying Machine Truth Serum in the ensemble final-predictions step (supervised) and in the ensemble data-augmentations step (semi-supervised).
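The surprisingly-popular rule at the heart of the abstract can be sketched in a few lines. This is a minimal illustration of Prelec-style aggregation, not the paper's Machine Truth Serum itself: the function name and toy data are hypothetical. Each voter supplies an answer plus a prediction of how popular each answer will be; the winner is the answer whose actual vote share most exceeds its predicted share.

```python
from collections import Counter

def surprisingly_popular(votes, predicted_shares):
    """Pick the answer whose actual vote share most exceeds the
    share that voters predicted it would receive.
    votes: list of answers, one per voter.
    predicted_shares: dict answer -> average predicted share (0..1).
    """
    n = len(votes)
    actual = Counter(votes)
    best, best_surprise = None, float("-inf")
    for ans, count in actual.items():
        # "Surprise" = actual popularity minus predicted popularity.
        surprise = count / n - predicted_shares.get(ans, 0.0)
        if surprise > best_surprise:
            best, best_surprise = ans, surprise
    return best

# Majority says "no" (3 of 5), but "yes" is surprisingly popular:
# it was predicted to get 20% of votes yet actually got 40%.
votes = ["no", "no", "no", "yes", "yes"]
predicted = {"no": 0.80, "yes": 0.20}
print(surprisingly_popular(votes, predicted))  # -> yes
```

In the paper's machine-learning setting the "voters" would be constituent classifiers rather than people, with the predicted popularity learned rather than elicited.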
Award ID(s):
2007951
PAR ID:
10391575
Author(s) / Creator(s):
Date Published:
Journal Name:
Machine Learning
ISSN:
0885-6125
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we use a generative adversarial network (GAN) to train crowd counting networks using minimal data. We describe how GAN objectives can be modified to allow for the use of unlabeled data to benefit inference training in semi-supervised learning. More generally, we explain how these same methods can be used in more generic multiple regression target semi-supervised learning, with crowd counting being a demonstrative example. Given a convolutional neural network (CNN) with capabilities equivalent to the discriminator in the GAN, we provide experimental results which show that our GAN is able to outperform the CNN even when the CNN has access to significantly more labeled data. This presents the potential of training such networks to high accuracy with little data. Our primary goal is not to outperform the state-of-the-art using an improved method on the entire dataset, but instead we work to show that through semi-supervised learning we can reduce the data required to train an inference network to a given accuracy. To this end, systematic experiments are performed with various numbers of images and cameras to show under which situations the semi-supervised GANs can improve results. 
  2. Credit cards have become an essential part of daily life because they are easy to use and flexible for payment. A critical consequence of their increased use is the occurrence of fraudulent transactions, which allow illegal users to obtain money and goods through unauthorized usage. Artificial Intelligence (AI) and Machine Learning (ML) have become effective techniques used in many applications to ensure cybersecurity. This paper proposes a fraud detection system called Man-Ensemble CCFD, an ensemble-learning model with two stages of classification and detection. Stage one, called ML-CCFD, utilizes ten machine learning algorithms to classify credit card transactions as class 1 (fraudulent) or class 0 (legitimate). We compared their classification reports, specifically precision, recall (sensitivity), and F1-score, and then selected the most accurate ML algorithms based on their classification performance and prediction accuracy. The second stage, known as Ensemble-learning CCFD, is an ensemble model that applies the Man-Ensemble method to the most effective ML algorithms from stage one to obtain the final prediction, instead of using common ensemble-learning approaches such as voting, stacking, or boosting. Our framework's results showed the effectiveness and efficiency of our fraud detection system compared to using the ML algorithms individually, which suffer from weaknesses such as errors, overfitting, bias, limited prediction accuracy, and reduced robustness.
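The two-stage select-then-aggregate structure described in the abstract above can be sketched generically. Hard majority voting here is only a placeholder for the paper's own Man-Ensemble aggregation rule, which the abstract does not specify; all function names and data are hypothetical.

```python
from collections import Counter

def accuracy(preds, labels):
    """Fraction of predictions matching the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def two_stage_ensemble(val_preds, val_labels, test_preds, k=3):
    """Stage 1: rank base classifiers by validation accuracy, keep top k.
       Stage 2: aggregate the selected classifiers' test predictions
       (hard majority vote as a stand-in for the final aggregation rule).
       val_preds / test_preds: {classifier name: list of predictions}."""
    scores = {name: accuracy(p, val_labels) for name, p in val_preds.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    n_test = len(next(iter(test_preds.values())))
    return [Counter(test_preds[name][i] for name in top).most_common(1)[0][0]
            for i in range(n_test)]
```

A real pipeline would also cross-validate the selection step so that stage one does not overfit the validation split.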
  3. ABSTRACT Machine learning models can greatly improve the search for strong gravitational lenses in imaging surveys by reducing the amount of human inspection required. In this work, we test the performance of supervised, semi-supervised, and unsupervised learning algorithms trained with the ResNetV2 neural network architecture on their ability to efficiently find strong gravitational lenses in the Deep Lens Survey (DLS). We use galaxy images from the survey, combined with simulated lensed sources, as labeled data in our training data sets. We find that models using semi-supervised learning along with data augmentations (transformations applied to an image during training, e.g. rotation) and Generative Adversarial Network (GAN) generated images yield the best performance. They offer 5–10 times better precision across all recall values compared to supervised algorithms. Applying the best performing models to the full 20 deg² DLS survey, we find 3 Grade-A lens candidates within the top 17 image predictions from the model. This increases to 9 Grade-A and 13 Grade-B candidates when 1 per cent (∼2500 images) of the model predictions are visually inspected. This is ≳10× the sky density of lens candidates compared to current shallower wide-area surveys (such as the Dark Energy Survey), indicating a trove of lenses awaiting discovery in upcoming deeper all-sky surveys. These results suggest that pipelines tasked with finding strong lens systems can be highly efficient, minimizing human effort. We additionally report spectroscopic confirmation of the lensing nature of two Grade-A candidates identified by our model, further validating our methods.
  4. Aggregating signals from a collection of noisy sources is a fundamental problem in many domains including crowd-sourcing, multi-agent planning, sensor networks, signal processing, voting, ensemble learning, and federated learning. The core question is how to aggregate signals from multiple sources (e.g. experts) in order to reveal an underlying ground truth. While a full answer depends on the type of signal, correlation of signals, and desired output, a problem common to all of these applications is that of differentiating sources based on their quality and weighting them accordingly. It is often assumed that this differentiation and aggregation is done by a single, accurate central mechanism or agent (e.g. judge). We complicate this model in two ways. First, we investigate the setting with both a single judge, and one with multiple judges. Second, given this multi-agent interaction of judges, we investigate various constraints on the judges’ reporting space. We build on known results for the optimal weighting of experts and prove that an ensemble of sub-optimal mechanisms can perform optimally under certain conditions. We then show empirically that the ensemble approximates the performance of the optimal mechanism under a broader range of conditions. 
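The "optimal weighting of experts" that the abstract above builds on can be illustrated with the classic log-odds weighting rule for independent binary experts (a standard textbook result, not this paper's mechanism); the names and numbers here are illustrative only.

```python
import math

def weighted_majority(votes, accuracies):
    """Optimal weighting for independent binary experts: weight each
    expert's +/-1 vote by log(p / (1 - p)), where p is that expert's
    accuracy, and return the sign of the weighted sum."""
    score = sum(math.log(p / (1 - p)) * v for v, p in zip(votes, accuracies))
    return 1 if score > 0 else -1

# Two weak experts (60% accurate) vote +1; one strong expert (90%)
# votes -1. The strong expert's log-odds weight outweighs the pair:
votes = [1, 1, -1]
accs = [0.6, 0.6, 0.9]
print(weighted_majority(votes, accs))  # -> -1
```

Note that an unweighted majority vote over the same inputs would have returned +1; differentiating sources by quality changes the outcome.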
  5. The past decades have witnessed the prosperity of graph mining, with a multitude of sophisticated models and algorithms designed for various mining tasks, such as ranking, classification, clustering and anomaly detection. Generally speaking, the vast majority of the existing works aim to answer the following question, that is, given a graph, what is the best way to mine it? In this paper, we introduce the graph sanitation problem, to answer an orthogonal question. That is, given a mining task and an initial graph, what is the best way to improve the initially provided graph? By learning a better graph as part of the input of the mining model, it is expected to benefit graph mining in a variety of settings, ranging from denoising, imputation to defense. We formulate the graph sanitation problem as a bilevel optimization problem, and further instantiate it by semi-supervised node classification, together with an effective solver named GaSoliNe. Extensive experimental results demonstrate that the proposed method is (1) broadly applicable with respect to various graph neural network models and flexible graph modification strategies, (2) effective in improving the node classification accuracy on both the original and contaminated graphs in various perturbation scenarios. In particular, it brings up to 25% performance improvement over the existing robust graph neural network methods.