Although accuracy and computation benchmarks are widely available to help choose among neural network models, the benchmarked models are usually trained on datasets with many classes, so the results give little indication of performance on tasks with few (<10) classes. The conventional procedure for predicting performance involves repeated training and testing across the candidate models and dataset variations. We propose an efficient cosine similarity-based classification difficulty measure S, calculated from the number of classes and the intra- and inter-class similarity metrics of the dataset. After a single stage of training and testing per model family, relative performance for different datasets and models of the same family can be predicted by comparing difficulty measures, without further training and testing. Our proposed method is verified by extensive experiments on 8 CNN and ViT models and 7 datasets. Results show that S is highly correlated with model accuracy (correlation coefficient r = 0.796), outperforming the baseline Euclidean-distance measure (r = 0.66). We show how a practitioner can use this measure to select an efficient model 6 to 29x faster than through repeated training and testing. We also describe an industrial application in which the measure identifies options for a model 42% smaller than the baseline YOLOv5-nano model, and, if merging from 3 to 2 classes meets requirements, 85% smaller.
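The listing does not reproduce the formula for S, but its two ingredients, intra- and inter-class cosine similarity, are straightforward to compute from a model's feature embeddings. Below is a minimal NumPy sketch of those statistics; the function name is illustrative, and the exact way the paper combines them with the class count into S is not reproduced here.

```python
import numpy as np

def similarity_stats(features, labels):
    """Mean intra-class and inter-class cosine similarity of a dataset's
    feature embeddings (one row of `features` per sample)."""
    labels = np.asarray(labels)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                               # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]    # same-class pair mask
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude self-pairs
    intra = sims[same & off_diag].mean()
    inter = sims[~same].mean()
    return intra, inter
```

Intuitively, a dataset is harder when its inter-class similarity approaches its intra-class similarity, and harder still with more classes; S packages these quantities into a single comparable score.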
Measuring the Relative Similarity and Difficulty Between AI Benchmark Problems
There has been an explosion of challenge problems, algorithmic tests, and datasets for evaluating AI systems. Yet no methodology exists to objectively measure either the collective difficulty of these problems or their similarity, which is an obstacle to creating more general AI systems. We propose a theory for measuring the similarity between pairwise problems. We evaluate this theory using a methodology based on a deep neural network that objectively measures these properties between test problems built on foundational datasets. An implementation of these methods is then used to measure the similarity between well-known datasets. Results show that the proposed measure successfully identifies the difficulty of, and similarity among, problems. This can be used to ensure diversity in the test suites used to evaluate AI systems.
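The abstract leaves the measurement procedure at a high level. One plausible instantiation, sketched below purely as an illustration, scores similarity by cross-transfer: train a fixed network on one problem and test it on the other. The helper `train_and_eval` is an assumption, not part of the paper; it is taken to train a fixed architecture on its first dataset and return accuracy on its second.

```python
def cross_transfer_similarity(train_and_eval, problem_a, problem_b):
    """Symmetric similarity between two problems: how well a fixed deep
    network trained on one generalizes to the other, averaged both ways."""
    return 0.5 * (train_and_eval(problem_a, problem_b)
                  + train_and_eval(problem_b, problem_a))

def difficulty(train_and_eval, problem):
    """Proxy for a problem's difficulty: the fixed network's error when
    trained and tested on that same problem."""
    return 1.0 - train_and_eval(problem, problem)
```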
- Award ID(s): 1757632
- PAR ID: 10332037
- Journal Name: Workshop on Evaluating Evaluation of AI Systems, AAAI Conference on Artificial Intelligence
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
A long-held objective in AI is to build systems that understand concepts in a humanlike way. Setting aside the difficulty of building such a system, even evaluating one is a challenge, due to present-day AI's relative opacity and its proclivity for finding shortcut solutions. This is exacerbated by humans' tendency to anthropomorphize, assuming that a system that can recognize one instance of a concept must also understand other instances, as a human would. In this paper, we argue that understanding a concept requires the ability to use it in varied contexts. Accordingly, we propose systematic evaluations centered around concepts, probing a system's ability to use a given concept in many different instantiations. We present case studies of such evaluations in two domains -- RAVEN (inspired by Raven's Progressive Matrices) and the Abstraction and Reasoning Corpus (ARC) -- that have been used to develop and assess abstraction abilities in AI systems. Our concept-based approach to evaluation reveals information about AI systems that conventional test sets would have left hidden.
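Concretely, a concept-centered evaluation can be run as a harness that scores a system across many instantiations of a single concept. The sketch below is a generic illustration; `model` and the (task_input, expected_output) format are assumptions, not the paper's interface.

```python
def concept_score(model, instantiations):
    """Accuracy of `model` over many instantiations of one concept
    (e.g. one RAVEN rule or one ARC transformation). `instantiations`
    is a list of (task_input, expected_output) pairs that all exercise
    the same concept in different contexts."""
    correct = sum(model(x) == y for x, y in instantiations)
    return correct / len(instantiations)
```

Reporting per-concept scores rather than a single aggregate test-set number is what exposes shortcut solutions: a system that genuinely holds a concept should score well across all of its instantiations.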
Measuring the similarity between two arbitrary crystal structures is a common challenge in crystallography and materials science. Although there are an infinite number of ways to mathematically relate two crystal structures, only a few are physically meaningful. Here we introduce both a geometry-based and a symmetry-adapted similarity metric to compare crystal structures. Using crystal symmetry and combinatorial optimization we describe an algorithm to arrive at the structural relationship that minimizes these similarity metrics across all possible maps between any pair of crystal structures. The approach makes it possible to (i) identify pairs of crystal structures that are identical, (ii) quantitatively measure the similarity between crystal structures, and (iii) find and rank structural transformation pathways between any pair of crystal structures. We discuss the advantages of using the symmetry-adapted cost metric over the geometric cost. Finally, we show that all known structural transformation pathways between common crystal structures are recovered with the mapping algorithm. The methodology presented in this study will be of value to efforts that seek to catalogue crystal structures, identify structural transformation pathways or prune large first-principles datasets used to parameterize on-lattice Hamiltonians.
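The combinatorial step can be illustrated with a purely geometric cost: given two same-size sets of atomic coordinates already expressed in a common frame, find the one-to-one atom assignment that minimizes total squared displacement. The sketch below uses SciPy's Hungarian-algorithm solver and deliberately omits the search over lattice maps, rigid rotations, and symmetry operations that the full method performs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_cost(coords_a, coords_b):
    """Minimum total squared displacement over all one-to-one atom
    assignments between two structures (coordinates of shape (n, 3),
    assumed pre-aligned in a common Cartesian frame)."""
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    cost = (diff ** 2).sum(axis=-1)           # (n, n) squared distances
    rows, cols = linear_sum_assignment(cost)  # optimal assignment
    return cost[rows, cols].sum()
```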
Traffic event retrieval is one of the important tasks for intelligent traffic system management. To find accurate candidate events in traffic videos corresponding to a specific text query, it is necessary to understand the text query's attributes, represent the visual and motion attributes of vehicles in videos, and measure the similarity between them. Thus we propose a promising method for vehicle event retrieval from a natural-language-based specification. We utilize both appearance and motion attributes of a vehicle and adapt the COOT model to evaluate the semantic relationship between a query and a video track. Experiments with the test dataset of Track 5 in AI City Challenge 2021 show that our method is among the top 6 with a score of 0.1560.
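At retrieval time the core operation reduces to ranking candidate tracks by the similarity between a text-query embedding and each track's embedding in a joint space. The sketch below assumes such embeddings (e.g. from a COOT-style model) have already been computed; it is an illustration, not the competition system.

```python
import numpy as np

def rank_tracks(query_emb, track_embs):
    """Rank vehicle tracks by cosine similarity to a text-query embedding.
    `query_emb` has shape (d,); `track_embs` has shape (num_tracks, d)."""
    q = query_emb / np.linalg.norm(query_emb)
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    scores = t @ q                      # cosine similarity per track
    return np.argsort(-scores), scores  # indices of best matches first
```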
Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.
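The real-versus-random comparison can be approximated with a simple probe: fit the same linear classifier to a representation under true labels and under shuffled labels, then compare the fits. This is only a rough proxy for the paper's calibrated measure; the function and the choice of logistic regression are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def real_vs_random_fit(features, labels, seed=0):
    """Training accuracy of a linear probe on real labels vs. the same
    probe on randomly shuffled labels; a large gap suggests the
    representation encodes patterns aligned with the task."""
    rng = np.random.default_rng(seed)
    probe_real = LogisticRegression(max_iter=1000).fit(features, labels)
    shuffled = rng.permutation(labels)
    probe_rand = LogisticRegression(max_iter=1000).fit(features, shuffled)
    return (probe_real.score(features, labels),
            probe_rand.score(features, shuffled))
```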