
Mutual information and the encoding of contingency tables

https://doi.org/10.1103/PhysRevE.110.064306

Jerdee, Maximilian; Kirkley, Alec; Newman, M. E. J. (December 2024, Physical Review E)

Mutual information is commonly used as a measure of similarity between competing labelings of a given set of objects, for example to quantify performance in classification and community detection tasks. As argued recently, however, the mutual information as conventionally defined can return biased results because it neglects the information cost of the so-called contingency table, a crucial component of the similarity calculation. In principle the bias can be rectified by subtracting the appropriate information cost, leading to the modified measure known as the reduced mutual information, but in practice one can only ever compute an upper bound on this information cost, and the value of the reduced mutual information depends crucially on how good a bound is established. In this paper we describe an improved method for encoding contingency tables that gives a substantially better bound in typical use cases, and approaches the ideal value in the common case where the labelings are closely similar, as we demonstrate with extensive numerical results.
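To make the quantities in this abstract concrete, the following is a minimal sketch of the conventional (unreduced) mutual information between two labelings, computed from their contingency table. It does not implement the reduced mutual information or the encoding described in the paper; the function names `contingency_table` and `mutual_information` are illustrative assumptions, not taken from the authors' code.

```python
from collections import Counter
from math import log2


def contingency_table(labels_a, labels_b):
    """Count how many objects carry each pair of labels (r, s).

    Hypothetical helper: returns a Counter keyed by (label_a, label_b).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    return Counter(zip(labels_a, labels_b))


def mutual_information(labels_a, labels_b):
    """Conventional mutual information in bits between two labelings.

    This is the standard definition that, per the abstract, ignores the
    information cost of transmitting the contingency table itself.
    """
    n = len(labels_a)
    table = contingency_table(labels_a, labels_b)
    row_totals = Counter(labels_a)  # objects per label in the first labeling
    col_totals = Counter(labels_b)  # objects per label in the second labeling

    mi = 0.0
    for (r, s), count in table.items():
        # Only nonzero cells appear in the Counter, so the log is defined.
        mi += (count / n) * log2(count * n / (row_totals[r] * col_totals[s]))
    return mi


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(mutual_information(truth, [0, 0, 1, 1]))  # identical labelings
    print(mutual_information(truth, [0, 1, 0, 1]))  # statistically independent labelings
```

The reduced mutual information would subtract from this value an estimate of the cost of encoding the contingency table; that cost can only be bounded from above, and it is that bound which the paper improves.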


Patterns of wins and losses in pairwise contests, such as occur in sports and games, consumer research and paired comparison studies, and human and animal social hierarchies, are commonly analyzed using probabilistic models that allow one to quantify the strength of competitors or predict the outcome of future contests. Here, we generalize this approach to incorporate two additional features: an element of randomness or luck that leads to upset wins, and a “depth of competition” variable that measures the complexity of a game or hierarchy. Fitting the resulting model, we estimate depth and luck in a range of games, sports, and social situations. In general, we find that social competition tends to be “deep,” meaning it has a pronounced hierarchy with many distinct levels, but also that there is often a nonzero chance of an upset victory. Competition in sports and games, by contrast, tends to be shallow, and in most cases, there is little evidence of upset wins.
