Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data

Mandviwalla, Aamir; Elsisy, Amr; Atique, Muhammad Saad; Kuzmin, Konstantin; Gaiteri, Chris; Szymanski, Boleslaw K

doi:10.3390/e25081118

Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions to protect important members from attacks or arrests. Here, we focus on correctly inferring the structures and functions of such networks, but our methodology can be broadly applied. Without the ground truth, knowledge about the allocation of nodes to communities and network functions, no single network based on the noisy data can represent all plausible communities and functions of the true underlying network. To address this limitation, we apply a generative model that randomly distorts the original network based on the noisy data, generating a pool of statistically equivalent networks. Each unique generated network is recorded, while each duplicate of the already recorded network just increases the repetition count of that network. We treat each such network as a variant of the ground truth with the probability of arising in the real world approximated by the ratio of the count of this network’s duplicates plus one to the total number of all generated networks. Communities of variants with frequently occurring duplicates contain persistent patterns shared by their structures. Using Shannon entropy, we can find a variant that minimizes the uncertainty for operations planned on the network. Repeatedly generating new pools of networks from the best network of the previous step for several steps lowers the entropy of the best new variant. If the entropy is too high, the network operators can identify nodes, the monitoring of which can achieve the most significant reduction in entropy. Finally, we also present a heuristic for constructing a new variant, which is not randomly generated but has the lowest expected cost of operating on the distorted mappings of network nodes to communities and functions caused by noisy data.

More Like this