Title: Scalable Score Computation for Learning Multinomial Bayesian Networks over Distributed Data
In this paper, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in a scalable manner, which is a fundamental task during learning. We propose a novel approach designed to achieve: (a) scalable score computation using the principle of gossiping; (b) lower resource consumption via a probabilistic approach for maintaining scores using the properties of a Markov chain; and (c) effective distribution of tasks during score computation (on large datasets) by synergistically combining well-known hashing techniques. Through theoretical analysis, we show that our approach is superior to a MapReduce-style computation in terms of communication and width. Further, it is superior to the batch-style processing of MapReduce for recomputing scores when new data are available.
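The abstract above describes the approach only at a high level, so the following is an illustrative sketch rather than the authors' algorithm: it shows how randomized pairwise gossip can aggregate the local sufficient-statistic counts from which a decomposable multinomial score (such as BDeu or log-likelihood) is then computed. The function name `gossip_counts`, the pairwise-averaging schedule, and the toy data are assumptions made for illustration.

```python
import random
import numpy as np

def gossip_counts(local_counts, rounds=200, seed=0):
    """Illustrative sketch (not the paper's algorithm): randomized
    pairwise-averaging gossip over local sufficient statistics.

    local_counts: list of 1-D numpy arrays, one per cluster node, holding
    that node's local counts (e.g., the flattened N_ijk table for one
    family of the Bayesian network).
    Returns each node's estimate of the global counts: the pairwise
    averages converge to the mean of the local vectors, which is then
    rescaled by the number of nodes.
    """
    estimates = [c.astype(float).copy() for c in local_counts]
    n = len(estimates)
    rng = random.Random(seed)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)             # a random pair of nodes talks
        avg = (estimates[i] + estimates[j]) / 2.0  # they exchange and average
        estimates[i], estimates[j] = avg, avg.copy()
    return [n * est for est in estimates]          # rescale mean -> global sum

# toy usage: three nodes, each holding local counts for a 2x3 family table
nodes = [np.array([4, 1, 0, 2, 3, 1]),
         np.array([0, 2, 5, 1, 0, 2]),
         np.array([3, 3, 1, 0, 4, 0])]
print(gossip_counts(nodes)[0])  # approaches the true global counts [7 6 6 3 7 3]
```

The paper's Markov-chain analysis of score maintenance and its hashing-based task distribution are not reflected in this toy; it only illustrates the gossip-style aggregation step.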
Award ID(s): 1740858
PAR ID: 10111668
Journal Name: The AAAI-17 Workshop on Distributed Machine Learning
Page Range / eLocation ID: 498-504
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The C+ grade for US bridges on the 2017 infrastructure report card underscores the need for improved data-driven methods to understand bridge performance. There is substantial interest and prior work in using inspection records to determine bridge health scores. However, aggregating, cleaning, and analyzing bridge inspection records from all states and all past years is a challenging task, limiting the accessibility and reproducibility of findings. This research introduces a new score computed from inspection records in the National Bridge Inventory (NBI) data set. Differences between the time series of condition ratings for a bridge and a time series of average national condition ratings by age are used to develop a health score for that bridge (a minimal sketch of this baseline-difference computation appears after this list). This baseline difference score complements NBI condition ratings in further understanding a bridge's performance over time. Moreover, the role of bridge attributes and environmental factors can be analyzed using the score. Such analysis shows that bridge material type has the highest association with the baseline difference score, followed by snowfall and maintenance. This research also makes a methodological contribution by outlining a data-driven approach to repeatable and scalable analysis of the NBI data set.
  2. Traditional tests of concept knowledge generate scores to assess how well a learner understands a concept. Here, we investigated whether patterns of brain activity collected during a concept knowledge task could be used to compute a neural 'score' that complements traditional scores of an individual's conceptual understanding. Using a novel data-driven multivariate neuroimaging approach, informational network analysis, we successfully derived a neural score from patterns of activity across the brain that predicted individual differences in multiple concept knowledge tasks in the physics and engineering domain. These tasks include an fMRI paradigm as well as two previously validated concept inventories. The informational network score outperformed alternative neural scores computed using other data-driven neuroimaging methods, including multivariate representational similarity analysis. This technique could be applied to quantify concept knowledge in a wide range of domains, including classroom-based education research, machine learning, and other areas of cognitive science.
  3. Propensity score methods account for selection bias in observational studies. However, the consistency of propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to estimate propensity scores. We introduce a stacked generalization ensemble learning approach to improve propensity score estimation by fitting a meta learner on the predictions of a suitable set of diverse base learners. We perform a comprehensive Monte Carlo simulation study, implementing a broad range of scenarios that mimic characteristics of typical data sets in educational studies. The population average treatment effect is estimated using the propensity score in Inverse Probability of Treatment Weighting (IPTW); a minimal sketch of this estimation pipeline appears after this list. Our proposed stacked ensembles, especially those using gradient boosting machines as a meta learner trained on the predictions of 12 base learners, led to a greater reduction of bias than the current state of the art in propensity score estimation. Further, our simulations imply that commonly used balance measures (averaged standardized absolute mean differences) might be misleading as propensity score model selection criteria. We apply our proposed model, which we call GBM-Stack, to assess the population average treatment effect of a Supplemental Instruction (SI) program in an introductory psychology (PSY 101) course at San Diego State University. Our analysis provides evidence that moving the whole population to SI attendance would on average lead to 1.69 times higher odds of passing the PSY 101 class compared to not offering SI, with a 95% bootstrap confidence interval of (1.31, 2.20).
  4. We propose a flexible low complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce-like computation by trading increased map computation for reduced communication load and shuffle time. A main novelty of FLCD is to use the design freedom in defining map and reduce functions to develop asymptotically homogeneous systems that support varying intermediate value (IV) sizes under a general MapReduce framework. Compared to existing designs with constant IV sizes, FLCD offers greater flexibility in adapting to network parameters and significantly reduces implementation complexity by requiring fewer input files and shuffle groups. The FLCD scheme is the first proposed low-complexity CDC design that can operate on a network with an arbitrary number of nodes and an arbitrary computation load. We perform empirical evaluations of FLCD by executing the TeraSort algorithm on an Amazon EC2 cluster. This is the first time that theoretical predictions of CDC shuffle time have been validated by empirical evaluations. The evaluations demonstrate a speedup of 2.0 to 4.24 over conventional uncoded MapReduce, a 12% to 52% reduction in total time, and a wider range of operating network parameters compared to existing CDC schemes.
  5. Distributed machine learning is primarily motivated by the promise of increased computation power for accelerating training and by the ability to mitigate privacy concerns. Unlike machine learning on a single device, distributed machine learning requires collaboration and communication among the devices. This creates several new challenges: (1) the heavy communication overhead can be a bottleneck that slows down training, and (2) the unreliable communication and weaker control over the remote entities make the distributed system vulnerable to systematic failures and malicious attacks. This paper presents a variant of stochastic gradient descent (SGD) with improved communication efficiency and security in distributed environments. Our contributions include (1) a new technique called error reset, which adapts both infrequent synchronization and message compression for communication reduction in both synchronous and asynchronous training (a minimal sketch of the related error-feedback pattern appears after this list), (2) new score-based approaches for validating the updates, and (3) the integration of error reset with score-based validation. The proposed system provides communication reduction, support for both synchronous and asynchronous training, Byzantine tolerance, and local privacy preservation. We evaluate our techniques both theoretically and empirically.
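For the baseline difference score described in item 1, the abstract states the idea (compare a bridge's condition-rating time series with the national average rating at the same ages) but not an exact formula. Below is a minimal sketch under that reading; the function name `baseline_difference_score`, the mean aggregation, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def baseline_difference_score(bridge_ratings, national_avg_by_age):
    """Illustrative sketch (not the paper's exact formula).

    bridge_ratings: dict mapping bridge age in years -> NBI condition
    rating (0-9 scale) observed at that age.
    national_avg_by_age: dict mapping age -> national average condition
    rating for bridges of that age.
    Returns the mean rating difference: positive values suggest the
    bridge is rated above the national baseline for its age, negative
    values below it.
    """
    diffs = [rating - national_avg_by_age[age]
             for age, rating in bridge_ratings.items()
             if age in national_avg_by_age]
    return float(np.mean(diffs)) if diffs else float("nan")

# toy usage with made-up numbers
national = {10: 7.1, 20: 6.4, 30: 5.8}
bridge = {10: 7, 20: 6, 30: 5}
print(baseline_difference_score(bridge, national))  # about -0.43
```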
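For item 3, the abstract describes GBM-Stack as a stacked ensemble with a gradient boosting machine as the meta learner, whose propensity scores are used in Inverse Probability of Treatment Weighting. A minimal sketch of that pipeline, assuming scikit-learn and only two of the paper's 12 base learners, might look as follows; the helper name `iptw_ate`, the propensity trimming bounds, and the simple weighted mean difference (the paper reports an odds-based effect for the binary pass outcome) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, StackingClassifier)

def iptw_ate(X, treatment, outcome):
    """Illustrative sketch: stacked propensity model feeding an IPTW
    estimate of the population average treatment effect.

    X: 2-D numpy array of covariates; treatment: 0/1 numpy array;
    outcome: numpy array of outcomes.
    """
    # Stack of base learners with a GBM meta learner (a small stand-in
    # for the paper's 12 base learners).
    stack = StackingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
        final_estimator=GradientBoostingClassifier(random_state=0),
        stack_method="predict_proba", cv=5)
    stack.fit(X, treatment)
    e = stack.predict_proba(X)[:, 1]                 # estimated propensity scores
    e = np.clip(e, 0.01, 0.99)                       # trim extreme propensities
    w = treatment / e + (1 - treatment) / (1 - e)    # IPTW weights
    treated = np.average(outcome[treatment == 1], weights=w[treatment == 1])
    control = np.average(outcome[treatment == 0], weights=w[treatment == 0])
    return treated - control                         # weighted mean difference
```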
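For item 5, the abstract names 'error reset' but does not spell out the algorithm. The sketch below shows only the classic error-feedback pattern that compressed-SGD methods of this kind build on, not the paper's exact technique; the class name `ErrorFeedbackWorker` and the top-k compressor are assumptions for illustration.

```python
import numpy as np

def topk_compress(vec, k):
    """Keep the k largest-magnitude entries of vec, zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out

class ErrorFeedbackWorker:
    """Illustrative error-feedback worker: whatever the compressor drops
    is carried over to the next round instead of being lost."""

    def __init__(self, dim, k):
        self.residual = np.zeros(dim)  # accumulated compression error
        self.k = k

    def compress_update(self, local_gradient, lr):
        corrected = lr * local_gradient + self.residual  # add back past error
        message = topk_compress(corrected, self.k)       # what is communicated
        self.residual = corrected - message              # remember what was dropped
        return message

# toy usage: only the two largest entries of the corrected update survive
worker = ErrorFeedbackWorker(dim=6, k=2)
g = np.array([0.5, -0.1, 0.02, 0.9, -0.3, 0.05])
print(worker.compress_update(g, lr=0.1))  # -> [0.05 0. 0. 0.09 0. 0.]
```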