Adaptive Filtering Algorithms for Set-Valued Observations—Symmetric Measurement Approach to Unlabeled and Anonymized Data
Suppose L simultaneous independent stochastic systems generate observations, where the observations from each system depend on the underlying model parameter of that system. The observations are unlabeled (anonymized), in the sense that an analyst does not know which observation came from which stochastic system. How can the analyst estimate the underlying model parameters of the L systems? Since the anonymized observations at each time are an unordered set of L measurements (rather than a vector), classical stochastic gradient algorithms cannot be directly used. By using symmetric polynomials, we formulate a symmetric measurement equation that maps the observation set to a unique vector. By exploiting the fact that the algebraic ring of multi-variable polynomials is a unique factorization domain over the ring of one-variable polynomials, we construct an adaptive filtering algorithm that yields a statistically consistent estimate of the underlying parameters. We analyze the asymptotic covariance of these estimates to quantify the effect of anonymization. Finally, we characterize the anonymity of the observations in terms of the error probability of the maximum a posteriori Bayesian estimator. Specifically, using Blackwell dominance of mean-preserving spreads, we construct a partial ordering of the noise densities that relates the anonymity of the observations to the asymptotic covariance of the adaptive filtering algorithm.
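A minimal sketch of the underlying idea, under simplifying assumptions (an additive zero-mean Gaussian noise toy model, illustrative parameter values and step size; this is not the paper's algorithm or its consistency analysis): the elementary symmetric polynomials of the anonymized observation set are permutation-invariant and, in this toy model, unbiased for the same symmetric polynomials of the true parameters, so a stochastic-gradient update on the symmetric measurement vector can recover the parameters up to relabeling.

```python
# Illustrative sketch only: anonymized observations of L systems are mapped to the
# vector of elementary symmetric polynomials (permutation-invariant), and a
# stochastic-gradient update matches them to the symmetric polynomials of the
# parameter estimate.  Model, noise level, and step size are assumptions.
import numpy as np

def elem_sym(x):
    # np.poly gives coefficients of prod_i (t - x_i) = t^L - e1 t^(L-1) + e2 t^(L-2) - ...
    c = np.poly(x)
    return np.array([(-1) ** m * c[m] for m in range(1, len(x) + 1)])

def jacobian(theta):
    # d e_m / d theta_i = e_{m-1} of the remaining parameters (with e_0 = 1)
    L = len(theta)
    J = np.empty((L, L))
    for i in range(L):
        rest = np.delete(theta, i)
        J[:, i] = np.concatenate(([1.0], elem_sym(rest)))
    return J

rng = np.random.default_rng(0)
theta_true = np.array([1.0, 3.0, -2.0])            # assumed ground truth, L = 3
sigma, mu, steps = 0.5, 5e-4, 30000                # noise level, step size, iterations

theta_hat = rng.normal(size=theta_true.size)       # initial estimate
for _ in range(steps):
    y = theta_true + sigma * rng.normal(size=theta_true.size)
    z = elem_sym(rng.permutation(y))               # anonymized set -> symmetric measurement
    err = z - elem_sym(theta_hat)                  # E[z] = e(theta_true) for zero-mean noise
    theta_hat += mu * jacobian(theta_hat).T @ err  # stochastic-gradient step
print("estimate:", np.sort(theta_hat), "truth:", np.sort(theta_true))
```

Because the symmetric measurement is invariant to relabeling, the estimate can only be compared to the truth up to a permutation, hence the sorted printout.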
- Award ID(s): 2312198
- PAR ID: 10518483
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Signal Processing
- Volume: 71
- ISSN: 1053-587X
- Page Range / eLocation ID: 2760 to 2775
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
As discovered by W. Thurston, the action of a complex one-variable polynomial on its Julia set can be modeled by a geodesic lamination in the disk, provided that the Julia set is connected. It also turned out that the parameter space of such dynamical laminations of degree two gives a model for the bifurcation locus in the space of quadratic polynomials. This model is itself a geodesic lamination, the so-called quadratic minor lamination of Thurston. In the same spirit, we consider the space of all cubic symmetric polynomials $$f_\lambda(z)=z^3+\lambda^2 z$$ in three articles. In the first one, we construct the cubic symmetric comajor lamination together with the corresponding quotient space of the unit circle. As is verified in the third paper, this yields a monotone model of the cubic symmetric connectedness locus, that is, the space of all cubic symmetric polynomials with connected Julia sets. In the present paper, the second in the series, we develop an algorithm for generating the cubic symmetric comajor lamination analogous to the Lavaurs algorithm for constructing the quadratic minor lamination.
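For orientation only, here is a rough numerical membership test for the cubic symmetric connectedness locus mentioned in this abstract; it is an escape-time check, not the comajor-lamination algorithm the paper develops, and the iteration count and escape radius are assumptions. For the odd map $$f_\lambda(z)=z^3+\lambda^2 z$$ the two critical points are $$\pm i\lambda/\sqrt{3}$$ and their orbits escape together, so it suffices to track one of them; the Julia set of a polynomial is connected exactly when all critical orbits stay bounded.

```python
# Rough escape-time membership test for the cubic symmetric connectedness locus
# (illustrative sketch only; iteration count and escape radius are assumptions).
def in_connectedness_locus(lam, max_iter=200, escape_radius=1e6):
    """True if the tracked critical orbit of z^3 + lam^2 z stays bounded."""
    z = 1j * lam / 3 ** 0.5                 # one of the two critical points
    for _ in range(max_iter):
        z = z ** 3 + lam ** 2 * z
        if abs(z) > escape_radius:
            return False                    # critical orbit escapes: Julia set disconnected
    return True

print(in_connectedness_locus(0.1), in_connectedness_locus(2.0))
```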
-
We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy. We propose the first shared-memory as well as distributed-memory parallel algorithms for the k-anonymity problem that achieve this goal and produce high-quality anonymized datasets. The new algorithm is based on an optimization procedure that iteratively computes weights on the edges of a dissimilarity matrix and at each iteration computes a minimum-weight b-edge cover in the graph. We describe how a 2-approximation algorithm for computing the b-edge cover can be used to solve the adaptive anonymity problem in parallel. We are able to solve adaptive anonymity problems with hundreds of thousands of instances and hundreds of features on a supercomputer in under five minutes. Our algorithm scales up to 8,000 cores on a distributed-memory supercomputer, while also providing good speedups on shared-memory multiprocessors. On smaller problems where a belief propagation algorithm is feasible, our algorithm is two orders of magnitude faster.
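As a point of reference, here is a minimal greedy sketch of the weighted b-edge cover subproblem that this adaptive-anonymity procedure solves at each iteration. It is an illustrative heuristic, not the paper's 2-approximation or its parallel implementation; the toy dissimilarity graph and coverage requirements are assumptions.

```python
# Toy greedy heuristic for a weighted b-edge cover (illustrative only).  Each
# vertex v must be covered by at least b[v] chosen edges; edges are considered
# in order of increasing weight.
def greedy_b_edge_cover(edges, b):
    """edges: list of (weight, u, v); b[v]: required number of incident cover edges."""
    need = list(b)                       # remaining coverage requirement per vertex
    cover = []
    for w, u, v in sorted(edges):        # cheapest edges first
        if need[u] > 0 or need[v] > 0:   # edge still helps a deficient endpoint
            cover.append((u, v, w))
            need[u] -= 1
            need[v] -= 1
    if any(r > 0 for r in need):
        raise ValueError("no b-edge cover exists: some vertex has degree < b[v]")
    return cover

# assumed toy dissimilarity graph on 4 records, each requiring anonymity degree 2
edges = [(1.0, 0, 1), (2.0, 0, 2), (3.0, 0, 3), (1.5, 1, 2), (2.5, 1, 3), (0.5, 2, 3)]
print(greedy_b_edge_cover(edges, [2, 2, 2, 2]))
```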
-
We develop algorithms to automate discovery of stochastic dynamical system models from noisy, vector-valued time series. By discovery, we mean learning both a nonlinear drift vector field and a diagonal diffusion matrix for an Itô stochastic differential equation in $$\mathbb{R}^d$$. We parameterize the vector field using tensor products of Hermite polynomials, enabling the model to capture highly nonlinear and/or coupled dynamics. We solve the resulting estimation problem using expectation maximization (EM), which involves two steps. First, we augment the data via diffusion bridge sampling, with the goal of producing time series observed at a higher frequency than the original data. Second, with this augmented data, the resulting expected log-likelihood maximization problem reduces to a least squares problem. We provide an open-source implementation of this algorithm. Through experiments on systems with dimensions one through eight, we show that this EM approach enables accurate estimation for multiple time series with possibly irregular observation times. We study how the EM method performs as a function of the amount of data augmentation, as well as the volume and noisiness of the data.
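A minimal one-dimensional sketch of the least-squares structure this abstract refers to, under simplifying assumptions (densely sampled data so no diffusion-bridge augmentation is needed, a scalar double-well drift, and illustrative step sizes; not the authors' open-source implementation): with Euler-Maruyama increments, regressing the scaled increments on Hermite features of the state estimates the drift coefficients, and the residual variance estimates the diffusion.

```python
# Illustrative 1-D sketch (not the paper's EM code): with densely sampled data,
# drift estimation in a Hermite basis reduces to least squares on Euler-Maruyama
# increments, and the residual variance gives the diffusion coefficient.
import numpy as np
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(1)
dt, n, deg = 1e-3, 200_000, 3
def drift(x): return x - x**3              # assumed ground-truth drift (double well)
sigma = 0.5                                # assumed ground-truth diffusion

x = np.empty(n)                            # simulate one sample path (Euler-Maruyama)
x[0] = 0.0
noise = sigma * np.sqrt(dt) * rng.normal(size=n - 1)
for k in range(n - 1):
    x[k + 1] = x[k] + drift(x[k]) * dt + noise[k]

H = hermevander(x[:-1], deg)               # probabilists' Hermite features He_0..He_3
target = np.diff(x) / dt                   # (x_{k+1} - x_k) / dt
coef, *_ = np.linalg.lstsq(H, target, rcond=None)
resid = target - H @ coef
print("Hermite drift coefficients:", np.round(coef, 2))
print("estimated diffusion:", round(float(np.sqrt(np.mean(resid**2) * dt)), 3))
```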
-
Optimal designs minimize the number of experimental runs (samples) needed to accurately estimate model parameters, resulting in algorithms that, for instance, efficiently minimize parameter estimate variance. Governed by knowledge of past observations, adaptive approaches adjust sampling constraints online as model parameter estimates are refined, continually maximizing expected information gained or variance reduced. We apply adaptive Bayesian inference to estimate transition rates of Markov chains, a common class of models for stochastic processes in nature. Unlike most previous studies, our sequential Bayesian optimal design is updated with each observation and can be simply extended beyond two-state models to birth–death processes and multistate models. By iteratively finding the best time to obtain each sample, our adaptive algorithm maximally reduces variance, resulting in lower overall error in ground truth parameter estimates across a wide range of Markov chain parameterizations and conformations.
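A minimal sketch of the sequential design idea described above, under illustrative assumptions (a two-state chain with an absorbing state, a grid posterior over the single transition rate, and a small candidate set of sampling times; not the authors' method for general birth-death or multistate models): before each observation, the expected posterior variance is computed for every candidate sampling time and the minimizer is chosen.

```python
# Illustrative sketch of sequential Bayesian optimal design (assumed toy setup):
# estimate the rate `lam` of an irreversible two-state chain, picking each
# sampling time to minimize the expected posterior variance.
import numpy as np

rng = np.random.default_rng(2)
lam_true = 1.3                                   # assumed ground-truth transition rate
grid = np.linspace(0.05, 5.0, 400)               # grid support for the posterior over lam
post = np.full(grid.size, 1.0 / grid.size)       # uniform prior
times = np.linspace(0.1, 4.0, 40)                # candidate sampling times

def expected_post_var(post, t):
    """Expected posterior variance of lam after sampling the state at time t.
    Outcome y = 1 ('still in the initial state') has likelihood exp(-lam * t)."""
    p1 = np.exp(-grid * t)                       # P(y = 1 | lam) on the grid
    total = 0.0
    for like in (p1, 1.0 - p1):                  # the two possible outcomes
        prob_y = float(post @ like)              # predictive probability of the outcome
        q = post * like / prob_y                 # posterior given that outcome
        mean = q @ grid
        total += prob_y * (q @ grid**2 - mean**2)
    return total

for _ in range(30):                              # 30 adaptively designed observations
    t = min(times, key=lambda s: expected_post_var(post, s))
    y = rng.random() < np.exp(-lam_true * t)     # simulate the observed outcome
    like = np.exp(-grid * t) if y else 1.0 - np.exp(-grid * t)
    post = post * like
    post /= post.sum()

print("posterior mean rate:", round(float(post @ grid), 3), "true rate:", lam_true)
```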