Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion
We consider the problem of reconstructing a rank-one matrix from a revealed subset of its entries when some of the revealed entries are corrupted with perturbations that are unknown and can be arbitrarily large. It is not known which revealed entries are corrupted. We propose a new algorithm combining alternating minimization with extreme-value filtering and provide sufficient and necessary conditions to recover the original rank-one matrix. In particular, we show that our proposed algorithm is optimal when the set of revealed entries is given by an Erdos-Renyi random graph.
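The abstract does not spell out the algorithm, so the block below is only a minimal sketch, in Python/NumPy, of what an alternating-minimization scheme with extreme-value filtering might look like; the function name, the trimming fraction, and the iteration count are illustrative choices of mine, not parameters taken from the paper.

```python
import numpy as np

def robust_rank_one_complete(M_obs, mask, iters=50, trim_frac=0.1, seed=0):
    """Illustrative sketch (not the paper's exact method): fit a rank-one
    model u v^T to the revealed entries (mask == True) by alternating
    least squares, discarding a trim_frac fraction of the revealed entries
    with the largest residuals after each pass."""
    rng = np.random.default_rng(seed)
    n, m = M_obs.shape
    u = rng.standard_normal(n)
    v = rng.standard_normal(m)
    keep = mask.copy()                                  # entries currently trusted
    for _ in range(iters):
        for i in range(n):                              # update u with v fixed
            cols = keep[i]
            if cols.any():
                u[i] = M_obs[i, cols] @ v[cols] / (v[cols] @ v[cols] + 1e-12)
        for j in range(m):                              # update v with u fixed
            rows = keep[:, j]
            if rows.any():
                v[j] = M_obs[rows, j] @ u[rows] / (u[rows] @ u[rows] + 1e-12)
        resid = np.abs(M_obs - np.outer(u, v))          # extreme-value filtering:
        resid[~mask] = -np.inf                          # only revealed entries compete
        k = int(trim_frac * mask.sum())
        if k > 0:
            worst = np.argsort(resid, axis=None)[-k:]   # k largest residuals
            keep = mask.copy()
            keep[np.unravel_index(worst, resid.shape)] = False
    return u, v                                          # np.outer(u, v) is the completed matrix
```

The filtering step is what distinguishes this sketch from plain alternating least squares: revealed entries whose residuals are extreme are treated as potentially corrupted and excluded from the next round of fits.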
These results are then applied to the problem of classification from crowdsourced data under the assumption that while the majority of the workers are governed by the standard single-coin Dawid-Skene model (i.e., they output the correct answer with a certain probability), some of the workers can deviate arbitrarily from this model. In particular, the "adversarial" workers could even make decisions designed to make the algorithm output an incorrect answer. Extensive experimental results show that our algorithm for this problem, based on rank-one matrix completion with perturbations, outperforms all other state-of-the-art methods in such an adversarial scenario.
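Continuing the sketch above with a small simulated crowdsourcing example of my own (not an experiment from the paper): under the single-coin Dawid-Skene model with labels y_j in {-1, +1} and worker accuracies p_i, the expected worker-task answer matrix has entries (2 p_i - 1) y_j and is therefore rank one, so a robust rank-one fit of the partially observed answers yields label estimates even when some workers answer adversarially.

```python
import numpy as np  # robust_rank_one_complete is the sketch defined above

rng = np.random.default_rng(1)
n_workers, n_tasks = 30, 200
y = rng.choice([-1.0, 1.0], size=n_tasks)            # true task labels
p = rng.uniform(0.6, 0.9, size=n_workers)            # honest-worker accuracies
A = np.where(rng.random((n_workers, n_tasks)) < p[:, None], y, -y)
A[:5] = -y                                            # five adversarial workers always answer incorrectly
mask = rng.random(A.shape) < 0.5                      # each worker labels roughly half the tasks
A = A * mask                                          # unobserved answers zeroed out; mask marks what was revealed

u_hat, v_hat = robust_rank_one_complete(A, mask)      # rank-one fit of the revealed answers
if np.median(u_hat) < 0:                              # resolve the global sign ambiguity by assuming
    u_hat, v_hat = -u_hat, -v_hat                     # that most workers beat random guessing
labels = np.sign(v_hat)                               # estimated task labels
print("label accuracy:", np.mean(labels == y))
```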
- PAR ID: 10276814
- Date Published:
- Journal Name: Advances in neural information processing systems
- ISSN: 1049-5258
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We consider the problem of recovering a rank-one matrix when a perturbed subset of its entries is revealed. We propose a method based on least squares in the log-space and show its performance matches the lower bounds that we derive for this problem in the small-perturbation regime, which are related to the spectral gap of a graph representing the revealed entries. Unfortunately, we show that for larger disturbances, potentially exponentially growing errors are unavoidable for any consistent recovery method. We then propose a second algorithm relying on encoding the matrix factorization in the stationary distribution of a certain Markov chain. We show that, under the stronger assumption of known upper and lower bounds on the entries of the true matrix, this second method does not have exponential error growth for large disturbances. Both algorithms can be implemented in nearly linear time. (A minimal sketch of the log-space least-squares idea appears after this list.)
- Matrix completion, the problem of completing missing entries in a data matrix with low-dimensional structure (such as rank), has seen many fruitful approaches and analyses. Tensor completion is the tensor analog that attempts to impute missing tensor entries from similar low-rank type assumptions. In this paper, we study the tensor completion problem when the sampling pattern is deterministic and possibly non-uniform. We first propose an efficient weighted Higher Order Singular Value Decomposition (HOSVD) algorithm for the recovery of the underlying low-rank tensor from noisy observations and then derive the error bounds under a properly weighted metric. Additionally, the efficiency and accuracy of our algorithm are both tested using synthetic and real datasets in numerical simulations.
- We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given $N = \Omega(d^2/\epsilon^2)$ samples from a $d$-dimensional Gaussian distribution, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithm runs in time $O(d^{3.26}/\mathrm{poly}(\epsilon))$ and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes $\Omega(d^{2\omega})$ when $\epsilon = \Omega(1)$, where $\omega$ is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
- Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong contamination model, where a constant fraction of datapoints are arbitrarily corrupted. We introduce a general framework for efficiently finding an approximate SOSP with dimension-independent accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints. As a concrete application of our framework, we apply it to the problem of low-rank matrix sensing, developing efficient and provably robust algorithms that can tolerate corruptions in both the sensing matrices and the measurements. In addition, we establish a Statistical Query lower bound providing evidence that the quadratic dependence on $D$ in the sample complexity is necessary for computationally efficient algorithms.
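As referenced from the first item in the list above, the log-space least-squares idea is simple enough to illustrate. The sketch below is my own minimal rendering of it (assuming every entry of the true rank-one matrix is positive, and using a dense solver for brevity; the nearly linear running time claimed in that abstract would require a sparse solver exploiting the bipartite graph of revealed entries): since M_ij = u_i v_j implies log M_ij = log u_i + log v_j, each revealed entry contributes one linear equation in the unknowns (log u, log v).

```python
import numpy as np

def log_space_rank_one(rows, cols, vals, n, m):
    """Sketch: recover positive rank-one factors u (length n), v (length m)
    from revealed entries vals[k] ~ u[rows[k]] * v[cols[k]] by least squares
    on the logarithms.  The factors are identifiable only up to a global
    rescaling (u -> c*u, v -> v/c); the products u_i * v_j are what is
    actually recovered."""
    k = len(vals)
    A = np.zeros((k, n + m))
    A[np.arange(k), rows] = 1.0          # coefficient of log u_{rows[k]}
    A[np.arange(k), n + cols] = 1.0      # coefficient of log v_{cols[k]}
    x, *_ = np.linalg.lstsq(A, np.log(vals), rcond=None)
    return np.exp(x[:n]), np.exp(x[n:])

# Tiny noiseless check: five revealed entries whose bipartite graph is connected.
u_true, v_true = np.array([1.0, 2.0, 4.0]), np.array([3.0, 1.0, 0.5])
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 1, 1, 1, 2])
vals = u_true[rows] * v_true[cols]
u, v = log_space_rank_one(rows, cols, vals, 3, 3)
print(np.allclose(np.outer(u, v), np.outer(u_true, v_true)))  # True
```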