 NSFPAR ID:
 10371730
 Date Published:
 Journal Name:
 ICASSP 2022  2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

null (Ed.)Summary This paper is concerned with empirical likelihood inference on the population mean when the dimension $p$ and the sample size $n$ satisfy $p/n\rightarrow c\in [1,\infty)$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $p/n>1/2$ because the convex hull of the $n$ observations in $\mathbb{R}^p$ becomes too small to cover the true mean value. Moreover, when $p> n$, the sample covariance matrix becomes singular, and this results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for highdimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data.more » « less

Abstract. Endmember mixing analysis (EMMA) is a method of interpreting stream water chemistry variations and is widely used for chemical hydrograph separation. It is based on the assumption that stream water is a conservative mixture of varying contributions from wellcharacterized source solutions (endmembers). These endmembers are typically identified by collecting samples of potential endmember source waters from within the watershed and comparing these to the observations. Here we introduce a complementary datadriven method (convex hull endmember mixing analysis – CHEMMA) to infer the endmember compositions and their associated uncertainties from the stream water observations alone. The method involves two steps. The first uses convex hull nonnegative matrix factorization (CHNMF) to infer possible endmember compositions by searching for a simplex that optimally encloses the stream water observations. The second step uses constrained Kmeans clustering (COPKMEANS) to classify the results from repeated applications of CHNMF and analyzes the uncertainty associated with the algorithm. In an example application utilizing the 1986 to 1988 Panola Mountain Research Watershed dataset, CHEMMA is able to robustly reproduce the three fieldmeasured endmembers found in previous research using only the stream water chemical observations. CHEMMA also suggests that a fourth and a fifth endmember can be (less robustly) identified. We examine uncertainties in endmember identification arising from nonuniqueness, which is related to the data structure, of the CHNMF solutions, and from the number of samples using both real and synthetic data. The results suggest that the mixing space can be identified robustly when the dataset includes samples that contain extremely small contributions of one endmember, i.e., samples containing extremely large contributions from one endmember are not necessary but do reduce uncertainty about the endmember composition.more » « less

For a set P of n points in the unit ball b ⊆ R d , consider the problem of finding a small subset T ⊆ P such that its convexhull εapproximates the convexhull of the original set. Specifically, the Hausdorff distance between the convex hull of T and the convex hull of P should be at most ε. We present an efficient algorithm to compute such an ε ′ approximation of size kalg, where ε ′ is a function of ε, and kalg is a function of the minimum size kopt of such an εapproximation. Surprisingly, there is no dependence on the dimension d in either of the bounds. Furthermore, every point of P can be ε approximated by a convexcombination of points of T that is O(1/ε2 )sparse. Our result can be viewed as a method for sparse, convex autoencoding: approximately representing the data in a compact way using sparse combinations of a small subset T of the original data. The new algorithm can be kernelized, and it preserves sparsity in the original input.more » « less

Summary The upper bounds on the coverage probabilities of the confidence regions based on blockwise empirical likelihood and nonstandard expansive empirical likelihood methods for time series data are investigated via studying the probability of violating the convex hull constraint. The large sample bounds are derived on the basis of the pivotal limit of the blockwise empirical loglikelihood ratio obtained under fixed b asymptotics, which has recently been shown to provide a more accurate approximation to the finite sample distribution than the conventional χ2approximation. Our theoretical and numerical findings suggest that both the finite sample and the large sample upper bounds for coverage probabilities are strictly less than 1 and the blockwise empirical likelihood confidence region can exhibit serious undercoverage when the dimension of moment conditions is moderate or large, the time series dependence is positively strong or the block size is large relative to the sample size. A similar finite sample coverage problem occurs for nonstandard expansive empirical likelihood. To alleviate the coverage bound problem, we propose to penalize both empirical likelihood methods by relaxing the convex hull constraint. Numerical simulations and data illustrations demonstrate the effectiveness of our proposed remedies in terms of delivering confidence sets with more accurate coverage. Some technical details and additional simulation results are included in online supplemental material.

We develop a variant of the multinomial logit model with impatient customers and study assortment optimization and pricing problems under this choice model. In our choice model, a customer incrementally views the assortment of available products in multiple stages. The patience level of a customer determines the maximum number of stages in which the customer is willing to view the assortments of products. In each stage, if the product with the largest utility provides larger utility than a minimum acceptable utility, which we refer to as the utility of the outside option, then the customer purchases that product right away. Otherwise, the customer views the assortment of products in the next stage as long as the customer’s patience level allows the customer to do so. Under the assumption that the utilities have the Gumbel distribution and are independent, we give a closedform expression for the choice probabilities. For the assortmentoptimization problem, we develop a polynomialtime algorithm to find the revenuemaximizing sequence of assortments to offer. For the pricing problem, we show that, if the sequence of offered assortments is fixed, then we can solve a convex program to find the revenuemaximizing prices, with which the decision variables are the probabilities that a customer reaches different stages. We build on this result to give a 0.878approximation algorithm when both the sequence of assortments and the prices are decision variables. We consider the assortmentoptimization problem when each product occupies some space and there is a constraint on the total space consumption of the offered products. We give a fully polynomialtime approximation scheme for this constrained problem. We use a data set from Expedia to demonstrate that incorporating patience levels, as in our model, can improve purchase predictions. We also check the practical performance of our approximation schemes in terms of both the quality of solutions and the computation times.more » « less