We consider the problem of estimating the location of a single change point in a network generated by a dynamic stochastic block model mechanism. This model produces community structure in the network that exhibits change at a single time epoch. We propose two methods of estimating the change point, together with the model parameters, before and after its occurrence. The first employs a leastsquares criterion function and takes into consideration the full structure of the stochastic block model and is evaluated at each point in time. Hence, as an intermediate step, it requires estimating the community structure based on a clustering algorithm at every time point. The second method comprises the following two steps: in the first one, a leastsquares function is used and evaluated at each time point, but ignoring the community structure and only considering a random graph generating mechanism exhibiting a change point. Once the change point is identified, in the second step, all network data before and after it are used together with a clustering algorithm to obtain the corresponding community structures and subsequently estimate the generating stochastic block model parameters. The first method, since it requires knowledge of the community structure and hence clustering atmore »
Crowdsourcing via Annotator Cooccurrence Imputation and Provable Symmetric Nonnegative Matrix Factorization
Unsupervised learning of the DawidSkene (D&S) model from noisy, incomplete and crowdsourced annotations has been a longstanding challenge, and is a critical step towards reliably labeling massive data. A recent work takes a coupled nonnegative matrix factorization (CNMF) perspective, and shows appealing features: It ensures the identifiability of the D&S model and enjoys low sample complexity, as only the estimates of the cooccurrences of annotator labels are involved. However, the identifiability holds only when certain somewhat restrictive conditions are met in the context of crowdsourcing. Optimizing the CNMF criterion is also costly—and convergence assurances are elusive. This work recasts the pairwise cooccurrence based D&S model learning problem as a symmetric NMF (SymNMF) problem—which offers enhanced identifiability relative to CNMF. In practice, the SymNMF model is often (largely) incomplete, due to the lack of colabeled items by some annotators. Two lightweight algorithms are proposed for cooccurrence imputation. Then, a lowcomplexity shifted rectified linear unit (ReLU)empowered SymNMF algorithm is proposed to identify the D&S model. Various performance characterizations (e.g., missing cooccurrence recoverability, stability, and convergence) and evaluations are also presented.
 Editors:
 Meila, Marina; Zhang, Tong
 Award ID(s):
 2007836
 Publication Date:
 NSFPAR ID:
 10292988
 Journal Name:
 Proceedings of the 38th International Conference on Machine Learning
 Volume:
 139
 Page Range or eLocationID:
 45444554
 Sponsoring Org:
 National Science Foundation
More Like this


We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and orderupto level for ordering strategy, and [Formula: see text], a function of onhand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning researches in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving singleperiod optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profittogo function. The necessity of analyzing actiondependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult thanmore »

How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective modelbased clustering method based on a novel Dirichlet mixture model of a special but significant type of point processes — Hawkes process. The proposed model generates the event sequences with different clusters from the Hawkes processes with different parameters, and uses a Dirichlet distribution as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and realworld data show that the clustering method based on our model can learn structural triggering patterns hidden in asynchronous event sequences robustly and achieve superior performance on clustering purity and consistency compared to existing methods.

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, nonexpert labeling from annotators. The classic DawidSkene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the DawidSkene model, but the sample complexity is a hurdle for applying such approachessince the tensor methods hinge on the availability of thirdorder statistics that are hard to reliably estimate given limited data. In this paper, we propose a framework using pairwise cooccurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the DawidSkene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometrybased structured matrix factorization to solve the model identification problem efficiently, and an identifiabilityenhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the stateofart algorithms under a variety of scenarios.

Obeid, Iyad Selesnick (Ed.)The Temple University Hospital EEG Corpus (TUEG) [1] is the largest publicly available EEG corpus of its type and currently has over 5,000 subscribers (we currently average 35 new subscribers a week). Several valuable subsets of this corpus have been developed including the Temple University Hospital EEG Seizure Corpus (TUSZ) [2] and the Temple University Hospital EEG Artifact Corpus (TUAR) [3]. TUSZ contains manually annotated seizure events and has been widely used to develop seizure detection and prediction technology [4]. TUAR contains manually annotated artifacts and has been used to improve machine learning performance on seizure detection tasks [5]. In this poster, we will discuss recent improvements made to both corpora that are creating opportunities to improve machine learning performance. Two major concerns that were raised when v1.5.2 of TUSZ was released for the Neureka 2020 Epilepsy Challenge were: (1) the subjects contained in the training, development (validation) and blind evaluation sets were not mutually exclusive, and (2) high frequency seizures were not accurately annotated in all files. Regarding (1), there were 50 subjects in dev, 50 subjects in eval, and 592 subjects in train. There was one subject common to dev and eval, five subjects common to dev andmore »