Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a novel method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method on legal documents provided by the California Innocence Project and the 20 Newsgroups dataset. Our results show that the proposed method improves both classification accuracy and topic coherence in comparison to past methods such as Semi-Supervised Non-negative Matrix Factorization (SSNMF), Guided Non-negative Matrix Factorization (Guided NMF), and Topic Supervised NMF.
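The abstract does not state the GSSNMF objective. Judging by its name and its baselines, GSSNMF appears to combine the label supervision of SSNMF with the seed-word guidance of Guided NMF. The sketch below rests on that assumption and implements only the SSNMF part. It factors a non-negative term-document matrix as X ≈ AS, while a classifier matrix B fits the one-hot labels as Y ≈ BS. The objective is ‖X − AS‖²_F + λ‖Y − BS‖²_F, minimized with Lee–Seung-style multiplicative updates. The function name `ssnmf`, the weight `lam`, the toy data and the random initialization are illustrative choices. The seed-word guidance term is omitted.

```python
import numpy as np

def ssnmf(X, Y, k, lam=1.0, iters=300, seed=0, eps=1e-10):
    """Hypothetical SSNMF sketch (not the GSSNMF method itself).

    Minimizes ||X - A S||_F^2 + lam * ||Y - B S||_F^2 over non-negative
    A (features x k topics), B (classes x k), S (k x documents).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    c = Y.shape[0]
    A = rng.random((m, k))
    B = rng.random((c, k))
    S = rng.random((k, n))
    losses = []
    for _ in range(iters):
        # Multiplicative updates: ratios of non-negative terms keep factors >= 0.
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        B *= (Y @ S.T) / (B @ S @ S.T + eps)
        S *= (A.T @ X + lam * B.T @ Y) / (A.T @ A @ S + lam * B.T @ B @ S + eps)
        losses.append(np.linalg.norm(X - A @ S) ** 2
                      + lam * np.linalg.norm(Y - B @ S) ** 2)
    return A, B, S, losses

# Toy usage on random data (purely illustrative).
rng = np.random.default_rng(1)
X = rng.random((30, 40))              # term-document matrix (terms x docs)
labels = rng.integers(0, 2, size=40)  # two classes
Y = np.eye(2)[:, labels]              # one-hot labels (classes x docs)
A, B, S, losses = ssnmf(X, Y, k=4)
pred = np.argmax(B @ S, axis=0)       # predicted class for each document
```

This problem is an ordinary NMF of the stacked matrix [X; √λ Y] ≈ [A; √λ B] S. Each multiplicative update therefore does not increase the loss. Columns of `A` can be read as topics through their largest entries.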
A Fast Scale-Invariant Algorithm for Non-negative Least Squares with Non-negative Data
- Award ID(s):
- 2007757
- PAR ID:
- 10495311
- Publisher / Repository:
- 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Let $\mathcal{M}_{0}^{n}$ be the class of closed, simply connected, non-negatively curved Riemannian $n$-manifolds admitting an isometric, effective, isotropy-maximal torus action. We prove that if $M\in\mathcal{M}_{0}^{n}$, then $M$ is equivariantly diffeomorphic to the free linear quotient, by a torus, of a product of spheres, each of dimension at least 3. As a special case, we then prove the Maximal Symmetry Rank Conjecture for all $M\in\mathcal{M}_{0}^{n}$. Finally, we show that the Maximal Symmetry Rank Conjecture for simply connected, non-negatively curved manifolds holds in dimensions less than or equal to 9 without additional assumptions on the torus action.
-
Studies on the conditional relationships between PM2.5 concentrations in different regions are of great interest for the joint prevention and control of air pollution. Because atmospheric conditions change with the seasons, the spatial patterns of PM2.5 may differ throughout the year. Additionally, concentration data are both non-negative and non-Gaussian. These features pose significant challenges to existing methods. This study proposes a heterogeneous graphical model for non-negative, non-Gaussian data based on the score matching loss. The proposed method simultaneously clusters multiple datasets and, within each cluster, estimates a graph over variables with these complex properties. Furthermore, the model includes a network that indicates similarity among datasets, and this network can have additional applications. In simulation studies, the proposed method outperforms competing alternatives in both clustering and edge identification. We also analyse the spatial correlations of PM2.5 concentrations across Taiwan's regions using 2019 data from 67 air-quality monitoring stations. The 12 months are clustered into four groups (January–March, April, May–September, and October–December), whose graphs have 153, 57, 86, and 167 edges, respectively. The results show clear seasonality, consistent with the meteorological literature. Geographically, PM2.5 concentrations are more strongly correlated within northern Taiwan and within southern Taiwan. These results can provide valuable information for developing joint air-quality control strategies.
-
For each $t\in\mathbb{R}$, we define the entire function
$$H_{t}(z):=\int_{0}^{\infty} e^{tu^{2}}\,\Phi(u)\cos(zu)\,du,$$
where $\Phi$ is the super-exponentially decaying function
$$\Phi(u):=\sum_{n=1}^{\infty}\left(2\pi^{2}n^{4}e^{9u}-3\pi n^{2}e^{5u}\right)\exp\left(-\pi n^{2}e^{4u}\right).$$
Newman showed that there exists a finite constant $\Lambda$ (the de Bruijn–Newman constant) such that the zeros of $H_{t}$ are all real precisely when $t\geqslant\Lambda$. The Riemann hypothesis is equivalent to the assertion $\Lambda\leqslant 0$, and Newman conjectured the complementary bound $\Lambda\geqslant 0$. In this paper, we establish Newman’s conjecture. The argument proceeds by assuming for contradiction that $\Lambda<0$ and then analyzing the dynamics of zeros of $H_{t}$ (building on the work of Csordas, Smith and Varga) to obtain increasingly strong control on the zeros of $H_{t}$ in the range $\Lambda$…
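The PM2.5 abstract above bases its graphical model on a score matching loss for non-negative data but does not state that loss. The sketch below is a one-dimensional illustration only. It assumes Hyvärinen's (2007) non-negative score matching objective, E[2xψ(x) + x²ψ′(x) + ½x²ψ(x)²] with ψ = d log q/dx. It applies this objective to a hypothetical exponential model q(x) ∝ exp(−λx). Since ψ = −λ, the loss depends only on E[x] and E[x²] and never needs the normalizing constant, which is what makes score matching convenient for non-Gaussian models. The paper's graphical-model and clustering components are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def nn_score_matching_loss(lam: float, x: np.ndarray) -> float:
    """Empirical non-negative score matching loss (assumed Hyvarinen-2007 form)
    for the exponential model q(x) ~ exp(-lam * x) on x >= 0.

    With psi = d/dx log q = -lam and psi' = 0, the per-sample term
    2*x*psi + x**2 * psi' + 0.5 * x**2 * psi**2 is -2*lam*x + 0.5*lam**2*x**2.
    """
    return float(np.mean(-2.0 * lam * x + 0.5 * lam**2 * x**2))

def fit_rate(x: np.ndarray) -> float:
    """Closed-form minimizer of the quadratic loss: lam_hat = 2*E[x] / E[x^2]."""
    return float(2.0 * np.mean(x) / np.mean(x**2))

# Simulated exponential data with a known rate (illustrative only).
rng = np.random.default_rng(0)
true_rate = 2.0
x = rng.exponential(scale=1.0 / true_rate, size=100_000)
lam_hat = fit_rate(x)
print(lam_hat)  # close to 2.0 for large samples
```

For exponential data, E[x] = 1/λ and E[x²] = 2/λ², so λ̂ = 2E[x]/E[x²] recovers the true rate as the sample grows.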
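The two definitions in the de Bruijn–Newman abstract above can be evaluated numerically. The sketch below is an illustration built on its own assumptions rather than anything taken from the paper. It truncates the series for Φ after 20 terms and cuts the integral off at u = 2, where exp(−πn²e^{4u}) has already underflowed to zero. It then applies composite Simpson's rule. The names `phi` and `H` and all numerical parameters are hypothetical choices.

```python
import math

def phi(u: float, n_terms: int = 20) -> float:
    """Truncated series for Phi(u); each term decays like exp(-pi n^2 e^{4u})."""
    total = 0.0
    for n in range(1, n_terms + 1):
        decay = math.exp(-math.pi * n**2 * math.exp(4 * u))  # underflows to 0.0 for large u
        total += (2 * math.pi**2 * n**4 * math.exp(9 * u)
                  - 3 * math.pi * n**2 * math.exp(5 * u)) * decay
    return total

def H(t: float, z: float, upper: float = 2.0, steps: int = 2000) -> float:
    """Approximate H_t(z) = int_0^inf e^{t u^2} Phi(u) cos(z u) du.

    Uses composite Simpson's rule on [0, upper]; steps must be even.
    """
    h = upper / steps

    def f(u: float) -> float:
        return math.exp(t * u * u) * phi(u) * math.cos(z * u)

    s = f(0.0) + f(upper)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3

print(H(0.0, 0.0))
```

As a sanity check, H_0(z) = Ξ(z/2)/8, where Ξ(s) = ξ(1/2 + is). H(0.0, 0.0) should therefore agree with ξ(1/2)/8 ≈ 0.06214. Because Φ(u) > 0, H_t(0) also increases with t.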

