

Title: A network model that combines latent factors and sparse graphs
Abstract: We propose a combined model for network data that integrates a latent factor model with a sparse graphical model. Neither a latent factor model nor a sparse graphical model alone may be sufficient to capture the structure of such data. The proposed model uses a latent (i.e., factor analysis) component to represent the main trends (a.k.a. factors) and a sparse graphical component to capture the remaining ad hoc dependence. Model selection and parameter estimation are carried out simultaneously via a penalized likelihood approach. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push toward low-dimensional latent components and a sparse graphical structure. The effectiveness of the model is demonstrated via simulation studies, and the model is also applied to four real datasets: Zachary's karate club data, Krebs's U.S. political books dataset (http://www.orgnet.com), a U.S. political blogs dataset, and a citation network of statisticians. In each application, the model yields meaningful results in practical settings.
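The abstract does not spell out the likelihood or the optimization algorithm, so the following is only a minimal sketch of the general latent-plus-sparse idea, not the authors' method. It assumes (hypothetically) a symmetric binary adjacency matrix `A` with no self-loops, a logistic link logit P(A_ij = 1) = Θ_ij with Θ = L + S, a nuclear-norm penalty on the latent part `L` (pushing it toward low rank), and an ℓ1 penalty on the off-diagonal entries of `S` (pushing toward a sparse graph). Because this penalized negative log-likelihood is convex, plain proximal gradient descent applies: a gradient step on the logistic loss, followed by eigenvalue soft-thresholding for `L` and entrywise soft-thresholding for `S`. The function names, penalty weights, and toy two-community network are all illustrative assumptions.

```python
import numpy as np

# Hypothetical latent-plus-sparse logistic network model (an assumption, not the paper's exact formulation):
#   logit P(A_ij = 1) = Theta_ij,  Theta = L + S,  i != j
#   minimize  NLL(Theta) + lam_L * ||L||_* + lam_S * sum_{i != j} |S_ij|


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_threshold(M, tau):
    """Proximal operator of tau * ||M||_1 (entrywise shrinkage)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def eig_threshold(M, tau):
    """Proximal operator of tau * ||M||_* for a symmetric M: shrink eigenvalue magnitudes."""
    w, Q = np.linalg.eigh(M)
    w_shrunk = np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)
    return (Q * w_shrunk) @ Q.T  # Q @ diag(w_shrunk) @ Q.T


def objective(A, L, S, lam_L, lam_S):
    """Penalized negative log-likelihood over the off-diagonal entries."""
    off = ~np.eye(A.shape[0], dtype=bool)
    theta = (L + S)[off]
    nll = np.sum(np.logaddexp(0.0, theta) - A[off] * theta)
    nuclear = np.sum(np.abs(np.linalg.eigvalsh(L)))
    return nll + lam_L * nuclear + lam_S * np.abs(S[off]).sum()


def fit_latent_plus_sparse(A, lam_L, lam_S, step=1.0, n_iter=500):
    """Proximal gradient descent on the convex penalized objective."""
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)
    L = np.zeros((n, n))
    S = np.zeros((n, n))
    for _ in range(n_iter):
        # Gradient of the logistic NLL w.r.t. Theta (zero on the diagonal);
        # since Theta = L + S, it is also the gradient w.r.t. L and w.r.t. S.
        G = (sigmoid(L + S) - A) * off
        L = eig_threshold(L - step * G, step * lam_L)
        S = soft_threshold(S - step * G, step * lam_S) * off
    return L, S


# Toy usage on a hypothetical two-community network.
rng = np.random.default_rng(0)
n = 40
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.5, 0.05)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T  # symmetric, zero diagonal
L_hat, S_hat = fit_latent_plus_sparse(A, lam_L=2.0, lam_S=0.3)
print("rank of L:", np.linalg.matrix_rank(L_hat, tol=1e-6),
      "nonzeros in S:", np.count_nonzero(S_hat))
```

The fixed step size of 1.0 is a deliberate simplification: the logistic loss has curvature at most 1/4 per entry, and because Θ = L + S the joint gradient in (L, S) is 1/2-Lipschitz, so any step up to 2 keeps the penalized objective non-increasing. No intercept is included; the low-rank part can absorb an overall density shift (a constant matrix has rank one), at the cost of some extra nuclear-norm penalty.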
Award ID(s): 2015363
PAR ID: 10452206
Author(s) / Creator(s):
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name: Statistical Analysis and Data Mining: The ASA Data Science Journal
Volume: 14
Issue: 2
ISSN: 1932-1864
Page Range / eLocation ID: pp. 97-115
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like This
  1. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), funded by the US National Science Foundation, National Institutes of Health, and Department of Energy, has served structural biologists and Protein Data Bank (PDB) data consumers worldwide since 1999. RCSB PDB, a founding member of the Worldwide Protein Data Bank (wwPDB) partnership, is the US data center for the global PDB archive housing biomolecular structure data. RCSB PDB is also responsible for the security of PDB data, as the wwPDB-designated Archive Keeper. Annually, RCSB PDB serves tens of thousands of three-dimensional (3D) macromolecular structure data depositors (using macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro-electron diffraction) from all inhabited continents. RCSB PDB makes PDB data available from its research-focused RCSB.org web portal at no charge and without usage restrictions to millions of PDB data consumers working in every nation and territory worldwide. In addition, RCSB PDB operates an outreach and education PDB101.RCSB.org web portal that was used by more than 800,000 educators, students, and members of the public during calendar year 2020. This invited Tools Issue contribution describes (i) how the archive is growing and evolving as new experimental methods generate ever larger and more complex biomolecular structures; (ii) the importance of data standards and data remediation in effective management of the archive and facile integration with more than 50 external data resources; and (iii) new tools and features for 3D structure analysis and visualization made available during the past year via the RCSB.org web portal.
  2. Background: The pan-genome of a species is the union of the genes and non-coding sequences present in all individuals (cultivars, accessions, or strains) within that species. Results: Here we introduce PGV, a reference-agnostic representation of the pan-genome of a species based on the notion of consensus ordering. Our experimental results demonstrate that PGV enables an intuitive, effective, and interactive visualization of a pan-genome by providing a genome browser that can elucidate complex structural genomic variations. Conclusions: The PGV software can be installed via conda or downloaded from https://github.com/ucrbioinfo/PGV. The companion PGV browser at http://pgv.cs.ucr.edu can be tested using example BED tracks available from the GitHub page.
  3. High-throughput gene expression profiling measures individual gene expression across conditions. However, genes are regulated in complex networks, not as individual entities, limiting the interpretability of gene expression data. Machine learning models that incorporate prior biological knowledge are a powerful tool to extract meaningful biology from gene expression data. Pathway-level information extractor (PLIER) is an unsupervised machine learning method that defines biological pathways by leveraging the vast amount of published transcriptomic data. PLIER converts gene expression data into known pathway gene sets, termed latent variables (LVs), to substantially reduce data dimensionality and improve interpretability. In the current study, we trained the first mouse PLIER model on 190,111 mouse brain RNA-sequencing samples, the greatest amount of training data ever used by PLIER. We then validated the mousiPLIER approach in a study of microglia and astrocyte gene expression across mouse brain aging. mousiPLIER identified biological pathways that are significantly associated with aging, including one latent variable (LV41) corresponding to striatal signal. To gain further insight into the genes contained in LV41, we performed k-means clustering on the training data to identify studies that respond strongly to LV41. We found that the variable was relevant to striatum and aging across the scientific literature. Finally, we built a Web server (http://mousiplier.greenelab.com/) for users to easily explore the learned latent variables. Taken together, this study defines mousiPLIER as a method to uncover meaningful biological processes in mouse brain transcriptomic studies.
  4. Biochemistry is about structure and function, but it is also about data, and this is where computers come in. Throughout my time as a graduate student and postdoc, whenever I encountered data I thought, "I can work this up by hand, but I think a computer could do a better job." Since then, I have been working at the interface of biochemistry and computers, attracting talented students and collaborating with colleagues who have complementary skills. This has resulted in several exciting projects: a simulation of 2D electrophoresis and tandem mass spectrometry, the human visualization project, and two different programs that enable biochemists to search protein structures for enzyme active sites: ProMOL (promol.org) and Moltimate (moltimate.appspot.com). The human side of software development for education involved finding the right students and colleagues, communicating effectively across disciplines, building and managing effective teams, and recognizing the importance of serendipity throughout the process.
  5. This article presents the latest developments to ClaimBuster's claim-spotting model, which tackles the critical task of identifying check-worthy claims from large streams of information. We introduce the first adversarially regularized, transformer-based claim-spotting model, which achieves state-of-the-art results on several benchmark datasets. In addition to analyzing model performance metrics, we also quantitatively and qualitatively analyze the impact of ClaimBuster's real-world deployment. Moreover, to help facilitate reproducibility and community engagement, we publicly release our codebase, dataset, data curation platform, API, Google Colab notebooks, and various ClaimBuster-based demo systems at claimbuster.org.