

Search for: All records

Creators/Authors contains: "Zhao, Yunpeng"



  1. Many patients with mental disorders take dietary supplements, but their use patterns remain unclear. In this study, we developed a method to detect signals of associations between dietary supplement intake and mental disorders in Twitter data. We developed an annotated dataset and trained a convolutional neural network classifier that identifies language-use patterns of dietary supplement intake with an F1-score of 0.899, a precision of 0.900, and a recall of 0.900. Using the classifier, we found that melatonin and vitamin D were the most commonly used supplements among Twitter users who self-diagnosed mental disorders. Sentiment analysis using Linguistic Inquiry and Word Count showed that, among Twitter users who posted a mental disorder self-diagnosis, those who indicated supplement intake were more active and expressed more negative emotions and fewer positive emotions than those who did not mention supplement intake.
  2. Social media embed rich but noisy signals of the physical locations of their users. Accurately inferring a user's location can significantly improve the user's experience on social media and enable the development of new location-based applications. This paper proposes a novel community-based approach for predicting the location of a user by using communities in the user's ego-net. We further propose both geographical proximity and structural proximity metrics to profile communities in the ego-net of a user, and then evaluate the effectiveness of each individual metric on real social media data. We find that geographical proximity metrics, such as average/median haversine distance and community closeness, are strong indicators of a good community for geotagging. In addition, the structural proximity metric conductance performs comparably to the geographical proximity metrics, while triangle participation ratio and internal density are weak location indicators. To the best of our knowledge, this is the first effort to infer the physical location of a user from the perspective of latent communities in the user's ego-net.
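As a rough illustration of two of the metrics named in this abstract, the sketch below computes a median pairwise haversine distance over the known coordinates of community members and the conductance of a community in an undirected edge list. The function names, the pairwise definition of the median distance, and the edge-list representation are assumptions made for this example; the paper's exact definitions may differ.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_pairwise_distance(coords):
    """Median haversine distance over all unordered pairs (assumed definition)."""
    dists = [haversine_km(p, q) for i, p in enumerate(coords) for q in coords[i + 1:]]
    return median(dists) if dists else 0.0


def conductance(edges, community):
    """Cut size divided by the smaller of the two side volumes (degree sums)."""
    community = set(community)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in community, v in community
        if u_in != v_in:
            cut += 1  # edge crosses the community boundary
        vol_in += u_in + v_in  # edge endpoints inside the community
        vol_out += (not u_in) + (not v_in)  # edge endpoints outside it
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0
```

Under these definitions, a lower median distance indicates a geographically tighter community, and a lower conductance indicates a community that is better separated from the rest of the ego-net.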
  3. Summary

    When searching for gene pathways leading to specific disease outcomes, additional information on gene characteristics is often available that may help differentiate genes related to the disease from irrelevant background genes when connections involving both types of genes are observed and their relationships to the disease are unknown. We propose a method that singles out irrelevant background genes with the help of auxiliary information through a logistic regression, and clusters relevant genes into cohesive groups using the adjacency matrix. The expectation–maximization algorithm is modified to maximize a joint pseudo-likelihood, assuming latent indicators for relevance to the disease and latent group memberships, as well as Poisson or multinomial distributed link numbers within and between groups. A robust version allowing arbitrary linkage patterns within the background is further derived. Asymptotic consistency of label assignments under the stochastic blockmodel is proven. Superior performance and robustness in finite samples are observed in simulation studies. The proposed robust method identifies previously missed gene sets underlying autism-related neurological diseases using diverse data sources, including de novo mutations, gene expressions, and protein–protein interactions.

     