- Award ID(s): 1927554
- Publication Date:
- NSF-PAR ID: 10391940
- Journal Name: Empirical Methods in Natural Language Processing
- Sponsoring Org: National Science Foundation
More Like this
-
Introduction: Social media has created opportunities for children to gather social support online (Blackwell et al., 2016; Gonzales, 2017; Jackson, Bailey, & Foucault Welles, 2018; Khasawneh, Rogers, Bertrand, Madathil, & Gramopadhye, 2019; Ponathil, Agnisarman, Khasawneh, Narasimha, & Madathil, 2017). However, social media also has the potential to expose children and adolescents to undesirable behaviors. Research has shown that social media can be used to harass, discriminate against (Fritz & Gonzales, 2018), dox (Wood, Rose, & Thompson, 2018), and socially disenfranchise children (Page, Wisniewski, Knijnenburg, & Namara, 2018). Other research proposes that social media use might be correlated with the significant increase in suicide rates and depressive symptoms among children and adolescents in the past ten years (Mitchell, Wells, Priebe, & Ybarra, 2014). Evidence-based research suggests that suicidal and unwanted behaviors can be promulgated through social contagion effects, which model, normalize, and reinforce self-harming behavior (Hilton, 2017). These harmful behaviors and social contagion effects may occur more frequently through repetitive exposure and modelling via social media, especially when such content goes “viral” (Hilton, 2017). One example of viral self-harming behavior that has generated significant media attention is the Blue Whale Challenge (BWC). The hearsay about this challenge is that individuals at all …
-
Interest in physical therapy and individual exercises such as yoga and dance has increased alongside the well-being trend, and people globally enjoy such exercises at home or in the office via video streaming platforms. However, such exercises are hard to follow without expert guidance. Even if experts can help, it is almost impossible to give personalized feedback to every trainee remotely. Thus, automated pose correction systems are required more than ever, and we introduce a new captioning dataset named FixMyPose to address this need. We collect natural language descriptions of correcting a “current” pose to look like a “target” pose. To support a multilingual setup, we collect descriptions in both English and Hindi. The collected descriptions have interesting linguistic properties such as egocentric relations to the environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures. Further, to avoid ML biases, we maintain a balance across characters with diverse demographics, who perform a variety of movements in several interior environments (e.g., homes, offices). From our FixMyPose dataset, we introduce two tasks: the pose-correctional-captioning task and its reverse, the target-pose-retrieval task. During the correctional-captioning task, models must generate the descriptions of how to move from the current to the target pose …
-
Abstract: There are currently >1.3 million human -omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples from this ever-growing data collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact-string-matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at https://github.com/krishnanlab/txt2onto.
-
Digital health technology is becoming more ubiquitous in monitoring individuals’ health as both device functionality and overall prevalence increase. However, as individuals age, challenges arise with using this technology, particularly when it involves neurodegenerative issues (e.g., for individuals with Parkinson’s disease, Alzheimer’s disease, and ALS). Traditionally, neurodegenerative diseases have been assessed in clinical settings using pen-and-paper style assessments; however, digital health systems allow for the collection of far more data than we ever could achieve using traditional methods. The objective of this work is the formation and implementation of a neurocognitive digital health system designed to go beyond what pen-and-paper based solutions can do through the collection of (a) objective, (b) longitudinal, and (c) symptom-specific data, for use in (d) personalized intervention protocols. This system supports the monitoring of all neurocognitive functions (e.g., motor, memory, speech, executive function, sensory, language, behavioral and psychological function, sleep, and autonomic function), while also providing methodologies for personalized intervention protocols. The use of specifically designed tablet-based assessments and wearable devices allows for the collection of objective digital biomarkers that aid in accurate diagnosis and longitudinal monitoring, while patient-reported outcomes (e.g., by the diagnosed individual and caregivers) give additional insights for use in the …
-
Abstract: Graph and language embedding models are becoming commonplace in large-scale analyses given their ability to represent complex sparse data densely in low-dimensional space. Integrating these models’ complementary relational and communicative data may be especially helpful when predicting rare events or classifying members of hidden populations, tasks that require huge and sparse datasets for generalizable analyses. For example, due to social stigma and comorbidities, mental health support groups often form in amorphous online groups. Predicting suicidality among individuals in these settings using standard network analyses is prohibitive due to resource limits (e.g., memory), and adding auxiliary data like text to such models exacerbates complexity- and sparsity-related issues. Here, I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids these limits and extracts unsupervised clustering data without domain expertise or feature engineering. Graph and language distances to a suicide support group have little correlation (ρ < 0.23), implying the two models are not embedding redundant information. When used separately to predict suicidality among individuals, graph and language data generate relatively accurate results (69% and 76%, respectively) but have moderately large false-positive (25% and 21%, respectively) and false-negative (38% and 27%, respectively) rates; however, when integrated, the two data types produce highly accurate predictions (90%, …