Title: A Deep Learning Architecture for Psychometric Natural Language Processing
Psychometric measures reflecting people’s knowledge, ability, attitudes, and personality traits are critical for many real-world applications, such as e-commerce, health care, and cybersecurity. However, traditional methods cannot collect and measure rich psychometric dimensions in a timely and unobtrusive manner. Consequently, despite their importance, psychometric dimensions have received limited attention from the natural language processing and information retrieval communities. In this article, we propose a deep learning architecture, PyNDA, to extract psychometric dimensions from user-generated texts. PyNDA contains a novel representation embedding, a demographic embedding, a structural equation model (SEM) encoder, and a multitask learning mechanism designed to work in unison to address the unique challenges associated with extracting rich, sophisticated, and user-centric psychometric dimensions. Our experiments on three real-world datasets encompassing 11 psychometric dimensions, including trust, anxiety, and literacy, show that PyNDA markedly outperforms traditional feature-based classifiers as well as state-of-the-art deep learning architectures. Ablation analysis reveals that each component of PyNDA significantly contributes to its overall performance. Collectively, the results demonstrate the efficacy of the proposed architecture for facilitating rich psychometric analysis. Our results have important implications for user-centric information extraction and retrieval systems looking to measure and incorporate psychometric dimensions.
Award ID(s):
1822378 1816504 1629450 1553109 2039915
PAR ID:
10602062
Publisher / Repository:
Association for Computing Machinery (ACM)
Journal Name:
ACM Transactions on Information Systems
Volume:
38
Issue:
1
ISSN:
1046-8188
Format(s):
Medium: X
Size(s):
p. 1-29
Sponsoring Org:
National Science Foundation
More Like this
  1. Adverse event detection is critical for many real-world applications including timely identification of product defects, disasters, and major socio-political incidents. In the health context, adverse drug events account for countless hospitalizations and deaths annually. Since users often begin their information seeking and reporting with online searches, examination of search query logs has emerged as an important detection channel. However, search context, including query intent and heterogeneity in user behaviors, is extremely important for extracting information from search queries, yet the challenge of measuring and analyzing these aspects has precluded their use in prior studies. We propose DeepSAVE, a novel deep learning framework for detecting adverse events based on user search query logs. DeepSAVE uses an enriched variational autoencoder encompassing a novel query embedding and user modeling module that work in concert to address the context challenge associated with search-based detection of adverse events. Evaluation results on three large real-world event datasets show that DeepSAVE outperforms existing detection methods as well as comparison deep learning autoencoders. Ablation analysis reveals that each component of DeepSAVE significantly contributes to its overall performance. Collectively, the results demonstrate the viability of the proposed architecture for detecting adverse events from search query logs.
  2. Mobile gaming has emerged as a promising market with billion-dollar revenues. A variety of mobile game platforms and services have been developed around the world. One critical challenge for these platforms and services is to understand user churn behavior in mobile games. Accurate churn prediction will benefit many stakeholders such as game developers, advertisers, and platform operators. In this paper, we present the first large-scale churn prediction solution for mobile games. In view of the common limitations of the state-of-the-art methods built upon traditional machine learning models, we devise a novel semi-supervised and inductive embedding model that jointly learns the prediction function and the embedding function for user-app relationships. We model these two functions by deep neural networks with a unique edge embedding technique that is able to capture both contextual information and relationship dynamics. We also design a novel attributed random walk technique that takes into consideration both topological adjacency and attribute similarities. To evaluate the performance of our solution, we collect real-world data from the Samsung Game Launcher platform that includes tens of thousands of games and hundreds of millions of user-app interactions. The experimental results with this data demonstrate the superiority of our proposed model against existing state-of-the-art methods.
  3. Increasingly, large collections of datasets are made available to the public via the Web, ranging from government-curated datasets like those of data.gov to communally-sourced datasets such as Wikipedia tables. It has become clear that traditional search techniques are insufficient for such sources, especially when the user is unfamiliar with the terminology used by the creators of the relevant datasets. We propose to address this problem by elevating the datum to a first-class object that is indexed, thereby making it less dependent on how a dataset is structured. In a data table, a cell contains a value for a particular row as described by a particular column. In our cell-centric indexing approach, we index the metadata of each cell, so that information about its dataset and column simply become metadata rather than constraining concepts. In this paper we define cell-centric indexing and present a system architecture that supports its use in exploring datasets. We describe how cell-centric indexing can be implemented using traditional information retrieval technology and evaluate the scalability of the architecture.
  4. With the increasing demand for computationally intensive services like deep learning tasks, emerging distributed computing platforms such as edge computing (EC) systems are becoming more popular. Edge computing systems have shown promising results in terms of latency reduction compared to traditional cloud systems. However, their limited processing capacity imposes a trade-off between the potential latency reduction and the achieved accuracy in computationally intensive services such as deep learning-based services. In this paper, we focus on finding the optimal accuracy-time trade-off for running deep learning services in a three-tier EC platform where several deep learning models with different accuracy levels are available. Specifically, we cast the problem as an Integer Linear Program, where optimal task scheduling decisions are made to maximize overall user satisfaction in terms of accuracy-time trade-off. We prove that our problem is NP-hard and then provide a polynomial constant-time greedy algorithm, called GUS, that is shown to attain near-optimal results. Finally, upon vetting our algorithmic solution through numerical experiments and comparison with a set of heuristics, we deploy it on a testbed implemented to measure real-world results. The results of both numerical analysis and real-world implementation show that GUS can outperform the baseline heuristics in terms of the average percentage of satisfied users by a factor of at least 50%.
  5. Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, so that they overcome the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the models proposed in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images, and videos.