skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Spatio-Temporal Check-in Time Prediction with Recurrent Neural Network based Survival Analysis
We introduce a novel check-in time prediction problem. The goal is to predict the time a user will check-in to a given location. We formulate check-in prediction as a survival analysis problem and propose a Recurrent-Censored Regression (RCR) model. We address the key challenge of check-in data scarcity, which is due to the uneven distribution of check-ins among users/locations. Our idea is to enrich the check-in data with potential visitors, i.e., users who have not visited the location before but are likely to do so. RCR uses recurrent neural network to learn latent representations from historical check-ins of both actual and potential visitors, which is then incorporated with censored regression to make predictions. Experiments show RCR outperforms state-of-the-art event time prediction techniques on real-world datasets.  more » « less
Award ID(s):
1646881 1619028 1707498
PAR ID:
10066663
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Page Range / eLocation ID:
2976 to 2983
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding human mobility has become an important aspect of location-based services in tasks such as personalized recommendation and individual moving pattern recognition, enabled by the large volumes of data from geo-tagged social media (GTSM). Prior studies mainly focus on analyzing human historical footprints collected by GTSM and assuming the veracity of the data, which need not hold when some users are not willing to share their real footprints due to privacy concerns—thereby affecting reliability/authenticity. In this study, we address the problem of Inferring Real Mobility (IRMo) of users, from their unreliable historical traces. Tackling IRMo is a non-trivial task due to the: (1) sparsity of check-in data; (2) suspicious counterfeit check-in behaviors; and (3) unobserved dependencies in human trajectories. To address these issues, we develop a novel Graph-enhanced Attention model calledIRMoGA, which attempts to capture underlying mobility patterns and check-in correlations by exploiting the unreliable spatio-temporal data. Specifically, we incorporate the attention mechanism (rather than solely relying on traditional recursive models) to understand the regularity of human mobility, while employing a graph neural network to understand the mutual interactions from human historical check-ins and leveraging prior knowledge to alleviate the inferring bias. Our experiments conducted on four real-world datasets demonstrate the superior performance of IRMoGA over several state-of-the-art baselines, e.g., up to 39.16% improvement regarding the Recall score on Foursquare. 
    more » « less
  2. In the recent years, reciprocal link prediction has received some attention from the data mining and social network analysis researchers, who solved this problem as a binary classification task. However, it is also important to predict the interval time for the creation of reciprocal link. This is a challenging problem for two reasons: First, the lack of effective features, because well-known link prediction features are designed for undirected networks and for the binary classification task, hence they do not work well for the interval time prediction; Second, the presence of censored data instances makes the traditional supervised regression methods unsuitable for solving this problem. In this paper, we propose a solution for the reciprocal link interval time prediction task. We map this problem into survival analysis framework and show through extensive experiments on real-world datasets that, survival analysis methods perform better than traditional regression, neural network based model and support vector regression (SVR). 
    more » « less
  3. Crowdfunding has gained widespread attention in recent years. Despite the huge success of crowdfunding platforms, the percentage of projects that succeed in achieving their desired goal amount is only around 40%. Moreover, many of these crowdfunding platforms follow "all-or-nothing" policy which means the pledged amount is collected only if the goal is reached within a certain predefined time duration. Hence, estimating the probability of success for a project is one of the most important research challenges in the crowdfunding domain. To predict the project success, there is a need for new prediction models that can potentially combine the power of both classification (which incorporate both successful and failed projects) and regression (for estimating the time for success). In this paper, we formulate the project success prediction as a survival analysis problem and apply the censored regression approach where one can perform regression in the presence of partial information. We rigorously study the project success time distribution of crowdfunding data and show that the logistic and log-logistic distributions are a natural choice for learning from such data. We investigate various censored regression models using comprehensive data of 18K Kickstarter (a popular crowdfunding platform) projects and 116K corresponding tweets collected from Twitter. We show that the models that take complete advantage of both the successful and failed projects during the training phase will perform significantly better at predicting the success of future projects compared to the ones that only use the successful projects. We provide a rigorous evaluation on many sets of relevant features and show that adding few temporal features that are obtained at the project's early stages can dramatically improve the performance. 
    more » « less
  4. This paper investigates the relationship between demographics and the frequency of censored posts (weibos) on Sina Weibo. Our results indicate that demographics such as location, gender and paid for features do not provide a good degree of predictive power but help explain how censorship is applied on social media. Using a dataset of 226 million weibos collected in 2012, we apply a binomial regression model to evaluate the predictive quality of user demographics to identify candidates that may be targeted for censorship. Our results suggest male users who are verified (pay for mobile and security features) are more likely to be censored than females or users who are not verified. In addition, users from provinces such as Hong Kong, Macao, and Beijing are more heavily censored compared to any other province in China over the same period. 
    more » « less
  5. Abstract Copula is a popular method for modeling the dependence among marginal distributions in multivariate censored data. As many copula models are available, it is essential to check if the chosen copula model fits the data well for analysis. Existing approaches to testing the fitness of copula models are mainly for complete or right-censored data. No formal goodness-of-fit (GOF) test exists for interval-censored or recurrent events data. We develop a general GOF test for copula-based survival models using the information ratio (IR) to address this research gap. It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. Our method consistently separates different copula models for all these datasets in terms of model fitness. 
    more » « less