Title: Data analysis and modeling pipelines for controlled networked social science experiments
There is great interest in networked social science experiments for understanding human behavior at scale. Significant effort is required to perform data analytics on experimental outputs and for computational modeling of custom experiments. Moreover, experiments and modeling are often performed in a cycle, enabling iterative experimental refinement and data modeling to uncover interesting insights and to generate or refute hypotheses about social behaviors. The current practice for social analysts is to develop tailor-made computer programs and analytical scripts for experiments and modeling. This often leads to inefficiencies and duplication of effort. In this work, we propose a pipeline framework to take a significant step towards overcoming these challenges. Our contribution is to describe the design and implementation of a software system to automate many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses. The proposed pipeline framework uses formal models, formal algorithms, and theoretical models as the basis for its design and implementation. We propose a formal data model, such that if an experiment can be described in terms of this model, then our pipeline software can be used to analyze data efficiently. The merits of the proposed pipeline framework are elaborated by several case studies of networked social science experiments.
Authors:
Award ID(s):
1916670
Publication Date:
NSF-PAR ID:
10203865
Journal Name:
PLOS ONE
Volume:
15
Page Range or eLocation-ID:
1-58
ISSN:
1932-6203
Sponsoring Org:
National Science Foundation
More Like this
  1. The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm to a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.
  2. During the COVID-19 pandemic, many students lost opportunities to explore science in labs due to school closures. Remote labs provide a possible solution to mitigate this loss. However, most remote labs to date are based on a somewhat centralized model in which experts design and conduct certain types of experiments in well-equipped facilities, with a few options of manipulation provided to remote users. In this paper, we propose a distributed framework, dubbed remote labs 2.0, that offers the flexibility needed to build an open platform to support educators to create, operate, and share their own remote labs. Similar to the transformation of the Web from 1.0 to 2.0, remote labs 2.0 can greatly enrich experimental science on the Internet by allowing users to choose and contribute their subjects and topics. As a reference implementation, we developed a platform branded as Telelab. In collaboration with a high school chemistry teacher, we conducted remote chemical reaction experiments on the Telelab platform with two online classes. Pre/post-test results showed that these high school students attained significant gains (t(26)=8.76, p<0.00001) in evidence-based reasoning abilities. Student surveys revealed three key affordances of Telelab: live experiments, scientific instruments, and social interactions. All 31 respondents were engaged by one or more of these affordances. Students' behaviors were characterized by analyzing their interaction data logged by the platform. These findings suggest that appropriate applications of remote labs 2.0 in distance education can, to some extent, reproduce critical effects of their local counterparts on promoting science learning.
  3. Recently, aligning users among different social networks has received significant attention. However, most of the existing studies do not consider users’ behavior information during the aligning procedure and thus still suffer from poor learning performance. In fact, we observe that social network alignment and behavior analysis can benefit from each other. Motivated by this observation, we propose to jointly study the social network alignment problem and the user behavior analysis problem. We design a novel end-to-end framework named BANANA. In this framework, to leverage behavior analysis for social network alignment at the distribution level, we design an earth mover’s distance-based alignment model to fuse users’ behavior information for more comprehensive user representations. To further leverage social network alignment for behavior analysis, in turn, we design a temporal graph neural network model to fuse behavior information in different social networks based on the alignment result. The two models above can work together in an end-to-end manner. Through extensive experiments on real-world datasets, we demonstrate that our proposed approach outperforms the state-of-the-art methods in the social network alignment task and the user behavior analysis task, respectively.
  4. Learning explicit and implicit patterns in human trajectories plays an important role in many Location-Based Social Network (LBSN) applications, such as trajectory classification (e.g., walking, driving, etc.), trajectory-user linking, and friend recommendation. A particular problem that has attracted much attention recently, and is the focus of our work, is Trajectory-based Social Circle Inference (TSCI), which aims at inferring user social circles (mainly social friendship) based on motion trajectories and without any explicit social networked information. Existing approaches addressing TSCI lack satisfactory results due to the challenges related to data sparsity, accessibility and model efficiency. Motivated by the recent success of machine learning in trajectory mining, in this paper we formulate TSCI as a novel multi-label classification problem and develop a Recurrent Neural Network (RNN)-based framework called DeepTSCI to use human mobility patterns for inferring corresponding social circles. We propose three methods to learn the latent representations of trajectories, based on: (1) bidirectional Long Short-Term Memory (LSTM); (2) Autoencoder; and (3) Variational autoencoder. Experiments conducted on real-world datasets demonstrate that our proposed methods perform well and achieve significant improvement in terms of macro-R, macro-F1 and accuracy when compared to baselines.
  5. In anagram games, players are provided with letters for forming as many words as possible over a specified time duration. Anagram games have been used in controlled experiments to study problems such as collective identity, effects of goal setting, internal-external attributions, test anxiety, and others. The majority of work on anagram games involves individual players. Recently, work has expanded to group anagram games where players cooperate by sharing letters. In this work, we analyze experimental data from online social networked experiments of group anagram games. We develop mechanistic and data-driven models of human decision-making to predict detailed game player actions (e.g., what word to form next). With these results, we develop a composite agent-based modeling and simulation platform that incorporates the models from data analysis. We compare model predictions against experimental data, which enables us to provide explanations of human decision-making and behavior. Finally, we provide illustrative case studies using agent-based simulations to demonstrate the efficacy of models to provide insights that are beyond those from experiments alone.