Title: Examining User Heterogeneity in Digital Experiments
Digital experiments are routinely used to test the value of a treatment relative to a status quo control setting, for instance a new search relevance algorithm for a website or a new results layout for a mobile app. As digital experiments have become increasingly pervasive in organizations and a wide variety of research areas, their growth has prompted a new set of challenges for experimentation platforms. One challenge is that experiments often focus on the average treatment effect (ATE) without explicitly considering differences across major subgroups, that is, heterogeneous treatment effects (HTEs). This is especially problematic because ATEs have decreased in many organizations as the more obvious benefits have already been realized. However, questions abound regarding the pervasiveness of user HTEs and how best to detect them. We propose a framework for detecting and analyzing user HTEs in digital experiments. Our framework combines an array of user characteristics with double machine learning. Analysis of simulated data and of 27 real-world experiments spanning 1.76 billion sessions demonstrates the effectiveness of our detection method relative to existing techniques. We also find that transaction, demographic, engagement, satisfaction, and lifecycle characteristics exhibit statistically significant HTEs in 10% to 20% of our real-world experiments, underscoring the importance of considering user heterogeneity when analyzing experiment results. Otherwise, personalized features and experiences cannot be realized, which reduces effectiveness. In terms of the number of experiments and user sessions, we are not aware of any study that has examined user HTEs at this scale. Our findings have important implications for information retrieval, user modeling, platforms, and digital experience contexts, in which online experiments are often used to evaluate the effectiveness of design artifacts.
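As a rough illustration of the double machine learning idea named in the abstract, the sketch below residualizes the outcome and the treatment on user covariates with cross-fitting, then regresses the outcome residual on the treatment residual and on its interaction with one user characteristic. It is not the authors' implementation: the function names (`fit_linear`, `dml_hte`, `simulate`), the linear nuisance models, the single binary segment `s`, and the simulated effect sizes are all assumptions made for this example.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def dml_hte(X, t, y, s, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of [base effect, extra effect when s == 1]."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
    y_res = np.empty(len(y))
    t_res = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are fit on the other folds and applied to fold k.
        y_res[test] = y[test] - fit_linear(X[train], y[train])(X[test])
        t_res[test] = t[test] - fit_linear(X[train], t[train])(X[test])
    # Final stage: y_res ~ theta0 * t_res + theta1 * (t_res * s)
    D = np.column_stack([t_res, t_res * s])
    theta, *_ = np.linalg.lstsq(D, y_res, rcond=None)
    return theta


def simulate(n=20000, seed=1):
    """Hypothetical randomized experiment with a segment-specific effect."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, n).astype(float)  # assumed binary user segment
    x = rng.normal(size=(n, 3))              # other user covariates
    X = np.column_stack([x, s])
    t = rng.integers(0, 2, n).astype(float)  # 50/50 random assignment
    y = (x @ np.array([1.0, -0.5, 0.25]) + 0.3 * s
         + (0.10 + 0.40 * s) * t + rng.normal(size=n))
    return X, t, y, s


if __name__ == "__main__":
    X, t, y, s = simulate()
    base, extra = dml_hte(X, t, y, s)
    print(f"base effect ~ {base:.3f}, extra effect for segment ~ {extra:.3f}")
```

In this toy setup a nonzero second coefficient indicates that the treatment effect differs between the two segments. A real analysis would use more flexible nuisance learners, many user characteristics, and a formal significance test with multiple-comparison control.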
Award ID(s):
2039915
PAR ID:
10482734
Publisher / Repository:
Association for Computing Machinery (ACM)
Journal Name:
ACM Transactions on Information Systems
ISSN:
1046-8188
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Precise estimation of treatment effects is crucial for accurately evaluating an intervention. While deep learning models have exhibited promising performance in learning counterfactual representations for treatment effect estimation (TEE), a major limitation of most of these models is that they treat the entire population as a homogeneous group, overlooking the diversity of treatment effects across potential subgroups with different characteristics. This limitation restricts the ability to precisely estimate treatment effects and provide targeted treatment recommendations. In this paper, we propose a novel treatment effect estimation model, named SubgroupTE, which incorporates subgroup identification in TEE. SubgroupTE identifies heterogeneous subgroups with different responses and more precisely estimates treatment effects by considering subgroup-specific treatment effects in the estimation process. In addition, we introduce an expectation–maximization (EM)-based training process that iteratively optimizes estimation and subgrouping networks to improve both estimation and subgroup identification. Comprehensive experiments on synthetic and semi-synthetic datasets demonstrate the strong performance of SubgroupTE compared to existing treatment effect estimation and subgrouping models. Additionally, a real-world study demonstrates the capabilities of SubgroupTE in enhancing targeted treatment recommendations for patients with opioid use disorder (OUD) by incorporating subgroup identification with treatment effect estimation.
  2. Network embedding has demonstrated effective empirical performance for various network mining tasks such as node classification, link prediction, clustering, and anomaly detection. However, most of these algorithms focus on the single-view network scenario. From a real-world perspective, one individual node can have different connectivity patterns in different networks. For example, one user can have different relationships on Twitter, Facebook, and LinkedIn due to varying user behaviors on different platforms. In this case, jointly considering the structural information from multiple platforms (i.e., multiple views) can potentially lead to more comprehensive node representations and reduce the noise and bias of a single view. In this paper, we propose a view-adversarial framework to generate comprehensive and robust multi-view network representations named VANE, which is based on two adversarial games. The first adversarial game enhances the comprehensiveness of the node representation by discriminating the view information, which is obtained from the subgraph induced by the neighbors of that node. The second adversarial game improves the robustness of the node representation by challenging it with fake node representations from the generative adversarial net. We conduct extensive experiments on downstream tasks with real-world multi-view networks, which show that our proposed VANE framework significantly outperforms other baseline methods.
  3. Virtual reality (VR) platforms enable a wide range of applications but pose unique privacy risks. In particular, VR devices are equipped with a rich set of sensors that collect personal and sensitive information (e.g., body motion, eye gaze, hand joints, and facial expression). The data from these newly available sensors can be used to uniquely identify a user, even in the absence of explicit identifiers. In this paper, we seek to understand the extent to which a user can be identified based solely on VR sensor data, within and across real-world apps from diverse genres. We consider adversaries with capabilities that range from observing APIs available within a single app (app adversary) to observing all or selected sensor measurements across multiple apps on the VR device (device adversary). To that end, we introduce BehaVR, a framework for collecting and analyzing data from all sensor groups collected by multiple apps running on a VR device. We use BehaVR to collect data from real users who interact with 20 popular real-world apps. We use that data to build machine learning models for user identification within and across apps, with features extracted from available sensor data. We show that these models can identify users with an accuracy of up to 100%, and we reveal the most important features and sensor groups, depending on the functionality of the app and the adversary. To the best of our knowledge, BehaVR is the first to analyze user identification in VR comprehensively, i.e., considering all sensor measurements available on consumer VR devices, collected by multiple real-world, as opposed to custom-made, apps.
  4. Current approaches to A/B testing in networks focus on limiting interference, the concern that treatment effects can “spill over” from treatment nodes to control nodes and lead to biased causal effect estimation. In the presence of interference, two main types of causal effects are direct treatment effects and total treatment effects. In this paper, we propose two network experiment designs that increase the accuracy of direct and total effect estimations in network experiments through minimizing interference between treatment and control units. For direct treatment effect estimation, we present a framework that takes advantage of independent sets and assigns treatment and control only to a set of non-adjacent nodes in a graph, in order to disentangle peer effects from direct treatment effect estimation. For total treatment effect estimation, our framework combines weighted graph clustering and cluster matching approaches to jointly minimize interference and selection bias. Through a series of simulated experiments on synthetic and real-world network datasets, we show that our designs significantly increase the accuracy of direct and total treatment effect estimation in network experiments. 
  5. This work contributes to just and pro-social treatment of digital pieceworkers ("crowd collaborators") by reforming the handling of crowd-sourced labor in academic venues. With the rise in automation, crowd collaborators' treatment requires special consideration, as the system often dehumanizes crowd collaborators as components of the “crowd” [41]. Building off efforts to (proxy-)unionize crowd workers and facilitate employment protections on digital piecework platforms, we focus on employers: academic requesters sourcing machine learning (ML) training data. We propose a cover sheet to accompany submission of work that engages crowd collaborators for sourcing (or labeling) ML training data. The guidelines are based on existing calls from worker organizations (e.g., Dynamo [28]); professional data workers in an alternative digital piecework organization; and lived experience as requesters and workers on digital piecework platforms. We seek feedback on the cover sheet from the ACM community.