Statewise: Human Identity Investigator for the United States

Handzlik, Dakota; Jones, Jason Jeffrey; Skiena, Steven S

doi:10.1609/icwsm.v19i1.35949

Self-reported biographical strings on social media profiles provide a powerful tool for studying personal identity. We present Statewise, a dataset based on 50 million unique Twitter user profiles, observed over a 12-year period and identified as being in the United States. Users in this dataset can be accurately partitioned into 52 states/territories at each observation, allowing queries into state-specific language choices over time. We report on the major design decisions underlying Statewise, including the methodology behind the location detection system and measurements of user/state transitions over time. We demonstrate the power of Statewise for studying the relative prevalence of different token groups, showing clear and consistent regional differences in language usage. We analyze emoji usage by comparing inclusion rates against external state-level statistics, finding that emoji inclusion is significantly correlated with state unemployment and poverty rates. Finally, we use Gini coefficients as a measure of token usage inequality across all observed territories and demonstrate a clear stratification based on token content.
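The Gini-based measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gini` and the input layout (one non-negative usage rate per state/territory for a single token) are assumptions made for illustration, and the paper may normalize or weight its rates differently.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means the token is used at the same rate everywhere;
    values near 1.0 mean usage is concentrated in a few regions.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here: no usage at all counts as "equal"
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted ascending and ranks i run from 1 to n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical per-region inclusion rates for two tokens (made-up numbers).
evenly_used = [0.02, 0.02, 0.02, 0.02]
regional = [0.0, 0.0, 0.0, 0.05]

print(gini(evenly_used))
print(gini(regional))
```

Under this formulation, a token that appears only in one of n regions reaches the maximum value (n - 1) / n, so coefficients computed over 52 regions are bounded slightly below 1.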

Award ID(s): 2208664

PAR ID: 10657606

Author(s) / Creator(s): Handzlik, Dakota; Jones, Jason Jeffrey; Skiena, Steven S

Publisher / Repository: Proceedings of the International AAAI Conference on Web and Social Media

Date Published: 2025-06-07

Journal Name: Proceedings of the International AAAI Conference on Web and Social Media

Volume: 19

ISSN: 2162-3449

Page Range / eLocation ID: 2465 to 2476

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Journal Article:
https://doi.org/10.1609/icwsm.v19i1.35949
