Demographic information such as gender, age, ethnicity, level of education, disabilities, employment, and socio-economic status is important in social science, survey research, and marketing. However, this information is difficult to obtain from users because they are reluctant to participate and response rates are low. Through automated demographic prediction from smartphone sensor data, researchers can obtain this valuable information in a nonintrusive and cost-effective manner. We approach the problem of demographic prediction, namely classification of gender, age group, and job type, through a graphical-feature-based framework. The framework represents information collected from sensor networks as graphs, extracts useful and relevant graphical features, and predicts demographic information. We evaluated our approach on the Nokia Mobile Phone dataset for the three classification tasks: gender, age group, and job type. Our approach produced results comparable with most state-of-the-art methods, with the additional advantage of general applicability to sensor networks: it does not rely on sophisticated, application-specific feature generation techniques, background knowledge, or special techniques to address class imbalance.
Neural User Factor Adaptation for Text Classification: Learning to Generalize Across Author Demographics
Language use varies across different demographic factors, such as gender, age, and geographic location. However, most existing document classification methods ignore demographic variability. In this study, we examine empirically how text data can vary across four demographic factors: gender, age, country, and region. We propose a multitask neural model to account for demographic variations via adversarial training. In experiments on four English-language social media datasets, we find that classification performance improves when adapting for user factors.
- Award ID(s): 1657338
- PAR ID: 10112015
- Date Published:
- Journal Name: Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
- Page Range / eLocation ID: 136 to 146
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Data imbalance is a fundamental challenge in applying language models to biomedical applications, particularly in ICD code prediction tasks where label and demographic distributions are uneven. While state-of-the-art language models have been increasingly adopted in biomedical tasks, few studies have systematically examined how data imbalance affects model performance and fairness across demographic groups. This study fills the gap by statistically probing the relationship between data imbalance and model performance in ICD code prediction. We analyze imbalances in standard benchmark data across gender, age, ethnicity, and social determinants of health using state-of-the-art biomedical language models. By deploying diverse performance metrics and statistical analyses, we explore the influence of data imbalance on performance variations and demographic fairness. Our study shows that data imbalance significantly impacts model performance and fairness, but feature similarity to the majority class may be a more critical factor. We believe this study provides valuable insights for developing more equitable and robust language models in healthcare applications.
-
Facial attribute classification algorithms frequently manifest demographic biases, performing differently across gender and racial groups. Existing bias mitigation techniques are mostly in-processing techniques, i.e., implemented during the classifier’s training stage, that often lack generalizability, require demographically annotated training sets, and exhibit a trade-off between fairness and classification accuracy. In this paper, we propose a technique to mitigate bias at test time, i.e., during the deployment stage, by harnessing prediction uncertainty and human–machine partnership. To this end, we propose to use the small percentage of test samples identified as outliers with high prediction uncertainty. These uncertain test-time samples are labeled by human analysts for decision rendering and subsequently used to retrain the deep neural network in a continual learning framework. With minimal human involvement and through iterative refinement of the network with human guidance at test time, we seek to enhance the accuracy as well as the fairness of already deployed facial attribute classification algorithms. Extensive experiments are conducted on gender and smile attribute classification tasks using four publicly available datasets, with gender and race as the protected attributes. The results consistently demonstrate accuracy improvements of up to 2% and 5% for the gender and smile attribute classification tasks, respectively, using our proposed approaches. Further, demographic bias was significantly reduced, outperforming state-of-the-art (SOTA) bias mitigation and baseline techniques by up to 55% for both classification tasks.
-
With growing expectations to use AI-based educational technology (AI-EdTech) to improve students’ learning outcomes and enrich teaching practice, teachers play a central role in the adoption of AI-EdTech in classrooms. Teachers’ willingness to accept vulnerability by integrating technology into their everyday teaching practice, that is, their trust in AI-EdTech, will depend on how much they expect it to benefit them versus how many concerns it raises for them. In this study, we surveyed 508 K-12 teachers across six countries on four continents to understand which teacher characteristics shape teachers’ trust in AI-EdTech, and its proposed antecedents, perceived benefits and concerns about AI-EdTech. We examined a comprehensive set of characteristics including demographic and professional characteristics (age, gender, subject, years of experience, etc.), cultural values (Hofstede’s cultural dimensions), geographic locations (Brazil, Israel, Japan, Norway, Sweden, USA), and psychological factors (self-efficacy and understanding). Using multiple regression analysis, we found that teachers with higher AI-EdTech self-efficacy and AI understanding perceive more benefits, fewer concerns, and report more trust in AI-EdTech. We also found geographic and cultural differences in teachers’ trust in AI-EdTech, but no demographic differences emerged based on their age, gender, or level of education. The findings provide a comprehensive, international account of factors associated with teachers’ trust in AI-EdTech. Efforts to raise teachers’ understanding of, and trust in AI-EdTech, while considering their cultural values are encouraged to support its adoption in K-12 education.
-
This paper has two aims. The first is to consider non-structural variables (language attitudes and use) as valid in the field of dialect and linguistic geography in an inner Himalayan valley of Nepal, where four languages have traditionally co-existed asymmetrically and show different degrees of vitality vs. endangerment. The second is to apply a modified notion of spatiality as it aligns with speaker attitudes and practices amidst recent and ongoing socio-economic and population changes. We demonstrate that variation in self-reported attitudes and practices across languages in this region can be explained as much by adjusted spatial factors (labeled ‘social space’) as by traditional social factors (e.g. gender, age, formal education, occupation). As such, our study contributes to a discourse on the role and potential of spatiality in sociolinguistic analyses of smaller language communities.

