Title: A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users
Authors:
; ;
Award ID(s):
1934925 1934494
Publication Date:
NSF-PAR ID:
10280458
Journal Name:
Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - DeLTA
Page Range or eLocation-ID:
48 to 58
Sponsoring Org:
National Science Foundation
More Like this
  1. The #MeToo movement is one of several calls for social change to gain traction on Twitter in the past decade. The movement went viral after prominent individuals shared their experiences, and much of its power continues to be derived from experience sharing. Because millions of #MeToo tweets are published every year, it is important to accurately identify experience-related tweets. Therefore, we propose a new learning task and compare the effectiveness of classic machine learning models, ensemble models, and a neural network model that incorporates a pre-trained language model to reduce the impact of feature sparsity. We find that even with limited training data, the neural network model outperforms the classic and ensemble classifiers. Finally, we analyze the experience-related conversation in English during the first year of the #MeToo movement and determine that experience tweets represent a sizable minority of the conversation and are moderately correlated to major events.
  2. In this project, competition-winning deep neural networks with pretrained weights are used for image-based gender recognition and age estimation. Transfer learning is explored using both VGG19 and VGGFace pretrained models by testing the effects of changes in various design schemes and training parameters in order to improve prediction accuracy. Training techniques such as input standardization, data augmentation, and label distribution age encoding are compared. Finally, a hierarchy of deep CNNs is tested that first classifies subjects by gender, and then uses separate male and female age models to predict age. A gender recognition accuracy of 98.7% and an MAE of 4.1 years are achieved. This paper shows that, with proper training techniques, good results can be obtained by retasking existing convolutional filters towards a new purpose.
  3. Deep learning (DL) is growing in popularity for many data analytics applications, including among enterprises. Large business-critical datasets in such settings typically reside in RDBMSs or other data systems. The DB community has long aimed to bring machine learning (ML) to DBMS-resident data. Given past lessons from in-DBMS ML and recent advances in scalable DL systems, DBMS and cloud vendors are increasingly interested in adding more DL support for DB-resident data. Recently, a new parallel DL model selection execution approach called Model Hopper Parallelism (MOP) was proposed. In this paper, we characterize the particular suitability of MOP for DL on data systems, but to bring MOP-based DL to DB-resident data, we show that there is no single "best" approach, and an interesting tradeoff space of approaches exists. We explain four canonical approaches and build prototypes upon Greenplum Database, compare them analytically on multiple criteria (e.g., runtime efficiency and ease of governance) and compare them empirically with large-scale DL workloads. Our experiments and analyses show that it is non-trivial to meet all practical desiderata well and there is a Pareto frontier; for instance, some approaches are 3x-6x faster but fare worse on governance and portability. Our results and insights can help DBMS and cloud vendors design better DL support for DB users. All of our source code, data, and other artifacts are available at https://github.com/makemebitter/cerebro-ds.