Data preprocessing is an integral step prior to analyzing data in psychological science, with implications for how findings may guide policy. This article reports how psychological researchers address data preprocessing and quality concerns, focusing on aberrant responses and missing data in self-report measures. We sampled 240 articles published from 2012 to 2018 in four journals: Psychological Science, Journal of Personality and Social Psychology, Developmental Psychology, and Journal of Abnormal Psychology. Nearly half of the studies did not report any missing-data treatment (111/240; 46.25%); when treatment was reported, the most common approach was listwise deletion (71/240; 29.6%). Studies that removed data due to missingness discarded, on average, 12% of the sample. Likewise, most studies did not report on aberrant responses (194/240; 80%), but those that did classified 4% of the sample as suspect. Most studies are either insufficiently transparent about their data preprocessing steps or may be relying on suboptimal procedures. We close with recommendations to improve transparency and data quality.
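To make the two reporting gaps concrete, here is a minimal sketch of what transparent listwise deletion and a simple aberrant-response screen might look like in pandas. This is not from the article: the toy data and the straightlining check are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical self-report data: 100 respondents, ten 5-point Likert items
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 6, size=(100, 10)).astype(float),
                  columns=[f"item_{i}" for i in range(10)])
df.iloc[rng.choice(100, size=15, replace=False), 0] = np.nan  # inject missingness

# Listwise deletion: drop every respondent with any missing item,
# then report how much of the sample that removes
complete = df.dropna()
print(f"listwise deletion removed {1 - len(complete) / len(df):.1%} of the sample")

# One simple aberrant-response screen: flag straightliners
# (respondents who give the same answer to every item)
suspect = complete.std(axis=1) == 0
print(f"flagged {suspect.mean():.1%} of retained respondents as suspect")
```

Reporting the removed fraction, as the first print does, is exactly the kind of transparency the article finds missing in most studies.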
- Award ID(s): 1853166
- PAR ID: 10401828
- Publisher / Repository: SAGE Publications
- Journal Name: Policy Insights from the Behavioral and Brain Sciences
- Volume: 10
- Issue: 1
- ISSN: 2372-7322
- Size(s): p. 11-17
- Sponsoring Org: National Science Foundation
More Like this
-
Two primary goals of psychological science should be to understand which aspects of human psychology are universal and how context and culture produce variability. This requires that we account for culture and context both in how we write our papers and in the types of populations we sample. However, most research published in our leading journals has relied on sampling WEIRD (Western, educated, industrialized, rich, and democratic) populations. One might expect that our scholarly work and editorial choices would by now reflect the knowledge that Western populations may not be representative of humans generally with respect to any given psychological phenomenon. However, as we show here, almost all research published by one of our leading journals, Psychological Science, relies on Western samples and uses these data in an unreflective way to make inferences about humans in general. To take us forward, we offer a set of concrete proposals for authors, journal editors, and reviewers that may lead to a psychological science that is more representative of the human condition.
-
A prior study found that mailing prepaid incentives with $5 cash visible from outside the envelope increased the response rate to a mail survey by 4 percentage points compared to cash that was not externally visible. This “visible cash effect” suggests opportunities to improve survey response at little or no cost, but many unknowns remain. Among them: Does the visible cash effect generalize to different survey modes, respondent burdens, and cash amounts? Does it differ between fresh samples and reinterview samples? Does it affect data quality or survey costs? This article examines these questions using two linked studies in which incentive visibility was randomized in a large probability sample for the American National Election Studies. The first study used $10 incentives with invitations to a long web questionnaire (median 71 minutes, n = 17,849). Visible cash increased response rates in a fresh sample for both screener and extended-interview response (by 6.7 and 4.8 percentage points, respectively). It did not increase the response rate in a reinterview sample where the baseline reinterview response rate was very high (72 percent). The second study used $5 incentives with invitations to a mail-back paper questionnaire (n = 8,000). Visible cash increased the response rate in a sample of prior nonrespondents by 4.0 percentage points (from 31.5 to 35.5) but did not increase the response rate in a reinterview sample where the baseline reinterview rate was very high (84 percent). Across the two studies, several aspects of data quality were investigated, including speeding, non-differentiation, item nonresponse, nonserious responses, noncredible responses, sample composition, and predictive validity; no adverse effects of visible cash were detected, and sample composition improved marginally. Effects on survey costs were either negligible or resulted in net savings. Accumulated evidence now shows that visible cash can increase incentives' effectiveness in several circumstances.
-
Widespread failures of replication and generalization are, ironically, a scientific triumph, in that they confirm the fundamental metascientific theory that underlies our field. Generalizable and replicable findings require testing large numbers of subjects from a wide range of demographics with a large, randomly sampled stimulus set, and using a variety of experimental parameters. Because few studies accomplish any of this, metascientists predict that findings will frequently fail to replicate or generalize. We argue that to be more robust and replicable, developmental psychology needs a mechanism for collecting data at greater scale and from more diverse populations. Luckily, this mechanism already exists: citizen science, in which large numbers of uncompensated volunteers provide data. While best known for its contributions to astronomy and ecology, citizen science has also produced major findings in neuroscience and psychology, and increasingly in developmental psychology. We provide examples, address practical challenges, discuss limitations, and compare citizen science to other methods of obtaining large datasets. Ultimately, we argue that the range of studies where it makes sense *not* to use citizen science is steadily dwindling.
-
Raw datasets collected for fake news detection usually contain noise such as missing values. To improve the performance of machine-learning-based fake news detection, this paper proposes a novel data preprocessing method for handling missing values. Specifically, missing values are handled by data imputation for both categorical and numerical features: categorical features are imputed with the most frequent value in the column, and numerical features with the column mean. In addition, TF-IDF vectorization is applied during feature extraction to filter out irrelevant features. Experimental results show that a Multi-Layer Perceptron (MLP) classifier combined with the proposed preprocessing outperforms the baselines, improving prediction accuracy by more than 15%.
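As a rough illustration of the described pipeline, the following sketch uses scikit-learn's SimpleImputer and TfidfVectorizer; the toy records and column names are hypothetical, and the downstream MLP classifier is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy records with the two feature types the paper distinguishes
df = pd.DataFrame({
    "source": ["blog", np.nan, "news", "blog"],    # categorical, with a gap
    "share_count": [120.0, np.nan, 45.0, 310.0],   # numerical, with a gap
    "text": ["breaking exclusive story", "shocking claim goes viral",
             "agency confirms official report", "you will not believe this"],
})

# Categorical features: impute with the most frequent value in the column
cat = SimpleImputer(strategy="most_frequent").fit_transform(df[["source"]])

# Numerical features: impute with the column mean
num = SimpleImputer(strategy="mean").fit_transform(df[["share_count"]])

# Text: TF-IDF vectorization down-weights terms that carry little signal
tfidf = TfidfVectorizer(stop_words="english").fit_transform(df["text"])
print(cat.ravel(), num.ravel(), tfidf.shape)
```

Note that in practice the imputers and vectorizer should be fit on training data only and reused at test time to avoid leakage; whether the paper does this is not stated here.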
-
The collection and use of demographic data in the psychological sciences have the potential to aid in transforming inequities brought about by unjust social conditions toward equity. However, many current methods surrounding demographic data do not achieve this goal. Some methods function to reduce, but not eliminate, inequities, whereas others may perpetuate harmful stereotypes, invalidate minoritized identities, and exclude key groups from research participation or access to disseminated findings. In this article, we aim to (a) review key ethical and social-justice dilemmas inherent to working with demographic data in psychological research and (b) introduce a framework grounded in ethics and social justice to help psychologists and researchers in social-science fields make thoughtful decisions about the collection and use of demographic data. Although demographic-data methods vary across subdisciplines and research topics, we assert that these core issues, and their solutions, are relevant to all research within the psychological sciences, including basic and applied research. Our overarching aim is to support key stakeholders in psychology (e.g., researchers, funding agencies, journal editors, peer reviewers) in making ethical and socially just decisions about the collection, analysis, reporting, interpretation, and dissemination of demographic data.