Title: Researcher reasoning meets computational capacity: Machine learning for social science
Computational power and big data have created new opportunities to explore and understand the social world. A special synergy is possible when social scientists combine human attention to certain aspects of the problem with the power of algorithms to automate other aspects of the problem. We review selected exemplary applications where machine learning amplifies researcher coding, summarizes complex data, relaxes statistical assumptions, and targets researcher attention to further social science research. We aim to reduce perceived barriers to machine learning by summarizing several fundamental building blocks and their grounding in classical statistics. We present a few guiding principles and promising approaches where we see particular potential for machine learning to transform social science inquiry. We conclude that machine learning tools are increasingly accessible, worthy of attention, and ready to yield new discoveries for social research.
Award ID(s):
2104607
PAR ID:
10377063
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Social science research
Volume:
108
ISSN:
0049-089X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There has been an explosion of growth in using AI, data science, and machine learning in all aspects of our daily life. There is a global competition among governments, industry, and academic institutions to lead research and development in this area. This paper discusses a novel multidisciplinary graduate education and research program at our institution to help develop a trained workforce to meet the demands required to understand and develop AI, data science and machine learning technologies. The program brings together faculty and students in engineering, computer science, and social science to build a traineeship program where cohort teams study fundamental and applied data science research, using compact modules across courses to personalize instruction and prepare each trainee with skills tailored to their prior experience and future career goals. 
  2. Upon encountering this publication, one might ask the obvious question, "Why do we need another deep learning and natural language processing book?" Several excellent ones have been published, covering both theoretical and practical aspects of deep learning and its application to language processing. However, from our experience teaching courses on natural language processing, we argue that, despite their excellent quality, most of these books do not target their most likely readers. The intended reader of this book is one who is skilled in a domain other than machine learning and natural language processing and whose work relies, at least partially, on the automated analysis of large amounts of data, especially textual data. Such experts may include social scientists, political scientists, biomedical scientists, and even computer scientists and computational linguists with limited exposure to machine learning. Existing deep learning and natural language processing books generally fall into two camps. The first camp focuses on the theoretical foundations of deep learning. This is certainly useful to the aforementioned readers, as one should understand the theoretical aspects of a tool before using it. However, these books tend to assume the typical background of a machine learning researcher and, as a consequence, we have often seen students who do not have this background rapidly get lost in such material. To mitigate this issue, the second type of book that exists today focuses on the machine learning practitioner; that is, on how to use deep learning software, with minimal attention paid to the theoretical aspects. We argue that focusing on practical aspects is similarly necessary but not sufficient. Considering that deep learning frameworks and libraries have gotten fairly complex, the chance of misusing them due to theoretical misunderstandings is high. We have commonly seen this problem in our courses, too.
This book, therefore, aims to bridge the theoretical and practical aspects of deep learning for natural language processing. We cover the necessary theoretical background and assume minimal machine learning background from the reader. Our aim is that anyone who took introductory linear algebra and calculus courses will be able to follow the theoretical material. To address practical aspects, this book includes pseudo code for the simpler algorithms discussed and actual Python code for the more complicated architectures. The code should be understandable by anyone who has taken a Python programming course. After reading this book, we expect that the reader will have the necessary foundation to immediately begin building real-world, practical natural language processing systems, and to expand their knowledge by reading research publications on these topics. https://doi.org/10.1017/9781009026222 
  3. Cyberbullying has become increasingly prevalent, particularly on social media. There has also been a steady rise in cyberbullying research across a range of disciplines. Much of the empirical work from computer science has focused on developing machine learning models for cyberbullying detection. Whereas machine learning cyberbullying detection models can be improved by drawing on psychological theories and perspectives, there is also tremendous potential for machine learning models to contribute to a better understanding of psychological aspects of cyberbullying. In this paper, we discuss how machine learning models can yield novel insights about the nature and defining characteristics of cyberbullying and how machine learning approaches can be applied to help clinicians, families, and communities reduce cyberbullying. Specifically, we discuss the potential for machine learning models to shed light on the repetitive nature of cyberbullying, the imbalance of power between cyberbullies and their victims, and causal mechanisms that give rise to cyberbullying. We orient our discussion on emerging and future research directions, as well as the practical implications of machine learning cyberbullying detection models. 
  4. Computational thinking is crucial for STEM researchers and practitioners, as it involves more than just developing skills; it is a way of thinking that enables effective problem-solving. STEM disciplines approach different problems and as such employ computational thinking uniquely, so students cannot rely solely on computer science to develop computational thinking. Less attention has been given to social aspects of computation, such as collaborating and communicating with and about computation even though social aspects are essential to problem solving. We utilized computational literacy as an alternative framework that explicitly includes social elements as a primary pillar. We conducted 15 interviews with STEM researchers to identify and organize the social aspects that play a role in their research. We organized goals by motivation (persuasion and productivity) and representation (visual and non-visual) to contextualize the use of communication in computation. We found that researchers use computation to explain research results, navigate decision making, establish rigor, ensure reproducibility, facilitate lab stability, and promote research efficiency. We used Activity Theory to describe the tools, norms, and communities associated with these goals to offer a more detailed framework for the social pillar of computational literacy within the context of science and engineering. Examples from each discipline within STEM are described. This social computational literacy framework can act as a guide for STEM educators and practitioners alike to use and teach social aspects of computation.
  5. Data science pipelines inform and influence many daily decisions, from what we buy to who we work for and even where we live. When designed incorrectly, these pipelines can easily propagate social inequity and harm. Traditional solutions are technical in nature; e.g., mitigating biased algorithms. In this vision paper, we introduce a novel lens for promoting responsible data science using theories of behavior change that emphasize not only technical solutions but also the behavioral responsibility of practitioners. By integrating behavior change theories from cognitive psychology with data science workflow knowledge and ethics guidelines, we present a new perspective on responsible data science. We present example data science interventions in machine learning and visual data analysis, contextualized in behavior change theories that could be implemented to interrupt and redirect potentially suboptimal or negligent practices while reinforcing ethically conscious behaviors. We conclude with a call to action to our community to explore this new research area of behavior change interventions for responsible data science. 