We outline a process for using large coder teams (10+ coders) to code large-scale qualitative data sets. The process reflects a decade of experience recruiting and managing large teams of novice and trainee coders across 18 projects, each engaging a coding team of 12 to 54 coders. We identify four challenges unique to large coder teams that are not presently discussed in the methodological literature: (1) recruiting and training coders, (2) providing coder compensation and incentives, (3) maintaining data quality and ensuring coding reliability at scale, and (4) building team cohesion and morale. For each challenge, we provide associated guidance. We conclude with a discussion of the advantages and disadvantages of large coder teams for qualitative research and offer notes of caution for anyone considering hiring and/or managing large coder teams for research, whether in academia, government and non-profit sectors, or industry.
An Inquiry into the Use of Intercoder Reliability Measures in Qualitative Research
In this theory paper, we set out to consider, as a matter of methodological interest, the use of quantitative measures of inter-coder reliability (e.g., percentage agreement, correlation, Cohen's Kappa) as necessary and/or sufficient correlates for quality within qualitative research in engineering education. It is well known that the phrase qualitative research represents a diverse body of scholarship conducted across a range of epistemological viewpoints and methodologies. Given this diversity, we concur with those who state that it is ill-advised to propose recipes or stipulate requirements for achieving qualitative research validity and reliability. Yet, as qualitative researchers ourselves, we repeatedly find the need to communicate the validity and reliability—or quality—of our work to different stakeholders, including funding agencies and the public. One method for demonstrating quality, which is increasingly used in qualitative research in engineering education, is the practice of reporting quantitative measures of agreement between two or more people who code the same qualitative dataset. In this theory paper, we address this common practice in two ways. First, we identify instances in which inter-coder reliability measures may not be appropriate or adequate for establishing quality in qualitative research. We query research that treats the numerical measure itself as the goal of qualitative analysis, rather than the depth and texture of the interpretations that are revealed. Second, we identify complexities or methodological questions that may arise during the process of establishing inter-coder reliability, which are not often addressed in empirical publications. To achieve these purposes, we will ground our work in a review of qualitative articles, published in the Journal of Engineering Education, that have employed inter-rater or inter-coder reliability as evidence of research validity. In our review, we will examine the disparate measures and scores (from 40% agreement to 97% agreement) used as evidence of quality, as well as the theoretical perspectives within which these measures have been employed. Then, using our own comparative case study research as an example, we will highlight the questions and challenges we faced as we worked to meet rigorous standards of evidence in our qualitative coding analysis. We will explain the processes we undertook and the challenges we faced as we assigned codes to a large qualitative data set approached from a post-positivist perspective. We will situate these coding processes within the larger methodological literature and, in light of contrasting literature, describe the principled decisions we made while coding our own data. We will use this review of qualitative research and our own qualitative research experiences to elucidate inconsistencies and unarticulated issues related to evidence for qualitative validity, as a means to generate further discussion regarding quality in qualitative coding processes.
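As a point of reference (not drawn from the paper), the sketch below illustrates why raw percentage agreement and Cohen's Kappa can tell different stories about the same coding: Kappa discounts the agreement two coders would reach by chance given their marginal code frequencies. The coder labels and data are hypothetical.

```python
# Minimal sketch contrasting percentage agreement with Cohen's Kappa
# for two coders; the codes and data below are hypothetical.
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which the two coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the level expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both coders independently assign the
    # same code, given each coder's marginal code frequencies.
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(a) | set(b))
    return (p_observed - p_chance) / (1 - p_chance)

# When one code dominates the data, high raw agreement can mask modest reliability.
coder_1 = ["theme_a"] * 8 + ["theme_b"] * 2
coder_2 = ["theme_a"] * 9 + ["theme_b"] * 1
print(percent_agreement(coder_1, coder_2))        # 0.9
print(round(cohens_kappa(coder_1, coder_2), 2))   # 0.62
```

The gap between the two numbers is the point: 90% raw agreement corresponds here to a Kappa of about 0.62, one reason a single reported score is hard to interpret without context.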
- Award ID(s): 1664228
- PAR ID: 10089476
- Date Published:
- Journal Name: ASEE Annual Conference proceedings
- ISSN: 1524-4644
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Although the paradigm wars between quantitative and qualitative research methods and their associated epistemologies may have settled down in recent years within the mathematics education research community, quantitative methods and randomized controlled trials remain the gold standard at the policy-making level (USDOE, 2008). Although diverse methods are valued in the mathematics education community, if mathematics educators hope to influence policy to cultivate more equitable education systems, then we must engage in rigorous quantitative research. However, quantitative research is limited in what it can measure by the quantitative tools that exist. In mathematics education, the development of quantitative tools, and the study of their associated validity and reliability evidence, has lagged behind the important constructs that rich qualitative research has uncovered. The purpose of this study is to describe quantitative instruments related to mathematics teacher behavior and affect in order to better understand what currently exists in the field, what validity and reliability evidence has been published for such instruments, and what constructs each measures. Four research questions guide the study:
1. How many and what types of instruments of mathematics teacher behavior and affect exist?
2. What types of validity and reliability evidence are published for these instruments?
3. What constructs do these instruments measure?
4. To what extent have issues of equity been the focus of the instruments found?
-
Problem. Extant measures of students' cybersecurity self-efficacy lack sufficient evidence of validity based on internal structure. Such evidence is needed to enhance confidence in conclusions drawn from the use of self-efficacy measures in the cybersecurity domain. Research Question. To address this problem, we sought to answer the following research question: What is the underlying factor structure of a new self-efficacy for information security measure? Method. We used exploratory factor analysis (EFA) to determine the number of factors underlying a new measure of students' self-efficacy for conducting information security tasks. The measure was created to align with the five elements of the information security section of the K-12 Cybersecurity Education framework. Participants were 190 undergraduate students recruited from computer science courses across the U.S. Findings. Results from the EFA indicated that a four-factor solution best fit the data while maximizing the interpretability of the factors. The internal reliability of the measure was quite strong (α = .99). Implications. The psychometric quality of the measure was demonstrated, and thus evidence of validity based on internal structure has been established. Future work will conduct a confirmatory factor analysis (CFA) and assess measurement invariance across subgroups of interest (e.g., over- vs. under-represented race/ethnicity groups, gender).
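As a point of reference (not drawn from the study), coefficient alpha of the kind reported above can be computed directly from a respondents-by-items score matrix. A minimal sketch follows; the response data are hypothetical.

```python
# Minimal sketch of coefficient (Cronbach's) alpha; responses are hypothetical.
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array with rows = respondents, columns = items."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # per-item sample variance
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses from 5 students on 4 self-efficacy items (1-5 scale).
responses = np.array([
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [2, 2, 1, 2],
    [5, 5, 4, 5],
])
print(round(cronbach_alpha(responses), 2))  # about 0.98 for this toy data
```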
-
Miller, B.; Martin, C. (Eds.) Quantitative measures in mathematics education have informed policies and practices for over a century. Thus, it is critical that such measures have sufficient validity evidence to improve mathematics experiences for students. This article provides a systematic review of the validity evidence for measures used in elementary mathematics education. The review includes measures that focus on elementary students as the unit of analysis and attends to validity as defined by current conceptions of measurement. Findings suggest that only one in ten measures in mathematics education includes rigorous evidence to support its intended uses. Recommendations are made to support mathematics education researchers in continuing to take steps to improve validity evidence in the design and use of quantitative measures.
-
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, etc.) of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the validity and reliability of data collected with these instruments, following guidance from the Standards for Educational and Psychological Testing. This study examines how quantitative instruments have been used in the journal Chemistry Education Research and Practice (CERP) from 2010–2021. Of the 369 unique researcher-developed instruments used during this time frame, the majority appeared in only a single publication (89.7%) and were rarely reused. Cognitive topics were the most common target of the instruments (56.6%). Validity and/or reliability evidence was provided in 64.4% of instances where instruments were used in CERP publications. The most frequently reported evidence was single-administration reliability (e.g., coefficient alpha), appearing in 47.9% of instances. Only 37.2% of instances reported evidence of both validity and reliability. These results indicate that, as a field, opportunities exist to increase the amount of validity and reliability evidence available for data collected with instruments, and that reusing instruments may be one way to increase this type of data-quality evidence for instruments used by the chemistry education community.