Title: Optimizing Measurement Reliability in Within-Person Research: Guidelines for Research Design and R Shiny Web Application Tools
Within-person research has become increasingly popular in psychology for its unique theoretical and methodological advantages in studying dynamic psychological processes. Despite these advances, many organizational researchers still face serious challenges in fully appreciating and appropriately implementing within-person research; more specifically, in correctly conceptualizing and computing within-person measurement reliability, and in navigating key within-person research design factors (e.g., number of measurement occasions, T; number of participants, N; and scale length, I) to optimize within-person reliability. Through a comprehensive Monte Carlo simulation with 3,240 data conditions, we offer a practical guideline table showing the expected within-person reliability as a function of these key design factors. In addition, we provide three easy-to-use, free R Shiny web applications with which within-person researchers can conveniently (a) compute the expected within-person reliability for their customized research design, (b) compute the observed validity implied by the expected reliability and a hypothesized within-person validity, and (c) compute the observed within-person (as well as between-person) reliability from collected within-person research datasets. We hope these much-needed, evidence-based guidelines and practical tools will help enhance within-person research in organizational studies.
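The logic behind app (b), deriving an observed validity from reliability and a hypothesized true validity, follows the classical correction-for-attenuation formula from test theory. A minimal Python sketch (the function name and example values are hypothetical illustrations, not taken from the paper or its apps):

```python
import math

def observed_validity(true_validity: float,
                      reliability_x: float,
                      reliability_y: float = 1.0) -> float:
    """Classical attenuation formula: the observed correlation between two
    measures equals their true-score correlation scaled by the square roots
    of the two measures' reliabilities."""
    return true_validity * math.sqrt(reliability_x * reliability_y)

# Hypothetical example: a hypothesized within-person validity of .50,
# measured with reliability .70 (predictor) and .80 (criterion):
r_obs = observed_validity(0.50, 0.70, 0.80)  # about .37
```

Lower reliability thus directly shrinks the correlation a study can observe, which is why the paper's guidelines on design factors (T, N, I) matter for validity as well as for reliability.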
Award ID(s): 1704309
PAR ID: 10343634
Author(s) / Creator(s): ; ; ;
Date Published:
Journal Name: Journal of Business and Psychology
ISSN: 0889-3268
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. In this theory paper, we set out to consider, as a matter of methodological interest, the use of quantitative measures of inter-coder reliability (e.g., percentage agreement, correlation, Cohen's Kappa, etc.) as necessary and/or sufficient correlates for quality within qualitative research in engineering education. It is well known that the phrase qualitative research represents a diverse body of scholarship conducted across a range of epistemological viewpoints and methodologies. Given this diversity, we concur with those who state that it is ill-advised to propose recipes or stipulate requirements for achieving qualitative research validity and reliability. Yet, as qualitative researchers ourselves, we repeatedly find the need to communicate the validity and reliability (or quality) of our work to different stakeholders, including funding agencies and the public. One method for demonstrating quality, which is increasingly used in qualitative research in engineering education, is the practice of reporting quantitative measures of agreement between two or more people who code the same qualitative dataset. In this theory paper, we address this common practice in two ways. First, we identify instances in which inter-coder reliability measures may not be appropriate or adequate for establishing quality in qualitative research. We query research that suggests that the numerical measure itself is the goal of qualitative analysis, rather than the depth and texture of the interpretations that are revealed. Second, we identify complexities or methodological questions that may arise during the process of establishing inter-coder reliability, which are not often addressed in empirical publications. To achieve these purposes, we ground our work in a review of qualitative articles, published in the Journal of Engineering Education, that have employed inter-rater or inter-coder reliability as evidence of research validity.
In our review, we will examine the disparate measures and scores (from 40% agreement to 97% agreement) used as evidence of quality, as well as the theoretical perspectives within which these measures have been employed. Then, using our own comparative case study research as an example, we will highlight the questions and the challenges that we faced as we worked to meet rigorous standards of evidence in our qualitative coding analysis. We will explain the processes we undertook and the challenges we faced as we assigned codes to a large qualitative data set approached from a post-positivist perspective. We will situate these coding processes within the larger methodological literature and, in light of contrasting literature, we will describe the principled decisions we made while coding our own data. We will use this review of qualitative research and our own qualitative research experiences to elucidate inconsistencies and unarticulated issues related to evidence for qualitative validity, as a means to generate further discussion regarding quality in qualitative coding processes.
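Percentage agreement and Cohen's kappa, two of the inter-coder measures named above, can be computed from two coders' label sequences with nothing beyond the standard library. A minimal sketch (the toy codes and coder labels are entirely hypothetical):

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of units to which both coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the chance agreement
    expected from each coder's marginal code frequencies."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_exp = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical codes assigned by two coders to six interview excerpts:
coder1 = ["theme_A", "theme_A", "theme_B", "theme_B", "theme_C", "theme_A"]
coder2 = ["theme_A", "theme_B", "theme_B", "theme_B", "theme_C", "theme_A"]
```

On these toy data the two coders agree on five of six units (about 83%), yet kappa is noticeably lower (about .74), illustrating the paper's point that the choice of measure, not only the score, shapes the quality claim.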
  2.
    The purpose of this study is to re-examine the validity evidence of the engineering design self-efficacy (EDSE) scale scores by Carberry et al. (2010) within the context of secondary education. Self-efficacy refers to individuals' belief in their capabilities to perform a domain-specific task. In engineering education, significant efforts have been made to understand the role of self-efficacy for students, considering its positive impact on student outcomes such as performance and persistence. These studies have investigated and developed measures for different domains of engineering self-efficacy (e.g., general academic, domain-general, and task-specific self-efficacy). The EDSE scale is a frequently cited measure that examines task-specific self-efficacy within the domain of engineering design. The original scale contains nine items that are intended to represent the engineering design process. Initial score validity evidence was collected using a sample of 202 respondents with varying degrees of engineering experience, including undergraduate/graduate students and faculty members. This scale has been used primarily by researchers and practitioners with engineering undergraduate students to assess changes in their engineering design self-efficacy as a result of active learning interventions, such as project-based learning. Our work has begun experimenting with the scale in a secondary education context, in conjunction with the increased introduction of engineering in K-12 education. Yet, there is still a need to examine the score validity and reliability of this scale in non-undergraduate populations such as secondary school student populations. This study fills this important gap by testing the construct validity of the original nine items of the EDSE scale, supporting proper use of the scale for researchers and practitioners.
This study was conducted as part of a larger e4usa project investigating the development and implementation of a yearlong project-based engineering design course for secondary school students. Evidence of construct validity and reliability was collected using a multi-step process. First, a survey that includes the EDSE scale was administered to the participating students at nine associated secondary schools across the US at the beginning of Spring 2020. Analysis of the collected data is in progress and includes Exploratory Factor Analysis (EFA) on the 137 responses. Evidence of score reliability will be obtained by computing the internal consistency of each resulting factor. The resulting factor structure and items will be analyzed by comparing them with the original EDSE scale. The full paper will provide details about the psychometric evaluation of the EDSE scale. The findings from this paper will provide insights on the future usage of the EDSE scale in the context of secondary engineering education.
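The internal-consistency step described above is typically Cronbach's alpha, which can be computed directly from its definition. A minimal Python sketch with fabricated scores (not data from this study):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.
    items: one inner list per item, each holding all respondents' scores
    on that item (so items[i][j] is respondent j's score on item i)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sums
    item_variance_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))

# Fabricated example: 3 items, 5 respondents, 1-5 Likert responses
scores = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(scores)  # roughly .89 for these made-up data
```

In an EFA workflow like the one described, alpha would be computed separately over the items loading on each retained factor rather than over the whole scale.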
  3. The Survey of Physics Reasoning on Uncertainty Concepts in Experiments (SPRUCE) was designed to measure students' proficiency with measurement uncertainty concepts and practices across ten different assessment objectives, to help facilitate the improvement of laboratory instruction focused on this important topic. To ensure the reliability and validity of this assessment, we conducted a comprehensive statistical analysis using classical test theory. This analysis includes an evaluation of the test as a whole, as well as an in-depth examination of individual items and assessment objectives. We make use of a previously reported scoring scheme that pairs items with assessment objectives, creating a new unit for statistical analysis referred to as a "couplet." The findings from our analysis provide evidence for the reliability and validity of SPRUCE as an assessment tool for undergraduate physics labs. This increases both instructors' and researchers' confidence in using SPRUCE for measuring students' proficiency with measurement uncertainty concepts and practices, to ultimately improve laboratory instruction. Additionally, our results using couplets and assessment objectives demonstrate how these can be used with traditional classical test theory analysis. Published by the American Physical Society, 2024
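The per-item portion of a classical test theory analysis usually reports item difficulty (proportion correct) and discrimination (item-rest correlation). A generic sketch with fabricated 0/1 scores, not SPRUCE data (SPRUCE's couplet-based scoring is more involved than this):

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def item_statistics(responses):
    """responses: one list of 0/1 item scores per student.
    Returns (difficulty, discrimination) per item, where difficulty is the
    proportion correct and discrimination is the correlation of the item
    score with the rest-of-test score (total minus that item)."""
    stats = []
    for i in range(len(responses[0])):
        item = [r[i] for r in responses]
        rest = [sum(r) - r[i] for r in responses]
        stats.append((mean(item), pearson(item, rest)))
    return stats

# Fabricated example: 6 students, 3 items
responses = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 0], [1, 0, 0], [0, 0, 0],
]
stats = item_statistics(responses)
```

Using the rest score (rather than the full total, which contains the item itself) avoids inflating the discrimination estimate, a standard precaution in item analysis.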
  4. Abstract Identity development frameworks provide insight into why and to what extent individuals engage in STEM‐related activities. While studies of "STEM identity" often build off previously validated disciplinary and/or science identity frameworks, quantitative analyses of constructs that specifically measure STEM identity and its antecedents are scarce, making it challenging for researchers or practitioners to apply a measurement‐based perspective of participation in opportunities billed as "STEM." In this study, we tested two expanded structural equation models of STEM identity development, building off extensions of science and disciplinary‐identity frameworks, that incorporated additional factors relevant to identity development: gender, ethnicity, home science support, parental education, and experiencing science talk in the home. Our models test theorized relationships between interest, sense of recognition, performance‐competence, and identity in the context of STEM with undergraduate students (N = 522) enrolled in introductory STEM courses at a Hispanic-Serving Institution. Our findings support our measurement of STEM identity and its indicators, providing researchers with a predictive model associated with academic intentions across disciplinary domains in STEM. Further, our expanded model (i.e., Model I+) indicates significant contributions of participant gender, which has a larger indirect effect on STEM identity (β = 0.50) than the direct effect of STEM interest (β = 0.29), and of home support in relation to performance‐competence in academic contexts. Our model also posits a significant contribution of family science talk to sense of recognition as a STEM person, expanding our understanding of the important role of the home environment while challenging prior conceptions of science capital and habitus.
We situate our results within a broader discussion regarding the validity of “STEM identity” as a concept and construct in the context of communities often marginalized in STEM fields. 
  5. Problem. Extant measures of students' cybersecurity self-efficacy lack sufficient evidence of validity based on internal structure. Such evidence of validity is needed to enhance confidence in conclusions drawn from use of self-efficacy measures in the cybersecurity domain. Research Question. To address this identified problem, we sought to answer our research question: What is the underlying factor structure of a new self-efficacy for Information Security measure? Method. We leveraged exploratory factor analysis (EFA) to determine the number of factors underlying a new measure of student self-efficacy for conducting information security tasks. This measure was created to align with the five elements of the information security section of the K-12 Cybersecurity Education framework. Participants were 190 undergraduate students recruited from computer science courses across the U.S. Findings. Results from the EFA indicated that a four-factor solution best fit the data while maximizing interpretability of the factors. The internal reliability of the measure was quite strong (α = .99). Implications. The psychometric quality of this measure was demonstrated, and thus evidence of validity based on internal structure has been established. Future work will conduct a confirmatory factor analysis (CFA) and assess measurement invariance across subgroups of interest (e.g., over- vs. under-represented race/ethnicity groups, gender).
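Deciding how many factors underlie a measure, the core question in an EFA like the one above, is often started with the Kaiser criterion: retain factors whose correlation-matrix eigenvalues exceed 1. A minimal sketch on simulated two-factor data (entirely hypothetical; studies such as this one also weigh fit and interpretability, not eigenvalues alone):

```python
import numpy as np

def kaiser_factor_count(data):
    """Count eigenvalues of the item correlation matrix that exceed 1
    (the Kaiser criterion), a common first heuristic for the number of
    factors to retain in an exploratory factor analysis."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    return int(np.sum(eigvals > 1.0))

# Simulated responses with two distinct item clusters (hypothetical data):
rng = np.random.default_rng(42)
f1 = rng.normal(size=(200, 1))  # latent factor 1
f2 = rng.normal(size=(200, 1))  # latent factor 2
items = np.hstack([
    f1 + 0.3 * rng.normal(size=(200, 3)),  # three items loading on factor 1
    f2 + 0.3 * rng.normal(size=(200, 3)),  # three items loading on factor 2
])
n_factors = kaiser_factor_count(items)
```

Here the two-cluster structure yields exactly two eigenvalues above 1; in practice the heuristic is cross-checked against scree plots, parallel analysis, and the interpretability criterion the abstract mentions.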