Title: Evaluation capacity building in theory and practice: Revisiting models from practitioner perspectives
Abstract: This article brings practitioner experience to bear on existing models of evaluation capacity building (ECB), examining those models through the lens of our own ECB practice. We reflect on how our ECB practices align with or challenge these models, and on how the insights from those reflections can inform future ECB research and frameworks for evaluating ECB initiatives. As is often the case when theory collides with practice, current models may not always reflect and serve the work at hand, so the value and usefulness, as well as the accuracy and relevance, of existing models are worth investigating. With this in mind, we offer input to inform future models of ECB that are more inclusive of and relevant to the broad spectrum of current ECB practice and, subsequently, its evaluation.
Award ID(s):
1841985
PAR ID:
10552622
Author(s) / Creator(s):
Editor(s):
Mason, Sarah; Montrosse-Moorhead, Bianca
Publisher / Repository:
Wiley Periodicals, LLC and the American Evaluation Association
Date Published:
Journal Name:
New Directions for Evaluation
Volume:
2024
Issue:
183
ISSN:
1097-6736
Page Range / eLocation ID:
29 to 41
Subject(s) / Keyword(s):
Evaluation Capacity Building; Evaluation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: In this article, we reflect on a decade of using the Kirkpatrick four-level model to evaluate a multifaceted evaluation capacity building (ECB) initiative. Traditionally used to assess business training efforts, the Kirkpatrick model encourages evidence to be gathered at four levels: reaction, learning, behavior, and results. We adapted these levels to fit the context and information needs of the EvaluATE project, an ECB initiative funded by the National Science Foundation. As members of the external evaluation and project teams, we describe throughout the article how each level was modified and translated into evaluation questions. Our adapted Kirkpatrick levels are implementation and reach, satisfaction, learning, application, and impact. Using these adapted levels to ground our evaluation challenged us to integrate multiple data sources to tell a comprehensive story that served the information needs of the project team and the funder. Overall, we found the Kirkpatrick model to be practical, accessible, and flexible, allowing us to capture the multidimensional aspects of the ECB initiative. However, there are opportunities to enhance the utility of the Kirkpatrick framework by integrating other evaluation approaches, such as culturally responsive and equitable evaluation and principles-focused evaluation.
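To make the shape of this adaptation concrete, here is a minimal sketch pairing the five adapted levels with illustrative evaluation questions; the questions are invented for illustration and are not the EvaluATE project's actual evaluation questions.

```python
# Hypothetical sketch: pairing the adapted Kirkpatrick levels with
# illustrative evaluation questions. The questions below are invented
# for illustration and are not taken from the EvaluATE evaluation plan.
ADAPTED_KIRKPATRICK_LEVELS = {
    "implementation and reach": "Were activities delivered as planned, and whom did they reach?",
    "satisfaction": "How satisfied were participants with the activities?",
    "learning": "What evaluation knowledge and skills did participants gain?",
    "application": "How did participants apply what they learned in their own work?",
    "impact": "What longer-term changes resulted for participants and their organizations?",
}

for level, question in ADAPTED_KIRKPATRICK_LEVELS.items():
    print(f"{level}: {question}")
```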
  2. Researchers across various fields have investigated how users experience moderation through different perspectives and methodologies. There is now a pressing need to synthesize and extract key insights from prior literature, both to formulate a systematic understanding of what constitutes a moderation experience and to explore how such understanding could further inform moderation-related research and practices. To address this need, we conducted a systematic literature review (SLR) by analyzing 42 empirical studies related to moderation experiences and published between January 2016 and March 2022. We describe these studies' characteristics and how they characterize users' moderation experiences. We further identify five primary perspectives that prior researchers use to conceptualize moderation experiences. These findings point to an expansive scope of research interest in understanding moderation experiences and in treating moderated users as an important stakeholder group for reflecting on current moderation design; they also reveal the dominance of a punitive, solutionist logic in moderation and carry ample implications for future moderation research, design, and practice.
  3. Cross-Document Event Coreference (CDEC) annotation is challenging and difficult to scale, so existing datasets are small and lack diversity. We introduce a new approach that leverages large language models (LLMs) to decontextualize event mentions, simplifying the document-level annotation task to sentence pairs with enriched context and enabling the creation of the Richer EventCorefBank (RECB), a denser and more expressive dataset annotated at a faster pace. Decontextualization has been shown to improve annotation speed without compromising quality and to enhance model performance. Our baseline experiment indicates that systems trained on RECB achieve comparable results on the EventCorefBank (ECB+) test set, showing the high quality of our dataset and its generalizability to other CDEC datasets. In addition, our evaluation shows that strong baseline models still struggle with RECB compared to other CDEC datasets, suggesting that the richness and diversity of RECB present significant challenges to current CDEC systems.
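As a rough illustration of the simplification described above, the following sketch models a single sentence-pair annotation item after decontextualization; the class name, fields, and example data are hypothetical, and the LLM rewriting step is assumed to have already produced the self-contained sentences.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecontextualizedPair:
    """One sentence-pair annotation item (hypothetical structure).

    Each sentence is assumed to have been rewritten (e.g., by an LLM)
    so that pronouns and vague references are resolved, letting the
    coreference judgment be made without the full source documents.
    """
    mention_a: str                     # event trigger in sentence A
    sentence_a: str                    # decontextualized sentence containing mention_a
    mention_b: str                     # event trigger in sentence B
    sentence_b: str                    # decontextualized sentence containing mention_b
    coreferent: Optional[bool] = None  # annotator's label

# Invented example: both sentences stand on their own, so the pair
# can be labeled without reading either full source document.
item = DecontextualizedPair(
    mention_a="acquired",
    sentence_a="Acme Corp acquired Globex on 3 May 2021.",
    mention_b="purchase",
    sentence_b="The purchase of Globex by Acme Corp closed in May 2021.",
)
item.coreferent = True  # both mentions refer to the same acquisition event
```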
  4. We categorize meta-learning evaluation into two settings: in-distribution (ID), in which the train and test tasks are sampled iid from the same underlying task distribution, and out-of-distribution (OOD), in which they are not. While most meta-learning theory and some few-shot learning (FSL) applications follow the ID setting, we identify that most existing few-shot classification benchmarks instead reflect OOD evaluation, as they use disjoint sets of train (base) and test (novel) classes for task generation. This discrepancy is problematic because, as we show on numerous benchmarks, meta-learning methods that perform better on existing OOD datasets may perform significantly worse in the ID setting. In addition, in the OOD setting, even though current FSL benchmarks seem fitting, our study highlights concerns in (1) reliably performing model selection for a given meta-learning method and (2) consistently comparing the performance of different methods. To address these concerns, we provide suggestions on how to construct FSL benchmarks that allow for ID evaluation as well as more reliable OOD evaluation. Our work aims to inform the meta-learning community about the importance and distinction of ID vs. OOD evaluation, as well as the subtleties of OOD evaluation with current benchmarks.
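To make the ID/OOD distinction concrete, here is a minimal sketch, under assumed names, of how task classes might be sampled in each setting: ID draws train and test tasks from one shared class pool, while OOD draws them from disjoint base and novel class sets, as in standard few-shot benchmarks.

```python
import random

def sample_task(class_pool, n_way=5):
    """Sample one N-way few-shot task: a set of classes drawn from a pool."""
    return random.sample(class_pool, n_way)

all_classes = list(range(100))

# ID setting: train and test tasks are sampled iid from the SAME class pool.
id_train_task = sample_task(all_classes)
id_test_task = sample_task(all_classes)

# OOD setting (typical few-shot benchmark): train (base) and test (novel)
# classes are disjoint, so test tasks come from a shifted distribution.
base_classes, novel_classes = all_classes[:64], all_classes[64:]
ood_train_task = sample_task(base_classes)
ood_test_task = sample_task(novel_classes)
```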
  5. In this review, we analyze the current state of the art of computational models for in-vehicle User Interface (UI) design. Driver distraction, often caused by drivers performing Non-Driving-Related Tasks (NDRTs), is a major contributor to vehicle crashes. Accordingly, in-vehicle UIs must be evaluated for their distraction potential. Computational models are a promising solution to automate this evaluation, but they are not yet widely used, limiting their real-world impact. We systematically review the existing literature on computational models for NDRTs to analyze why current approaches have not yet found their way into practice. We found that while many models are intended for UI evaluation, they focus on small and isolated phenomena that are disconnected from the needs of automotive UI designers. In addition, very few approaches make predictions detailed enough to inform current design processes. Our analysis of the state of the art, the identified research gaps, and the formulated research potentials can guide researchers and practitioners toward computational models that improve the automotive UI design process.