Title: Crowd-Sourced Reliability of an Assessment of Lower Facial Aging Using a Validated Visual Scale
Background: Reliable and valid assessments of the visual endpoints of aesthetic surgery procedures are needed. Currently, most assessments are based on the opinions of patients and their plastic surgeons. The objective of this research was to analyze the reliability of crowdworkers assessing de-identified photographs using a validated scale that depicts lower facial aging.
Methods: Twenty de-identified photographs of the nasolabial region, exhibiting various degrees of facial aging, were obtained. Independent crowds of 100 crowdworkers were tasked with assessing the degree of aging using a photograph numeric scale. Independent groups of crowdworkers were surveyed at 4 different times (weekday daytime, weekday nighttime, weekend daytime, weekend nighttime), once a week for 2 weeks.
Results: Crowds assessing midface region photographs had an overall correlation of R = 0.979 (weekday daytime R = 0.991; weekday nighttime R = 0.985; weekend daytime R = 0.997; weekend nighttime R = 0.985). A Bland–Altman test of test-retest agreement showed a normal distribution of assessments across the various times tested, with the differences for the majority of photographs falling within 1 SD of the average difference in ratings.
Conclusions: Crowd assessments of facial aging in de-identified photographs displayed very strong concordance with one another, regardless of time of day or week. This shows promise for obtaining reliable assessments of pre- and postoperative results of aesthetic surgery procedures. More work must be done to quantify the reliability of assessments for other pretreatment states and the corresponding results following treatment.
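The test-retest analysis pairs an inter-session correlation with a Bland–Altman comparison. The sketch below illustrates those two computations on simulated crowd ratings; the data, scale range, and noise level are illustrative assumptions, not values from the study.

```python
import numpy as np

# Hypothetical mean crowd ratings (1-5 aging scale) for the same 20 photos,
# collected in two separate sessions; values are simulated, not from the study.
rng = np.random.default_rng(0)
true_scores = np.linspace(1.0, 5.0, 20)
session_a = true_scores + rng.normal(0, 0.15, 20)
session_b = true_scores + rng.normal(0, 0.15, 20)

# Test-retest correlation between the two sessions.
r = np.corrcoef(session_a, session_b)[0, 1]

# Bland-Altman statistics: mean difference (bias) and 95% limits of agreement.
diff = session_a - session_b
bias = diff.mean()
sd = diff.std(ddof=1)
loa = (bias - 1.96 * sd, bias + 1.96 * sd)

# Proportion of photos whose difference falls within 1 SD of the bias,
# mirroring the agreement criterion described in the abstract.
within_1sd = np.mean(np.abs(diff - bias) <= sd)
print(f"r = {r:.3f}, bias = {bias:.3f}, "
      f"LoA = ({loa[0]:.3f}, {loa[1]:.3f}), within 1 SD: {within_1sd:.0%}")
```

With low rating noise relative to the spread of the scale, the session-to-session correlation comes out high, which is the pattern the study reports.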
Award ID(s):
1847610
PAR ID:
10441763
Author(s) / Creator(s):
Date Published:
Journal Name:
Plastic and Reconstructive Surgery - Global Open
Volume:
9
Issue:
1
ISSN:
2169-7574
Page Range / eLocation ID:
e3315
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Low-quality assessments, such as poorly constructed multiple-choice questions (MCQs) and short-answer questions (SAQs), are detrimental to student learning, making it essential to evaluate assessment quality accurately. Existing evaluation methods are often difficult to scale and fail to consider an assessment's pedagogical value within its course materials. Online crowds offer a scalable and cost-effective source of intelligence but often lack the necessary domain expertise. Large Language Models (LLMs) offer automation and scalability but may likewise lack precise domain knowledge. To explore these trade-offs, we compare the effectiveness and reliability of crowdsourced and LLM-based methods for assessing the quality of 30 MCQs and SAQs across six educational domains using two standardized evaluation rubrics. We analyzed the performance of 84 crowdworkers from Amazon Mechanical Turk and Prolific, comparing their quality evaluations to those made by three LLMs: GPT-4, Gemini 1.5 Pro, and Claude 3 Opus. We found that crowdworkers on Prolific consistently delivered the highest-quality assessments, and GPT-4 emerged as the most effective LLM for this task. Our study reveals that while traditional crowdsourced methods often yield more accurate assessments, LLMs can match this accuracy on specific evaluative criteria. These results provide evidence for a hybrid approach to educational content evaluation, integrating the scalability of AI with the nuanced judgment of humans. We offer feasibility considerations for using AI to supplement human judgment in educational assessment.
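As an illustration of the kind of rubric-score comparison this abstract describes, one might score each rating source against an expert's judgments. Everything below (the scores, the raters, and the two metrics) is a hypothetical sketch, not data or methodology from the paper.

```python
import numpy as np

# Hypothetical rubric scores (1-5) given by an expert, a crowd, and an LLM
# to the same ten assessment items; values are illustrative only.
expert = np.array([5, 4, 2, 3, 5, 1, 4, 2, 3, 4], dtype=float)
crowd = np.array([5, 4, 3, 3, 4, 1, 4, 2, 3, 5], dtype=float)
llm = np.array([4, 4, 2, 2, 5, 2, 3, 2, 4, 4], dtype=float)

def accuracy_vs_expert(ratings: np.ndarray, gold: np.ndarray) -> dict:
    """Pearson correlation and mean absolute error against the gold scores."""
    r = np.corrcoef(ratings, gold)[0, 1]
    mae = np.abs(ratings - gold).mean()
    return {"pearson_r": r, "mae": mae}

crowd_stats = accuracy_vs_expert(crowd, expert)
llm_stats = accuracy_vs_expert(llm, expert)
print("crowd:", crowd_stats)
print("llm:  ", llm_stats)
```

A lower MAE and a higher correlation against the expert would indicate the more reliable rating source, which is the comparison the study makes at larger scale.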
  2. Crowdsourcing has been used to produce impactful, large-scale datasets for Machine Learning and Artificial Intelligence (AI), such as ImageNet and SuperGLUE. Since the rise of crowdsourcing in the early 2000s, the AI community has studied its computational, system-design, and data-centric aspects from various angles. We welcome studies on developing and enhancing crowdworker-centric tools that offer task matching, requester assessment, and instruction validation, among other topics. We are also interested in exploring methods that integrate crowdworkers to improve the recognition and performance of machine learning models. Thus, we invite studies that focus on deploying active learning techniques, methods for jointly learning from noisy data and from crowds, novel approaches to crowd-computer interaction, repetitive task automation, and role separation between humans and machines. Moreover, we invite work on designing and applying such techniques in various domains, including e-commerce and medicine.
  3. With inputs from human crowds, usually gathered over the Internet, crowdsourcing has become a promising methodology in AI and machine learning for applications that require human knowledge. Researchers have recently proposed interval-valued labels (IVLs), instead of the commonly used binary-valued ones, to manage uncertainty in crowdsourcing [19]. However, that work did not take crowd workers' reliability into consideration. Crowd workers come from varied social and economic backgrounds and have different levels of reliability. To further improve the overall quality of crowdsourcing with IVLs, this work presents practical methods that quantitatively estimate a worker's reliability, in terms of correctness, confidence, stability, and predictability, from his or her IVLs. Using these reliability estimates, the paper proposes two learning schemes: weighted interval majority voting (WIMV) and weighted preferred matching probability (WPMP). Computational experiments on sample datasets demonstrate that both WIMV and WPMP significantly improve learning results, achieving higher precision, accuracy, and F1-score than other methods.
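To make the voting idea concrete, here is a loose sketch of reliability-weighted aggregation of interval-valued labels. The interval representation, weighting, and threshold below are illustrative assumptions; the WIMV and WPMP schemes in the paper are defined differently and in more detail.

```python
from dataclasses import dataclass

@dataclass
class IntervalLabel:
    lower: float  # worker's lower bound on P(label = 1)
    upper: float  # worker's upper bound on P(label = 1)

def weighted_interval_vote(labels, weights):
    """Aggregate interval-valued labels by a reliability-weighted average
    of the interval endpoints, then threshold the midpoint at 0.5.

    A loose sketch of the weighted-voting idea only; not the paper's WIMV.
    """
    total = sum(weights)
    lo = sum(w * lab.lower for w, lab in zip(weights, labels)) / total
    hi = sum(w * lab.upper for w, lab in zip(weights, labels)) / total
    midpoint = (lo + hi) / 2
    return (1 if midpoint > 0.5 else 0), (lo, hi)

# Three workers with different (hypothetical) reliabilities label one item.
labels = [IntervalLabel(0.6, 0.9), IntervalLabel(0.4, 0.7), IntervalLabel(0.1, 0.3)]
weights = [0.9, 0.5, 0.2]  # higher weight = more reliable worker
decision, interval = weighted_interval_vote(labels, weights)
print(decision, interval)
```

Because the most reliable worker leans strongly toward the positive label, the weighted aggregate decides 1 even though the least reliable worker disagrees; that is the intuition behind weighting votes by estimated reliability.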
  4. AI-based educational technologies may be most welcome in classrooms when they align with teachers' goals, preferences, and instructional practices. Teachers, however, have scarce time to make such customizations themselves. How might the crowd be leveraged to help time-strapped teachers? Crowdsourcing pipelines have traditionally focused on content generation; it remains an open question how a pipeline might be designed so the crowd can succeed at a revision/customization task. In this paper, we explore an initial version of a teacher-guided crowdsourcing pipeline designed to improve the adaptive math hints of an AI-based tutoring system so they fit teachers' preferences, while requiring minimal expert guidance. In two experiments involving 144 math teachers and 481 crowdworkers, we found that such an expert-guided revision pipeline could save experts' time and produce better crowd-revised hints (in terms of teacher satisfaction) than two comparison conditions. The revised hints, however, did not improve on the existing hints in the AI tutor, which were carefully written but still had room for improvement and customization. Further analysis revealed that the main challenge for crowdworkers may lie in understanding teachers' brief written comments and implementing them as effective edits without introducing new problems. We also found that teachers preferred their own revisions over other sources of hints and exhibited varying preferences among hints. Overall, the results confirm a clear need to customize hints to individual teachers' preferences. They also highlight the need for more elaborate scaffolds so the crowd has specific knowledge of teachers' requirements for hints. This study represents a first exploration in the literature of how to support crowds, with minimal expert guidance, in revising and customizing instructional materials.
  5. We present a novel egocentric visual localization algorithm for an indoor navigation system, called PERCEPT-V, designed to assist blind and visually impaired users traveling independently in unfamiliar indoor spaces. By integrating a background-extraction module based on Robust Principal Component Analysis (RPCA) into the localization algorithm, we improve the resilience of camera localization to the presence of crowds in the observed scene. Experiments on video datasets containing various levels of crowd activity show that the proposed algorithm markedly increases the reliability of localization.
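The RPCA idea behind such a background-extraction module can be sketched as a low-rank-plus-sparse decomposition: stacked video frames form a matrix whose static background is low-rank and whose moving crowd is sparse. The solver, parameters, and demo data below are illustrative assumptions, not PERCEPT-V's actual implementation.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_decompose(M, tau=2.0, lam=0.5, n_iter=30):
    """Split M into low-rank L (static background) and sparse S (moving
    foreground) by alternating proximal steps on the objective
    ||M - L - S||_F^2 / 2 + tau * ||L||_* + lam * ||S||_1.

    A simplified sketch of the RPCA idea; not PERCEPT-V's solver.
    """
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S, tau)     # low-rank background update
        S = shrink(M - L, lam)  # sparse foreground update
    return L, S

# Demo: 50 "pixels" x 20 frames of a static background, with a few
# bright foreground pixels (a passer-by, say) appearing in frame 5.
b = np.linspace(0.5, 1.5, 50)
M = np.tile(b[:, None], (1, 20))
M[0:3, 5] += 2.0
L, S = rpca_decompose(M)
```

On this toy input, L recovers the repeated background column and S isolates the transient bright pixels, which is the separation that makes localization robust to crowds in the scene.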