

Title: A Novel Algorithm for Aggregating Crowdsourced Data
Similar content has tremendous utility in classroom and online learning environments. For example, similar content can be used to combat cheating, track students’ learning over time, and model students’ latent knowledge. These different use cases for similar content all rely on different notions of similarity, which makes it difficult to determine content similarity. Crowdsourcing is an effective way to identify similar content in a variety of situations by providing workers with guidelines on how to identify similar content for a particular use case. However, crowdsourced opinions are rarely homogeneous and therefore must be aggregated into what is most likely the truth. This work presents the Dynamically Weighted Majority Vote method, a novel algorithm that combines aggregating workers’ crowdsourced opinions with estimating the reliability of each worker. This method was compared to the traditional majority vote method in both a simulation study and an empirical study, in which opinions on seventh-grade mathematics problems’ similarity were crowdsourced from middle school math teachers and college students. In both the simulation and the empirical study, the Dynamically Weighted Majority Vote method outperformed the traditional majority vote method, suggesting that it should be used instead of majority vote in future crowdsourcing endeavors.
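The abstract does not spell out the algorithm's update rules, so the following is only a minimal sketch of how a dynamically weighted majority vote could work: worker weights start equal, a consensus is formed by a weighted vote, and each worker's weight is then re-estimated from their agreement with that consensus. The function name, data layout, and iteration count are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def dynamically_weighted_majority_vote(labels, n_iters=10):
    """Illustrative sketch: aggregate binary crowd labels while re-estimating
    worker reliability from agreement with the current consensus.

    labels: dict mapping (worker_id, item_id) -> 0/1 opinion.
    Returns (consensus, weights).
    """
    workers = {w for w, _ in labels}
    items = {i for _, i in labels}
    weights = {w: 1.0 for w in workers}          # start with equal reliability

    consensus = {}
    for _ in range(n_iters):
        # 1) Weighted vote per item using the current worker weights.
        for i in items:
            tally = defaultdict(float)
            for (w, j), y in labels.items():
                if j == i:
                    tally[y] += weights[w]
            consensus[i] = max(tally, key=tally.get)

        # 2) Re-estimate each worker's weight as their agreement rate
        #    with the current consensus.
        for w in workers:
            votes = [(j, y) for (u, j), y in labels.items() if u == w]
            agree = sum(consensus[j] == y for j, y in votes)
            weights[w] = agree / len(votes) if votes else 1.0

    return consensus, weights

# Toy example: three workers rate two item pairs as similar (1) or not (0);
# the worker who disagrees with the consensus ends up with a low weight.
labels = {("w1", "a"): 1, ("w2", "a"): 1, ("w3", "a"): 0,
          ("w1", "b"): 0, ("w2", "b"): 0, ("w3", "b"): 1}
print(dynamically_weighted_majority_vote(labels))
```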
Award ID(s):
1931419
NSF-PAR ID:
10292994
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Educational Data Mining
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Similar content has tremendous utility in classroom and online learning environments. For example, similar content can be used to combat cheating, track students’ learning over time, and model students’ latent knowledge. These different use cases for similar content all rely on different notions of similarity, which makes it difficult to determine content similarity. Crowdsourcing is an effective way to identify similar content in a variety of situations by providing workers with guidelines on how to identify similar content for a particular use case. However, crowdsourced opinions are rarely homogeneous and therefore must be aggregated into what is most likely the truth. This work presents the Dynamically Weighted Majority Vote method, a novel algorithm that combines aggregating workers’ crowdsourced opinions with estimating the reliability of each worker. This method was compared to the traditional majority vote method in both a simulation study and an empirical study, in which opinions on seventh-grade mathematics problems’ similarity were crowdsourced from middle school math teachers and college students. In both the simulation and the empirical study, the Dynamically Weighted Majority Vote method outperformed the traditional majority vote method, suggesting that it should be used instead of majority vote in future crowdsourcing endeavors.
  2. Crowdsourcing has rapidly become a computing paradigm in machine learning and artificial intelligence. In crowdsourcing, multiple labels are collected from crowd-workers on an instance, usually through the Internet. These labels are then aggregated into a single label intended to match the ground truth of the instance. Due to its open nature, human workers in crowdsourcing usually come with various levels of knowledge and socio-economic backgrounds. Effectively handling such human factors has been a focus in the study and applications of crowdsourcing. For example, Bi et al. studied the impacts of worker dedication, expertise, judgment, and task difficulty (Bi et al. 2014). Qiu et al. offered methods for selecting workers based on behavior prediction (Qiu et al. 2016). Barbosa and Chen suggested rehumanizing crowdsourcing to deal with human biases (Barbosa 2019). Checco et al. studied adversarial attacks on crowdsourcing for quality control (Checco et al. 2020). There are many more related works available in the literature. In contrast to commonly used binary-valued labels, interval-valued labels (IVLs) have been introduced very recently (Hu et al. 2021). Applying statistical and probabilistic properties of interval-valued datasets, Spurling et al. quantitatively defined a worker's reliability in four measures: correctness, confidence, stability, and predictability (Spurling et al. 2021). Calculating these measures, except correctness, does not require the ground truth of each instance but only the worker’s IVLs. Applying these quantified reliability measures, people have significantly improved the overall quality of crowdsourcing (Spurling et al. 2022). However, in real-world applications, the reliability of a worker may vary from time to time rather than remain constant, so it is necessary to monitor a worker’s reliability dynamically. Because a worker j labels instances sequentially, we treat j’s IVLs as an interval-valued time series in our approach. Assuming j’s reliability relies only on the IVLs within a time window, we calculate j’s reliability measures from the IVLs in the current time window. Moving the time window forward with our proposed practical strategies, we can monitor j’s reliability dynamically. Furthermore, the four reliability measures derived from IVLs are time-varying too. With regression analysis, we can separate each reliability measure into an explainable trend and possible errors. To validate our approaches, we use four real-world benchmark datasets in our computational experiments. Here are the main findings. The reliability-weighted interval majority voting (WIMV) and weighted preferred matching probability (WPMP) schemes consistently outperform the base schemes in terms of much higher accuracy, precision, recall, and F1-score. Note: the base schemes are majority voting (MV), interval majority voting (IMV), and preferred matching probability (PMP). Through monitoring worker reliability, our computational experiments have successfully identified possible attackers. By removing the identified attackers, we have ensured the quality of the results. We have also examined the impact of window size selection. Monitoring worker reliability dynamically is necessary, and our computational results demonstrate the potential success of our approaches. This work is partially supported by the US National Science Foundation through the grant award NSF/OIA-1946391.
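This abstract centers on monitoring a worker's reliability over a moving time window of their labels. The snippet below is a much-simplified sketch of that idea: it scores only the labels inside a sliding window, so recent behaviour dominates the estimate. It uses plain correctness against a known ground truth as a stand-in for the paper's interval-valued reliability measures (correctness, confidence, stability, predictability), and the function name, window size, and data layout are assumptions for illustration.

```python
from collections import deque

def sliding_window_reliability(worker_labels, ground_truth, window=50):
    """Illustrative sketch: track a worker's reliability over time by scoring
    only the labels inside a moving window, so recent behaviour dominates.

    worker_labels: list of (instance_id, label) in the order the worker gave them.
    ground_truth:  dict instance_id -> true label (a stand-in for the paper's
                   interval-valued measures, most of which need no ground truth).
    Returns a list of (step, reliability_estimate) pairs.
    """
    recent = deque(maxlen=window)   # correctness flags for the current window only
    history = []
    for step, (instance, label) in enumerate(worker_labels, start=1):
        recent.append(1 if label == ground_truth[instance] else 0)
        history.append((step, sum(recent) / len(recent)))
    return history

# A worker who starts answering arbitrarily partway through (e.g., an attacker)
# shows a visible drop in the windowed estimate that an overall average would smooth away.
```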

     
  3. This full research paper focuses on the perceptions and experiences of freshman and sophomore engineering students when playing an online serious engineering game that was designed to improve engineering intuition and knowledge of statics. The use of serious educational engineering games has increased in engineering education to help students build technical competencies in engineering disciplines. However, few have investigated how these engineering games are experienced by students, how games influence students' perceptions of learning, or how these factors may lead to inequitable perspectives among diverse populations of students. Purpose/Hypothesis: The purpose of this study was to explore the perceptions, appeal, and opinions about the efficacy of educational online games among a diverse population of students in an engineering mechanics statics course. It was hypothesized that, compared to majority groups (e.g., men, White), women of color who are engineering students would experience fewer connections to the online educational game in terms of ease of use and level of frustration while playing. It is believed that these discordant views may negatively influence the game's appeal and efficacy for learning engineering in this population of students. Design/Method: The Technology Acceptance Model (TAM) is expanded in this study, where the perspectives of women of color (Latinx, Asian, and African American) engineering students are explored. The research approach employed in this study is a mixed-methods sequential exploratory design, where students first played the online engineering educational game, then completed a questionnaire, followed by participation in a focus group. Responses were initially analyzed through open and magnitude coding approaches to understand whether students thought these educational games reflected their personal culture. Results: Preliminary results indicate that though the majority of the students were receptive to using the online engineering software for their engineering education, only a few indicated that they would use this software for engineering exam or technical job interview preparation. A level-one categorical analysis identified a few themes that comprised unintended preservation of inequality in favor of students who enjoyed contest-based education and game technology. Competition-based valuation of presumed mastery of course content fostered anxiety and intimidation among students, which caused some to "game the game" to meet grade goals instead of studying the material. Some students indicated that they spent more time than necessary learning the goals of the game rather than the engineering content itself, suggesting a need to better integrate course material while minimizing the cognitive effort of learning to navigate the game. Conclusions: Preliminary results indicate that the engineering software's design, and the way it is coupled with course grading and assessment of learning outcomes, affect student perceptions of the technology's acceptance, usefulness, and ease of use as a "learning tool." Students were found to have different expectations of serious games than of software/apps designed for entertainment. Conclusions also indicate that, for inquiry-based educational games to be accepted in classrooms with diverse populations of students, the game goals/objectives should be clearly articulated and connected to the class curriculum content. Findings also indicate that a multifaceted schema of tools, such as feedback on game challenges and explanations of the game's predictions, should be included in game/app designs.
  4. In recent years, crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how, by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the realizable setting where there exists a true target function in F and consider a pool of labelers. When a noticeable fraction of the labelers are perfect and the rest behave arbitrarily, we show that any F that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be done while each labeler only labels a constant number of examples and the number of labels requested per example, on average, is a constant. When no perfect labelers exist, a related task is to find a set of labelers which are good but not perfect. We show that we can identify all good labelers when at least the majority of labelers are good.
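As a rough illustration of that last claim, when the majority of labelers are good, a labeler's agreement with the per-example majority vote tends to separate good labelers from arbitrary ones. The sketch below is a simplified stand-in, not the paper's procedure; the function name, threshold, and data layout are assumptions.

```python
import numpy as np

def find_good_labelers(votes, threshold=0.7):
    """Illustrative sketch: flag labelers whose agreement with the per-example
    majority vote is high; this separates good from arbitrary labelers only
    when the majority of labelers are good.

    votes: (n_labelers, n_examples) array of 0/1 labels.
    Returns the indices of labelers whose agreement rate reaches `threshold`.
    """
    majority = (votes.mean(axis=0) >= 0.5).astype(int)   # per-example majority label
    agreement = (votes == majority).mean(axis=1)          # each labeler's agreement rate
    return np.flatnonzero(agreement >= threshold)

# Toy example: labelers 0-2 copy the ground truth, labeler 3 answers arbitrarily.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=20)
votes = np.vstack([truth, truth, truth, rng.integers(0, 2, size=20)])
print(find_good_labelers(votes))   # typically [0 1 2]
```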
  5.

    The performance of clustering depends on an appropriately defined similarity between two items. When the similarity is measured based on human perception, human workers are often employed to estimate a similarity score between items in order to support clustering, leading to a procedure called crowdsourced clustering. Assuming a monetary reward is paid to a worker for each similarity score, and that both the pairwise similarities and the workers' reliability vary widely, it is critical under a limited budget to assign pairs of items to workers wisely in order to optimize the clustering result. We model this budget allocation problem as a Markov decision process where item pairs are dynamically assigned to workers based on the historical similarity scores they provided. We propose an optimistic knowledge gradient policy where the assignment of items in each stage is based on the minimum-weight K-cut defined on a similarity graph. We provide simulation studies and real data analysis to demonstrate the performance of the proposed method.
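The abstract names an MDP formulation, an optimistic knowledge gradient policy, and a minimum-weight K-cut without giving details, so the sketch below only illustrates the overall shape of such a loop with much simpler stand-ins: the next pair to query is chosen by a crude uncertainty score, and spectral clustering substitutes for the K-cut step. All function names, formulas, and data layouts here are assumptions for illustration, not the paper's method.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def choose_next_pair(counts, scores, n_items):
    """Pick the item pair whose similarity estimate is most uncertain and least
    queried -- a crude stand-in for the optimistic knowledge gradient policy.
    counts[i, j]: number of worker scores collected for pair (i, j);
    scores[i, j]: sum of the similarity scores (assumed in [0, 1]) received."""
    best_pair, best_value = None, -np.inf
    for i in range(n_items):
        for j in range(i + 1, n_items):
            n = counts[i, j]
            p = scores[i, j] / n if n else 0.5          # current similarity estimate
            value = (0.5 - abs(p - 0.5)) / (1.0 + n)    # uncertain and rarely queried
            if value > best_value:
                best_pair, best_value = (i, j), value
    return best_pair

def cluster_items(counts, scores, k):
    """Cluster items from the aggregated worker scores; spectral clustering on the
    similarity matrix substitutes here for the minimum-weight K-cut step.
    `counts` and `scores` are symmetric (n_items x n_items) arrays."""
    sim = np.where(counts > 0, scores / np.maximum(counts, 1), 0.5)
    np.fill_diagonal(sim, 1.0)
    return SpectralClustering(n_clusters=k, affinity="precomputed").fit_predict(sim)
```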

     