Title: Microtask programming: building software with a crowd
Microtask crowdsourcing organizes complex work into workflows, decomposing large tasks into small, relatively independent microtasks. Applied to software development, this model might increase participation in open source software development by lowering the barriers to contribution and dramatically decrease time to market by increasing the parallelism in development work. To explore this idea, we have developed an approach to decomposing programming work into microtasks. Work is coordinated through tracking changes to a graph of artifacts, generating appropriate microtasks and propagating change notifications to artifacts with dependencies. We have implemented our approach in CrowdCode, a cloud IDE for crowd development. To evaluate the feasibility of microtask programming, we performed a small study and found that a small crowd of 12 workers was able to successfully write 480 lines of code and 61 unit tests in 14.25 person-hours of time.
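The coordination idea in the abstract (a graph of artifacts, with change notifications propagated to dependent artifacts as microtasks) can be illustrated with a minimal sketch. This is not CrowdCode's implementation: the class names, the `review_change` task kind, and the artifact names are all hypothetical, chosen only to show the shape of the mechanism.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Microtask:
    kind: str      # hypothetical label for the type of work requested
    artifact: str  # the single artifact this microtask touches


@dataclass
class ArtifactGraph:
    # dependents[a] is the set of artifacts that depend on a (assumed structure)
    dependents: dict = field(default_factory=lambda: defaultdict(set))
    # microtasks waiting to be handed to workers
    queue: deque = field(default_factory=deque)

    def add_dependency(self, artifact: str, depends_on: str) -> None:
        """Record that `artifact` depends on `depends_on`."""
        self.dependents[depends_on].add(artifact)

    def record_change(self, artifact: str) -> None:
        """Queue one microtask for each direct dependent of the changed artifact."""
        for dependent in sorted(self.dependents[artifact]):
            self.queue.append(Microtask("review_change", dependent))


graph = ArtifactGraph()
graph.add_dependency("test_parse", depends_on="parse")
graph.add_dependency("main", depends_on="parse")
graph.record_change("parse")

# Each dependent of "parse" now has a pending microtask.
tasks = [(t.kind, t.artifact) for t in graph.queue]
```

In this sketch a change notifies only direct dependents; whether further propagation happens would depend on how workers resolve each generated microtask.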
Award ID(s):
1302522
PAR ID:
10080460
Journal Name:
ACM Symposium on User Interface Software and Technology
Page Range / eLocation ID:
43 to 54
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Traditional forms of crowdsourcing such as open source software development harness crowd contributions to democratize the creation of software. However, potential contributors must first overcome joining barriers, which force casually committed contributors to spend days or weeks onboarding and thereby reduce participation. To more effectively harness potential contributions from the crowd, we propose a method for programming in which work occurs entirely through microtasks, offering contributors short, self-contained tasks such as implementing part of a function or updating a call site invoking a function to match a change made to the function. In microtask programming, microtasks involve changes to a single artifact, are automatically generated as necessary by the system, and nurture quality through iteration. A study examining the feasibility of microtask programming to create small programs found that developers were able to complete 1008 microtasks, onboard and submit their first microtask in less than 15 minutes, complete all types of microtasks in less than 5 minutes on average, and create 490 lines of code and 149 unit tests. The results demonstrate the potential feasibility of the approach and reveal a number of important challenges to address to successfully scale microtask programming to larger and more complex programs.
  2. Crowd development is a development process designed for transient workers of varying skill. Work is organized into microtasks, which are short, self-descriptive, and modular. Microtasks recursively spawn microtasks and are matched to workers, who accrue points reflecting value created. Crowd development might help to reduce time to market and software development costs, increase programmer productivity, and make programming more fun. 
  3. Crowd workers struggle to earn adequate wages. Given the limited task-related information provided on crowd platforms, workers often fail to estimate how long it would take to complete certain microtasks. Although there exist a few third-party tools and online communities that provide estimates of working times, such information is limited to microtasks that have been previously completed by other workers, and such tasks are usually booked immediately by experienced workers. This paper presents a computational technique for predicting microtask working times (i.e., how much time it takes to complete microtasks) based on past experiences of workers regarding similar tasks. The following two challenges were addressed during development of the proposed predictive model: (i) collection of sufficient training data labeled with accurate working times, and (ii) evaluation and optimization of the prediction model. The paper first describes how 7,303 microtask submission data records were collected using a web browser extension (installed by 83 Amazon Mechanical Turk (AMT) workers) created for characterization of the diversity of worker behavior to facilitate accurate recording of working times. Next, challenges encountered in defining evaluation and/or objective functions have been described based on the tolerance demonstrated by workers with regard to prediction errors. To this end, surveys were conducted in AMT asking workers how they felt regarding prediction errors in working times pertaining to microtasks simulated using an “imaginary” AI system. Based on 91,060 survey responses submitted by 875 workers, objective/evaluation functions were derived for use in the prediction model to reflect whether or not the calculated prediction errors would be tolerated by workers. Evaluation results based on worker perceptions of prediction errors revealed that the proposed model was capable of predicting worker-tolerable working times in 73.6% of all tested microtask cases. Further, the derived objective function contributed to realization of accurate predictions across microtasks with more diverse durations.
  4. Capturing analytic provenance is important for refining sensemaking analysis. However, understanding this provenance can be difficult. First, making sense of the reasoning in intermediate steps is time-consuming. Especially in distributed sensemaking, the provenance is less cohesive because each analyst only sees a small portion of the data without an understanding of the overall collaboration workflow. Second, analysis errors from one step can propagate to later steps. Furthermore, in exploratory sensemaking, it is difficult to define what an error is since there are no correct answers to reference. In this paper, we explore provenance analysis for distributed sensemaking in the context of crowdsourcing, where distributed analysis contributions are captured in microtasks. We propose crowd auditing as a way to help individual analysts visualize and trace provenance to debug distributed sensemaking. To evaluate this concept, we implemented a crowd auditing tool, CrowdTrace. Our user study-based evaluation demonstrates that CrowdTrace offers an effective mechanism to audit and refine multi-step crowd sensemaking.
  5. Code search is vital in the maintenance and extension of software systems. Past works have used separate language models for the natural language and programming language artifacts on models with multiple encoders and different loss functions. Similarly, this work approaches code search for Python as a translation retrieval problem while the natural language queries and the programming language are treated as two types of languages. By using dual encoders, these two types of language sequences are projected onto a shared embedding space, in which the distance reflects the similarity between a given pair of query and code. However, in contrast to previous work, this approach uses a unified language model, and a dual encoder structure with a cosine similarity loss function. A unified language model helps the model take advantage of the considerable overlap of words between the artifacts, making the learning much easier. On the other hand, the dual encoders trained with cosine similarity loss help the model learn the underlying patterns of which terms are important for predicting linked pairs of artifacts. Evaluation shows the proposed model achieves performance better than state-of-the-art code search models. In addition, this model is much less expensive in terms of time and complexity, offering a cheaper, faster, and better alternative.