Title: How Graduate Computing Students Search When Using an Unfamiliar Programming Language
Developers and computing students are usually expected to master multiple programming languages. To learn a new language, developers often turn to online search to find information and code examples. However, insights on how learners perform code search when working with an unfamiliar language are lacking. Understanding how learners search and the challenges they encounter when using an unfamiliar language can motivate future tools and techniques to better support subsequent language learners. Research on code search behavior typically involves monitoring developers during search activities through logs or in situ surveys. We conducted a study on how computing students search for code in an unfamiliar programming language with 18 graduate students working on VBA tasks in a lab environment. Our surveys explicitly asked about search success and query reformulation to gather reliable data on those metrics. By analyzing the combination of search logs and survey responses, we found that students typically search to explore APIs or find example code. Approximately 50% of queries that precede clicks on documentation or tutorials successfully solved the problem. Students frequently borrowed terms from languages with which they are familiar when searching for examples in an unfamiliar language, but term borrowing did not impede search success. Edit distances between reformulated queries and non-reformulated queries were nearly the same. These results have implications for code search research, especially on reformulation, and for research on supporting programmers when learning a new language.
Award ID(s):
1749936
NSF-PAR ID:
10171045
Journal Name:
International Conference on Program Comprehension (ICPC)
Page Range / eLocation ID:
160-171
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A great part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have primarily been evaluated purely based on retrieval accuracy or overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking, “At the current state of technology does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?” To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality, and we orchestrate virtual environments to enable collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 7 varieties of 14 Python programming tasks ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points that could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants and demonstrates when developers prefer code generation over code retrieval and vice versa. 
We release all data and software to pave the road for future empirical studies on this topic, as well as development of better code generation models. 
  2. Despite some prior research and commercial systems, if someone sees an unfamiliar American Sign Language (ASL) word and wishes to look up its meaning in a dictionary, this remains a difficult task. There is no standard label a user can type to search for a sign, and formulating a query based on linguistic properties is challenging for students learning ASL. Advances in sign-language recognition technology will soon enable the design of a search system for ASL word look-up in dictionaries, by allowing users to generate a query by submitting a video of themselves performing the word they believe they encountered somewhere. Users would then view a results list of video clips or animations, to seek the desired word. In this research, we are investigating the usability of such a proposed system, a webcam-based ASL dictionary system, using a Wizard-of-Oz prototype and enhanced the design so that it can support sign language word look-up even when the performance of the underlying sign-recognition technology is low. We have also investigated the requirements of students learning ASL in regard to how results should be displayed and how a system could enable them to filter the results of the initial query, to aid in their search for a desired word. We compared users’ satisfaction when using a system with or without post-query filtering capabilities. We discuss our upcoming study to investigate users’ experience with a working prototype based on actual sign-recognition technology that is being designed. Finally, we discuss extensions of this work to the context of users searching datasets of videos of other human movements, e.g. dance moves, or when searching for words in other languages. 
  3. This paper shares the design principles of one Advanced Placement Computer Science Principles (AP CSP) course, Beauty and Joy of Computing (BJC), both for schools considering curriculum, and for developers in this still-new field. BJC students not only learn about CS, but do some and analyze its social implications; we feel that the job of enticing students into the field isn’t complete until students find programming, itself, something they enjoy and know they can do, and its key ideas accessible. Students must feel invited to use their own creativity and logic, and enjoy the power of their logic and the beauty and elegance of the code by which they express it. All kids need genuine challenge and sensible supports so all can have the joy of making—seeing themselves as creators, not just consumers, and seeing that it is their own intellect, not just our instructions, that is the source of that making. Framework standards are woven into a consistent social and intellectual storyline to give the curriculum integrity. Principles guide even our choice of programming language. Learners should focus on the logic and structure of their thinking, not on misplaced semicolons; attention to such syntactic detail is antithetical to broadening participation. We feature recursion and higher order functions because they beautifully exemplify abstraction, a key idea in CS and the CSP framework. BJC also places significant emphasis on the social implications of computing, balancing fundamental optimism about computing technology with a critical view of specific uses of technology. 
  4. Users of search systems often reformulate their queries by adding query terms to reflect their evolving information need or to more precisely express their information need when the system fails to surface relevant content. Analyzing these query reformulations can inform us about both system and user behavior. In this work, we study a special category of query reformulations that involve specifying demographic group attributes, such as gender, as part of the reformulated query (e.g., “olympic 2021 soccer results” → “olympic 2021 women’s soccer results”). There are many ways a query, the search results, and a demographic attribute such as gender may relate, leading us to hypothesize different causes for these reformulation patterns, such as under-representation on the original result page or based on the linguistic theory of markedness. This paper reports on an observational study of gender-specializing query reformulations—their contexts and effects—as a lens on the relationship between system results and gender, based on large-scale search log data from Bing. We find that these reformulations sometimes correct for and other times reinforce gender representation on the original result page, but typically yield better access to the ultimately-selected results. The prevalence of these reformulations—and which gender they skew towards—differ by topical context. However, we do not find evidence that either group under-representation or markedness alone adequately explains these reformulations. We hope that future research will use such reformulations as a probe for deeper investigation into gender (and other demographic) representation on the search result page. 
  5. Early in the pandemic we gathered a group of educators to create and share at-home educational opportunities for families to design and make STEAM projects while at home. As this effort, CoBuild19, continued, we decided to extend our offerings to include basic computer programming. To accomplish this, we created an offering called the Design with Code Club (DwCC). We structured DwCC to be different from other common coding offerings in that we wanted the main focus to be on kids designing solutions to problems that might include the use of technology and coding. We were purposeful in this decision for two main reasons. First, we wanted to make our coding club more interesting to girls, where previous research demonstrates their interest in designing solutions. Second, we wanted this effort to be different from most programming instruction, where coding activities use programming as the core of instruction and application in authentic and student-selected contexts plays a secondary role. DwCC was set up so that each of the first four weeks had a different larger challenge that was COVID-19 related and sessions unfolded with alternating smaller challenges, discussion around design and coding instruction that would develop their skills and knowledge of micro:bit capabilities. We culminated DwCC with an open-ended project where the kids were given the challenge of coming up with their own problem for which they might incorporate micro:bit as part of the solution. Because we were doing all of this online, we used the micro:bit interface through Microsoft MakeCode, which includes a functional simulator. From our experiences we realized that simulations are not as enticing as physical computing with a tangible device, so we set up an incentive where youth who participated in at least three sessions of the club would receive a physical micro:bit. We advertised DwCC through Facebook and Twitter and had nearly 200 families register their kids to participate. 
In the end, a total of 52 micro:bits were sent to youth participants. Based on this success, we sought to expand the effort and increase accessibility for groups that are traditionally underrepresented in STEM. In spring 2021, we offered a Girls DwCC. This was a redesigned version of the club where the focus was even more on problem-solving through design. The club was run by all women, including one from the US, an Industrial Engineer from Mexico and a computer programmer from Albania. More than 50 girls from 17 countries participated in the club! We are working on another version of GDwCC that will be offered in Spanish and focus on Latina girls in the US and Mexico. In the most recent iteration of DwCC we are working with an educator at a school for deaf students to create a version of the club that works for their students. We are doing some modification of activities and recreating videos that involve sign language interpretation. In this presentation we will report on the variants of DwCC, results from participant feedback surveys and plans for future versions. 