

Title: CooBa: Cross-project Bug Localization via Adversarial Transfer Learning

Bug localization plays an important role in software quality control. Many supervised machine learning models have been developed based on historical bug-fix information. Despite being successful, these methods often require sufficient historical data (i.e., labels), which is not always available, especially for newly developed software projects. In response, cross-project bug localization techniques have recently emerged, whose key idea is to transfer knowledge from a label-rich source project to locate bugs in the target project. However, a major limitation of these existing techniques is that they fail to capture the specificity of each individual project, and are thus prone to negative transfer. To address this issue, we propose an adversarial transfer learning bug localization approach, focusing on only transferring the common characteristics (i.e., public information) across projects. Specifically, our approach (CooBa) learns the indicative public information from cross-project bug reports through a shared encoder, and extracts the private information from code files by an individual feature extractor for each project. CooBa further incorporates an adversarial learning mechanism to ensure that public information shared between multiple projects can be effectively extracted. Extensive experiments on four large-scale real-world data sets demonstrate that the proposed CooBa significantly outperforms the state-of-the-art techniques.
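
The adversarial idea in the abstract can be illustrated with a toy sketch. The abstract does not say how CooBa formulates its adversarial objective, so the following assumes one common choice, gradient reversal: a project discriminator is trained to tell which project a bug report came from, while the shared encoder receives the negated gradient and is pushed toward project-invariant (public) features. The function name `adversarial_step`, the element-wise encoder, and the parameters `lr` and `lam` are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adversarial_step(x, project, enc_w, disc_w, lr=0.1, lam=1.0):
    """One update of a toy shared encoder and project discriminator.

    x       : bug-report feature vector (list of floats)
    project : 0 for the source project, 1 for the target project
    enc_w   : shared-encoder weights, applied element-wise (h_i = enc_w[i] * x[i])
    disc_w  : weights of a logistic project discriminator over h
    lam     : strength of the gradient reversal (assumed hyperparameter)
    """
    # Forward pass: shared features, then the discriminator's project guess.
    h = [w * xi for w, xi in zip(enc_w, x)]
    p = sigmoid(sum(d * hi for d, hi in zip(disc_w, h)))
    loss = -math.log(p if project == 1 else 1.0 - p)

    # Backward pass for the cross-entropy loss (d loss / d logit = p - y).
    err = p - project
    grad_disc = [err * hi for hi in h]
    grad_h = [err * d for d in disc_w]

    # Gradient reversal: the encoder gets the negated, scaled gradient,
    # so it moves to make the discriminator's task harder.
    grad_enc = [-lam * g * xi for g, xi in zip(grad_h, x)]

    new_disc = [d - lr * g for d, g in zip(disc_w, grad_disc)]
    new_enc = [w - lr * g for w, g in zip(enc_w, grad_enc)]
    return new_enc, new_disc, loss


if __name__ == "__main__":
    enc, disc, loss = adversarial_step([1.0, 2.0], 1, [0.5, 0.5], [1.0, -1.0])
    print(enc, disc, loss)
```

In this sketch the discriminator takes an ordinary gradient step while the encoder steps in the opposite direction; setting `lam=0.0` leaves the encoder untouched. The per-project private feature extractors for code files mentioned in the abstract are not modeled here.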

 
Award ID(s):
1939725 1947135 1715385
NSF-PAR ID:
10200359
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IJCAI
Page Range / eLocation ID:
3565 to 3571
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    In June 2020, at the annual conference of the American Society for Engineering Education (ASEE), which was held entirely online due to the impacts of COVID-19 (SARS-CoV-2), engineering education researchers and social justice scholars diagnosed the spread of two diseases in the United States: COVID-19 and racism. During a virtual workshop (T614A) titled, “Using Power, Privilege, and Intersectionality as Lenses to Understand our Experiences and Begin to Disrupt and Dismantle Oppressive Structures Within Academia,” Drs. Nadia Kellam, Vanessa Svihla, Donna Riley, Alice Pawley, Kelly Cross, Susannah Davis, and Jay Pembridge presented what we might call a pathological analysis of institutionalized racism and various other “isms.” In order to address the intersecting impacts of this double pandemic, they prescribed counter practices and protocols of anti-racism, and strategies against other oppressive “isms” in academia. At the beginning of the virtual workshop, the presenters were pleasantly surprised to see that they had around a hundred attendees. Did the online format of the ASEE conference afford broader exposure of the workshop? Did the recent uprising of Black Lives Matter (BLM) protests across the country, and internationally, generate broader interest in their topic? Whatever the case, at a time when an in-person conference could not be convened without compromising public health safety, ASEE’s virtual conference platform, furnished by Pathable and supplemented by Zoom, made possible the broader social impacts of Dr. Svihla’s land acknowledgement of the unceded Indigenous lands from which she was presenting. Svihla attempted to go beyond a hollow gesture by including a hyperlink in her slides to a COVID-19 relief fund for the Navajo Nation, and encouraged attendees to make a donation as they copied and pasted the link in the Zoom Chat. Dr. Cross’s statement that you are either a racist or an anti-racist at this point also promised broader social impacts in the context of the virtual workshop. You could feel the intensity of the BLM social movements and the broader political climate in the tone of the presenters’ voices. The mobilizing masses on the streets resonated with the cutting edge of social justice research and education at the ASEE virtual conference. COVID-19 has both exacerbated and made more obvious the unevenness and inequities in our educational practices, processes, and infrastructures. This paper is an extension of a broader collaborative research project that accounts for how an exceptional group of engineering educators have taken this opportunity to socially broaden their curricula to include not just public health matters, but also contemporary political and social movements. Engineering educators for change and advocates for social justice quickly recognized the affordances of diverse forms of digital technologies, and the possibilities of broadening their impact through educational practices and infrastructures of inclusion, openness, and accessibility. They are makers of what Gary Downey calls “scalable scholarship”: projects in support of marginalized epistemologies that can be scaled up from ideation to practice in ways that unsettle and displace the dominant epistemological paradigm of engineering education.[1] This paper is a work in progress. It marks the beginning of a much lengthier project that documents the key positionality of engineering educators for change, and how they are socially situated in places where they can connect social movements with industrial transitions, and participate in the production of “undone sciences” that address “a structured absence that emerges from relations of inequality.”[2] In this paper, we offer a brief glimpse into ethnographic data we collected virtually through interviews, participant observation, and digital archiving from March 2019 to August 2019, during the initial impacts of COVID-19 in the United States. The collaborative research that undergirds this paper is ongoing, and what is presented here is a rough and early articulation of ideas and research findings that have begun to emerge through our engagement with engineering educators for change. This paper begins by introducing an image concept that will guide our analysis of how, in this historical moment, forms of social and racial justice are finding their way into the practices of engineering educators through slight changes in pedagogical techniques in response to the debilitating impacts of the pandemic. Conceptually, we are interested in how small and subtle changes in learning conditions can socially broaden the impact of engineering educators for change. After introducing the image concept that guides this work, we will briefly discuss methodology and offer background information about the project. Next, we discuss literature that revolves around the question, what is engineering education for? Finally, we introduce the notion of situating engineering education and give readers a brief glimpse into our ethnographic data. The conclusion will indicate future directions for writing, research, and intervention. 
  2. Searching for parking has been a problem faced by many drivers, especially in urban areas. With an increasing public demand for parking information and services, as well as the proliferation of advanced smartphones, a range of smartphone-based parking management services began to emerge. Funded by the National Science Foundation, our research aims to explore the potential of smartphone-based parking management services as a solution to parking problems, to deepen our understanding of travelers’ parking behaviors, and to further advance the analytical foundations and methodologies for modeling and assessing parking solutions. This paper summarizes progress and results from our research projects on smartphone-based parking management, including parking availability information prediction, parking searching strategy, the development of a mobile parking application, and our next steps to learn and discover new knowledge from its deployment. To predict future parking occupancy, we proposed a practical framework that integrates machine-learning techniques with a model-based core approach that explicitly models the stochastic parking process. The framework is able to predict future parking occupancy from historical occupancy data alone, and can handle complex arrival and departure patterns in real-world case studies, including special events. With the predicted probabilistic availability information, a cost-minimizing parking searching strategy is developed. The parking searching problem for an individual user is a stochastic Markov decision process and is formalized as a dynamic programming problem. The cost-minimizing parking searching strategy is solved by value iteration. Our simulated experiments showed that the cost-minimizing strategy has the lowest expected cost but tends to direct a user to visit more parking facilities compared with two greedy strategies. 
Currently, we are working on implementing the predictive framework and the searching algorithm in a mobile phone application. We are working closely with Arizona State University (ASU) Parking and Transit Services to implement a three-stage pilot deployment of the prototype application around the ASU main campus. In the first stage, our application will provide real-time information and we will incorporate availability prediction and searching guidance in the second and third stages. Once the mobile application is deployed, it will provide unique opportunities to collect data on parking search behaviors, discover emerging scenarios of smartphone-based parking management services, and assess the impacts of such systems. 
  4. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keywords to ontology concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. 
In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collect terms added by authors, detect potential conflicts between terms, dispatch conflicts to the third component, and update the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform. 
The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken, and that reports how a custom color palette of RGB values obtained from real specimens or high-quality specimen images can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). 
The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) described in our presentation can contribute to producing and publishing FAIR data in taxonomic studies. 
  5.
    As our nation’s need for engineering professionals grows, a sharp rise in P-12 engineering education programs and related research has taken place (Brophy, Klein, Portsmore, & Rogers, 2008; Purzer, Strobel, & Cardella, 2014). The associated research has focused primarily on students’ perceptions and motivations, teachers’ beliefs and knowledge, and curricula and program success. The existing research has expanded our understanding of new K-12 engineering curriculum development and teacher professional development efforts, but empirical data remain scarce on how the racial and ethnic diversity of the student population influences teaching methods, course content, and teachers’ overall experiences. In particular, Hynes et al. (2017) note in their systematic review of P-12 research that little attention has been paid to teachers’ experiences with respect to racially and ethnically diverse engineering classrooms. The growing attention and resources being committed to diversity and inclusion issues (Lichtenstein, Chen, Smith, & Maldonado, 2014; McKenna, Dalal, Anderson, & Ta, 2018; NRC, 2009) underscore the importance of understanding teachers’ experiences with complementary research-based recommendations for how to implement engineering curricula in racially diverse schools to engage all students. Our work examines the experiences of three high school teachers as they teach an introductory engineering course in geographically distinct and racially diverse schools across the nation. The study is situated in the context of a new high school level engineering education initiative called Engineering for Us All (E4USA). The National Science Foundation (NSF) funded initiative was launched in 2018 as a partnership among five universities across the nation to ‘demystify’ engineering for high school students and teachers. 
The program aims to create an all-inclusive high school level engineering course(s), a professional development platform, and a learning community to support student pathways to higher education institutions. An introductory engineering course was developed and professional development was provided to nine high school teachers to instruct and assess engineering learning during the first year of the project. This study investigates participating teachers’ implementation of the course in high schools across the nation to understand the extent to which their experiences vary as a function of student demographic (race, ethnicity, socioeconomic status) and resource level of the school itself. Analysis of these experiences was undertaken using a collective case-study approach (Creswell, 2013) involving in-depth analysis of a limited number of cases “to focus on fewer "subjects," but more "variables" within each subject” (Campbell & Ahrens, 1998, p. 541). This study will document distinct experiences of high school teachers as they teach the E4USA curriculum. Participants were purposively sampled for the cases in order to gather an information-rich data set (Creswell, 2013). The study focuses on three of the nine teachers participating in the first cohort to implement the E4USA curriculum. Teachers were purposefully selected because of the demographic makeup of their students. The participating teachers teach in Arizona, Maryland and Tennessee with predominantly Hispanic, African-American, and Caucasian student bodies, respectively. To better understand similarities and differences among teaching experiences of these teachers, a rich data set is collected consisting of: 1) semi-structured interviews with teachers at multiple stages during the academic year, 2) reflective journal entries shared by the teachers, and 3) multiple observations of classrooms. The interview data will be analyzed with an inductive approach outlined by Miles, Huberman, and Saldaña (2014). 
All teachers’ interview transcripts will be coded together to identify common themes across participants. Participants’ reflections will be analyzed similarly, seeking to characterize their experiences. Observation notes will be used to triangulate the findings. Descriptions for each case will be written emphasizing the aspects that relate to the identified themes. Finally, we will look for commonalities and differences across cases. The results section will describe the cases at the individual participant level followed by a cross-case analysis. This study takes into consideration how high school teachers’ experiences could be an important tool to gain insight into engineering education problems at the P-12 level. Each case will provide insights into how student body diversity impacts teachers’ pedagogy and experiences. The cases illustrate “multiple truths” (Arghode, 2012) with regard to high school level engineering teaching and embody diversity from the perspective of high school teachers. We will highlight themes across cases in the context of frameworks that represent teacher experience conceptualizing race, ethnicity, and diversity of students. We will also present salient features from each case that connect to potential recommendations for advancing P-12 engineering education efforts. These findings will impact how diversity support is practiced at the high school level and will demonstrate specific novel curricular and pedagogical approaches in engineering education to advance P-12 mentoring efforts. 