

Title: Quality Assessment for Large-Scale Industrial Software Systems: Experience Report at Alibaba
To assure high software quality for large-scale industrial software systems, traditional approaches to software quality assurance, such as software testing and performance engineering, have been widely used within Alibaba, the world's largest retailer and one of the largest Internet companies in the world. However, there is still high demand for software quality assessment to sustain both business growth and engineering culture at Alibaba. To address this issue, we develop an industrial solution for software quality assessment by following the Goal-Question-Metric (GQM) paradigm in an industrial setting. Moreover, we integrate multiple assessment methods into our solution, ranging from metric selection to rating aggregation. Our solution has been implemented, deployed, and adopted at Alibaba: (1) it is used by Alibaba's Business Platform Unit to continually monitor the quality of 60+ core software systems; (2) it is used by Alibaba's R&D Efficiency Unit to support group-wide quality-aware code search and automatic code inspection. This paper presents our proposed industrial solution, including its techniques and industrial adoption, along with the lessons learned during the development and deployment of our solution.
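To make the GQM structure concrete, here is a minimal Python sketch, not Alibaba's implementation. A quality goal is refined into questions, each question is answered by metrics already normalized to a 0-1 rating, and ratings are aggregated upward. The goal, questions, metric names, weights, and the use of a weighted mean as the aggregation rule are all hypothetical assumptions; the abstract does not say which metrics or aggregation formula the solution uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """A measured value, assumed already normalized to a 0-1 rating (1.0 = best)."""
    name: str
    rating: float
    weight: float = 1.0


@dataclass
class Question:
    """A GQM question answered by one or more metrics."""
    text: str
    metrics: list[Metric] = field(default_factory=list)
    weight: float = 1.0

    def rating(self) -> float:
        # Hypothetical aggregation rule: weighted mean of the metric ratings.
        total = sum(m.weight for m in self.metrics)
        return sum(m.rating * m.weight for m in self.metrics) / total


@dataclass
class Goal:
    """A GQM goal refined into questions."""
    purpose: str
    questions: list[Question] = field(default_factory=list)

    def rating(self) -> float:
        # The same weighted-mean rule, applied one level up.
        total = sum(q.weight for q in self.questions)
        return sum(q.rating() * q.weight for q in self.questions) / total


# Hypothetical example: assessing the maintainability of one system.
goal = Goal(
    purpose="Assess maintainability of a core system",
    questions=[
        Question(
            "Is the code easy to understand?",
            [Metric("comment_density", 0.6), Metric("low_complexity", 0.8, weight=3.0)],
        ),
        Question("Is the code well tested?", [Metric("line_coverage", 0.7)], weight=2.0),
    ],
)

print(round(goal.rating(), 3))  # 0.717
```

Each level only averages the level below it. That makes the rating traceable from goal to question to metric, which is the main point of following GQM rather than reporting raw metrics directly.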
Award ID(s):
1816615
PAR ID:
10190940
Journal Name:
Quality Assessment for Large-Scale Industrial Software Systems: Experience Report at Alibaba
Page Range / eLocation ID:
142 to 149
Sponsoring Org:
National Science Foundation
More Like this
  1. Unit testing focuses on verifying the functions of individual units of a software system. It is challenging due to the high interdependencies among software units. Developers address this by mocking: replacing the dependency with a "fake" object. Despite the existence of powerful, dedicated mocking frameworks, developers often turn to a "hand-rolled" approach: inheritance. That is, they create a subclass of the dependent class and mock its behavior through method overriding. However, this requires tedious implementation and compromises the design quality of unit tests. This work contributes a fully automated refactoring framework that identifies uses of inheritance for mocking and replaces them with Mockito, a well-received mocking framework. Our approach is built upon empirical experience from five open-source projects that use inheritance for mocking, and we evaluate it on four other projects. Results show that our framework is efficient, generally applicable to new datasets, mostly preserves test case behaviors in detecting defects (in the form of mutants), and decouples test code from production code. The qualitative evaluation by experienced developers suggests that the auto-refactoring solutions generated by our framework improve the quality of the unit test cases in various aspects, such as making test conditions more explicit and improving the cohesion, readability, understandability, and maintainability of test cases.
  2. Software documentation supports a broad set of software maintenance tasks; however, creating and maintaining high-quality, multi-level software documentation can be incredibly time-consuming, and therefore many code bases suffer from a lack of adequate documentation. We address this problem by presenting HGEN, a fully automated pipeline that leverages LLMs to transform source code, through a series of six stages, into a well-organized hierarchy of formatted documents. We evaluate HGEN both quantitatively and qualitatively. First, we use it to generate documentation for three diverse projects and engage key developers in comparing the quality of the generated documentation against their own previously produced, manually crafted documentation. We then pilot HGEN in nine different industrial projects using diverse datasets provided by each project. We collect feedback from project stakeholders and analyze it using an inductive approach to identify recurring themes. Results show that HGEN produces artifact hierarchies similar in quality to manually constructed documentation, with much higher coverage of the core concepts than the baseline approach. Stakeholder feedback highlights HGEN's commercial impact potential as a tool for accelerating code comprehension and maintenance tasks.
  3. Eye tracking tools are used in software engineering research to study various software development activities. However, a major limitation of these tools is their inability to track gaze data for activities that involve source code editing. We present a novel solution to support eye tracking experiments for tasks involving source code edits as an extension of the iTrace community infrastructure. We introduce the iTrace-Atom plugin and gazel, a Python data processing pipeline that maps gaze information to changing source code elements and provides researchers with a way to query this dynamic data. iTrace-Atom is evaluated via a series of simulations and is over 99% accurate at high eye-tracking speeds of over 1,000 Hz. iTrace and gazel fundamentally change the way eye tracking studies can be conducted in realistic settings that involve scrolling, context switching, and now editing. This opens the door to studying many day-to-day software engineering tasks such as bug fixing, adding new features, and refactoring.
  4. Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language.
Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer. 
  5. This paper (Wu 2016), which was published in AI EDAM online on August 22, 2016, has been retracted by Cambridge University Press because it is very similar in content to a published ASME Conference Proceedings paper. The article in question and the ASME Conference Proceedings paper were submitted for review with AI EDAM and the ASME at similar times, but copyright was assigned to ASME before the paper was accepted in AI EDAM, and therefore the article in AI EDAM is being retracted. (In recent years, industrial nations around the globe have invested heavily in new technologies, software, and services to advance digital design and manufacturing using cyber-physical systems, data analytics, and high-performance computing. Many of these initiatives, such as cloud-based design and manufacturing, fall under the umbrella of what has become known as Industry 4.0 or Industrial Internet and are often hailed as pillars of a new industrial revolution. While an increasing number of companies are developing or already offer commercial cloud-based software packages and services for digital design and manufacturing, little work has been reported on reviewing the state of the art of these commercial software and services or on identifying research gaps in this field. The objective of this paper is to present a state-of-the-art review of digital design and manufacturing software and services that are currently available on the cloud. The focus of this paper is on assessing to what extent engineering design, engineering analysis, manufacturing, and production across all phases of the product development lifecycle can already be performed based on the software and services accessed through the cloud. In addition, the key capabilities and benefits of these software packages and services are discussed.
Based on the assessment of the core features of commercial software and services, it can be concluded that almost all phases of product realization can be conducted through digital design and manufacturing software and services on the cloud. Finally, existing research gaps and related challenges to overcome are identified. The state-of-the-art review serves to provide a technology guide for decision makers in their efforts to select suitable cloud-based software and services as alternatives to existing in-house resources as well as to recommend new research areas.) 
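The first related abstract describes refactoring hand-rolled, inheritance-based mocks into Mockito mocks. Mockito is a Java framework, so the sketch below illustrates the same before/after contrast with Python's standard `unittest.mock` instead. It is an analogy, not the paper's refactoring tool, and `PaymentGateway`, `checkout`, `FakeGateway`, and `CheckoutTest` are hypothetical names.

```python
import unittest
from unittest import mock


class PaymentGateway:
    """Hypothetical production dependency that would call a remote service."""

    def charge(self, amount: int) -> bool:
        raise RuntimeError("network access not allowed in unit tests")


def checkout(gateway: PaymentGateway, amount: int) -> str:
    """Unit under test: its result depends on the gateway."""
    return "paid" if gateway.charge(amount) else "declined"


# Before: a hand-rolled mock, built by subclassing and overriding.
class FakeGateway(PaymentGateway):
    def __init__(self) -> None:
        self.calls: list = []

    def charge(self, amount: int) -> bool:
        self.calls.append(amount)  # record the call manually
        return True                # canned answer hidden inside the subclass


class CheckoutTest(unittest.TestCase):
    def test_with_subclass(self) -> None:
        fake = FakeGateway()
        self.assertEqual(checkout(fake, 100), "paid")
        self.assertEqual(fake.calls, [100])

    # After: a framework mock; stubbing and verification sit in the test itself.
    def test_with_framework_mock(self) -> None:
        gateway = mock.create_autospec(PaymentGateway, instance=True)
        gateway.charge.return_value = False
        self.assertEqual(checkout(gateway, 100), "declined")
        gateway.charge.assert_called_once_with(100)
```

Run it with `python -m unittest <module>`. The framework version needs no extra class per test, and `create_autospec` rejects calls that do not match the real method's signature. Keeping the stubbed return value and the verified call next to the assertion reflects the "more explicit test conditions" benefit the abstract reports.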
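The third related abstract says gazel maps gaze data to source code elements that change while the developer edits. The sketch below shows one simple way such a mapping could work, assuming whole-line edits and line-level gaze positions. It is not gazel's actual algorithm, and `Edit`, `Gaze`, and `map_gaze_to_line_ids` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    time: float  # seconds since session start
    kind: str    # "insert" or "delete"; whole lines only, for simplicity
    line: int    # 0-based line index at the moment of the edit


@dataclass(frozen=True)
class Gaze:
    time: float  # seconds since session start
    line: int    # 0-based line index reported by the eye tracker


def map_gaze_to_line_ids(n_lines: int, edits: list[Edit], gazes: list[Gaze]) -> list[int | None]:
    """Resolve each gaze sample to a stable line id, returned in time order.

    Each original line keeps a permanent id and every inserted line gets a
    fresh one. Edits are replayed in time order, so each gaze sample is read
    against the file layout that existed when it was recorded.
    """
    layout = list(range(n_lines))  # layout[i] = id of the line displayed at index i
    next_id = n_lines
    pending = sorted(edits, key=lambda e: e.time)
    resolved: list[int | None] = []
    applied = 0
    for g in sorted(gazes, key=lambda s: s.time):
        # Apply every edit made at or before this gaze sample.
        while applied < len(pending) and pending[applied].time <= g.time:
            e = pending[applied]
            if e.kind == "insert":
                layout.insert(e.line, next_id)
                next_id += 1
            elif e.kind == "delete":
                del layout[e.line]
            else:
                raise ValueError(f"unknown edit kind: {e.kind}")
            applied += 1
        # A gaze outside the current file (e.g., past the last line) maps to None.
        resolved.append(layout[g.line] if 0 <= g.line < len(layout) else None)
    return resolved


# Hypothetical session: a 3-line file, one line inserted at the top, then one deleted.
edits = [Edit(2.0, "insert", 0), Edit(5.0, "delete", 1)]
gazes = [Gaze(1.0, 1), Gaze(3.0, 2), Gaze(4.0, 0), Gaze(6.0, 1)]
print(map_gaze_to_line_ids(3, edits, gazes))  # [1, 1, 3, 1]
```

In the example, the first, second, and fourth samples report lines 1, 2, and 1, yet all resolve to the same original line (id 1). The third sample resolves to the newly inserted line (id 3). A realistic pipeline would also have to handle character-level edits and token-level mapping, which this sketch omits.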
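Step 3 of the MultiPL-T abstract keeps only translations that pass the tests carried over from the source language. The following minimal sketch shows that validation filter under stated assumptions:

- Python callables stand in for programs in the target language. Actually running, say, Racket code would require invoking that language's toolchain.
- Tests are simple input/expected-output pairs.
- Candidates with no tests are rejected, because nothing validates them.

`Candidate`, `passes_all`, and `filter_translations` are hypothetical names, not part of the MultiPL-T code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """One translated function plus the tests compiled over from the source language."""
    name: str
    program: Callable[[int], int]  # stand-in for executing target-language code
    tests: list[tuple[int, int]]   # (input, expected output) pairs


def passes_all(c: Candidate) -> bool:
    """True only if every test passes; a crash counts as a failure."""
    if not c.tests:
        return False  # assumption: untested items cannot be validated
    for arg, expected in c.tests:
        try:
            if c.program(arg) != expected:
                return False
        except Exception:
            return False
    return True


def filter_translations(candidates: list[Candidate]) -> list[Candidate]:
    # Keep only the candidates validated by their translated tests.
    return [c for c in candidates if passes_all(c)]


# Hypothetical candidates for a "double the input" function.
tests = [(0, 0), (3, 6)]
kept = filter_translations([
    Candidate("double_ok", lambda x: 2 * x, tests),
    Candidate("double_wrong", lambda x: x + x + 1, tests),
    Candidate("double_crash", lambda x: 1 // (x - x), tests),
    Candidate("untested", lambda x: x, []),
])
print([c.name for c in kept])  # ['double_ok']
```

Treating exceptions as failures matters here. A translation that crashes on a test input is just as wrong as one that returns the wrong value, and both should be filtered out of the training corpus.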