Search for: All records

Creators/Authors contains: "Han, Yi"

  1. Free, publicly-accessible full text available June 11, 2026
  2. Context: Design anti-patterns can be symptoms of problems that lead to long-term maintenance difficulty. How should development teams prioritize their treatment? Which ones are more severe and deserve more attention? Does the impact of anti-patterns and general maintenance efforts differ with different programming languages? Objective: In this study, we assess the prevalence and severity of anti-patterns in different programming languages and the impact of dynamic typing in Python, as well as the impact scopes of prevalent anti-patterns that manifest the violation of design principles. Method: We conducted a large-scale study of anti-patterns using 1717 open-source projects written in Java, C/C++, and Python. For the 288 Python projects, we extracted both explicit and dynamic dependencies and compared how the detected anti-patterns and maintenance costs changed. Finally, we removed anti-patterns involving five or fewer files to assess the impact of trivial anti-patterns. Results: The results reveal that 99.55% of these projects contain anti-patterns. Modularity Violation, meaning frequent co-changes among seemingly unrelated files, is the most prevalent (detected in 83.54% of all projects) and the most costly (incurring 61.55% of maintenance effort on average). Unstable Interface and Crossing, caused by influential but unstable files, although not as prevalent, tend to incur severe maintenance costs. Duck typing in Python incurs more anti-patterns, and the churn spent on Python files is a multiple of that spent on C/C++ and Java files. Several prevalent anti-patterns have a large portion of trivial instances, meaning that these common symptoms are usually not harmful. Conclusion: Implicit and visible dependencies are the most expensive to maintain, and dynamic typing in Python exacerbates the issue. Influential but unstable files need to be monitored and rectified early to prevent the accumulation of high maintenance costs. The violations of design principles are widespread, but many are not high-maintenance.
  3. Aspect-based sentiment analysis (ABSA) enables a systematic identification of user opinions on particular aspects, thus enhancing the idea creation process in the initial stages of product/service design. Attention-based large language models (LLMs) like BERT and T5 have proven powerful in ABSA tasks. Yet, several key limitations remain, both regarding the ABSA task and the capabilities of attention-based models. First, existing research mainly focuses on relatively simple ABSA tasks such as aspect-based sentiment analysis, while the task of extracting aspect, opinion, and sentiment in a unified model remains largely unaddressed. Second, current ABSA tasks overlook implicit opinions and sentiments. Third, most attention-based LLMs like BERT use position encoding in a linear projected manner or through split-position relations in word distance schemes, which could lead to relation biases during the training process. This article addresses these gaps by (1) creating a new annotated dataset with five types of labels, including aspect, category, opinion, sentiment, and implicit indicator (ACOSI), (2) developing a unified model capable of extracting all five types of labels simultaneously in a generative manner, and (3) designing a new position encoding method in the attention-based model. The numerical experiments conducted on a manually labeled dataset scraped from three major e-Commerce retail stores for apparel and footwear products demonstrate the performance, scalability, and potential of the framework developed. The article concludes with recommendations for future research on automated need finding and sentiment analysis for user-centered design.
  4. Background: RNA secondary structure (RSS) can influence the regulation of transcription, RNA processing, and protein synthesis, among other processes. 3′ untranslated regions (3′ UTRs) of mRNA also hold the key for many aspects of gene regulation. However, there are often contradictory results regarding the roles of RSS in 3′ UTRs in gene expression in different organisms and/or contexts. Results: Here, we incidentally observe that the primary substrate of miR159a (pri-miR159a), when embedded in a 3′ UTR, could promote mRNA accumulation. The enhanced expression is attributed to the earlier polyadenylation of the transcript within the hybrid pri-miR159a-3′ UTR and, resultantly, a poorly structured 3′ UTR. RNA decay assays indicate that poorly structured 3′ UTRs could promote mRNA stability, whereas highly structured 3′ UTRs destabilize mRNA in vivo. Genome-wide DMS-MaPseq also reveals the prevailing inverse relationship between 3′ UTRs’ RSS and transcript accumulation in the transcriptomes of Arabidopsis, rice, and even human. Mechanistically, transcripts with highly structured 3′ UTRs are preferentially degraded by 3′–5′ exoribonuclease SOV and 5′–3′ exoribonuclease XRN4, leading to decreased expression in Arabidopsis. Finally, we engineer different structured 3′ UTRs to an endogenous FT gene and alter the FT-regulated flowering time in Arabidopsis. Conclusions: We conclude that highly structured 3′ UTRs typically cause reduced accumulation of the harbored transcripts in Arabidopsis. This pattern extends to rice and even mammals. Furthermore, our study provides a new strategy of engineering the 3′ UTRs’ RSS to modify plant traits in agricultural production and mRNA stability in biotechnology.
  5. Eliciting informative user opinions from online reviews is a key success factor for innovative product design and development. The unstructured, noisy, and verbose nature of user reviews, however, often complicates large-scale need finding in a format useful for designers without losing important information. Recent advances in abstractive text summarization have created the opportunity to systematically generate opinion summaries from online reviews to inform the early stages of product design and development. However, two knowledge gaps hinder the applicability of opinion summarization methods in practice. First, there is a lack of formal mechanisms to guide the generative process with respect to different categories of product attributes and user sentiments. Second, the annotated training datasets needed for supervised training of abstractive summarization models are often difficult and costly to create. This article addresses these gaps by (1) devising an efficient computational framework for abstractive opinion summarization guided by specific product attributes and sentiment polarities, and (2) automatically generating a synthetic training dataset that captures various degrees of granularity and polarity. A hierarchical multi-instance attribute-sentiment inference model is developed for assembling a high-quality synthetic dataset, which is utilized to fine-tune a pretrained language model for abstractive summary generation. Numerical experiments conducted on a large dataset scraped from three major e-Commerce retail stores for apparel and footwear products indicate the performance, feasibility, and potential of the developed framework. Several directions are provided for future exploration in the area of automated opinion summarization for user-centered design.
  6. Aspect-based sentiment analysis (ABSA) provides an opportunity to systematically generate users' opinions of specific aspects to enrich the idea creation process in the early stage of the product/service design process. Yet, the current ABSA task has two major limitations. First, existing research mostly focuses on subsets of the ABSA task, e.g., aspect-sentiment extraction, while extracting aspect, opinion, and sentiment in a unified model is still an open problem. Second, implicit opinions and sentiments are ignored in the current ABSA task. This article tackles these gaps by (1) creating a new annotated dataset comprising five types of labels, including aspect, category, opinion, sentiment, and implicit indicator (ACOSI), and (2) developing a unified model that can extract all five types of labels simultaneously in a generative manner. Numerical experiments conducted on the manually labeled dataset originally scraped from three major e-Commerce retail stores for apparel and footwear products indicate the performance, scalability, and potential of the framework developed. Several directions are provided for future exploration in the area of automated aspect-based sentiment analysis for user-centered design.
  7. Extracting and analyzing informative user opinions from large-scale online reviews is a key success factor in product design processes. However, user reviews are naturally unstructured, noisy, and verbose. Recent advances in abstractive text summarization provide an unprecedented opportunity to systematically generate summaries of user opinions to facilitate need finding for designers. Yet, two main gaps in the state-of-the-art opinion summarization methods limit their applicability to the product design domain. The first is the lack of capabilities to guide the generative process with respect to various product aspects and user sentiments (e.g., polarity, subjectivity), and the second is the lack of annotated training datasets for supervised learning. This paper tackles these gaps by (1) devising an efficient and scalable methodology for abstractive opinion summarization from online reviews guided by aspect terms and sentiment polarities, and (2) automatically generating a reusable synthetic training dataset that captures various degrees of granularity and polarity. The methodology contributes a multi-instance pooling model with aspect and sentiment information integrated (MAS), a synthetic dataset assembled using the results of the MAS model, and a fine-tuned pretrained sequence-to-sequence model “T5” for summary generation. Numerical experiments are conducted on a large dataset scraped from a major e-commerce retail store for sneakers to demonstrate the performance, feasibility, and potential of the developed methodology. Several directions are provided for future exploration in the area of automated opinion summarization for user-centered product design.