NSF PAR Search | NSF Public Access Repository

Approaching code search for python as a translation retrieval problem with dual encoders

https://doi.org/10.1007/s10664-024-10580-3

Khan, Monoshiz Mahbub; Yu, Zhe (January 2025, Empirical Software Engineering)

Code search is vital in the maintenance and extension of software systems. Past works have used separate language models for the natural language and programming language artifacts on models with multiple encoders and different loss functions. Similarly, this work approaches code search for Python as a translation retrieval problem, with natural language queries and programming language code treated as two types of languages. By using dual encoders, these two types of language sequences are projected onto a shared embedding space, in which the distance reflects the similarity between a given pair of query and code. However, in contrast to previous work, this approach uses a unified language model, and a dual encoder structure with a cosine similarity loss function. A unified language model helps the model take advantage of the considerable overlap of words between the artifacts, making the learning much easier. On the other hand, the dual encoders trained with cosine similarity loss help the model learn the underlying patterns of which terms are important for predicting linked pairs of artifacts. Evaluation shows the proposed model achieves performance better than state-of-the-art code search models. In addition, this model is much less expensive in terms of time and complexity, offering a cheaper, faster, and better alternative.
Free, publicly-accessible full text available January 1, 2026
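
The retrieval step described in the abstract above (ranking code by its distance to a query in a shared embedding space) can be sketched as follows. This is a minimal illustration, not the authors' model: the embedding vectors, the snippet names, and the helper functions `cosine_similarity` and `rank_snippets` are all hypothetical, and in practice the vectors would come from the trained encoders.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_snippets(query_vec, code_vecs):
    """Return snippet names ordered from most to least similar to the query."""
    scores = {name: cosine_similarity(query_vec, vec)
              for name, vec in code_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical embeddings; a real system would obtain these from the encoders.
query_vec = [1.0, 0.0, 1.0]
code_vecs = {
    "read_file": [0.9, 0.1, 0.8],
    "sort_list": [0.0, 1.0, 0.1],
    "open_socket": [0.5, 0.5, 0.5],
}

ranking = rank_snippets(query_vec, code_vecs)
print(ranking)  # ['read_file', 'open_socket', 'sort_list']
```

Because both artifact types live in one space, retrieval reduces to a nearest-neighbour lookup over precomputed code vectors.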
Estimation of Participation Factors for Power System Oscillation From Measurements

https://doi.org/10.1109/TIA.2025.3530869

Xia, Tianwei; Yu, Zhe; Sun, Kai; Shi, Di; Huang, Kaiyang (January 2025, IEEE Transactions on Industry Applications)

In a power system, when the participation factors of generators are computed to rank their participation in an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition with the original measurement space by identifying and utilizing a coordinate transformation to a new space optimally recovering the symmetry. Thus, the optimal estimates of participation factors solely from measurements are achieved, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.
Free, publicly-accessible full text available January 1, 2026
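
For reference, the conventional model-based definition that the abstract above contrasts against is usually written as follows. The symbols are standard textbook notation and are an assumption here, not taken from the abstract: $A$ is the linearized state matrix, $\lambda_i$ the eigenvalue of mode $i$, $\phi_i$ its right eigenvector (a column), and $\psi_i$ its left eigenvector (a row).

$$
A\,\phi_i = \lambda_i\,\phi_i, \qquad \psi_i\,A = \lambda_i\,\psi_i, \qquad \psi_i\,\phi_i = 1,
$$

$$
p_{ki} = \phi_{ki}\,\psi_{ik}, \qquad \sum_{k} p_{ki} = 1 .
$$

Here $p_{ki}$ is the participation of state $k$ in mode $i$; the sum to one follows directly from the normalization $\psi_i\,\phi_i = 1$.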
FairBalance: How to Achieve Equalized Odds With Data Pre-Processing

https://doi.org/10.1109/TSE.2024.3431445

Yu, Zhe; Chakraborty, Joymallya; Menzies, Tim (September 2024, IEEE Transactions on Software Engineering)

This research seeks to benefit the software engineering society by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that the machine learning software does not perform differently on different sensitive demographic groups, that is, that it satisfies equalized odds. Unlike prior works, which either optimize for an equalized odds related metric during the learning process like a black-box or manipulate the training data following some intuition, this work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, the proposed pre-processing algorithm FairBalance can significantly improve equalized odds without much, if any, damage to the utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.
Full Text Available
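
The idea of balancing the class distribution within each demographic group through sample weights, as described in the FairBalance abstract above, can be illustrated with a short sketch. The weighting rule used here (group size divided by the size of the group-and-class cell) is an assumption chosen because it makes the weighted class totals 1:1 inside every group; it is not copied from the authors' scripts, and the function name `balance_weights` and the toy data are hypothetical.

```python
from collections import Counter

def balance_weights(groups, labels):
    """Weight each sample by |group| / |group and class| (assumed rule).

    Under this rule the weights of every class within a group sum to the
    group size, so the weighted class distribution in each group is 1:1.
    """
    group_counts = Counter(groups)
    cell_counts = Counter(zip(groups, labels))
    return [group_counts[g] / cell_counts[(g, y)]
            for g, y in zip(groups, labels)]

# Toy data: group "a" is 1 positive to 3 negatives, group "b" is 2 to 1.
groups = ["a", "a", "a", "a", "b", "b", "b"]
labels = [1, 0, 0, 0, 1, 1, 0]

weights = balance_weights(groups, labels)

# Weighted class totals per (group, class) cell.
totals = Counter()
for g, y, w in zip(groups, labels, weights):
    totals[(g, y)] += w
print(dict(totals))
```

Weights of this kind can typically be passed to a learner through a per-sample weight argument, which leaves the training procedure itself unchanged.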
Identifying Self-Admitted Technical Debts With Jitterbug: A Two-Step Approach

https://doi.org/10.1109/TSE.2020.3031401

Yu, Zhe; Fahid, Fahmid Morshed; Tu, Huy; Menzies, Tim (May 2022, IEEE Transactions on Software Engineering)

Full Text Available
Supermassive Black Holes with High Accretion Rates in Active Galactic Nuclei. XIII. Ultraviolet Time Lag of Hβ Emission in Mrk 142

https://doi.org/10.3847/1538-4357/acfb72

Khatu, Viraja C.; Gallagher, Sarah C.; Horne, Keith; Cackett, Edward M.; Hu, Chen; Pasquini, Sofia; Hall, Patrick; Wang, Jian-Min; Bian, Wei-Hao; Li, Yan-Rong; et al (November 2023, The Astrophysical Journal)

We performed a rigorous reverberation-mapping analysis of the broad-line region (BLR) in a highly accreting ($L/L_{\mathrm{Edd}}$ = 0.74–3.4) active galactic nucleus, Markarian 142 (Mrk 142), for the first time using concurrent observations of the inner accretion disk and the BLR to determine a time lag for the Hβ λ4861 emission relative to the ultraviolet (UV) continuum variations. We used continuum data taken with the Neil Gehrels Swift Observatory in the UVW2 band, and the Las Cumbres Observatory, Dan Zowada Memorial Observatory, and Liverpool Telescope in the g band, as part of the broader Mrk 142 multiwavelength monitoring campaign in 2019. We obtained new spectroscopic observations covering the Hβ broad emission line in the optical from the Gemini North Telescope and the Lijiang 2.4 m Telescope for a total of 102 epochs (over a period of 8 months) contemporaneous to the continuum data. Our primary result states a UV-to-Hβ time lag of ${8.68}_{-0.72}^{+0.75}$ days in Mrk 142 obtained from light-curve analysis with a Python-based running optimal average algorithm. We placed our new measurements for Mrk 142 on the optical and UV radius–luminosity relations for NGC 5548 to understand the nature of the continuum driver. The positions of Mrk 142 on the scaling relations suggest that UV is closer to the “true” driving continuum than the optical. Furthermore, we obtain $\log (M_{\bullet} / M_{\odot})$ = 6.32 ± 0.29 assuming UV as the primary driving continuum.
Wide-area Measurement System-based Low Frequency Oscillation Damping Control through Reinforcement Learning

https://doi.org/10.1109/TSG.2020.3008364

Hashmy, Yousuf; Yu, Zhe; Shi, Di; Weng, Yang (July 2020, IEEE Transactions on Smart Grid)

Ensuring the stability of power systems is gaining more attention today than ever before due to the rapid growth of uncertainties in load and increased renewable energy penetration. Lately, wide-area measurement system (WAMS)-based centralized controlling techniques are offering flexibility and more robust control to keep the system stable. WAMS-based controlling techniques, however, face pressing challenges of irregular delays in long-distance communication channels and subsequent responses of equipment to control actions. This paper presents an innovative control strategy for damping down low-frequency oscillations in transmission systems. The method uses a reinforcement learning technique to overcome the challenges of communication delays and other non-linearity in wide-area damping control. It models the traditional problem of oscillation damping control as a novel faster exploration-based deep deterministic policy gradient (DDPG-S). An effective reward function is designed to capture necessary features of oscillations enabling timely damping of such oscillations, even under various kinds of uncertainties. A detailed analysis and a systematically designed numerical validation are presented to prove feasibility, scalability, interpretability, and comparative performance of the modelled low-frequency oscillation damping controller. The benefit of the technique is that stability is ensured even when uncertainties of load and generation are on the rise.
Full Text Available
Joint Task Offloading and Resource Allocation in UAV-Enabled Mobile Edge Computing

https://doi.org/10.1109/JIOT.2020.2965898

Yu, Zhe; Gong, Yanmin; Gong, Shimin; Guo, Yuanxiong (April 2020, IEEE Internet of Things Journal)

Full Text Available
The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

https://doi.org/10.1038/s41588-024-01695-w

Salojärvi, Jarkko; Rambani, Aditi; Yu, Zhe; Guyot, Romain; Strickler, Susan; Lepelley, Maud; Wang, Cui; Rajaraman, Sitaram; Rastas, Pasi; Zheng, Chunfang; et al (April 2024, Nature Genetics)

Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000–610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 thousand years ago, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.
Full Text Available
Improving Vulnerability Inspection Efficiency Using Active Learning

https://doi.org/10.1109/TSE.2019.2949275

Yu, Zhe; Theisen, Christopher; Williams, Laurie; Menzies, Tim (October 2019, IEEE Transactions on Software Engineering)
Full Text Available