Search for: All records

Creators/Authors contains: "Xie, Pengtao"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation

Qin, Peijia; Zhang, Ruiyi; Xie, Pengtao (August 2025, Transactions on machine learning research)

Free, publicly-accessible full text available August 30, 2026
TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining

Zhang, Ruiyi; Somayajula, Sai Ashish; Xie, Pengtao (June 2025, Transactions on machine learning research)

Large-scale general domain pretraining followed by downstream-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still lead to performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically involve continued pretraining on task-specific unlabeled datasets or introducing additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often determine the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language processing tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight.
more » « less
Free, publicly-accessible full text available June 11, 2026
Generative AI enables medical image segmentation in ultra low-data regimes

https://doi.org/10.1038/s41467-025-61754-6

Zhang, Li; Jindal, Basu; Alaa, Ahmed; Weinreb, Robert; Wilson, David; Segal, Eran; Zou, James; Xie, Pengtao (July 2025, Nature Communications)

Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning automates this task effectively, it struggles in ultra low-data regimes for the scarcity of annotated segmentation masks. To address this, we propose a generative deep learning framework that produces high-quality image-mask pairs as auxiliary training data. Unlike traditional generative models that separate data generation from model training, ours uses multi-level optimization for end-to-end data generation. This allows segmentation performance to guide the generation process, producing data tailored to improve segmentation outcomes. Our method demonstrates strong generalization across 11 medical image segmentation tasks and 19 datasets, covering various diseases, organs, and modalities. It improves performance by 10–20% (absolute) in both same- and out-of-domain settings and requires 8–20 times less training data than existing approaches. This greatly enhances the feasibility and cost-effectiveness of deep learning in data-limited medical imaging scenarios.
more » « less
Free, publicly-accessible full text available July 14, 2026
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

Guo, Han; Hosseini, Ramtin; Zhang, Ruiyi; Somayajula, Sai Ashish; Chowdhury, Ranak Roy; Gupta, Rajesh K; Xie, Pengtao (April 2025, Transactions on machine learning research)

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency. Our code is available at https://github.com/Alexiland/MLO-MAE
more » « less
Free, publicly-accessible full text available April 11, 2026
Learning From Mistakes: A Multilevel Optimization Framework

Zhang, Li; Garg, Bhanu; Sridhara, Pradyumna; Hosseini, Ramtin; Xie, Pengtao (January 2025, IEEE transactions on artificial intelligence)

Bi-level optimization methods in machine learning are popularly effective in subdomains of neural architecture search, data reweighting, etc. However, most of these methods do not factor in variations in learning difficulty, which limits their performance in real-world applications. To address the above problems, we propose a framework that imitates the learning process of humans. In human learning, learners usually focus more on the topics where mistakes have been made in the past to deepen their understanding and master the knowledge. Inspired by this effective human learning technique, we propose a multilevel optimization framework, learning from mistakes (LFM), for machine learning. We formulate LFM as a three-stage optimization problem: 1) the learner learns, 2) the learner relearns based on the mistakes made before, and 3) the learner validates his learning. We develop an efficient algorithm to solve the optimization problem. We further apply our method to differentiable neural architecture search and data reweighting. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet, and other related datasets powerfully demonstrate the effectiveness of our approach. The code of LFM is available at: https://github.com/importZL/LFM.
more » « less
Free, publicly-accessible full text available January 27, 2026
Transformer Architecture Search for Improving Out-of-Domain Generalization in Machine Translation

He, Yiheng; Zhang, Ruiyi; Somayajula, Sai Ashish; Xie, Pengtao (December 2024, Transactions on machine learning research)

Interest in automatically searching for Transformer neural architectures for machine translation (MT) has been increasing. Current methods show promising results in in-domain settings, where training and test data share the same distribution. However, in real-world MT applications, it is common that the test data has a different distribution than the training data. In these out-of-domain (OOD) situations, Transformer architectures optimized for the linguistic characteristics of the training sentences struggle to produce accurate translations for OOD sentences during testing. To tackle this issue, we propose a multi-level optimization based method to automatically search for neural architectures that possess robust OOD generalization capabilities. During the architecture search process, our method automatically synthesizes approximated OOD MT data, which is used to evaluate and improve the architectures' ability of generalizing to OOD scenarios. The generation of approximated OOD data and the search for optimal architectures are executed in an integrated, end-to-end manner. Evaluated across multiple datasets, our method demonstrates strong OOD generalization performance, surpassing state-of-the-art approaches.
more » « less
Full Text Available
Type Information Utilized Event Detection via Multi-Channel GNNs in Electrical Power Systems

https://doi.org/10.1145/3577031

Li, Qian; Li, Jianxin; Wang, Lihong; Ji, Cheng; Hei, Yiming; Sheng, Jiawei; Sun, Qingyun; Xue, Shan; Xie, Pengtao (August 2023, ACM Transactions on the Web)

Event detection in power systems aims to identify triggers and event types, which helps relevant personnel respond to emergencies promptly and facilitates the optimization of power supply strategies. However, the limited length of short electrical record texts causes severe information sparsity, and numerous domain-specific terminologies of power systems makes it difficult to transfer knowledge from language models pre-trained on general-domain texts. Traditional event detection approaches primarily focus on the general domain and ignore these two problems in the power system domain. To address the above issues, we propose a Multi-Channel graph neural network utilizing Type information for Event Detection in power systems, named MC-TED , leveraging a semantic channel and a topological channel to enrich information interaction from short texts. Concretely, the semantic channel refines textual representations with semantic similarity, building the semantic information interaction among potential event-related words. The topological channel generates a relation-type-aware graph modeling word dependencies, and a word-type-aware graph integrating part-of-speech tags. To further reduce errors worsened by professional terminologies in type analysis, a type learning mechanism is designed for updating the representations of both the word type and relation type in the topological channel. In this way, the information sparsity and professional term occurrence problems can be alleviated by enabling interaction between topological and semantic information. Furthermore, to address the lack of labeled data in power systems, we built a Chinese event detection dataset based on electrical Power Event texts, named PoE . In experiments, our model achieves compelling results not only on the PoE dataset, but on general-domain event detection datasets including ACE 2005 and MAVEN.
more » « less
Full Text Available
Brain tumor classification based on neural architecture search

https://doi.org/10.1038/s41598-022-22172-6

Chitnis, Shubham; Hosseini, Ramtin; Xie, Pengtao (December 2022, Scientific Reports)

Abstract Brain tumor is a life-threatening disease and causes about 0.25 million deaths worldwide in 2020. Magnetic Resonance Imaging (MRI) is frequently used for diagnosing brain tumors. In medically underdeveloped regions, physicians who can accurately diagnose and assess the severity of brain tumors from MRI are highly lacking. Deep learning methods have been developed to assist physicians in detecting brain tumors from MRI and determining their subtypes. In existing methods, neural architectures are manually designed by human experts, which is time-consuming and labor-intensive. To address this problem, we propose to automatically search for high-performance neural architectures for classifying brain tumors from MRIs, by leveraging a Learning-by-Self-Explanation (LeaSE) architecture search method. LeaSE consists of an explainer model and an audience model. The explainer aims at searching for a highly performant architecture by encouraging the architecture to generate high-fidelity explanations of prediction outcomes, where explanations’ fidelity is evaluated by the audience model. LeaSE is formulated as a four-level optimization problem involving a sequence of four learning stages which are conducted end-to-end. We apply LeaSE for MRI-based brain tumor classification, including four classes: glioma, meningioma, pituitary tumor, and healthy, on a dataset containing 3264 MRI images. Results show that our method can search for neural architectures that achieve better classification accuracy than manually designed deep neural networks while having fewer model parameters. For example, our method achieves a test accuracy of 90.6% and an AUC of 95.6% with 3.75M parameters while the accuracy and AUC of a human-designed network—ResNet101—is 84.5% and 90.1% respectively with 42.56M parameters. In addition, our method outperforms state-of-the-art neural architecture search methods.
more » « less
Full Text Available
Joint Self-Supervised Image-Volume Representation Learning with Intra-inter Contrastive Clustering

https://doi.org/10.1609/aaai.v37i12.26687

Nguyen, Duy M.; Nguyen, Hoang; Mai, Truong T.; Cao, Tri; Nguyen, Binh T.; Ho, Nhat; Swoboda, Paul; Albarqouni, Shadi; Xie, Pengtao; Sonntag, Daniel (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions.In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
more » « less
Full Text Available
Learning from Mistakes – a Framework for Neural Architecture Search

https://doi.org/10.1609/aaai.v36i9.21258

Garg, Bhanu; Zhang, Li; Sridhara, Pradyumna; Hosseini, Ramtin; Xing, Eric; Xie, Pengtao (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Learning from one's mistakes is an effective human learning technique where the learners focus more on the topics where mistakes were made, so as to deepen their understanding. In this paper, we investigate if this human learning strategy can be applied in machine learning. We propose a novel machine learning method called Learning From Mistakes (LFM), wherein the learner improves its ability to learn by focusing more on the mistakes during revision. We formulate LFM as a three-stage optimization problem: 1) learner learns; 2) learner re-learns focusing on the mistakes, and; 3) learner validates its learning. We develop an efficient algorithm to solve the LFM problem. We apply the LFM framework to neural architecture search on CIFAR-10, CIFAR-100, and Imagenet. Experimental results strongly demonstrate the effectiveness of our model.
more » « less
Full Text Available

« Prev Next »