Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data

Hayes, Nicole; Merkurjev, Ekaterina; Wei, Guo-Wei

doi:10.1142/S2737416524500479

Data sets with imbalanced class sizes, in which one class is much smaller than the others, occur frequently in many applications, including those with biological foundations such as disease diagnosis and drug discovery. It is therefore important to identify data elements of classes of all sizes, since failing to do so can be costly. Nonetheless, many data classification procedures perform poorly on imbalanced data sets because they often fail to detect elements of underrepresented classes. In this work, we propose the BTDT-MBO algorithm, which combines Merriman–Bence–Osher (MBO) approaches, a bidirectional transformer, distance correlation, and decision threshold adjustments for data classification on highly imbalanced molecular data sets. The proposed technique integrates adjustments to the classification threshold of the MBO algorithm to help deal with class imbalance, and it uses a bidirectional transformer based on an attention mechanism for self-supervised learning. In addition, the model uses distance correlation as the weight function for the similarity graph on which the adjusted MBO algorithm operates. The proposed method is validated on six molecular data sets and compared to other related techniques. The computational experiments show that the proposed technique outperforms competing approaches even when the class imbalance ratio is high.
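
To make the graph-weight idea in the abstract concrete, the following is a minimal sketch of the standard sample distance correlation between two equal-length feature vectors, written in plain Python. It is an illustration only, not the authors' implementation: the function names `distance_correlation` and `similarity_weights` are hypothetical, and using the raw distance correlation directly as the edge weight is an assumption, since the abstract does not say how the value is turned into a weight.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences of v."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        # Convention: a constant vector has zero distance correlation.
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def similarity_weights(features):
    """Assumed graph weights: w[i][j] = distance correlation of rows i and j."""
    m = len(features)
    return [
        [distance_correlation(features[i], features[j]) for j in range(m)]
        for i in range(m)
    ]
```

Under this assumption, each row of `features` would be one molecule's feature vector (for example, a transformer-derived fingerprint), and the resulting symmetric matrix would serve as the similarity graph on which a graph-based classifier operates.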

Award ID(s): 2052983

PAR ID: 10616146

Author(s) / Creator(s): Hayes, Nicole; Merkurjev, Ekaterina; Wei, Guo-Wei

Publisher / Repository: Journal of Computational Biophysics and Chemistry

Date Published: 2024-12-01

Journal Name: Journal of Computational Biophysics and Chemistry

Volume: 23

Issue: 10

ISSN: 2737-4165

Page Range / eLocation ID: 1339 to 1358

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1142/S2737416524500479
