NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Transfer Naïve Bayes Learning using Augmentation and Stacking for SMS Spam Detection

https://doi.org/10.1109/ICKG55886.2022.00042

Ulus, Cihan; Wang, Zhiqiang; Iqbal, Sheikh M.A.; Khan, K.Md.Salman; Zhu, Xingquan (November 2022, 2022 IEEE International Conference on Knowledge Graph (ICKG))

Full Text Available
NTP-Miner: Nonoverlapping Three-Way Sequential Pattern Mining

https://doi.org/10.1145/3480245

Wu, Youxi; Luo, Lanfang; Li, Yan; Guo, Lei; Fournier-Viger, Philippe; Zhu, Xingquan; Wu, Xindong (June 2022, ACM Transactions on Knowledge Discovery from Data)

Nonoverlapping sequential pattern mining is an important type of sequential pattern mining (SPM) with gap constraints, which not only can reveal interesting patterns to users but also can effectively reduce the search space using the Apriori (anti-monotonicity) property. However, the existing algorithms do not focus on attributes of interest to users, meaning that existing methods may discover many frequent patterns that are redundant. To solve this problem, this article proposes a task called nonoverlapping three-way sequential pattern (NTP) mining, where attributes are categorized according to three levels of interest: strong, medium, and weak interest. NTP mining can effectively avoid mining redundant patterns since the NTPs are composed of strong and medium interest items. Moreover, NTPs can avoid serious deviations (the occurrence is significantly different from its pattern) since gap constraints cannot match with strong interest patterns. To mine NTPs, an effective algorithm is put forward, called NTP-Miner, which applies two main steps: support (frequency occurrence) calculation and candidate pattern generation. To calculate the support of an NTP, depth-first and backtracking strategies are adopted, which do not require creating a whole Nettree structure, meaning that many redundant nodes and parent–child relationships do not need to be created. Hence, time and space efficiency is improved. To generate candidate patterns while reducing their number, NTP-Miner employs a pattern join strategy and only mines patterns of strong and medium interest. Experimental results on stock market and protein datasets show that NTP-Miner not only is more efficient than other competitive approaches but can also help users find more valuable patterns. More importantly, NTP mining has achieved better performance than other competitive methods in clustering tasks. Algorithms and data are available at: https://github.com/wuc567/Pattern-Mining/tree/master/NTP-Miner .
more » « less
Full Text Available
Cyberbullying and Cyberviolence Detection: A Triangular User-Activity-Content View

Wang, Shuwen Wang; Zhu, Xingquan; Ding, Weiping; Yengejeh, Amir A. (May 2022, IEEECAA journal of automatica sinica)

Full Text Available
Attraction and Repulsion: Unsupervised Domain Adaptive Graph Contrastive Learning Network

https://doi.org/10.1109/TETCI.2022.3156044

Wu, Man; Pan, Shirui; Zhu, Xingquan (March 2022, IEEE Transactions on Emerging Topics in Computational Intelligence)

Full Text Available
Feature-Attention Graph Convolutional Networks for Noise Resilient Learning

https://doi.org/10.1109/TCYB.2022.3143798

Shi, Min; Tang, Yufei; Zhu, Xingquan; Zhuang, Yuan; Lin, Maohua; Liu, Jianxun (January 2022, IEEE Transactions on Cybernetics)

Noise and inconsistency commonly exist in real-world information networks, due to the inherent error-prone nature of human or user privacy concerns. To date, tremendous efforts have been made to advance feature learning from networks, including the most recent graph convolutional networks (GCNs) or attention GCN, by integrating node content and topology structures. However, all existing methods consider networks as error-free sources and treat feature content in each node as independent and equally important to model node relations. Noisy node content, combined with sparse features, provides essential challenges for existing methods to be used in real-world noisy networks. In this article, we propose feature-based attention GCN (FA-GCN), a feature-attention graph convolution learning framework, to handle networks with noisy and sparse node content. To tackle noise and sparse content in each node, FA-GCN first employs a long short-term memory (LSTM) network to learn dense representation for each node feature. To model interactions between neighboring nodes, a feature-attention mechanism is introduced to allow neighboring nodes to learn and vary feature importance, with respect to their connections. By using a spectral-based graph convolution aggregation process, each node is allowed to concentrate more on the most determining neighborhood features aligned with the corresponding learning task. Experiments and validations, w.r.t. different noise levels, demonstrate that FA-GCN achieves better performance than the state-of-the-art methods in both noise-free and noisy network environments.
more » « less
Full Text Available
Physics-Informed Tensor-Train ConvLSTM for Volumetric Velocity Forecasting of the Loop Current

https://doi.org/10.3389/frai.2021.780271

Huang, Yu; Tang, Yufei; Zhuang, Hanqi; VanZwieten, James; Cherubin, Laurent (December 2021, Frontiers in Artificial Intelligence)

According to the National Academies, a week long forecast of velocity, vertical structure, and duration of the Loop Current (LC) and its eddies at a given location is a critical step toward understanding their effects on the gulf ecosystems as well as toward anticipating and mitigating the outcomes of anthropogenic and natural disasters in the Gulf of Mexico (GoM). However, creating such a forecast has remained a challenging problem since LC behavior is dominated by dynamic processes across multiple time and spatial scales not resolved at once by conventional numerical models. In this paper, building on the foundation of spatiotemporal predictive learning in video prediction, we develop a physics informed deep learning based prediction model called—Physics-informed Tensor-train ConvLSTM (PITT-ConvLSTM)—for forecasting 3D geo-spatiotemporal sequences. Specifically, we propose (1) a novel 4D higher-order recurrent neural network with empirical orthogonal function analysis to capture the hidden uncorrelated patterns of each hierarchy, (2) a convolutional tensor-train decomposition to capture higher-order space-time correlations, and (3) a mechanism that incorporates prior physics from domain experts by informing the learning in latent space. The advantage of our proposed approach is clear: constrained by the law of physics, the prediction model simultaneously learns good representations for frame dependencies (both short-term and long-term high-level dependency) and inter-hierarchical relations within each time frame. Experiments on geo-spatiotemporal data collected from the GoM demonstrate that the PITT-ConvLSTM model can successfully forecast the volumetric velocity of the LC and its eddies for a period greater than 1 week.
more » « less
Full Text Available
Deep learning data augmentation for Raman spectroscopy cancer tissue classification

https://doi.org/10.1038/s41598-021-02687-0

Wu, Man; Wang, Shuwen; Pan, Shirui; Terentis, Andrew C.; Strasswimmer, John; Zhu, Xingquan (December 2021, Scientific Reports)

Abstract Recently, Raman Spectroscopy (RS) was demonstrated to be a non-destructive way of cancer diagnosis, due to the uniqueness of RS measurements in revealing molecular biochemical changes between cancerous vs. normal tissues and cells. In order to design computational approaches for cancer detection, the quality and quantity of tissue samples for RS are important for accurate prediction. In reality, however, obtaining skin cancer samples is difficult and expensive due to privacy and other constraints. With a small number of samples, the training of the classifier is difficult, and often results in overfitting. Therefore, it is important to have more samples to better train classifiers for accurate cancer tissue classification. To overcome these limitations, this paper presents a novel generative adversarial network based skin cancer tissue classification framework. Specifically, we design a data augmentation module that employs a Generative Adversarial Network (GAN) to generate synthetic RS data resembling the training data classes. The original tissue samples and the generated data are concatenated to train classification modules. Experiments on real-world RS data demonstrate that (1) data augmentation can help improve skin cancer tissue classification accuracy, and (2) generative adversarial network can be used to generate reliable synthetic Raman spectroscopic data.
more » « less
Full Text Available
A Deep Learning Model for Forecasting Velocity Structures of the Loop Current System in the Gulf of Mexico

https://doi.org/10.3390/forecast3040056

Muhamed Ali, Ali; Zhuang, Hanqi; VanZwieten, James; Ibrahim, Ali K.; Chérubin, Laurent (December 2021, Forecasting)

Despite the large efforts made by the ocean modeling community, such as the GODAE (Global Ocean Data Assimilation Experiment), which started in 1997 and was renamed as OceanPredict in 2019, the prediction of ocean currents has remained a challenge until the present day—particularly in ocean regions that are characterized by rapid changes in their circulation due to changes in atmospheric forcing or due to the release of available potential energy through the development of instabilities. Ocean numerical models’ useful forecast window is no longer than two days over a given area with the best initialization possible. Predictions quickly diverge from the observational field throughout the water and become unreliable, despite the fact that they can simulate the observed dynamics through other variables such as temperature, salinity and sea surface height. Numerical methods such as harmonic analysis are used to predict both short- and long-term tidal currents with significant accuracy. However, they are limited to the areas where the tide was measured. In this study, a new approach to ocean current prediction based on deep learning is proposed. This method is evaluated on the measured energetic currents of the Gulf of Mexico circulation dominated by the Loop Current (LC) at multiple spatial and temporal scales. The approach taken herein consists of dividing the velocity tensor into planes perpendicular to each of the three Cartesian coordinate system directions. A Long Short-Term Memory Recurrent Neural Network, which is best suited to handling long-term dependencies in the data, was thus used to predict the evolution of the velocity field in each plane, along each of the three directions. The predicted tensors, made of the planes perpendicular to each Cartesian direction, revealed that the model’s prediction skills were best for the flow field in the planes perpendicular to the direction of prediction. Furthermore, the fusion of all three predicted tensors significantly increased the overall skills of the flow prediction over the individual model’s predictions. The useful forecast period of this new model was greater than 4 days with a root mean square error less than 0.05 cm·s−1 and a correlation coefficient of 0.6.
more » « less
Full Text Available
Predictive modeling of clinical trial terminations using feature engineering and embedding learning

https://doi.org/10.1038/s41598-021-82840-x

Elkin, Magdalyn E.; Zhu, Xingquan (December 2021, Scientific Reports)
null (Ed.)
Abstract In this study, we propose to use machine learning to understand terminated clinical trials. Our goal is to answer two fundamental questions: (1) what are common factors/markers associated to terminated clinical trials? and (2) how to accurately predict whether a clinical trial may be terminated or not? The answer to the first question provides effective ways to understand characteristics of terminated trials for stakeholders to better plan their trials; and the answer to the second question can direct estimate the chance of success of a clinical trial in order to minimize costs. By using 311,260 trials to build a testbed with 68,999 samples, we use feature engineering to create 640 features, reflecting clinical trial administration, eligibility, study information, criteria etc. Using feature ranking, a handful of features, such as trial eligibility, trial inclusion/exclusion criteria, sponsor types etc. , are found to be related to the clinical trial termination. By using sampling and ensemble learning, we achieve over 67% Balanced Accuracy and over 0.73 AUC (Area Under the Curve) scores to correctly predict clinical trial termination, indicating that machine learning can help achieve satisfactory prediction results for clinical trial study.
more » « less
Full Text Available
Community and topic modeling for infectious disease clinical trial recommendation

https://doi.org/10.1007/s13721-021-00321-7

Elkin, Magdalyn E.; Zhu, Xingquan (December 2021, Network Modeling Analysis in Health Informatics and Bioinformatics)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records