Online Anomaly Detection (OAD) is critical for identifying rare yet important data points in large, dynamic, and complex data streams. A key challenge lies in detecting anomalies accurately and consistently while maintaining computational and memory efficiency. Conventional OAD approaches, which depend on distributional deviations and static thresholds, struggle with model update delays and catastrophic forgetting, leading to missed detections and high false positive rates. To address these limitations, we propose a novel Streaming Anomaly Detection (SAD) method grounded in a sparse active online learning framework. Our approach uniquely integrates ℓ1,2-norm sparse online learning with CUR decomposition-based active learning, enabling simultaneous fast feature selection and dynamic instance selection. The efficient CUR decomposition further supports real-time residual analysis for anomaly scoring, eliminating the need for manually set thresholds tied to the temporal data distribution. Extensive experiments on diverse streaming datasets demonstrate SAD's superiority, achieving a 14.06% reduction in detection error rates compared to five state-of-the-art competitors.
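As a rough illustration of the residual-scoring idea, the following Python sketch approximates a sliding window of the stream with a CUR decomposition and treats per-row reconstruction residuals as anomaly scores. The function name, window size, rank parameter k, and leverage-score selection rule are illustrative assumptions, not the authors' SAD implementation.

```python
# A minimal sketch of CUR-based residual scoring over a streaming window (assumed
# interface; not the SAD method itself).
import numpy as np

def cur_residual_scores(X, k=8):
    """Approximate X ~ C @ U @ R and return per-row residual norms as anomaly scores."""
    # Leverage scores from a truncated SVD guide column (feature) and row (instance) selection.
    U_svd, _, Vt = np.linalg.svd(X, full_matrices=False)
    col_idx = np.argsort(np.sum(Vt[:k] ** 2, axis=0))[-k:]      # top-k informative features
    row_idx = np.argsort(np.sum(U_svd[:, :k] ** 2, axis=1))[-k:]  # top-k representative instances

    C = X[:, col_idx]                       # selected columns
    R = X[row_idx, :]                       # selected rows
    U = np.linalg.pinv(C[row_idx, :])       # pseudo-inverse of the intersection block
    residual = X - C @ U @ R                # CUR reconstruction error
    return np.linalg.norm(residual, axis=1)  # larger residual => more anomalous

# Usage: score one window of the stream and flag the largest residuals.
window = np.random.randn(200, 30)
scores = cur_residual_scores(window, k=8)
flagged = np.argsort(scores)[-5:]           # candidate anomalies in this window
```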
Fisher ratio feature selection by manual peak area calculations on comprehensive two-dimensional gas chromatography data using standard mixtures with variable composition, storage, and interferences
Comprehensive two-dimensional gas chromatography (GC×GC) is becoming increasingly common for non-targeted characterization of complex volatile mixtures. Its higher peak capacity and sensitivity provide additional sample composition information when one-dimensional GC is not adequate. GC×GC generates complex multivariate data sets when non-targeted analysis is used to discover analytes. Fisher ratio (FR) analysis is applied to discern class markers, limiting complex GC×GC profiles to the compounds that best discriminate between classes. While many approaches for feature selection using FR analysis exist, FRs can also be calculated relatively easily directly on peak areas after any native software has performed peak detection. This study evaluated the success rates of manually calculating FRs and comparing them to a critical F-value for samples analyzed by GC×GC with defined concentration differences. Long-term storage of samples and other spiked interferences were also investigated to examine their impact on analyzing mixtures with this FR feature selection strategy. Success rates were generally high, mostly 90-100%, with some instances between 80% and 90%. False positives were rare, and false negatives occurred infrequently. When a compound was selected in error, it was typically due to chromatographic artifacts in the chromatograms rather than the FR approach itself. This work provides foundational experimental data on the use of manual FR calculations for feature selection from GC×GC data.
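To make the manual calculation concrete, the sketch below computes per-peak Fisher ratios from peak-area tables of two sample classes and compares them to a critical F-value. The two-class layout, replicate counts, and 95% significance level are illustrative assumptions, not the study's exact design.

```python
# A minimal sketch of manual Fisher ratio (FR) feature selection on peak areas:
# FR per peak = between-class variance / within-class (pooled) variance.
import numpy as np
from scipy.stats import f as f_dist

def fisher_ratios(areas_a, areas_b):
    """areas_a, areas_b: (replicates x peaks) peak-area tables for two classes."""
    grand = np.vstack([areas_a, areas_b]).mean(axis=0)
    n_a, n_b = len(areas_a), len(areas_b)
    # Between-class mean square (class means vs. grand mean, weighted by class size).
    ss_between = (n_a * (areas_a.mean(axis=0) - grand) ** 2
                  + n_b * (areas_b.mean(axis=0) - grand) ** 2)
    ms_between = ss_between / (2 - 1)                 # k - 1 classes
    # Within-class (pooled) mean square.
    ss_within = (areas_a.var(axis=0, ddof=1) * (n_a - 1)
                 + areas_b.var(axis=0, ddof=1) * (n_b - 1))
    ms_within = ss_within / (n_a + n_b - 2)           # N - k
    return ms_between / ms_within

# Peaks whose FR exceeds the critical F-value are retained as candidate class markers.
areas_a = np.random.rand(4, 50)                       # 4 replicates, 50 detected peaks
areas_b = np.random.rand(4, 50)
fr = fisher_ratios(areas_a, areas_b)
f_crit = f_dist.ppf(0.95, dfn=1, dfd=len(areas_a) + len(areas_b) - 2)
markers = np.where(fr > f_crit)[0]
```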
- Award ID(s): 1752607
- PAR ID: 10405161
- Date Published:
- Journal Name: Analytical and Bioanalytical Chemistry
- ISSN: 1618-2642
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Rationale: Silicone wristbands have emerged as valuable passive samplers for monitoring personal exposure to environmental contaminants in the rapidly developing field of exposomics. Once deployed, silicone wristbands collect and hold a wealth of chemical information that can be interrogated using high-resolution mass spectrometry (HRMS) to provide broad coverage of chemical mixtures. Methods: Gas chromatography coupled to Orbitrap™ mass spectrometry (GC/Orbitrap™ MS) was used to simultaneously perform suspect screening (using an in-house database) and unknown screening (using vendor databases) of extracts from wristbands worn by volunteers. The goal of this study was to optimize a workflow that allows detection of low levels of priority pollutants with high reliability. To this end, a data processing workflow for GC/Orbitrap™ MS was developed using a mixture of 123 environmentally relevant standards, consisting of pesticides, flame retardants, organophosphate esters, and polycyclic aromatic hydrocarbons, as test compounds. Results: The optimized unknown screening workflow, using a search index threshold of 750, resulted in positive identification of 70 analytes in validation samples and a reduction in the number of false positives by over 50%. An average of 26 compounds with high-confidence identification (7 level 1 and 19 level 2 compounds) were observed in worn wristbands. The data were further analyzed via suspect screening and retrospective suspect screening to identify an additional 36 compounds. Conclusions: This study provides three important findings: (1) clear evidence of the importance of sample cleanup in addressing complex sample matrices for unknown analysis, (2) a valuable workflow for the identification of unknown contaminants in silicone wristband samplers using electron ionization HRMS data, and (3) a novel application of GC/Orbitrap™ MS for the unknown analysis of organic contaminants that can be used in exposomics studies.
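As a rough illustration of the score-threshold step in such an unknown-screening workflow, the sketch below keeps only library hits whose search-index score meets the 750 cutoff and retains the best match per compound. The hit structure and field names are hypothetical, not the vendor software's data model.

```python
# A minimal sketch of filtering library-search hits by a search-index cutoff
# (assumed data layout; illustrative only).
SEARCH_INDEX_CUTOFF = 750

def filter_hits(hits, cutoff=SEARCH_INDEX_CUTOFF):
    """hits: list of dicts like {'compound': str, 'search_index': float}."""
    best = {}
    for h in hits:
        if h["search_index"] < cutoff:
            continue                                   # discard low-confidence matches
        name = h["compound"]
        # Keep the highest-scoring match per compound to reduce duplicate identifications.
        if name not in best or h["search_index"] > best[name]["search_index"]:
            best[name] = h
    return list(best.values())

example = [
    {"compound": "pyrene", "search_index": 812.0},
    {"compound": "pyrene", "search_index": 640.0},     # dropped: below cutoff
    {"compound": "tris(2-chloroethyl) phosphate", "search_index": 768.0},
]
print(filter_hits(example))
```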
Phages used for phage therapy of multidrug-resistant bacteria must be highly purified prior to use. There are limited purification approaches that are broadly applicable to many phage types. Electrokinetics has shown great potential to manipulate phages, but obstructions from the cell debris produced during phage propagation can severely diminish the capacity of an electrokinetic device to concentrate and purify phage samples. A multipart insulator-based electrokinetic device is proposed here to remove the larger, undesirable components of mixtures from phage preparations while transferring the freshly purified and concentrated sample to a second stage for downstream analysis. By combining the large-debris prescreen and analysis stages in a streamlined system, this approach simultaneously reduces the impact of clogging and minimizes the sample loss observed during manual transfer of purified samples. Polystyrene particles were used to demonstrate a diminished sample loss of approximately one order of magnitude when using the cascade device as opposed to a manual transfer scheme. The purification and concentration of three different phage samples were demonstrated using the first stage of the cascade device as a prescreen. This design provides a simple method of purifying and concentrating valuable samples from a complex mixture that might impede separation capacity in a single channel.
Purpose: This study aims to develop and validate a multi-view learning method that combines primary tumor radiomics and lymph node (LN) radiomics for the preoperative prediction of LN status in gastric cancer (GC). Methods: A total of 170 contrast-enhanced abdominal CT images from GC patients were enrolled in this retrospective study. After data preprocessing, a two-step feature selection approach, Pearson correlation analysis followed by a supervised feature selection method based on test-time budget (FSBudget), was performed to remove redundancy from the tumor and LN radiomics features, respectively. Discriminative features of the two views were then learned by unsupervised multi-view partial least squares (UMvPLS), yielding a latent common space on which a logistic regression classifier was trained. Five repeated random hold-out experiments were employed. Results: On the 20-dimensional latent common space, the area under the receiver operating characteristic curve (AUC), precision, accuracy, recall, and F1-score were 0.9531 ± 0.0183, 0.9260 ± 0.0184, 0.9136 ± 0.0174, 0.9468 ± 0.0106, and 0.9362 ± 0.0125 for the training cohort, and 0.8984 ± 0.0536, 0.8671 ± 0.0489, 0.8500 ± 0.0599, 0.9118 ± 0.0550, and 0.8882 ± 0.0440 for the validation cohort (reported as mean ± standard deviation). The model shows better discrimination capability than single-view methods, our previous method, and eight baseline methods. When the dimension was reduced to 2, the model not only retained effective prediction performance but was also convenient for data visualization. Conclusions: Our proposed method, which integrates radiomics features of the primary tumor and LNs, can help predict lymph node metastasis in GC patients. It shows that multi-view learning has great potential for guiding prognosis and treatment decision-making in GC.
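A rough sketch of this kind of two-view pipeline is given below: a Pearson-correlation redundancy filter per view, a shared latent projection, and a logistic regression classifier evaluated with repeated random hold-out. scikit-learn's PLSCanonical is used here only as a stand-in for the paper's UMvPLS, and the correlation cutoff, synthetic data, and component count are assumptions.

```python
# A minimal sketch of a two-view radiomics pipeline (stand-in components; not the
# authors' UMvPLS/FSBudget implementation).
import numpy as np
from sklearn.cross_decomposition import PLSCanonical
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def drop_correlated(X, thresh=0.9):
    """Greedily drop features whose |Pearson correlation| with an already-kept feature exceeds thresh."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < thresh for k in keep):
            keep.append(j)
    return X[:, keep]

# Synthetic stand-ins for the tumor-view and LN-view radiomics tables and LN-status labels.
rng = np.random.default_rng(0)
X_tumor, X_ln = rng.normal(size=(170, 100)), rng.normal(size=(170, 80))
y = rng.integers(0, 2, size=170)

X_tumor, X_ln = drop_correlated(X_tumor), drop_correlated(X_ln)

aucs = []
for seed in range(5):                                  # five repeated random hold-outs
    idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.3,
                                      stratify=y, random_state=seed)
    pls = PLSCanonical(n_components=20).fit(X_tumor[idx_tr], X_ln[idx_tr])
    Zt_tr, Zl_tr = pls.transform(X_tumor[idx_tr], X_ln[idx_tr])   # latent scores, both views
    Zt_te, Zl_te = pls.transform(X_tumor[idx_te], X_ln[idx_te])
    clf = LogisticRegression(max_iter=1000).fit(np.hstack([Zt_tr, Zl_tr]), y[idx_tr])
    aucs.append(roc_auc_score(y[idx_te], clf.predict_proba(np.hstack([Zt_te, Zl_te]))[:, 1]))
print(f"AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```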
Identifying the directed connectivity that underlies networked activity between different cortical areas is critical for understanding the neural mechanisms behind sensory processing. Granger causality (GC) is widely used for this purpose in functional magnetic resonance imaging analysis, but its temporal resolution is low, making it difficult to capture the millisecond-scale interactions underlying sensory processing. Magnetoencephalography (MEG) has millisecond resolution, but only provides low-dimensional sensor-level linear mixtures of neural sources, which makes GC inference challenging. Conventional methods proceed in two stages: first, cortical sources are estimated from MEG using a source localization technique, followed by GC inference among the estimated sources. However, the spatiotemporal biases in estimating the sources propagate into the subsequent GC analysis stage, which may result in both false alarms and missed true GC links. Here, we introduce the Network Localized Granger Causality (NLGC) inference paradigm, which models the source dynamics as latent sparse multivariate autoregressive processes, estimates their parameters directly from the MEG measurements with source localization integrated, and employs the resulting parameter estimates to produce a precise statistical characterization of the detected GC links. We offer several theoretical and algorithmic innovations within NLGC and further examine its utility via comprehensive simulations and application to MEG data from an auditory task involving tone processing in both younger and older participants. Our simulation studies reveal that NLGC is markedly robust with respect to model mismatch, network size, and low signal-to-noise ratio, whereas the conventional two-stage methods result in high false alarm and mis-detection rates. We also demonstrate the advantages of NLGC in revealing the cortical network-level characterization of neural activity during tone processing and resting state by delineating task- and age-related connectivity changes.
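For context, the sketch below implements the conventional pairwise GC test that a two-stage pipeline would apply to already-estimated source time series: compare the residual variance of a "full" autoregressive fit (own lags plus the candidate source's lags) against a "reduced" fit (own lags only). It is not the NLGC estimator, and the lag order p and toy data are assumptions.

```python
# A minimal sketch of a conventional pairwise Granger-causality statistic
# (illustrative baseline; not NLGC).
import numpy as np

def lagged_design(series, p):
    """Stack lags 1..p of each input series into a regression design matrix."""
    T = len(series[0])
    cols = [s[p - k - 1:T - k - 1] for s in series for k in range(p)]
    return np.column_stack(cols)

def gc_statistic(x, y, p=5):
    """Log residual-variance ratio for the hypothesis 'y Granger-causes x'."""
    target = x[p:]
    X_red = lagged_design([x], p)                      # reduced model: x's own history
    X_full = lagged_design([x, y], p)                  # full model: x's and y's history
    var_red = np.var(target - X_red @ np.linalg.lstsq(X_red, target, rcond=None)[0])
    var_full = np.var(target - X_full @ np.linalg.lstsq(X_full, target, rcond=None)[0])
    return np.log(var_red / var_full)                  # larger => stronger directed link

# Toy check: y drives x with a one-sample delay, so GC(y -> x) should exceed GC(x -> y).
rng = np.random.default_rng(1)
y = rng.normal(size=2000)
x = 0.8 * np.roll(y, 1) + 0.2 * rng.normal(size=2000)
print(gc_statistic(x, y), gc_statistic(y, x))
```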