Creators/Authors contains: "Chen, Xinyu"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. This study addresses the problem of convolutional kernel learning in univariate, multivariate, and multidimensional time series data, which is crucial for interpreting temporal patterns in time series and for supporting downstream machine learning tasks. First, we propose formulating convolutional kernel learning for univariate time series as a sparse regression problem with a non-negative constraint, leveraging the properties of circular convolution and circulant matrices. Second, to generalize this approach to multivariate and multidimensional time series data, we use tensor computations, reformulating the convolutional kernel learning problem in tensor form; this is then converted into a standard sparse regression problem through vectorization and tensor unfolding operations. In the proposed methodology, the optimization problem is solved with an existing non-negative subspace pursuit method, enabling the convolutional kernel to capture temporal correlations and patterns. To evaluate the proposed model, we apply it to several real-world time series datasets. On the multidimensional ridesharing and taxi trip data from New York City and Chicago, the learned convolutional kernels reveal interpretable local correlations and cyclical patterns, such as weekly seasonality. For the monthly temperature time series data in North America, the proposed model quantifies the yearly seasonality and makes it comparable across decades. For multidimensional fluid flow data, both local and nonlocal correlations captured by the convolutional kernels reinforce tensor factorization, improving performance on fluid flow reconstruction tasks. This study thus lays a foundation for automatically learning convolutional kernels from time series data, with an emphasis on interpretability through sparsity and non-negativity constraints. A minimal code sketch of the circulant-matrix formulation appears after this item.
    Free, publicly-accessible full text available June 1, 2026
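    The following is a minimal, illustrative sketch of the circulant-matrix formulation described above for a univariate series: it builds the circulant matrix C(x), then greedily selects a small support and refits coefficients with non-negative least squares. The greedy loop is a simplified stand-in for the non-negative subspace pursuit method the study actually uses; the names (build_circulant, learn_kernel) and the sparsity level tau are illustrative assumptions, not the paper's code.

```python
# Sparse, non-negative convolutional kernel learning for a univariate series
# via circulant-matrix regression (simplified greedy sketch, not the paper's
# non-negative subspace pursuit implementation).
import numpy as np
from scipy.optimize import nnls

def build_circulant(x):
    """Circulant matrix C(x) such that C(x) @ theta is the circular
    convolution of x with the kernel theta."""
    T = len(x)
    return np.column_stack([np.roll(x, k) for k in range(T)])

def learn_kernel(x, tau=3):
    """Greedy non-negative sparse regression: pick up to tau lags whose
    columns best explain x, refitting coefficients with NNLS each round."""
    C = build_circulant(x)
    support, residual = [], x.copy()
    for _ in range(tau):
        scores = C.T @ residual
        scores[support] = -np.inf
        scores[0] = -np.inf            # skip the trivial zero-lag column (x itself)
        support.append(int(np.argmax(scores)))
        coef, _ = nnls(C[:, support], x)
        residual = x - C[:, support] @ coef
    theta = np.zeros(len(x))
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)
t = np.arange(168)                      # one week of hourly data
x = 2 + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(168)
theta = learn_kernel(x, tau=2)
print(np.nonzero(theta)[0])             # dominant lags, e.g. 24 (daily cycle)
```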
  2. Spatiotemporal systems are ubiquitous across scientific areas, representing underlying knowledge and patterns in the data. A fundamental question is how to understand and characterize these systems within a data-driven machine learning framework. In this work, we introduce an unsupervised pattern discovery framework, namely dynamic autoregressive tensor factorization. Our framework builds on the observation that spatiotemporal systems can be well described by time-varying autoregression on multivariate or even multidimensional data. In the modeling process, tensor factorization is seamlessly integrated into the time-varying autoregression to discover spatial and temporal modes/patterns of the spatiotemporal systems, with the spatial factor matrix assumed to be orthogonal. To evaluate the framework, we apply it to several real-world spatiotemporal datasets, including fluid flow dynamics, international import/export merchandise trade, and urban human mobility. On the international trade dataset with dimensions {country/region, product type, year}, our framework produces interpretable import/export patterns of countries/regions, while the low-dimensional product patterns are also important for classifying import/export merchandise and understanding systematic differences between import and export. On the ridesharing mobility dataset with dimensions {origin, destination, time}, our framework helps identify how the spatial patterns of urban human mobility shifted between 2019 and 2022. Empirical experiments demonstrate that our framework can discover interpretable and meaningful patterns from spatiotemporal systems that are both time-varying and multidimensional. A simplified code sketch of the underlying idea appears after this item.
    Free, publicly-accessible full text available June 4, 2026
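    Below is a heavily simplified sketch of the idea behind this framework: an orthogonal spatial factor (obtained here by truncated SVD rather than the paper's joint tensor factorization) compresses the snapshots, and a time-varying AR(1) model is fitted window by window in the reduced coordinates. The synthetic data, rank R, and window length w are assumptions for illustration only.

```python
# Two-stage illustration of "orthogonal spatial modes + time-varying
# autoregression"; the paper solves these jointly via tensor factorization.
import numpy as np

rng = np.random.default_rng(1)
N, T, R, w = 30, 200, 3, 20             # sensors, time steps, rank, window

# Synthetic spatiotemporal data: a few spatial modes with slow oscillations.
modes = rng.standard_normal((N, R))
coeffs = np.array([np.cos(2 * np.pi * np.arange(T) * f) for f in (0.01, 0.03, 0.05)])
X = modes @ coeffs + 0.05 * rng.standard_normal((N, T))

# Orthogonal spatial factor W (columns = spatial modes).
W, _, _ = np.linalg.svd(X, full_matrices=False)
W = W[:, :R]
Z = W.T @ X                             # temporal dynamics in mode space

# Time-varying AR(1) in mode space: for each window, solve Z2 ~= A_t Z1.
A_t = []
for s in range(0, T - w, w):
    Z1, Z2 = Z[:, s:s + w - 1], Z[:, s + 1:s + w]
    A_t.append(Z2 @ np.linalg.pinv(Z1))
print(len(A_t), A_t[0].shape)           # one R x R transition matrix per window
```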
  3. Free, publicly-accessible full text available May 6, 2026
  4. Spatiotemporal traffic data imputation is of great significance in intelligent transportation systems and data-driven decision-making processes. To perform efficient learning and accurate reconstruction from partially observed traffic data, we assert the importance of characterizing both global and local trends in time series. In the literature, substantial work has demonstrated the effectiveness of exploiting the low-rank property of traffic data through matrix/tensor completion models. In this study, we first introduce a Laplacian kernel for temporal regularization to characterize local trends in traffic time series, which can be formulated as a circular convolution. We then develop a low-rank Laplacian convolutional representation (LCR) model that combines the circulant matrix nuclear norm with the Laplacian kernelized temporal regularization, and we show that it fits a unified framework admitting a fast Fourier transform (FFT) solution with log-linear time complexity. Through extensive experiments on several traffic datasets, we demonstrate the superiority of LCR over several baseline models for imputing traffic time series with various behaviors (e.g., data noise and strong/weak periodicity) and for reconstructing sparse speed fields of vehicular traffic flow. The proposed LCR model is also more efficient for large-scale traffic data imputation than existing imputation models. A short sketch of the FFT identity behind this solution appears after this item.
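    The sketch below illustrates the FFT identity that makes the LCR solution log-linear: circular convolution with a Laplacian kernel equals elementwise multiplication in the Fourier domain, so the Laplacian regularizer can be evaluated (and, in the full model, minimized) in O(T log T). The kernel here uses one neighbor on each side, and the data are toy values.

```python
# Circulant matrices are diagonalized by the DFT, so the Laplacian
# temporal regularizer of LCR can be computed via FFT instead of an
# explicit T x T matrix product.
import numpy as np

T = 8
x = np.arange(T, dtype=float)

# First-order Laplacian kernel: center weight 2, one -1 neighbor on each
# side (stored circularly, so the "left" neighbor sits at the last index).
ell = np.zeros(T)
ell[0], ell[1], ell[-1] = 2.0, -1.0, -1.0

# Direct circular convolution via an explicit circulant matrix ...
C = np.column_stack([np.roll(ell, k) for k in range(T)])
direct = C @ x

# ... equals elementwise multiplication in the Fourier domain.
via_fft = np.fft.ifft(np.fft.fft(ell) * np.fft.fft(x)).real
assert np.allclose(direct, via_fft)

# The Laplacian regularization term ||ell * x||_2^2 in O(T log T).
print(np.linalg.norm(via_fft) ** 2)
```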
  5. While both the database and high-performance computing (HPC) communities use lossless compression to minimize floating-point data size, a disconnect persists between them: each community designs and assesses methods in a domain-specific manner, making it unclear whether HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases such as key-value stores and queried using in-memory retrieval paradigms. This trend underscores the need for a collective study of these compression methods' strengths and limitations, based not only on their compression performance across domains but also on their runtime characteristics. Our study extensively evaluates eight CPU-based and five GPU-based compression methods developed by both communities, using 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). Additionally, we use the roofline model to profile their runtime bottlenecks. Our goal is to offer insights that can assist researchers in selecting existing methods or developing new ones for integrated database and HPC applications. A toy roofline calculation of the kind used for this profiling appears after this item.
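    As a rough illustration of the roofline profiling mentioned above, the snippet below computes a kernel's arithmetic intensity and classifies it as memory- or compute-bound. The peak FLOP rate and bandwidth are made-up placeholders, not figures from the study.

```python
# Toy roofline check: attainable performance is capped by either peak
# compute or peak memory bandwidth times arithmetic intensity.
peak_flops = 3.0e12        # hypothetical peak compute, FLOP/s
peak_bw = 200.0e9          # hypothetical memory bandwidth, bytes/s

def roofline(flops, bytes_moved):
    """Arithmetic intensity (FLOP/byte), attainable FLOP/s, and the
    resource that bounds the kernel."""
    intensity = flops / bytes_moved
    attainable = min(peak_flops, peak_bw * intensity)
    bound = "compute" if peak_bw * intensity >= peak_flops else "memory"
    return intensity, attainable, bound

# e.g., a compression kernel doing 0.5 FLOP per byte touched:
print(roofline(flops=5.0e8, bytes_moved=1.0e9))   # memory-bound
```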
  6. Chronic myeloid leukemia (CML) is treated with tyrosine kinase inhibitors (TKIs) that target the pathological BCR-ABL1 fusion oncogene. The objective of this statistical meta-analysis was to assess the prevalence of other hematological adverse events (AEs) that occur during or after predominantly first-line treatment with TKIs. Data from seventy peer-reviewed, published studies were included in the analysis. Hematological AEs were assessed as a function of TKI drug type (dasatinib, imatinib, bosutinib, nilotinib) and CML phase (chronic, accelerated, blast). AE prevalence aggregated across all severities and phases differed significantly between TKIs (p < 0.05): for anemia, dasatinib (54.5%), bosutinib (44.0%), imatinib (32.8%), nilotinib (11.2%); for neutropenia, dasatinib (51.2%), imatinib (29.8%), bosutinib (14.1%), nilotinib (14.1%); for thrombocytopenia, dasatinib (62.2%), imatinib (30.4%), bosutinib (35.3%), nilotinib (22.3%). AE prevalence aggregated across all severities and TKIs differed significantly (p < 0.05) between CML phases: for anemia, chronic (28.4%), accelerated (66.9%), blast (55.8%); for neutropenia, chronic (26.7%), accelerated (63.8%), blast (36.4%); for thrombocytopenia, chronic (33.3%), accelerated (65.6%), blast (37.9%). Odds ratios (ORs) with 95% confidence intervals were used to compare the hematological AE prevalence of each TKI against the most common first-line TKI therapy, imatinib. For anemia: dasatinib OR = 1.65 [1.51, 1.83]; bosutinib OR = 1.34 [1.16, 1.54]; nilotinib OR = 0.34 [0.30, 0.39]. For neutropenia: dasatinib OR = 1.72 [1.53, 1.92]; bosutinib OR = 0.47 [0.38, 0.58]; nilotinib OR = 0.47 [0.42, 0.54]. For thrombocytopenia: dasatinib OR = 2.04 [1.82, 2.30]; bosutinib OR = 1.16 [0.97, 1.39]; nilotinib OR = 0.73 [0.65, 0.82]. Nilotinib had the greatest fraction of severe (grade 3/4) hematological AEs (30%). In conclusion, the overall prevalence of hematological AEs by TKI type was: dasatinib > bosutinib > imatinib > nilotinib. Study limitations include the inability to normalize for dosage and treatment duration. A sketch of the standard odds-ratio calculation appears after this item.
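    For reference, the sketch below shows the standard log-odds computation of an odds ratio and its 95% confidence interval from a 2x2 table. The counts are hypothetical, not the study's pooled data, so the output does not reproduce the reported ORs.

```python
# Odds ratio with 95% CI from a 2x2 table via the log-odds formula.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a/b: events/non-events on the comparison TKI; c/d: on imatinib."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)

print(odds_ratio_ci(545, 455, 328, 672))   # hypothetical counts per 1000
```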
  7. Influence maximization aims to select the k most influential vertices, or seeds, in a network, where influence is defined by a given diffusion process. Although computing the optimal seed set is NP-hard, efficient approximation algorithms exist. However, even state-of-the-art parallel implementations are limited by a sampling step that incurs a large memory footprint, which in turn limits the reachable problem size and the approximation quality. In this work, we study the memory footprint of the sampling process that collects reverse reachability information in the IMM (Influence Maximization via Martingales) algorithm over large real-world social networks. We present a memory-efficient optimization approach, called HBMax, based on Ripples, a state-of-the-art multi-threaded parallel influence maximization solution. HBMax uses a portion of the reverse reachable (RR) sets collected by the algorithm to learn the characteristics of the graph. It then compresses the intermediate reverse reachability information with Huffman coding or bitmap coding, and queries the partially decoded data, or the compressed data directly, to preserve the memory savings obtained through compression. On a NUMA architecture, we scale our solution to 64 CPU cores and reduce the memory footprint by up to 82.1% with an average 6.3% speedup (the encoding overhead is offset by the performance gain from the memory reduction) without loss of accuracy. For the largest tested graph, Twitter7 (1.4 billion edges), HBMax achieves a 5.9× compression ratio and a 2.2× speedup. A minimal sketch of the bitmap-coding idea appears after this item.
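    The snippet below sketches the bitmap-coding idea only: each RR set is packed into a bit vector so that membership queries run directly on the compressed form, without decoding. Vertex counts and sets are toy values; HBMax's per-set choice between Huffman and bitmap coding, and its parallel query path, are not modeled here.

```python
# Bitmap coding of an RR set, with membership queries answered directly
# on the compressed bytes (no decoding step).
import numpy as np

def encode_rr(rr_set, n_vertices):
    """Pack a set of vertex ids into a bitmap (1 bit per vertex)."""
    bits = np.zeros(n_vertices, dtype=bool)
    bits[list(rr_set)] = True
    return np.packbits(bits)              # MSB-first, padded to full bytes

def covers(bitmap, vertex):
    """Query the compressed representation without decoding it."""
    return bool(bitmap[vertex // 8] & (0x80 >> (vertex % 8)))

n = 20
rr = {2, 5, 13, 19}
bm = encode_rr(rr, n)
print([v for v in range(n) if covers(bm, v)])   # [2, 5, 13, 19]
```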