
Creators/Authors contains: "Li, Xiaolin"


  1. Free, publicly-accessible full text available August 2, 2024
  2. Free, publicly-accessible full text available March 1, 2024
  3. Aidong Zhang ; Huzefa Rangwala (Ed.)
    In many scenarios, 1) data streams are generated in real time; 2) labeled data are expensive and only limited labels are available in the beginning; 3) real-world data are not always i.i.d. and drift gradually over time; 4) the storage of historical streams is limited. This learning setting limits the applicability and availability of many Machine Learning (ML) algorithms. We generalize the learning task under this setting as the semi-supervised drifted stream learning with short lookback (SDSL) problem. SDSL imposes two under-addressed challenges on existing methods in semi-supervised learning and continuous learning: 1) robust pseudo-labeling under gradual shifts and 2) anti-forgetting adaptation with short lookback. To tackle these challenges, we propose a principled and generic generation-replay framework to solve SDSL. To achieve robust pseudo-labeling, we develop a novel pseudo-label classification model that leverages supervised knowledge of previously labeled data, unsupervised knowledge of new data, and structure knowledge of invariant label semantics. To achieve adaptive anti-forgetting model replay, we propose to view the anti-forgetting adaptation task as a flat region search problem. We propose a novel minimax game-based replay objective function to solve the flat region search problem and develop an effective optimization solver. Experimental results demonstrate the effectiveness of the proposed method.
  4. Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe that the lack of global question semantics and the poor exploitation of answer position-awareness are the key root causes. In this paper, we propose a neural question generation model with two general modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging an answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on the SQuAD and MARCO datasets. Owing to its generality, our work also improves the existing models significantly.
  7.
    A promising avenue for improving the effectiveness of behavioral-based malware detectors is to leverage two-phase detection mechanisms. An existing problem in two-phase detection is that after the first phase produces a borderline decision, suspicious behaviors are not well contained before the second phase completes. This paper improves CHAMELEON, a framework to realize the uncertain environment. CHAMELEON offers two environments: standard, for software identified as benign by the first phase, and uncertain, for software that received a borderline classification from the first phase. The uncertain environment adds obstacles to software execution through random perturbations applied probabilistically. We introduce a dynamic perturbation threshold that can target malware disproportionately more than benign software. We analyzed the effects of the uncertain environment by manually studying 113 software samples and 100 malware samples, and found that 92% of the malware and 10% of the benign software were disrupted during execution. The results were then corroborated by an extended dataset (5,679 Linux malware samples) on a newer system. Finally, a careful inspection of the benign software crashes revealed some software bugs, highlighting CHAMELEON's potential as a practical complementary antimalware solution.
  8. Data-driven methods have attracted increasing attention in materials research since the advent of the Materials Genome Initiative. The combination of materials science with computer science, statistics, and data-driven methods aims to expedite materials research and applications and can utilize both new and archived research data. In this paper, we present a data-driven deep learning approach that builds a portion of the structure–property relationship for polymer nanocomposites. Analysis of archived experimental data motivates the development of a computational model that allows demonstration of the approach and gives the flexibility to explore a wide range of structures. Taking advantage of microstructure reconstruction methods and finite element simulations, we first explore qualitative relationships between microstructure descriptors and mechanical properties, resulting in new findings regarding the interplay of interphase, volume fraction, and dispersion. Then we present a novel deep learning approach that combines convolutional neural networks with multi-task learning to build quantitative correlations between microstructures and property values. The performance of the model is compared with other state-of-the-art strategies, including two-point statistics and structure descriptor-based approaches. Lastly, the interpretation of the deep learning model is investigated to show that the model is able to capture physical understanding while learning.
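    The abstract above names two-point statistics as one of the baselines it compares against. For readers unfamiliar with the term, the following is a minimal sketch of the generic textbook two-point autocorrelation for a binary microstructure image, assuming periodic boundaries. It is not the authors' implementation; the function name `two_point_autocorrelation` and the 4x4 sample grid are hypothetical and chosen only for illustration.

```python
from itertools import product


def two_point_autocorrelation(image):
    """Return S2[dy][dx]: the fraction of pixel pairs separated by the
    offset (dy, dx) in which both pixels are filler (value 1).
    Periodic boundaries are assumed (an illustrative choice)."""
    rows, cols = len(image), len(image[0])
    total = rows * cols
    s2 = [[0.0] * cols for _ in range(rows)]
    for dy, dx in product(range(rows), range(cols)):
        hits = sum(
            image[y][x] * image[(y + dy) % rows][(x + dx) % cols]
            for y, x in product(range(rows), range(cols))
        )
        s2[dy][dx] = hits / total
    return s2


# Hypothetical 4x4 binary microstructure: 1 = filler, 0 = matrix.
microstructure = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

s2 = two_point_autocorrelation(microstructure)
volume_fraction = sum(map(sum, microstructure)) / 16
```

    At zero offset the statistic reduces to the filler volume fraction, so `s2[0][0]` equals `volume_fraction` here. How quickly the values fall off with offset reflects how the filler is clustered or dispersed.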