

Search for: All records

Creators/Authors contains: "Li, Xinchuan"


  1. Geochemical data from ancient marine sediments are crucial for studying palaeo-environments, palaeo-climates, and elemental cycles. With increased accessibility to geochemical data, many databases have emerged. However, there remains a need for a more comprehensive database that focuses on deep-time marine sediment records. Here, we introduce the Deep-Time Marine Sedimentary Element Database (DM-SED). The DM-SED has been built upon the Sedimentary Geochemistry and Paleoenvironments Project (SGP) database with a new compilation of 34 874 data entries from 433 studies, totalling 63 627 entries. The DM-SED contains 2 522 255 discrete marine sedimentary data points, including major and trace elements and some stable isotopes. It includes 9207 entries from the Precambrian and 54 420 entries from the Phanerozoic, thus providing significant references for reconstructing deep-time Earth system evolution. The data files described in this paper are available at https://doi.org/10.5281/zenodo.14771859 (Lai et al., 2025). 
    Free, publicly-accessible full text available January 1, 2026
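The Precambrian/Phanerozoic split of entries described above can be sketched with a few lines of Python. The record layout and field names here are illustrative assumptions, not the real DM-SED schema (which is documented in the Zenodo archive); the 538.8 Ma boundary is the base of the Cambrian on the current ICS chart.

```python
# Hypothetical record layout; the actual DM-SED schema is in the Zenodo archive.
records = [
    {"sample": "S1", "age_ma": 2500.0, "element": "Mo", "value_ppm": 1.2},
    {"sample": "S2", "age_ma": 450.0,  "element": "Mo", "value_ppm": 25.0},
    {"sample": "S3", "age_ma": 90.0,   "element": "U",  "value_ppm": 3.1},
]

PHANEROZOIC_BASE_MA = 538.8  # base of the Cambrian (ICS chart)

# Partition entries at the Precambrian-Phanerozoic boundary.
precambrian = [r for r in records if r["age_ma"] > PHANEROZOIC_BASE_MA]
phanerozoic = [r for r in records if r["age_ma"] <= PHANEROZOIC_BASE_MA]
```

The same filter-by-age pattern extends to any geologic interval once the database's age column is identified.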
  2. Deep learning is an important technique for extracting value from big data. However, its effectiveness depends on large volumes of high-quality training data. In many cases, the training set is too small to train a deep learning classifier effectively. Data augmentation is a widely adopted approach for increasing the amount of training data, but the quality of the augmented data may be questionable. A systematic evaluation of training data is therefore critical. Furthermore, if the training data are noisy, it is necessary to separate out the noisy data automatically. In this paper, we propose a deep learning classifier for automatically separating good training data from noisy data. To train this classifier effectively, the original training data must first be transformed to suit its input format. We also investigate different data augmentation approaches for generating a sufficient volume of training data from a limited original training set. We evaluate the quality of the training data through cross-validation of classification accuracy with different classification algorithms, check the pattern of each data item, and compare the distributions of datasets. We demonstrate the effectiveness of the proposed approach through an experimental investigation of automated classification of massive biomedical images. Our approach is generic and easily adaptable to other big data domains. 
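The pipeline the abstract describes (augment a small training set, then score the resulting data by cross-validated accuracy) can be illustrated with a minimal stand-in. The paper's classifier is a deep network; here a nearest-centroid classifier and Gaussian-jitter augmentation, both assumptions of this sketch, keep the example self-contained.

```python
import random
import statistics

def augment(samples, n_copies=3, noise=0.05, rng=None):
    """Augment (features, label) pairs by adding small Gaussian jitter."""
    rng = rng or random.Random(0)
    out = list(samples)
    for x, label in samples:
        for _ in range(n_copies):
            out.append(([v + rng.gauss(0, noise) for v in x], label))
    return out

def nearest_centroid_fit(train):
    """Compute one mean feature vector (centroid) per class label."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    return {lab: [statistics.fmean(col) for col in zip(*xs)]
            for lab, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def cross_validate(data, k=5):
    """k-fold cross-validation accuracy, used as a quality score for a dataset."""
    rng = random.Random(1)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        centroids = nearest_centroid_fit(train)
        accuracies.append(sum(predict(centroids, x) == y for x, y in test) / len(test))
    return statistics.fmean(accuracies)

# Two well-separated classes; augmentation should preserve high CV accuracy,
# while heavily noisy or mislabeled data would drag the score down.
base = [([0.0, 0.0], "good"), ([0.1, 0.0], "good"),
        ([1.0, 1.0], "noisy"), ([0.9, 1.1], "noisy")]
score = cross_validate(augment(base, n_copies=5), k=4)
```

The cross-validated score acts as the quality signal: comparing it across the original and augmented datasets (or across augmentation schemes) flags augmentations that degrade the data, mirroring the evaluation strategy the abstract outlines.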