

Search for: All records

Creators/Authors contains: "Li, Xinchuan"


  1. Geochemical data from ancient marine sediments are crucial for studying palaeo-environments, palaeo-climates, and elemental cycles. With increased accessibility to geochemical data, many databases have emerged. However, there remains a need for a more comprehensive database that focuses on deep-time marine sediment records. Here, we introduce the Deep-Time Marine Sedimentary Element Database (DM-SED). The DM-SED has been built upon the Sedimentary Geochemistry and Paleoenvironments Project (SGP) database with a new compilation of 34 874 data entries from 433 studies, totalling 63 627 entries. The DM-SED contains 2 522 255 discrete marine sedimentary data points, including major and trace elements and some stable isotopes. It includes 9207 entries from the Precambrian and 54 420 entries from the Phanerozoic, thus providing significant references for reconstructing deep-time Earth system evolution. The data files described in this paper are available at https://doi.org/10.5281/zenodo.14771859 (Lai et al., 2025). 
    Free, publicly-accessible full text available January 1, 2026
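The Precambrian/Phanerozoic split of entries described above can be sketched with a few lines of Python. The record layout and field names here are illustrative assumptions, not the real DM-SED schema (which is documented in the Zenodo archive); the 538.8 Ma boundary is the base of the Cambrian on the current ICS chart.

```python
# Hypothetical record layout; the actual DM-SED schema is in the Zenodo archive.
records = [
    {"sample": "S1", "age_ma": 2500.0, "element": "Mo", "value_ppm": 1.2},
    {"sample": "S2", "age_ma": 450.0,  "element": "Mo", "value_ppm": 25.0},
    {"sample": "S3", "age_ma": 90.0,   "element": "U",  "value_ppm": 3.1},
]

PHANEROZOIC_BASE_MA = 538.8  # base of the Cambrian (ICS chart)

# Partition entries at the Precambrian-Phanerozoic boundary.
precambrian = [r for r in records if r["age_ma"] > PHANEROZOIC_BASE_MA]
phanerozoic = [r for r in records if r["age_ma"] <= PHANEROZOIC_BASE_MA]
```

The same filter-by-age pattern extends to any geologic interval once the database's age column is identified.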
  2. Deep learning is an important technique for extracting value from big data. However, its effectiveness depends on large volumes of high-quality training data. In many cases, the training set is too small to train a deep learning classifier effectively. Data augmentation is a widely adopted approach for increasing the amount of training data, but the quality of the augmented data may be questionable. A systematic evaluation of training data is therefore critical. Furthermore, if the training data are noisy, it is necessary to separate out the noisy data automatically. In this paper, we propose a deep learning classifier for automatically separating good training data from noisy data. To train this classifier effectively, the original training data must first be transformed to suit its input format. We also investigate different data augmentation approaches for generating a sufficient volume of training data from a limited original training set. We evaluate the quality of the training data through cross-validation of classification accuracy with different classification algorithms, check the pattern of each data item, and compare the distributions of datasets. We demonstrate the effectiveness of the proposed approach through an experimental investigation of automated classification of massive biomedical images. Our approach is generic and easily adaptable to other big data domains. 
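The pipeline the abstract describes (augment a small training set, then score the resulting data by cross-validated accuracy) can be illustrated with a minimal stand-in. The paper's classifier is a deep network; here a nearest-centroid classifier and Gaussian-jitter augmentation, both assumptions of this sketch, keep the example self-contained.

```python
import random
import statistics

def augment(samples, n_copies=3, noise=0.05, rng=None):
    """Augment (features, label) pairs by adding small Gaussian jitter."""
    rng = rng or random.Random(0)
    out = list(samples)
    for x, label in samples:
        for _ in range(n_copies):
            out.append(([v + rng.gauss(0, noise) for v in x], label))
    return out

def nearest_centroid_fit(train):
    """Compute one mean feature vector (centroid) per class label."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    return {lab: [statistics.fmean(col) for col in zip(*xs)]
            for lab, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def cross_validate(data, k=5):
    """k-fold cross-validation accuracy, used as a quality score for a dataset."""
    rng = random.Random(1)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        centroids = nearest_centroid_fit(train)
        accuracies.append(sum(predict(centroids, x) == y for x, y in test) / len(test))
    return statistics.fmean(accuracies)

# Two well-separated classes; augmentation should preserve high CV accuracy,
# while heavily noisy or mislabeled data would drag the score down.
base = [([0.0, 0.0], "good"), ([0.1, 0.0], "good"),
        ([1.0, 1.0], "noisy"), ([0.9, 1.1], "noisy")]
score = cross_validate(augment(base, n_copies=5), k=4)
```

The cross-validated score acts as the quality signal: comparing it across the original and augmented datasets (or across augmentation schemes) flags augmentations that degrade the data, mirroring the evaluation strategy the abstract outlines.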