- NSF-PAR ID:
- 10194010
- Date Published:
- Journal Name:
- ISPRS international journal of geoinformation
- Volume:
- 8
- Issue:
- 11
- ISSN:
- 2220-9964
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
null (Ed.)Sea ice acts as both an indicator and an amplifier of climate change. High spatial resolution (HSR) imagery is an important data source in Arctic sea ice research for extracting sea ice physical parameters, and calibrating/validating climate models. HSR images are difficult to process and manage due to their large data volume, heterogeneous data sources, and complex spatiotemporal distributions. In this paper, an Arctic Cyberinfrastructure (ArcCI) module is developed that allows a reliable and efficient on-demand image batch processing on the web. For this module, available associated datasets are collected and presented through an open data portal. The ArcCI module offers an architecture based on cloud computing and big data components for HSR sea ice images, including functionalities of (1) data acquisition through File Transfer Protocol (FTP) transfer, front-end uploading, and physical transfer; (2) data storage based on Hadoop distributed file system and matured operational relational database; (3) distributed image processing including object-based image classification and parameter extraction of sea ice features; (4) 3D visualization of dynamic spatiotemporal distribution of extracted parameters with flexible statistical charts. Arctic researchers can search and find arctic sea ice HSR image and relevant metadata in the open data portal, obtain extracted ice parameters, and conduct visual analytics interactively. Users with large number of images can leverage the service to process their image in high performance manner on cloud, and manage, analyze results in one place. The ArcCI module will assist domain scientists on investigating polar sea ice, and can be easily transferred to other HSR image processing research projects.more » « less
-
We overview CiteSeerX, the pioneer digital library search engine, that has been serving academic communities for more than 20 years (first released in 1998), from three perspectives. The system perspective summarizes its architecture evolution in three phases over the past 20 years. The data perspective describes how CiteSeerX has created searchable scholarly big datasets and made them freely available for multiple purposes. In order to be scalable and effective, AI technologies are employed in all essential modules. To effectively train these models, a sufficient amount of data has been labeled, which can then be reused for training future models. Finally, we discuss the future of CiteSeerX. Our ongoing work is to make Cite- SeerX more sustainable. To this end, we are working to ingest all open access scholarly papers, estimated to be 30-40 million. Part of the plan is to discover dataset mentions and metadata in scholarly articles and make them more accessible via search interfaces. Users will have more opportunities to explore and trace datasets that can be reused and discover other datasets for new research projects. We summarize what was learned to make a similar system more sustainable and useful.more » « less
-
Abstract Background The proliferation of metagenomic sequencing technologies has enabled novel insights into the functional genomic potentials and taxonomic structure of microbial communities. However, cyberinfrastructure efforts to manage and enable the reproducible analysis of sequence data have not kept pace. Thus, there is increasing recognition of the need to make metagenomic data discoverable within machine-searchable frameworks compliant with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship. Although a variety of metagenomic web services exist, none currently leverage the hierarchically structured terminology encoded within common life science ontologies to programmatically discover data.
Results Here, we integrate large-scale marine metagenomic datasets with community-driven life science ontologies into a novel FAIR web service. This approach enables the retrieval of data discovered by intersecting the knowledge represented within ontologies against the functional genomic potential and taxonomic structure computed from marine sequencing data. Our findings highlight various microbial functional and taxonomic patterns relevant to the ecology of prokaryotes in various aquatic environments.
Conclusions In this work, we present and evaluate a novel Semantic Web architecture that can be used to ask novel biological questions of existing marine metagenomic datasets. Finally, the FAIR ontology searchable data products provided by our API can be leveraged by future research efforts.
-
Abstract Studying past climate variability is fundamental to our understanding of current changes. In the era of Big Data, the value of paleoclimate information critically depends on our ability to analyze large volume of data, which itself hinges on standardization. Standardization also ensures that these datasets are more Findable, Accessible, Interoperable, and Reusable. Building upon efforts from the paleoclimate community to standardize the format, terminology, and reporting of paleoclimate data, this article describes PaleoRec, a recommender system for the annotation of such datasets. The goal is to assist scientists in the annotation task by reducing and ranking relevant entries in a drop-down menu. Scientists can either choose the best option for their metadata or enter the appropriate information manually. PaleoRec aims to reduce the time to science while ensuring adherence to community standards. PaleoRec is a type of sequential recommender system based on a recurrent neural network that takes into consideration the short-term interest of a user in a particular dataset. The model was developed using 1996 expert-annotated datasets, resulting in 6,512 sequences. The performance of the algorithm, as measured by the Hit Ratio, varies between 0.7 and 1.0. PaleoRec is currently deployed on a web interface used for the annotation of paleoclimate datasets using emerging community standards.more » « less
-
null (Ed.)ncreasingly, large collections of datasets are made available to the public via the Web, ranging from government-curated datasets like those of data.gov to communally-sourced datasets such as Wikipedia tables. It has become clear that traditional search techniques are insufficient for such sources, especially when the user is unfamiliar with the terminology used by the creators of the relevant datasets. We propose to address this problem by elevating the datum to a first-class object that is indexed, thereby making it less dependent on how a dataset is structured. In a data table, a cell contains a value for a particular row as described by a particular column. In our cell-centric indexing approach, we index the metadata of each cell, so that information about its dataset and column simply become metadata rather than constraining concepts. In this paper we define cell-centric indexing and present a system architecture that supports its use in exploring datasets. We describe how cell-centric indexing can be implemented using traditional information retrieval technology and evaluate the scalability of the architecture.more » « less