-
Abstract. The size distribution of submicron particles is essential for understanding their biogeochemical and optical roles, but it has seldom been measured. This study uses a ViewSizer 3000, an instrument that tracks the Brownian motion of particles, to measure particle size distributions (PSDs) from 250 to 1,050 nm in the North Pacific Ocean (NP) and the North Atlantic Ocean (NA) at depths from 5 to 500 m. The concentration of particles varies over one order of magnitude in any given size bin, with greater variations, up to two orders of magnitude, at sizes >600 nm. In both locations, concentrations decrease with depth. Bacterioplankton are a dominant component, accounting for 65%–90% of the submicron particles in surface waters (<100 m) and approximately 30%–40% at depths >150 m at both sites. In the NP, the volume mean diameter increased by approximately 5% from morning to noon at the surface, probably as a result of the diurnal growth of bacterioplankton. In the NA, the concentration and mean size increased by >60% and ∼10%, respectively, after a storm introduced a different particle population into the study area.
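The abstract does not define which mean diameter is used; as a minimal illustration, the sketch below computes the volume-weighted (De Brouckere, D[4,3]) mean diameter from a binned PSD, with entirely synthetic bin values spanning the instrument's 250-1,050 nm range:

```python
import numpy as np

# Hypothetical binned PSD: bin-center diameters (nm) and number
# concentrations (particles/mL); the power-law shape is made up.
diameters = np.linspace(250, 1050, 17)        # bin centers, nm
counts = 1e5 * (diameters / 250.0) ** -4      # synthetic number PSD

# Volume (De Brouckere) mean diameter D[4,3]: particle volume scales
# with d**3, so volume weighting adds one extra power of d on top of
# the number weighting.
d_vm = np.sum(counts * diameters**4) / np.sum(counts * diameters**3)
print(f"volume mean diameter: {d_vm:.1f} nm")
```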
-
We released the open-source software Hadoop-GIS in 2011 and presented and published the work at VLDB 2013. This work initiated the development of a new spatial data analytics ecosystem characterized by large-scale capacity in both computing and data storage, high scalability, compatibility with low-cost commodity processors in clusters, and open-source software. After more than a decade of research and development, this ecosystem has matured and is now serving many applications across various fields. In this paper, we provide the background on why we started this project and give an overview of the original Hadoop-GIS software architecture, along with its unique technical contributions and legacy. We present the evolution of the ecosystem and its current state-of-the-art, which has been influenced by the Hadoop-GIS project. We also describe ongoing efforts to further enhance this ecosystem with hardware acceleration to meet the increasing demands for low latency and high throughput in various spatial data analysis tasks. Finally, we summarize the insights gained and lessons learned over more than a decade of pursuing high-performance spatial data analytics.
-
Fixed-point decimal operations in databases with arbitrary-precision arithmetic refer to the ability to store and operate on decimal fractions with an arbitrary number of digits. This capability has become a requirement for many applications, including scientific databases, financial data processing, geometric data processing, and cryptography. However, state-of-the-art fixed-point decimal technology either provides high performance for low-precision operations or supports arbitrary-precision arithmetic at low performance. In this paper, we present the design and implementation of UltraPrecise, a framework that supports arbitrary-precision arithmetic for databases on GPU, aiming to achieve high performance for arbitrary-precision arithmetic operations. We build our framework on just-in-time compilation and optimize its performance via data representation design, PTX acceleration, and expression scheduling. UltraPrecise achieves performance comparable to other high-performance databases for low-precision arithmetic operations. For high-precision operations, we show that UltraPrecise consistently outperforms existing databases by two orders of magnitude, including on workloads of RSA encryption and trigonometric function approximation.
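The paper's GPU kernels are not reproduced here; as a sketch of the semantics UltraPrecise accelerates, the snippet below uses Python's standard decimal module to carry out fixed-point arithmetic at an arbitrary, user-chosen precision on the CPU. The 200-digit precision and the operands are arbitrary choices for illustration, not values from the paper:

```python
from decimal import Decimal, getcontext

# Arbitrary-precision arithmetic on CPU via the decimal module; this
# only illustrates the semantics, not UltraPrecise's implementation.
getcontext().prec = 200                      # 200 significant digits

a = Decimal(1).exp()                         # e, to 200 digits
b = Decimal(2).sqrt()                        # sqrt(2), to 200 digits

# Fixed-point result: quantize to exactly 50 fractional digits.
print((a * b).quantize(Decimal("1e-50")))
```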
-
The tree edit distance (TED) serves as a metric to quantify the dissimilarity between two trees and has found a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas. As applications continue to scale in data size, with a growing demand for fast response times, TED computation has become increasingly data- and compute-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging due to its dynamic programming nature, which involves non-trivial issues of data dependency, runtime execution pattern changes, and optimal utilization of limited parallel resources. Having comprehensively investigated the bottlenecks in existing parallel TED algorithms, we develop X-TED, a massively parallel computation framework for TED with a GPU implementation. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. It then adopts a dynamic parallel strategy to handle the various processing stages, aiming to best utilize GPU cores and the limited device memory in an adaptive and automatic way. Our extensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED and outperforming the existing multicore parallel MC-TED by an average speedup of 31x.
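For readers unfamiliar with TED, the sketch below gives the classic memoized recursion over ordered forests that such dynamic programming formulations build on. It is a baseline illustration with made-up trees and unit edit costs; it does not reflect X-TED's preprocessing or parallel strategy:

```python
from functools import lru_cache

# A tree is (label, (children...)); a forest is a tuple of trees.
@lru_cache(maxsize=None)
def ted(f, g):
    """Ordered tree edit distance between forests f and g
    with unit insert/delete/rename costs."""
    if not f and not g:
        return 0
    if not f:                       # only insertions remain
        (_, kids), rest = g[-1], g[:-1]
        return ted((), rest + kids) + 1
    if not g:                       # only deletions remain
        (_, kids), rest = f[-1], f[:-1]
        return ted(rest + kids, ()) + 1
    (la, ka), ra = f[-1], f[:-1]    # rightmost root of f, remainder
    (lb, kb), rb = g[-1], g[:-1]    # rightmost root of g, remainder
    return min(
        ted(ra + ka, g) + 1,                     # delete rightmost root of f
        ted(f, rb + kb) + 1,                     # insert rightmost root of g
        ted(ka, kb) + ted(ra, rb) + (la != lb),  # match roots, rename if needed
    )

t1 = (("a", (("b", ()), ("c", ()))),)   # a(b, c)
t2 = (("a", (("b", ()),)),)             # a(b)
print(ted(t1, t2))                      # 1: delete node c
```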
-
Indexing is a core technique for accelerating predicate evaluation in databases. After many years of effort, indexing performance has reached its peak on existing hardware infrastructure. We propose to use ray tracing (RT) cores to push indexing performance and efficiency to another level by addressing the following technical challenges: (1) the lack of an efficient mapping from predicate evaluation to a ray tracing job and (2) the poor performance caused by heavy and imbalanced ray loads when processing skewed datasets. These challenges hinder the effective exploitation of RT cores for predicate evaluation. In this paper, we propose RTScan, an approach that leverages RT cores to accelerate index scans. RTScan transforms the evaluation of conjunctive predicates into an efficient ray tracing job in a three-dimensional space. RTScan incorporates a set of techniques, namely Uniform Encoding, Data Sieving, and Matrix RT Refine, which significantly enhance the parallelism of scans on RT cores while lightening and balancing the ray load. With these techniques, RTScan achieves high performance for datasets with either uniform or skewed distributions and for queries with different selectivities. Extensive evaluations demonstrate that RTScan enhances scan performance on RT cores by five orders of magnitude and outperforms the state-of-the-art CPU approach by up to 4.6×.
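The core mapping can be pictured without RT hardware: each row is encoded as a point in (up to) three dimensions, and a conjunctive range predicate becomes an axis-aligned box whose interior points are the matching rows. The NumPy sketch below emulates only those semantics on the CPU; the data, columns, and bounds are invented, and it says nothing about how RTScan actually casts rays:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((1_000_000, 3))   # each row encoded as a 3D point

# WHERE a BETWEEN 0.2 AND 0.4 AND b <= 0.7 AND c >= 0.5,
# expressed as an axis-aligned box [lo, hi] in 3D.
lo = np.array([0.2, 0.0, 0.5])
hi = np.array([0.4, 0.7, 1.0])

mask = np.all((points >= lo) & (points <= hi), axis=1)
print(mask.sum(), "rows satisfy the conjunctive predicate")
```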
-
Abstract. The effects of anthropogenic warming on the hydroclimate of California are becoming more pronounced, with an increased frequency of multi-year droughts and floods. As a past analog for the future, the Paleocene-Eocene Thermal Maximum (PETM) is a unique natural experiment for assessing global and regional hydroclimate sensitivity to greenhouse gas warming. Globally, extensive evidence (i.e., observations and climate models with high pCO2) demonstrates hydrological intensification with significant variability from region to region (i.e., drier or wetter conditions, or greater frequency and/or intensity of extreme events). Central California (paleolatitude ~42° N), roughly at the boundary between dry subtropical highs and mid-latitude low-pressure systems, would have been particularly susceptible to shifts in atmospheric circulation and in precipitation patterns and intensity. Here, we present new observations and climate model output on regional and local hydroclimate responses in central California during the PETM. Our findings, based on multi-proxy evidence within the context of model output, suggest a transition to an overall drier climate punctuated by increased precipitation during summer months along coastal central California during the PETM.
-
The R-tree is a foundational data structure used in spatial and scientific databases. With the advancement of networks and computer architectures, in-memory R-tree processing in distributed systems has become a common platform. We have observed new performance challenges in processing R-trees as multidimensional datasets grow increasingly large. Specifically, an R-tree server can be heavily overloaded while the network and client CPUs are lightly loaded, and vice versa. In this article, we present the design and implementation of Catfish, an RDMA-enabled R-tree that achieves low latency and high throughput by adaptively utilizing the available network bandwidth and computing resources to balance workloads between clients and servers. We design and implement two basic mechanisms for using RDMA in a client-server R-tree data processing system. First, in the fast messaging design, we use RDMA writes to send R-tree requests to the server and let server threads process the requests, achieving low query latency. Second, in the RDMA offloading design, we use RDMA reads to offload tree traversal from the server to the client, which relieves the server when it is overloaded. We further develop an adaptive scheme to effectively switch an R-tree search between fast messaging and RDMA offloading, maximizing overall performance. Our experiments show that the adaptive solution of Catfish on InfiniBand significantly outperforms R-trees that use only fast messaging or only RDMA offloading, in both latency and throughput. Catfish can also deliver up to an order of magnitude higher performance than traditional schemes using TCP/IP on 1 and 40 Gbps Ethernet. We make a strong case for using RDMA to effectively balance workloads in distributed systems for low latency and high throughput.
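The abstract does not give Catfish's switching policy; the sketch below illustrates one plausible load-based heuristic for choosing between the two RDMA paths. Every name and threshold here is an assumption for illustration, not the published rule:

```python
FAST_MESSAGING = "fast_messaging"   # RDMA write request; server traverses
RDMA_OFFLOAD = "rdma_offload"       # client traverses tree via RDMA reads

def choose_path(server_cpu_util: float, client_cpu_util: float,
                threshold: float = 0.8) -> str:
    """Route a search to whichever side has spare CPU: offload the
    traversal to the client only when the server is saturated and the
    client is not (hypothetical heuristic, not Catfish's actual rule)."""
    if server_cpu_util > threshold and client_cpu_util < threshold:
        return RDMA_OFFLOAD
    return FAST_MESSAGING

print(choose_path(0.95, 0.30))   # -> rdma_offload
print(choose_path(0.40, 0.90))   # -> fast_messaging
```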