

Creators/Authors contains: "Ton That, Dai Hai"



  1. Commercial cloud database services increase the availability of data and provide reliable access to it. Routine database maintenance tasks such as clustering, however, increase the cost of hosting data on commercial cloud instances. Clustering causes an I/O burst; clustering in one shot depletes the I/O credit accumulated by an instance and increases the cost of hosting data. An unclustered database, on the other hand, decreases query performance because queries scan large amounts of data, which gradually depletes I/O credits. In this paper, we introduce Physical Location Index Plus (PLI+), an indexing method for databases hosted on commercial cloud. PLI+ relies on internal knowledge of the data layout to build a physical location index, which maps a range of physical co-locations to a range of attribute values to create approximately sorted buckets. As new data is inserted, writes are partitioned in memory based on the incoming data distribution. The data is written to physical locations on disk in block-based partitions to favor large-granularity I/O. Incoming SQL queries on indexed attribute values are rewritten in terms of the physical location ranges. As a result, PLI+ does not decrease query performance on an unclustered cloud database instance, and DBAs may choose to cluster the instance only when they have sufficiently large I/O credit available, thus delaying the need for clustering. We evaluate query performance over PLI+ by comparing it with clustered indexes, unclustered (secondary) indexes, and log-structured merge trees on real datasets. Experiments show that PLI+ significantly delays clustering and yet does not degrade query performance, thus achieving a higher level of sortedness than unclustered indexes and log-structured merge trees. We also evaluate the quality of clustering by introducing a measure of interval sortedness, and we evaluate the size of the index.
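The mapping between attribute-value ranges and physical location ranges described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Bucket` class, the function names, the one-bucket-per-block layout, and the sort-then-chunk partitioning of an incoming batch are all assumptions made for illustration. It only shows the general idea of partitioning writes in memory, recording a value range per block, and rewriting a range predicate as a set of block ranges.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical index entry: a value range and the block range holding it."""
    lo: int           # smallest attribute value stored in the bucket
    hi: int           # largest attribute value stored in the bucket
    first_block: int  # first physical block of the bucket
    last_block: int   # last physical block of the bucket


def partition_batch(values, block_size):
    """Partition an in-memory batch into block-sized, value-ordered chunks.

    Assumption: sorting the batch stands in for partitioning by the
    incoming data distribution.
    """
    ordered = sorted(values)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def build_index(rows, block_size):
    """Build one bucket per block from attribute values in physical order."""
    buckets = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        block = start // block_size
        buckets.append(Bucket(min(chunk), max(chunk), block, block))
    return buckets


def rewrite_range_query(buckets, lo, hi):
    """Rewrite the predicate lo <= value <= hi as a list of block ranges.

    Adjacent qualifying blocks are merged so that each range can be read
    with one large sequential I/O.
    """
    ranges = []
    for b in buckets:
        if b.hi >= lo and b.lo <= hi:  # value ranges overlap
            if ranges and ranges[-1][1] + 1 == b.first_block:
                ranges[-1] = (ranges[-1][0], b.last_block)
            else:
                ranges.append((b.first_block, b.last_block))
    return ranges


# Values in on-disk order: each block of 3 is internally unsorted,
# and the blocks themselves are only approximately ordered.
rows = [3, 1, 2, 9, 8, 7, 5, 4, 6, 12, 10, 11]
index = build_index(rows, block_size=3)
print(rewrite_range_query(index, 5, 8))  # [(1, 2)]
print(rewrite_range_query(index, 3, 4))  # [(0, 0), (2, 2)]
```

In this toy layout, a query for values between 5 and 8 touches only blocks 1 and 2 rather than the whole table, which is the effect the abstract attributes to rewriting queries in terms of physical location ranges.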