<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Conference Paper</dc:product_type><dc:title>Stratified Random Sampling over Streaming and Stored Data</dc:title><dc:creator>Nguyen, T; Shih, M; Srivastava, D; Tirthapura, S; Xu, B</dc:creator><dc:corporate_author/><dc:editor/><dc:description>Stratified random sampling (SRS) is a widely used sampling technique for approximate query processing. We consider SRS on continuously arriving data streams, and make the following contributions. We present a lower bound that shows that any streaming algorithm for SRS must have (in the worst case) a variance
that is an Ω(r) factor away from the optimal, where r is the number of strata. We present S-VOILA, a streaming algorithm for SRS that is locally variance-optimal. Results from experiments on real and synthetic data show that S-VOILA results in a variance that is typically close to that of an optimal offline algorithm, which is given the entire input beforehand. We also present a variance-optimal offline algorithm, VOILA, for stratified random sampling. VOILA is a strict generalization of the well-known Neyman allocation, which is optimal only under the assumption that each stratum is abundant, i.e., has a large number of data points to choose from.
Experiments show that VOILA can have significantly smaller variance (1.4x to 50x) than Neyman allocation on real-world data.</dc:description><dc:publisher/><dc:date>2019-03-01</dc:date><dc:nsf_par_id>10110905</dc:nsf_par_id><dc:journal_name>Advances in Database Technology - 22nd International Conference on Extending Database Technology (EDBT)</dc:journal_name><dc:journal_volume/><dc:journal_issue/><dc:page_range_or_elocation>25-36</dc:page_range_or_elocation><dc:issn/><dc:isbn/><dc:doi>https://doi.org/10.5441/002/edbt.2019.04</dc:doi><dcq:identifierAwardId>1725702; 1527541</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location/><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>