Search for: All records

Award ID contains: 1527541





Incremental maintenance of maximal cliques in a dynamic graph

https://doi.org/10.1007/s00778-019-00540-5

Das, Apurba ; Svendsen, Michael ; Tirthapura, Srikanta ( June 2019 , The VLDB Journal)

Full Text Available
Weighted Reservoir Sampling from Distributed Streams

https://doi.org/10.1145/3294052.3319696

Jayaram, Rajesh ; Sharma, Gokarna ; Tirthapura, Srikanta ; Woodruff, David P. ( June 2019 , Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems)

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, is well studied, and admits tight upper and lower bounds on message complexity. For weighted sampling with replacement, there is a simple reduction to unweighted sampling with replacement. However, in many applications the stream may have only a few heavy items which may dominate a random sample when chosen with replacement. Weighted sampling without replacement (weighted SWOR) eludes this issue, since such heavy items can be sampled at most once. In this work, we present the first message-optimal algorithm for weighted SWOR from a distributed stream. Our algorithm also has optimal space and time complexity. As an application of our algorithm for weighted SWOR, we derive the first distributed streaming algorithms for tracking heavy hitters with residual error. Here the goal is to identify stream items that contribute significantly to the residual stream, once the heaviest items are removed. Residual heavy hitters generalize the notion of $\ell_1$ heavy hitters and are important in streams that have a skewed distribution of weights. In addition to the upper bound, we also provide a lower bound on the message complexity that is nearly tight up to a $\log(1/\epsilon)$ factor. Finally, we use our weighted sampling algorithm to improve the message complexity of distributed $L_1$ tracking, also known as count tracking, which is a widely studied problem in distributed streaming. We also derive a tight message lower bound, which closes the message complexity of this fundamental problem.
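For readers unfamiliar with weighted SWOR, the following is a minimal single-stream sketch. It is not the distributed, message-optimal algorithm of the paper; it assumes the well-known key-based reservoir technique (Efraimidis and Spirakis), in which each item draws a key `u ** (1 / weight)` and the `s` largest keys are kept. The function name `weighted_swor` and the `(item, weight)` stream layout are illustrative assumptions.

```python
import heapq
import random


def weighted_swor(stream, s, rng=None):
    """Sample up to s distinct items from a stream of (item, weight) pairs.

    Centralized key-based reservoir sketch (an assumption for illustration,
    not the paper's distributed algorithm). Heavier items tend to get
    larger keys, but each item can appear in the sample at most once.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight are never sampled
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < s:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest retained key: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


if __name__ == "__main__":
    stream = [("a", 1.0), ("b", 100.0), ("c", 2.0), ("d", 0.5)]
    print(weighted_swor(stream, 2, random.Random(0)))
```

Even though item `b` carries most of the weight in the example stream, it can occupy at most one of the two sample slots, which is the property the abstract highlights.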
Full Text Available
Stratified Random Sampling over Streaming and Stored Data

https://doi.org/10.5441/002/edbt.2019.04

Nguyen, T ; Shih, M ; Srivastava, D ; Tirthapura, S ; Xu, B ( March 2019 , Advances in Database Technology - 22nd International Conference on Extending Database Technology (EDBT))

Stratified random sampling (SRS) is a widely used sampling technique for approximate query processing. We consider SRS on continuously arriving data streams, and make the following contributions. We present a lower bound that shows that any streaming algorithm for SRS must have (in the worst case) a variance that is Ω(r) factor away from the optimal, where r is the number of strata. We present S-VOILA, a streaming algorithm for SRS that is locally variance-optimal. Results from experiments on real and synthetic data show that S-VOILA results in a variance that is typically close to an optimal offline algorithm, which was given the entire input beforehand. We also present a variance-optimal offline algorithm VOILA for stratified random sampling. VOILA is a strict generalization of the well-known Neyman allocation, which is optimal only under the assumption that each stratum is abundant, i.e. has a large number of data points to choose from. Experiments show that VOILA can have significantly smaller variance (1.4x to 50x) than Neyman allocation on real-world data.
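The abundance caveat about Neyman allocation can be seen in a small sketch. The code below computes the textbook Neyman allocation, n_h = n · N_h σ_h / Σ_i N_i σ_i, and is not an implementation of VOILA or S-VOILA; the function name and the example numbers are illustrative assumptions.

```python
def neyman_allocation(sizes, stddevs, n):
    """Return the (fractional) Neyman sample size for each stratum.

    sizes[h] is the number of items in stratum h, stddevs[h] its standard
    deviation, and n the total sample budget. This textbook formula does
    not check whether a stratum actually holds enough items.
    """
    weights = [size * sd for size, sd in zip(sizes, stddevs)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all strata have zero size or zero variance")
    return [n * w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical data: a large low-variance stratum and a tiny
    # high-variance stratum.
    sizes = [1000, 10]
    stddevs = [1.0, 500.0]
    for size, share in zip(sizes, neyman_allocation(sizes, stddevs, 100)):
        print(f"stratum size {size}: allocated {share:.2f}")
```

With these made-up numbers, the second stratum is allocated far more samples than the 10 items it contains, which is the situation where the abstract says Neyman allocation stops being optimal.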
Full Text Available
Incremental Maintenance of Maximal Bicliques in a Dynamic Bipartite Graph

https://doi.org/10.1109/TMSCS.2018.2802920

Das, Apurba ; Tirthapura, Srikanta ( July 2018 , IEEE Transactions on Multi-Scale Computing Systems)

Full Text Available
Work-efficient parallel union-find

https://doi.org/10.1002/cpe.4333

Simsiri, Natcha ; Tangwongsan, Kanat ; Tirthapura, Srikanta ; Wu, Kun-Lung ( February 2018 , Concurrency and Computation: Practice and Experience)

Full Text Available