NSF PAR Search | NSF Public Access Repository

DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Tahmasbi, Ashraf; Jothimurugesan, Ellango; Tirthapura, Srikanta; Gibbons, Phillip (July 2021, Proceedings of the 38th International Conference on Machine Learning, ICML'21)

DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Tahmasbi, Ashraf; Jothimurugesan, Ellango; Tirthapura, Srikanta; Gibbons, Phillip B. (July 2021, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event)

When learning from streaming data, a change in the data distribution, also known as concept drift, can render a previously-learned model inaccurate and require training a new model. We present an adaptive learning algorithm that extends previous drift-detection-based methods by incorporating drift detection into a broader stable-state/reactive-state process. The advantage of our approach is that we can use aggressive drift detection in the stable state to achieve a high detection rate, but mitigate the false positive rate of standalone drift detection via a reactive state that reacts quickly to true drifts while eliminating most false positives. The algorithm is generic in its base learner and can be applied across a variety of supervised learning problems. Our theoretical analysis shows that the risk of the algorithm is (i) statistically better than standalone drift detection and (ii) competitive with an algorithm with oracle knowledge of when (abrupt) drifts occur. Experiments on synthetic and real datasets with concept drifts confirm our theoretical analysis.
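To make the stable-state / reactive-state idea concrete, here is a minimal Python sketch. It is a simplified illustration, not the DriftSurf algorithm from the paper: the detection rule (windowed accuracy falling more than `delta` below the best windowed accuracy seen so far), the fixed-length reactive state, the head-to-head comparison of the old model against a fresh one, and all names (`TwoStateLearner`, `MajorityLearner`, `window`, `delta`, `reactive_len`) are assumptions chosen for brevity.

```python
from collections import deque


class MajorityLearner:
    """Hypothetical toy base learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Before any training, default to label 0; ties go to the earliest-seen label.
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


class TwoStateLearner:
    """Simplified stable/reactive loop; an illustration, not the published DriftSurf."""

    def __init__(self, make_model, window=20, delta=0.1, reactive_len=10):
        self.make_model = make_model        # factory for fresh base-learner instances
        self.model = make_model()           # the model currently used for predictions
        self.recent = deque(maxlen=window)  # recent True/False correctness of self.model
        self.best_acc = 0.0
        self.delta = delta                  # small delta = aggressive drift detection
        self.reactive_len = reactive_len    # length of the reactive state, in examples
        self.candidate = None               # fresh model; non-None only in the reactive state
        self.scores = [0, 0]                # correct predictions while reactive: [old, fresh]
        self.steps = 0
        self.switches = 0                   # number of confirmed drifts

    def step(self, x, y):
        """Test-then-train: predict on (x, y), update the state, then train on it."""
        pred = self.model.predict(x)
        if self.candidate is None:
            self._stable(pred == y)
        else:
            self._reactive(pred == y, self.candidate.predict(x) == y)
        self.model.update(x, y)
        if self.candidate is not None:
            self.candidate.update(x, y)
        return pred

    def _stable(self, correct):
        self.recent.append(correct)
        if len(self.recent) < self.recent.maxlen:
            return
        acc = sum(self.recent) / len(self.recent)
        self.best_acc = max(self.best_acc, acc)
        if acc < self.best_acc - self.delta:
            # Suspected drift: enter the reactive state with a fresh model.
            self.candidate = self.make_model()
            self.scores = [0, 0]
            self.steps = 0

    def _reactive(self, old_ok, new_ok):
        self.scores[0] += old_ok
        self.scores[1] += new_ok
        self.steps += 1
        if self.steps < self.reactive_len:
            return
        if self.scores[1] > self.scores[0]:
            # Drift confirmed: the fresh model replaces the old one.
            self.model = self.candidate
            self.switches += 1
        # Otherwise the detection was a false positive and the old model is kept.
        # Either way, return to the stable state with a fresh accuracy history.
        self.candidate = None
        self.recent.clear()
        self.best_acc = 0.0
```

In this toy setting, a sustained label flip is confirmed and the fresh model takes over, while a brief burst of flipped labels trips the detector but is discarded at the end of the reactive state because the old model keeps scoring better. This mirrors, in a much reduced form, how a reactive state can filter the false positives of an aggressive detector.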
Stratified random sampling from streaming and stored data

https://doi.org/10.1007/s10619-020-07315-w

Nguyen, Trong Duc; Shih, Ming-Hung; Srivastava, Divesh; Tirthapura, Srikanta; Xu, Bojian (September 2021, Distributed and Parallel Databases)

Shared-Memory Parallel Maximal Biclique Enumeration

https://doi.org/10.1109/HiPC.2019.00016

Das, Apurba; Tirthapura, Srikanta (December 2019, 26th IEEE International Conference on High Performance Computing)

We present shared-memory parallel algorithms for maximal biclique enumeration (MBE), the task of enumerating all maximal complete bipartite subgraphs (maximal bicliques) of a bipartite graph. MBE is widely used in the analysis of social, biological, and transactional networks. Since MBE is computationally expensive, parallel computing is necessary to scale to large graphs. Our parallel algorithm ParMBE efficiently uses the power of multiple cores that share memory. From a theoretical view, ParMBE is work-efficient with respect to a state-of-the-art sequential algorithm. Our experimental evaluation shows that ParMBE scales well up to 64 cores and is significantly faster than current parallel algorithms. Because ParMBE achieved a super-linear speedup over the sequential algorithm on which it is based (MineLMBC), we also develop an improved sequential algorithm, FMBE, by "sequentializing" ParMBE.
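For readers new to the problem, the following brute-force Python sketch shows what a maximal biclique is; it is not ParMBE, MineLMBC, or FMBE. It relies on the closure property of bicliques: for a set of left vertices, take their common right neighbors R, then take every left vertex adjacent to all of R; the resulting pair cannot be extended on either side. The adjacency-dictionary input and the name `maximal_bicliques` are illustrative assumptions, and the loop over all left-vertex subsets is exponential, so it only suits tiny graphs.

```python
from itertools import combinations


def maximal_bicliques(adj):
    """Enumerate all maximal bicliques (L, R), both sides non-empty, of a bipartite graph.
    adj maps each left vertex to the set of right vertices it is adjacent to.
    Brute force over left-vertex subsets: exponential, for tiny examples only."""
    left = list(adj)
    found = set()
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            # R: right vertices adjacent to every left vertex in the subset.
            R = set.intersection(*(adj[u] for u in subset))
            if not R:
                continue
            # Closure: every left vertex adjacent to all of R. The pair (L, R)
            # cannot be extended on either side, so it is a maximal biclique.
            L = frozenset(u for u in left if R <= adj[u])
            found.add((L, frozenset(R)))
    return found
```

For example, with `adj = {"a": {1, 2}, "b": {1, 2, 3}, "c": {3}}` it returns three maximal bicliques: ({a, b}, {1, 2}), ({b}, {1, 2, 3}), and ({b, c}, {3}).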
Shared-memory Parallel Maximal Clique Enumeration from Static and Dynamic Graphs

https://doi.org/10.1145/3380936

Das, Apurba; Sanei-Mehri, Seyed-Vahid; Tirthapura, Srikanta (April 2020, ACM Transactions on Parallel Computing)

DriftSurf: A Risk-competitive Learning Algorithm under Concept Drift

Tahmasbi, Ashraf; Jothimurugesan, Ellango; Tirthapura, Srikanta; Gibbons, Phillip B. (January 2020, arXiv.org)

When learning from streaming data, a change in the data distribution, also known as concept drift, can render a previously-learned model inaccurate and require training a new model. We present an adaptive learning algorithm that extends previous drift-detection-based methods by incorporating drift detection into a broader stable-state/reactive-state process. The advantage of our approach is that we can use aggressive drift detection in the stable state to achieve a high detection rate, but mitigate the false positive rate of standalone drift detection via a reactive state that reacts quickly to true drifts while eliminating most false positives. The algorithm is generic in its base learner and can be applied across a variety of supervised learning problems. Our theoretical analysis shows that the risk of the algorithm is competitive with an algorithm with oracle knowledge of when (abrupt) drifts occur. Experiments on synthetic and real datasets with concept drifts confirm our theoretical analysis.
Mining Largest Maximal Quasi-Cliques

https://doi.org/10.1145/3446637

Sanei-Mehri, Seyed-Vahid; Das, Apurba; Hashemi, Hooman; Tirthapura, Srikanta (June 2021, ACM Transactions on Knowledge Discovery from Data)

Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications in bioinformatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enumeration of top-k degree-based quasi-cliques and make the following contributions: (1) we show that even the problem of detecting whether a given quasi-clique is maximal (i.e., not contained within another quasi-clique) is NP-hard. (2) We present a novel heuristic algorithm KernelQC to enumerate the k largest quasi-cliques in a graph. Our method is based on identifying kernels of extremely dense subgraphs within a graph, followed by growing subgraphs around these kernels, to arrive at quasi-cliques with the required densities. (3) Experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often more than three orders of magnitude faster), and can scale to larger graphs than current methods.
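The sketch below illustrates the degree-based definition and why maximality is costly to check. It assumes the common gamma-quasi-clique definition (every vertex of S is adjacent to at least ceil(gamma * (|S| - 1)) other vertices of S), which may differ in detail from the paper's; `is_quasi_clique` and `is_maximal` are hypothetical names, and this is not the KernelQC heuristic. Because a subset of a quasi-clique need not itself be a quasi-clique, a quasi-clique S can fail every single-vertex extension yet still lie inside a larger quasi-clique, so this naive check examines every superset, which takes exponential time.

```python
import math
from itertools import combinations


def is_quasi_clique(adj, S, gamma):
    """Assumed degree-based definition: each v in S has at least
    ceil(gamma * (|S| - 1)) neighbors inside S. adj maps vertex -> set of neighbors."""
    S = set(S)
    if not S:
        return False
    need = math.ceil(gamma * (len(S) - 1))
    return all(len(adj[v] & S) >= need for v in S)


def is_maximal(adj, S, gamma):
    """Naive maximality test: S is maximal if no proper superset is a quasi-clique.
    Tries every superset, so it is exponential in the number of outside vertices."""
    S = set(S)
    if not is_quasi_clique(adj, S, gamma):
        return False
    outside = [v for v in adj if v not in S]
    for k in range(1, len(outside) + 1):
        for extra in combinations(outside, k):
            if is_quasi_clique(adj, S | set(extra), gamma):
                return False
    return True
```

On the 4-cycle a-b-c-d-a with gamma = 0.5, the edge {a, b} is a quasi-clique but not a maximal one, while the whole cycle is maximal.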
Incremental maintenance of maximal cliques in a dynamic graph

https://doi.org/10.1007/s00778-019-00540-5

Das, Apurba; Svendsen, Michael; Tirthapura, Srikanta (June 2019, The VLDB Journal)

Weighted Reservoir Sampling from Distributed Streams

https://doi.org/10.1145/3294052.3319696

Jayaram, Rajesh; Sharma, Gokarna; Tirthapura, Srikanta; Woodruff, David P. (June 2019, Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems)

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, is well studied and admits tight upper and lower bounds on message complexity. For weighted sampling with replacement, there is a simple reduction to unweighted sampling with replacement. However, in many applications the stream may have only a few heavy items which may dominate a random sample when chosen with replacement. Weighted sampling without replacement (weighted SWOR) avoids this issue, since such heavy items can be sampled at most once. In this work, we present the first message-optimal algorithm for weighted SWOR from a distributed stream. Our algorithm also has optimal space and time complexity. As an application of our algorithm for weighted SWOR, we derive the first distributed streaming algorithms for tracking heavy hitters with residual error. Here the goal is to identify stream items that contribute significantly to the residual stream, once the heaviest items are removed. Residual heavy hitters generalize the notion of $$\ell_1$$ heavy hitters and are important in streams that have a skewed distribution of weights. In addition to the upper bound, we also provide a lower bound on the message complexity that is nearly tight up to a $$\log(1/\epsilon)$$ factor. Finally, we use our weighted sampling algorithm to improve the message complexity of distributed $$L_1$$ tracking, also known as count tracking, which is a widely studied problem in distributed streaming. We also derive a tight message lower bound, which settles the message complexity of this fundamental problem.
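For background on weighted SWOR itself, here is a minimal centralized Python sketch of the classic key-based method of Efraimidis and Spirakis: each item with weight w gets the key u^(1/w) for u drawn uniformly from [0, 1), and the s items with the largest keys form the sample. This is a swapped-in textbook technique, not the paper's message-optimal distributed algorithm; the function name `weighted_swor` and its `(item, weight)` input format are assumptions.

```python
import heapq
import random


def weighted_swor(stream, s, rng=None):
    """Weighted sampling without replacement of up to s items from (item, weight) pairs.
    Each item gets key u ** (1 / w) with u ~ Uniform[0, 1); the s largest keys are kept."""
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item); the smallest key is evicted first
    for i, (item, w) in enumerate(stream):
        if w <= 0:
            raise ValueError("weights must be positive")
        key = rng.random() ** (1.0 / w)
        if len(heap) < s:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            # The new key beats the current minimum: replace it in one heap operation.
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]
```

Each item appears in the sample at most once, so a single very heavy item occupies only one of the s slots. Since each key depends only on its own item and a private random draw, keys could in principle be computed wherever items arrive and merged by keeping the s largest overall; the paper addresses continuous sampling across distributed sites with optimal message complexity, which this sketch does not attempt.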
Incremental Maintenance of Maximal Bicliques in a Dynamic Bipartite Graph

https://doi.org/10.1109/TMSCS.2018.2802920

Das, Apurba; Tirthapura, Srikanta (July 2018, IEEE Transactions on Multi-Scale Computing Systems)
