Search for: All records

Creators/Authors contains: "Rundensteiner, Elke A."

« Prev Next »

Total Resources

12

Resource Type
Conference Paper

8

Conference Proceeding

0

Dataset

0

Journal Article

4

Workshop Report

0

Availability
Full Text / Resource Available

11

Citation Only

1

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

AutoOD: Automatic Outlier Detection

https://doi.org/10.1145/3588700

Cao, Lei ; Yan, Yizhou ; Wang, Yu ; Madden, Samuel ; Rundensteiner, Elke A. ( May 2023 , Proceedings of the ACM on Management of Data)

Outlier detection is critical in real world. Due to the existence of many outlier detection techniques which often return different results for the same data set, the users have to address the problem of determining which among these techniques is the best suited for their task and tune its parameters. This is particularly challenging in the unsupervised setting, where no labels are available for cross-validation needed for such method and parameter optimization. In this work, we propose AutoOD which uses the existing unsupervised detection techniques to automatically produce high quality outliers without any human tuning. AutoOD's fundamentally new strategy unifies the merits of unsupervised outlier detection and supervised classification within one integrated solution. It automatically tests a diverse set of unsupervised outlier detectors on a target data set, extracts useful signals from their combined detection results to reliably capture key differences between outliers and inliers. It then uses these signals to produce a "custom outlier classifier" to classify outliers, with its accuracy comparable to supervised outlier classification models trained with ground truth labels - without having access to the much needed labels. On a diverse set of benchmark outlier detection datasets, AutoOD consistently outperforms the best unsupervised outlier detector selected from hundreds of detectors. It also outperforms other tuning-free approaches from 12 to 97 points (out of 100) in the F-1 score.
more » « less
Free, publicly-accessible full text available May 26, 2024
Measuring the Uncertainty of Environmental Good Preferences with Bayesian Deep Learning

https://doi.org/10.1145/3524458.3547250

Flores, Ricardo ; Tlachac, ML ; Rundensteiner, Elke A. ( September 2022 , 2022 ACM Conference on Information Technology for Social Good)

Due to climate change and resulting natural disasters, there has been a growing interest in measuring the value of social goods to our society, like environmental conservation. Traditionally, the stated preference, such as contingent valuation, captures an economics-perspective on the value of environmental goods through the willingness to pay (WTP) paradigm. Where the economics theory to estimate the WTP using machine learning is the random utility model. However, the estimation of WTP depends on rather simple preference assumptions based on a linear functional form. These models are therefore unable to capture the complex uncertainty in the human decision-making process. Further, contingent valuation only uses the mean or median estimation of WTP. Yet it has been recognized that other quantiles of the WTP would be valuable to ensure the provision of social goods. In this work, we propose to leverage the Bayesian Deep Learning (BDL) models to capture the uncertainty in stated preference estimation. We focus on the probability of paying for an environmental good and the conditional distribution of WTP. The Bayesian deep learning model connects with the economics theory of the random utility model through the stochastic component on the individual preferences. For testing our proposed model, we work with both synthetic and real-world data. The results on synthetic data suggest the BDL can capture the uncertainty consistently with different distribution of WTP. For the real-world data, a forest conservation contingent valuation survey, we observed a high variability in the distribution of the WTP, suggesting high uncertainty in the individual preferences for social goods. Our research can be used to inform environmental policy, including the preservation of natural resources and other social good.
more » « less
Full Text Available
Text Generation to Aid Depression Detection: A Comparative Study of Conditional Sequence Generative Adversarial Networks

https://doi.org/10.1109/BigData55660.2022.10020224

Tlachac, ML ; Gerych, Walter ; Agrawal, Kratika ; Litterer, Benjamin ; Jurovich, Nicholas ; Thatigotla, Saitheeraj ; Thadajarassiri, Jidapa ; Rundensteiner, Elke A. ( December 2022 , 2022 IEEE International Conference on Big Data (Big Data))

Corpuses of unstructured textual data, such as text messages between individuals, are often predictive of medical issues such as depression. The text data usually used in healthcare applications has high value and great variety, but is typically small in volume. Generating labeled unstructured text data is important to improve models by augmenting these small datasets, as well as to facilitate anonymization. While methods for labeled data generation exist, not all of them generalize well to small datasets. In this work, we thus perform a much needed systematic comparison of conditional text generation models that are promising for small datasets due to their unified architectures. We identify and implement a family of nine conditional sequence generative adversarial networks for text generation, which we collectively refer to as cSeqGAN models. These models are characterized along two orthogonal design dimensions: weighting strategies and feedback mechanisms. We conduct a comparative study evaluating the generation ability of the nine cSeqGAN models on three diverse text datasets with depression and sentiment labels. To assess the quality and realism of the generated text, we use standard machine learning metrics as well as human assessment via a user study. While the unconditioned models produced predictive text, the cSeqGAN models produced more realistic text. Our comparative study lays a solid foundation and provides important insights for further text generation research, particularly for the small datasets common within the healthcare domain.
more » « less
Full Text Available
Automated Construction of Lexicons to Improve Depression Screening with Text Messages

https://doi.org/10.1109/JBHI.2022.3203345

Tlachac, ML ; Shrestha, Avantika ; Shah, Mahum ; Litterer, Benjamin ; Rundensteiner, Elke A. ( August 2022 , IEEE Journal of Biomedical and Health Informatics)

Given that depression is one of the most prevalent mental illnesses, developing effective and unobtrusive diagnosis tools is of great importance. Recent work that screens for depression with text messages leverage models relying on lexical category features. Given the colloquial nature of text messages, the performance of these models may be limited by formal lexicons. We thus propose a strategy to automatically construct alternative lexicons that contain more relevant and colloquial terms. Specifically, we generate 36 lexicons from fiction, forum, and news corpuses. These lexicons are then used to extract lexical category features from the text messages. We utilize machine learning models to compare the depression screening capabilities of these lexical category features. Out of our 36 constructed lexicons, 14 achieved statistically significantly higher average F1 scores over the pre-existing formal lexicon and basic bag-of-words approach. In comparison to the pre-existing lexicon, our best performing lexicon increased the average F1 scores by 10%. We thus confirm our hypothesis that less formal lexicons can improve the performance of classification models that screen for depression with text messages. By providing our automatically constructed lexicons, we aid future machine learning research that leverages less formal text.
more » « less
Full Text Available
Gloria: Graph-based Sharing Optimizer for Event Trend Aggregation

https://doi.org/10.1145/3514221.3526145

Ma, Lei ; Lei, Chuan ; Poppe, Olga ; Rundensteiner, Elke A. ( June 2022 , SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data)

Large workloads of event trend aggregation queries are widely deployed to derive high-level insights about current event trends in near real time. To speed-up the execution, we identify and leverage sharing opportunities from complex patterns with flat Kleene operators or even nested Kleene expressions. We propose Gloria, a graph-based sharing optimizer for event trend aggregation. First, we map the sharing optimization problem to a graph path search problem in the Gloria graph with execution costs encoded as weights. Second, we shrink the search space by applying cost-driven pruning principles that guarantee optimality of the reduced Gloria graph in most cases. Lastly, we propose a path search algorithm that identifies the sharing plan with minimum execution costs. Our experimental study on three real-world data sets demonstrates that our Gloria optimizer effectively reduces the search space, leading to 5-fold speed-up in optimization time. The optimized plan consistently reduces the query latency by 68%-93% compared to the plan generated by state-of-the-art approaches.
more » « less
Full Text Available
ELITE: Robust Deep Anomaly Detection with Meta Gradient

https://doi.org/10.1145/3447548.3467320

Zhang, Huayi ; Cao, Lei ; VanNostrand, Peter ; Madden, Samuel ; Rundensteiner, Elke A. ( August 2021 , KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining)

Full Text Available
To Share, or not to Share Online Event Trend Aggregation Over Bursty Event Streams

https://doi.org/10.1145/3448016.3452785

Poppe, Olga ; Lei, Chuan ; Ma, Lei ; Rozet, Allison ; Rundensteiner, Elke A. ( June 2021 , Proceedings of the 2021 International Conference on Management of Data (SIGMOD’21), June 18–27, 2021, Virtual Event, China.)
null (Ed.)
Complex event processing (CEP) systems continuously evaluate large workloads of pattern queries under tight time constraints. Event trend aggregation queries with Kleene patterns are commonly used to retrieve summarized insights about the recent trends in event streams. State-of-art methods are limited either due to repetitive computations or unnecessary trend construction. Existing shared approaches are guided by statically selected and hence rigid sharing plans that are often sub-optimal under stream fluctuations. In this work, we propose a novel framework Hamlet that is the first to overcome these limitations. Hamlet introduces two key innovations. First, Hamlet adaptively decides at run time whether to share or not to share computations depending on the current stream properties to harvest the maximum sharing benefit. Second, Hamlet is equipped with a highly efficient shared trend aggregation strategy that avoids trend construction. Our experimental study on both real and synthetic data sets demonstrates that Hamlet consistently reduces query latency by up to five orders of magnitude compared to state-of-the-art approaches.
more » « less
Full Text Available
Muse: Multi-query Event Trend Aggregation

https://doi.org/10.1145/3340531.3412138

Rozet, Allison ; Poppe, Olga ; Lei, Chuan ; Rundensteiner, Elke A. ( October 2020 , Proceedings of the 29th ACM International Conference on Information & Knowledge Management)
null (Ed.)
ABSTRACT Streaming analytics deploy Kleene pattern queries to detect and aggregate event trends on high-rate data streams. Despite increasing workloads, most state-of-the-art systems process each query independently, thus missing cost-saving sharing opportunities. Sharing event trend aggregation poses several technical challenges. First, Kleene patterns are in general difficult to share due to complex nesting and arbitrarily long matches. Second, not all sharing opportunities are beneficial because sharing Kleene patterns incurs non-trivial overhead to ensure the correctness of final aggregation results. We propose Muse (Multi-query Shared Event trend aggregation), the first framework that shares aggregation queries with Kleene patterns while avoiding expensive trend construction. To find the beneficial sharing plan, the Muse optimizer effectively selects robust sharing candidates from the exponentially large search space. Our experiments demonstrate that Muse increases throughput by 4 orders of magnitude compared to state-of-the-art approaches. ACM Reference Format: Allison Rozet, Olga Poppe, Chuan Lei, and Elke A. Rundensteiner. 2020. MUSE: Multi-query Event Trend Aggregation. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), October 19–23, 2020, Virtual Event, Ireland. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3340531.3412138
more » « less
Full Text Available
Continuously Adaptive Similarity Search

https://doi.org/10.1145/3318464.3380601

Zhang, Huayi ; Cao, Lei ; Yan, Yizhou ; Madden, Samuel ; Rundensteiner, Elke A. ( June 2020 , Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data)

Similarity search is the basis for many data analytics techniques, including k-nearest neighbor classification and outlier detection. Similarity search over large data sets relies on i) a distance metric learned from input examples and ii) an index to speed up search based on the learned distance metric. In interactive systems, input to guide the learning of the distance metric may be provided over time. As this new input changes the learned distance metric, a naive approach would adopt the costly process of re-indexing all items after each metric change. In this paper, we propose the first solution, called OASIS, to instantaneously adapt the index to conform to a changing distance metric without this prohibitive re-indexing process. To achieve this, we prove that locality-sensitive hashing (LSH) provides an invariance property, meaning that an LSH index built on the original distance metric is equally effective at supporting similarity search using an updated distance metric as long as the transform matrix learned for the new distance metric satisfies certain properties. This observation allows OASIS to avoid recomputing the index from scratch in most cases. Further, for the rare cases when an adaption of the LSH index is shown to be necessary, we design an efficient incremental LSH update strategy that re-hashes only a small subset of the items in the index. In addition, we develop an efficient distance metric learning strategy that incrementally learns the new metric as inputs are received. Our experimental study using real world public datasets confirms the effectiveness of OASIS at improving the accuracy of various similarity search-based data analytics tasks by instantaneously adapting the distance metric and its associated index in tandem, while achieving an up to 3 orders of magnitude speedup over the state-of-art techniques.
more » « less
Full Text Available
Event Trend Aggregation Under Rich Event Matching Semantics

https://doi.org/10.1145/3299869.3319862

Poppe, Olga ; Lei, Chuan ; Rundensteiner, Elke A. ; Maier, David ( June 2019 , SIGMOD)
null (Ed.)
Streaming applications from cluster monitoring to algorithmic trading deploy Kleene queries to detect and aggregate event trends. Rich event matching semantics determine how to compose events into trends. The expressive power of stateof- the-art streaming systems remains limited since they do not support many of these semantics. Worse yet, they suffer from long delays and high memory costs because they maintain aggregates at a fine granularity. To overcome these limitations, our Coarse-Grained Event Trend Aggregation (Cogra) approach supports a rich variety of event matching semantics within one system. Better yet, Cogra incrementally maintains aggregates at the coarsest granularity possible for each of these semantics. In this way, Cogra minimizes the number of aggregates – reducing both time and space complexity. Our experiments demonstrate that Cogra achieves up to six orders of magnitude speed-up and up to seven orders of magnitude memory reduction compared to state-of-the-art approaches.
more » « less
Full Text Available

« Prev Next »