skip to main content


Search for: All records

Creators/Authors contains: "Rundensteiner, Elke"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. In social choice, traditional Kemeny rank aggregation combines the preferences of voters, expressed as rankings, into a single consensus ranking without consideration for how this ranking may unfairly affect marginalized groups (i.e., racial or gender). Developing fair rank aggregation methods is critical due to their societal influence in applications prioritizing job applicants, funding proposals, and scheduling medical patients. In this work, we introduce the Fair Exposure Kemeny Aggregation Problem (FairExp-kap) for combining vast and diverse voter preferences into a single ranking that is not only a suitable consensus, but ensures opportunities are not withheld from marginalized groups. In formalizing FairExp-kap, we extend the fairness of exposure notion from information retrieval to the rank aggregation context and present a complimentary metric for voter preference representation. We design algorithms for solving FairExp-kap that explicitly account for position bias, a common ranking-based concern that end-users pay more attention to higher ranked candidates. epik solves FairExp-kap exactly by incorporating non-pairwise fairness of exposure into the pairwise Kemeny optimization; while the approximate epira is a candidate swapping algorithm, that guarantees ranked candidate fairness. Utilizing comprehensive synthetic simulations and six real-world datasets, we show the efficacy of our approach illustrating that we succeed in mitigating disparate group exposure unfairness in consensus rankings, while maximally representing voter preferences. 
    more » « less
    Free, publicly-accessible full text available June 12, 2024
  2. Multi-label classification (MLC), which assigns multiple labels to each instance, is crucial to domains from computer vision to text mining. Conventional methods for MLC require huge amounts of labeled data to capture complex dependencies between labels. However, such labeled datasets are expensive, or even impossible, to acquire. Worse yet, these pre-trained MLC models can only be used for the particular label set covered in the training data. Despite this severe limitation, few methods exist for expanding the set of labels predicted by pre-trained models. Instead, we acquire vast amounts of new labeled data and retrain a new model from scratch. Here, we propose combining the knowledge from multiple pre-trained models (teachers) to train a new student model that covers the union of the labels predicted by this set of teachers. This student supports a broader label set than any one of its teachers without using labeled data. We call this new problem knowledge amalgamation for multi-label classification. Our new method, Adaptive KNowledge Transfer (ANT), trains a student by learning from each teacher’s partial knowledge of label dependencies to infer the global dependencies between all labels across the teachers. We show that ANT succeeds in unifying label dependencies among teachers, outperforming five state-of-the-art methods on eight real-world datasets. 
    more » « less
    Free, publicly-accessible full text available June 27, 2024
  3. For applications where multiple stakeholders provide recommendations, a fair consensus ranking must not only ensure that the preferences of rankers are well represented, but must also mitigate disadvantages among socio-demographic groups in the final result. However, there is little empirical guidance on the value or challenges of visualizing and integrating fairness metrics and algorithms into human-in-the-loop systems to aid decision-makers. In this work, we design a study to analyze the effectiveness of integrating such fairness metrics-based visualization and algorithms. We explore this through a task-based crowdsourced experiment comparing an interactive visualization system for constructing consensus rankings, ConsensusFuse, with a similar system that includes visual encodings of fairness metrics and fair-rank generation algorithms, FairFuse. We analyze the measure of fairness, agreement of rankers’ decisions, and user interactions in constructing the fair consensus ranking across these two systems. In our study with 200 participants, results suggest that providing these fairness-oriented support features nudges users to align their decision with the fairness metrics while minimizing the tedious process of manually having to amend the consensus ranking. We discuss the implications of these results for the design of next-generation fairness oriented-systems and along with emerging directions for future research. 
    more » « less
    Free, publicly-accessible full text available June 12, 2024
  4. Outlier detection is critical in real world. Due to the existence of many outlier detection techniques which often return different results for the same data set, the users have to address the problem of determining which among these techniques is the best suited for their task and tune its parameters. This is particularly challenging in the unsupervised setting, where no labels are available for cross-validation needed for such method and parameter optimization. In this work, we propose AutoOD which uses the existing unsupervised detection techniques to automatically produce high quality outliers without any human tuning. AutoOD's fundamentally new strategy unifies the merits of unsupervised outlier detection and supervised classification within one integrated solution. It automatically tests a diverse set of unsupervised outlier detectors on a target data set, extracts useful signals from their combined detection results to reliably capture key differences between outliers and inliers. It then uses these signals to produce a "custom outlier classifier" to classify outliers, with its accuracy comparable to supervised outlier classification models trained with ground truth labels - without having access to the much needed labels. On a diverse set of benchmark outlier detection datasets, AutoOD consistently outperforms the best unsupervised outlier detector selected from hundreds of detectors. It also outperforms other tuning-free approaches from 12 to 97 points (out of 100) in the F-1 score. 
    more » « less
    Free, publicly-accessible full text available May 26, 2024
  5. Human context recognition (HCR) using sensor data is a crucial task in Context-Aware (CA) applications in domains such as healthcare and security. Supervised machine learning HCR models are trained using smartphone HCR datasets that are scripted or gathered in-the-wild. Scripted datasets are most accurate because of their consistent visit patterns. Supervised machine learning HCR models perform well on scripted datasets but poorly on realistic data. In-the-wild datasets are more realistic, but cause HCR models to perform worse due to data imbalance, missing or incorrect labels, and a wide variety of phone placements and device types. Lab-to-field approaches learn a robust data representation from a scripted, high-fidelity dataset, which is then used for enhancing performance on a noisy, in-the-wild dataset with similar labels. This research introduces Triplet-based Domain Adaptation for Context REcognition (Triple-DARE), a lab-to-field neural network method that combines three unique loss functions to enhance intra-class compactness and inter-class separation within the embedding space of multi-labeled datasets: (1) domain alignment loss in order to learn domain-invariant embeddings; (2) classification loss to preserve task-discriminative features; and (3) joint fusion triplet loss. Rigorous evaluations showed that Triple-DARE achieved 6.3% and 4.5% higher F1-score and classification, respectively, than state-of-the-art HCR baselines and outperformed non-adaptive HCR models by 44.6% and 10.7%, respectively. 
    more » « less
  6. Deep learning often relies on the availability of a large amount of high-quality labeled data, which can be very limited in novel domains. To address such data scarcity, domain adaptation is one promising approach that allows for deep networks to leverage large amounts of available data from a source domain to enhance the model’s efficacy on the target domain of interest. However, while there is a plethora of alternate models for domain adaptation proposed over many years in the literature, there is a dearth of studies that objectively compare the relative effectiveness of these models in a rigorous, empirical study. To fill this gap, we provide a thorough, unbiased, empirical study of five state-of-the-art (SOTA) deep domain adaptation models proposed over the past 6 years whose codes are publicly available. Models are evaluated on the complex and diverse domain adaptation tasks featured in the DomainNet benchmark dataset as well as the popular Office-31 dataset. Our results suggest that (1) all 5 models perform similarly, on average, and do not even significantly beat the oldest model, and (2) counter to their intended purpose, the transfer loss functions in the literature do not contribute significantly to learning transferable representations. Our observations suggest that domain adaptation research needs to more thoroughly compare newly proposed models against existing works, along with assessing their loss functions’ utility thoroughly. Our code and data splits are made public for reproducibility of results by the community. 
    more » « less
  7. Human activity recognition (HAR) is the process of using mobile sensor data to determine the physical activities performed by individuals. HAR is the backbone of many mobile healthcare applications, such as passive health monitoring systems, early diagnosing systems, and fall detection systems. Effective HAR models rely on deep learning architectures and big data in order to accurately classify activities. Unfortunately, HAR datasets are expensive to collect, are often mislabeled, and have large class imbalances. State-of-the-art approaches to address these challenges utilize Generative Adversarial Networks (GANs) for generating additional synthetic data along with their labels. Problematically, these HAR GANs only synthesize continuous features — features that are represented by real numbers — recorded from gyroscopes, accelerometers, and other sensors that produce continuous data. This is limiting since mobile sensor data commonly has discrete features that provide additional context such as device location and the time-of-day, which have been shown to substantially improve HAR classification. Hence, we studied Conditional Tabular Generative Adversarial Networks (CTGANs) for data generation to synthesize mobile sensor data containing both continuous and discrete features, a task never been done by state-of-the-art approaches. We show HAR-CTGANs generate data with greater realism resulting in allowing better downstream performance in HAR models, and when state-of-the-art models were modified with HAR-CTGAN characteristics, downstream performance also improves. 
    more » « less
  8. Explainability helps users trust deep learning solutions for time series classification. However, existing explainability methods for multi-class time series classifiers focus on one class at a time, ignoring relationships between the classes. Instead, when a classifier is choosing between many classes, an effective explanation must show what sets the chosen class apart from the rest. We now formalize this notion, studying the open problem of class-specific explainability for deep time series classifiers, a challenging and impactful problem setting. We design a novel explainability method, DEMUX, which learns saliency maps for explaining deep multi-class time series classifiers by adaptively ensuring that its explanation spotlights the regions in an input time series that a model uses specifically to its predicted class. DEMUX adopts a gradient-based approach composed of three interdependent modules that combine to generate consistent, class-specific saliency maps that remain faithful to the classifier’s behavior yet are easily understood by end users. Our experimental study demonstrates that DEMUX outperforms nine state-of-the-art alternatives on five popular datasets when explaining two types of deep time series classifiers. Further, through a case study, we demonstrate that DEMUX’s explanations indeed highlight what separates the predicted class from the others in the eyes of the classifier. 
    more » « less
  9. Recurrent Classifier Chains (RCCs) are a leading approach for multi-label classification as they directly model the interdependencies between classes. Unfortunately, existing RCCs assume that every training instance is completely labeled with all its ground truth classes. In practice often only a subset of an instance's labels are annotated, while the annotations for other classes are missing. RCCs fail in this missing label scenario, predicting many false negatives and potentially missing important classes. In this work, we propose Robust-RCC, the first strategy for tackling this open problem of RCCs failing for multi-label missing-label data. Robust-RCC is a new type of deep recurrent classifier chain empowered to model inter-class relationships essential for predicting the complete label set most likely to match the ground truth. The key to Robust-RCC is the design of the Multi Incomplete Label Risk (MILR) function, which we prove to be equal in expectation to the true risk of the ground truth full label set despite being computed from incompletely labeled data. Our experimental study demonstrates that Robust-RCC consistently beats six state-of-of-the-art methods by as much as 30% in predicting the true labels. 
    more » « less
  10. Due to climate change and resulting natural disasters, there has been a growing interest in measuring the value of social goods to our society, like environmental conservation. Traditionally, the stated preference, such as contingent valuation, captures an economics-perspective on the value of environmental goods through the willingness to pay (WTP) paradigm. Where the economics theory to estimate the WTP using machine learning is the random utility model. However, the estimation of WTP depends on rather simple preference assumptions based on a linear functional form. These models are therefore unable to capture the complex uncertainty in the human decision-making process. Further, contingent valuation only uses the mean or median estimation of WTP. Yet it has been recognized that other quantiles of the WTP would be valuable to ensure the provision of social goods. In this work, we propose to leverage the Bayesian Deep Learning (BDL) models to capture the uncertainty in stated preference estimation. We focus on the probability of paying for an environmental good and the conditional distribution of WTP. The Bayesian deep learning model connects with the economics theory of the random utility model through the stochastic component on the individual preferences. For testing our proposed model, we work with both synthetic and real-world data. The results on synthetic data suggest the BDL can capture the uncertainty consistently with different distribution of WTP. For the real-world data, a forest conservation contingent valuation survey, we observed a high variability in the distribution of the WTP, suggesting high uncertainty in the individual preferences for social goods. Our research can be used to inform environmental policy, including the preservation of natural resources and other social good. 
    more » « less